Extended Matrix

Extended Matrix Development Projects

DP-61

EM Mappings Registry & Builder

Infrastructure · v1.7 · StratiGraph · EMtools · s3Dgraphy · s3D config (rules)

Description

The EM Tools 1.6 PostgreSQL/PostGIS backend (PR #28) made connecting to a live archaeological database a one-click affair. Where the friction moved next is unsurprising: which mapping JSON do I pick, and what do I do when none of the built-in ones fits my project’s schema? Today s3dgraphy ships a handful of mappings — pyarchinit_us_table, pyarchinit_uss_table, pyarchinit_us_update, EMdb variants, a generic Excel adapter — and the long tail lives in private folders on lab machines. DP-61 turns that long tail into a public, discoverable, citable artifact while keeping the contribution barrier low enough that an archaeologist with a custom column layout can publish a mapping without learning git.

Architecture, in plain words. A new GitHub repo extendedmatrix/em-mappings is the single source of truth. Each accepted mapping lives in its own folder under mappings/<slug>/ and ships three files plus an optional folder: manifest.json (metadata: title, summary, author ORCID when DP-59 is online, source DB kind, target node types, EM language version, mapping schema version, licence, keywords), mapping.json (the s3dgraphy mapping itself), README.md (one-screen narrative covering what the mapping is for and a worked example), and an optional screenshots/ folder. The repo’s main branch is built by a static-site generator (Astro, consistent with the main and dev sites) into mappings.extendedmatrix.org. The site has three parts: a homepage with a grid of mapping cards filterable by source kind, EM language version, node-type coverage and contributor; per-mapping detail pages rendered from the README and manifest, with a prominent Download mapping.json button and a copy-paste install snippet for EM Tools; and an embedded Mapping Builder that takes a CSV/SQLite/PostgreSQL schema upload and walks the user through column-to-node mapping with live validation against the s3dgraphy mapping JSON Schema. On the final step the Builder emits a downloadable .zip and a pre-filled GitHub PR URL that opens github.com/extendedmatrix/em-mappings/new/main?filename=mappings/<slug>/manifest.json&value=...; the user lands on GitHub with the manifest already filled in, reviews, and clicks Create pull request. CI on the PR runs the JSON Schema validator, checks the slug isn’t taken, asserts the s3dgraphy mapping format is recognised, and produces a comment summarising what the mapping covers. A maintainer reviews and merges; a Zenodo DOI is minted automatically (DP-57 dependency) and added back to the manifest in a follow-up commit. The published mapping is now discoverable from the website, from s3dgraphy.discover_mappings(), and from the EM Tools Browse community mappings panel (which parallels DP-57’s Browse Zenodo).
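The pre-filled PR step can be sketched in a few lines of Python. This is a minimal illustration, not the Builder's actual code (which would run in the browser): the function name, the slug and the manifest keys are hypothetical, and only the repository name and the filename/value query parameters come from the description above.

```python
import json
from urllib.parse import parse_qs, quote, urlencode, urlsplit

REPO = "extendedmatrix/em-mappings"  # repository named in this proposal


def prefilled_pr_url(slug: str, manifest: dict) -> str:
    """Build a GitHub 'new file' URL with the manifest path and body pre-filled."""
    query = urlencode(
        {
            "filename": f"mappings/{slug}/manifest.json",
            "value": json.dumps(manifest, indent=2),
        },
        quote_via=quote,  # percent-encode spaces as %20 instead of '+'
    )
    return f"https://github.com/{REPO}/new/main?{query}"


# Hypothetical manifest: key names and values are illustrative only.
manifest = {
    "title": "Example lab US table",
    "source_kind": "csv",
    "licence": "CC-BY-4.0",
}

url = prefilled_pr_url("example-lab-us-table", manifest)

# Round-trip check: decoding the query string recovers the original inputs.
decoded = parse_qs(urlsplit(url).query)
print(decoded["filename"][0])
```

Because the whole manifest travels in the query string, a real Builder would probably need to keep it small and leave mapping.json to the downloadable .zip; that trade-off is an assumption, not something the proposal settles.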

Why this shape, and why not alternatives. A few options were considered and discarded in favour of the GitHub-as-registry hybrid (Option B). Option A, a bare GitHub repo with no website, works, but browsing a repo without cards is too cold for archaeologists who don’t normally read JSON files. Option D, a standalone web app with its own database and ORCID-authenticated direct upload, solves the friction of PR-based publishing, but introduces hosting, backup, abuse handling, and a single point of failure that competes with the rest of the EM ecosystem (already happily living on GitHub, Zenodo and static Astro sites). The hybrid wins on three axes: zero hosting cost (GitHub + GitHub Pages), already-known review chain (PR + CI + maintainer ack), and natural fit with DP-57 (Zenodo mirror) and DP-59 (ORCID-attributed authorship). The Builder’s pre-filled-PR trick is the lever that makes this acceptable for non-developers: a non-git user goes from “I have a CSV with weird columns” to “my PR is open” in roughly three minutes without touching git locally, because the Builder produces the file in-browser and GitHub’s ?filename=&value= URL parameters do the rest.

Versioning, compatibility, evolution. The s3dgraphy mapping JSON format itself is versioned (the version field on each mapping). DP-61 elevates that into an explicit JSON Schema versioned independently from s3dgraphy proper, with a migration policy: a mapping submitted under schema v1.x stays valid forever; a mapping submitted under v2.x is annotated in the registry, and EM Tools refuses to load it when the installed s3dgraphy is older than the required version, with a clear error pointing to the registry page. The registry index is a mappings/index.json regenerated by CI on every merge, listing every mapping with its manifest and the resolved Zenodo DOI; s3dgraphy.discover_mappings() reads this index, caches it under the user config dir, and matches against the running EM language version. Anonymous submissions remain accepted (no-account-required is non-negotiable, mirroring DP-59’s contract); ORCID-signed submissions get a small verified-author badge on the card. Discussion at review time is welcome to harden any of the open decisions called out in the notes (subdomain choice, PR review policy, builder layout), but the architecture above is the recommended starting point.
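The "refuse to load with a clear error" rule could look roughly like the sketch below. Everything here is an assumption made for illustration: the IndexEntry fields (in particular min_s3dgraphy and registry_page), the function names, and the simplification that versions are plain dotted numbers such as "1.7" or "2.0.1".

```python
from dataclasses import dataclass


def parse_version(text: str) -> tuple:
    """Turn '1.7' or '2.0.1' into a three-part tuple of ints for comparison."""
    parts = [int(piece) for piece in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so that '1.7' compares equal to '1.7.0'
    return tuple(parts[:3])


@dataclass
class IndexEntry:
    """One hypothetical record of mappings/index.json."""
    slug: str
    schema_version: str   # mapping schema version the entry was submitted under
    min_s3dgraphy: str    # assumed field: lowest s3dgraphy release able to load it
    registry_page: str    # assumed field: page the error message points to


def check_loadable(entry: IndexEntry, running_s3dgraphy: str):
    """Return (ok, message); message is empty when the mapping can be loaded."""
    if parse_version(running_s3dgraphy) >= parse_version(entry.min_s3dgraphy):
        return True, ""
    return False, (
        f"Mapping '{entry.slug}' (schema v{entry.schema_version}) needs "
        f"s3dgraphy >= {entry.min_s3dgraphy}, but {running_s3dgraphy} is installed. "
        f"See {entry.registry_page}"
    )


entry = IndexEntry(
    slug="example-lab-us-table",
    schema_version="2.0",
    min_s3dgraphy="1.7",
    registry_page="https://mappings.extendedmatrix.org/example-lab-us-table",  # hypothetical path
)
ok_old, message_old = check_loadable(entry, "1.6.2")
ok_new, message_new = check_loadable(entry, "1.7")
print(message_old)
```

A production check would more likely rely on an established version parser than on this hand-rolled tuple comparison, which does not understand pre-release tags.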

Where private mappings live, and why that matters. A failure mode of the current setup, flagged in conversation 2026-06-04, is that EM Tools and the s3dgraphy wheel ship a small set of built-in mappings under their respective install directories; updating either piece wipes out any custom mapping the user dropped alongside them. The registry is one half of the fix (community-shared mappings survive in the cloud); the other half is a local user-mapping directory outside the install tree that survives every reinstall: ~/Library/Application Support/s3dgraphy/mappings/ on macOS, ~/.config/s3dgraphy/mappings/ on Linux (XDG-compliant), %APPDATA%\s3dgraphy\mappings\ on Windows. s3dgraphy.discover_mappings(include_user_dir=True) unions in-wheel built-ins, this user-dir, and registry results into a single list; the EM Tools picker shows three badges (built-in / user / community) so the source of every choice is visible. The two complement each other: the registry is the community shelf, the user-dir is the private locker, and together they close the “I updated my add-on and lost everything” hole.
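Resolving that directory per platform might be done along these lines. This is a sketch under stated assumptions: the function name and its parameters are hypothetical (they exist mainly to make the logic testable), and the fallback to AppData/Roaming when APPDATA is unset is an addition not mentioned in the proposal.

```python
import os
import sys
from pathlib import Path


def user_mappings_dir(platform=sys.platform, env=os.environ, home=None) -> Path:
    """Return the per-user mappings folder, which lives outside any install tree."""
    home_dir = Path(home) if home else Path.home()
    if platform == "darwin":
        base = home_dir / "Library" / "Application Support"
    elif platform.startswith("win"):
        appdata = env.get("APPDATA")
        # Assumed fallback when APPDATA is not set in the environment.
        base = Path(appdata) if appdata else home_dir / "AppData" / "Roaming"
    else:
        # Linux and other Unix-likes: honour XDG_CONFIG_HOME, default to ~/.config
        xdg = env.get("XDG_CONFIG_HOME")
        base = Path(xdg) if xdg else home_dir / ".config"
    return base / "s3dgraphy" / "mappings"


def ensure_user_mappings_dir() -> Path:
    """Create the folder lazily, the first time something needs to write to it."""
    target = user_mappings_dir()
    target.mkdir(parents=True, exist_ok=True)
    return target


print(user_mappings_dir())  # path for the machine running this sketch
```

Passing the platform, environment and home directory as arguments keeps the lookup free of side effects, so the three conventions can be checked without touching the real filesystem.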

Status

Concept

Target EM Version

1.7

Impacts

EMtools · s3Dgraphy · s3D config (rules)

Components

  • mappings.extendedmatrix.org — Astro static site with per-mapping cards (filterable by source-DB kind, target-node coverage, EM version)
  • extendedmatrix/em-mappings — canonical GitHub repo, one folder per mapping (manifest.json + mapping.json + README.md + screenshots/)
  • Mapping manifest schema (manifest.json): title, summary, author (ORCID, when DP-59 lands), source kind (sqlite/postgres/csv/xlsx), target node types covered, EM language version, mapping schema version, licence (default CC-BY-4.0), keywords, screenshots, demo data link
  • Web-based Mapping Builder — client-side single-page tool that loads a CSV/SQLite schema, walks the user through column → node-type mapping, validates against the s3dgraphy mapping JSON schema, and emits a downloadable .zip + a pre-filled GitHub PR URL
  • Three-tier mapping discovery: built-in (ships with the s3dgraphy wheel), user-dir (local cross-platform config folder that survives an EM Tools reinstall), registry-fetched (fetched from the registry site and cached locally). The EM Tools picker shows the three tiers with separate badges (built-in / user / community). Reinstalling EM Tools or the s3dgraphy wheel never touches the user-dir.
  • User-mapping directory — cross-platform convention: macOS `~/Library/Application Support/s3dgraphy/mappings/`, Linux `$XDG_CONFIG_HOME/s3dgraphy/mappings/` (default `~/.config/s3dgraphy/mappings/`), Windows `%APPDATA%\s3dgraphy\mappings\`. Created lazily on first user-dir add. Documented as 'where to drop your private mappings so they survive every update'.
  • s3dgraphy mappings/ directory restructured into an in-repo subset (built-in mappings) + a `discover_mappings(include_user_dir=True, include_registry=...)` API that unions the three tiers
  • discover_mappings() in s3dgraphy: in-wheel built-ins, optional HTTP fetch of registry index with local cache, optional walk of the user-mapping dir; semver-aware compatibility matching against the running EM language version
  • EM Tools 'Browse community mappings' panel, which parallels DP-57's 'Browse Zenodo'. Lists registry mappings, filters by source kind / language version, one-click 'install into project'. Also surfaces an 'Open user mappings folder' shortcut that opens the local user-dir in Finder/Explorer.
  • Per-mapping Zenodo DOI minting on registry merge (DP-57 dependency) — each accepted mapping gets a citable archive entry under the EM community, with its em_metadata.json
  • ORCID-attributed authorship of submitted mappings (DP-59 dependency) — anonymous submissions allowed via PR, ORCID-signed submissions get a verified-author badge
  • Mapping-schema versioning and validator: a JSON Schema for the s3dgraphy mapping format itself, executed on every PR by GitHub Actions; bumps require explicit schema-version bumps with a migration note
  • Optional mirror to a single .zip pack for offline / air-gapped use (downloadable from the registry site)
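The three-tier union described in the components above can be illustrated with a small sketch. It is deliberately not named discover_mappings, because the real signature and return type are not fixed yet; the MappingRef class, the one-JSON-file-per-mapping layout of the local folders, and the "slug"/"url" keys of the registry index are all assumptions.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class MappingRef:
    name: str
    tier: str       # "built-in", "user" or "community": the badge shown in the picker
    location: str   # file path or download URL


def scan_dir(directory: Path, tier: str) -> list:
    """List the *.json mappings in a folder; a missing folder yields nothing."""
    if not directory.is_dir():
        return []
    return [MappingRef(p.stem, tier, str(p)) for p in sorted(directory.glob("*.json"))]


def discover_mappings_sketch(builtin_dir, user_dir=None, registry_index=None) -> list:
    """Union the three tiers into one list, tagging each entry with its source."""
    found = scan_dir(Path(builtin_dir), "built-in")
    if user_dir is not None:
        found += scan_dir(Path(user_dir), "user")
    for entry in registry_index or []:
        found.append(MappingRef(entry["slug"], "community", entry["url"]))
    return found


# Demo with throwaway folders standing in for the wheel and the user-dir.
with tempfile.TemporaryDirectory() as tmp:
    builtin, user = Path(tmp) / "builtin", Path(tmp) / "user"
    builtin.mkdir()
    user.mkdir()
    (builtin / "pyarchinit_us_table.json").write_text("{}")
    (user / "my_lab_layout.json").write_text("{}")
    index = [{"slug": "example-community-mapping", "url": "https://example.org/mapping.json"}]
    result = discover_mappings_sketch(builtin, user, index)

for ref in result:
    print(f"[{ref.tier}] {ref.name}")
```

The sketch simply concatenates the tiers and does not decide what happens when the same name appears in two of them; that precedence rule is left open here, as it is in the proposal.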

Key Study

Needed

Notes

Triggered by EM Tools 1.6 PR #28 (PostgreSQL/PostGIS backend for pyArchInit imports, Enzo Cocca) and the Sub-3 reverse-export work that will follow on the same umbrella issue #27. Now that connecting to *live* archaeological databases is one click away, the bottleneck moves to the *mapping JSONs* that describe how to translate each project's source schema into EM graph nodes. Today those mappings ship as a tiny built-in set inside s3dgraphy (pyarchinit_us_table, pyarchinit_uss_table, pyarchinit_us_update, EMdb variants, generic Excel); the long tail (regional pyArchInit forks, custom QGIS layouts, lab-specific Excel templates) lives in private folders. The registry turns that long tail into a discoverable, citable, peer-reviewable artifact.

Dependencies and alignments:

  • (a) DP-57 Zenodo automation: the registry mirrors accepted mappings to Zenodo for citation and FAIR archival.
  • (b) DP-59 ORCID identity: verified-author badge, anonymous fallback always allowed.
  • (c) DP-08 Subjectivity Project / DP-09 Vocabulary Project: the mapping JSONs are the place where vocabulary alignment between EM and external schemas becomes machine-readable; the registry is the natural place to surface alignment statistics across mappings.

Open decisions (call out in conversation when this DP is reviewed):

  • (1) Subdomain: proposing `mappings.extendedmatrix.org`; alternatives `registry.extendedmatrix.org` (broader, could also host yEd palettes / XLSX templates / s3dgraphy plugins later) or `bridge.extendedmatrix.org` (since mappings *bridge* external data into EM).
  • (2) Hosting: GitHub Pages with custom domain (zero-cost, mirrors the main site flow) vs. SSR on Cloudflare Pages (allows server-side mapping validation as a paid feature for unverified PRs).
  • (3) PR review policy: auto-merge after CI passes vs. one maintainer ack required vs. ORCID-verified author only.
  • (4) Builder embed: single-page Astro island vs. separate static page.

Recommendation: start with GitHub-as-registry + static cards site + embedded builder (Option B in the body) before considering anything heavier.