Metabox

A minimal envelope format for content-addressed records.


Abstract

Metabox is a minimal envelope format for content-addressed records. It defines eight fixed fields that answer "who said what, when" plus a body object for domain-specific payload. Records are JSONL, IDs are BLAKE3 hashes of a canonical form.

The format is designed for append-only, VCS-friendly record streams — quality attestations, dependency declarations, audit logs, or any structured signal that benefits from content addressing and a uniform envelope.

1. Envelope Fields

Every Metabox record is a JSON object with exactly eight top-level fields, in this canonical order:

# Field Type Required Description
1 metabox string yes Envelope version. Always "1".
2 type string yes Body schema identifier.
3 subject string yes What this record is about.
4 issuer string yes Who or what created this record (URI).
5 issuer_type string no Issuer classification (see 1.5).
6 created_at string yes RFC 3339 timestamp.
7 id string yes Content-addressed BLAKE3 hash (see section 3).
8 body object yes Type-specific payload.

Seven fields are required. issuer_type is optional. All eight are present in the canonical field order.

1.1 metabox

Always the string "1". This field is self-identifying — any JSON object with a metabox field is a Metabox record. Parsers SHOULD use this field to detect Metabox records in mixed streams.

The value is a string, not an integer, to allow future non-numeric versioning schemes and to avoid ambiguity with other version fields (like Qualifier's "v": 3).

1.2 type

A string that identifies the schema of the body object. Metabox itself does not define any types — they are the domain of consuming projects.

The type field is opaque to Metabox. It carries no semantics at the envelope level beyond identifying which body fields to expect.

1.3 subject

What this record is about — a file path, URL, package name, service endpoint, or any other addressable thing. The string is opaque to Metabox. Naming conventions are a project-level decision.

Examples: "src/parser.rs", "pkg:npm/lodash@4.17.21", "service/health", "https://example.com/api/v2".

1.4 issuer

Who or what created this record (URI). Typically a mailto: URI, https: service URL, or other URI-scheme identifier. The string is opaque to Metabox.

1.5 issuer_type

Optional classification of the issuer entity. When present, it is one of: "human", "ai", "tool", or "unknown". Omitted when not applicable.

This field lives in the envelope (not the body) because it answers an envelope-level question — "what kind of entity issued this record?" — and is uniform across all record types.

1.6 created_at

An RFC 3339 timestamp indicating when the record was created.

1.7 id

A lowercase hex-encoded BLAKE3 hash of the record's Metabox Canonical Form (section 3), 64 characters. Content-addressed: the same record always produces the same ID.

1.8 body

A JSON object containing the type-specific payload. The body is always present. Types with no fields use an empty object {}.

The envelope never looks inside the body. Generic Metabox tooling (indexers, replicators, filters) can operate on the envelope fields without understanding or parsing the body.

2. File Format

Metabox records are stored as JSONL — one JSON object per line, UTF-8 encoded.

Rules:

File extensions are a project-level decision. Metabox does not mandate an extension. (Qualifier uses .qual; other projects may use .jsonl, .metabox, or whatever fits their ecosystem.)

3. Metabox Canonical Form (MCF)

To ensure that every implementation — regardless of language or JSON library — produces identical bytes for the same record, the canonical serialization MUST obey the following rules:

3.1 Normalization

Before serialization:

3.2 Field Order

  1. Envelope fields appear in the fixed order defined in section 1: metabox, type, subject, issuer, issuer_type, created_at, id, body. Optional envelope fields (issuer_type) are omitted when absent.
  2. Body fields are sorted lexicographically by key. Sorting is recursive: nested objects also have their keys sorted lexicographically.

3.3 Absent Optional Body Fields

Optional body fields whose value is absent (null, None, etc.) MUST be omitted entirely from the body. Array-valued fields MUST be omitted when the array is empty. The hash changes only when a field is actually present.

3.4 Whitespace

No whitespace between tokens. No space after : or ,. No trailing newline. The output is a single compact JSON line.

3.5 String Encoding

Standard JSON escaping (RFC 8259 Section 7). Implementations MUST NOT add escapes beyond what JSON requires.

3.6 Number Encoding

Integers serialize as bare decimal with no leading zeros, no decimal point, no exponent. Negative values use a leading -.

4. Content Addressing

The record ID is computed as:

  1. Serialize the record in Metabox Canonical Form (section 3), with id set to "".
  2. Compute the BLAKE3 hash of the MCF UTF-8 bytes.
  3. Hex-encode the hash, lowercase. The result is 64 characters.

This makes IDs deterministic and content-addressed. The same logical record always produces the same ID, regardless of which implementation generates it.

5. Type System

Types are opaque strings. Metabox does not define any types, validate body schemas, or constrain what goes in body. That is the domain of consuming projects.

Conventions:

Forward compatibility: Implementations MUST preserve records with unrecognized types. Unknown records are opaque pass-through data — they MUST NOT be dropped, rewritten, or rejected.

A type specification (defined by the consuming project, not by Metabox) SHOULD define:

  1. The body fields, their types, and which are required.
  2. How the type interacts with any domain-specific logic (scoring, etc.).

6. Examples

A Qualifier attestation in Metabox format:

{"metabox":"1","type":"attestation","subject":"src/parser.rs","issuer":"mailto:alice@example.com","issuer_type":"human","created_at":"2026-02-24T10:00:00Z","id":"a1b2c3d4e5f6a7b8a1b2c3d4e5f6a7b8a1b2c3d4e5f6a7b8a1b2c3d4e5f6a7b8","body":{"kind":"concern","ref":"git:3aba500","score":-30,"summary":"Panics on malformed input"}}

Note that body fields are sorted lexicographically: kind, ref, score, summary. The issuer_type field is in the envelope, between issuer and created_at.

A minimal record with an empty body:

{"metabox":"1","type":"ping","subject":"service/health","issuer":"https://monitor.example.com","created_at":"2026-02-25T12:00:00Z","id":"f9e8d7c6b5a4f9e8d7c6b5a4f9e8d7c6b5a4f9e8d7c6b5a4f9e8d7c6b5a4f9e8","body":{}}

A dependency declaration:

{"metabox":"1","type":"dependency","subject":"bin/server","issuer":"https://build.example.com","created_at":"2026-02-25T10:00:00Z","id":"1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b3c4d5e6f1a2b","body":{"depends_on":["lib/auth","lib/http","lib/db"]}}

7. Qualifier Mapping

Qualifier v3 records map to Metabox as follows:

7.1 Frame → Envelope

Qualifier v3 Metabox Notes
v: 3 metabox: "1" Version field changes name/value
type type Unchanged
artifact subject Renamed for generality
author issuer Renamed to issuer (URI-based identity)
created_at created_at Unchanged
id id Unchanged

7.2 Body Fields → body

All non-frame fields move into the body object:

Attestation (type: "attestation"):

span, kind, score, summary, detail, suggested_fix, tags, ref, supersedesbody.*

Epoch (type: "epoch"):

span, score, summary, refsbody.*

Dependency (type: "dependency"):

depends_onbody.*

7.3 Canonical Form Differences

Qualifier (QCF) Metabox (MCF)
Per-type field order for body fields Lexicographic body field order
Body fields inlined at top level Body fields nested in body object
v: 3 in position 1 metabox: "1" in position 1
artifact in position 3 subject in position 3

8. Design Rationale

Why body, not inlined? Separation of concerns. The envelope answers "who said what, when" — generic questions that every tool can answer. The body answers domain-specific questions that only type-aware tools understand. Inlining conflates the two, which means generic tooling has to know which fields are "envelope" and which are "body" for each type. A nested body makes the boundary explicit and mechanical.

Why metabox, not v? Self-identification. Any JSON object with a metabox field is a Metabox record. This doesn't collide with existing version fields (Qualifier's v, npm's version, etc.) and lets parsers detect Metabox records in mixed or unknown streams without prior knowledge of the schema.

Why subject, not artifact? Generality. Not all records describe code artifacts. A health check pings a service endpoint. An audit record references a user action. subject is the most neutral term for "the thing this record is about."

Why lexicographic body ordering? Simplicity. Per-type canonical field orders (as in Qualifier's QCF) require every implementation to know the field order for every type. Lexicographic ordering is universal — a generic implementation can canonicalize any body without type-specific knowledge. This is the key enabler for generic Metabox tooling.

Why BLAKE3? Fast, secure, widely available, and already proven in Qualifier. The 256-bit output (64 hex chars) provides strong collision resistance while keeping IDs compact enough to display and copy.

Why JSONL? Append-only JSONL is the simplest format that is human-readable, VCS-friendly (clean diffs, useful blame), and trivial to parse in any language. It is the right format for a record stream that grows over time.

Why comments? Lines starting with // are ignored by parsers. This lets humans annotate record files with context, section headers, or notes without affecting the data. Comments are stripped during compaction.


Metabox: put your records in a box.