Document Type: Architecture Specification
Context: Entity Truth Layer & Machine Interface Layer
Status: Public Standard
Validity: Aivis-OS Core Pipeline
1. Architectural Principle
Global Identity vs. Local Mention
In the Aivis-OS architecture, the identity of an entity is strictly separated from its mention.
The core problem of traditional SEO and schema markup approaches is that they generate structured data at the URL level (Per-URL Inventory). This approach inevitably leads to Identity Drift, because it implicitly treats every local mention as an independent identity.
Instead, Aivis-OS enforces a Cluster-Level Inventory: the entity inventory is not an artifact of a single page, but of the knowledge domain.
Definitions
- Entity (Identity): A stable, canonical object (e.g., “Allianz SE”, “John William Doe”, “2024 Annual Report”) that exists globally within the cluster.
- Mention (Occurrence): A localized reference within a URL (e.g., “The Group”, “Mr. Doe”, “The Report”).
- Stable Anchor: An externally verifiable identifier to reduce ambiguity (Wikidata QID, LEI, ISIN, ORCID).
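To make "externally verifiable" concrete, the sketch below performs only the cheap first filter: it checks that an anchor is syntactically well-formed, including the check digit where the identifier scheme defines one (ISIN: Luhn after letter expansion; LEI: ISO 7064 MOD 97-10; ORCID: ISO 7064 MOD 11-2). The function `anchor_is_well_formed` and its kind labels are hypothetical. A well-formed anchor is not yet a verified one; confirming that it exists and denotes this entity still requires a lookup against Wikidata, GLEIF, or ORCID.

```python
import re

# Hypothetical pre-check: format and check digit only, no registry lookup.
_QID = re.compile(r"Q[1-9][0-9]*")
_LEI = re.compile(r"[A-Z0-9]{18}[0-9]{2}")
_ISIN = re.compile(r"[A-Z]{2}[A-Z0-9]{9}[0-9]")
_ORCID = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def _to_digits(code: str) -> str:
    # Letters expand to two-digit numbers (A=10 ... Z=35); digits stay as-is.
    return "".join(str(int(ch, 36)) for ch in code)


def _luhn_ok(digits: str) -> bool:
    total = 0
    for i, ch in enumerate(reversed(digits)):
        n = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            n *= 2
            if n > 9:
                n -= 9
        total += n
    return total % 10 == 0


def _orcid_check_char(base: str) -> str:
    # ISO 7064 MOD 11-2 over the first 15 digits of the ORCID iD.
    total = 0
    for ch in base:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def anchor_is_well_formed(kind: str, value: str) -> bool:
    if kind == "wikidata":
        return _QID.fullmatch(value) is not None
    if kind == "lei":  # valid LEIs leave remainder 1 under MOD 97
        return _LEI.fullmatch(value) is not None and int(_to_digits(value)) % 97 == 1
    if kind == "isin":
        return _ISIN.fullmatch(value) is not None and _luhn_ok(_to_digits(value))
    if kind == "orcid":
        if _ORCID.fullmatch(value) is None:
            return False
        compact = value.replace("-", "")
        return _orcid_check_char(compact[:15]) == compact[15]
    raise ValueError(f"unknown anchor kind: {kind}")


print(anchor_is_well_formed("isin", "US0378331005"))  # True
print(anchor_is_well_formed("orcid", "0000-0002-1825-0098"))  # False (bad check digit)
```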
2. The Anti-Pattern: Risks of Per-URL Inventories
When systems attempt to build entities in isolation per URL, structural defects arise that prevent a stable AI representation:
2.1 Fragmentation of Identity (Duplicate Identities)
Scenario: An organization appears on 40 pages under variations such as “Allianz”, “Allianz SE”, and “Allianz Group”.
Error: A per-URL approach generates 40 competing “truths.” AI models cannot merge these deterministically.
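A scaled-down version of this scenario (4 pages instead of 40) shows the difference in record count. The mention list and the alias table are hypothetical inputs; in a real cluster the alias table would be the output of cluster-wide normalization rather than a hand-written dictionary.

```python
from collections import defaultdict

# Hypothetical extraction result: (url, surface form) for one organization.
MENTIONS = [
    ("https://example.com/about", "Allianz SE"),
    ("https://example.com/investors", "Allianz"),
    ("https://example.com/careers", "Allianz Group"),
    ("https://example.com/press", "Allianz SE"),
]

# Assumed alias table produced by cluster-wide normalization.
ALIASES = {"Allianz": "Allianz SE", "Allianz Group": "Allianz SE"}


def per_url_inventory(mentions):
    # Anti-pattern: every page defines its own entity from what it sees locally.
    return {(url, name) for url, name in mentions}


def cluster_inventory(mentions, aliases):
    # Cluster level: resolve variants first, then record where each entity is mentioned.
    entities = defaultdict(set)
    for url, name in mentions:
        entities[aliases.get(name, name)].add(url)
    return dict(entities)


print(len(per_url_inventory(MENTIONS)))  # 4 competing records
print(len(cluster_inventory(MENTIONS, ALIASES)))  # 1 Golden Record
```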
2.2 Instability of IDs
If IDs are generated (minted) per URL, they cannot remain stable cluster-wide. This destroys graph cohesion and makes long-term diffing (change tracking) impossible.
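Two common forms of per-URL minting illustrate the problem. Both functions below are hypothetical anti-pattern sketches: one is random (so a re-run changes every ID and diffing sees only "new" entities), the other is repeatable but includes the page URL (so the same person receives a different ID on every page).

```python
import hashlib
import uuid


def mint_random(url: str, name: str) -> str:
    # Anti-pattern A: a fresh random ID every run, so run-to-run diffs break.
    return f"{url}#{uuid.uuid4().hex}"


def mint_page_scoped(url: str, name: str) -> str:
    # Anti-pattern B: repeatable, but the page URL is part of the ID,
    # so the same person gets a different ID on every page.
    return f"{url}#" + hashlib.sha256(name.encode("utf-8")).hexdigest()[:8]


team = "https://example.com/team/john-doe"
blog = "https://example.com/blog/post-1"
name = "John William Doe"
print(mint_random(team, name) == mint_random(team, name))  # False
print(mint_page_scoped(team, name) == mint_page_scoped(blog, name))  # False
```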
2.3 Dispersion of Verification (Anchor Verification Drift)
Different URLs often assign contradictory anchors (e.g., incorrect social profiles or QIDs) to the same entity, or omit them entirely. The result is a contradictory knowledge graph, which LLMs devalue because of its ambiguity (“Ambiguity Penalty”).
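Anchor drift can be detected mechanically once per-page claims are grouped by entity. The sketch below assumes a hypothetical claim format `(url, entity_key, anchor_kind, anchor_value)`; `Q654321` stands in for a wrong QID picked up on one page.

```python
from collections import defaultdict


def find_anchor_conflicts(claims):
    """claims: (url, entity_key, anchor_kind, anchor_value) tuples extracted per page.
    Returns every (entity_key, anchor_kind) that received more than one distinct value."""
    values = defaultdict(lambda: defaultdict(set))
    for _url, entity_key, kind, value in claims:
        values[entity_key][kind].add(value)
    return {
        (entity_key, kind): sorted(found)
        for entity_key, kinds in values.items()
        for kind, found in kinds.items()
        if len(found) > 1
    }


claims = [
    ("https://example.com/team/john-doe", "person:john-william-doe", "wikidata", "Q123456"),
    ("https://example.com/about", "person:john-william-doe", "wikidata", "Q123456"),
    ("https://example.com/blog/post-1", "person:john-william-doe", "wikidata", "Q654321"),
]
print(find_anchor_conflicts(claims))
# {('person:john-william-doe', 'wikidata'): ['Q123456', 'Q654321']}
```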
3. The Aivis-OS Solution: Cluster-Level Governance
The Aivis-OS inventory functions as a Single Source of Truth (“Golden Record”). Structured data at the URL level is merely a projection of this truth.
System Benefits
- Deterministic Entity Resolution: Normalization and deduplication are performed centrally and once.
- Managed Stable Anchors: External references are verified once and inherited globally.
- Aggregation: Attributes from multiple sources are consolidated into a single enriched object.
- Governance & Versioning: Changes to an entity (e.g., name change) are propagated atomically, not distributed manually across thousands of pages.
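Aggregation needs a rule for conflicting values. One simple possibility, assumed here and not prescribed by Aivis-OS, is a source-priority policy: the most trusted source that supplies a non-null value wins, and the winning source is kept as provenance. The source names and the `jobTitle` value are hypothetical.

```python
def aggregate(records, source_priority):
    """Merge attribute dicts from several sources into one enriched object.
    source_priority lists sources from most to least trusted (hypothetical policy):
    the first non-null value per attribute wins, and provenance records its source."""
    merged, provenance = {}, {}
    for record in sorted(records, key=lambda r: source_priority.index(r["source"])):
        for key, value in record["attributes"].items():
            if key not in merged and value is not None:
                merged[key] = value
                provenance[key] = record["source"]
    return merged, provenance


records = [
    {"source": "blog_post", "attributes": {"name": "J. Doe", "jobTitle": "CTO"}},
    {"source": "team_page", "attributes": {"name": "John William Doe", "email": None}},
]
merged, provenance = aggregate(records, ["team_page", "blog_post"])
print(merged)  # {'name': 'John William Doe', 'jobTitle': 'CTO'}
print(provenance)  # {'name': 'team_page', 'jobTitle': 'blog_post'}
```

Keeping provenance per attribute is what lets the `provenance` column of the identity layer record where each fact came from.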
4. Data Model & Implementation
4.1 Cluster-Level Tables (Identity Layer)
This is the storage location of the truth.
| Attribute | Description |
|---|---|
| entity_id | Globally unique ID (format defined in 4.3 ID-Minting Convention) |
| canonical_name | The official designation (e.g., “Allianz SE”) |
| schema_type | Schema.org type (e.g., Corporation, Person) |
| stable_anchors | Wikidata QID, LEI, ISIN, DOI |
| provenance | Origin of data / Verification Status |
| version | Hash of the current state |
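The `version` column only works for diffing if logically equal records always produce the same hash. A minimal sketch, assuming SHA-256 over canonical JSON (sorted keys, fixed separators); the function name and hashing scheme are assumptions, not a documented Aivis-OS format.

```python
import hashlib
import json


def entity_version(record: dict) -> str:
    """Hash of an entity's current state (the `version` column), hypothetical scheme."""
    state = {key: value for key, value in record.items() if key != "version"}
    # Canonical JSON: sorted keys and fixed separators, so key order cannot change the hash.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = {
    "entity_id": "entity://cluster123/Person/john-william-doe-a1b2c3",
    "canonical_name": "John William Doe",
    "schema_type": "Person",
    "stable_anchors": {"wikidata": "Q123456"},
    "provenance": "verified",
}
v1 = entity_version(record)
v2 = entity_version(dict(reversed(list(record.items()))))  # same data, different key order
v3 = entity_version({**record, "canonical_name": "John W. Doe"})
print(v1 == v2, v1 == v3)  # True False
```

List values stay order-sensitive under this scheme, so any multi-valued attribute should be sorted before hashing.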
4.2 URL-Level Tables (Context Layer)
This is the storage location of the reference.
| Attribute | Description |
|---|---|
| url_id | Reference to the Page |
| entity_id | Foreign Key to the Cluster Entity |
| role_hint | Semantic role (e.g. mainEntity, author, mentions) |
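The split between the two layers maps naturally onto a foreign key. The sketch below uses an in-memory SQLite database with hypothetical DDL mirroring the two tables; the point it shows is that a page row cannot reference an `entity_id` the identity layer does not contain.

```python
import sqlite3

# Hypothetical DDL mirroring the identity layer and the context layer.
SCHEMA = """
CREATE TABLE entities (
    entity_id      TEXT PRIMARY KEY,
    canonical_name TEXT NOT NULL,
    schema_type    TEXT NOT NULL,
    stable_anchors TEXT NOT NULL DEFAULT '{}',
    provenance     TEXT,
    version        TEXT
);
CREATE TABLE url_entities (
    url_id    TEXT NOT NULL,
    entity_id TEXT NOT NULL REFERENCES entities(entity_id),
    role_hint TEXT NOT NULL,
    PRIMARY KEY (url_id, entity_id, role_hint)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when enabled
conn.executescript(SCHEMA)

person = "entity://cluster123/Person/john-william-doe-a1b2c3"
conn.execute(
    "INSERT INTO entities (entity_id, canonical_name, schema_type, stable_anchors) VALUES (?, ?, ?, ?)",
    (person, "John William Doe", "Person", '{"wikidata": "Q123456"}'),
)
conn.execute("INSERT INTO url_entities VALUES (?, ?, ?)",
             ("https://example.com/team/john-doe", person, "mainEntity"))

# A page may only reference an identity that already exists at cluster level.
try:
    conn.execute("INSERT INTO url_entities VALUES (?, ?, ?)",
                 ("https://example.com/blog/post-1", "entity://cluster123/Person/j-doe-ffffff", "author"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```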
4.3 ID-Minting Convention
Aivis-OS never mints new entity IDs while generating the JSON-LD for a single page. IDs are minted at the cluster level and follow a deterministic format:
entity://{cluster_id}/{schema_type}/{slug}-{short_hash}
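One way to fill in the `{slug}` and `{short_hash}` parts is sketched below. The inputs to the hash are an assumption: cluster ID, schema type, and a stable identity key (for example a verified anchor such as `wikidata:Q123456`), but never a page URL. The resulting hash will not equal the illustrative `a1b2c3` used in the example further down.

```python
import hashlib
import re
import unicodedata


def slugify(name: str) -> str:
    # Fold accents to ASCII, lowercase, and collapse anything else into hyphens.
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def mint_entity_id(cluster_id: str, schema_type: str, canonical_name: str, identity_key: str) -> str:
    """Hypothetical minting rule: the hash covers cluster, type and a stable identity key,
    never a page URL, so every page that mentions the entity gets the same ID."""
    digest = hashlib.sha256(f"{cluster_id}|{schema_type}|{identity_key}".encode("utf-8")).hexdigest()
    return f"entity://{cluster_id}/{schema_type}/{slugify(canonical_name)}-{digest[:6]}"


eid = mint_entity_id("cluster123", "Person", "John William Doe", "wikidata:Q123456")
print(eid)  # entity://cluster123/Person/john-william-doe-<6 hex chars>
```

Because the slug is derived from the name, this sketch assumes the ID is minted once during merge and then persisted; a later name change updates `canonical_name` without re-minting, which keeps the ID stable.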
5. Implementation Example
Instead of defining variants of a person (“John Doe”, “J. Doe”) multiple times, Aivis-OS references the central object.
Cluster Inventory (Backend): There is only one record for “John William Doe” with the linked Wikidata ID Q123456.
URL Projection (Output JSON-LD): The team page does not define a new person, but references the existing one:
```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@id": "entity://cluster123/Person/john-william-doe-a1b2c3",
      "@type": "Person",
      "name": "John William Doe",
      "sameAs": ["https://www.wikidata.org/wiki/Q123456"]
    },
    {
      "@type": "WebPage",
      "@id": "https://example.com/team/john-doe",
      "about": {
        "@id": "entity://cluster123/Person/john-william-doe-a1b2c3"
      }
    }
  ]
}
```
Note: No matter which URL this person appears on, the @id is byte-for-byte identical. This allows AI systems to merge every reference into a single graph node without ambiguity.
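The projection step can be sketched as a function that copies entity nodes from the inventory and only adds page-level references. The in-memory `INVENTORY` and the `project_page` signature are hypothetical; role values become single-element arrays, which JSON-LD treats as equivalent to the single object shown in the example.

```python
# Hypothetical frozen inventory: one Golden Record per entity_id.
PID = "entity://cluster123/Person/john-william-doe-a1b2c3"
INVENTORY = {
    PID: {
        "@type": "Person",
        "name": "John William Doe",
        "sameAs": ["https://www.wikidata.org/wiki/Q123456"],
    },
}


def project_page(page_url: str, refs: list[tuple[str, str]]) -> dict:
    """Build one page's JSON-LD. refs holds (role_hint, entity_id) rows from the URL-level table."""
    nodes = {}
    page = {"@type": "WebPage", "@id": page_url}
    for role, entity_id in refs:
        # Copy the node from the inventory; the page never defines the entity itself.
        # A KeyError here means the page references an identity that was never minted.
        nodes[entity_id] = {"@id": entity_id, **INVENTORY[entity_id]}
        page.setdefault(role, []).append({"@id": entity_id})
    return {"@context": "https://schema.org", "@graph": [*nodes.values(), page]}


team = project_page("https://example.com/team/john-doe", [("about", PID)])
blog = project_page("https://example.com/blog/post-1", [("author", PID)])
print(team["@graph"][0] == blog["@graph"][0])  # True: same node on both pages
```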
6. Operational Flow (Pipeline)
The Aivis-OS software processes data in a strict sequence to avoid contamination (e.g., unverified anchors or unmerged duplicates reaching the published JSON-LD):
- Ingest & Extraction: Scanning all URLs for entity candidates.
- Normalization (Staging): Cluster-wide cleaning of name variants.
- Merge & Deduplication: Merging identical entities into a Golden Record.
- Anchor Verification: Validation of external IDs (Wikidata, etc.) against trusted sources.
- Freeze: Versioning of the inventory.
- Projection: Generation of the JSON-LD for the individual pages based on the frozen inventory.
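The six stages can be sketched as plain functions chained in order. This is a simplified illustration under several assumptions: records are keyed by canonical name instead of minted IDs, the alias table is given, and the verifier is a syntax check standing in for a real lookup against Wikidata. The frozen inventory is a read-only `MappingProxyType`, so projection cannot alter identities.

```python
import re
from types import MappingProxyType


def ingest(pages):
    # 1. Ingest & Extraction: flatten pages into (url, surface form, claimed anchor) candidates.
    return [(url, name, anchor) for url, found in pages.items() for name, anchor in found]


def normalize(candidates, aliases):
    # 2. Normalization: collapse whitespace, then map known variants to one canonical name.
    normalized = []
    for url, name, anchor in candidates:
        clean = re.sub(r"\s+", " ", name).strip()
        normalized.append((url, aliases.get(clean, clean), anchor))
    return normalized


def merge(candidates):
    # 3. Merge & Deduplication: one record per canonical name, collecting URLs and anchor claims.
    records = {}
    for url, name, anchor in candidates:
        record = records.setdefault(name, {"anchors": set(), "urls": set()})
        record["urls"].add(url)
        if anchor is not None:
            record["anchors"].add(anchor)
    return records


def verify(records, is_valid):
    # 4. Anchor Verification: drop every anchor the (assumed) verifier rejects.
    for record in records.values():
        record["anchors"] = {a for a in record["anchors"] if is_valid(a)}
    return records


def freeze(records):
    # 5. Freeze: a read-only snapshot of the inventory.
    return MappingProxyType({
        name: MappingProxyType({"anchors": frozenset(r["anchors"]), "urls": frozenset(r["urls"])})
        for name, r in records.items()
    })


def project(inventory):
    # 6. Projection: each page's references are derived from the frozen inventory only.
    pages = {}
    for name, record in inventory.items():
        for url in record["urls"]:
            pages.setdefault(url, []).append(name)
    return {url: sorted(names) for url, names in pages.items()}


def run_pipeline(pages, aliases, is_valid):
    inventory = freeze(verify(merge(normalize(ingest(pages), aliases)), is_valid))
    return inventory, project(inventory)


def is_qid(anchor):
    # Stand-in verifier: a syntax check only, not a lookup against Wikidata.
    return re.fullmatch(r"Q[1-9][0-9]*", anchor) is not None


pages = {
    "/about": [("Allianz  SE", "Q123456")],
    "/careers": [("Allianz Group", None)],
    "/blog": [("Allianz", "not-a-qid")],
}
aliases = {"Allianz": "Allianz SE", "Allianz Group": "Allianz SE"}
inventory, projections = run_pipeline(pages, aliases, is_qid)
print(dict(inventory["Allianz SE"]))
```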
7. Decision Criteria (Acceptance)
A correctly implemented Aivis-OS cluster meets the following acceptance criteria:
- ID Stability: Re-running the pipeline does not change existing IDs.
- Deduplication Rate: Name variants converge to one entity per real-world object (n:1 mapping).
- Anchor Uniqueness: Each external anchor (e.g., a QID) is assigned to at most one entity.
- Referential Integrity: Every @id output in the JSON-LD exists in the verified inventory.
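Each criterion can be expressed as a small check. The sketch below makes several assumptions: ID stability compares two runs keyed by a stable identity key, the deduplication rate needs a hand-labelled sample mapping variants to real-world objects, and referential integrity only inspects `@id` values in the `entity://` scheme (page URLs such as the WebPage `@id` are not inventory entries). All function names are hypothetical.

```python
from collections import Counter, defaultdict


def ids_stable(previous: dict, current: dict) -> bool:
    """ID Stability: previous/current map an identity key to the entity_id of two runs."""
    return all(current.get(key) == entity_id for key, entity_id in previous.items())


def entities_per_object(gold: dict, system: dict) -> float:
    """Deduplication: gold maps variant -> real-world object (hand-labelled sample, assumed);
    system maps variant -> entity_id. 1.0 means every object resolved to one entity."""
    ids = defaultdict(set)
    for variant, obj in gold.items():
        ids[obj].add(system[variant])
    return sum(len(s) for s in ids.values()) / len(ids)


def anchors_unique(entities: dict) -> bool:
    """Anchor Uniqueness: entities maps entity_id -> iterable of anchors."""
    counts = Counter(anchor for anchors in entities.values() for anchor in set(anchors))
    return all(count == 1 for count in counts.values())


def _entity_refs(node, prefix="entity://"):
    # Walk a JSON-LD document and collect every @id that points into the inventory.
    found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@id" and isinstance(value, str):
                if value.startswith(prefix):
                    found.add(value)
            else:
                found |= _entity_refs(value, prefix)
    elif isinstance(node, list):
        for item in node:
            found |= _entity_refs(item, prefix)
    return found


def dangling_ids(docs, inventory_ids) -> set:
    """Referential Integrity: returns every entity @id that is missing from the inventory."""
    used = set()
    for doc in docs:
        used |= _entity_refs(doc)
    return used - set(inventory_ids)
```

A passing cluster yields `True` for the first and third checks, `1.0` for the second, and an empty set for the fourth.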
Summary
A URL is merely a context-specific interface (“Canvas”) where entities are mentioned. It is not the place to define identity. Aivis-OS shifts the authority over identity to the cluster layer to guarantee a maintainable, consistent knowledge graph.