1. Architectural Principle

Global Identity vs. Local Mention

In the Aivis-OS architecture, the identity of an entity is strictly separated from its mention.

The core problem with traditional SEO and schema approaches is that they generate data at the URL level (Per-URL Inventory). This approach inevitably leads to Identity Drift, because it implicitly treats every local mention as an independent identity.

Instead, Aivis-OS enforces a Cluster-Level Inventory. The entity inventory is not an artifact of a single page. It is an artifact of the knowledge domain.

Definitions

  • Entity (Identity): A stable, canonical object (e.g., “Allianz SE”, “John William Doe”, “2024 Annual Report”) that exists globally within the cluster.
  • Mention (Occurrence): A localized reference within a URL (e.g., “The Group”, “Mr. Doe”, “The Report”).
  • Stable Anchor: An externally verifiable identifier that reduces ambiguity (e.g., Wikidata QID, LEI, ISIN, ORCID).

2. The Anti-Pattern: Risks of Per-URL Inventories

When systems attempt to build entities in isolation per URL, structural defects arise that prevent a stable AI representation:

2.1 Fragmentation of Identity (Duplicate Identities)

Scenario: An organization appears on 40 pages under variations such as “Allianz”, “Allianz SE”, and “Allianz Group”.

Error: A per-URL approach generates 40 competing “truths.” AI models cannot merge these deterministically.
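The following Python sketch makes the arithmetic of this scenario explicit. The page URLs and the `ALIASES` table are hypothetical stand-ins for a crawl result and a centrally maintained variant list. They are not taken from Aivis-OS.

```python
# Hypothetical crawl result: 40 pages, each mentioning the same
# organization under one of three surface forms.
variants = ["Allianz", "Allianz SE", "Allianz Group"]
pages = {f"https://example.com/page-{i}": variants[i % 3] for i in range(40)}

# Per-URL inventory: every (url, surface form) pair becomes its own "entity".
per_url_identities = {(url, name) for url, name in pages.items()}

# Cluster-level inventory: one alias table (hypothetical, maintained
# centrally) maps every variant to a single canonical record.
ALIASES = {
    "Allianz": "Allianz SE",
    "Allianz SE": "Allianz SE",
    "Allianz Group": "Allianz SE",
}
cluster_identities = {ALIASES[name] for name in pages.values()}

print(len(per_url_identities))  # 40 competing "truths"
print(len(cluster_identities))  # 1 canonical entity
```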

2.2 Instability of IDs

If IDs are generated (minted) per URL, they cannot remain stable cluster-wide. This destroys graph cohesion and makes long-term diffing (change tracking) impossible.
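A small sketch shows why unstable IDs break diffing. It compares two inventory snapshots by ID. The ID strings, the URLs, and the snapshot shape `{entity_id: canonical_name}` are illustrative assumptions.

```python
def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {entity_id: canonical_name} snapshots by ID."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

# Per-URL minting (hypothetical): the ID embeds the page it was minted on.
# On the second run the person happens to be encountered first on another page.
per_url_run1 = {"https://example.com/team/john-doe#person": "John William Doe"}
per_url_run2 = {"https://example.com/about/leadership#person": "Dr. John William Doe"}

# Cluster-level minting: the ID does not depend on any page.
DOE = "entity://cluster123/Person/john-william-doe-a1b2c3"
cluster_run1 = {DOE: "John William Doe"}
cluster_run2 = {DOE: "Dr. John William Doe"}

# Per-URL: one spurious "removed" plus one spurious "added", nothing "changed".
print(diff_snapshots(per_url_run1, per_url_run2))
# Cluster-level: the rename shows up as a single "changed" entry.
print(diff_snapshots(cluster_run1, cluster_run2))
```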

2.3 Dispersion of Verification (Anchor Verification Drift)

Different URLs often attach contradictory anchors to the same entity (e.g., incorrect social profiles or QIDs), or omit them entirely. The result is an inconsistent knowledge graph, which LLMs tend to devalue because they cannot tell which version is correct (“Ambiguity Penalty”).
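When auditing an existing per-URL setup, this drift could be detected roughly as follows. The observation tuples and the second QID (Q654321) are hypothetical. A real audit would read them from the structured data extracted from each page.

```python
from collections import defaultdict

# Hypothetical per-URL extraction: (url, entity key, QID asserted on that page or None).
observations = [
    ("https://example.com/team/john-doe", "john-william-doe", "Q123456"),
    ("https://example.com/press/2024", "john-william-doe", "Q654321"),  # contradictory QID
    ("https://example.com/blog/post-7", "john-william-doe", None),       # anchor missing
]

def find_anchor_drift(obs):
    """Return (entities asserting more than one QID, pages lacking any QID per entity)."""
    anchors = defaultdict(set)
    missing = defaultdict(list)
    for url, key, qid in obs:
        if qid is None:
            missing[key].append(url)
        else:
            anchors[key].add(qid)
    conflicts = {key: sorted(qids) for key, qids in anchors.items() if len(qids) > 1}
    return conflicts, dict(missing)

conflicts, missing = find_anchor_drift(observations)
print(conflicts)  # {'john-william-doe': ['Q123456', 'Q654321']}
print(missing)    # {'john-william-doe': ['https://example.com/blog/post-7']}
```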

3. The Aivis-OS Solution: Cluster-Level Governance

The Aivis-OS inventory functions as a Single Source of Truth (“Golden Record”). Structured data at the URL level is merely a projection of this truth.

System Benefits

  1. Deterministic Entity Resolution: Normalization and deduplication are performed centrally and once.
  2. Managed Stable Anchors: External references are verified once and inherited globally.
  3. Aggregation: Attributes from multiple sources are merged into a single, enriched entity record.
  4. Governance & Versioning: Changes to an entity (e.g., name change) are propagated atomically, not distributed manually across thousands of pages.
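Benefit 4 can be sketched in a few lines. Pages hold only an `entity_id`, so a rename is a single write to the identity layer. The inventory shape, the entity ID, and the page URLs below are hypothetical.

```python
# Hypothetical identity layer, keyed by stable entity_id.
ACME = "entity://cluster123/Corporation/example-corp-d4e5f6"
inventory = {ACME: {"canonical_name": "Example Corp"}}

# Hypothetical context layer: 1,000 pages that reference the entity by ID only.
pages = {f"https://example.com/page-{i}": ACME for i in range(1000)}

def render_name(url: str) -> str:
    """Projection: a page never stores the name; it looks it up by entity_id."""
    return inventory[pages[url]]["canonical_name"]

# One atomic change in the identity layer ...
inventory[ACME]["canonical_name"] = "Example Group AG"

# ... is reflected on every page the next time it is projected.
assert all(render_name(url) == "Example Group AG" for url in pages)
```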

4. Data Model & Implementation

4.1 Cluster-Level Tables (Identity Layer)

This is the storage location of the truth.

| Attribute      | Description                                    |
|----------------|------------------------------------------------|
| entity_id      | Globally unique ID (format defined in 4.3)     |
| canonical_name | The official designation (e.g., “Allianz SE”)  |
| schema_type    | Schema.org type (e.g., Corporation, Person)    |
| stable_anchors | Wikidata QID, LEI, ISIN, DOI                   |
| provenance     | Origin of the data / verification status       |
| version        | Hash of the current state                      |
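One identity-layer row might be modeled in Python as shown below. Two things are assumptions, since the table only says that `version` is a hash of the current state: storing `stable_anchors` as a dict keyed by anchor type, and computing `version` as a SHA-256 hash over the record's sorted JSON. In this sketch the version is derived on the fly; a real table would persist it.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field

@dataclass
class ClusterEntity:
    """One identity-layer row; field names follow the table above."""
    entity_id: str
    canonical_name: str
    schema_type: str
    # Assumed shape: {"wikidata": "Q123456", "lei": "..."}.
    stable_anchors: dict[str, str] = field(default_factory=dict)
    provenance: str = "unverified"

    @property
    def version(self) -> str:
        """Assumed scheme: SHA-256 over the record serialized as sorted JSON."""
        payload = json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

doe = ClusterEntity(
    entity_id="entity://cluster123/Person/john-william-doe-a1b2c3",
    canonical_name="John William Doe",
    schema_type="Person",
    stable_anchors={"wikidata": "Q123456"},
    provenance="verified",
)
twin = ClusterEntity(**asdict(doe))
assert twin.version == doe.version      # identical state, identical hash
doe.canonical_name = "Dr. John William Doe"
assert doe.version != twin.version      # any change yields a new version
```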

4.2 URL-Level Tables (Context Layer)

This is the storage location of the reference.

| Attribute | Description                                         |
|-----------|-----------------------------------------------------|
| url_id    | Reference to the page                               |
| entity_id | Foreign key to the cluster entity                   |
| role_hint | Semantic role (e.g., mainEntity, author, mentions)  |
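A context-layer row could be modeled as shown below. Many rows point at one identity, which is the n:1 relationship between mentions and entities. Restricting `role_hint` to the three example values is an assumption; the table lists them only as examples.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed vocabulary; the table gives these values as examples only.
ROLE_HINTS = {"mainEntity", "author", "mentions"}
DOE = "entity://cluster123/Person/john-william-doe-a1b2c3"

@dataclass(frozen=True)
class Mention:
    """One context-layer row: a page's reference to a cluster entity."""
    url_id: str
    entity_id: str  # foreign key into the identity layer
    role_hint: str

    def __post_init__(self):
        if self.role_hint not in ROLE_HINTS:
            raise ValueError(f"unknown role_hint: {self.role_hint}")

mentions = [
    Mention("https://example.com/team/john-doe", DOE, "mainEntity"),
    Mention("https://example.com/blog/post-7", DOE, "author"),
    Mention("https://example.com/press/2024", DOE, "mentions"),
]

# Many mentions (n) resolve to one identity (1).
by_entity = defaultdict(list)
for m in mentions:
    by_entity[m.entity_id].append(m.url_id)
print(len(mentions), len(by_entity))  # 3 1
```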

4.3 ID-Minting Convention

Aivis-OS never mints new entity IDs while generating the JSON-LD for a single page. IDs are assigned at the cluster level and follow a deterministic format:

entity://{cluster_id}/{schema_type}/{slug}-{short_hash}

5. Implementation Example

Instead of defining variants of a person (“John Doe”, “J. Doe”) multiple times, Aivis-OS references the central object.

Cluster Inventory (Backend): There is only one record for “John William Doe” with the linked Wikidata ID Q123456.

URL Projection (Output JSON-LD): The team page does not define a new person, but references the existing one:

```json
{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@id": "entity://cluster123/Person/john-william-doe-a1b2c3",
      "@type": "Person",
      "name": "John William Doe",
      "sameAs": ["https://www.wikidata.org/wiki/Q123456"]
    },
    {
      "@id": "https://example.com/team/john-doe",
      "@type": "WebPage",
      "about": {
        "@id": "entity://cluster123/Person/john-william-doe-a1b2c3"
      }
    }
  ]
}
```

Note: No matter which URL this person appears on, the @id remains byte-for-byte identical. This allows AI systems to merge every mention into a single graph node without guesswork.
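The projection step that produces this output can be sketched as a lookup that copies the entity node from the inventory and never mints anything. The `INVENTORY` contents mirror the example above. The function name `project_page` and the second page URL are hypothetical.

```python
import json

# Hypothetical frozen inventory record, mirroring the example above.
DOE = "entity://cluster123/Person/john-william-doe-a1b2c3"
INVENTORY = {
    DOE: {
        "@type": "Person",
        "name": "John William Doe",
        "sameAs": ["https://www.wikidata.org/wiki/Q123456"],
    }
}

def project_page(page_url: str, about_id: str) -> dict:
    """Build a page's JSON-LD from the inventory; raises KeyError for unknown IDs."""
    entity = {"@id": about_id, **INVENTORY[about_id]}
    page = {"@id": page_url, "@type": "WebPage", "about": {"@id": about_id}}
    return {"@context": "https://schema.org", "@graph": [entity, page]}

team = project_page("https://example.com/team/john-doe", DOE)
press = project_page("https://example.com/press/2024", DOE)
assert team["@graph"][0] == press["@graph"][0]  # same entity node on both pages
print(json.dumps(team, indent=2))
```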

6. Operational Flow (Pipeline)

The Aivis-OS software processes data in a strict sequence, so that no stage consumes unmerged or unverified data:

  1. Ingest & Extraction: Scanning all URLs for entity candidates.
  2. Normalization (Staging): Cluster-wide cleaning of name variants.
  3. Merge & Deduplication: Merging identical entities into a Golden Record.
  4. Anchor Verification: Validation of external IDs (Wikidata, etc.) against trusted sources.
  5. Freeze: Versioning of the inventory.
  6. Projection: Generation of the JSON-LD for the individual pages based on the frozen inventory.
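Steps 2 to 5 can be illustrated with a compact sketch. The candidate records and the alias table are hypothetical. Step 4 is simplified to "accept a QID only if all pages agree on it"; the pipeline described above validates against trusted external sources instead. Freezing uses `types.MappingProxyType` and frozensets, so the projection step can read the snapshot but not modify it.

```python
from types import MappingProxyType

# Step 1 output (hypothetical): raw candidates found on individual URLs.
candidates = [
    {"url": "https://example.com/team", "surface": "John William Doe", "type": "Person", "qid": "Q123456"},
    {"url": "https://example.com/blog", "surface": "J. Doe", "type": "Person", "qid": None},
    {"url": "https://example.com/press", "surface": "Mr. Doe", "type": "Person", "qid": "Q123456"},
]

# Step 2 (assumed): a curated alias table maps name variants to one canonical name.
ALIASES = {"john william doe": "John William Doe", "j. doe": "John William Doe", "mr. doe": "John William Doe"}

def normalize(c):
    return {**c, "canonical": ALIASES.get(c["surface"].casefold(), c["surface"])}

def merge(normalized):
    """Step 3: candidates sharing (type, canonical name) become one record."""
    records = {}
    for c in normalized:
        rec = records.setdefault((c["type"], c["canonical"]), {"aliases": set(), "urls": set(), "qids": set()})
        rec["aliases"].add(c["surface"])
        rec["urls"].add(c["url"])
        if c["qid"]:
            rec["qids"].add(c["qid"])
    return records

def verify(records):
    """Step 4 (simplified stand-in): keep a QID only if exactly one was asserted."""
    for rec in records.values():
        rec["qid"] = next(iter(rec["qids"])) if len(rec["qids"]) == 1 else None
    return records

def freeze(records):
    """Step 5: read-only snapshot; sets become frozensets."""
    return MappingProxyType({
        key: MappingProxyType({f: frozenset(v) if isinstance(v, set) else v for f, v in rec.items()})
        for key, rec in records.items()
    })

inventory = freeze(verify(merge(normalize(c) for c in candidates)))
print(len(inventory))                                    # 1
print(inventory[("Person", "John William Doe")]["qid"])  # Q123456
```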

7. Decision Criteria (Acceptance)

A correctly implemented Aivis-OS cluster satisfies the following acceptance criteria:

  • ID Stability: A repeated run of the pipeline does not change existing IDs.

  • Deduplication Rate: All surface variants of an entity converge to a single record (n:1 mapping).

  • Anchor Uniqueness: Each external anchor (e.g., a QID) is assigned to at most one entity.

  • Referential Integrity: Every entity @id emitted in the JSON-LD exists in the verified inventory.
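The four criteria can be checked mechanically. The sketch below shows one possible way to operationalize them. The input shapes are assumptions: ID mappings keyed by a natural key per run, (surface form, entity_id) resolution pairs, and QID sets per entity. The second entity ID is hypothetical.

```python
from collections import defaultdict

def id_stability(run_a: dict, run_b: dict) -> bool:
    """Natural keys present in both runs must keep the same entity_id."""
    return all(run_a[k] == run_b[k] for k in run_a.keys() & run_b.keys())

def dedup_ok(resolutions) -> bool:
    """Every surface variant resolves to exactly one entity_id (n:1)."""
    targets = defaultdict(set)
    for surface, entity_id in resolutions:
        targets[surface].add(entity_id)
    return all(len(ids) == 1 for ids in targets.values())

def anchor_uniqueness(anchors: dict) -> bool:
    """anchors: {entity_id: set of QIDs}. No QID may belong to two entities."""
    owner = {}
    for entity_id, qids in anchors.items():
        for qid in qids:
            if owner.setdefault(qid, entity_id) != entity_id:
                return False
    return True

def referential_integrity(emitted_ids, inventory_ids) -> bool:
    """Every entity @id in the JSON-LD output exists in the inventory."""
    return set(emitted_ids) <= set(inventory_ids)

DOE = "entity://cluster123/Person/john-william-doe-a1b2c3"
ACME = "entity://cluster123/Corporation/acme-f00baa"  # hypothetical
run_a = {"wikidata:Q123456": DOE, "name:acme": ACME}
run_b = dict(run_a)
resolutions = [("John William Doe", DOE), ("J. Doe", DOE), ("Mr. Doe", DOE), ("Acme", ACME)]
anchors = {DOE: {"Q123456"}, ACME: set()}
print(id_stability(run_a, run_b), dedup_ok(resolutions),
      anchor_uniqueness(anchors), referential_integrity({DOE}, anchors))  # True True True True
```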

Summary

A URL is merely a context-specific interface (“Canvas”) where entities are mentioned. It is not the place to define identity. Aivis-OS shifts the authority over identity to the cluster layer to guarantee a maintainable, consistent knowledge graph.
