Document Type: Architecture Specification

Context: Entity Inventory, Entity Truth Layer & Machine Interface Layer

Status: Public Standard

Validity: Aivis-OS Core Pipeline

1. Architectural Principle

Global Identity vs. Local Mention

In the Aivis-OS architecture, the identity of an entity is strictly separated from its mention.

The core problem of traditional SEO or schema approaches is the generation of data at the URL level (Per-URL Inventory). This approach inevitably leads to Identity Drift, as it implicitly treats local mentions as independent identities.

Instead, Aivis-OS enforces a Cluster-Level Inventory. The entity inventory is not an artifact of a single page. It is an artifact of the knowledge domain.

Definitions

Entity (Identity): A stable, canonical object (e.g., “Allianz SE”, “John William Doe”, “2024 Annual Report”) that exists globally within the cluster.
Mention (Occurrence): A localized reference within a URL (e.g., “The Group”, “Mr. Doe”, “The Report”).
Stable Anchor: An externally verifiable identifier to reduce ambiguity (Wikidata QID, LEI, ISIN, ORCID).
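The three definitions can be pictured as a small data structure. The following is a minimal sketch, not the Aivis-OS implementation: the class names `Entity` and `Mention`, their fields, and the example URLs are assumptions made for illustration; the person, the entity ID, and the QID are the example values used later in this specification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Global identity: exists exactly once per cluster (hypothetical shape)."""
    entity_id: str
    canonical_name: str
    stable_anchors: tuple[str, ...] = ()


@dataclass(frozen=True)
class Mention:
    """Local occurrence: a surface form on one URL that points at an Entity."""
    url: str
    surface_form: str
    entity_id: str


john = Entity(
    entity_id="entity://cluster123/Person/john-william-doe-a1b2c3",
    canonical_name="John William Doe",
    stable_anchors=("Q123456",),  # placeholder Wikidata QID from the example in this document
)

# Two different surface forms on two different pages, one identity.
mentions = [
    Mention("https://example.com/team/john-doe", "Mr. Doe", john.entity_id),
    Mention("https://example.com/blog/post-1", "J. Doe", john.entity_id),
]

distinct_identities = {m.entity_id for m in mentions}
```

A mention never carries its own name or anchors; it only holds the surface form it was found under and a pointer to the entity.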

2. The Anti-Pattern: Risks of Per-URL Inventories

When systems attempt to build entities in isolation per URL, structural defects arise that prevent a stable AI representation:

2.1 Fragmentation of Identity (Duplicate Identities)

Scenario: An organization appears on 40 pages under variations such as “Allianz”, “Allianz SE”, and “Allianz Group”.
Error: A per-URL approach generates 40 competing “truths.” AI models cannot merge these deterministically.

2.2 Instability of IDs

If IDs are generated (minted) per URL, they cannot remain stable cluster-wide. This destroys graph cohesion and makes long-term diffing (change tracking) impossible.
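To make the anti-pattern concrete, here is a small sketch of what per-URL minting does to one organization. The function `mint_per_url` and the three example URLs are hypothetical; they only illustrate that an ID derived from the page and the local wording cannot be shared across pages.

```python
import hashlib


def mint_per_url(url: str, surface_form: str) -> str:
    """Anti-pattern: the ID depends on the page and on the local wording."""
    digest = hashlib.sha256(f"{url}|{surface_form}".encode("utf-8")).hexdigest()
    return f"{url}#entity-{digest[:6]}"


# The same organization, mentioned under three variants on three pages.
pages = {
    "https://example.com/about": "Allianz",
    "https://example.com/investors": "Allianz SE",
    "https://example.com/careers": "Allianz Group",
}

per_url_ids = {mint_per_url(url, form) for url, form in pages.items()}
print(len(per_url_ids))  # 3 competing identities for one real-world entity
```

Renaming a page or rewording a sentence changes the ID as well, which is why diffing between two runs is not meaningful under this approach.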

2.3 Dispersion of Verification (Anchor Verification Drift)

Different URLs often assign contradictory anchors (e.g., incorrect social profiles or QIDs) to the same entity, or omit anchors entirely. The result is an inconsistent knowledge graph that LLMs devalue (“Ambiguity Penalty”).
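Anchor drift can be detected mechanically once the per-URL claims are collected in one place. The sketch below assumes a simple list of `(url, name, qid)` tuples; the tuple layout, the function name `find_anchor_conflicts`, and the second QID `Q654321` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical per-URL claims: (url, entity name, claimed Wikidata QID or None).
claims = [
    ("https://example.com/team", "John William Doe", "Q123456"),
    ("https://example.com/blog/post-1", "John William Doe", "Q654321"),  # contradicts the team page
    ("https://example.com/press", "John William Doe", None),             # anchor missing
]


def find_anchor_conflicts(claims):
    """Return every entity name that has more than one distinct anchor claimed."""
    anchors_by_name = defaultdict(set)
    for _url, name, qid in claims:
        if qid is not None:
            anchors_by_name[name].add(qid)
    return {
        name: sorted(qids)
        for name, qids in anchors_by_name.items()
        if len(qids) > 1
    }


conflicts = find_anchor_conflicts(claims)
print(conflicts)  # {'John William Doe': ['Q123456', 'Q654321']}
```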

3. The Aivis-OS Solution: Cluster-Level Governance

The Aivis-OS inventory functions as a Single Source of Truth (“Golden Record”). Structured data at the URL level is merely a projection of this truth.

System Benefits

Deterministic Entity Resolution: Normalization and deduplication are performed centrally and once.
Managed Stable Anchors: External references are verified once and inherited globally.
Aggregation: Attributes from various sources are merged into a single, enriched record per entity.
Governance & Versioning: Changes to an entity (e.g., name change) are propagated atomically, not distributed manually across thousands of pages.

4. Data Model & Implementation

4.1 Cluster-Level Tables (Identity Layer)

This is the storage location of the truth.

Attribute	Description
`entity_id`	Globally unique ID (see schema below)
`canonical_name`	The official designation (e.g., “Allianz SE”)
`schema_type`	Schema.org type (e.g., Corporation, Person)
`stable_anchors`	Wikidata QID, LEI, ISIN, DOI
`provenance`	Origin of data / Verification Status
`version`	Hash of the current state
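The `version` attribute is described only as a hash of the current state. One possible way to compute it, shown here as an assumption rather than the specified algorithm, is to hash a canonical JSON serialization of the record. The function name `record_version`, the choice of SHA-256, and the 12-character truncation are all illustrative.

```python
import hashlib
import json


def record_version(record: dict) -> str:
    """Hash the record's state; the 'version' field itself is excluded from the input."""
    payload = {key: value for key, value in record.items() if key != "version"}
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


record = {
    "entity_id": "entity://cluster123/Person/john-william-doe-a1b2c3",
    "canonical_name": "John William Doe",
    "schema_type": "Person",
    "stable_anchors": {"wikidata": "Q123456"},
    "provenance": "verified",
}

# Same content, different key order: the version must not change.
reordered = dict(reversed(list(record.items())))
# Changed content: the version must change.
renamed = {**record, "canonical_name": "John W. Doe"}
```

Because the hash changes whenever any attribute changes, comparing two frozen inventories reduces to comparing `version` values per `entity_id`.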

4.2 URL-Level Tables (Context Layer)

This is the storage location of the reference.

Attribute	Description
`url_id`	Reference to the Page
`entity_id`	Foreign Key to the Cluster Entity
`role_hint`	Semantic role (e.g. `mainEntity`, `author`, `mentions`)
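The relationship between the two layers can be sketched with an in-memory SQLite database. The table names `entity` and `url_entity`, the column types, and the helper `link_is_accepted` are assumptions; only the column names come from the tables above. The point of the sketch is the foreign key: the context layer cannot reference an entity that the identity layer does not know.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE entity (              -- identity layer (cluster level)
        entity_id      TEXT PRIMARY KEY,
        canonical_name TEXT NOT NULL,
        schema_type    TEXT NOT NULL
    );
    CREATE TABLE url_entity (          -- context layer (URL level)
        url_id    TEXT NOT NULL,
        entity_id TEXT NOT NULL REFERENCES entity(entity_id),
        role_hint TEXT NOT NULL,
        PRIMARY KEY (url_id, entity_id, role_hint)
    );
    """
)

JOHN = "entity://cluster123/Person/john-william-doe-a1b2c3"
conn.execute("INSERT INTO entity VALUES (?, ?, ?)", (JOHN, "John William Doe", "Person"))


def link_is_accepted(url_id: str, entity_id: str, role_hint: str) -> bool:
    """Try to record a mention; unknown entity IDs are rejected by the foreign key."""
    try:
        conn.execute("INSERT INTO url_entity VALUES (?, ?, ?)", (url_id, entity_id, role_hint))
        return True
    except sqlite3.IntegrityError:
        return False
```

A page can be linked to the existing record under any role, but an attempt to link it to an ID that was never minted at cluster level fails at write time instead of surfacing later as a dangling `@id`.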

4.3 ID-Minting Convention

Aivis-OS never generates new entity IDs during the JSON-LD generation of a single page. IDs follow a deterministic format:

entity://{cluster_id}/{schema_type}/{slug}-{short_hash}
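The format above does not state what `{short_hash}` is computed from. The following sketch assumes it is a truncated SHA-256 over the cluster ID, the schema type, and the slug; that choice, the helper names `slugify` and `mint_entity_id`, and the default length of six hex characters are illustrative assumptions.

```python
import hashlib
import re
import unicodedata


def slugify(name: str) -> str:
    """Lowercase ASCII slug: 'John William Doe' -> 'john-william-doe'."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")


def mint_entity_id(cluster_id: str, schema_type: str, canonical_name: str, hash_len: int = 6) -> str:
    """Mint an ID from cluster-level facts only; no URL or local wording is involved."""
    slug = slugify(canonical_name)
    seed = f"{cluster_id}|{schema_type}|{slug}"  # assumed hash input
    short_hash = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:hash_len]
    return f"entity://{cluster_id}/{schema_type}/{slug}-{short_hash}"


first = mint_entity_id("cluster123", "Person", "John William Doe")
second = mint_entity_id("cluster123", "Person", "John William Doe")
```

Under this assumption a renamed entity would hash to a different value, so in practice the ID would be minted once during the merge step, stored in the inventory, and looked up afterwards rather than recomputed.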

5. Implementation Example

Instead of defining variants of a person (“John Doe”, “J. Doe”) multiple times, Aivis-OS references the central object.

Cluster Inventory (Backend): There is only one record for “John William Doe” with the linked Wikidata ID Q123456.

URL Projection (Output JSON-LD): The team page does not define a new person, but references the existing one:

JSON

{
  "@context": "https://schema.org",
  "@graph": [
    {
      "@id": "entity://cluster123/Person/john-william-doe-a1b2c3",
      "@type": "Person",
      "name": "John William Doe",
      "sameAs": ["https://www.wikidata.org/wiki/Q123456"]
    },
    {
      "@type": "WebPage",
      "@id": "https://example.com/team/john-doe",
      "about": {
          "@id": "entity://cluster123/Person/john-william-doe-a1b2c3"
      }
    }
  ]
}

Note: No matter which URL this person appears on, the @id remains character-for-character identical. This allows consuming systems to merge the nodes from different pages by identifier alone.
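A projection of this kind can be sketched as a pure lookup against the inventory. The dictionary `INVENTORY`, its field names, and the function `project_page` are hypothetical; the JSON-LD shape follows the example above.

```python
# Hypothetical frozen inventory, keyed by entity ID.
INVENTORY = {
    "entity://cluster123/Person/john-william-doe-a1b2c3": {
        "schema_type": "Person",
        "canonical_name": "John William Doe",
        "same_as": ["https://www.wikidata.org/wiki/Q123456"],
    }
}


def project_page(page_url: str, links: list, inventory: dict = INVENTORY) -> dict:
    """Build the JSON-LD for one page from (entity_id, role) pairs.

    Nothing is minted here: an entity ID that is not in the inventory raises KeyError.
    """
    nodes = {}
    page_node = {"@type": "WebPage", "@id": page_url}
    for entity_id, role in links:
        record = inventory[entity_id]
        nodes[entity_id] = {
            "@id": entity_id,
            "@type": record["schema_type"],
            "name": record["canonical_name"],
            "sameAs": record["same_as"],
        }
        page_node[role] = {"@id": entity_id}
    return {"@context": "https://schema.org", "@graph": [*nodes.values(), page_node]}


JOHN = "entity://cluster123/Person/john-william-doe-a1b2c3"
team_page = project_page("https://example.com/team/john-doe", [(JOHN, "about")])
blog_page = project_page("https://example.com/blog/post-1", [(JOHN, "mentions")])
```

Both pages emit the same person node; only the page node and the role differ.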

6. Operational Flow (Pipeline)

The Aivis-OS software processes data in a strict sequence to avoid contamination:

Ingest & Extraction: Scanning all URLs for entity candidates.
Normalization (Staging): Cluster-wide cleaning of name variants.
Merge & Deduplication: Merging identical entities into a Golden Record.
Anchor Verification: Validation of external IDs (Wikidata, etc.) against trusted sources.
Freeze: Versioning of the inventory.
Projection: Generation of the JSON-LD for the individual pages based on the frozen inventory.
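The six stages can be sketched end to end on toy data. Everything below is an assumption made for illustration: the curated alias table, the canonical registry, the list of trusted anchors, and all function names. A real pipeline would replace each stage with far more capable components; the sketch only shows the ordering and that projection reads exclusively from the frozen inventory.

```python
import hashlib
import json
from types import MappingProxyType

# Hypothetical inputs standing in for crawl results and curated reference data.
PAGES = {
    "https://example.com/team/john-doe": ["John William Doe"],
    "https://example.com/blog/post-1": ["J. Doe", "Mr. Doe"],
}
ALIASES = {"john william doe": "John William Doe", "j. doe": "John William Doe",
           "mr. doe": "John William Doe"}
CANON = {"John William Doe": {"entity_id": "entity://cluster123/Person/john-william-doe-a1b2c3",
                              "schema_type": "Person"}}
TRUSTED_ANCHORS = {"John William Doe": "https://www.wikidata.org/wiki/Q123456"}


def extract(pages):                      # 1) Ingest & Extraction
    return [(url, form) for url, forms in pages.items() for form in forms]


def normalize(form):                     # 2) Normalization
    return " ".join(form.lower().split())


def merge(candidates):                   # 3) Merge & Deduplication
    records = {}
    for url, form in candidates:
        name = ALIASES.get(normalize(form))
        if name is None:
            continue  # unresolved candidates would go to manual review
        rec = records.setdefault(name, {**CANON[name], "name": name, "urls": []})
        if url not in rec["urls"]:
            rec["urls"].append(url)
    return records


def verify_anchors(records):             # 4) Anchor Verification
    for name, rec in records.items():
        rec["same_as"] = [TRUSTED_ANCHORS[name]] if name in TRUSTED_ANCHORS else []
    return records


def freeze(records):                     # 5) Freeze
    version = hashlib.sha256(json.dumps(records, sort_keys=True).encode("utf-8")).hexdigest()[:12]
    return MappingProxyType(records), version  # read-only view (shallow)


def project(frozen, url):                # 6) Projection
    nodes = [{"@id": r["entity_id"], "@type": r["schema_type"], "name": r["name"],
              "sameAs": r["same_as"]} for r in frozen.values() if url in r["urls"]]
    return {"@context": "https://schema.org", "@graph": nodes}


def run_pipeline(pages):
    frozen, version = freeze(verify_anchors(merge(extract(pages))))
    return {url: project(frozen, url) for url in pages}, version


outputs, version = run_pipeline(PAGES)
```

Running the pipeline twice on unchanged input yields the same inventory version and the same `@id` on every page.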

7. Decision Criteria (Acceptance)

A correctly implemented Aivis-OS cluster fulfills the following metrics:

ID Stability: A repeated run of the pipeline does not change existing IDs.
Deduplication Rate: All name variants of an entity map to exactly one entity ID (n:1 mapping).
Anchor Uniqueness: Each external anchor (e.g., a QID) is assigned to at most one entity.
Referential Integrity: Every @id output in JSON-LD exists in the verified inventory.
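The four criteria lend themselves to automated checks. The sketch below is one possible formulation; the function names and the inventory layout are assumptions, and the referential-integrity check is deliberately limited to `@id` values in the `entity://` scheme, on the assumption that page URLs used as `@id` are not inventory records.

```python
def id_stability(previous_run: dict, current_run: dict) -> bool:
    """Every name -> ID pair from the previous run is unchanged in the current run."""
    return all(current_run.get(name) == eid for name, eid in previous_run.items())


def deduplicated(variant_to_id: dict, variants: list) -> bool:
    """All listed variants resolve to one and the same entity ID (n:1)."""
    ids = {variant_to_id.get(v) for v in variants}
    return len(ids) == 1 and None not in ids


def anchors_unique(inventory: dict) -> bool:
    """No external anchor is attached to more than one entity."""
    seen = set()
    for record in inventory.values():
        for anchor in record["stable_anchors"]:
            if anchor in seen:
                return False
            seen.add(anchor)
    return True


def collect_ids(node):
    """Recursively yield every '@id' value in a JSON-LD structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "@id":
                yield value
            else:
                yield from collect_ids(value)
    elif isinstance(node, list):
        for item in node:
            yield from collect_ids(item)


def referentially_intact(jsonld: dict, inventory: dict) -> bool:
    """Every entity:// ID emitted in the JSON-LD exists in the inventory."""
    return all(i in inventory for i in collect_ids(jsonld) if i.startswith("entity://"))


JOHN = "entity://cluster123/Person/john-william-doe-a1b2c3"
inventory = {JOHN: {"canonical_name": "John William Doe", "stable_anchors": ["Q123456"]}}
page = {"@context": "https://schema.org", "@graph": [
    {"@id": JOHN, "@type": "Person", "name": "John William Doe"},
    {"@type": "WebPage", "@id": "https://example.com/team/john-doe", "about": {"@id": JOHN}},
]}
```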

Summary

A URL is merely a context-specific interface (“Canvas”) where entities are mentioned. It is not the place to define identity. Aivis-OS shifts the authority over identity to the cluster layer to guarantee a maintainable, consistent knowledge graph.


Link Tips

Schema.org – Entity Modeling & @id Semantics

Google Search Central – Entities & Identity in Search

FAQ on Cluster-Level Entity Inventory Strategy

Why is it not sufficient to define entities per URL?

Because AI systems process content in fragments and across pages. A per-URL approach creates competing identities for the same entity, leading to identity drift. Cluster-level inventories ensure that all mentions refer to the same canonical identity.

What is the difference between an entity and a mention?

An entity is a stable, global object with a unique identity. A mention is merely a contextual reference within a single URL. Aivis-OS strictly separates both to avoid ambiguity and inconsistent AI representations.

Why are stable external anchors like Wikidata QIDs or LEIs so important?

Stable anchors reduce ambiguity. They enable AI systems to uniquely recognize and correctly link entities—even across different documents, languages, and points in time.

How does a cluster-level inventory prevent conflicting facts in AI systems?

Through a central Single Source of Truth. Attributes are verified once, versioned, and then consistently projected onto all URLs. Changes are made atomically in the inventory, not manually on individual pages.

What role does ID stability play for Knowledge Graphs and LLMs?

ID stability is a prerequisite for graph coherence. If IDs change with every crawl, AI systems cannot build reliable relationship networks. Deterministic IDs enable long-term referencing, diffing, and evidence building.

Aivis-OS Identity Specification Record (Node-ID: #spec-id-01)
Identity: Cluster-Level Entity Inventory Strategy (entity://aivis/Spec/cluster-inventory-strategy)
Canonical URLs: DE https://aivis-os.com/cluster-level-entity-inventory-strategy/ • EN https://aivis-os.com/en/cluster-level-entity-inventory-strategy/
Classification: Architecture Specification (CreativeWork / Public Standard)
Architecture Role: Operative basis of the Entity Truth Layer (Layer 1: Identity).
Parent System: Aivis-OS (entity://aivis/Core/aivis-os)

Core Problem: Identity Drift
– Cause: Per-URL inventories treat local mentions as independent identities.
– Consequence: Duplicate Identities, unstable IDs, contradictory Anchors, Ambiguity Penalty.

Solution Paradigm: Global Identity vs. Local Mention
– Entity: globally stable, canonical object with persistent ID in the cluster.
– Mention: local, context-specific reference within a URL (Canvas).
– Stable Anchors: externally verified identifiers for ambiguity reduction (e.g. Wikidata QID, LEI, ISIN, ORCID).

Operative Implementation (Pipeline):
1) Ingest & Extraction: Detection of entity candidates across all URLs.
2) Normalization: Cluster-wide cleansing of name variants.
3) Merge & Deduplication: Merging of identical identities to the Golden Record.
4) Anchor Verification: Verification of stable external anchors.
5) Freeze: Versioning of the inventory.
6) Projection: URL-JSON-LD as a projection of the frozen cluster inventory.

ID Convention (Deterministic Minting):
– entity://{cluster_id}/{schema_type}/{slug}-{short_hash}
– IDs are never regenerated during URL-level JSON-LD generation (no volatile IDs).

Acceptance Criteria (Governance Metrics):
– ID stability, deduplication rate (n:1 mapping), anchor uniqueness, referential integrity.

Methodical Governance: Boutique für digitale Kommunikation (entity://aivis/Partner/boutique-dig-kom)
Chief Architect (Reference): Norbert Kathriner (entity://aivis/Person/n-kathriner)
Status: Public Standard (v2026) – Operational (Canonical state).


GEO optimized output. Aivis-OS constructs input truth.