1. Architectural Problem

Probabilistic Output & the Ranking Fallacy

Conventional monitoring approaches (rankings, share of voice, position tracking) rest on the assumption of deterministic result lists.

Generative AI systems (LLMs, Answer Engines), however, do not generate lists; they generate probabilistic answers based on vector-space proximity, evidence density, and contextual coherence.

It follows that:

  • “Positions” do not exist.
  • Repeatability is not guaranteed.
  • Visibility is a state, not a place.

Monitoring that analyzes only the textual output (e.g., keyword matching) suffers from three systematic blind spots:

  • Evidence Blindness: Correct answers may rest on guessing rather than knowledge.
  • Semantic Blindness: Structural errors (incorrect relations) go undetected as long as the entities themselves are named (see the sketch after this list).
  • Numerical Blindness: Numbers, time periods, and ratios are not reliably validated.

Conclusion: Output is a symptom, not a foundation. Aivis-OS defines monitoring not as ranking control, but as Structural Integrity Testing.

2. Monitoring Objective

The goal of Evidence Monitoring is not visibility, but semantic stability under probabilistic retrieval.

The measurement is not whether a company is mentioned, but how stably, correctly, and verifiably its digital representation can be retrieved.

3. The Four Dimensions of AI Visibility

Aivis-OS measures visibility along four qualitative states of entity representation.

3.1 Attribution Stability

(Identity Check)

Definition: The ability of the model to assign a fact to the correct entity without the entity being explicitly mentioned in the prompt (Zero-Mention Prompting).

Test: “Who offers a solution for problem X?”

Success: The correct entity is named.

Warning signal:

  • Competitors are named
  • Generic actors are hallucinated

Architectural Significance: Indicator of the strength of semantic vectorization and identity anchoring.
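
The test can be operationalized as a Zero-Mention Probe. A sketch, assuming a hypothetical ask_model() wrapper around the monitored endpoint; the prompt never names the entity, and the classification mirrors the success and warning signals above.

    def ask_model(prompt: str) -> str:
        """Hypothetical client for the monitored LLM; replace with a real call."""
        raise NotImplementedError

    def attribution_probe(problem: str, target: str,
                          competitors: list[str]) -> str:
        # Zero-Mention Prompting: the target entity is absent from the prompt.
        answer = ask_model(f"Who offers a solution for {problem}?")
        if target.lower() in answer.lower():
            return "success: correct attribution"
        if any(c.lower() in answer.lower() for c in competitors):
            return "warning: competitor named"
        return "warning: generic or hallucinated actor"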

3.2 Entity Logic Integrity

(Relationship Check)

Definition: The correctness of the entity relations as reconstructed inside the model.

Test:
“Which products belong to [brand]?”
“Who is a partner in the joint venture [name]?”

Success: Correct resolution of the edges modeled in the Semantic Graph.

Warning signal:

  • Identity Drift
  • Mixing with competitors
  • Disambiguation errors
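
One way to automate this check is to compare extracted statements against the modeled graph edges. A sketch, assuming the Semantic Graph is exported as (subject, predicate, object) triples; all names are illustrative.

    # Edges as modeled in the Semantic Graph (illustrative data).
    graph_edges = {
        ("AcmeSuite", "isProductOf", "Acme GmbH"),
        ("Acme GmbH", "partnerIn", "JV Example"),
    }

    def relation_is_modeled(subject: str, predicate: str, obj: str) -> bool:
        """Pass only if the relation matches a modeled edge."""
        return (subject, predicate, obj) in graph_edges

    # Triples are extracted from the model's answer by a parser (not shown);
    # any unmodeled triple signals Identity Drift or competitor mixing.
    print(relation_is_modeled("AcmeSuite", "isProductOf", "Acme GmbH"))  # True
    print(relation_is_modeled("AcmeSuite", "isProductOf", "Beta Labs"))  # False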

3.3 Evidence Consistency

(Proof Check)

Definition: The ability of the model to support statements with explicit, verifiable sources.

Test: “Name the source for this statement.”

Success: The model provides a URL or document that is defined as a Source of Truth in the Inventory.

Warning signal:

  • Correct statement without source
  • Hallucinated sources
  • Non-existent or outdated URLs
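
The proof check reduces to membership in the Source-of-Truth inventory. A sketch, assuming the Inventory exposes its canonical URLs as a simple set (URLs invented for illustration); in practice a liveness check on the URL would be layered on top.

    # Canonical URLs registered as Source of Truth (illustrative).
    source_of_truth = {
        "https://example.com/docs/compliance",
        "https://example.com/press/2023-revenue",
    }

    def classify_citation(cited_url: str | None) -> str:
        if cited_url is None:
            return "warning: correct statement without source"
        if cited_url in source_of_truth:
            return "success: anchored citation"
        return "warning: hallucinated or unknown source"

    print(classify_citation(None))
    print(classify_citation("https://example.com/docs/compliance"))
    print(classify_citation("https://other.example/made-up"))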

3.4 Temporal & Numerical Precision

(Fact Check)

Definition:
Accuracy of non-linguistic data such as numbers, dates, ratios, or time periods.

Test:
“What was the revenue in 2023?”
“When was product X launched?”

Success:
Exact match with the Transport-Safe Content.

Warning signal:

  • Approximated values
  • Outdated data
  • Statistically plausible but factually incorrect numbers (Token Hallucinations)
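
Because success is defined as an exact match, the check can be strict. A minimal sketch, assuming canonical values are stored as typed key/value pairs in the Transport-Safe Content (keys and values invented for illustration):

    import re

    canonical = {"launch_year_x": 2019, "revenue_2023_eur": 12_400_000}

    def extract_number(answer: str) -> int | None:
        """Pull the first number out of the answer (deliberately naive)."""
        m = re.search(r"\d[\d.]*", answer.replace(",", ""))
        return int(float(m.group())) if m else None

    def check_fact(answer: str, key: str) -> str:
        value = extract_number(answer)
        if value == canonical[key]:
            return "success: exact match"
        return f"warning: got {value}, expected {canonical[key]}"

    print(check_fact("Product X launched in 2019.", "launch_year_x"))  # success
    print(check_fact("Product X launched in 2018.", "launch_year_x"))  # warning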

4. Test Methodology

The Iceberg Model

Aivis-OS uses a Dual-Layer Probing System to differentiate superficial visibility from structural resilience.

4.1 Layer A – User Simulation Prompts

(Surface)

Objective: Simulation of real usage scenarios.

Characteristics:

  • Short
  • Vague
  • Context-poor

Metric: Recall Rate (is the entity found at all?)

Example: “Best software for compliance?”
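
Because the output is probabilistic, a single sample says little; the Recall Rate is therefore estimated over repeated samples of the same surface prompt. A sketch, assuming a hypothetical sample_model() wrapper around the monitored endpoint:

    def sample_model(prompt: str) -> str:
        """Hypothetical non-deterministic client call; replace with a real one."""
        raise NotImplementedError

    def recall_rate(prompt: str, entity: str, n: int = 20) -> float:
        # Fraction of samples in which the entity is mentioned at all.
        hits = sum(entity.lower() in sample_model(prompt).lower()
                   for _ in range(n))
        return hits / n

    # recall_rate("Best software for compliance?", "Acme GmbH")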

4.2 Layer B – Forensic Prompts

(Foundation)

Objective:
Verification of the semantic mechanics.

Characteristics:

  • Structured
  • Evidence-focused
  • Adversarial

Metrics:

  • Accuracy
  • Citation Rate

Example: “List all compliance modules from [brand] with release dates and link the documentation.”

4.3 The Integrity Gap

The difference between Layer A and Layer B is the central KPI.

  • Case 1: User good · Forensic bad → Bubble Visibility (unstable)
  • Case 2: User bad · Forensic good → Hidden Potential (architecture present, transport weak)
  • Case 3: Both good → Aivis Certified Visibility
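
The classification can be expressed directly over the two layer scores. A sketch, assuming both layers are normalized to 0.0–1.0 and “good” means clearing a threshold; the fourth case (both layers weak) is implied above but not listed.

    def classify_visibility(layer_a: float, layer_b: float,
                            threshold: float = 0.7) -> str:
        a_good, b_good = layer_a >= threshold, layer_b >= threshold
        if a_good and not b_good:
            return "Bubble Visibility (unstable)"
        if b_good and not a_good:
            return "Hidden Potential (architecture present, transport weak)"
        if a_good and b_good:
            return "Aivis Certified Visibility"
        return "structurally invisible (both layers weak)"

    print(classify_visibility(0.9, 0.3))  # Bubble Visibility (unstable)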

5. Scoring Model

Source Anchoring Score (SAS)

Linear rankings are replaced by the Source Anchoring Score (0.0 – 1.0).

Calculation:

SAS = Attribution_Weight × Integrity_Weight × Citation_Rate

Interpretation:

  • SAS < 0.5
    Critical instability – the model is guessing.
  • SAS ≥ 0.9
    Deterministic anchoring – the model “knows.”
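
Since the three factors are multiplied, a single weak factor drags the whole score down, which is the intended behavior: anchoring is only as strong as its weakest dimension. A minimal sketch, with all factors assumed to be normalized to 0.0–1.0:

    def sas(attribution_weight: float, integrity_weight: float,
            citation_rate: float) -> float:
        """Source Anchoring Score: product of the three normalized factors."""
        return attribution_weight * integrity_weight * citation_rate

    print(f"SAS = {sas(0.95, 0.9, 0.8):.3f}")  # 0.684: below 0.9, not anchored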

6. Feedback Loop

Monitoring as Remediation Trigger

In Aivis-OS, monitoring is not a reporting artifact, but a trigger for architectural corrections.

Error pattern → Architectural correction

  • Incorrect source → Verification of the sameAs links in the Semantic Graph
  • Incorrect numbers → Revision of the Transport-Safe Content Structure
  • Missing hierarchy → Hardening of the JSON-LD @graph nesting in the MIL

Each monitoring finding can be traced back to a specific layer.
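
The routing itself can stay trivially simple, because each error pattern has exactly one owning layer. A sketch mirroring the table above (pattern keys are illustrative):

    # Finding type -> layer that owns the architectural correction.
    REMEDIATION = {
        "incorrect_source":  "Semantic Graph: verify the sameAs links",
        "incorrect_numbers": "Transport-Safe Content: revise the structure",
        "missing_hierarchy": "MIL: harden the JSON-LD @graph nesting",
    }

    def route_finding(error_pattern: str) -> str:
        return REMEDIATION.get(error_pattern, "unmapped: triage manually")

    print(route_finding("incorrect_numbers"))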

Summary

The concept of ranking is epistemically unusable in LLM systems. Aivis-OS replaces the hunt for positions with the securing of source anchoring. Evidence Monitoring does not check whether a brand is “at the top,” but whether its digital representation survives probabilistic retrieval structurally intact.
