Foundation for generating precise encodings (e.g., RDF/OWL) when needed.
LLMs + ORM Verbalizations (Hypothesis)
Verbalizations align with LLM training in natural language.
Potentially easier grounding and fewer errors than with opaque formats.
Human and machine readability converge.
Person has Name.
Employee works for Company.
Employee uses Machine on Project. # ternary example
# Why these help LLMs
- Plain, structured sentences (predicate logic in NL)
- Stable terminology mirrors schema concepts
- Easy to map to facts, constraints, and queries
From the article’s “Evidence for ORM’s Preference in AI” section: examples of ORM verbalizations used as precise, human-readable statements that LLMs can parse more reliably than opaque encodings.
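As an illustration of this mapping, here is a minimal sketch that turns binary verbalizations like the ones above into GraphQL SDL types. The sentence grammar, field-naming rules, and type names are assumptions made for this example, not a published ORM-to-GraphQL algorithm; ternary facts like "Employee uses Machine on Project" would need a richer grammar.

```ts
// Sketch only: the verbalization grammar and naming rules below are
// assumptions for illustration, not a standard ORM-to-GraphQL mapping.

type FactType = { subject: string; predicate: string; object: string };

// Parse simple binary "Subject <predicate> Object." sentences.
function parseVerbalization(sentence: string): FactType | null {
  const m = sentence.match(/^(\w+) (has|works for|uses) (\w+)\.?$/);
  if (!m) return null; // ternary facts need a richer grammar
  const [, subject, predicate, object] = m;
  return { subject, predicate, object };
}

// Derive a field name: "has Name" -> "name", "works for" -> "worksFor".
function fieldName(predicate: string, object: string): string {
  if (predicate === "has") return object.toLowerCase();
  return predicate.replace(/ (\w)/g, (_, c: string) => c.toUpperCase());
}

// Emit one SDL type per subject entity, one field per fact type.
function toSdl(facts: FactType[]): string {
  const byType = new Map<string, string[]>();
  for (const f of facts) {
    const lines = byType.get(f.subject) ?? [];
    lines.push(`  ${fieldName(f.predicate, f.object)}: ${f.object}`);
    byType.set(f.subject, lines);
  }
  return [...byType.entries()]
    .map(([name, body]) => `type ${name} {\n${body.join("\n")}\n}`)
    .join("\n\n");
}

const facts = ["Person has Name.", "Employee works for Company."]
  .map(parseVerbalization)
  .filter((f): f is FactType => f !== null);

console.log(toSdl(facts));
// type Person {
//   name: Name
// }
//
// type Employee {
//   worksFor: Company
// }
```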
Ground the answer: use returned semantics and facts, include citations and provenance, and cache short-lived results.
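A minimal sketch of that grounding step. `executeGraphQL` is a placeholder for whatever GraphQL client the agent uses, and the one-minute TTL and `GroundedFact` shape are assumptions of this example:

```ts
// Sketch of the grounding step. executeGraphQL is a placeholder for
// whatever GraphQL client the agent uses; the TTL and the GroundedFact
// shape are assumptions for illustration.

interface GroundedFact {
  value: unknown;      // the typed result returned by the endpoint
  source: string;      // endpoint URL or schema IRI, for citations
  retrievedAt: number; // epoch ms, drives provenance and cache expiry
}

const TTL_MS = 60_000; // short-lived results: expire after one minute
const cache = new Map<string, GroundedFact>();

async function groundedLookup(
  query: string,
  source: string,
  executeGraphQL: (q: string) => Promise<unknown>,
): Promise<GroundedFact> {
  const hit = cache.get(query);
  if (hit && Date.now() - hit.retrievedAt < TTL_MS) return hit;

  const value = await executeGraphQL(query);
  const fact: GroundedFact = { value, source, retrievedAt: Date.now() };
  cache.set(query, fact);
  return fact;
}

// Answers cite retrieved facts instead of relying on model memory.
function citation(f: GroundedFact): string {
  return `(source: ${f.source}, retrieved ${new Date(f.retrievedAt).toISOString()})`;
}
```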
GraphQL Benefits for Knowledge Systems
Precision & Type Safety: Clients request only the fields they need; schemas enforce strong contracts, reducing errors (see the sketch after this list).
Interoperability & Identity: ORM-based schemas provide shared semantics; federation preserves optimal internal IDs while mapping to stable external IRIs.
Federation & Distribution: Plan and execute queries across services; support cross-service joins, batching, and streaming.
Security & Provenance: Depth limiting, rate limiting, field-level permissions; audit trails and reputation systems sustain trust.
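A sketch of the precision and depth-limiting points, under an assumed Employee/Company schema. The brace-counting depth check is deliberately crude; a real server would use an AST-based validation rule from its GraphQL library instead:

```ts
// Hypothetical schema and query; the depth check is naive on purpose.

const schema = /* GraphQL */ `
  type Employee {
    name: String!
    worksFor: Company!
  }
  type Company {
    name: String!
  }
  type Query {
    employee(id: ID!): Employee
  }
`;

// The agent asks for exactly the fields it needs; nothing else is sent.
const query = /* GraphQL */ `
  query {
    employee(id: "42") {
      name
      worksFor { name }
    }
  }
`;

// Crude depth limit over the raw query text (real servers walk the AST).
function maxDepth(q: string): number {
  let depth = 0;
  let max = 0;
  for (const ch of q) {
    if (ch === "{") max = Math.max(max, ++depth);
    if (ch === "}") depth--;
  }
  return max;
}

if (maxDepth(query) > 5) throw new Error("query exceeds depth limit");
```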
Knowledge Discoverability
Schema registries: Publish minimal metadata (fingerprints, domain tags, endpoints) so agents can locate and cluster related models.
DNS-based discovery: Advertise schema metadata via DNS TXT records or standard subdomains (sketched after this list).
Crawling and beacons: Expose predictable endpoint patterns or HTML beacons that crawlers can detect.
Conceptual alignment: Organize domains by family resemblance to promote interoperability without universal schemas.
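A sketch of the DNS TXT approach. The `_schema.<domain>` subdomain and the `key=value` record format are assumed conventions for this example, not a standard:

```ts
// Sketch of DNS-based discovery. The _schema subdomain and the
// key=value record format are assumed conventions, not a standard.
import { resolveTxt } from "node:dns/promises";

interface SchemaAdvert {
  endpoint?: string;    // GraphQL endpoint URL
  fingerprint?: string; // hash of the published schema
  domainTag?: string;   // coarse domain label for clustering
}

async function discoverSchema(domain: string): Promise<SchemaAdvert | null> {
  try {
    // TXT answers arrive as chunked string arrays; rejoin each record.
    const records = (await resolveTxt(`_schema.${domain}`)).map((r) => r.join(""));
    const text = records.find((r) => r.includes("endpoint="));
    if (!text) return null;

    const advert: SchemaAdvert = {};
    for (const pair of text.split(";")) {
      const i = pair.indexOf("=");
      const key = pair.slice(0, i).trim();
      const value = pair.slice(i + 1).trim();
      if (key === "endpoint") advert.endpoint = value;
      if (key === "fingerprint") advert.fingerprint = value;
      if (key === "domain") advert.domainTag = value;
    }
    return advert;
  } catch {
    return null; // no advert published for this domain
  }
}

// discoverSchema("example.com") might yield
// { endpoint: "https://api.example.com/graphql", fingerprint: "...", domainTag: "hr" }
```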
Hybrid Intelligence
Use LLMs for language understanding and retrieval.
Use symbolic layers for constraints, logic, and explanations (see the sketch below).
Together: reduced hallucination, better traceability, and domain expertise where training data is limited.
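A sketch of that division of labor: the LLM proposes candidate facts, and the symbolic layer checks them against model constraints. The `Rule` shape and the uniqueness constraint below are illustrative, stated ad hoc rather than derived from a real ORM schema:

```ts
// Sketch of the hybrid loop: the LLM proposes candidate facts, the
// symbolic layer validates them. The rule below is stated ad hoc for
// illustration rather than derived from a real ORM schema.

type Fact = { subject: string; predicate: string; object: string };
type Rule = { name: string; holds: (facts: Fact[]) => boolean };

// Uniqueness constraint: each Employee works for at most one Company.
const rules: Rule[] = [
  {
    name: "employee-works-for-at-most-one-company",
    holds: (facts) => {
      const seen = new Set<string>();
      for (const fact of facts) {
        if (fact.predicate !== "worksFor") continue;
        if (seen.has(fact.subject)) return false;
        seen.add(fact.subject);
      }
      return true;
    },
  },
];

// Reject or flag LLM output that violates a constraint; the failing
// rule names double as human-readable explanations.
function validate(proposed: Fact[]): { ok: boolean; violations: string[] } {
  const violations = rules.filter((r) => !r.holds(proposed)).map((r) => r.name);
  return { ok: violations.length === 0, violations };
}

const proposed: Fact[] = [
  { subject: "Alice", predicate: "worksFor", object: "Acme" },
  { subject: "Alice", predicate: "worksFor", object: "Globex" }, // contradiction
];
console.log(validate(proposed)); // { ok: false, violations: ["employee-works-..."] }
```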
Centralized vs Decentralized
Decentralized web-scale data (Semantic Web) is complementary.
Authoritative single interface prioritizes integrity and performance for enterprise AI.
Both approaches are valid and serve different purposes.
Pick the right tool for the context: web-scale federation vs. high-performance authoritative sources.
Voices: Database & KR Experts
John F. Sowa: Creator of Conceptual Graphs; advocates disciplined, logically sound knowledge modeling aligned with ORM principles.
Michael Stonebraker: Turing Award winner; critiques "one size fits all" philosophy; champions specialized, high-performance database architectures.
E.S.H. Kuhn: "RDF is an encoding, not a model." Argues conceptual model must come first; RDF is one of many possible encodings.
These experts validate that robust data modeling, logical integrity, and performance are foundational for knowledge-intensive AI.
Voices: GraphQL Practitioners
Apollo GraphQL: Apollo Federation provides an architectural blueprint for composing a unified "supergraph" from disparate microservices.
Lee Byron, Nick Schrock, Dan Schafer (Facebook): Co-creators of GraphQL; established principles of strong typing and precise queries for mobile data fetching and API evolution.
Netflix & Airbnb engineers: Published extensively on using GraphQL to manage complex, distributed data landscapes with consistent interfaces.
Practical adoption at scale validates GraphQL as a reliable knowledge interface for AI agents.
Conclusion
Database-first with an ORM semantic layer is a practical path for enterprise AI.
Expose knowledge through typed GraphQL interfaces for agents to query at inference time.
Combine rigor (database principles) with flexibility (discovery, federation).