The Relational Model Still Matters
By G. Sawatzky, embedded-commerce.com
August 15, 2025
Why the Relational Model Still Matters
- As AI reshapes data access through natural language queries, a key question emerges: Will SQL and the relational model remain relevant?
- This deck takes a "back to fundamentals" approach, examining Codd's original design decisions and their relevance for AI systems.
- Focus: The formal relational model, not just SQL implementations.
The relational model's mathematical foundations make it uniquely suited as a semantic compilation target for AI-generated queries.
Codd's Foundational Choices
- First-Order Logic (FOL): Queries have a strong mathematical base; results are algorithmically determinable; enables automated query optimization.
- Set Theory: Relations as sets of tuples; clean, abstract data representation; removes navigational complexity.
- Mathematical Closure: Operations on relations always yield relations; crucial for building complex, compositional queries.
These choices weren't just about data storage—they aimed to build a mathematically sound information system.
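Closure can be seen directly in SQL: a query result is itself a relation, so it can feed another query unchanged. A minimal sketch using Python's built-in sqlite3 (table and column names are invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders(id INTEGER, customer TEXT, total REAL);
    INSERT INTO orders VALUES (1,'ada',120.0),(2,'ada',80.0),(3,'bob',30.0);
""")

# A selection produces a relation...
inner = "SELECT customer, total FROM orders WHERE total > 50"
# ...which slots directly into an aggregation, because the types line up.
outer = f"SELECT customer, SUM(total) FROM ({inner}) GROUP BY customer"
result = con.execute(outer).fetchall()
print(result)  # [('ada', 200.0)]
```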
Logical-Physical Independence
- Separates what you query (logical view) from how the system stores and retrieves it (physical implementation).
- Database implementers can innovate with storage and indexing without changing application code.
- Enables automated query reasoning: optimizer becomes a theorem prover finding the most efficient execution plan.
- Creates formal bridges between human intent and mechanical execution.
This separation is critical as AI systems grow in complexity and need to reason about data access patterns.
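As a concrete illustration (stdlib sqlite3; the schema is invented here), a purely physical change such as adding an index leaves the logical query and its result untouched:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parts(pid INTEGER, color TEXT);
    INSERT INTO parts VALUES (1,'red'),(2,'blue'),(3,'red');
""")
query = "SELECT pid FROM parts WHERE color = 'red' ORDER BY pid"

before = con.execute(query).fetchall()
con.execute("CREATE INDEX idx_color ON parts(color)")  # physical tuning only
after = con.execute(query).fetchall()
assert before == after == [(1,), (3,)]  # logical view unchanged
```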
Codd's Pragmatic Choice: FOL over Second-Order Logic
- Second-order logic is undecidable: No general algorithm guarantees termination or validity determination.
- Computational complexity: Even decidable fragments often demand exponential time or space.
- FOL maps naturally to computable operations: Relations align with finite sets, quantifiers with loops, predicates with computable functions.
Note: Full FOL is only semi-decidable. The relational model instead restricts itself to relational algebra and safe relational calculus, which are decidable over finite relations.
Codd prioritized computational tractability over maximum expressive power—a pragmatic choice that remains crucial for AI.
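One way to see the FOL-to-computation mapping: SQL has no "for all" keyword, but over finite relations the universal quantifier reduces to double negation (relational division). A sketch with invented supplier/part tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parts(pid INTEGER);
    CREATE TABLE supplies(sid TEXT, pid INTEGER);
    INSERT INTO parts VALUES (1),(2);
    INSERT INTO supplies VALUES ('s1',1),('s1',2),('s2',1);
""")
# Suppliers supplying EVERY part: "for all p, supplies(s, p)" becomes
# "there is no part p that s does NOT supply".
rows = con.execute("""
    SELECT DISTINCT s.sid FROM supplies s
    WHERE NOT EXISTS (
        SELECT 1 FROM parts p
        WHERE NOT EXISTS (
            SELECT 1 FROM supplies s2
            WHERE s2.sid = s.sid AND s2.pid = p.pid))
""").fetchall()
print(rows)  # [('s1',)]
```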
Why the Relational Model Thrives in the AI Era
Declarative Reasoning
- Express what relationships should exist without specifying how to find them.
- Clear separation between intent and computation.
- AI systems can reason about data relationships mathematically: prove query equivalences, infer constraints, optimize access patterns automatically.
Compositional Closure
- Every operation produces a result of the same type (a relation).
- Automated reasoning systems can build complex queries from simple, well-defined parts and transform them algebraically.
- Many NoSQL systems lack this mathematical closure, complicating automated processing.
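Algebraic transformation in practice: because every subexpression denotes a relation, a rewriter can push a selection below a join and the result is provably unchanged. A small stdlib sqlite3 sketch (schema invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE emp(eid INTEGER, dept INTEGER, salary REAL);
    CREATE TABLE dept(did INTEGER, dname TEXT);
    INSERT INTO emp VALUES (1,10,90.0),(2,20,50.0);
    INSERT INTO dept VALUES (10,'eng'),(20,'ops');
""")
# Filter after the join...
q1 = """SELECT dname FROM emp JOIN dept ON emp.dept = dept.did
        WHERE salary > 60"""
# ...or push the selection below the join: algebraically equivalent.
q2 = """SELECT dname FROM (SELECT * FROM emp WHERE salary > 60) e
        JOIN dept ON e.dept = dept.did"""
r1, r2 = con.execute(q1).fetchall(), con.execute(q2).fetchall()
assert r1 == r2 == [('eng',)]
```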
Why the Relational Model Thrives in the AI Era (continued)
Formal Optimization Theory
- Query optimization isn't just heuristic—it's rooted in mathematics.
- Cost-based optimizers formally analyze equivalent expressions and select the most efficient execution strategy.
- Critical for AI systems that generate queries programmatically for complex analytical workflows.
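The optimizer's reasoning is observable: for the same logical query, SQLite's planner reports a different physical strategy once an index exists (exact plan wording varies by SQLite version):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(k INTEGER, v TEXT)")
q = "SELECT v FROM t WHERE k = 42"

plan1 = con.execute("EXPLAIN QUERY PLAN " + q).fetchall()
print(plan1[0][-1])  # e.g. 'SCAN t' (full table scan)
con.execute("CREATE INDEX idx_k ON t(k)")  # physical change only
plan2 = con.execute("EXPLAIN QUERY PLAN " + q).fetchall()
print(plan2[0][-1])  # e.g. 'SEARCH t USING INDEX idx_k (k=?)'
```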
Mathematically Rigorous Intermediate Representation
- When LLMs translate natural language to database queries, the target language's semantic clarity is paramount.
- Translating to FOL/set-based language (relational algebra) is fundamentally more reliable than translating to less-structured navigational graph query languages.
- Well-defined compositional semantics, equivalence testing, and logical completeness make it an ideal "semantic compilation target."
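What a "semantic compilation target" might look like in miniature: an AI front end emits a small algebra of constructors (the names below are invented for this sketch), and a mechanical compiler emits SQL; closure guarantees any subtree can appear wherever a relation can:

```python
from dataclasses import dataclass
from typing import Any, List

@dataclass
class Rel:            # a base relation
    name: str

@dataclass
class Select:         # sigma: filter rows by a predicate
    pred: str
    child: Any

@dataclass
class Project:        # pi: keep named columns
    cols: List[str]
    child: Any

def to_sql(node: Any) -> str:
    """Mechanically compile an algebra tree to SQL; any subtree
    compiles to a subquery that slots in wherever a relation can."""
    if isinstance(node, Rel):
        return node.name
    if isinstance(node, Select):
        return f"(SELECT * FROM {to_sql(node.child)} WHERE {node.pred})"
    return f"(SELECT {', '.join(node.cols)} FROM {to_sql(node.child)})"

expr = Project(["name"], Select("age > 30", Rel("people")))
print(to_sql(expr))  # (SELECT name FROM (SELECT * FROM people WHERE age > 30))
```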
Refining the Model: The Third Manifesto
- Proper type system: Types aren't mere implementation details—they're logical constructs essential for formal reasoning. Distinction between scalar and nonscalar types with proper type inheritance.
- Eliminating NULL ambiguities: SQL's three-valued logic creates semantic ambiguities. For AI systems relying on precise, consistent data semantics, eliminating these is essential.
- True relational closure: Every operator strictly produces a relation, cutting out SQL's special cases and inconsistent return types.
- Orthogonality: Vital for AI systems that need to compose operations predictably.
Date and Darwen address SQL's deviations from Codd's purist vision, strengthening the relational model's relevance for automated systems.
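The NULL problem is easy to demonstrate with stdlib sqlite3: under three-valued logic a row can satisfy neither a predicate nor its negation, which is precisely the ambiguity the Third Manifesto removes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t(x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(5,), (7,), (None,)])

eq = con.execute("SELECT COUNT(*) FROM t WHERE x = 5").fetchone()[0]
ne = con.execute("SELECT COUNT(*) FROM t WHERE x <> 5").fetchone()[0]
print(eq + ne)  # 2, not 3: the NULL row satisfies neither predicate
```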
Third Manifesto: Key Extensions
- Logical-physical independence for data modification: Treating assignment as a logical operation creates a cleaner base for reasoning about state changes.
- Tutorial D: A language that truly reflects relational principles, showing how far SQL has drifted.
- Relation-Valued Attributes (RVAs): Operate within an enriched type system where relation types are first-class citizens. Logical operations remain first-order while allowing increased expressiveness.
By preventing implementation details from "leaking" into the logical model, the Third Manifesto provides the rigorous, consistent framework AI systems need.
Overcoming Limitations: The "Bridge" Approach
- Problem: SQL has practical limits, especially regarding composability. Queries can be verbose and hard to nest, impeding automated reasoning.
- Solution: Create higher-level abstractions that compile down to SQL.
- Benefits: Leverage decades of query optimization while offering a more principled logical interface.
Stonebraker's insight: Successful database innovations often re-enter the SQL ecosystem, showing SQL's strong market pull.
Logica: A Bridge Language
- What it is: Declarative logic programming language from Google, part of the Datalog family.
- How it works: Extends classical logic programming with features like aggregation; compiles queries to SQL.
- Benefits:
- Datalog's compositional advantages: define complex queries through logical rules that combine naturally.
- Uses mature, performant SQL engines—no need to build a new database engine.
- Synergy with modern systems like DuckDB (embedded deployment, columnar performance).
The value lies in borrowing underlying ideas and principles to enhance relational systems, not wholesale adoption of specific languages.
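The core Datalog idea in miniature (predicate and table names invented): a rule body is a conjunction of atoms, which a bridge language like Logica compiles to a join, letting the mature SQL engine do the rest:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE parent(p TEXT, c TEXT);
    INSERT INTO parent VALUES ('ann','bea'),('bea','cal');
""")
# Rule: Grandparent(x, z) :- Parent(x, y), Parent(y, z).
# The shared variable y becomes the join condition.
rows = con.execute("""
    SELECT a.p, b.c FROM parent a JOIN parent b ON a.c = b.p
""").fetchall()
print(rows)  # [('ann', 'cal')]
```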
The Relational Model: A Semantic Compass for AI's Future
- The relational model's clarity and formal properties (rooted in FOL and Set Theory) make it an ideal target for natural language to query translations.
- Contrast with graph databases: query languages often expose navigational details, making verification and optimization much harder.
- The relational model provides a mathematically rigorous intermediate representation for AI systems.
Its relevance isn't just enduring—it's growing because of AI's demands for formal, verifiable query semantics.
Training AI with a Mathematical Common Language
- Challenge: Training AI systems to generate queries for new or specialized functions, even those not yet developed.
- Solution: Instead of training LLMs on limited examples of new query languages, they can learn to translate natural language into universal mathematical expressions (set operations, logical quantifiers).
- Why it works: Mathematical concepts are abundant in academic literature and formal specifications LLMs have already seen.
- Result: This "mathematical common language" becomes a precise intermediate representation that can be algorithmically translated to relational constructs.
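A sketch of the idea with stdlib sqlite3 (tables invented): the set expression A ∩ B − C maps one-for-one onto SQL's set operators, so a model that emits set algebra can be compiled mechanically:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE a(x INTEGER);
    CREATE TABLE b(x INTEGER);
    CREATE TABLE c(x INTEGER);
    INSERT INTO a VALUES (1),(2),(3);
    INSERT INTO b VALUES (2),(3),(4);
    INSERT INTO c VALUES (3);
""")
# A ∩ B − C, written directly in SQL's set operators.
rows = con.execute("""
    SELECT x FROM a INTERSECT SELECT x FROM b
    EXCEPT SELECT x FROM c
""").fetchall()
print(rows)  # [(2,)]
```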
Pure Functions in the Relational Framework
- Any pure function (deterministic mapping from input to output without side effects) integrates seamlessly into the relational framework.
- Example: an AI image recognition function such as ImageClassifier(image, criteria) → label can be treated relationally, as a mapping from inputs to labels.
- AI-native operations can combine with traditional relational queries, benefiting from the model's compositional properties and optimization.
- The mathematical bridge acts as a universal adapter, letting the relational model serve as a unifying semantic layer for hybrid AI-database systems.
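A minimal sketch with stdlib sqlite3: `classify` below is a toy stand-in for the hypothetical ImageClassifier in the text; because it is pure and registered as deterministic, it composes with ordinary relational operators and is safe for the optimizer to reorder:

```python
import sqlite3

def classify(size_kb: int) -> str:   # toy stand-in for a model call
    return "photo" if size_kb > 100 else "icon"

con = sqlite3.connect(":memory:")
con.create_function("classify", 1, classify, deterministic=True)
con.executescript("""
    CREATE TABLE images(name TEXT, size_kb INTEGER);
    INSERT INTO images VALUES ('a.png', 12), ('b.jpg', 800);
""")
rows = con.execute(
    "SELECT name, classify(size_kb) FROM images ORDER BY name").fetchall()
print(rows)  # [('a.png', 'icon'), ('b.jpg', 'photo')]
```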
The Future: LLMs + Relational Model
- While SQL will remain dominant for the foreseeable future, its underlying principles could inspire new, more precise and compositionally powerful implementations.
- Codd's vision, refined by Date and Darwen, offers a semantic compass for navigating data complexity in the AI age.
- The relational model's relevance is growing because of AI's demands for:
- Formal, verifiable query semantics
- Compositional reasoning
- Automated optimization
- Mathematical rigor
References
- Codd's Original Paper: Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data Banks." Communications of the ACM, 13(6), 377–387.
- The Third Manifesto: Date, C. J., & Darwen, H. (2006). Databases, Types and the Relational Model: The Third Manifesto (3rd ed.). Addison-Wesley.
- Logica Project: GitHub Repository
- Stonebraker: Stonebraker, M., et al. (2024). "What Goes Around Comes Around Redux." SIGMOD Record, 53(1).