Data Modeling Guide

Identity semantics, schema options, and update patterns.

Identity Model

InputLayer uses pure multiset semantics by default: the entire tuple is its own identity, and there is no separate key column. This is the native model of Differential Dataflow (DD).

Default Behavior (No Schema)

Without an explicit schema, tuples are identified by all their values:

+person("alice", 30)     // Insert tuple ("alice", 30)
+person("alice", 31)     // Insert different tuple ("alice", 31)

Both tuples coexist because they are different values. There is no concept of "alice" as an entity with a mutable "age" attribute.

Implications

Aspect              Behavior
Tuple identity      All columns (the entire tuple)
Duplicate handling  Multiset: the same tuple can exist multiple times
Updates             Must know all column values to delete
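
The identity rules above can be pictured with a small Python sketch. It is only a toy model, assuming a relation behaves like a counter keyed by the whole tuple; it is not InputLayer's implementation, and the variable names are illustrative.

```python
from collections import Counter

# Toy model of a relation with multiset semantics: the whole tuple is the key,
# and the counter value is how many copies of that tuple exist.
person = Counter()

person[("alice", 30)] += 1   # models +person("alice", 30)
person[("alice", 31)] += 1   # models +person("alice", 31): a different tuple
person[("alice", 30)] += 1   # same tuple again: a second copy, not an update

distinct_tuples = len(person)                 # number of distinct tuples
copies_of_alice_30 = person[("alice", 30)]    # multiplicity of one tuple
```

Nothing in this model treats "alice" as a key: the two ages are simply two unrelated entries.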

Schema Declarations

Schemas define the structure and constraints for relations.

Basic Schema

Declare a schema using typed arguments:

+person(id: int, name: string, age: int)

Update Patterns

Pattern 1: Exact Delete (Know All Values)

When you know the exact tuple to delete:

-person("alice", 30)
+person("alice", 31)

Pattern 2: Conditional Delete (Unknown Values)

When you don't know all column values, use a conditional delete:

// Delete alice regardless of age
-person("alice", Age) <- person("alice", Age)
+person("alice", 31)
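
As a rough illustration of what the conditional delete does, the Python sketch below first finds every tuple matching the rule body and then retracts it. The helper `conditional_delete` is hypothetical, and the sketch assumes a delete removes every copy of a matching tuple.

```python
from collections import Counter

def conditional_delete(relation, predicate):
    """Remove every tuple for which predicate(tuple) is true; return them."""
    # Collect matches first so the relation is not mutated while iterating.
    matched = [t for t in relation if predicate(t)]
    for t in matched:
        del relation[t]          # assumption: all copies of the tuple are retracted
    return matched

person = Counter({("alice", 30): 1, ("bob", 25): 1})

# models: -person("alice", Age) <- person("alice", Age)
removed = conditional_delete(person, lambda t: t[0] == "alice")
# models: +person("alice", 31)
person[("alice", 31)] += 1
```

The caller never supplies the old age; the match on the first column finds it.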

Pattern 3: Atomic Update

Combine delete and insert in one atomic operation:

-person(Name, OldAge), +person(Name, NewAge) <-
  person(Name, OldAge),
  Name = "alice",
  NewAge = OldAge + 1

The delete and the insert execute at the same logical timestamp, which makes the update atomic.
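
One way to picture the atomic update is as a single batch of signed changes that is computed against the current state and then applied in one step. The Python sketch below is a toy model under that assumption; `atomic_increment_age` is a hypothetical helper, not an InputLayer API.

```python
from collections import Counter

def atomic_increment_age(relation, name):
    """Build one batch of (tuple, diff) changes, then apply it in a single step."""
    batch = []
    # The rule body is matched against the state before any change is applied,
    # so a freshly inserted tuple is never matched again within the same update.
    for (n, age), count in relation.items():
        if n == name:
            batch.append(((n, age), -count))      # models -person(Name, OldAge)
            batch.append(((n, age + 1), +count))  # models +person(Name, NewAge)
    for tup, diff in batch:
        relation[tup] += diff
    # Drop tuples whose count fell to zero.
    for tup in [t for t, c in relation.items() if c == 0]:
        del relation[tup]
    return batch

person = Counter({("alice", 30): 1, ("bob", 25): 1})
batch = atomic_increment_age(person, "alice")
```

Because the retraction and the insertion travel in the same batch, there is no point at which "alice" is missing from the relation or present with both ages.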

Deletion Patterns

Delete Specific Tuple

-edge(1, 2)

Delete All Matching Tuples

// Delete all edges from node 5
-edge(5, Y) <- edge(5, Y)

// Delete all high earners
-employee(Name, Dept, Salary) <-
  employee(Name, Dept, Salary),
  Salary > 100000

Delete a Rule

To delete a persistent rule:

-reachable

This drops the rule named reachable. To delete all facts from a relation, use a conditional delete:

-person(X, Y, Z) <- person(X, Y, Z)  // Delete all tuples

To drop a relation entirely (schema + data), use the meta command:

.rel drop person

Schema Inference

When no schema is declared, it's inferred from the first insert:

+person("alice", 30)          // Inferred: person(string, int)
+person("bob", 25)            // OK: matches inferred schema
+person("charlie", "young")   // ERROR: type mismatch (string vs int)
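
The inference rule can be sketched in Python as follows. This is a simplified model: the `Relation` class is hypothetical, and Python types stand in for the column types shown above.

```python
class Relation:
    """Toy relation whose column types are fixed by the first inserted tuple."""

    def __init__(self, name):
        self.name = name
        self.schema = None   # tuple of Python types, set on the first insert
        self.rows = []

    def insert(self, *values):
        row_types = tuple(type(v) for v in values)
        if self.schema is None:
            self.schema = row_types            # infer the schema from the first insert
        elif row_types != self.schema:
            raise TypeError(
                f"{self.name}: expected {self.schema}, got {row_types}"
            )
        self.rows.append(values)


person = Relation("person")
person.insert("alice", 30)            # schema becomes (str, int)
person.insert("bob", 25)              # matches the inferred schema
try:
    person.insert("charlie", "young")  # second column is str, not int
    rejected = False
except TypeError:
    rejected = True
```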

Transient vs Persistent

Persistent Schema (+ prefix)

Stored in the database catalog:

+person(id: int, name: string, age: int)

Transient Schema (no prefix)

Session-only, cleared on database switch:

temp(x: int, y: int)
temp(1, 2)
temp(3, 4)
// Cleared when switching databases

Use transient schemas for:

  • REPL exploration with type safety
  • Temporary working data
  • Testing schema designs before persisting

Rules (Views)

Rule Identity

A view (derived relation) is identified by its head predicate name. A view contains one or more rules:

+reachable(X, Y) <- edge(X, Y)                   // Creates view, adds rule 1
+reachable(X, Y) <- reachable(X, Z), edge(Z, Y)  // Adds rule 2 to same view
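
To show how the two rules combine into one view, here is a minimal Python sketch of naive fixed-point evaluation. It uses plain sets rather than multisets and makes no claim about how InputLayer evaluates rules internally; the function name `reachable` simply mirrors the view.

```python
def reachable(edges):
    """Naive fixed-point evaluation of the two reachable rules."""
    result = set(edges)   # rule 1: reachable(X, Y) <- edge(X, Y)
    while True:
        # rule 2: reachable(X, Y) <- reachable(X, Z), edge(Z, Y)
        derived = {(x, y) for (x, z) in result for (z2, y) in edges if z == z2}
        if derived <= result:   # nothing new was derived: fixed point reached
            return result
        result |= derived

edges = {(1, 2), (2, 3), (3, 4)}
closure = reachable(edges)
```

Rule 1 seeds the view and rule 2 keeps extending it until an iteration derives no new pairs.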

Deleting Views

Delete an entire view with:

-reachable

Individual rule clauses can be removed with .rule remove, and an entire rule with .rule drop:

.rule remove reachable 1   // Remove first clause of 'reachable' rule
.rule drop reachable       // Remove entire 'reachable' rule (all clauses)

Or use the file-based workflow, clearing the rules and then reloading them from a file:

.rule clear reachable
.load views/reachable.idl

Session Rules

Rules without the + prefix are transient (session rules):

temp(X, Y) <- edge(X, Y), X < Y

Session rules:

  • Are not persisted
  • Are cleared on database switch
  • Support recursion (full fixed-point iteration)

File-Based Workflow

For complex views with many rules, use .idl script files:

// views/reachable.idl
+reachable(X, Y) <- edge(X, Y)
+reachable(X, Y) <- reachable(X, Z), edge(Z, Y)

Example Workflow

// Initial load
.load views/access_control.idl

// After modifying the file, clear rules first then reload
.rule clear access_control
.load views/access_control.idl

Best Practices

1. Use Explicit Schemas

Explicit schemas catch type errors early:

+employee(id: int, name: string, salary: float)

2. Use Conditional Deletes for Unknown Values

// Move every employee in the "Legacy" department to "Engineering"
-employee(Id, OldDept, Name), +employee(Id, "Engineering", Name) <-
  employee(Id, OldDept, Name),
  OldDept = "Legacy"

3. Use File-Based Workflow for Complex Rules

Keep rule definitions in version-controlled files:

views/
  access_control.idl
  graph_analysis.idl
  reporting.idl

4. Use Persistent Rules for Automatic Materialization

Persistent rules are automatically materialized and updated when base data changes:

// Session rules are recomputed on every query:
reachable(X, Y) <- edge(X, Y)
reachable(X, Y) <- reachable(X, Z), edge(Z, Y)

// Persistent rules materialize and cache results:
+reachable(X, Y) <- edge(X, Y)
+reachable(X, Y) <- reachable(X, Z), edge(Z, Y)

Both session and persistent rules support full recursion with fixed-point iteration.
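
The difference between recomputing on every query and keeping a materialized result up to date can be sketched in Python. This is a toy model with plain sets that handles edge insertions only; the class `MaterializedReachable` is hypothetical and is not how Differential Dataflow maintains its collections.

```python
def closure(edges):
    """Recompute reachability from scratch, as a session rule would per query."""
    result = set(edges)
    while True:
        derived = {(x, y) for (x, z) in result for (z2, y) in edges if z == z2}
        if derived <= result:
            return result
        result |= derived


class MaterializedReachable:
    """Cached reachability that is patched when an edge is inserted."""

    def __init__(self, edges):
        self.edges = set(edges)
        self.pairs = closure(self.edges)   # materialize once

    def add_edge(self, a, b):
        self.edges.add((a, b))
        # Any new path must pass through (a, b): it starts at a, or at a node
        # that already reaches a, and ends at b, or at a node b already reaches.
        sources = {a} | {x for (x, y) in self.pairs if y == a}
        targets = {b} | {y for (x, y) in self.pairs if x == b}
        delta = {(x, y) for x in sources for y in targets} - self.pairs
        self.pairs |= delta
        return delta


view = MaterializedReachable({(1, 2), (3, 4)})
delta = view.add_edge(2, 3)
```

Only the pairs in `delta` are new work; the cached pairs are reused. Deletions would need extra bookkeeping that this sketch leaves out.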