Contents
1. Abstract
SAGE (Shared Adaptive Growing Experience) is a decentralized AI system where each node maintains a Neural Cellular Automata (NCA) grid as a living knowledge store. Knowledge is encoded into compact grid states (~128KB), synchronized across a peer-to-peer network as tiny diffs (~1KB), and used to provide context to small local language models. The result: an AI that runs on commodity hardware, requires no GPU, costs nothing, and gets smarter the more people run it.
This whitepaper presents the architecture, distribution protocol, security model, and early experimental results demonstrating that NCA grids can learn real statistical structure from natural language, with signal ratios up to 238.9× the random baseline using only ~5,000 parameters.
The key insight is that NCA dynamics enable lossy knowledge compression into tiny grid states that form emergent associative structure. These states sync efficiently as sparse diffs, provide knowledge context to small transformer models, and, critically, enable emergent collective intelligence: when independent nodes learn different domains and synchronize, the merged knowledge enables answers that no single node could produce alone.
2. The Problem
2.1 AI Centralization
The current AI landscape is defined by centralization. A handful of corporations control the most capable models, gate access behind API keys and subscriptions, and treat users as data sources without agency. This creates multiple failure modes:
- Cost barriers: GPT-4-class models cost $20–200/month per user, or per-token API fees that scale unpredictably
- Privacy violations: Every conversation traverses corporate servers, subject to logging, training, and policy review
- Single points of failure: API outages, policy changes, and corporate shutdowns can eliminate access overnight
- Static knowledge: Models are frozen at training time; they don't learn from ongoing interactions
- GPU dependency: State-of-the-art models require hardware that costs tens of thousands of dollars
2.2 Why Existing Alternatives Fall Short
Local LLMs (Ollama, llama.cpp): These solve the privacy and cost problems but create new ones. Models are large (4–70 GB), static (no learning), GPU-hungry for acceptable performance, and fundamentally isolated: your local model is an island that never benefits from what others learn.
Federated Learning: The academic gold standard for distributed ML, but impractical for real deployment. Federated learning requires a central parameter server to coordinate rounds, suffers from gradient leakage attacks that can reconstruct training data [1], and demands synchronous communication that doesn't survive real-world network conditions.
Blockchain AI (Bittensor, etc.): These conflate AI with cryptocurrency, introducing mining overhead, token speculation, and enormous complexity. The AI is secondary to the financial instrument. Most users want intelligence, not a portfolio.
The gap: No existing system provides AI that is simultaneously free, private, local-first, continuously learning, and collectively intelligent. SAGE fills this gap.
3. SAGE Architecture
3.1 System Overview
Each SAGE node consists of three primary components working in concert: the NCA Knowledge Grid (knowledge storage and retrieval), a small transformer model (text generation), and the gossip protocol layer (knowledge distribution).
3.2 The NCA Knowledge Grid
At the heart of SAGE is a Neural Cellular Automata grid β a 2D grid of cells where each cell contains multiple channels of information. Cells perceive their neighbors through Sobel filters and update their state through tiny neural networks, creating self-organizing, self-healing knowledge structures.
Grid specification:
- Dimensions: 256 Γ 256 cells (65,536 cells total)
- Channels per cell: 32 (24 shared + 8 private)
- Total state size: ~128KB (256 × 256 × 32 values at ½ bit per value after quantization: 2,097,152 × 0.5 bit = 131,072 bytes)
- Topology: Toroidal (edges wrap around)
- Parameters: ~5,000 (the NCA update rules, not the grid state)
The 32 channels are allocated across specialized roles:
| Channels | Role | Shared? | Purpose |
|---|---|---|---|
| 0–3 | Structural | Yes | Cell activation, health, connectivity |
| 4–11 | Semantic | Yes | Token embeddings, meaning vectors |
| 12–17 | Association | Yes | Cross-token relationships, co-occurrence |
| 18–21 | Temporal | Yes | Recency, confidence, decay dynamics |
| 22–23 | Provenance | Yes | Source hashing, reputation metadata |
| 24–31 | Private | No | Local-only knowledge, PII-adjacent data |
Why NCA? Four properties make NCA ideal for distributed knowledge:
Compression: 128KB captures rich associative structure that would require megabytes in a vector database.
Locality: Knowledge interactions are local (cells only see neighbors), enabling efficient partial updates.
Robustness: NCA self-repairs from partial damage, analogous to handling network churn where nodes join and leave.
Differentiability: The entire system is end-to-end trainable via standard gradient methods.
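To make the update rule concrete, here is a minimal NumPy sketch of one NCA step in the spirit of Section 3.2: identity-plus-Sobel perception over a toroidal grid, followed by a tiny two-layer MLP shared by every cell. All sizes, weight shapes, and the function names are illustrative assumptions, not SAGE's actual parameters or API.

```python
import numpy as np

# Sobel kernels for perceiving neighbor gradients (toroidal wrap via np.roll).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T

def conv2_toroidal(grid, kernel):
    """3x3 convolution with wrap-around edges, applied to every channel."""
    out = np.zeros_like(grid)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            w = kernel[dy + 1, dx + 1]
            if w != 0.0:
                out += w * np.roll(np.roll(grid, -dy, axis=0), -dx, axis=1)
    return out

def nca_step(grid, w1, b1, w2, b2):
    """One NCA update: perceive (identity + Sobel), then a tiny 2-layer MLP
    shared by every cell produces a residual state update."""
    perception = np.concatenate(
        [grid, conv2_toroidal(grid, SOBEL_X), conv2_toroidal(grid, SOBEL_Y)],
        axis=-1)                                      # (H, W, 3*C)
    hidden = np.maximum(perception @ w1 + b1, 0.0)    # ReLU
    return grid + hidden @ w2 + b2                    # residual update

rng = np.random.default_rng(0)
C, H = 8, 16                     # toy sizes (the paper uses 32 channels, 256x256)
grid = rng.normal(size=(H, H, C)).astype(np.float32)
w1 = rng.normal(scale=0.1, size=(3 * C, 32)).astype(np.float32)
b1 = np.zeros(32, dtype=np.float32)
w2 = rng.normal(scale=0.1, size=(32, C)).astype(np.float32)
b2 = np.zeros(C, dtype=np.float32)
grid = nca_step(grid, w1, b1, w2, b2)
```

Because the update MLP is shared across all 65,536 cells, the parameter count stays tiny regardless of grid size, which is how a ~5,000-parameter rule set can govern a full-size grid.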
3.3 Knowledge Encoding (Text → Grid)
When a user has a conversation, SAGE encodes the knowledge into the NCA grid through a multi-step process:
- Tokenization: Text is tokenized using a BPE vocabulary (1024 tokens in current experiments)
- Spatial projection: Each token is mapped to 2D grid coordinates via a learned semantic hash. Semantically related tokens land in nearby cells
- Gaussian write: Knowledge is written to the grid as a Gaussian-weighted activation centered on the projected coordinates, with gating to prevent overwriting high-confidence existing knowledge
- NCA integration: The grid runs several NCA update steps, allowing the new knowledge to interact with and integrate into the existing knowledge landscape through local cellular dynamics
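A toy sketch of steps 2–3 above (spatial projection and the gated Gaussian write). The `semantic_hash` here is a hypothetical stand-in for the learned semantic hash, and the grid sizes, gating rule, and parameter names are illustrative assumptions only:

```python
import hashlib
import numpy as np

H, C = 64, 8                      # toy grid (the paper: 256x256, 32 channels)
grid = np.zeros((H, H, C), dtype=np.float32)
confidence = np.zeros((H, H), dtype=np.float32)

def semantic_hash(token):
    """Stand-in for the learned semantic hash: deterministic token -> (y, x).
    (A real implementation would place related tokens near each other.)"""
    digest = hashlib.sha256(token.encode()).digest()
    return digest[0] % H, digest[1] % H

def gaussian_write(token, embedding, sigma=2.0, strength=0.5):
    """Write a token embedding as a Gaussian-weighted activation centered on
    the projected coordinates, gated so high-confidence cells resist being
    overwritten (illustrative gating rule)."""
    cy, cx = semantic_hash(token)
    ys, xs = np.meshgrid(np.arange(H), np.arange(H), indexing="ij")
    # toroidal distance to the write center
    dy = np.minimum(np.abs(ys - cy), H - np.abs(ys - cy))
    dx = np.minimum(np.abs(xs - cx), H - np.abs(xs - cx))
    weight = np.exp(-(dy**2 + dx**2) / (2 * sigma**2))
    gate = weight * (1.0 - confidence)            # confidence gating
    grid[:] = grid * (1 - gate[..., None] * strength) \
              + gate[..., None] * strength * embedding
    confidence[:] = np.maximum(confidence, gate * strength)

emb = np.ones(C, dtype=np.float32)
gaussian_write("ocean", emb)
# A real node would then run a few NCA integration steps (step 4).
```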
3.4 Knowledge Retrieval (Grid → Context)
When answering a question, SAGE extracts relevant knowledge from the grid:
- Query encoding: The user's question is tokenized and projected to grid coordinates, identifying regions of interest
- Attention readout: An attention mechanism reads from cells in the query's neighborhood, weighted by activation strength and confidence
- Context generation: The readout produces a context vector that is decoded into a natural language prefix
- Confidence gating: Low-confidence knowledge is suppressed; the system says "I don't know" rather than hallucinating
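The readout steps above can be sketched as a softmax attention over a neighborhood of the query's projected coordinates, weighted by activation strength times confidence. The function name, neighborhood shape, and scoring rule are illustrative assumptions, not SAGE's actual readout:

```python
import numpy as np

def attention_readout(grid, confidence, center, radius=4, temp=1.0):
    """Read a context vector from cells near the query's projected coordinates,
    softmax-weighted by activation strength * confidence. Returns the context
    vector plus an aggregate confidence usable for 'I don't know' gating."""
    H = grid.shape[0]
    cy, cx = center
    ys = [(cy + d) % H for d in range(-radius, radius + 1)]   # toroidal patch
    xs = [(cx + d) % H for d in range(-radius, radius + 1)]
    patch = grid[np.ix_(ys, xs)]                  # (2r+1, 2r+1, C)
    conf = confidence[np.ix_(ys, xs)]
    score = np.linalg.norm(patch, axis=-1) * conf # activation * confidence
    w = np.exp(score / temp)
    w /= w.sum()
    context = (patch * w[..., None]).sum(axis=(0, 1))
    return context, float((conf * w).sum())

rng = np.random.default_rng(1)
grid = rng.normal(size=(32, 32, 8)).astype(np.float32)
conf = rng.uniform(size=(32, 32)).astype(np.float32)
ctx, c = attention_readout(grid, conf, (5, 5))
```

In a full pipeline, `ctx` would be decoded into the natural-language prefix of step 3, and a low aggregate confidence `c` would trigger the abstention behavior of step 4.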
3.5 Hybrid Generation Pipeline
SAGE uses a hybrid architecture where the NCA grid provides knowledge and a small transformer (~100M parameters) provides language fluency. This is fundamentally different from RAG (Retrieval-Augmented Generation) in that the knowledge isn't stored as text chunks; it is encoded as emergent grid patterns that capture associative relationships.
The transformer receives the NCA context vector via cross-attention layers, allowing it to attend to grid-derived knowledge while generating fluent text. The transformer handles grammar, coherence, and style; the NCA handles facts, associations, and domain knowledge.
The long-term goal: Progressively replace the transformer with NCA computation. Our reservoir computing experiments show that a simple linear readout on frozen NCA dynamics achieves 100% top-5 next-token prediction accuracy, demonstrating that the NCA can perform the computation currently delegated to the transformer.
4. Distribution Protocol
4.1 Network Layer
SAGE nodes communicate over libp2p, the same networking stack used by IPFS and Ethereum 2.0. This provides battle-tested transport security (Noise protocol), stream multiplexing (Yamux), and NAT traversal via relay nodes.
Peer discovery uses three complementary mechanisms:
- Bootstrap nodes: Well-known entry points for initial network connection
- mDNS: Local network discovery for LAN peers (zero configuration)
- Kademlia DHT: Distributed hash table for global peer discovery
4.2 Knowledge Diffs
Instead of syncing entire grid states, SAGE nodes exchange sparse knowledge diffs: compact representations of what changed since the last sync.
A typical diff is ~1KB, three orders of magnitude smaller than even the smallest LLM weight update. This means SAGE can synchronize knowledge over dial-up-class bandwidth.
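One way to see how a localized grid update fits in ~1KB is to encode only the cells that changed beyond a threshold. The `(y, x, channel, value)` record layout below is an assumption for illustration, not SAGE's actual wire format:

```python
import struct
import numpy as np

def make_diff(old, new, threshold=0.05):
    """Encode cells whose values changed by more than `threshold` as packed
    (y, x, channel, value) records: a toy version of a sparse knowledge diff."""
    delta = new - old
    ys, xs, cs = np.nonzero(np.abs(delta) > threshold)
    records = b""
    for y, x, c in zip(ys, xs, cs):
        records += struct.pack("<HHBf", y, x, c, float(new[y, x, c]))
    return records

def apply_diff(grid, diff):
    """Apply packed records in place; each record is 9 bytes."""
    for off in range(0, len(diff), 9):
        y, x, c, v = struct.unpack_from("<HHBf", diff, off)
        grid[y, x, c] = v
    return grid

rng = np.random.default_rng(2)
old = rng.normal(size=(64, 64, 8)).astype(np.float32)
new = old.copy()
new[10:12, 10:12, :4] += 1.0          # a localized knowledge write
diff = make_diff(old, new)            # 16 changed values -> 144 bytes
restored = apply_diff(old.copy(), diff)
```

A conversation's worth of writes touches on the order of 50–100 cells, so even this naive 9-byte-per-value encoding lands comfortably in the ~1KB range before any entropy coding.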
4.3 Gossip Protocol
Diffs propagate through the network via GossipSub, a pubsub protocol designed for attack resilience [2]. When a node generates a knowledge diff, it publishes it to the SAGE topic. GossipSub ensures the diff reaches all subscribed nodes in O(log N) rounds, where N is the network size.
For a network of 10,000 nodes, a diff reaches full propagation in ~13 gossip rounds (log₂ 10,000 ≈ 13.3), typically under 30 seconds.
4.4 Merge Semantics
When a node receives a diff from the network, it must merge the incoming knowledge with its existing grid state. SAGE uses a confidence-weighted merge:
- Non-conflicting updates: If the incoming diff touches cells that the local node hasn't recently modified, the diff is applied directly with the sender's confidence values
- Conflicting updates: If both the local node and the incoming diff have modified the same cells, values are blended using a weighted average: merged = (local × local_confidence + remote × remote_confidence × reputation) / (local_confidence + remote_confidence × reputation)
- Post-merge integration: After applying the diff, the NCA runs several integration steps, allowing the new knowledge to organically merge with existing patterns through local dynamics
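The conflicting-update formula above, applied per cell value, looks like this in code. The rule for the merged confidence value is my assumption for illustration; the whitepaper specifies only the value blend:

```python
def merge_cell(local, local_conf, remote, remote_conf, reputation):
    """Confidence-weighted merge from Section 4.4: reputation scales the
    remote node's effective confidence before blending."""
    w_remote = remote_conf * reputation
    merged = (local * local_conf + remote * w_remote) / (local_conf + w_remote)
    # Assumed rule: merged confidence is the stronger of the two weights.
    merged_conf = max(local_conf, w_remote)
    return merged, merged_conf

# A low-reputation sender barely moves a high-confidence local value:
v, c = merge_cell(local=0.8, local_conf=0.9,
                  remote=0.2, remote_conf=0.6, reputation=0.5)
# v = (0.8*0.9 + 0.2*0.3) / (0.9 + 0.3) = 0.65
```

Note how reputation acts as a multiplier on the sender's confidence, so a Sybil identity with near-zero reputation contributes almost nothing to the blend.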
Knowledge history is maintained as a Merkle DAG (similar to Git), enabling efficient sync between nodes that have been offline: they compare heads, identify divergence points, and exchange only the missing diffs.
5. Security & Privacy
5.1 The Lossy Compression Argument
SAGE's strongest privacy guarantee is architectural: the NCA encoding is a many-to-one function. Many different input texts produce the same or similar grid-state changes. This is not a bug; it is the feature.
Consider: a 256 × 256 grid with 24 shared channels has 256 × 256 × 24 = 1,572,864 values. A typical diff modifies 50–100 cells across ~24 channels, roughly 1,200–2,400 values. But the source text that produced that diff contained 500–5,000 tokens of natural language, each carrying far more information. The encoding is lossy by design.
Information-theoretic argument: A 1KB diff contains ~8,000 bits of information. The source text that produced it contains 50,000–500,000 bits. Even with perfect analysis, an attacker cannot recover more information than the diff contains. The reconstruction problem is fundamentally underdetermined.
5.2 Anti-Poisoning Defenses
SAGE employs a four-layer defense against knowledge poisoning:
- Cryptographic identity: Every diff is signed with the author's Ed25519 key, and reputation attaches to keys, so a Sybil attacker must create many long-lived identities and build reputation for each one independently
- Diff validation: Receiving nodes check magnitude bounds (no single cell can change by more than threshold), coverage limits (a diff can't modify more than 5% of the grid), and stability checks (the grid must remain stable after applying the diff)
- Reputation system: Nodes accumulate reputation through consistent, validated contributions. New nodes start with low trust. Reputation decays if diffs are frequently rejected by peers
- Knowledge consensus: Knowledge that is corroborated by multiple independent nodes receives a confidence boost. Isolated claims from single sources remain low-confidence and are gated during retrieval
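The diff-validation layer above can be sketched as a pair of cheap structural checks; the thresholds and names here are illustrative assumptions, and the post-apply stability check is elided:

```python
import numpy as np

GRID_CELLS = 256 * 256
MAX_CELL_DELTA = 1.0        # per-value magnitude bound (illustrative)
MAX_COVERAGE = 0.05         # a diff may touch at most 5% of the grid

def validate_diff(cell_deltas):
    """cell_deltas: dict mapping (y, x) -> per-channel delta array.
    Returns (ok, reason). A real node would also apply the diff to a scratch
    copy and run NCA steps to confirm the grid remains stable."""
    if len(cell_deltas) > MAX_COVERAGE * GRID_CELLS:
        return False, "coverage limit exceeded"
    for delta in cell_deltas.values():
        if np.max(np.abs(delta)) > MAX_CELL_DELTA:
            return False, "magnitude bound exceeded"
    return True, "ok"

ok, why = validate_diff({(10, 10): np.array([0.3, -0.2])})
```

Rejected diffs feed back into the sender's reputation, which is what makes the reputation layer self-reinforcing.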
5.3 Differential Privacy
Beyond the inherent lossiness of NCA encoding, SAGE applies calibrated Laplace noise to diffs before publication. This provides formal ε-differential privacy guarantees: an observer cannot determine whether any specific piece of information was in the training data, even with access to all published diffs.
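The Laplace mechanism itself is standard [7]: noise is scaled to sensitivity/ε and added to each released value. A minimal sketch, with illustrative parameter values:

```python
import numpy as np

def privatize_diff(delta, epsilon, sensitivity):
    """Add Laplace noise with scale sensitivity/epsilon to each value of a
    diff before publishing: the standard epsilon-DP mechanism for numeric
    releases. Smaller epsilon means more noise and stronger privacy."""
    rng = np.random.default_rng()
    scale = sensitivity / epsilon
    return delta + rng.laplace(loc=0.0, scale=scale, size=delta.shape)

delta = np.zeros(100)                 # stand-in for a diff's value vector
noisy = privatize_diff(delta, epsilon=1.0, sensitivity=0.1)
```

The sensitivity term bounds how much any single conversation can move a diff value, which is exactly what the magnitude checks in Section 5.2 enforce.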
The private channels (24β31) are never shared over the network. Users can configure additional channel filtering to control exactly what knowledge categories are shared with the network.
6. The Emergent Intelligence Experiment
The most compelling demonstration of SAGE's architecture is an experiment in emergent collective intelligence: knowledge that no single node possesses, arising from the combination of independently learned domains.
6.1 Setup
Three nodes join the same gossip network. Node A ingests text about climate change, Node B ingests text about coral reefs, and Node C ingests nothing at all; it only receives and merges diffs from its peers.
6.2 The Key Question
Node A learned about climate change. Node B learned about coral reefs. Neither node was ever told the relationship between them. After gossip sync, can Node C, which learned nothing directly, answer the combined question?
This is the acid test for emergent intelligence. The answer must synthesize information from two independent sources, creating an association that was never explicitly taught.
6.3 Why It Should Work
The NCA architecture makes this possible through spatial proximity in the knowledge grid. When Node A encodes "ocean acidification" and Node B encodes "sensitive to pH," these concepts land in nearby grid regions (because the semantic hash maps related concepts to nearby coordinates). After merge and NCA integration steps, the local dynamics create new associations between the merged knowledge, associations that emerge from the cellular automata rules, not from any explicit programming.
This is emergent collective intelligence: knowledge that exists in the network but in no single node's training data. It arises from the interaction of independently-learned patterns through NCA dynamics. The whole becomes greater than the sum of its parts.
6.4 Early Results
Our initial experiments on smaller grids demonstrate the foundation. NCA grids trained on literary corpora show signal ratios of 25–239× the random baseline [3], proving the architecture learns real statistical structure. Reservoir computing experiments show 100% top-5 prediction accuracy with a simple linear readout [4], proving the grid state encodes computationally useful representations.
Full multi-node emergent intelligence benchmarks are planned for Q2 2026, with results to be published in the forthcoming academic paper.
7. Roadmap
Phase 1: Foundation (complete)
Local NCA training, knowledge encoding/retrieval, OpenAI-compatible API, single-node chat. Reservoir computing proof-of-concept.
Phase 2: Networking (complete)
libp2p transport, gossip protocol, knowledge diff format, peer discovery, basic merge semantics. Multi-node sync operational.
Phase 3: Kill the LLM (in progress)
Progressively replace transformer parameters with NCA computation. Hierarchical NCA grids, criticality-driven training, channel partitioning. Target: 1.7B → 500M → 100M parameter transformer.
Phase 4: Pure NCA Intelligence (planned)
Full text generation from NCA dynamics alone. Multi-modal knowledge (images, audio). Specialized sub-networks. Mobile nodes (phones as SAGE peers).
Research Milestones
| Milestone | Target |
|---|---|
| Multi-node emergent intelligence benchmark | Q2 2026 |
| Formal differential privacy proof | Q2 2026 |
| 1,000-node simulation study | Q3 2026 |
| Academic paper (arXiv preprint) | August 2026 |
| Conference submission (NeurIPS/ICML) | September 2026 |
8. References
- [1] Zhu, L., Liu, Z., & Han, S. "Deep Leakage from Gradients." NeurIPS, 2019. Demonstrates gradient leakage attacks that can reconstruct training data from shared gradients in federated learning.
- [2] Vyzovitis, D., et al. "GossipSub: Attack-Resilient Message Propagation in the Filecoin and ETH2.0 Networks." 2020. The gossip protocol SAGE uses for knowledge diff propagation.
- [3] SAGE Project. "Can Neural Cellular Automata Learn Language?" whatssage.ai/research, 2026. NCA training experiments showing a 238.9× signal ratio on literary corpora.
- [4] SAGE Project. "NCA as a Dynamical Reservoir." whatssage.ai/blog/reservoir-computing, 2026. 100% top-5 prediction with a linear readout on frozen NCA dynamics.
- [5] Mordvintsev, A., et al. "Growing Neural Cellular Automata." Distill, 2020. The foundational work on differentiable NCA for self-organizing systems.
- [6] McMahan, B., et al. "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS, 2017. FedAvg and the foundation of federated learning.
- [7] Dwork, C., & Roth, A. "The Algorithmic Foundations of Differential Privacy." Foundations and Trends in Theoretical Computer Science, 2014. The theoretical foundation for SAGE's privacy guarantees.
- [8] Ramsauer, H., et al. "Hopfield Networks is All You Need." ICLR, 2021. Modern Hopfield networks as associative memory, informing the NCA grid design.
- [9] Lian, X., et al. "Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent." NeurIPS, 2017. Theoretical foundation for gossip-based distributed optimization.
- [10] Benet, J. "IPFS - Content Addressed, Versioned, P2P File System." arXiv:1407.3561, 2014. Content-addressed storage and Merkle DAG versioning that inspire SAGE's knowledge history.
SAGE is open source and free forever. Install it with `curl -fsSL https://whatssage.ai/install.sh | bash` or join the Discord to follow the research. Source code: github.com/Caryyon/sage