Contents
1. Abstract
SAGE (Shared Adaptive Growing Experience) is a decentralized AI system where each node maintains a Neural Cellular Automata (NCA) grid as a living knowledge store. Knowledge is encoded into compact grid states (~128KB), synchronized across a peer-to-peer network as tiny diffs (~1KB), and used to provide context to small local language models. The result: an AI that runs on commodity hardware, requires no GPU, costs nothing, and gets smarter the more people run it.
This whitepaper presents the architecture, distribution protocol, security model, and early experimental results demonstrating that NCA grids can learn real statistical structure from natural language, with signal ratios up to 238.9× the random baseline using only ~5,000 parameters.
The key insight is that NCA dynamics enable lossy knowledge compression into tiny grid states that form emergent associative structure. These states sync efficiently as sparse diffs, provide knowledge context to small transformer models, and, critically, enable emergent collective intelligence: when independent nodes learn different domains and synchronize, the merged knowledge enables answers that no single node could produce alone.
2. The Problem
2.1 AI Centralization
The current AI landscape is defined by centralization. A handful of corporations control the most capable models, gate access behind API keys and subscriptions, and treat users as data sources without agency. This creates multiple failure modes:
- Cost barriers: GPT-4-class models cost $20–200/month per user, or per-token API fees that scale unpredictably
- Privacy violations: Every conversation traverses corporate servers, subject to logging, training, and policy review
- Single points of failure: API outages, policy changes, and corporate shutdowns can eliminate access overnight
- Static knowledge: Models are frozen at training time; they don't learn from ongoing interactions
- GPU dependency: State-of-the-art models require hardware that costs tens of thousands of dollars
2.2 Why Existing Alternatives Fall Short
Local LLMs (Ollama, llama.cpp): These solve the privacy and cost problems but create new ones. Models are large (4–70 GB), static (no learning), GPU-hungry for acceptable performance, and fundamentally isolated: your local model is an island that never benefits from what others learn.
Federated Learning: The academic gold standard for distributed ML, but impractical for real deployment. Federated learning requires a central parameter server to coordinate rounds, suffers from gradient leakage attacks that can reconstruct training data [1], and demands synchronous communication that doesn't survive real-world network conditions.
Blockchain AI (Bittensor, etc.): These conflate AI with cryptocurrency, introducing mining overhead, token speculation, and enormous complexity. The AI is secondary to the financial instrument. Most users want intelligence, not a portfolio.
The gap: No existing system provides AI that is simultaneously free, private, local-first, continuously learning, and collectively intelligent. SAGE fills this gap.
3. SAGE Architecture
3.1 System Overview
Each SAGE node consists of three primary components working in concert: the NCA Knowledge Grid (knowledge storage and retrieval), a small transformer model (text generation), and the gossip protocol layer (knowledge distribution).
3.2 The NCA Knowledge Grid
At the heart of SAGE is a Neural Cellular Automata grid β a 2D grid of cells where each cell contains multiple channels of information. Cells perceive their neighbors through Sobel filters and update their state through tiny neural networks, creating self-organizing, self-healing knowledge structures.
Grid specification:
- Dimensions: 256 Γ 256 cells (65,536 cells total)
- Channels per cell: 32 (24 shared + 8 private)
- Total state size: ~128KB (256 × 256 × 32 values at ½ bit per value after quantization: 2,097,152 × 0.5 bit = 131,072 bytes)
- Topology: Toroidal (edges wrap around)
- Parameters: ~5,000 (the NCA update rules, not the grid state)
The 32 channels are allocated across specialized roles:
| Channels | Role | Shared? | Purpose |
|---|---|---|---|
| 0–3 | Structural | Yes | Cell activation, health, connectivity |
| 4–11 | Semantic | Yes | Token embeddings, meaning vectors |
| 12–17 | Association | Yes | Cross-token relationships, co-occurrence |
| 18–21 | Temporal | Yes | Recency, confidence, decay dynamics |
| 22–23 | Provenance | Yes | Source hashing, reputation metadata |
| 24–31 | Private | No | Local-only knowledge, PII-adjacent data |
Why NCA? Four properties make NCA ideal for distributed knowledge:
Compression: 128KB captures rich associative structure that would require megabytes in a vector database.
Locality: Knowledge interactions are local (cells only see neighbors), enabling efficient partial updates.
Robustness: NCA self-repairs from partial damage, analogous to handling network churn where nodes join and leave.
Differentiability: The entire system is end-to-end trainable via standard gradient methods.
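To make the update rule concrete, here is a minimal NumPy sketch of one NCA step in the spirit of Section 3.2: identity-plus-Sobel perception over a toroidal grid, followed by a tiny two-layer MLP shared by every cell. All sizes, weight shapes, and the function names are illustrative assumptions, not SAGE's actual parameters or API.

```python
import numpy as np

# Sobel kernels for perceiving neighbor gradients (toroidal wrap via np.roll).
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
SOBEL_Y = SOBEL_X.T

def conv2_toroidal(grid, kernel):
    """3x3 convolution with wrap-around edges, applied to every channel."""
    out = np.zeros_like(grid)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            w = kernel[dy + 1, dx + 1]
            if w != 0.0:
                out += w * np.roll(np.roll(grid, -dy, axis=0), -dx, axis=1)
    return out

def nca_step(grid, w1, b1, w2, b2):
    """One NCA update: perceive (identity + Sobel), then a tiny 2-layer MLP
    shared by every cell produces a residual state update."""
    perception = np.concatenate(
        [grid, conv2_toroidal(grid, SOBEL_X), conv2_toroidal(grid, SOBEL_Y)],
        axis=-1)                                      # (H, W, 3*C)
    hidden = np.maximum(perception @ w1 + b1, 0.0)    # ReLU
    return grid + hidden @ w2 + b2                    # residual update

rng = np.random.default_rng(0)
C, H = 8, 16                     # toy sizes (the paper uses 32 channels, 256x256)
grid = rng.normal(size=(H, H, C)).astype(np.float32)
w1 = rng.normal(scale=0.1, size=(3 * C, 32)).astype(np.float32)
b1 = np.zeros(32, dtype=np.float32)
w2 = rng.normal(scale=0.1, size=(32, C)).astype(np.float32)
b2 = np.zeros(C, dtype=np.float32)
grid = nca_step(grid, w1, b1, w2, b2)
```

Because the update MLP is shared across all 65,536 cells, the parameter count stays tiny regardless of grid size, which is how a ~5,000-parameter rule set can govern a full-size grid.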
3.3 Knowledge Encoding (Text → Grid)
When a user has a conversation, SAGE encodes the knowledge into the NCA grid through a multi-step process:
- Tokenization: Text is tokenized using a BPE vocabulary (1024 tokens in current experiments)
- Spatial projection: Each token is mapped to 2D grid coordinates via a learned semantic hash. Semantically related tokens land in nearby cells
- Gaussian write: Knowledge is written to the grid as a Gaussian-weighted activation centered on the projected coordinates, with gating to prevent overwriting high-confidence existing knowledge
- NCA integration: The grid runs several NCA update steps, allowing the new knowledge to interact with and integrate into the existing knowledge landscape through local cellular dynamics
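A toy sketch of steps 2–3 above (spatial projection and the gated Gaussian write). The `semantic_hash` here is a hypothetical stand-in for the learned semantic hash, and the grid sizes, gating rule, and parameter names are illustrative assumptions only:

```python
import hashlib
import numpy as np

H, C = 64, 8                      # toy grid (the paper: 256x256, 32 channels)
grid = np.zeros((H, H, C), dtype=np.float32)
confidence = np.zeros((H, H), dtype=np.float32)

def semantic_hash(token):
    """Stand-in for the learned semantic hash: deterministic token -> (y, x).
    (A real implementation would place related tokens near each other.)"""
    digest = hashlib.sha256(token.encode()).digest()
    return digest[0] % H, digest[1] % H

def gaussian_write(token, embedding, sigma=2.0, strength=0.5):
    """Write a token embedding as a Gaussian-weighted activation centered on
    the projected coordinates, gated so high-confidence cells resist being
    overwritten (illustrative gating rule)."""
    cy, cx = semantic_hash(token)
    ys, xs = np.meshgrid(np.arange(H), np.arange(H), indexing="ij")
    # toroidal distance to the write center
    dy = np.minimum(np.abs(ys - cy), H - np.abs(ys - cy))
    dx = np.minimum(np.abs(xs - cx), H - np.abs(xs - cx))
    weight = np.exp(-(dy**2 + dx**2) / (2 * sigma**2))
    gate = weight * (1.0 - confidence)            # confidence gating
    grid[:] = grid * (1 - gate[..., None] * strength) \
              + gate[..., None] * strength * embedding
    confidence[:] = np.maximum(confidence, gate * strength)

emb = np.ones(C, dtype=np.float32)
gaussian_write("ocean", emb)
# A real node would then run a few NCA integration steps (step 4).
```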
3.4 Knowledge Retrieval (Grid → Context)
When answering a question, SAGE extracts relevant knowledge from the grid:
- Query encoding: The user's question is tokenized and projected to grid coordinates, identifying regions of interest
- Attention readout: An attention mechanism reads from cells in the query's neighborhood, weighted by activation strength and confidence
- Context generation: The readout produces a context vector that is decoded into a natural language prefix
- Confidence gating: Low-confidence knowledge is suppressed; the system says "I don't know" rather than hallucinating
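The readout steps above can be sketched as a softmax attention over a neighborhood of the query's projected coordinates, weighted by activation strength times confidence. The function name, neighborhood shape, and scoring rule are illustrative assumptions, not SAGE's actual readout:

```python
import numpy as np

def attention_readout(grid, confidence, center, radius=4, temp=1.0):
    """Read a context vector from cells near the query's projected coordinates,
    softmax-weighted by activation strength * confidence. Returns the context
    vector plus an aggregate confidence usable for 'I don't know' gating."""
    H = grid.shape[0]
    cy, cx = center
    ys = [(cy + d) % H for d in range(-radius, radius + 1)]   # toroidal patch
    xs = [(cx + d) % H for d in range(-radius, radius + 1)]
    patch = grid[np.ix_(ys, xs)]                  # (2r+1, 2r+1, C)
    conf = confidence[np.ix_(ys, xs)]
    score = np.linalg.norm(patch, axis=-1) * conf # activation * confidence
    w = np.exp(score / temp)
    w /= w.sum()
    context = (patch * w[..., None]).sum(axis=(0, 1))
    return context, float((conf * w).sum())

rng = np.random.default_rng(1)
grid = rng.normal(size=(32, 32, 8)).astype(np.float32)
conf = rng.uniform(size=(32, 32)).astype(np.float32)
ctx, c = attention_readout(grid, conf, (5, 5))
```

In a full pipeline, `ctx` would be decoded into the natural-language prefix of step 3, and a low aggregate confidence `c` would trigger the abstention behavior of step 4.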
3.5 Hybrid Generation Pipeline
SAGE uses a hybrid architecture where the NCA grid provides knowledge and a small transformer (~100M parameters) provides language fluency. This is fundamentally different from RAG (Retrieval-Augmented Generation) in that the knowledge isn't stored as text chunks; it is encoded as emergent grid patterns that capture associative relationships.
The transformer receives the NCA context vector via cross-attention layers, allowing it to attend to grid-derived knowledge while generating fluent text. The transformer handles grammar, coherence, and style; the NCA handles facts, associations, and domain knowledge.
The long-term goal: Progressively replace the transformer with NCA computation. Our reservoir computing experiments show that a simple linear readout on frozen NCA dynamics achieves 100% top-5 next-token prediction accuracy, demonstrating that the NCA can perform the computation currently delegated to the transformer.
4. Distribution Protocol
4.1 Network Layer
SAGE nodes communicate over libp2p, the same networking stack used by IPFS and Ethereum 2.0. This provides battle-tested transport security (Noise protocol), stream multiplexing (Yamux), and NAT traversal via relay nodes.
Peer discovery uses three complementary mechanisms:
- Bootstrap nodes: Well-known entry points for initial network connection
- mDNS: Local network discovery for LAN peers (zero configuration)
- Kademlia DHT: Distributed hash table for global peer discovery
4.2 Knowledge Diffs
Instead of syncing entire grid states, SAGE nodes exchange sparse knowledge diffs: compact representations of what changed since the last sync.
A typical diff is ~1KB, three orders of magnitude smaller than even the smallest LLM weight update. This means SAGE can synchronize knowledge over dial-up-class bandwidth.
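One way to see how a localized grid update fits in ~1KB is to encode only the cells that changed beyond a threshold. The `(y, x, channel, value)` record layout below is an assumption for illustration, not SAGE's actual wire format:

```python
import struct
import numpy as np

def make_diff(old, new, threshold=0.05):
    """Encode cells whose values changed by more than `threshold` as packed
    (y, x, channel, value) records: a toy version of a sparse knowledge diff."""
    delta = new - old
    ys, xs, cs = np.nonzero(np.abs(delta) > threshold)
    records = b""
    for y, x, c in zip(ys, xs, cs):
        records += struct.pack("<HHBf", y, x, c, float(new[y, x, c]))
    return records

def apply_diff(grid, diff):
    """Apply packed records in place; each record is 9 bytes."""
    for off in range(0, len(diff), 9):
        y, x, c, v = struct.unpack_from("<HHBf", diff, off)
        grid[y, x, c] = v
    return grid

rng = np.random.default_rng(2)
old = rng.normal(size=(64, 64, 8)).astype(np.float32)
new = old.copy()
new[10:12, 10:12, :4] += 1.0          # a localized knowledge write
diff = make_diff(old, new)            # 16 changed values -> 144 bytes
restored = apply_diff(old.copy(), diff)
```

A conversation's worth of writes touches on the order of 50–100 cells, so even this naive 9-byte-per-value encoding lands comfortably in the ~1KB range before any entropy coding.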
4.3 Gossip Protocol
Diffs propagate through the network via GossipSub, a pubsub protocol designed for attack resilience [2]. When a node generates a knowledge diff, it publishes it to the SAGE topic. GossipSub ensures the diff reaches all subscribed nodes in O(log N) rounds, where N is the network size.
For a network of 10,000 nodes, a diff reaches full propagation in ~13 gossip rounds (log₂ 10,000 ≈ 13.3), typically under 30 seconds.
4.4 Merge Semantics
When a node receives a diff from the network, it must merge the incoming knowledge with its existing grid state. SAGE uses a confidence-weighted merge:
- Non-conflicting updates: If the incoming diff touches cells that the local node hasn't recently modified, the diff is applied directly with the sender's confidence values
- Conflicting updates: If both the local node and the incoming diff have modified the same cells, values are blended using a weighted average: merged = (local × local_confidence + remote × remote_confidence × reputation) / (local_confidence + remote_confidence × reputation)
- Post-merge integration: After applying the diff, the NCA runs several integration steps, allowing the new knowledge to organically merge with existing patterns through local dynamics
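The conflicting-update formula above, applied per cell value, looks like this in code. The rule for the merged confidence value is my assumption for illustration; the whitepaper specifies only the value blend:

```python
def merge_cell(local, local_conf, remote, remote_conf, reputation):
    """Confidence-weighted merge from Section 4.4: reputation scales the
    remote node's effective confidence before blending."""
    w_remote = remote_conf * reputation
    merged = (local * local_conf + remote * w_remote) / (local_conf + w_remote)
    # Assumed rule: merged confidence is the stronger of the two weights.
    merged_conf = max(local_conf, w_remote)
    return merged, merged_conf

# A low-reputation sender barely moves a high-confidence local value:
v, c = merge_cell(local=0.8, local_conf=0.9,
                  remote=0.2, remote_conf=0.6, reputation=0.5)
# v = (0.8*0.9 + 0.2*0.3) / (0.9 + 0.3) = 0.65
```

Note how reputation acts as a multiplier on the sender's confidence, so a Sybil identity with near-zero reputation contributes almost nothing to the blend.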
Knowledge history is maintained as a Merkle DAG (similar to Git), enabling efficient sync between nodes that have been offline: they compare heads, identify divergence points, and exchange only the missing diffs.
5. Security & Privacy
5.1 The Lossy Compression Argument
SAGE's strongest privacy guarantee is architectural: the NCA encoding is a many-to-one function. Many different input texts produce the same or similar grid-state changes. This is not a bug; it is the feature.
Consider: a 256 × 256 grid with 24 shared channels has 256 × 256 × 24 = 1,572,864 values. A typical diff modifies 50–100 cells across ~24 channels, roughly 1,200–2,400 values. But the source text that produced that diff contained 500–5,000 tokens of natural language, each carrying far more information. The encoding is lossy by design.
Information-theoretic argument: A 1KB diff contains ~8,000 bits of information. The source text that produced it contains 50,000–500,000 bits. Even with perfect analysis, an attacker cannot recover more information than the diff contains. The reconstruction problem is fundamentally underdetermined.
5.2 Anti-Poisoning Defenses
SAGE employs a four-layer defense against knowledge poisoning:
- Cryptographic identity: Every diff is signed with the author's Ed25519 key, and reputation attaches to keys, so a Sybil attacker must create many long-lived identities and build reputation for each one independently
- Diff validation: Receiving nodes check magnitude bounds (no single cell can change by more than threshold), coverage limits (a diff can't modify more than 5% of the grid), and stability checks (the grid must remain stable after applying the diff)
- Reputation system: Nodes accumulate reputation through consistent, validated contributions. New nodes start with low trust. Reputation decays if diffs are frequently rejected by peers
- Knowledge consensus: Knowledge that is corroborated by multiple independent nodes receives a confidence boost. Isolated claims from single sources remain low-confidence and are gated during retrieval
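The diff-validation layer above can be sketched as a pair of cheap structural checks; the thresholds and names here are illustrative assumptions, and the post-apply stability check is elided:

```python
import numpy as np

GRID_CELLS = 256 * 256
MAX_CELL_DELTA = 1.0        # per-value magnitude bound (illustrative)
MAX_COVERAGE = 0.05         # a diff may touch at most 5% of the grid

def validate_diff(cell_deltas):
    """cell_deltas: dict mapping (y, x) -> per-channel delta array.
    Returns (ok, reason). A real node would also apply the diff to a scratch
    copy and run NCA steps to confirm the grid remains stable."""
    if len(cell_deltas) > MAX_COVERAGE * GRID_CELLS:
        return False, "coverage limit exceeded"
    for delta in cell_deltas.values():
        if np.max(np.abs(delta)) > MAX_CELL_DELTA:
            return False, "magnitude bound exceeded"
    return True, "ok"

ok, why = validate_diff({(10, 10): np.array([0.3, -0.2])})
```

Rejected diffs feed back into the sender's reputation, which is what makes the reputation layer self-reinforcing.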
5.3 Differential Privacy
Beyond the inherent lossiness of NCA encoding, SAGE applies calibrated Laplace noise to diffs before publication. This provides formal ε-differential privacy guarantees: an observer cannot determine whether any specific piece of information was in the training data, even with access to all published diffs.
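The Laplace mechanism itself is standard [7]: noise is scaled to sensitivity/ε and added to each released value. A minimal sketch, with illustrative parameter values:

```python
import numpy as np

def privatize_diff(delta, epsilon, sensitivity):
    """Add Laplace noise with scale sensitivity/epsilon to each value of a
    diff before publishing: the standard epsilon-DP mechanism for numeric
    releases. Smaller epsilon means more noise and stronger privacy."""
    rng = np.random.default_rng()
    scale = sensitivity / epsilon
    return delta + rng.laplace(loc=0.0, scale=scale, size=delta.shape)

delta = np.zeros(100)                 # stand-in for a diff's value vector
noisy = privatize_diff(delta, epsilon=1.0, sensitivity=0.1)
```

The sensitivity term bounds how much any single conversation can move a diff value, which is exactly what the magnitude checks in Section 5.2 enforce.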
The private channels (24β31) are never shared over the network. Users can configure additional channel filtering to control exactly what knowledge categories are shared with the network.
6. The Emergent Intelligence Experiment
The most compelling demonstration of SAGE's architecture is an experiment in emergent collective intelligence: knowledge that no single node possesses, arising from the combination of independently learned domains.
6.1 Setup
Three nodes join the same gossip network. Node A ingests text about climate change, Node B ingests text about coral reefs, and Node C ingests nothing at all; it only receives and merges diffs from its peers.
6.2 The Key Question
Node A learned about climate change. Node B learned about coral reefs. Neither node was ever told the relationship between them. After gossip sync, can Node C, which learned nothing directly, answer the combined question?
This is the acid test for emergent intelligence. The answer must synthesize information from two independent sources, creating an association that was never explicitly taught.
6.3 Why It Should Work
The NCA architecture makes this possible through spatial proximity in the knowledge grid. When Node A encodes "ocean acidification" and Node B encodes "sensitive to pH," these concepts land in nearby grid regions (because the semantic hash maps related concepts to nearby coordinates). After merge and NCA integration steps, the local dynamics create new associations between the merged knowledge, associations that emerge from the cellular automata rules, not from any explicit programming.
This is emergent collective intelligence: knowledge that exists in the network but in no single node's training data. It arises from the interaction of independently-learned patterns through NCA dynamics. The whole becomes greater than the sum of its parts.
6.4 Early Results
Our initial experiments on smaller grids demonstrate the foundation. NCA grids trained on literary corpora show signal ratios of 25–239× the random baseline [3], proving the architecture learns real statistical structure. Reservoir computing experiments show 100% top-5 prediction accuracy with a simple linear readout [4], proving the grid state encodes computationally useful representations.
Full multi-node emergent intelligence benchmarks are planned for Q2 2026, with results to be published in the forthcoming academic paper.
7. Roadmap
Phase 1: Foundation (complete)
Local NCA training, knowledge encoding/retrieval, OpenAI-compatible API, single-node chat. Reservoir computing proof-of-concept.
Phase 2: Networking (complete)
libp2p transport, gossip protocol, knowledge diff format, peer discovery, basic merge semantics. Multi-node sync operational.
Phase 3: Kill the LLM (in progress)
Progressively replace transformer parameters with NCA computation. Hierarchical NCA grids, criticality-driven training, channel partitioning. Target: 1.7B → 500M → 100M parameter transformer.
Phase 4: Pure NCA Intelligence (planned)
Full text generation from NCA dynamics alone. Multi-modal knowledge (images, audio). Specialized sub-networks. Mobile nodes (phones as SAGE peers).
Research Milestones
| Milestone | Target |
|---|---|
| Multi-node emergent intelligence benchmark | Q2 2026 |
| Formal differential privacy proof | Q2 2026 |
| 1,000-node simulation study | Q3 2026 |
| Academic paper (arXiv preprint) | August 2026 |
| Conference submission (NeurIPS/ICML) | September 2026 |
8. References
- [1] Zhu, L., Liu, Z., & Han, S. "Deep Leakage from Gradients." NeurIPS, 2019. Demonstrates gradient leakage attacks that can reconstruct training data from shared gradients in federated learning.
- [2] Vyzovitis, D., et al. "GossipSub: Attack-Resilient Message Propagation in the Filecoin and ETH2.0 Networks." 2020. The gossip protocol SAGE uses for knowledge diff propagation.
- [3] SAGE Project. "Can Neural Cellular Automata Learn Language?" whatssage.ai/research, 2026. NCA training experiments showing a 238.9× signal ratio on literary corpora.
- [4] SAGE Project. "NCA as a Dynamical Reservoir." whatssage.ai/blog/reservoir-computing, 2026. 100% top-5 prediction with a linear readout on frozen NCA dynamics.
- [5] Mordvintsev, A., et al. "Growing Neural Cellular Automata." Distill, 2020. The foundational work on differentiable NCA for self-organizing systems.
- [6] McMahan, B., et al. "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS, 2017. FedAvg and the foundation of federated learning.
- [7] Dwork, C., & Roth, A. "The Algorithmic Foundations of Differential Privacy." Foundations and Trends in Theoretical Computer Science, 2014. The theoretical foundation for SAGE's privacy guarantees.
- [8] Ramsauer, H., et al. "Hopfield Networks is All You Need." ICLR, 2021. Modern Hopfield networks as associative memory, informing the NCA grid design.
- [9] Lian, X., et al. "Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent." NeurIPS, 2017. Theoretical foundation for gossip-based distributed optimization.
- [10] Benet, J. "IPFS - Content Addressed, Versioned, P2P File System." arXiv:1407.3561, 2014. Content-addressed storage and Merkle DAG versioning that inspire SAGE's knowledge history.
SAGE is open source and free forever. Install it with `curl -fsSL https://whatssage.ai/install.sh | bash` or join the Discord to follow the research. Source code: github.com/Caryyon/sage