Last updated: April 2026
Memvid is an open-source memory layer that packages data, embeddings, search indexes, and metadata into a single portable .mv2 file — replacing traditional RAG pipelines and vector databases entirely. Written in Rust (v2.0, rewritten from Python), it achieves 85.7% accuracy on the LoCoMo benchmark (+35% vs. the industry average), 0.025ms P50 retrieval latency, and supports text, PDF, images (CLIP), and audio (Whisper) — all without a database server, API keys, or cloud dependency.
If you’ve built anything with AI agents, you know the pain: every time your agent needs to remember past conversations, you end up deploying Pinecone, Weaviate, or ChromaDB — managing API keys, cloud costs, network latency, and debugging nightmares when something breaks. Memvid takes a radically different approach: what if your agent’s entire memory lived in a single file you could git commit, scp, or email?
With 13.7K GitHub stars and a complete Python-to-Rust rewrite in early 2026, Memvid has graduated from an experimental curiosity into a production-ready tool. In this guide, I break down what makes it work, when it’s the right choice, and when you should still reach for a traditional vector database.
How does Memvid actually work?
The core insight behind Memvid borrows from video encoding — not to store video, but to organize AI memory as an append-only sequence of Smart Frames. Each Smart Frame is an immutable unit that stores content along with timestamps, checksums, and metadata. Frames are grouped into segments that allow efficient compression, indexing, and parallel reads.
This design gives you three things that traditional RAG architectures struggle with: append-only writes without corrupting existing data, the ability to query past memory states (time-travel debugging), and crash safety through committed immutable frames.
Everything lives in a single .mv2 file. There are no .wal, .lock, .shm, or sidecar files — ever. Think of it like SQLite for AI memory: a complete database in one file you can copy, email, or version control, but designed specifically for semantic search and agent recall instead of relational data.
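To make the single-file, append-only idea concrete, here is a toy sketch of a one-file frame store in Python. This is emphatically not the real MV2 binary format (which is a purpose-built Rust format); it just illustrates the property the text describes: frames are length-prefixed records with checksums, appended to one file that needs no sidecar files.

```python
import hashlib
import json
import struct

def append_frame(path: str, content: bytes, meta: dict) -> None:
    """Append one length-prefixed frame (JSON header + payload) to a single file."""
    header = json.dumps({
        "meta": meta,
        "checksum": hashlib.sha256(content).hexdigest(),
        "length": len(content),
    }).encode()
    with open(path, "ab") as f:  # append-only: committed frames are never rewritten
        f.write(struct.pack("<I", len(header)))
        f.write(header)
        f.write(content)

def read_frames(path: str) -> list[tuple[dict, bytes]]:
    """Stream frames back in commit order, verifying each checksum."""
    frames = []
    with open(path, "rb") as f:
        while True:
            prefix = f.read(4)
            if not prefix:
                break
            header = json.loads(f.read(struct.unpack("<I", prefix)[0]))
            payload = f.read(header["length"])
            # A checksum mismatch would indicate a torn or corrupted frame
            assert hashlib.sha256(payload).hexdigest() == header["checksum"]
            frames.append((header["meta"], payload))
    return frames
```

Because writes only ever touch the end of the file, a crash mid-append can at worst leave one incomplete trailing frame; everything already committed remains readable.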
What are Smart Frames and why do they matter?
A Smart Frame is the fundamental storage unit in Memvid. It contains the actual content (text, metadata, embeddings) along with a timestamp and checksum. Unlike rows in a database that can be updated or deleted in place, Smart Frames are immutable — once committed, they never change.
This append-only design has several practical consequences for developers building AI agents:
Crash safety. If your process dies mid-write, the embedded WAL (Write-Ahead Log) ensures recovery. No corrupted indexes, no lost data. The agent picks up where it left off.
Time-travel debugging. Because frames are never modified, you can rewind to any point in time and inspect exactly what the agent “knew” at that moment. If your customer support agent gave a wrong answer on Tuesday, you replay its memory state from Tuesday to see exactly what context was retrieved and what was missing.
Auditability. In regulated industries — healthcare, legal, finance — you need a clear trail of what data informed each AI decision. Immutable frames give you that out of the box, which aligns with the traceability requirements emerging from the EU AI Act.
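The time-travel property falls out of immutability almost for free. A minimal sketch (again, not the Memvid API): since frames are never modified, the memory state at any instant is simply "all frames committed at or before that instant".

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen mirrors frame immutability
class Frame:
    ts: int       # commit timestamp (e.g. unix seconds)
    content: str

def state_at(frames: list[Frame], ts: int) -> list[str]:
    """Reconstruct what the agent 'knew' at time ts."""
    return [f.content for f in frames if f.ts <= ts]

log = [
    Frame(100, "user prefers dark mode"),
    Frame(200, "ticket #4821 opened"),
    Frame(300, "ticket #4821 resolved"),
]

# Replaying at ts=250: the agent knew about the open ticket,
# but not yet about its resolution.
assert state_at(log, 250) == ["user prefers dark mode", "ticket #4821 opened"]
```

A mutable store cannot offer this guarantee without explicit snapshotting, because an update destroys the previous state.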
How does Memvid compare to traditional RAG stacks?
The typical RAG pipeline requires you to: chunk your documents, generate embeddings via an API, store them in a vector database (Pinecone, Weaviate, Qdrant, ChromaDB), write retrieval logic, and manage the entire infrastructure. Memvid collapses all of that into a library call.
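The four steps the traditional pipeline forces on you can be sketched in a few lines. The "embeddings" below are bag-of-words vectors purely for illustration (a real pipeline would call an embedding model and a vector database); the point is how much plumbing exists between your document and a search result.

```python
import math
from collections import Counter

# Step 1: chunk the document
def chunk(doc: str, size: int = 5) -> list[str]:
    words = doc.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

# Step 2: "embed" each chunk (toy bag-of-words stand-in for a real model)
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Step 3: store chunk + vector (stand-in for the vector database)
store = [(c, embed(c)) for c in chunk("login timeout after MFA upgrade reported by customer")]

# Step 4: retrieval logic
query = embed("MFA timeout")
best = max(store, key=lambda item: cosine(query, item[1]))
```

Each step above is a separate moving part you deploy, monitor, and debug; an embedded library folds them behind one `put`/`find` call against a local file.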
Here’s where the architectures diverge:
| Dimension | Traditional RAG (Pinecone / Weaviate) | Memvid (.mv2) |
|---|---|---|
| Infrastructure | Separate database server, API keys, cloud costs | Zero infrastructure — single file |
| Deployment | Docker, Kubernetes, managed service | cargo add memvid-core or pip install memvid-sdk |
| Portability | Database export/import workflows | Copy the .mv2 file — works anywhere |
| Latency (P50) | 5–50ms (network + query) | 0.025ms (local file I/O) |
| Offline support | Requires network connection (managed) or local server | Fully offline — no internet needed |
| Search | Usually vector-only; hybrid requires extra config | Hybrid BM25 + HNSW out of the box |
| Time-travel | Not available (snapshots at best) | Built-in — rewind to any memory state |
| Concurrency model | Multi-writer, full ACID transactions | Single-writer only (no concurrent writes) |
| Fine-grained deletion | Delete individual records | Append-only — deletion requires rebuild |
| Scale ceiling | Billions of vectors | Millions (single machine) |
| LoCoMo accuracy | 52.9%–66.9% (OpenAI / Mem0) | 85.7% (+35% vs. industry average) |
Memvid is not a Pinecone replacement for enterprise-scale production with billions of vectors, concurrent writes, and ACID transactions. It’s an excellent choice for single-user agents, offline-first applications, edge deployments, and any scenario where infrastructure overhead outweighs the benefits of a managed database. Choose the right tool for the constraint you actually have.
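On the "hybrid BM25 + HNSW" row above: one common way to merge a lexical ranking with a vector ranking is reciprocal rank fusion (RRF). The sketch below illustrates the general technique, not Memvid's internal scoring, and the document IDs are made up.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Documents ranked highly by BOTH retrievers float to the top
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking   = ["doc_mfa", "doc_login", "doc_billing"]  # keyword hits
vector_ranking = ["doc_mfa", "doc_faq", "doc_login"]      # semantic hits
fused = rrf([bm25_ranking, vector_ranking])
```

The appeal of RRF is that it needs no score normalization between the two retrievers, only their rank orders, which is why it shows up so often in hybrid search stacks.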
What benchmarks actually show (LoCoMo, latency, throughput)
Memvid’s claims are backed by reproducible benchmarks on LoCoMo — the industry-standard test for long-term conversational memory developed by Snap Research. LoCoMo evaluates how well a system can recall facts, perform multi-hop reasoning, handle temporal questions, and resist adversarial queries across very long conversations (~26K tokens each).
The headline numbers from Memvid’s open-source benchmark suite (memvidbench):
| System | LoCoMo Overall (Cat. 1–4) | Multi-hop | Temporal |
|---|---|---|---|
| Memvid v2 | 85.7% | Best-in-class | Best-in-class |
| MemU | 92.09%* | — | — |
| MemMachine | 84.87% | 72.59% | 64.58% |
| Mem0 | 66.9% | — | — |
| Zep (corrected) | 58.4% | — | — |
| OpenAI Memory | 52.9% | — | — |
*MemU’s 92.09% claim uses a different evaluation setup; direct comparison requires methodological alignment. The Mem0-vs-Zep benchmark dispute (Zep claimed 84%, Mem0 corrected to 58.4%) shows why reproducible evaluation matters.
On the latency side, Memvid reports 0.025ms P50 and 0.075ms P99, with 1,372× higher throughput than standard vector database setups. These numbers make sense because Memvid eliminates all network hops — everything happens through local file I/O with memory-mapped reads.
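If you want to sanity-check latency claims like these yourself, the method is straightforward: time many identical lookups and take percentiles of the samples. The sketch below measures a trivial in-process dict lookup (not Memvid) just to show the P50/P99 measurement pattern.

```python
import time

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a sample list."""
    s = sorted(samples)
    idx = min(len(s) - 1, int(p / 100 * len(s)))
    return s[idx]

table = {i: f"value-{i}" for i in range(10_000)}
samples = []
for i in range(1_000):
    t0 = time.perf_counter()
    _ = table[i % 10_000]                       # purely local lookup, no network hop
    samples.append((time.perf_counter() - t0) * 1e3)  # milliseconds

p50 = percentile(samples, 50)
p99 = percentile(samples, 99)
```

Any purely local read sidesteps the 5-50ms network round-trip entirely, which is where most of the reported gap comes from.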
The AI agent memory space in 2026 resembles the vector database race of 2022: active benchmark wars, real production pain, and consolidation approaching. Disputes like the Mem0/Zep correction described above are exactly why open-source, reproducible evaluation is essential. Memvid publishes its full evaluation code in memvidbench.
From QR codes to Rust: Memvid’s evolution
The original Memvid (v1) was a Python library that used an unconventional approach — encoding text chunks into QR codes, embedding those QR codes as video frames in an MP4 file, and using FAISS for semantic search with a separate JSON index. It was clever but fragile: you needed FFmpeg, OpenCV, and codec dependencies; performance was bottlenecked by Python’s GIL; and the QR-code-in-video approach had inherent scaling limitations.
Memvid v2, released in early 2026, is a complete ground-up rewrite in Rust. The QR/video concept was replaced with a purpose-built binary format (.mv2) that stores compressed Smart Frames directly. The results are dramatic: 10–100× performance improvement, native hybrid search (BM25 via Tantivy + HNSW vectors), multi-modal support (text, PDF, images via CLIP, audio via Whisper), and optional encryption (.mv2e files).
This matters because it signals the project’s maturity. The v1 Python prototype proved the concept; v2 is an engineering product with SDKs for Rust, Node.js, and Python, a CLI, Docker support, and a cloud sandbox at sandbox.memvid.com.
Getting started: code examples
Rust (native, fastest)
Add Memvid to your Cargo.toml with the features you need:
[dependencies]
memvid-core = { version = "2.0", features = ["lex", "vec", "temporal_track"] }
Create a memory file, ingest data, and search:
use memvid_core::{Memvid, PutOptions, SearchRequest};

fn main() -> memvid_core::Result<()> {
    // Create a new memory file
    let mut mem = Memvid::create("agent_memory.mv2")?;

    // Ingest documents with metadata and tags
    let opts = PutOptions::builder()
        .title("Customer Ticket #4821")
        .uri("mv2://tickets/4821")
        .tag("priority", "high")
        .tag("customer", "acme-corp")
        .build();

    mem.put_bytes_with_options(
        b"Customer reports login timeout after MFA upgrade...",
        opts,
    )?;
    mem.commit()?;

    // Hybrid search (BM25 + vector similarity)
    let results = mem.search(SearchRequest {
        query: "MFA authentication timeout".into(),
        top_k: 5,
        snippet_chars: 200,
        ..Default::default()
    })?;

    for hit in results.hits {
        println!("[{:.2}] {}", hit.score, hit.text);
    }
    Ok(())
}
Python SDK
pip install memvid-sdk
from memvid import Memvid

# Create or open a memory file
mem = Memvid.create("research_notes.mv2")

# Add content
mem.put(
    "Transformer attention mechanism scales quadratically...",
    title="Attention Research",
    tags={"topic": "transformers", "source": "arxiv"},
)
mem.commit()

# Search across all indexed content
results = mem.find("attention scaling solutions", top_k=5)
for hit in results:
    print(f"[{hit.score:.2f}] {hit.text[:100]}...")
Node.js SDK
import { Memvid } from '@memvid/sdk';

const mem = await Memvid.create('chatbot_memory.mv2');

await mem.put('User prefers dark mode and concise answers', {
  title: 'User Preferences',
  tags: { userId: 'usr_42', type: 'preference' }
});
await mem.commit();

const results = await mem.find('user display preferences');
console.log(results.hits);
Feature flags: choosing what you need
Memvid uses Rust feature flags to keep the binary lean. You enable only the capabilities your use case requires:
| Feature | What it adds | When you need it |
|---|---|---|
| `lex` | Full-text BM25 search (Tantivy) | Keyword search, exact-match queries |
| `vec` | HNSW vector similarity (ONNX local embeddings) | Semantic search without cloud APIs |
| `api_embed` | Cloud embeddings (OpenAI) | Higher-quality embeddings via API |
| `clip` | CLIP visual embeddings | Image search, computer vision use cases |
| `whisper` | Audio transcription (Whisper) | Ingest meetings, podcasts, voice notes |
| `pdf_extract` | Pure Rust PDF extraction | Ingest PDF documents without Python |
| `temporal_track` | Natural language date parsing | Queries like “what happened last Tuesday” |
| `encryption` | Password-based encryption (.mv2e) | Sensitive data, GDPR compliance |
| `symspell_cleanup` | PDF text repair | Fixes OCR artifacts like “emp lo yee” → “employee” |
For a typical AI agent that needs both keyword and semantic search with offline embeddings, your config would be: features = ["lex", "vec", "temporal_track"]. That gives you hybrid search with sub-millisecond retrieval and natural language date parsing — without any cloud dependency.
When should you actually use Memvid?
Based on the architecture’s strengths and limitations, Memvid fits best in these scenarios:
Single-user AI agents. A personal assistant that runs locally on a laptop. The user doesn’t want conversation history sent to a cloud database, and you don’t want to ask them to install PostgreSQL. One .mv2 file in the app directory solves it.
Offline-first and edge deployments. Field service agents, medical devices, or industrial IoT where internet connectivity is unreliable or nonexistent. The entire memory file travels with the device.
Developer tooling. Memvid’s claude-brain project gives MCP-compatible coding agents persistent memory in one file you can git commit alongside your codebase.
Prototyping and hackathons. Zero setup means you go from idea to working memory-augmented agent in minutes, not hours.
Auditable AI workflows. The immutable Smart Frame timeline provides a natural audit trail for compliance-sensitive applications — relevant as the EU AI Act’s transparency requirements take effect.
Where Memvid is the wrong choice: multi-tenant SaaS with concurrent writers, applications requiring fine-grained deletion (GDPR right-to-erasure at individual record level — though encryption capsules offer a workaround), and systems already scaling to billions of vectors on managed infrastructure.
Memvid in the wider AI memory landscape
The AI agent memory space in 2026 is fragmented, with each solution optimizing for different constraints. Understanding where Memvid sits helps you make the right architectural decision:
Mem0 combines vector search with an optional knowledge graph. It’s the most enterprise-ready option with SOC 2 compliance, managed cloud, and graph-based entity relationships. Best for: teams needing a hosted, multi-tenant memory service.
Zep focuses on temporal reasoning — tracking how facts change over time. Integrates structured business data with conversational history. Best for: enterprise scenarios requiring relationship modeling.
Hindsight separates evidence from inference using an “Opinion Network” where beliefs have confidence scores. Open-source, ships with MCP server. Best for: agents that need to distinguish what they know from what they infer.
Memvid eliminates infrastructure entirely. One file, zero servers, offline-capable. Best for: portable agents, edge deployments, developer tooling, and anyone allergic to managing database infrastructure.
The correct framing for this space isn’t “memory as storage” but “memory as a cognitive substrate.” The systems that win long-term will be the ones treating memory as a first-class cognitive architecture — combining factual recall, temporal reasoning, and belief management — not just a key-value store for past conversations. Whether Memvid, Hindsight, Memobase, or something not yet shipped becomes the defining standard — that’s the question worth watching.
FAQ
Does Memvid work without an internet connection?
Yes. With the vec feature flag enabled, Memvid runs local ONNX-based text embeddings (BGE-small by default). All search — both lexical (BM25) and semantic (HNSW) — happens entirely on-device. The only feature requiring internet is api_embed, which uses OpenAI’s embedding API.
Bibliography
- Memvid GitHub Repository — Official source code, documentation, and MV2 format specification (Apache 2.0)
- Memvid Official Documentation — SDK guides, API reference, and deployment patterns
- memvidbench — Open-source LoCoMo benchmark evaluation suite
- LoCoMo: Evaluating Very Long-Term Conversational Memory of LLM Agents (arXiv:2312.17487) — Snap Research benchmark paper
- Mem0 Research: AI Memory Benchmark Results — Mem0 LoCoMo evaluation and methodology
- EU Artificial Intelligence Act — Full text and compliance guidance (European Commission)