A JavaScript interface for Chroma
chromadb is the official JavaScript/TypeScript client for ChromaDB, an open-source vector database designed specifically for AI and large language model applications. It provides a straightforward API for storing, managing, and querying high-dimensional embeddings alongside their associated documents and metadata. With over 170,000 weekly downloads, it has become a go-to solution for developers building semantic search, retrieval-augmented generation (RAG), and recommendation systems.
The package supports multiple deployment modes: ephemeral in-memory instances for prototyping, persistent local storage backed by SQLite, client-server architectures for production workloads, and managed Chroma Cloud deployments. This flexibility lets you start development locally without infrastructure setup and scale to production without rewriting code. The client handles both pre-computed embeddings and automatic embedding generation, with built-in support for popular embedding models.
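Switching between these modes is mostly a matter of client construction. A rough sketch, with the caveat that constructor option names have changed across client versions (older releases accept a `path` URL, newer ones take host/port-style options), so treat the options below as illustrative:

```javascript
import { ChromaClient } from 'chromadb';

// Prototyping: default client, expects a local Chroma server
// (e.g. started with `chroma run`) on localhost:8000
const devClient = new ChromaClient();

// Production: same API, different endpoint -- application code that
// uses the client does not change. The `path` option shown here is
// from older client versions; newer releases use host/port options.
const prodClient = new ChromaClient({ path: 'https://chroma.example.internal' });
```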
Under the hood, ChromaDB uses HNSW indexing for fast similarity search, automatic tiering for hot/cold data management, and supports advanced querying with metadata filters, full-text search, and regex patterns. The JavaScript client offers full TypeScript support and works seamlessly in both Node.js and browser environments. Its Apache 2.0 license makes it suitable for commercial applications, and it integrates natively with LangChain and other LLM frameworks.
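The metadata and full-text filters mentioned above can be combined in a single query. A hedged sketch, assuming `collection` is an existing collection and using the `$contains` document operator (operator support varies by Chroma version):

```javascript
// Combine a metadata filter with a full-text document filter:
// only documents from 'docs' sources whose text contains 'HNSW'
// are candidates for the nearest-neighbor search.
const hits = await collection.query({
  queryTexts: ['vector database internals'],
  nResults: 5,
  where: { source: 'docs' },            // metadata filter
  whereDocument: { $contains: 'HNSW' }  // full-text filter
});
```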
The architecture is optimized for developer experience: collections act like tables; each item carries an ID, an embedding, an optional document, and metadata; and queries return nearest neighbors ranked by vector similarity. Whether you're building a chatbot with memory, a document search engine, or a content recommendation system, chromadb provides the vector storage layer without the complexity of managing distributed systems.
import { ChromaClient } from 'chromadb';

// Initialize client (connects to localhost:8000 by default)
const client = new ChromaClient();

// Create or get a collection for storing documents
const collection = await client.getOrCreateCollection({
  name: 'knowledge_base',
  metadata: { description: 'Company documentation' }
});

// Add documents with embeddings and metadata
await collection.add({
  ids: ['doc1', 'doc2', 'doc3'],
  documents: [
    'ChromaDB is a vector database for AI applications',
    'Use embeddings to enable semantic search capabilities',
    'RAG systems retrieve context before generating responses'
  ],
  metadatas: [
    { source: 'docs', category: 'database' },
    { source: 'docs', category: 'search' },
    { source: 'blog', category: 'ai' }
  ]
  // Embeddings auto-generated if not provided
});

// Query with metadata filtering
const results = await collection.query({
  queryTexts: ['How do I build a search engine?'],
  nResults: 2,
  where: { category: { $in: ['search', 'ai'] } }
});
console.log(results.documents[0]); // Most relevant docs
console.log(results.distances[0]); // Distances (lower = more similar)

// Update an existing document
await collection.update({
  ids: ['doc1'],
  documents: ['ChromaDB: open-source vector database for LLMs'],
  metadatas: [{ source: 'docs', category: 'database', updated: true }]
});

// Delete by ID or metadata filter
await collection.delete({
  where: { updated: true }
});

Retrieval-Augmented Generation (RAG): Store company documentation, knowledge bases, or customer data as embeddings. When a user asks a question, query the most relevant documents and feed them as context to an LLM like GPT-4 or Claude, reducing hallucinations and grounding responses in your data.
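The retrieval half of that loop is the query shown earlier; the glue that turns retrieved documents into a grounded LLM prompt is plain string assembly. A minimal sketch, where `buildPrompt` is a hypothetical helper (not part of chromadb) and the prompt format is illustrative:

```javascript
// Assemble retrieved documents into a grounded prompt for an LLM.
// `docs` has the shape of results.documents[0] from collection.query.
function buildPrompt(question, docs) {
  const context = docs
    .map((doc, i) => `[${i + 1}] ${doc}`)
    .join('\n');
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildPrompt('What is ChromaDB?', [
  'ChromaDB is a vector database for AI applications',
  'RAG systems retrieve context before generating responses'
]);
// `prompt` now carries both passages, ready to send to any chat API
```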
Semantic Search: Build search engines that understand meaning rather than just keywords. A user searching for "affordable transportation" would find results about "cheap bikes" or "budget cars" even without exact keyword matches, because embeddings capture semantic similarity.
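That behavior comes from comparing embedding vectors rather than keywords. Chroma does this for you at query time, but the core measure can be sketched in a few lines (the three-dimensional vectors here are toy stand-ins for real, much higher-dimensional embeddings):

```javascript
// Cosine similarity: 1 means identical direction, 0 means unrelated.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors: "affordable transportation" vs "cheap bikes" vs "astronomy"
const query = [0.9, 0.8, 0.1];
const bikes = [0.85, 0.75, 0.15];
const stars = [0.1, 0.05, 0.95];
console.log(cosineSimilarity(query, bikes) > cosineSimilarity(query, stars)); // true
```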
Content Recommendations: Store user behavior, article content, or product descriptions as vectors. Query for similar items to power "users who liked this also liked" features or personalized content feeds based on user preference embeddings.
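One common way to build the "user preference embedding" for that pattern is to average the embeddings of items the user has engaged with; the resulting vector can then be passed as a query embedding. A sketch of the pooling step, where `meanVector` is a hypothetical helper and the values are toy data:

```javascript
// Average a set of equal-length embedding vectors element-wise.
function meanVector(vectors) {
  const dim = vectors[0].length;
  const mean = new Array(dim).fill(0);
  for (const v of vectors) {
    for (let i = 0; i < dim; i++) mean[i] += v[i] / vectors.length;
  }
  return mean;
}

// Embeddings of three articles the user liked (toy values)
const liked = [
  [0.2, 0.8, 0.1],
  [0.4, 0.6, 0.0],
  [0.3, 0.7, 0.2]
];
const profile = meanVector(liked); // ≈ [0.3, 0.7, 0.1]
```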
Duplicate Detection and Clustering: Identify near-duplicate documents, images, or user-generated content by finding items with high similarity scores. Useful for deduplication pipelines, content moderation, or grouping similar support tickets.
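Given query results, flagging near-duplicates reduces to a threshold on the returned distances. A sketch assuming the `ids`/`distances` arrays mirror `results.ids[0]` and `results.distances[0]` from a query; the 0.1 cutoff is an arbitrary example and depends on the embedding model and distance metric:

```javascript
// Return IDs whose distance to the query item falls below a cutoff.
function nearDuplicates(ids, distances, threshold) {
  return ids.filter((_, i) => distances[i] < threshold);
}

const ids = ['ticket-12', 'ticket-98', 'ticket-41'];
const distances = [0.02, 0.08, 0.63]; // from a query on a new ticket
console.log(nearDuplicates(ids, distances, 0.1)); // ['ticket-12', 'ticket-98']
```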
Conversational Memory: Maintain context in chatbots by storing conversation history as embeddings. Retrieve relevant past interactions when a user returns or mentions a previous topic, enabling more coherent long-term conversations without sending entire chat logs to the LLM.
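The retrieval side is the same query pattern as above; the piece specific to chat memory is keeping the assembled context small. A sketch of a hypothetical trimming helper (not part of chromadb) that keeps recent turns verbatim and fills the remaining character budget with retrieved older turns:

```javascript
// Keep recent turns verbatim and fill the remaining budget with
// relevant older turns retrieved from the vector store.
function assembleMemory(recentTurns, retrievedTurns, budget) {
  const context = [...recentTurns];
  let used = context.join('\n').length;
  for (const turn of retrievedTurns) {
    if (used + turn.length + 1 > budget) break;
    context.unshift(turn); // older context goes first
    used += turn.length + 1;
  }
  return context.join('\n');
}

const recent = ['User: what about pricing?'];
const retrieved = ['User: I use the Pro plan', 'User: my team has 12 seats'];
const memory = assembleMemory(recent, retrieved, 200);
// `memory` starts with retrieved history and ends with the latest turn
```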
npm install chromadb
pnpm add chromadb
bun add chromadb