The ollama npm package is the official JavaScript/TypeScript client for integrating with Ollama servers, enabling developers to run large language models locally without cloud dependencies. It provides a straightforward API for chat completions, text generation, and model lifecycle management (pulling, creating, and pushing models). With over 569,000 weekly downloads, it has become the de facto standard for JavaScript developers building AI-powered applications that prioritize data privacy and offline capability.
The library mirrors Ollama's REST API with idiomatic JavaScript patterns, offering async/await support and streaming capabilities through AsyncIterables. It handles everything from simple text completions to advanced features like multi-modal inputs (vision models), structured JSON output with schema validation, and custom model creation via Modelfiles. The package works seamlessly in both Node.js and browser environments when properly configured.
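The structured-output feature mentioned above works by passing a JSON schema as the chat request's `format` field, which the server uses to constrain generation. A minimal sketch of the request shape, assuming an illustrative schema and prompt (the actual call still goes through `ollama.chat` against a running server):

```typescript
// Sketch of a structured-output request. The `format` field accepts a JSON
// schema object; the schema and prompt below are illustrative examples,
// not part of the library itself.
const citySchema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    population: { type: 'number' }
  },
  required: ['name', 'population']
};

function buildStructuredRequest(prompt: string) {
  return {
    model: 'llama3.1',
    messages: [{ role: 'user', content: prompt }],
    format: citySchema
  };
}

// Usage (requires a running Ollama server):
//   const res = await ollama.chat(buildStructuredRequest('Describe Tokyo as JSON'));
//   const data = JSON.parse(res.message.content);
```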
Developers choose ollama when they need full control over their AI infrastructure—models run on local hardware, no data leaves the system, and there are no per-token API costs. It's particularly valuable for enterprise applications with strict compliance requirements, edge computing scenarios, or development workflows where internet connectivity is unreliable. The library maintains feature parity with Ollama's server capabilities, automatically exposing new functionality like tool-calling and advanced quantization as the ecosystem evolves.
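The tool-calling noted above uses OpenAI-style function schemas: tools are declared on the chat request, and the model replies with `tool_calls` entries your code dispatches on. A hedged sketch of the request shape, where the `getWeather` tool is a made-up example, not part of the library:

```typescript
// Illustrative tool declaration in the OpenAI-style shape that ollama.chat
// accepts via its `tools` option. The getWeather tool is hypothetical.
const tools = [{
  type: 'function',
  function: {
    name: 'getWeather',
    description: 'Get the current weather for a city',
    parameters: {
      type: 'object',
      properties: { city: { type: 'string' } },
      required: ['city']
    }
  }
}];

function buildToolRequest(userText: string) {
  return {
    model: 'llama3.1',
    messages: [{ role: 'user', content: userText }],
    tools
  };
}

// Usage (requires a tool-capable model and a running server):
//   const res = await ollama.chat(buildToolRequest('Weather in Oslo?'));
//   for (const call of res.message.tool_calls ?? []) { /* dispatch on call.function.name */ }
```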
import { Ollama } from 'ollama';
import { readFileSync } from 'node:fs';

// Initialize client (defaults to http://127.0.0.1:11434)
const ollama = new Ollama({ host: 'http://localhost:11434' });

// Non-streaming chat completion
async function simpleChat() {
  const response = await ollama.chat({
    model: 'llama3.1',
    messages: [
      { role: 'system', content: 'You are a helpful coding assistant.' },
      { role: 'user', content: 'Explain JavaScript promises in one sentence.' }
    ]
  });
  console.log(response.message.content);
}

// Streaming response with real-time output
async function streamingChat() {
  const stream = await ollama.chat({
    model: 'llama3.1',
    messages: [{ role: 'user', content: 'Write a haiku about TypeScript' }],
    stream: true
  });
  for await (const chunk of stream) {
    process.stdout.write(chunk.message.content);
  }
  console.log('\n');
}

// Vision model with image input
async function analyzeImage() {
  const base64Image = readFileSync('./screenshot.png').toString('base64');
  const response = await ollama.chat({
    model: 'llava',
    messages: [{
      role: 'user',
      content: 'Describe this image in detail',
      images: [base64Image]
    }]
  });
  console.log(response.message.content);
}

// Pull model with progress tracking
async function downloadModel() {
  const stream = await ollama.pull({ model: 'mistral', stream: true });
  for await (const progress of stream) {
    if (progress.status) {
      console.log(`${progress.status}: ${progress.completed || 0}/${progress.total || 0}`);
    }
  }
}
streamingChat().catch(console.error);

Local chatbot interfaces: Build conversational UIs in Electron or web apps where user data never leaves the device. Stream responses token-by-token for real-time feedback, using models like Llama 3.1 or Mistral running entirely on user hardware.
Document analysis pipelines: Process sensitive documents (legal, medical, financial) through local LLMs for summarization, extraction, or classification without third-party API calls. Combine with RAG patterns to query internal knowledge bases while maintaining HIPAA/GDPR compliance.
Development tooling and IDE extensions: Create VS Code extensions or CLI tools that provide code completion, documentation generation, or test case creation using local models. Developers get AI assistance without sending proprietary code to external services.
Multi-modal content processing: Analyze images with vision models (LLaVA) for tasks like automated alt-text generation, product catalog classification, or visual QA systems. Pass base64-encoded images directly to chat methods alongside text prompts.
Prototyping and experimentation: Quickly test different LLM architectures and fine-tuned models by pulling from Ollama's registry. Create custom models with specific system prompts or parameter tweaks via Modelfiles, ideal for research or proof-of-concept work before cloud deployment.
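The Modelfile workflow in the last use case amounts to composing a Modelfile string and handing it to the client's create API. The `FROM` / `SYSTEM` / `PARAMETER` directives below are standard Modelfile syntax, but note that `create()`'s exact parameters have varied across client versions (older releases accepted a `modelfile` string, newer ones take `from`/`system` fields directly), so treat this as a sketch under those assumptions; the model name and prompt are illustrative:

```typescript
// Build a Modelfile that pins a base model, a system prompt, and a sampling
// parameter. FROM / SYSTEM / PARAMETER are standard Modelfile directives.
function buildModelfile(base: string, system: string, temperature: number): string {
  return [
    `FROM ${base}`,
    `SYSTEM """${system}"""`,
    `PARAMETER temperature ${temperature}`
  ].join('\n');
}

const modelfile = buildModelfile('llama3.1', 'You are a terse code reviewer.', 0.2);

// Usage (assumes an older client that accepts a modelfile string; newer
// versions express the same thing as separate { model, from, system } fields):
//   await ollama.create({ model: 'code-reviewer', modelfile });
```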
npm install ollama
pnpm add ollama
bun add ollama