Ollama is a powerful tool for running large language models (LLMs) locally on your machine. It makes it easy to download, run, and manage models like Llama, Gemma, Mistral, and many others without requiring cloud services or API keys. This guide will walk you through installing Ollama on various Linux distributions.
System Requirements
Before installing Ollama, ensure your system meets these requirements:
- Operating System: Any modern Linux distribution (64-bit)
- RAM:
  - Minimum 8 GB for 7B parameter models
  - 16 GB for 13B parameter models
  - 32 GB for 33B parameter models
  - 64+ GB for larger models (70B+)
- Storage: At least 10 GB free space (models range from 1GB to 400GB+)
- CPU: x86_64 architecture (AMD64/Intel 64-bit)
- GPU (Optional): NVIDIA GPU with CUDA support for faster inference
- Internet connection: To download models and updates
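You can confirm these requirements from the command line before continuing. The commands below are standard Linux utilities; nvidia-smi is only present when NVIDIA drivers are installed.
# Check CPU architecture (should print x86_64)
uname -m
# Check total and available RAM
free -h
# Check free disk space
df -h /
# Check for an NVIDIA GPU (optional)
nvidia-smi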
Method 1: Quick Install Script (Recommended)
The easiest way to install Ollama on Linux is using the official installation script:
Step 1: Download and Run Install Script
curl -fsSL https://ollama.com/install.sh | sh
What this script does:
- Downloads the latest Ollama binary
- Installs it to /usr/local/bin/ollama
- Creates a systemd service for automatic startup
- Sets up proper permissions and user accounts
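If you prefer to review the script before piping it into a shell, you can download and inspect it first; this is an optional variation of the same install:
# Download, review, then run the installer
curl -fsSL https://ollama.com/install.sh -o install.sh
less install.sh
sh install.sh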
Step 2: Verify Installation
Check that Ollama is installed correctly:
ollama --version
Step 3: Start Ollama Service
Start and enable the Ollama service:
sudo systemctl start ollama
sudo systemctl enable ollama
Step 4: Test Installation
Download and run a small model to test:
ollama run gemma3:1b
This will download the 1B parameter Gemma 3 model (815MB) and start a chat interface.
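To confirm the server is up without starting an interactive chat, you can also query the local API; the root endpoint simply reports that Ollama is running:
# The API listens on port 11434 by default
curl http://localhost:11434
# Expected output: Ollama is running
# List the models you have downloaded
ollama list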
Method 2: Manual Installation
If you prefer manual installation or the script doesn’t work for your system:
Step 1: Download Ollama Binary
Visit the Ollama releases page and download the latest Linux binary, or use wget:
# Download latest release (replace with actual version)
wget https://github.com/ollama/ollama/releases/download/v0.1.32/ollama-linux-amd64 -O ollama
# Make executable
chmod +x ollama
# Move to system path
sudo mv ollama /usr/local/bin/
Step 2: Create Ollama User
Create a dedicated user for running Ollama:
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
Step 3: Create Systemd Service
Create a systemd service file:
sudo tee /etc/systemd/system/ollama.service << 'EOF'
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
[Install]
WantedBy=default.target
EOF
Step 4: Start and Enable Service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
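After a manual install, it is worth checking that the service came up cleanly; a quick sketch:
# Confirm the binary and the service state
ollama --version
sudo systemctl status ollama
# Follow the service logs if anything looks wrong
sudo journalctl -u ollama -f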
Method 3: Docker Installation
Run Ollama in a Docker container for isolated deployment:
Step 1: Pull Ollama Docker Image
docker pull ollama/ollama
Step 2: Run Ollama Container
Basic run:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
With GPU support (NVIDIA):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
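The --gpus=all flag requires the NVIDIA Container Toolkit on the host. Assuming NVIDIA's package repository is already configured for your distribution, the setup looks roughly like this (Ubuntu/Debian shown):
# Install the NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
# Configure the Docker runtime and restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker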
Step 3: Execute Commands in Container
# Access container shell
docker exec -it ollama bash
# Run a model directly
docker exec -it ollama ollama run gemma3:1b
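Because the container publishes port 11434, you can also reach it from the host exactly as with a native install, for example:
# Verify the containerized server from the host
curl http://localhost:11434/api/version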
Method 4: Package Manager Installation
Arch Linux (AUR)
# Using yay
yay -S ollama
# Or manually
git clone https://aur.archlinux.org/ollama.git
cd ollama
makepkg -si
Fedora/RHEL/CentOS
# No official repository is available yet
# Use the install script (Method 1) or manual installation (Method 2)
Ubuntu/Debian
# Ollama does not currently publish official .deb packages
# On Ubuntu/Debian, use the install script (Method 1) or manual installation (Method 2)
Getting Started with Ollama
Running Your First Model
Once installed, try running a model:
# Run Gemma 3 (4B parameters)
ollama run gemma3
# Run Llama 3.2 (3B parameters)
ollama run llama3.2
# Run a specific size variant
ollama run gemma3:1b # 1B parameter version
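ollama run also accepts a prompt as an argument, which is handy for scripting and one-off questions; a small sketch (notes.txt is a placeholder file):
# One-shot prompt, no interactive session
ollama run llama3.2 "Give a one-line summary of the Linux boot process"
# Feed file contents into the prompt
ollama run llama3.2 "Summarize the following notes: $(cat notes.txt)"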
Popular Models to Try
Small models (good for testing):
- gemma3:1b – 815MB, fast and lightweight
- llama3.2:1b – 1.3GB, excellent for basic tasks
- phi4-mini – 2.5GB, Microsoft’s efficient model
Medium models (balanced performance):
- gemma3 – 3.3GB, Google’s latest model
- llama3.2 – 2.0GB, Meta’s efficient model
- mistral – 4.1GB, great general-purpose model
Large models (high performance):
- llama3.1 – 4.7GB, excellent reasoning
- phi4 – 9.1GB, Microsoft’s advanced model
- llama3.3 – 43GB, very capable large model
Basic Commands
# List available models
ollama list
# Show running models
ollama ps
# Pull a model without running
ollama pull llama3.2
# Remove a model
ollama rm llama3.2
# Copy a model
ollama cp llama3.2 my-custom-model
# Show model information
ollama show llama3.2
# Stop a running model
ollama stop llama3.2
Configuration and Customization
Environment Variables
Set environment variables to customize Ollama:
# Set custom model storage location
export OLLAMA_MODELS=/path/to/models
# Set custom host and port
export OLLAMA_HOST=0.0.0.0:11434
# GPU acceleration is used automatically when supported NVIDIA or AMD drivers are detected; no variable is required
# Keep models loaded in memory longer between requests
export OLLAMA_KEEP_ALIVE=10m
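Note that exporting variables in a shell only affects an ollama serve process started from that shell. When Ollama runs as a systemd service, set variables through a service override instead; a short sketch:
# Open an override file for the service
sudo systemctl edit ollama
# Add, for example:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
#   Environment="OLLAMA_MODELS=/path/to/models"
# Then reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ollama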
Custom Models with Modelfile
Create custom model configurations:
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM llama3.2
# Set parameters
PARAMETER temperature 0.8
PARAMETER top_p 0.9
# Set system prompt
SYSTEM """
You are a helpful coding assistant. Always provide clear,
well-commented code examples and explain your reasoning.
"""
EOF
# Create custom model
ollama create coding-assistant -f ./Modelfile
# Run your custom model
ollama run coding-assistant
GPU Configuration
For NVIDIA GPUs:
# Install CUDA toolkit
sudo apt install nvidia-cuda-toolkit # Ubuntu/Debian
sudo dnf install cuda-toolkit # Fedora
# Verify GPU detection
ollama run llama3.2
# Check logs: sudo journalctl -u ollama -f
For AMD GPUs:
# Install ROCm (AMD's GPU computing platform)
# Follow AMD's ROCm installation guide for your distribution
Advanced Usage
REST API
Ollama provides a REST API on port 11434:
# Generate text
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'
# Chat interface
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "What is machine learning?"}
  ]
}'
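Two other useful endpoints on the same local API:
# List installed models
curl http://localhost:11434/api/tags
# Check the server version
curl http://localhost:11434/api/version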
Integration with Programming Languages
Python:
pip install ollama
import ollama

response = ollama.chat(model='llama3.2', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
JavaScript/Node.js:
npm install ollama
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)
Multimodal Models
For models that support images:
# Run vision model
ollama run llava
# In chat, reference an image
>>> What's in this image? /path/to/image.jpg
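Images can also be sent through the REST API as base64-encoded strings in the images field; a minimal sketch, assuming /path/to/image.jpg exists:
# Base64-encode the image and send it to the generate endpoint
IMG_B64=$(base64 -w0 /path/to/image.jpg)
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Describe this image.\",
  \"images\": [\"$IMG_B64\"],
  \"stream\": false
}"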
Performance Optimization
Memory Management
# Limit concurrent models
export OLLAMA_MAX_LOADED_MODELS=1
# Set memory limit
export OLLAMA_MAX_VRAM=8GB
CPU Optimization
# Thread count is set per model with the num_thread parameter rather than an environment variable (see the sketch below)
# CPU features such as AVX and AVX2 are detected automatically at startup
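A minimal sketch of pinning the CPU thread count for a model via a Modelfile (cpu-tuned is a hypothetical model name; set num_thread to match your physical core count):
# Build a llama3.2 variant limited to 8 CPU threads
cat > Modelfile.cpu << 'EOF'
FROM llama3.2
PARAMETER num_thread 8
EOF
ollama create cpu-tuned -f ./Modelfile.cpu
ollama run cpu-tuned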
Troubleshooting
Common Issues
1. "Connection refused" errors:
# Check if service is running
sudo systemctl status ollama
# Start if not running
sudo systemctl start ollama
# Check logs
sudo journalctl -u ollama -f
2. Out of memory errors:
# Try smaller models
ollama run gemma3:1b
# Check available RAM
free -h
# Monitor memory usage
htop
3. GPU not detected:
# Check NVIDIA drivers
nvidia-smi
# Verify CUDA installation
nvcc --version
# Check Ollama GPU support
ollama run llama3.2
# Look for GPU initialization in logs
4. Models not downloading:
# Check internet connection
ping ollama.com
# Try manual model pull
ollama pull llama3.2
# Check disk space
df -h
5. Permission errors:
# Fix ownership of Ollama directory
sudo chown -R ollama:ollama /usr/share/ollama
# Check service user
sudo systemctl show ollama | grep User
Performance Issues
Slow model loading:
- Use SSD storage for model files
- Increase available RAM
- Close unnecessary applications
Slow inference:
- Enable GPU acceleration if available
- Try smaller model variants
- Adjust the thread count with the num_thread model parameter (see CPU Optimization above)
Security Considerations
Network Security
Limit access to local network:
# Bind to localhost only (default)
export OLLAMA_HOST=127.0.0.1:11434
# Or specific interface
export OLLAMA_HOST=192.168.1.100:11434
Firewall configuration:
# Allow local access only (ufw evaluates rules in order, so add the allow rule before the deny)
sudo ufw allow from 127.0.0.1 to any port 11434
sudo ufw deny 11434
Data Privacy
- Models run entirely locally – no data sent to external servers
- Model files stored in /usr/share/ollama/.ollama/models (or ~/.ollama/models when Ollama runs as your own user)
- Chat history not persisted by default
- Consider encrypting model storage directory for sensitive use cases
Updating Ollama
Update via the Install Script
# Re-run install script
curl -fsSL https://ollama.com/install.sh | sh
# Restart service
sudo systemctl restart ollama
Manual Updates
# Download new version
wget https://github.com/ollama/ollama/releases/download/v0.1.33/ollama-linux-amd64 -O ollama
# Replace binary
sudo systemctl stop ollama
sudo mv ollama /usr/local/bin/
sudo chmod +x /usr/local/bin/ollama
sudo systemctl start ollama
Update Models
# Update specific model
ollama pull llama3.2
# Update all models
ollama list | grep -v NAME | awk '{print $1}' | xargs -I {} ollama pull {}
Uninstalling Ollama
Remove Service and Binary
# Stop and disable service
sudo systemctl stop ollama
sudo systemctl disable ollama
# Remove service file
sudo rm /etc/systemd/system/ollama.service
sudo systemctl daemon-reload
# Remove binary
sudo rm /usr/local/bin/ollama
# Remove the ollama user and group created by the installer
sudo userdel ollama
sudo groupdel ollama
Remove Models and Data
Warning: This will delete all downloaded models and configurations.
# Remove model data
sudo rm -rf /usr/share/ollama
# Remove user data (if running as regular user)
rm -rf ~/.ollama
Docker Cleanup
# Stop and remove container
docker stop ollama
docker rm ollama
# Remove image
docker rmi ollama/ollama
# Remove volume
docker volume rm ollama
Alternative AI Tools
While Ollama is excellent for local LLM deployment, consider these alternatives:
- LM Studio: GUI-based model runner with a drag-and-drop interface
- GPT4All: Cross-platform local AI assistant
- text-generation-webui: Web interface for running various models
- Llamafile: Single-file executables for LLMs
- vLLM: High-throughput LLM serving engine
Next Steps
Now that Ollama is installed on your Linux system, you can:
- Experiment with different models to find ones that suit your needs
- Build applications using the REST API or language libraries
- Create custom models with Modelfiles for specific use cases
- Set up automated scripts for batch processing tasks
- Integrate with development workflows for code assistance
- Explore multimodal capabilities with vision-language models
Community and Resources
- Official Documentation: ollama.com
- GitHub Repository: github.com/ollama/ollama
- Discord Community: Join for support and discussions
- Model Library: ollama.com/library
- Reddit: r/LocalLLaMA for local AI discussions
Having trouble with your Ollama installation on Linux? Leave a comment and we’ll help you troubleshoot!