How to Install Ollama on Linux

Ollama is a powerful tool for running large language models (LLMs) locally on your machine. It makes it easy to download, run, and manage models like Llama, Gemma, Mistral, and many others without requiring cloud services or API keys. This guide will walk you through installing Ollama on various Linux distributions.

System Requirements

Before installing Ollama, ensure your system meets these requirements:

  • Operating System: Any modern Linux distribution (64-bit)
  • RAM:
    • Minimum 8 GB for 7B parameter models
    • 16 GB for 13B parameter models
    • 32 GB for 33B parameter models
    • 64+ GB for larger models (70B+)
  • Storage: At least 10 GB free space (models range from 1GB to 400GB+)
  • CPU: x86_64 architecture (AMD64/Intel 64-bit)
  • GPU (Optional): NVIDIA GPU with CUDA support for faster inference
  • Internet connection: To download models and updates
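
Most of these can be checked from a terminal before you install. A quick sketch (the nvidia-smi line assumes the NVIDIA driver is already installed and can be skipped otherwise):

# CPU architecture and AVX2 support (AVX2 speeds up CPU inference)
lscpu | grep -E 'Architecture|avx2'

# Available RAM and free disk space
free -h
df -h /

# Optional: NVIDIA GPU and driver version
nvidia-smi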

Method 1: Quick Install Script (Recommended)

The easiest way to install Ollama on Linux is using the official installation script:

Step 1: Download and Run Install Script

curl -fsSL https://ollama.com/install.sh | sh

What this script does:

  • Downloads the latest Ollama binary
  • Installs it to /usr/local/bin/ollama
  • Creates a systemd service for automatic startup
  • Sets up proper permissions and user accounts
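
If you would rather review the script before executing it, download it to a file first instead of piping it straight into sh:

# Fetch the installer, read it, then run it
curl -fsSL https://ollama.com/install.sh -o install-ollama.sh
less install-ollama.sh
sh install-ollama.sh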

Step 2: Verify Installation

Check that Ollama is installed correctly:

ollama --version

Step 3: Start Ollama Service

Start and enable the Ollama service:

sudo systemctl start ollama
sudo systemctl enable ollama

Step 4: Test Installation

Download and run a small model to test:

ollama run gemma3:1b

This will download the 1B parameter Gemma 3 model (815MB) and start a chat interface.
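
Type /bye (or press Ctrl+D) to leave the chat. You can also pass a prompt directly on the command line for a quick one-off test:

# One-shot prompt without entering the interactive chat
ollama run gemma3:1b "Summarize what Ollama does in one sentence."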

Method 2: Manual Installation

If you prefer manual installation or the script doesn’t work for your system:

Step 1: Download Ollama Binary

Visit the Ollama releases page and download the latest Linux binary, or use wget:

# Download latest release (replace with actual version)
wget https://github.com/ollama/ollama/releases/download/v0.1.32/ollama-linux-amd64 -O ollama

# Make executable
chmod +x ollama

# Move to system path
sudo mv ollama /usr/local/bin/

Step 2: Create Ollama User

Create a dedicated user for running Ollama:

sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama

Step 3: Create Systemd Service

Create a systemd service file:

sudo tee /etc/systemd/system/ollama.service << 'EOF'
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"

[Install]
WantedBy=default.target
EOF

Step 4: Start and Enable Service

sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
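
To confirm the manually installed service is up and answering requests, query the local API (this assumes the default port 11434):

# Service status and API check
sudo systemctl status ollama
curl http://localhost:11434/api/version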

Method 3: Docker Installation

Run Ollama in a Docker container for isolated deployment:

Step 1: Pull Ollama Docker Image

docker pull ollama/ollama

Step 2: Run Ollama Container

Basic run:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

With GPU support (NVIDIA):

docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
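
GPU passthrough with --gpus=all requires the NVIDIA Container Toolkit on the host. A minimal sketch, assuming the toolkit package is already installed from NVIDIA's repository:

# Register the NVIDIA runtime with Docker and restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker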

Step 3: Execute Commands in Container

# Access container shell
docker exec -it ollama bash

# Run a model directly
docker exec -it ollama ollama run gemma3:1b
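
Because the container publishes port 11434, you can also talk to Ollama from the host without entering the container, for example to list the models stored in the ollama volume:

# Query the container's API from the host
curl http://localhost:11434/api/tags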

Method 4: Package Manager Installation

Arch Linux (AUR)

# Using yay
yay -S ollama

# Or manually
git clone https://aur.archlinux.org/ollama.git
cd ollama
makepkg -si

Fedora/RHEL/CentOS

# No official repository is available yet
# Use the install script (Method 1) or manual installation (Method 2)

Ubuntu/Debian

# Ollama does not publish an official .deb package
# On Ubuntu/Debian, use the install script (Method 1) or the manual binary install (Method 2)
curl -fsSL https://ollama.com/install.sh | sh

Getting Started with Ollama

Running Your First Model

Once installed, try running a model:

# Run Gemma 3 (4B parameters)
ollama run gemma3

# Run Llama 3.2 (3B parameters)
ollama run llama3.2

# Run a specific size variant
ollama run gemma3:1b  # 1B parameter version

Popular Models to Try

Small models (good for testing):

  • gemma3:1b – 815MB, fast and lightweight
  • llama3.2:1b – 1.3GB, excellent for basic tasks
  • phi4-mini – 2.5GB, Microsoft’s efficient model

Medium models (balanced performance):

  • gemma3 – 3.3GB, Google’s latest model
  • llama3.2 – 2.0GB, Meta’s efficient model
  • mistral – 4.1GB, great general-purpose model

Large models (high performance):

  • llama3.1 – 4.7GB, excellent reasoning
  • phi4 – 9.1GB, Microsoft’s advanced model
  • llama3.3 – 43GB, very capable large model

Basic Commands

# List available models
ollama list

# Show running models
ollama ps

# Pull a model without running
ollama pull llama3.2

# Remove a model
ollama rm llama3.2

# Copy a model
ollama cp llama3.2 my-custom-model

# Show model information
ollama show llama3.2

# Stop a running model
ollama stop llama3.2

Configuration and Customization

Environment Variables

Set environment variables to customize Ollama:

# Set custom model storage location
export OLLAMA_MODELS=/path/to/models

# Set custom host and port
export OLLAMA_HOST=0.0.0.0:11434

# GPU acceleration is used automatically when a supported GPU and driver are detected;
# to limit which NVIDIA GPUs are used, set CUDA_VISIBLE_DEVICES
export CUDA_VISIBLE_DEVICES=0
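
Note that export only affects the current shell. When Ollama runs as the systemd service installed above, set variables in a service override instead; a sketch using the model storage path as an example:

# Open a drop-in override for the service
sudo systemctl edit ollama

# Add these lines in the editor, then save:
#   [Service]
#   Environment="OLLAMA_MODELS=/path/to/models"

# Apply the change
sudo systemctl daemon-reload
sudo systemctl restart ollama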

Custom Models with Modelfile

Create custom model configurations:

# Create a Modelfile
cat > Modelfile << 'EOF'
FROM llama3.2

# Set parameters
PARAMETER temperature 0.8
PARAMETER top_p 0.9

# Set system prompt
SYSTEM """
You are a helpful coding assistant. Always provide clear, 
well-commented code examples and explain your reasoning.
"""
EOF

# Create custom model
ollama create coding-assistant -f ./Modelfile

# Run your custom model
ollama run coding-assistant
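
Parameters can also be adjusted on the fly inside an interactive session, without editing the Modelfile, using the CLI's /set and /show commands:

ollama run coding-assistant
>>> /set parameter temperature 0.5
>>> /show parameters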

GPU Configuration

For NVIDIA GPUs:

# Install CUDA toolkit
sudo apt install nvidia-cuda-toolkit  # Ubuntu/Debian
sudo dnf install cuda-toolkit         # Fedora

# Verify GPU detection
ollama run llama3.2
# Check logs: sudo journalctl -u ollama -f

For AMD GPUs:

# Install ROCm (AMD's GPU computing platform)
# Follow AMD's ROCm installation guide for your distribution

Advanced Usage

REST API

Ollama provides a REST API on port 11434:

# Generate text
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'

# Chat interface
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "What is machine learning?"}
  ]
}'
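
Both endpoints stream newline-delimited JSON by default; setting "stream": false (as in the first example) returns a single JSON object instead. A sketch of reading a streamed response from the shell, assuming jq is installed:

# Print the streamed tokens as they arrive
curl -s http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Write a haiku about Linux"
}' | jq -rj '.response'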

Integration with Programming Languages

Python:

pip install ollama
import ollama

response = ollama.chat(model='llama3.2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])
print(response['message']['content'])

JavaScript/Node.js:

npm install ollama
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)

Multimodal Models

For models that support images:

# Run vision model
ollama run llava

# In chat, reference an image
>>> What's in this image? /path/to/image.jpg
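
Images can also be sent through the REST API as base64-encoded strings in the images field. A minimal sketch, assuming a local file named photo.jpg:

# Encode the image inline and ask llava to describe it
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Describe this image\",
  \"stream\": false,
  \"images\": [\"$(base64 -w0 photo.jpg)\"]
}"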

Performance Optimization

Memory Management

# Limit concurrent models
export OLLAMA_MAX_LOADED_MODELS=1

# Set memory limit
export OLLAMA_MAX_VRAM=8GB

CPU Optimization

# Thread count is set per model via the num_thread parameter
# rather than an environment variable, e.g. in a Modelfile:
#   PARAMETER num_thread 8
# or inside a chat session:
#   >>> /set parameter num_thread 8

# CPU instruction-set optimizations (AVX, AVX2) are detected automatically

Troubleshooting

Common Issues

1. "Connection refused" errors:

# Check if service is running
sudo systemctl status ollama

# Start if not running
sudo systemctl start ollama

# Check logs
sudo journalctl -u ollama -f

2. Out of memory errors:

# Try smaller models
ollama run gemma3:1b

# Check available RAM
free -h

# Monitor memory usage
htop

3. GPU not detected:

# Check NVIDIA drivers
nvidia-smi

# Verify CUDA installation
nvcc --version

# Check Ollama GPU support
ollama run llama3.2
# Look for GPU initialization in logs

4. Models not downloading:

# Check internet connection
ping ollama.com

# Try manual model pull
ollama pull llama3.2

# Check disk space
df -h

5. Permission errors:

# Fix ownership of Ollama directory
sudo chown -R ollama:ollama /usr/share/ollama

# Check service user
sudo systemctl show ollama | grep User

Performance Issues

Slow model loading:

  • Use SSD storage for model files
  • Increase available RAM
  • Close unnecessary applications

Slow inference:

  • Enable GPU acceleration if available
  • Try smaller model variants
  • Adjust the num_thread model parameter (see CPU Optimization above)
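
To see whether a loaded model actually ended up on the GPU, ollama ps reports the split in its PROCESSOR column:

# Shows e.g. "100% GPU" or "48%/52% CPU/GPU" per loaded model
ollama ps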

Security Considerations

Network Security

Restrict which network interfaces Ollama listens on:

# Bind to localhost only (default)
export OLLAMA_HOST=127.0.0.1:11434

# Or specific interface
export OLLAMA_HOST=192.168.1.100:11434

Firewall configuration:

# Allow loopback access first, then block external access to the Ollama port
# (ufw evaluates rules in order, so the allow must come before the deny)
sudo ufw allow from 127.0.0.1 to any port 11434
sudo ufw deny 11434

Data Privacy

  • Models run entirely locally – no data sent to external servers
  • Model files stored in /usr/share/ollama/.ollama/models
  • Chat history not persisted by default
  • Consider encrypting model storage directory for sensitive use cases

Updating Ollama

Automatic Updates

# Re-run install script
curl -fsSL https://ollama.com/install.sh | sh

# Restart service
sudo systemctl restart ollama

Manual Updates

# Download new version
wget https://github.com/ollama/ollama/releases/download/v0.1.33/ollama-linux-amd64 -O ollama

# Replace binary
sudo systemctl stop ollama
sudo mv ollama /usr/local/bin/
sudo chmod +x /usr/local/bin/ollama
sudo systemctl start ollama

Update Models

# Update specific model
ollama pull llama3.2

# Update all models
ollama list | grep -v NAME | awk '{print $1}' | xargs -I {} ollama pull {}

Uninstalling Ollama

Remove Service and Binary

# Stop and disable service
sudo systemctl stop ollama
sudo systemctl disable ollama

# Remove service file
sudo rm /etc/systemd/system/ollama.service
sudo systemctl daemon-reload

# Remove binary
sudo rm /usr/local/bin/ollama

# Remove user account
sudo userdel ollama

Remove Models and Data

Warning: This will delete all downloaded models and configurations.

# Remove model data
sudo rm -rf /usr/share/ollama

# Remove user data (if running as regular user)
rm -rf ~/.ollama

Docker Cleanup

# Stop and remove container
docker stop ollama
docker rm ollama

# Remove image
docker rmi ollama/ollama

# Remove volume
docker volume rm ollama

Alternative AI Tools

While Ollama is excellent for local LLM deployment, consider these alternatives:

  • LM Studio: GUI-based model runner with a drag-and-drop interface
  • GPT4All: Cross-platform local AI assistant
  • text-generation-webui: Web interface for running various models
  • Llamafile: Single-file executables for LLMs
  • vLLM: High-throughput LLM serving engine

Next Steps

Now that Ollama is installed on your Linux system, you can:

  • Experiment with different models to find ones that suit your needs
  • Build applications using the REST API or language libraries
  • Create custom models with Modelfiles for specific use cases
  • Set up automated scripts for batch processing tasks
  • Integrate with development workflows for code assistance
  • Explore multimodal capabilities with vision-language models

Community and Resources


Having trouble with your Ollama installation on Linux? Leave a comment and we’ll help you troubleshoot!
