Ollama is a powerful tool for running large language models (LLMs) locally on your machine. It makes it easy to download, run, and manage models like Llama, Gemma, Mistral, and many others without requiring cloud services or API keys. This guide will walk you through installing Ollama on various Linux distributions.
System Requirements
Before installing Ollama, ensure your system meets these requirements:
- Operating System: Any modern Linux distribution (64-bit)
- RAM:
  - Minimum 8 GB for 7B parameter models
  - 16 GB for 13B parameter models
  - 32 GB for 33B parameter models
  - 64+ GB for larger models (70B+)
- Storage: At least 10 GB free space (models range from 1GB to 400GB+)
- CPU: x86_64 architecture (AMD64/Intel 64-bit)
- GPU (Optional): NVIDIA GPU with CUDA support for faster inference
- Internet connection: To download models and updates
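You can confirm these requirements from the command line before continuing. The commands below are standard Linux utilities; nvidia-smi is only present when NVIDIA drivers are installed.
# Check CPU architecture (should print x86_64)
uname -m
# Check total and available RAM
free -h
# Check free disk space
df -h /
# Check for an NVIDIA GPU (optional)
nvidia-smi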
Method 1: Quick Install Script (Recommended)
The easiest way to install Ollama on Linux is using the official installation script:
Step 1: Download and Run Install Script
curl -fsSL https://ollama.com/install.sh | sh
What this script does:
- Downloads the latest Ollama binary
- Installs it to /usr/local/bin/ollama
- Creates a systemd service for automatic startup
- Sets up proper permissions and user accounts
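If you prefer to review the script before piping it into a shell, you can download and inspect it first; this is an optional variation of the same install:
# Download, review, then run the installer
curl -fsSL https://ollama.com/install.sh -o install.sh
less install.sh
sh install.sh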
Step 2: Verify Installation
Check that Ollama is installed correctly:
ollama --version
Step 3: Start Ollama Service
Start and enable the Ollama service:
sudo systemctl start ollama
sudo systemctl enable ollama
Step 4: Test Installation
Download and run a small model to test:
ollama run gemma3:1b
This will download the 1B parameter Gemma 3 model (815MB) and start a chat interface.
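To confirm the server is up without starting an interactive chat, you can also query the local API; the root endpoint simply reports that Ollama is running:
# The API listens on port 11434 by default
curl http://localhost:11434
# Expected output: Ollama is running
# List the models you have downloaded
ollama list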
Method 2: Manual Installation
If you prefer manual installation or the script doesn’t work for your system:
Step 1: Download Ollama Binary
Visit the Ollama releases page and download the latest Linux binary, or use wget:
# Download latest release (replace with actual version)
wget https://github.com/ollama/ollama/releases/download/v0.1.32/ollama-linux-amd64 -O ollama
# Make executable
chmod +x ollama
# Move to system path
sudo mv ollama /usr/local/bin/
Step 2: Create Ollama User
Create a dedicated user for running Ollama:
sudo useradd -r -s /bin/false -m -d /usr/share/ollama ollama
Step 3: Create Systemd Service
Create a systemd service file:
sudo tee /etc/systemd/system/ollama.service << 'EOF'
[Unit]
Description=Ollama Service
After=network-online.target
[Service]
ExecStart=/usr/local/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin"
[Install]
WantedBy=default.target
EOF
Step 4: Start and Enable Service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
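After a manual install, it is worth checking that the service came up cleanly; a quick sketch:
# Confirm the binary and the service state
ollama --version
sudo systemctl status ollama
# Follow the service logs if anything looks wrong
sudo journalctl -u ollama -f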
Method 3: Docker Installation
Run Ollama in a Docker container for isolated deployment:
Step 1: Pull Ollama Docker Image
docker pull ollama/ollama
Step 2: Run Ollama Container
Basic run:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
With GPU support (NVIDIA):
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
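The --gpus=all flag requires the NVIDIA Container Toolkit on the host. Assuming NVIDIA's package repository is already configured for your distribution, the setup looks roughly like this (Ubuntu/Debian shown):
# Install the NVIDIA Container Toolkit
sudo apt-get install -y nvidia-container-toolkit
# Configure the Docker runtime and restart Docker
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker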
Step 3: Execute Commands in Container
# Access container shell
docker exec -it ollama bash
# Run a model directly
docker exec -it ollama ollama run gemma3:1b
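Because the container publishes port 11434, you can also reach it from the host exactly as with a native install, for example:
# Verify the containerized server from the host
curl http://localhost:11434/api/version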
Method 4: Package Manager Installation
Arch Linux (AUR)
# Using yay
yay -S ollama
# Or manually
git clone https://aur.archlinux.org/ollama.git
cd ollama
makepkg -si
Fedora/RHEL/CentOS
# No official repository is available yet
# Use the install script (Method 1) or manual installation (Method 2)
Ubuntu/Debian
# Ollama does not currently publish official .deb packages
# On Ubuntu/Debian, use the install script (Method 1) or manual installation (Method 2)
Getting Started with Ollama
Running Your First Model
Once installed, try running a model:
# Run Gemma 3 (4B parameters)
ollama run gemma3
# Run Llama 3.2 (3B parameters)
ollama run llama3.2
# Run a specific size variant
ollama run gemma3:1b # 1B parameter version
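ollama run also accepts a prompt as an argument, which is handy for scripting and one-off questions; a small sketch (notes.txt is a placeholder file):
# One-shot prompt, no interactive session
ollama run llama3.2 "Give a one-line summary of the Linux boot process"
# Feed file contents into the prompt
ollama run llama3.2 "Summarize the following notes: $(cat notes.txt)"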
Popular Models to Try
Small models (good for testing):
- gemma3:1b – 815MB, fast and lightweight
- llama3.2:1b – 1.3GB, excellent for basic tasks
- phi4-mini – 2.5GB, Microsoft’s efficient model
Medium models (balanced performance):
- gemma3 – 3.3GB, Google’s latest model
- llama3.2 – 2.0GB, Meta’s efficient model
- mistral – 4.1GB, great general-purpose model
Large models (high performance):
- llama3.1 – 4.7GB, excellent reasoning
- phi4 – 9.1GB, Microsoft’s advanced model
- llama3.3 – 43GB, very capable large model
Basic Commands
# List available models
ollama list
# Show running models
ollama ps
# Pull a model without running
ollama pull llama3.2
# Remove a model
ollama rm llama3.2
# Copy a model
ollama cp llama3.2 my-custom-model
# Show model information
ollama show llama3.2
# Stop a running model
ollama stop llama3.2
Configuration and Customization
Environment Variables
Set environment variables to customize Ollama:
# Set custom model storage location
export OLLAMA_MODELS=/path/to/models
# Set custom host and port
export OLLAMA_HOST=0.0.0.0:11434
# GPU acceleration is used automatically when supported NVIDIA or AMD drivers are detected; no variable is required
# Keep models loaded in memory longer between requests
export OLLAMA_KEEP_ALIVE=10m
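Note that exporting variables in a shell only affects an ollama serve process started from that shell. When Ollama runs as a systemd service, set variables through a service override instead; a short sketch:
# Open an override file for the service
sudo systemctl edit ollama
# Add, for example:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
#   Environment="OLLAMA_MODELS=/path/to/models"
# Then reload and restart
sudo systemctl daemon-reload
sudo systemctl restart ollama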
Custom Models with Modelfile
Create custom model configurations:
# Create a Modelfile
cat > Modelfile << 'EOF'
FROM llama3.2
# Set parameters
PARAMETER temperature 0.8
PARAMETER top_p 0.9
# Set system prompt
SYSTEM """
You are a helpful coding assistant. Always provide clear,
well-commented code examples and explain your reasoning.
"""
EOF
# Create custom model
ollama create coding-assistant -f ./Modelfile
# Run your custom model
ollama run coding-assistant
GPU Configuration
For NVIDIA GPUs:
# Install CUDA toolkit
sudo apt install nvidia-cuda-toolkit # Ubuntu/Debian
sudo dnf install cuda-toolkit # Fedora
# Verify GPU detection
ollama run llama3.2
# Check logs: sudo journalctl -u ollama -f
For AMD GPUs:
# Install ROCm (AMD's GPU computing platform)
# Follow AMD's ROCm installation guide for your distribution
Advanced Usage
REST API
Ollama provides a REST API on port 11434:
# Generate text
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Explain quantum computing in simple terms",
  "stream": false
}'
# Chat interface
curl http://localhost:11434/api/chat -d '{
  "model": "llama3.2",
  "messages": [
    {"role": "user", "content": "What is machine learning?"}
  ]
}'
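Two other useful endpoints on the same local API:
# List installed models
curl http://localhost:11434/api/tags
# Check the server version
curl http://localhost:11434/api/version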
Integration with Programming Languages
Python:
pip install ollama
import ollama

response = ollama.chat(model='llama3.2', messages=[
    {
        'role': 'user',
        'content': 'Why is the sky blue?',
    },
])
print(response['message']['content'])
JavaScript/Node.js:
npm install ollama
import ollama from 'ollama'

const response = await ollama.chat({
  model: 'llama3.2',
  messages: [{ role: 'user', content: 'Why is the sky blue?' }],
})
console.log(response.message.content)
Multimodal Models
For models that support images:
# Run vision model
ollama run llava
# In chat, reference an image
>>> What's in this image? /path/to/image.jpg
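Images can also be sent through the REST API as base64-encoded strings in the images field; a minimal sketch, assuming /path/to/image.jpg exists:
# Base64-encode the image and send it to the generate endpoint
IMG_B64=$(base64 -w0 /path/to/image.jpg)
curl http://localhost:11434/api/generate -d "{
  \"model\": \"llava\",
  \"prompt\": \"Describe this image.\",
  \"images\": [\"$IMG_B64\"],
  \"stream\": false
}"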
Performance Optimization
Memory Management
# Limit concurrent models
export OLLAMA_MAX_LOADED_MODELS=1
# Set memory limit
export OLLAMA_MAX_VRAM=8GB
CPU Optimization
# Thread count is set per model with the num_thread parameter rather than an environment variable (see the sketch below)
# CPU features such as AVX and AVX2 are detected automatically at startup
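A minimal sketch of pinning the CPU thread count for a model via a Modelfile (cpu-tuned is a hypothetical model name; set num_thread to match your physical core count):
# Build a llama3.2 variant limited to 8 CPU threads
cat > Modelfile.cpu << 'EOF'
FROM llama3.2
PARAMETER num_thread 8
EOF
ollama create cpu-tuned -f ./Modelfile.cpu
ollama run cpu-tuned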
Troubleshooting
Common Issues
1. "Connection refused" errors:
# Check if service is running
sudo systemctl status ollama
# Start if not running
sudo systemctl start ollama
# Check logs
sudo journalctl -u ollama -f
2. Out of memory errors:
# Try smaller models
ollama run gemma3:1b
# Check available RAM
free -h
# Monitor memory usage
htop
3. GPU not detected:
# Check NVIDIA drivers
nvidia-smi
# Verify CUDA installation
nvcc --version
# Check Ollama GPU support
ollama run llama3.2
# Look for GPU initialization in logs
4. Models not downloading:
# Check internet connection
ping ollama.com
# Try manual model pull
ollama pull llama3.2
# Check disk space
df -h
5. Permission errors:
# Fix ownership of Ollama directory
sudo chown -R ollama:ollama /usr/share/ollama
# Check service user
sudo systemctl show ollama | grep User
Performance Issues
Slow model loading:
- Use SSD storage for model files
- Increase available RAM
- Close unnecessary applications
Slow inference:
- Enable GPU acceleration if available
- Try smaller model variants
- Adjust the thread count with the num_thread model parameter (see CPU Optimization above)
Security Considerations
Network Security
Limit access to local network:
# Bind to localhost only (default)
export OLLAMA_HOST=127.0.0.1:11434
# Or specific interface
export OLLAMA_HOST=192.168.1.100:11434
Firewall configuration:
# Allow local access only (ufw evaluates rules in order, so add the allow rule before the deny)
sudo ufw allow from 127.0.0.1 to any port 11434
sudo ufw deny 11434
Data Privacy
- Models run entirely locally – no data sent to external servers
- Model files stored in /usr/share/ollama/.ollama/models (or ~/.ollama/models when Ollama runs as your own user)
- Chat history not persisted by default
- Consider encrypting model storage directory for sensitive use cases
Updating Ollama
Update via the Install Script
# Re-run install script
curl -fsSL https://ollama.com/install.sh | sh
# Restart service
sudo systemctl restart ollama
Manual Updates
# Download new version
wget https://github.com/ollama/ollama/releases/download/v0.1.33/ollama-linux-amd64 -O ollama
# Replace binary
sudo systemctl stop ollama
sudo mv ollama /usr/local/bin/
sudo chmod +x /usr/local/bin/ollama
sudo systemctl start ollama
Update Models
# Update specific model
ollama pull llama3.2
# Update all models
ollama list | grep -v NAME | awk '{print $1}' | xargs -I {} ollama pull {}
Uninstalling Ollama
Remove Service and Binary
# Stop and disable service
sudo systemctl stop ollama
sudo systemctl disable ollama
# Remove service file
sudo rm /etc/systemd/system/ollama.service
sudo systemctl daemon-reload
# Remove binary
sudo rm /usr/local/bin/ollama
# Remove the ollama user and group created by the installer
sudo userdel ollama
sudo groupdel ollama
Remove Models and Data
Warning: This will delete all downloaded models and configurations.
# Remove model data
sudo rm -rf /usr/share/ollama
# Remove user data (if running as regular user)
rm -rf ~/.ollama
Docker Cleanup
# Stop and remove container
docker stop ollama
docker rm ollama
# Remove image
docker rmi ollama/ollama
# Remove volume
docker volume rm ollama
Alternative AI Tools
While Ollama is excellent for local LLM deployment, consider these alternatives:
- LM Studio: GUI-based model runner with a drag-and-drop interface
- GPT4All: Cross-platform local AI assistant
- text-generation-webui: Web interface for running various models
- Llamafile: Single-file executables for LLMs
- vLLM: High-throughput LLM serving engine
Next Steps
Now that Ollama is installed on your Linux system, you can:
- Experiment with different models to find ones that suit your needs
- Build applications using the REST API or language libraries
- Create custom models with Modelfiles for specific use cases
- Set up automated scripts for batch processing tasks
- Integrate with development workflows for code assistance
- Explore multimodal capabilities with vision-language models
Community and Resources
- Official Documentation: ollama.com
- GitHub Repository: github.com/ollama/ollama
- Discord Community: Join for support and discussions
- Model Library: ollama.com/library
- Reddit: r/LocalLLaMA for local AI discussions
Having trouble with your Ollama installation on Linux? Leave a comment and we’ll help you troubleshoot!