SurfSense Documentation

Docker Installation

Setting up SurfSense using Docker

This guide explains how to run SurfSense using Docker, with options ranging from quick single-command deployment to full production setups.

Quick Start with Docker 🐳

Get SurfSense running with a single command. The all-in-one Docker image bundles PostgreSQL (with pgvector), Redis, and all SurfSense services, which makes it well suited to quick evaluation and development.

Include -v surfsense-data:/data in your docker run command so that your database and files persist when the container is stopped, removed, or recreated.

One-Line Installation

Linux/macOS:

docker run -d -p 3000:3000 -p 8000:8000 \
  -v surfsense-data:/data \
  --name surfsense \
  --restart unless-stopped \
  ghcr.io/modsetter/surfsense:latest

Windows (PowerShell):

docker run -d -p 3000:3000 -p 8000:8000 `
  -v surfsense-data:/data `
  --name surfsense `
  --restart unless-stopped `
  ghcr.io/modsetter/surfsense:latest

Note: A secure SECRET_KEY is automatically generated and persisted in the data volume on first run.
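
If you would rather supply your own key than rely on the auto-generated one, SECRET_KEY can be passed to the container as an environment variable. The following is a minimal sketch for producing a random value on Linux/macOS; the generate_secret_key helper name and the 32-byte length are illustrative choices, not SurfSense requirements.

```shell
# Hypothetical helper: prints 32 random bytes as 64 lowercase hex characters.
# Reads /dev/urandom, so it applies to Linux/macOS shells only.
generate_secret_key() {
  head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n'
}

SECRET_KEY="$(generate_secret_key)"
echo "Generated a ${#SECRET_KEY}-character key"

# Then add this flag to the docker run command above:
#   -e SECRET_KEY="$SECRET_KEY"
```

Keep the generated value somewhere safe: changing it later would presumably invalidate tokens signed with the old key.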

With Custom Configuration

Using OpenAI Embeddings:

docker run -d -p 3000:3000 -p 8000:8000 \
  -v surfsense-data:/data \
  -e EMBEDDING_MODEL=openai://text-embedding-ada-002 \
  -e OPENAI_API_KEY=your_openai_api_key \
  --name surfsense \
  --restart unless-stopped \
  ghcr.io/modsetter/surfsense:latest

With Google OAuth:

docker run -d -p 3000:3000 -p 8000:8000 \
  -v surfsense-data:/data \
  -e AUTH_TYPE=GOOGLE \
  -e GOOGLE_OAUTH_CLIENT_ID=your_client_id \
  -e GOOGLE_OAUTH_CLIENT_SECRET=your_client_secret \
  --name surfsense \
  --restart unless-stopped \
  ghcr.io/modsetter/surfsense:latest
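
Secrets passed with -e can end up in shell history. As an alternative sketch, docker run also accepts --env-file, which reads KEY=value lines from a file. The file name surfsense.env and the run_surfsense helper are hypothetical; setting DOCKER=echo prints the command instead of running it.

```shell
# Set DOCKER=echo to print the command instead of executing it (dry run).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: starts the all-in-one container with variables read
# from an env file. Values are taken literally, so do not wrap them in quotes.
run_surfsense() {
  env_file="$1"
  $DOCKER run -d -p 3000:3000 -p 8000:8000 \
    -v surfsense-data:/data \
    --env-file "$env_file" \
    --name surfsense \
    --restart unless-stopped \
    ghcr.io/modsetter/surfsense:latest
}

# Example usage:
#   printf '%s\n' 'AUTH_TYPE=GOOGLE' \
#     'GOOGLE_OAUTH_CLIENT_ID=your_client_id' \
#     'GOOGLE_OAUTH_CLIENT_SECRET=your_client_secret' > surfsense.env
#   chmod 600 surfsense.env
#   run_surfsense surfsense.env
```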

Quick Start with Docker Compose

For easier management with environment files:

# Download the quick start compose file
curl -o docker-compose.yml https://raw.githubusercontent.com/MODSetter/SurfSense/main/docker-compose.quickstart.yml

# Create .env file (optional - for custom configuration)
cat > .env << EOF
# EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
# ETL_SERVICE=DOCLING
# SECRET_KEY=your_custom_secret_key  # Auto-generated if not set
EOF

# Start SurfSense
docker compose up -d

After starting, access SurfSense at the ports published above:

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000

Quick Start Environment Variables

| Variable | Description | Default |
|---|---|---|
| SECRET_KEY | JWT secret key (auto-generated if not set) | Auto-generated |
| AUTH_TYPE | Authentication: LOCAL or GOOGLE | LOCAL |
| EMBEDDING_MODEL | Model for embeddings | sentence-transformers/all-MiniLM-L6-v2 |
| ETL_SERVICE | Document parser: DOCLING, UNSTRUCTURED, LLAMACLOUD | DOCLING |
| TTS_SERVICE | Text-to-speech for podcasts | local/kokoro |
| STT_SERVICE | Speech-to-text for audio (model size: tiny, base, small, medium, large) | local/base |
| REGISTRATION_ENABLED | Allow new user registration | TRUE |

Useful Commands

# View logs
docker logs -f surfsense

# Stop SurfSense
docker stop surfsense

# Start SurfSense
docker start surfsense

# Remove container (data preserved in volume)
docker rm surfsense

# Remove container AND data
docker rm surfsense && docker volume rm surfsense-data
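
Because the last command deletes the volume irreversibly, it may be worth archiving it first. The sketch below uses a throwaway alpine container to tar the volume contents; the alpine image, the archive name, and the backup_surfsense_volume helper are assumptions (this is a generic Docker volume pattern, not a SurfSense feature). It is best run while the surfsense container is stopped so the database files are not changing underneath it.

```shell
# Set DOCKER=echo to print the command instead of executing it (dry run).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: writes surfsense-data.tar.gz into OUT_DIR
# (defaults to the current directory). The volume is mounted read-only.
backup_surfsense_volume() {
  out_dir="${1:-$PWD}"
  $DOCKER run --rm \
    -v surfsense-data:/data:ro \
    -v "$out_dir":/backup \
    alpine tar czf /backup/surfsense-data.tar.gz -C /data .
}

# Example usage:
#   docker stop surfsense
#   backup_surfsense_volume "$HOME/backups"
```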

Full Docker Compose Setup (Production)

For production deployments with separate services and more control, use the full Docker Compose setup below.

Prerequisites

Before you begin, ensure you have:

  • Docker and Docker Compose installed on your machine
  • Git (to clone the repository)
  • Completed all the prerequisite setup steps including:
    • Auth setup
    • File Processing ETL Service (choose one):
      • Unstructured.io API key (supports 34+ formats)
      • LlamaCloud API key (enhanced parsing, supports 50+ formats)
      • Docling (local processing, no API key required, supports PDF, Office docs, images, HTML, CSV)
    • Other required API keys

Installation Steps

  1. Configure Environment Variables

    Set up the necessary environment variables:

    Linux/macOS:

    # Copy example environment files
    cp surfsense_backend/.env.example surfsense_backend/.env
    cp surfsense_web/.env.example surfsense_web/.env
    cp .env.example .env  # For Docker-specific settings

    Windows (Command Prompt):

    copy surfsense_backend\.env.example surfsense_backend\.env
    copy surfsense_web\.env.example surfsense_web\.env
    copy .env.example .env

    Windows (PowerShell):

    Copy-Item -Path surfsense_backend\.env.example -Destination surfsense_backend\.env
    Copy-Item -Path surfsense_web\.env.example -Destination surfsense_web\.env
    Copy-Item -Path .env.example -Destination .env

    Edit all .env files and fill in the required values:

Docker-Specific Environment Variables (Optional)

| ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
|---|---|---|
| FRONTEND_PORT | Port for the frontend service | 3000 |
| BACKEND_PORT | Port for the backend API service | 8000 |
| POSTGRES_PORT | Port for the PostgreSQL database | 5432 |
| PGADMIN_PORT | Port for pgAdmin web interface | 5050 |
| REDIS_PORT | Port for Redis (used by Celery) | 6379 |
| FLOWER_PORT | Port for Flower (Celery monitoring tool) | 5555 |
| POSTGRES_USER | PostgreSQL username | postgres |
| POSTGRES_PASSWORD | PostgreSQL password | postgres |
| POSTGRES_DB | PostgreSQL database name | surfsense |
| PGADMIN_DEFAULT_EMAIL | Email for pgAdmin login | admin@surfsense.com |
| PGADMIN_DEFAULT_PASSWORD | Password for pgAdmin login | surfsense |
| NEXT_PUBLIC_FASTAPI_BACKEND_URL | URL of the backend API (used by frontend during build and runtime) | http://localhost:8000 |
| NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE | Authentication method for frontend: LOCAL or GOOGLE | LOCAL |
| NEXT_PUBLIC_ETL_SERVICE | Document parsing service for frontend UI: UNSTRUCTURED, LLAMACLOUD, or DOCLING | DOCLING |

Note: Frontend environment variables with the NEXT_PUBLIC_ prefix are embedded into the Next.js production build at build time. Since the frontend now runs as a production build in Docker, these variables must be set in the root .env file (Docker-specific configuration) and will be passed as build arguments during the Docker build process.

Backend Environment Variables:

| ENV VARIABLE | DESCRIPTION |
|---|---|
| DATABASE_URL | PostgreSQL connection string (e.g., postgresql+asyncpg://postgres:postgres@localhost:5432/surfsense) |
| SECRET_KEY | JWT Secret key for authentication (should be a secure random string) |
| NEXT_FRONTEND_URL | URL where your frontend application is hosted (e.g., http://localhost:3000) |
| AUTH_TYPE | Authentication method: GOOGLE for OAuth with Google, LOCAL for email/password authentication |
| GOOGLE_OAUTH_CLIENT_ID | (Optional) Client ID from Google Cloud Console (required if AUTH_TYPE=GOOGLE) |
| GOOGLE_OAUTH_CLIENT_SECRET | (Optional) Client secret from Google Cloud Console (required if AUTH_TYPE=GOOGLE) |
| EMBEDDING_MODEL | Name of the embedding model (e.g., sentence-transformers/all-MiniLM-L6-v2, openai://text-embedding-ada-002) |
| RERANKERS_ENABLED | (Optional) Enable or disable document reranking for improved search results (e.g., TRUE or FALSE, default: FALSE) |
| RERANKERS_MODEL_NAME | Name of the reranker model (e.g., ms-marco-MiniLM-L-12-v2) (required if RERANKERS_ENABLED=TRUE) |
| RERANKERS_MODEL_TYPE | Type of reranker model (e.g., flashrank) (required if RERANKERS_ENABLED=TRUE) |
| TTS_SERVICE | Text-to-Speech API provider for Podcasts (e.g., local/kokoro, openai/tts-1). See supported providers |
| TTS_SERVICE_API_KEY | (Optional if local) API key for the Text-to-Speech service |
| TTS_SERVICE_API_BASE | (Optional) Custom API base URL for the Text-to-Speech service |
| STT_SERVICE | Speech-to-Text API provider for Audio Files (e.g., local/base, openai/whisper-1). See supported providers |
| STT_SERVICE_API_KEY | (Optional if local) API key for the Speech-to-Text service |
| STT_SERVICE_API_BASE | (Optional) Custom API base URL for the Speech-to-Text service |
| FIRECRAWL_API_KEY | API key for Firecrawl service for web crawling |
| ETL_SERVICE | Document parsing service: UNSTRUCTURED (supports 34+ formats), LLAMACLOUD (supports 50+ formats including legacy document types), or DOCLING (local processing, supports PDF, Office docs, images, HTML, CSV) |
| UNSTRUCTURED_API_KEY | API key for Unstructured.io service for document parsing (required if ETL_SERVICE=UNSTRUCTURED) |
| LLAMA_CLOUD_API_KEY | API key for LlamaCloud service for document parsing (required if ETL_SERVICE=LLAMACLOUD) |
| CELERY_BROKER_URL | Redis connection URL for Celery broker (e.g., redis://localhost:6379/0) |
| CELERY_RESULT_BACKEND | Redis connection URL for Celery result backend (e.g., redis://localhost:6379/0) |
| SCHEDULE_CHECKER_INTERVAL | (Optional) How often to check for scheduled connector tasks. Format: <number><unit> where unit is m (minutes) or h (hours). Examples: 1m, 5m, 1h, 2h (default: 1m) |
| REGISTRATION_ENABLED | (Optional) Enable or disable new user registration (e.g., TRUE or FALSE, default: TRUE) |
| PAGES_LIMIT | (Optional) Maximum pages limit per user for ETL services (default: 999999999 for unlimited in OSS version) |
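
Several of the variables above are only required when another variable has a particular value. The sketch below checks those "required if" rules against a backend .env file before you start the containers. The helper names (env_get, require_key, check_backend_env) are hypothetical, and the parsing is deliberately simple: it handles plain KEY=value lines and ignores everything else.

```shell
# env_get FILE KEY: prints KEY's value from FILE (last assignment wins),
# with double quotes and carriage returns removed.
env_get() {
  sed -n "s/^$2=//p" "$1" | tail -n 1 | tr -d '"\r'
}

# require_key FILE KEY REASON: reports KEY if it is unset or empty in FILE.
require_key() {
  if [ -z "$(env_get "$1" "$2")" ]; then
    echo "missing $2 (required because $3)"
    return 1
  fi
}

# check_backend_env FILE: returns non-zero if any conditional rule is violated.
check_backend_env() {
  file="$1"; status=0
  if [ "$(env_get "$file" AUTH_TYPE)" = "GOOGLE" ]; then
    require_key "$file" GOOGLE_OAUTH_CLIENT_ID "AUTH_TYPE=GOOGLE" || status=1
    require_key "$file" GOOGLE_OAUTH_CLIENT_SECRET "AUTH_TYPE=GOOGLE" || status=1
  fi
  if [ "$(env_get "$file" RERANKERS_ENABLED)" = "TRUE" ]; then
    require_key "$file" RERANKERS_MODEL_NAME "RERANKERS_ENABLED=TRUE" || status=1
    require_key "$file" RERANKERS_MODEL_TYPE "RERANKERS_ENABLED=TRUE" || status=1
  fi
  case "$(env_get "$file" ETL_SERVICE)" in
    UNSTRUCTURED) require_key "$file" UNSTRUCTURED_API_KEY "ETL_SERVICE=UNSTRUCTURED" || status=1 ;;
    LLAMACLOUD)   require_key "$file" LLAMA_CLOUD_API_KEY "ETL_SERVICE=LLAMACLOUD" || status=1 ;;
  esac
  return "$status"
}

# Example usage:
#   check_backend_env surfsense_backend/.env && echo "backend .env looks consistent"
```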

Optional Backend LangSmith Observability:

| ENV VARIABLE | DESCRIPTION |
|---|---|
| LANGSMITH_TRACING | Enable LangSmith tracing (e.g., true) |
| LANGSMITH_ENDPOINT | LangSmith API endpoint (e.g., https://api.smith.langchain.com) |
| LANGSMITH_API_KEY | Your LangSmith API key |
| LANGSMITH_PROJECT | LangSmith project name (e.g., surfsense) |

Backend Uvicorn Server Configuration:

| ENV VARIABLE | DESCRIPTION | DEFAULT VALUE |
|---|---|---|
| UVICORN_HOST | Host address to bind the server | 0.0.0.0 |
| UVICORN_PORT | Port to run the backend API | 8000 |
| UVICORN_LOG_LEVEL | Logging level (e.g., info, debug, warning) | info |
| UVICORN_PROXY_HEADERS | Enable/disable proxy headers | false |
| UVICORN_FORWARDED_ALLOW_IPS | Comma-separated list of allowed IPs | 127.0.0.1 |
| UVICORN_WORKERS | Number of worker processes | 1 |
| UVICORN_ACCESS_LOG | Enable/disable access log (true/false) | true |
| UVICORN_LOOP | Event loop implementation | auto |
| UVICORN_HTTP | HTTP protocol implementation | auto |
| UVICORN_WS | WebSocket protocol implementation | auto |
| UVICORN_LIFESPAN | Lifespan implementation | auto |
| UVICORN_LOG_CONFIG | Path to logging config file or empty string | |
| UVICORN_SERVER_HEADER | Enable/disable Server header | true |
| UVICORN_DATE_HEADER | Enable/disable Date header | true |
| UVICORN_LIMIT_CONCURRENCY | Max concurrent connections | |
| UVICORN_LIMIT_MAX_REQUESTS | Max requests before worker restart | |
| UVICORN_TIMEOUT_KEEP_ALIVE | Keep-alive timeout (seconds) | 5 |
| UVICORN_TIMEOUT_NOTIFY | Worker shutdown notification timeout (sec) | 30 |
| UVICORN_SSL_KEYFILE | Path to SSL key file | |
| UVICORN_SSL_CERTFILE | Path to SSL certificate file | |
| UVICORN_SSL_KEYFILE_PASSWORD | Password for SSL key file | |
| UVICORN_SSL_VERSION | SSL version | |
| UVICORN_SSL_CERT_REQS | SSL certificate requirements | |
| UVICORN_SSL_CA_CERTS | Path to CA certificates file | |
| UVICORN_SSL_CIPHERS | SSL ciphers | |
| UVICORN_HEADERS | Comma-separated list of headers | |
| UVICORN_USE_COLORS | Enable/disable colored logs | true |
| UVICORN_UDS | Unix domain socket path | |
| UVICORN_FD | File descriptor to bind to | |
| UVICORN_ROOT_PATH | Root path for the application | |

For more details, see the Uvicorn documentation.
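
As an illustration of how these settings combine, a backend running behind a reverse proxy might use a fragment like the following in surfsense_backend/.env. The values are assumptions for a hypothetical deployment (the proxy address 172.18.0.1 in particular is made up) and are not recommended defaults.

```shell
# surfsense_backend/.env (fragment; illustrative values only)
UVICORN_HOST=0.0.0.0
UVICORN_PORT=8000
UVICORN_WORKERS=4
UVICORN_LOG_LEVEL=warning
# Trust X-Forwarded-* headers, but only when they come from the proxy's address.
UVICORN_PROXY_HEADERS=true
UVICORN_FORWARDED_ALLOW_IPS=172.18.0.1
```

Restricting UVICORN_FORWARDED_ALLOW_IPS to the proxy address, rather than allowing every source, keeps clients from spoofing forwarded headers.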

Frontend Environment Variables

Important: Frontend environment variables are now configured in the Docker-Specific Environment Variables section above since the Next.js application runs as a production build in Docker. The following NEXT_PUBLIC_* variables should be set in your root .env file:

  • NEXT_PUBLIC_FASTAPI_BACKEND_URL - URL of the backend service
  • NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE - Authentication method (LOCAL or GOOGLE)
  • NEXT_PUBLIC_ETL_SERVICE - Document parsing service (should match backend ETL_SERVICE)

These variables are embedded into the application during the Docker build process and affect the frontend's behavior and available features.
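
Because the frontend values are baked in at build time, a mismatch with the backend only shows up after a rebuild. The sketch below compares the two files beforehand. The helper names are hypothetical, and treating the auth type as something that must match is an assumption (the text above only states it for the ETL service). Unset variables compare as empty strings.

```shell
# env_get FILE KEY: prints KEY's value from FILE (last assignment wins),
# with double quotes and carriage returns removed.
env_get() {
  sed -n "s/^$2=//p" "$1" | tail -n 1 | tr -d '"\r'
}

# check_frontend_matches_backend ROOT_ENV BACKEND_ENV
# Compares each NEXT_PUBLIC_* setting with its backend counterpart.
check_frontend_matches_backend() {
  root_env="$1"; backend_env="$2"; status=0
  for pair in NEXT_PUBLIC_ETL_SERVICE:ETL_SERVICE \
              NEXT_PUBLIC_FASTAPI_BACKEND_AUTH_TYPE:AUTH_TYPE; do
    front_key="${pair%%:*}"; back_key="${pair#*:}"
    front="$(env_get "$root_env" "$front_key")"
    back="$(env_get "$backend_env" "$back_key")"
    if [ "$front" != "$back" ]; then
      echo "mismatch: $front_key=$front but $back_key=$back"
      status=1
    fi
  done
  return "$status"
}

# Example usage (run from the repository root):
#   check_frontend_matches_backend .env surfsense_backend/.env
```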

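  2. Build and Start Containers

    Start the Docker containers:

    Linux/macOS/Windows:

    docker compose up --build

    To run in detached mode (in the background):

    Linux/macOS/Windows:

    docker compose up -d

    Note for Windows users: If you're using older Docker Desktop versions, you might need to use docker-compose (with a hyphen) instead of docker compose (with a space).

  3. Access the Applications

    Once the containers are running, you can access the services on their default ports (adjust these if you changed FRONTEND_PORT, BACKEND_PORT, or PGADMIN_PORT):

    • Frontend: http://localhost:3000
    • Backend API: http://localhost:8000
    • pgAdmin: http://localhost:5050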

Docker Services Overview

The Docker setup includes several services that work together:

  • Backend: FastAPI application server
  • Frontend: Next.js web application
  • PostgreSQL (db): Database with pgvector extension
  • Redis: Message broker for Celery
  • Celery Worker: Handles background tasks (document processing, indexing, etc.)
  • Celery Beat: Scheduler for periodic tasks (enables scheduled connector indexing)
    • The schedule interval can be configured using the SCHEDULE_CHECKER_INTERVAL environment variable in your backend .env file
    • Default: checks every minute for connectors that need indexing
  • pgAdmin: Database management interface
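
The SCHEDULE_CHECKER_INTERVAL format mentioned above (<number><unit>, with unit m or h) can be sanity-checked before it goes into the backend .env file. This is a sketch of the documented format only, not SurfSense's own parser; the interval_to_seconds helper is hypothetical and, as a simplification, rejects numbers with leading zeros.

```shell
# interval_to_seconds VALUE: prints VALUE (e.g. 5m, 2h) converted to seconds,
# or returns non-zero if VALUE is not <number><unit> with unit m or h.
interval_to_seconds() {
  value="$1"
  number="${value%[mh]}"        # strip one trailing m or h, if present
  unit="${value#"$number"}"     # whatever was stripped ("" if nothing)
  case "$number" in
    ''|0?*|*[!0-9]*) echo "invalid interval: $value" >&2; return 1 ;;
  esac
  case "$unit" in
    m) echo $((number * 60)) ;;
    h) echo $((number * 3600)) ;;
    *) echo "invalid interval: $value" >&2; return 1 ;;
  esac
}

# Example usage:
#   interval_to_seconds 5m    # prints 300
#   interval_to_seconds 2h    # prints 7200
```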

All services start automatically with docker compose up. The Celery Beat service ensures that periodic indexing functionality works out of the box.
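
After docker compose up -d there is no log output in the terminal, so it can be handy to poll until the services answer. A minimal sketch, assuming curl is installed on the host and the default FRONTEND_PORT and BACKEND_PORT values; the wait_for_url helper is hypothetical, and it only confirms that something responds over HTTP, not that the application is fully initialized.

```shell
# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
# Succeeds as soon as curl gets any HTTP response from URL; fails after
# ATTEMPTS tries. Requires curl on the host.
wait_for_url() {
  url="$1"; attempts="${2:-30}"; delay="${3:-2}"
  try=1
  while [ "$try" -le "$attempts" ]; do
    if curl --silent --output /dev/null --max-time 5 "$url"; then
      echo "ready: $url"
      return 0
    fi
    try=$((try + 1))
    sleep "$delay"
  done
  echo "not reachable after $attempts attempts: $url" >&2
  return 1
}

# Example usage with the default ports:
#   wait_for_url http://localhost:8000 && wait_for_url http://localhost:3000
```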

Using pgAdmin

pgAdmin is included in the Docker setup to help manage your PostgreSQL database. To connect:

  1. Open pgAdmin at http://localhost:5050
  2. Log in with the credentials from your .env file (default: admin@surfsense.com / surfsense)
  3. Right-click "Servers" > "Create" > "Server"
  4. In the "General" tab, name your connection (e.g., "SurfSense DB")
  5. In the "Connection" tab:
    • Host: db
    • Port: 5432
    • Maintenance database: surfsense
    • Username: postgres (or your custom POSTGRES_USER)
    • Password: postgres (or your custom POSTGRES_PASSWORD)
  6. Click "Save" to connect
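
The same database can also be checked from the command line without pgAdmin. The sketch below assumes the default POSTGRES_USER and POSTGRES_DB values, that psql is available inside the db container (an assumption about the image), and a hypothetical check_pgvector helper. The query lists the pgvector extension, which PostgreSQL registers under the name vector; an empty result would mean the extension has not been created in that database. Setting DOCKER=echo prints the command instead of running it.

```shell
# Set DOCKER=echo to print the command instead of executing it (dry run).
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: asks PostgreSQL in the db service whether the
# pgvector extension ("vector") is installed, and in which version.
check_pgvector() {
  $DOCKER compose exec db \
    psql -U "${POSTGRES_USER:-postgres}" -d "${POSTGRES_DB:-surfsense}" \
    -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
}

# Example usage (run from the directory containing docker-compose.yml):
#   check_pgvector
```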

Useful Docker Commands

Container Management

  • Stop containers:

    Linux/macOS/Windows:

    docker compose down
  • View logs:

    Linux/macOS/Windows:

    # All services
    docker compose logs -f
    
    # Specific service
    docker compose logs -f backend
    docker compose logs -f frontend
    docker compose logs -f db
  • Restart a specific service:

    Linux/macOS/Windows:

    docker compose restart backend
  • Execute commands in a running container:

    Linux/macOS/Windows:

    # Backend
    docker compose exec backend python -m pytest
    
    # Frontend
    docker compose exec frontend pnpm lint

Troubleshooting

  • Linux/macOS: If you encounter permission errors, you may need to run the docker commands with sudo.
  • Windows: If you see access denied errors, make sure you're running Command Prompt or PowerShell as Administrator.
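  • If ports are already in use, change the corresponding port variable (for example FRONTEND_PORT or BACKEND_PORT) in the root .env file, or modify the port mappings in the docker-compose.yml file.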
  • For backend dependency issues, check the Dockerfile in the backend directory.
  • For frontend dependency issues, check the Dockerfile in the frontend directory.
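  • Windows-specific: If you encounter line ending issues (CRLF vs LF), configure Git to keep LF line endings on checkout with git config --global core.autocrlf input before cloning the repository. (With core.autocrlf true, Git converts files to CRLF on checkout, which is what typically breaks scripts inside Linux containers.)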
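
To find out whether line endings are actually the problem, you can scan the checkout for files that contain carriage returns. A minimal sketch for Git Bash, WSL, Linux, or macOS; the helper names are hypothetical, and limiting the scan to *.sh files is an assumption about which files are sensitive.

```shell
# has_crlf FILE: succeeds if FILE contains at least one carriage return.
has_crlf() {
  cr="$(printf '\r')"
  grep -q "$cr" "$1"
}

# list_crlf_scripts DIR: prints every *.sh file under DIR that has CRLF endings.
list_crlf_scripts() {
  find "$1" -type f -name '*.sh' | while IFS= read -r f; do
    if has_crlf "$f"; then
      echo "$f"
    fi
  done
}

# Example usage (run from the repository root):
#   list_crlf_scripts .
```

Any file it lists was checked out with CRLF endings; re-cloning after changing the Git setting is the simplest way to get clean copies.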

Next Steps

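Once your installation is complete, you can start using SurfSense! Navigate to the frontend URL and log in with the method you configured: email and password if AUTH_TYPE is LOCAL (the default), or your Google account if AUTH_TYPE is GOOGLE.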