
AtlasML Deployment Guide

This guide covers deployment workflows, CI/CD integration, and production deployment strategies for AtlasML.


Deployment Overview

AtlasML uses a containerized deployment model: GitHub Actions builds and publishes Docker images, which are then deployed to target environments with Docker Compose.


GitHub Actions Workflows

1. Build and Push Docker Image

File: .github/workflows/atlas_build-and-push-docker.yml

This workflow automatically builds and pushes Docker images when code changes.

Trigger Conditions

on:
  push:
    paths:
      - 'atlas/**'
      - '.github/workflows/atlas_build-and-push-docker.yml'
  release:
    types:
      - created

Triggers on:

  • Pushes that change files under atlas/
  • Changes to the workflow file itself
  • Creation of a new GitHub release

What It Does

  1. Builds Docker image using /atlas/AtlasMl/Dockerfile
  2. Tags image with:
    • Branch name (e.g., main, feature-branch)
    • PR number (e.g., pr-123)
    • Release version (e.g., v1.2.0)
  3. Pushes to GitHub Container Registry (ghcr.io)
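The tag-selection rules above can be sketched as a small shell function. This is illustrative only — the real workflow presumably derives tags with an action such as docker/metadata-action, and exact ref handling may differ:

```shell
#!/usr/bin/env bash
# Sketch: map a GitHub ref to the image tag conventions listed above.
# (Hypothetical helper, not the workflow's actual code.)
image_tag_for_ref() {
  local ref="$1"
  case "$ref" in
    refs/pull/*/merge)          # pull request -> pr-<number>
      local n="${ref#refs/pull/}"
      echo "pr-${n%%/*}"
      ;;
    refs/tags/*)                # release tag -> version, e.g. v1.2.0
      echo "${ref#refs/tags/}"
      ;;
    refs/heads/*)               # branch -> branch name, slashes replaced
      local b="${ref#refs/heads/}"
      echo "${b//\//-}"
      ;;
    *)
      echo "unknown"
      ;;
  esac
}
```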

Usage

Automatic (on push):

# Make changes
git add atlas/AtlasMl/
git commit -m "Update AtlasML"
git push

# GitHub Actions automatically builds and pushes image
# Result: ghcr.io/ls1intum/edutelligence/atlasml:main

Manual trigger:

  1. Go to GitHub Actions tab
  2. Select "AtlasML - Build Docker Images"
  3. Click "Run workflow"
  4. Select branch
  5. Click "Run workflow"
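The same manual trigger works from the command line with GitHub's `gh` CLI (assumes `gh auth login` has been run; the workflow file name is the one given at the top of this section):

```shell
# Trigger the build workflow from the CLI instead of the Actions tab.
# Assumes gh is installed and authenticated against the repository.
trigger_build() {
  local branch="${1:-main}"
  gh workflow run atlas_build-and-push-docker.yml --ref "$branch"
}
```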

2. Deploy to Test Environment

File: .github/workflows/atlas_deploy-test.yml

Manual deployment workflow for test/staging environments.

Workflow Configuration

name: AtlasML - Deploy to Test 1

on:
  workflow_dispatch:
    inputs:
      image-tag:
        type: string
        description: 'Image tag to deploy'
        required: true
      deploy-atlasml:
        type: boolean
        default: true
        description: (Re-)deploys AtlasML

Steps

Step 1: Provision Environment

Writes the .env file to the remote server:

- name: Write .env to remote host
  uses: appleboy/ssh-action@v1.0.3
  with:
    host: ${{ secrets.SSH_HOST }}
    username: ${{ secrets.SSH_USERNAME }}
    key: ${{ secrets.SSH_PRIVATE_KEY }}
    script: |
      sudo mkdir -p /opt/atlasml
      cat << EOF | sudo tee /opt/atlasml/.env > /dev/null
      PYTHONPATH='${{ secrets.PYTHONPATH }}'
      WEAVIATE_HOST='${{ secrets.WEAVIATE_HOST }}'
      WEAVIATE_PORT='${{ secrets.WEAVIATE_PORT }}'
      ATLAS_API_KEYS='${{ secrets.ATLAS_API_KEYS }}'
      OPENAI_API_KEY='${{ secrets.OPENAI_API_KEY }}'
      OPENAI_API_URL='${{ secrets.OPENAI_API_URL }}'
      ENV='${{ secrets.ENV }}'
      EOF
      sudo chmod 600 /opt/atlasml/.env
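For local debugging, the provisioning step can be mimicked with a small helper that creates the env file and restricts its permissions before any secrets are written (hypothetical helper; values below are placeholders, not real secrets):

```shell
# Write KEY=VALUE pairs to an env file, restricting permissions first so
# secrets are never world-readable, mirroring the workflow step above.
write_env_file() {
  local path="$1"; shift
  : > "$path"                  # create/truncate the file
  chmod 600 "$path"            # restrict before writing secrets
  local kv
  for kv in "$@"; do
    printf '%s\n' "$kv" >> "$path"
  done
}

# Example (placeholder values):
#   write_env_file /opt/atlasml/.env "ENV='test'" "WEAVIATE_PORT='8080'"
```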

Step 2: Deploy AtlasML

Calls a reusable workflow to deploy. Note that reusable workflows are invoked at the job level with uses:, not as a step:

deploy-atlasml:
  name: Deploy AtlasML
  uses: ls1intum/.github/.github/workflows/deploy-docker-compose.yml@main
  with:
    environment: 'AtlasML - Test 1'
    docker-compose-file: './atlas/docker-compose.prod.yml'
    main-image-name: ls1intum/edutelligence/atlasml
    image-tag: ${{ inputs.image-tag }}
    deployment-base-path: '/opt/atlasml'

How to Deploy

  1. Go to GitHub Actions

    • Navigate to repository → Actions tab
  2. Select Workflow

    • Click "AtlasML - Deploy to Test 1"
  3. Run Workflow

    • Click "Run workflow" button
    • Select branch (usually main)
    • Enter image tag:
      • Branch name: main
      • PR number: pr-123
      • Version: v1.2.0
    • Click "Run workflow"
  4. Monitor Progress

    • Watch workflow execution in real-time
    • Check logs for any errors

Example:

Image tag: main
Deploy AtlasML: ✓ (checked)
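The UI steps above can also be driven from the `gh` CLI, passing the workflow inputs with -f (assumes `gh` is authenticated; workflow file and input names are the ones shown in this section):

```shell
# Trigger the test deployment from the CLI with an explicit image tag.
deploy_to_test() {
  local tag="$1"
  gh workflow run atlas_deploy-test.yml \
    --ref main \
    -f image-tag="$tag" \
    -f deploy-atlasml=true
}

# Example:
#   deploy_to_test v1.2.0
```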

Deployment Strategies

Strategy 1: Continuous Deployment (Dev/Test)

For development and test environments, deploy automatically on every push.

# Add to .github/workflows/atlas_deploy-test.yml
on:
  push:
    branches:
      - develop
    paths:
      - 'atlas/**'

Flow:

Code Push → Auto Build → Auto Deploy to Test

Strategy 2: Manual Deployment (Staging)

For staging, require manual approval.

jobs:
  deploy:
    environment:
      name: staging
      # Requires manual approval (configured as a protection rule
      # on the GitHub environment, not in this file)

Flow:

Code Push → Auto Build → Manual Approval → Deploy to Staging

Strategy 3: Release-Based Deployment (Production)

For production, only deploy tagged releases.

on:
  release:
    types:
      - published

Flow:

Create Release → Auto Build → Manual Deploy with Version Tag → Production

Production Deployment Process

Step-by-Step Production Deployment

1. Prepare Release

# Ensure main branch is stable
git checkout main
git pull origin main

# Run tests locally
cd atlas/AtlasMl
poetry run pytest

# Check for linting issues
poetry run ruff check .
poetry run black --check .

2. Create GitHub Release

# Create and push tag
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0

Or via GitHub UI:

  1. Go to Releases → Draft a new release
  2. Choose tag: Create new tag v1.2.0
  3. Title: AtlasML v1.2.0
  4. Description: Release notes
  5. Click "Publish release"
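Both routes (tag push and UI) can be combined into one CLI step with `gh release create`, which creates the tag and the release together (assumes `gh` is authenticated; --generate-notes produces release notes from merged PRs):

```shell
# Tag and publish a release in one step, equivalent to the steps above.
cut_release() {
  local version="$1"                  # e.g. v1.2.0
  git tag -a "$version" -m "Release version ${version#v}"
  git push origin "$version"
  gh release create "$version" --title "AtlasML $version" --generate-notes
}
```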

3. Wait for Image Build

GitHub Actions automatically:

  • Builds Docker image
  • Tags as v1.2.0
  • Pushes to ghcr.io/ls1intum/edutelligence/atlasml:v1.2.0

Monitor at: Actions → "AtlasML - Build Docker Images"

4. Deploy to Production

  1. Go to Actions → "AtlasML - Deploy to Production"
  2. Click "Run workflow"
  3. Enter image tag: v1.2.0
  4. Click "Run workflow"
  5. Monitor deployment logs

5. Verify Deployment

# SSH to production server
ssh production-server

# Check container status
docker ps | grep atlasml

# Check health
curl http://localhost/api/v1/health

# Check logs
docker logs atlasml --tail 50

# Test endpoint
curl -X POST http://localhost/api/v1/competency/suggest \
  -H "Authorization: your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"description":"Python programming","course_id":1}'
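Right after a restart the container may need a few seconds before the health check passes, so it is more robust to poll than to curl once. A sketch (URL from the verification step above; retry count and delay are arbitrary choices):

```shell
# Poll a health endpoint until it responds successfully or retries run out.
wait_for_health() {
  local url="$1" attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    if curl -fsS "$url" > /dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep 2
  done
  echo "unhealthy after $attempts attempts" >&2
  return 1
}

# Example:
#   wait_for_health http://localhost/api/v1/health
```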

6. Monitor Post-Deployment

  • Check Sentry for errors (if configured)
  • Monitor logs for first hour
  • Verify Artemis integration working
  • Check response times

Rollback Procedure

If deployment fails or issues are discovered:

Quick Rollback

# SSH to server
ssh production-server

cd /opt/atlasml

# Set previous version in .env (replace the existing line rather than
# prepending a duplicate - duplicate keys have tool-dependent precedence)
sudo sed -i 's|^IMAGE_TAG=.*|IMAGE_TAG=v1.1.0|' .env

# Pull and restart
docker-compose -f docker-compose.prod.yml pull
docker-compose -f docker-compose.prod.yml up -d

# Verify
docker logs atlasml
curl http://localhost/api/v1/health
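Editing IMAGE_TAG can be wrapped in a helper that replaces the key if present and appends it otherwise, since duplicate keys in an env file are ambiguous across tools. A hypothetical sketch, not part of the repository:

```shell
# Set KEY=VALUE in an env file: replace the line if the key exists,
# append otherwise. Keeps exactly one occurrence of each key.
set_env_var() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # -i.bak works on both GNU and BSD sed; drop the backup afterwards
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file" && rm -f "${file}.bak"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Example rollback:
#   set_env_var /opt/atlasml/.env IMAGE_TAG v1.1.0
```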

Rollback via GitHub Actions

  1. Go to Actions → "AtlasML - Deploy to Production"
  2. Click "Run workflow"
  3. Enter previous image tag: v1.1.0
  4. Click "Run workflow"

Zero-Downtime Deployment

For high-availability setups:

Using Load Balancer

Process:

  1. Deploy new version to Instance 2
  2. Wait for health check to pass
  3. Load balancer routes traffic to Instance 2
  4. Update Instance 1
  5. Both instances now running new version

Using Docker Compose

# docker-compose.prod.yml
services:
  atlasml:
    image: ghcr.io/ls1intum/edutelligence/atlasml:${IMAGE_TAG}
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first

Deploy:

docker-compose -f docker-compose.prod.yml up -d --no-deps atlasml

This recreates the atlasml service with the new image without touching other services. Note that the deploy: settings above (replicas, update_config, order: start-first) are only honored by Docker Swarm (docker stack deploy); plain docker-compose ignores them, so a true one-at-a-time rolling update requires Swarm or the load-balancer approach described above.


Environment-Specific Configurations

Development

# .env.dev
ENV=development
ATLAS_API_KEYS=dev-key
WEAVIATE_HOST=localhost
OPENAI_API_KEY=dev-key

Staging

# .env.staging
ENV=staging
ATLAS_API_KEYS=staging-key-1,staging-key-2
WEAVIATE_HOST=weaviate-staging.internal
OPENAI_API_KEY=staging-key
SENTRY_DSN=https://...@sentry.../staging

Production

# .env.production
ENV=production
ATLAS_API_KEYS=prod-key-1,prod-key-2,prod-key-3
WEAVIATE_HOST=weaviate.internal
OPENAI_API_KEY=prod-key
SENTRY_DSN=https://...@sentry.../production
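Before deploying with one of these files, it is worth verifying that every required key is actually defined; a missing variable is a common cause of unhealthy containers. A small validator sketch (key names from the configurations above; not part of the repository):

```shell
# Check that an env file defines every required key; report what is missing.
check_env_file() {
  local file="$1"; shift
  local key missing=0
  for key in "$@"; do
    if ! grep -q "^${key}=" "$file"; then
      echo "missing: $key" >&2
      missing=1
    fi
  done
  return $missing
}

# Example:
#   check_env_file .env.production ENV ATLAS_API_KEYS WEAVIATE_HOST OPENAI_API_KEY
```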

Automated Deployment with CI/CD

Full CI/CD Pipeline

# .github/workflows/atlas_ci_cd.yml
name: AtlasML CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.13'
      - name: Install dependencies
        run: |
          cd atlas/AtlasMl
          pip install poetry
          poetry install
      - name: Run tests
        run: |
          cd atlas/AtlasMl
          poetry run pytest
      - name: Lint
        run: |
          cd atlas/AtlasMl
          poetry run ruff check .

  build:
    needs: test
    # Must also cover develop and releases, or the dependent deploy jobs
    # below could never run
    if: github.ref == 'refs/heads/main' || github.ref == 'refs/heads/develop' || github.event_name == 'release'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and push Docker image
        # Build image

  deploy-test:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to test
        # Deploy to test environment

  deploy-prod:
    needs: build
    if: github.event_name == 'release'
    runs-on: ubuntu-latest
    environment:
      name: production
    steps:
      - name: Deploy to production
        # Deploy to production

Deployment Checklist

Pre-Deployment

  • Tests passing locally
  • Code reviewed and approved
  • Documentation updated
  • CHANGELOG updated
  • Version bumped (if applicable)
  • Database migrations prepared (if needed)
  • Backup of current production data

During Deployment

  • Monitor GitHub Actions logs
  • Watch container startup logs
  • Verify health check passing
  • Check Sentry for errors
  • Test critical endpoints

Post-Deployment

  • Verify Artemis integration
  • Check response times
  • Monitor error rates
  • Update deployment docs
  • Notify team of successful deployment

Deployment Troubleshooting

Issue: Image Pull Fails

Error: Error response from daemon: pull access denied

Solution:

# Login to GHCR
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin

# Or ensure image is public

Issue: Container Starts But Unhealthy

Check:

# View health check logs
docker inspect atlasml | grep -A 10 Health

# Test endpoint manually
docker exec atlasml curl http://localhost:8000/api/v1/health

# Check application logs
docker logs atlasml

Common causes:

  • Weaviate connection failed
  • Missing environment variables
  • OpenAI API key invalid
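Missing environment variables can be spotted by diffing the keys the .env file declares against what the container actually reports. A sketch of such a check (hypothetical helper; the env dump would come from `docker exec atlasml env`):

```shell
# Print keys declared in an env file but absent from a container's env dump.
# Both arguments are files of KEY=VALUE lines.
missing_container_keys() {
  local env_file="$1" env_dump="$2"
  local key
  while IFS='=' read -r key _; do
    [ -z "$key" ] && continue
    case "$key" in \#*) continue ;; esac    # skip comment lines
    grep -q "^${key}=" "$env_dump" || echo "$key"
  done < "$env_file"
}

# Example:
#   docker exec atlasml env > /tmp/container.env
#   missing_container_keys /opt/atlasml/.env /tmp/container.env
```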

Issue: Old Version Still Running

Solution:

# Force pull new image
docker-compose -f docker-compose.prod.yml pull

# Remove old container
docker-compose -f docker-compose.prod.yml down

# Start with new image
docker-compose -f docker-compose.prod.yml up -d

# Verify version (check logs for startup message)
docker logs atlasml | grep "Starting"

Monitoring Deployments

Deployment Metrics

Track these metrics during and after deployment:

  • Response Time: Should remain stable
  • Error Rate: Should not increase
  • Memory Usage: Check for memory leaks
  • CPU Usage: Should be within normal range
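Response time can be sampled directly with curl's built-in timer against the health endpoint from the verification steps above (a quick spot check, not a substitute for real monitoring):

```shell
# Print the total request time in seconds for a single health-check request.
health_latency() {
  local url="$1"
  curl -s -o /dev/null -w '%{time_total}' "$url"
}

# Example:
#   health_latency http://localhost/api/v1/health
```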

Using Sentry

If Sentry is configured, monitor:

# Check Sentry dashboard at
https://sentry.io/organizations/your-org/issues/

# Filter by:
- Environment: production
- Time: Last hour
- Release: v1.2.0

Using Docker Stats

# Monitor resource usage
docker stats atlasml

# Example output:
# CONTAINER   CPU %   MEM USAGE / LIMIT   MEM %
# atlasml     2.5%    256MB / 2GB         12.8%
