AtlasML Deployment Guide
This guide covers deployment workflows, CI/CD integration, and production deployment strategies for AtlasML.
Deployment Overview
AtlasML uses a containerized deployment model with GitHub Actions for CI/CD.
GitHub Actions Workflows
1. Build and Push Docker Image
File: .github/workflows/atlas_build-and-push-docker.yml
This workflow automatically builds and pushes Docker images when code changes.
Trigger Conditions
on:
  push:
    paths:
      - 'atlas/**'
      - '.github/workflows/atlas_build-and-push-docker.yml'
  release:
    types:
      - created
Triggers on:
- Any push to the atlas/ directory
- Workflow file changes
- New GitHub release
What It Does
- Builds Docker image using /atlas/AtlasMl/Dockerfile
- Tags image with:
  - Branch name (e.g., main, feature-branch)
  - PR number (e.g., pr-123)
  - Release version (e.g., v1.2.0)
- Pushes to GitHub Container Registry (ghcr.io)
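The tagging rules above can be sketched as a small shell helper. This is an illustration only: the function names are hypothetical, the actual workflow may compute tags differently, and replacing `/` with `-` in branch names is an assumption based on the fact that Docker tags cannot contain slashes.

```shell
# Map a Git ref to the image tag it would roughly receive (illustrative only).
image_tag_for_ref() (
  ref=$1
  case "$ref" in
    # Branch: use the branch name, with slashes replaced (assumption)
    refs/heads/*) printf '%s\n' "${ref#refs/heads/}" | tr '/' '-' ;;
    # Pull request: refs/pull/<number>/merge becomes pr-<number>
    refs/pull/*)  n=${ref#refs/pull/}; printf 'pr-%s\n' "${n%%/*}" ;;
    # Release tag: use the tag name as-is
    refs/tags/*)  printf '%s\n' "${ref#refs/tags/}" ;;
    *)            echo "unsupported ref: $ref" >&2; exit 1 ;;
  esac
)

# Build the full image reference for a ref.
image_ref_for() (
  tag=$(image_tag_for_ref "$1") || exit 1
  printf 'ghcr.io/ls1intum/edutelligence/atlasml:%s\n' "$tag"
)
```

For example, `image_ref_for refs/tags/v1.2.0` prints the reference you would later pass to `docker pull`.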
Usage
Automatic (on push):
# Make changes
git add atlas/AtlasMl/
git commit -m "Update AtlasML"
git push
# GitHub Actions automatically builds and pushes image
# Result: ghcr.io/ls1intum/edutelligence/atlasml:main
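Manual trigger (available only if the workflow also declares a workflow_dispatch trigger, which the excerpt above does not show):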
- Go to GitHub Actions tab
- Select "AtlasML - Build Docker Images"
- Click "Run workflow"
- Select branch
- Click "Run workflow"
2. Deploy to Test Environment
File: .github/workflows/atlas_deploy-test.yml
Manual deployment workflow for test/staging environments.
Workflow Configuration
name: AtlasML - Deploy to Test 1
on:
  workflow_dispatch:
    inputs:
      image-tag:
        type: string
        description: 'Image tag to deploy'
        required: true
      deploy-atlasml:
        type: boolean
        default: true
        description: (Re-)deploys AtlasML
Steps
Step 1: Provision Environment
Writes .env file to remote server:
- name: Write .env to remote host
  uses: appleboy/ssh-action@v1.0.3
  with:
    host: ${{ secrets.SSH_HOST }}
    username: ${{ secrets.SSH_USERNAME }}
    key: ${{ secrets.SSH_PRIVATE_KEY }}
    script: |
      sudo mkdir -p /opt/atlasml
      cat << EOF | sudo tee /opt/atlasml/.env > /dev/null
      PYTHONPATH='${{ secrets.PYTHONPATH }}'
      WEAVIATE_HOST='${{ secrets.WEAVIATE_HOST }}'
      WEAVIATE_PORT='${{ secrets.WEAVIATE_PORT }}'
      ATLAS_API_KEYS='${{ secrets.ATLAS_API_KEYS }}'
      OPENAI_API_KEY='${{ secrets.OPENAI_API_KEY }}'
      OPENAI_API_URL='${{ secrets.OPENAI_API_URL }}'
      ENV='${{ secrets.ENV }}'
      EOF
      sudo chmod 600 /opt/atlasml/.env
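Because every value in this file comes from a repository secret, an unset secret silently produces an empty entry. A minimal sketch of a check that could be run on the server after provisioning follows; the function name `missing_env_keys` is hypothetical, and the key list is copied from the step above.

```shell
# Keys written by the provisioning step above
REQUIRED_KEYS="PYTHONPATH WEAVIATE_HOST WEAVIATE_PORT ATLAS_API_KEYS OPENAI_API_KEY OPENAI_API_URL ENV"

# Print every required key that is absent from the file or has an empty value.
# Assumes simple KEY='value' or KEY=value lines without embedded quotes.
missing_env_keys() (
  file=$1
  for key in $REQUIRED_KEYS; do
    grep -Eq "^${key}='?[^']+'?\$" "$file" || printf '%s\n' "$key"
  done
)
```

Running `sudo sh` with this function against /opt/atlasml/.env and getting no output suggests that all secrets were populated; any printed key points to a secret that is missing in the repository settings.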
Step 2: Deploy AtlasML
Uses reusable workflow to deploy:
# Reusable workflows are called at the job level (jobs.<job_id>.uses), not as a step.
# The job id below is illustrative.
deploy-atlasml:
  name: Deploy AtlasML
  uses: ls1intum/.github/.github/workflows/deploy-docker-compose.yml@main
  with:
    environment: 'AtlasML - Test 1'
    docker-compose-file: './atlas/docker-compose.prod.yml'
    main-image-name: ls1intum/edutelligence/atlasml
    image-tag: ${{ inputs.image-tag }}
    deployment-base-path: '/opt/atlasml'
How to Deploy
- Go to GitHub Actions
  - Navigate to repository → Actions tab
- Select Workflow
  - Click "AtlasML - Deploy to Test 1"
- Run Workflow
  - Click "Run workflow" button
  - Select branch (usually main)
  - Enter image tag:
    - Branch name: main
    - PR number: pr-123
    - Version: v1.2.0
  - Click "Run workflow"
- Monitor Progress
  - Watch workflow execution in real-time
  - Check logs for any errors
Example:
Image tag: main
Deploy AtlasML: ✓ (checked)
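The same dispatch can be scripted with the GitHub CLI. The sketch below assumes `gh` is installed and authenticated; the function names are hypothetical, and the tag check follows Docker's tag syntax (at most 128 characters from letters, digits, `_`, `.` and `-`, not starting with `.` or `-`).

```shell
# Succeeds only if the argument is a syntactically valid Docker image tag.
valid_image_tag() {
  printf '%s' "$1" | grep -Eq '^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$'
}

# Validate the tag, then trigger the test deployment workflow from main.
dispatch_test_deploy() (
  tag=$1
  valid_image_tag "$tag" || { echo "invalid image tag: $tag" >&2; exit 1; }
  gh workflow run atlas_deploy-test.yml --ref main \
    -f image-tag="$tag" -f deploy-atlasml=true
)
```

Calling `dispatch_test_deploy pr-123` would queue the run; `gh run watch` can then follow its progress from the terminal.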
Deployment Strategies
Strategy 1: Continuous Deployment (Dev/Test)
For development and test environments, deploy automatically on every push.
# Add to .github/workflows/atlas_deploy-test.yml
# Note: inputs.image-tag is empty on push events, so the deploy job needs a fallback tag
on:
  push:
    branches:
      - develop
    paths:
      - 'atlas/**'
Flow:
Code Push → Auto Build → Auto Deploy to Test
Strategy 2: Manual Deployment (Staging)
For staging, require manual approval.
jobs:
  deploy:
    environment:
      name: staging
      # Approval is enforced only if required reviewers are configured
      # for this environment in the repository settings
Flow:
Code Push → Auto Build → Manual Approval → Deploy to Staging
Strategy 3: Release-Based Deployment (Production)
For production, only deploy tagged releases.
on:
  release:
    types:
      - published
Flow:
Create Release → Auto Build → Manual Deploy with Version Tag → Production
Production Deployment Process
Step-by-Step Production Deployment
1. Prepare Release
# Ensure main branch is stable
git checkout main
git pull origin main
# Run tests locally
cd atlas/AtlasMl
poetry run pytest
# Check for linting issues
poetry run ruff check .
poetry run black --check .
2. Create GitHub Release
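# Create and push tag
git tag -a v1.2.0 -m "Release version 1.2.0"
git push origin v1.2.0
Pushing a tag does not by itself create a GitHub release, so publish one for the tag afterwards. In the GitHub UI (which can also create the tag in the same step):
- Go to Releases → Draft a new release
- Choose tag: v1.2.0 (or create it as a new tag)
- Title: AtlasML v1.2.0
- Description: Release notes
- Click "Publish release"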
3. Wait for Image Build
GitHub Actions automatically:
- Builds Docker image
- Tags as v1.2.0
- Pushes to ghcr.io/ls1intum/edutelligence/atlasml:v1.2.0
Monitor at: Actions → "AtlasML - Build Docker Images"
4. Deploy to Production
- Go to Actions → "AtlasML - Deploy to Production"
- Click "Run workflow"
- Enter image tag: v1.2.0
- Click "Run workflow"
- Monitor deployment logs
5. Verify Deployment
# SSH to production server
ssh production-server
# Check container status
docker ps | grep atlasml
# Check health
curl http://localhost/api/v1/health
# Check logs
docker logs atlasml --tail 50
# Test endpoint
curl -X POST http://localhost/api/v1/competency/suggest \
-H "Authorization: your-api-key" \
-H "Content-Type: application/json" \
-d '{"description":"Python programming","course_id":1}'
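A container may need a few seconds before the health endpoint responds, so a single `curl` right after deployment can fail spuriously. The following is a minimal retry sketch; `wait_until` and `health_ok` are hypothetical helper names, and the URL is the one used in the commands above.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until <attempts> <delay-seconds> <command> [args...]
wait_until() (
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then exit 0; fi
    # Sleep between attempts, but not after the last one
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  exit 1
)

# Succeeds when the health endpoint answers with a 2xx status.
health_ok() {
  curl -fsS --max-time 5 http://localhost/api/v1/health > /dev/null
}
```

For example, `wait_until 12 5 health_ok || echo "AtlasML did not become healthy"` polls for roughly one minute before giving up.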
6. Monitor Post-Deployment
- Check Sentry for errors (if configured)
- Monitor logs for first hour
- Verify Artemis integration working
- Check response times
Rollback Procedure
If deployment fails or issues are discovered:
Quick Rollback
# SSH to server
ssh production-server
cd /opt/atlasml
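# Set previous version in .env, replacing any existing IMAGE_TAG line
# (sudo because the deploy workflow writes .env as root with mode 600)
sudo sh -c 'grep -v "^IMAGE_TAG=" .env > .env.new; echo "IMAGE_TAG=v1.1.0" >> .env.new'
sudo chmod 600 .env.new
sudo mv .env.new .env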
# Pull and restart
docker-compose -f docker-compose.prod.yml pull
docker-compose -f docker-compose.prod.yml up -d
# Verify
docker logs atlasml
curl http://localhost/api/v1/health
Rollback via GitHub Actions
- Go to Actions → "AtlasML - Deploy to Production"
- Click "Run workflow"
- Enter previous image tag: v1.1.0
- Click "Run workflow"
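When it is not obvious which tag preceded the failing release, the Git tags can be consulted. This sketch assumes release tags follow the `vMAJOR.MINOR.PATCH` pattern used in this guide and that `sort -V` (version sort, available in GNU coreutils) is present; `previous_version` is a hypothetical helper name.

```shell
# Read tags from stdin and print the version that precedes the given one.
previous_version() (
  current=$1
  # Version-sort in descending order, then print the first tag after the current one
  sort -rV | awk -v cur="$current" 'found { print; exit } $0 == cur { found = 1 }'
)
```

Typical use from a checkout: `git tag --list 'v*' | previous_version v1.2.0`. The result is the value to enter as the image tag in the rollback workflow.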
Zero-Downtime Deployment
For high-availability setups:
Using Load Balancer
Process:
- Deploy new version to Instance 2
- Wait for health check to pass
- Load balancer routes traffic to Instance 2
- Update Instance 1
- Both instances now running new version
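The process above can be sketched as a loop that updates one instance at a time and stops at the first unhealthy one. Everything here is an assumption for illustration: the host names, SSH access, and the hooks `deploy_to` and `healthy` are hypothetical, and the load balancer is expected to route traffic based on its own health checks.

```shell
# Hypothetical hook: deploy the given tag on one host via SSH.
deploy_to() {
  ssh "$1" "cd /opt/atlasml && IMAGE_TAG=$2 docker-compose -f docker-compose.prod.yml up -d"
}

# Hypothetical hook: succeed when the host reports healthy.
healthy() {
  ssh "$1" "curl -fsS http://localhost/api/v1/health" > /dev/null
}

# Update hosts one by one; abort before touching the next host on any failure.
# Usage: rolling_update <tag> <host> [host...]
rolling_update() (
  tag=$1; shift
  for host in "$@"; do
    deploy_to "$host" "$tag" || { echo "deploy failed on $host" >&2; exit 1; }
    healthy "$host" || { echo "$host unhealthy, stopping rollout" >&2; exit 1; }
  done
)
```

Calling `rolling_update v1.2.0 instance-2 instance-1` mirrors the order in the list: Instance 2 first, Instance 1 only after Instance 2 is healthy. In practice the `healthy` hook would usually retry for a while rather than probe once.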
Using Docker Compose
# docker-compose.prod.yml
services:
  atlasml:
    image: ghcr.io/ls1intum/edutelligence/atlasml:${IMAGE_TAG}
    deploy:
      replicas: 2
      update_config:
        parallelism: 1
        delay: 10s
        order: start-first
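Deploy (update_config is honored only in Docker Swarm mode, so deploy the file as a stack; a plain docker-compose up ignores these settings; the stack name atlasml is an example):
IMAGE_TAG=v1.2.0 docker stack deploy -c docker-compose.prod.yml atlasml
In Swarm mode this replaces containers one at a time, starting each new container before stopping the old one, so at least one is always running.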
Environment-Specific Configurations
Development
# .env.dev
ENV=development
ATLAS_API_KEYS=dev-key
WEAVIATE_HOST=localhost
OPENAI_API_KEY=dev-key
Staging
# .env.staging
ENV=staging
ATLAS_API_KEYS=staging-key-1,staging-key-2
WEAVIATE_HOST=weaviate-staging.internal
OPENAI_API_KEY=staging-key
SENTRY_DSN=https://...@sentry.../staging
Production
# .env.production
ENV=production
ATLAS_API_KEYS=prod-key-1,prod-key-2,prod-key-3
WEAVIATE_HOST=weaviate.internal
OPENAI_API_KEY=prod-key
SENTRY_DSN=https://...@sentry.../production
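One way to keep these files apart on the command line is a small lookup from environment name to file. The helper name `env_file_for` is hypothetical; the file names are the ones shown in the comments above.

```shell
# Map an environment name to its env file.
env_file_for() {
  case "$1" in
    development) echo ".env.dev" ;;
    staging)     echo ".env.staging" ;;
    production)  echo ".env.production" ;;
    *)           echo "unknown environment: $1" >&2; return 1 ;;
  esac
}
```

For example, `docker-compose --env-file "$(env_file_for staging)" -f docker-compose.prod.yml config` prints the resolved configuration for staging without starting anything.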
Automated Deployment with CI/CD
Full CI/CD Pipeline
# .github/workflows/atlas_ci_cd.yml
name: AtlasML CI/CD
on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]
  release:
    types: [published]  # required for the deploy-prod condition below
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.13'
      - name: Install dependencies
        run: |
          cd atlas/AtlasMl
          pip install poetry
          poetry install
      - name: Run tests
        run: |
          cd atlas/AtlasMl
          poetry run pytest
      - name: Lint
        run: |
          cd atlas/AtlasMl
          poetry run ruff check .
  build:
    needs: test
    # Build for branch pushes and releases, not for pull requests;
    # the deploy jobs are skipped whenever this job is skipped
    if: github.event_name != 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build and push Docker image
        run: echo "Build and push image here"  # placeholder
  deploy-test:
    needs: build
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to test
        run: echo "Deploy to test environment here"  # placeholder
  deploy-prod:
    needs: build
    if: github.event_name == 'release'
    runs-on: ubuntu-latest
    environment:
      name: production
    steps:
      - name: Deploy to production
        run: echo "Deploy to production here"  # placeholder
Deployment Checklist
Pre-Deployment
- Tests passing locally
- Code reviewed and approved
- Documentation updated
- CHANGELOG updated
- Version bumped (if applicable)
- Database migrations prepared (if needed)
- Backup of current production data
During Deployment
- Monitor GitHub Actions logs
- Watch container startup logs
- Verify health check passing
- Check Sentry for errors
- Test critical endpoints
Post-Deployment
- Verify Artemis integration
- Check response times
- Monitor error rates
- Update deployment docs
- Notify team of successful deployment
Deployment Troubleshooting
Issue: Image Pull Fails
Error: Error response from daemon: pull access denied
Solution:
# Login to GHCR (the token needs the read:packages scope)
echo "$GITHUB_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin
# Or ensure image is public
Issue: Container Starts But Unhealthy
Check:
# View health check logs
docker inspect atlasml | grep -A 10 Health
# Test endpoint manually
docker exec atlasml curl http://localhost:8000/api/v1/health
# Check application logs
docker logs atlasml
Common causes:
- Weaviate connection failed
- Missing environment variables
- OpenAI API key invalid
Issue: Old Version Still Running
Solution:
# Force pull new image
docker-compose -f docker-compose.prod.yml pull
# Remove old container
docker-compose -f docker-compose.prod.yml down
# Start with new image
docker-compose -f docker-compose.prod.yml up -d
# Verify version (check logs for startup message)
docker logs atlasml | grep "Starting"
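Grepping the logs depends on the application printing a recognizable startup line. A more direct check, sketched below, compares the tag of the image the container was created from with the expected one. The helper names are hypothetical; `docker inspect --format '{{.Config.Image}}'` prints the image reference the container was started with.

```shell
# Print the image reference a container was created from.
running_image() {
  docker inspect --format '{{.Config.Image}}' "$1"
}

# Succeed only if the container runs an image with the expected tag.
# Usage: runs_tag <container> <expected-tag>
runs_tag() (
  container=$1; expected=$2
  image=$(running_image "$container") || exit 1
  # Everything after the last colon is treated as the tag
  [ "${image##*:}" = "$expected" ]
)
```

For example, `runs_tag atlasml v1.2.0 || echo "old version still running"`.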
Monitoring Deployments
Deployment Metrics
Track these metrics during and after deployment:
- Response Time: Should remain stable
- Error Rate: Should not increase
- Memory Usage: Check for memory leaks
- CPU Usage: Should be within normal range
Using Sentry
If Sentry is configured, check the dashboard at https://sentry.io/organizations/your-org/issues/ and filter by:
- Environment: production
- Time: Last hour
- Release: v1.2.0
Using Docker Stats
# Monitor resource usage
docker stats atlasml
# Example output:
# CONTAINER   CPU %   MEM USAGE / LIMIT   MEM %
# atlasml     2.5%    256MiB / 2GiB       12.50%
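For a scripted check instead of watching the live view, the memory percentage can be read once and compared against a limit. This is a sketch under assumptions: the helper names and the 80 percent limit in the usage note are hypothetical, while `--no-stream` and the `{{.MemPerc}}` format placeholder are standard `docker stats` options.

```shell
# Print the current memory usage of a container as a percentage, e.g. "12.50%".
mem_percent() {
  docker stats --no-stream --format '{{.MemPerc}}' "$1"
}

# Succeed if a percentage such as "85.10%" is above the numeric limit.
# Usage: exceeds_threshold <percentage> <limit>
exceeds_threshold() {
  awk -v v="${1%\%}" -v limit="$2" 'BEGIN { exit !(v + 0 > limit + 0) }'
}
```

For example, `exceeds_threshold "$(mem_percent atlasml)" 80 && echo "memory usage above 80%"` could run from a cron job during the first hour after a deployment.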
Next Steps
- Configuration: Manage environment variables and secrets
- Monitoring: Set up comprehensive monitoring
- Troubleshooting: Resolve production issues
Resources
- GitHub Actions Documentation: https://docs.github.com/en/actions
- Docker Compose: https://docs.docker.com/compose/
- GHCR: https://docs.github.com/en/packages
- SSH Actions: https://github.com/appleboy/ssh-action