Production VM Guide

We maintain two servers, one for production and one for staging:

  • Production server: https://helios.aet.cit.tum.de/

  • Staging server: https://helios-staging.aet.cit.tum.de/

Docker is the only package required on the servers. The production server runs the latest stable version of Helios (main branch), while the staging server runs the latest development version (staging branch).

Both environments use the same Compose file, compose.prod.yml.

Deployment Strategy

  1. Staging Builds: Whenever a push is made to the staging branch, a GitHub Actions workflow triggers a build and deployment to the staging environment.

  2. Production Builds: When a release is created in GitHub’s “Releases” section, a build starts for production. After building the new version, the workflow pauses and waits for approval from a Helios team member before proceeding with deployment.

Before deploying to production, you need to merge the staging branch into the main (production) branch:

git checkout main
git merge --ff-only staging
git push origin main
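Because --ff-only refuses the merge when main contains commits that staging lacks, it can help to check beforehand whether a fast-forward is possible. The following is a minimal sketch, assuming both branches exist locally and are up to date; the function name can_fast_forward is hypothetical and not part of the Helios repository.

```shell
# Hypothetical helper: succeeds when <target> can be fast-forwarded to <source>,
# i.e. when <target> is an ancestor of <source>.
can_fast_forward() {
  local target="$1" source="$2"
  git merge-base --is-ancestor "$target" "$source"
}

# Usage (run inside the repository):
# can_fast_forward main staging && echo "fast-forward possible"
```

If the check fails, main has diverged from staging and the history needs to be reconciled before the merge above can succeed.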

Deployment Directory

All deployments (both staging and production) happen under:

/opt/helios/

On each server, you will find these files in /opt/helios:

  • compose.prod.yml

  • .env

  • heliosapp.converted_key_pkcs8.pem

  • helios-realm.json

File Descriptions

  • compose.prod.yml Docker Compose file used to build and run all Helios services in production or staging mode.

  • .env Environment variable file.

    • Some values are secret (e.g., service credentials).

    • Some values are configuration settings.

    • This file is overwritten on each deployment by the GitHub Actions workflow.

    • If you add a new environment variable, update the workflow to include it.

    • Secrets and variables are stored in GitHub Environments. In the Helios repository settings, there are two environments—“staging” and “production”—each with 20+ variables already configured.

  • heliosapp.converted_key_pkcs8.pem The PEM file for the GitHub App.

    • Used as credentials when making API requests to GitHub.

    • This file is generated by following the “Generate the Private Key” step in the “Creating a GitHub App” guide.

  • helios-realm.json An exported Keycloak realm configuration.

    • Instead of wiping the database, we export/import Keycloak settings via this file.

    • It contains client IDs, client secrets, login page settings, token exchange rules, etc.
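Since the workflow overwrites .env on every deployment, a variable that was added to the application but not to the workflow silently disappears from the file. A small sketch of a presence check, assuming the file uses plain NAME=value lines; the helper name missing_env_vars is hypothetical:

```shell
# Hypothetical helper: print every variable name from the argument list
# that has no NAME=... line in the given env file.
missing_env_vars() {
  local env_file="$1"; shift
  local name
  for name in "$@"; do
    grep -q "^${name}=" "$env_file" || echo "$name"
  done
}

# Usage on a server (variable names taken from this guide):
# missing_env_vars /opt/helios/.env POSTGRES_DB POSTGRES_USER NATS_AUTH_TOKEN
```

An empty output means every listed variable is present; the check says nothing about whether the values are correct.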

Environment Variables

The .env file in /opt/helios contains all environment variables for production/staging deployments. GitHub Actions fills this file during deployment.

Below is the complete list of variables, their purpose, and where they are used.

Core Infrastructure

  • ENVIRONMENT Environment identifier (prod, staging, dev). Status: Unused in compose (informational only).

  • POSTGRES_DB Name of the PostgreSQL database for the application. Used by: postgres (also Keycloak indirectly via DB creation).

  • POSTGRES_PASSWORD PostgreSQL password. Used by: postgres, keycloak.

  • POSTGRES_USER PostgreSQL username. Used by: postgres, keycloak.

  • SPRING_PROFILES_ACTIVE Spring Boot active profile (prod for production). Used by: application-server, notification.

  • DATASOURCE_URL JDBC connection string to PostgreSQL. Used by: application-server.

  • DATASOURCE_USERNAME DB username for application server JDBC. Should match POSTGRES_USER. Used by: application-server.

  • DATASOURCE_PASSWORD DB password for application server JDBC. Should match POSTGRES_PASSWORD. Used by: application-server.
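The requirement that the DATASOURCE_* credentials match the POSTGRES_* credentials can be verified mechanically. The sketch below assumes a simple env file with one unquoted NAME=value assignment per line; both function names are hypothetical.

```shell
# Hypothetical helper: print the value of NAME from a simple NAME=value env file.
# Assumes one assignment per line, no quoting and no "export" prefix.
env_value() {
  local env_file="$1" name="$2"
  sed -n "s/^${name}=//p" "$env_file" | tail -n 1
}

# Succeeds only when the JDBC credentials equal the PostgreSQL credentials.
datasource_matches_postgres() {
  local env_file="$1"
  [ "$(env_value "$env_file" DATASOURCE_USERNAME)" = "$(env_value "$env_file" POSTGRES_USER)" ] &&
  [ "$(env_value "$env_file" DATASOURCE_PASSWORD)" = "$(env_value "$env_file" POSTGRES_PASSWORD)" ]
}

# Usage on a server:
# datasource_matches_postgres /opt/helios/.env || echo "datasource credentials differ"
```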

NATS Messaging

  • NATS_SERVER Host:port of NATS server. Used by: application-server, notification.

  • NATS_AUTH_TOKEN Token for authenticating with NATS. Used by: nats-server, webhook-listener, application-server, notification.

  • NATS_DURABLE_CONSUMER_NAME Durable consumer name for message replay. Used by: application-server (the value for notification is hardcoded in compose.prod.yml as notification-consumer).

  • NATS_CONSUMER_INACTIVE_THRESHOLD_MINUTES Consumer inactivity threshold. Used by: application-server, notification.

  • NATS_CONSUMER_ACK_WAIT_SECONDS Ack wait time for durable consumers. Used by: application-server, notification.

GitHub App / Repository Sync

  • WEBHOOK_SECRET HMAC secret for GitHub webhook validation. Used by: webhook-listener.

  • REPOSITORY_NAME Comma-separated list of repositories to sync. This value can be empty, since all repositories that install the GitHub App are synced automatically. Used by: application-server.

  • ORGANIZATION_NAME GitHub organization name for auto-detection of installation ID. Used by: application-server. Note: Set this value and leave GITHUB_INSTALLATION_ID empty for auto-detection of the GitHub App installation ID.

  • GITHUB_AUTH_TOKEN GitHub Personal Access Token, needed only when the GitHub App is not used. Helios currently uses the GitHub App, so leave this empty. Used by: application-server.

  • RUN_ON_STARTUP_COOLDOWN Minimum number of minutes that must have passed since the last sync before a sync runs on startup. Used by: application-server.

  • SENTRY_DSN Sentry DSN for error reporting. Used by: application-server.

  • DATA_SYNC_RUN_ON_STARTUP Whether to run repository sync on startup. Deploying a new version takes only a couple of minutes, while syncing takes quite some time, so setting this value to false is safe and avoids running a sync on every deployment. Used by: application-server.

  • GITHUB_APP_NAME GitHub App URL-safe name. Used by: application-server.

  • GITHUB_APP_ID Numeric ID of GitHub App. Used by: application-server.

  • GITHUB_CLIENT_ID OAuth Client ID for GitHub App. Used by: application-server.

  • GITHUB_INSTALLATION_ID GitHub App installation ID. Empty if auto-detecting. Used by: application-server.

  • GITHUB_PRIVATE_KEY_PATH Path to PKCS#8-formatted GitHub App private key. Used by: application-server.
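GITHUB_PRIVATE_KEY_PATH must point to a PKCS#8 key, whereas GitHub issues App private keys in PKCS#1 format, so a quick format check can catch an unconverted file. An unencrypted PKCS#8 PEM file begins with "-----BEGIN PRIVATE KEY-----", a PKCS#1 RSA key with "-----BEGIN RSA PRIVATE KEY-----". The helper name pem_key_format and the file names in the comment are hypothetical.

```shell
# Hypothetical helper: classify a PEM private key by its first line.
# Prints "pkcs8", "pkcs1" or "unknown".
pem_key_format() {
  local first_line
  IFS= read -r first_line < "$1" || true
  case "$first_line" in
    "-----BEGIN PRIVATE KEY-----")     echo "pkcs8" ;;
    "-----BEGIN RSA PRIVATE KEY-----") echo "pkcs1" ;;
    *)                                 echo "unknown" ;;
  esac
}

# A PKCS#1 key can be converted with OpenSSL (file names are placeholders):
# openssl pkcs8 -topk8 -inform PEM -outform PEM -nocrypt -in app-key.pem -out app-key.pkcs8.pem
```

The check only inspects the header line; it does not validate the key material itself.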

Authentication / Keycloak

  • KC_BOOTSTRAP_ADMIN_USERNAME Initial Keycloak admin username. Used by: keycloak.

  • KC_BOOTSTRAP_ADMIN_PASSWORD Initial Keycloak admin password. Used by: keycloak.

  • KC_HOSTNAME Public hostname for Keycloak. Used by: keycloak.

  • KC_HTTP_ENABLED Whether to enable HTTP in Keycloak. Used by: keycloak.

  • OAUTH_ISSUER_URL Keycloak realm issuer URL. Used by: application-server.

  • HELIOS_TOKEN_EXCHANGE_CLIENT Keycloak client ID for token exchange. Used by: application-server.

  • HELIOS_TOKEN_EXCHANGE_SECRET Keycloak client secret for token exchange. Used by: application-server.

Notification Service

  • MAIL_HOST SMTP host for sending emails. Used by: notification.

  • MAIL_PORT SMTP port. Used by: notification.

  • EMAIL_ENABLED Enable/disable email sending. Used by: notification.

  • EMAIL_FROM Sender email address. Used by: notification.

Image Tags (Deployment Control)

  • CLIENT_IMAGE_TAG Docker image tag for client. Used by: client.

  • APPLICATION_SERVER_IMAGE_TAG Docker image tag for application server. Used by: application-server.

  • NOTIFICATION_SERVER_IMAGE_TAG Docker image tag for notification service. Used by: notification.

  • WEBHOOK_LISTENER_IMAGE_TAG Docker image tag for webhook listener. Used by: webhook-listener.

  • KEYCLOAK_IMAGE_TAG Docker image tag for Keycloak. Used by: keycloak.

Other Application Server Settings

  • CLEANUP_WORKFLOW_RUN_DRY_RUN If true, cleanup workflow runs in dry-run mode. Used by: application-server.

  • HELIOS_ENVIRONMENT_NAME Used for push-based status updates to Helios. Used by: application-server.

  • HELIOS_PROD_SECRET_KEY Used for push-based status updates to Helios. Used by: application-server.

  • HELIOS_STAGING_SECRET_KEY Used for push-based status updates to Helios. Used by: application-server.

Runtime Containers

A typical production (or staging) environment runs multiple Docker containers under the Helios Compose network. For example, on the staging server:

ge89paj@helios-staging:/opt/helios$ docker ps
CONTAINER ID   IMAGE                                                COMMAND                  CREATED          STATUS                    PORTS                                                                                            NAMES
26207bf832fe   ghcr.io/ls1intum/helios/application-server:staging   "java -javaagent:/ap…"   15 minutes ago   Up 15 minutes             0.0.0.0:8080->8080/tcp, :::8080->8080/tcp                                                        helios-application-server-1
a20e1d75dbc0   ghcr.io/ls1intum/helios/keycloak:staging             "/opt/keycloak/bin/k…"   15 minutes ago   Up 15 minutes             8080/tcp, 8443/tcp, 9000/tcp, 0.0.0.0:8081->8081/tcp, :::8081->8081/tcp                          keycloak
b080f449acb6   ghcr.io/ls1intum/helios/notification:staging         "java -javaagent:/ap…"   15 minutes ago   Up 15 minutes             8080/tcp                                                                                         helios-notification-1
d339928ea5c6   ghcr.io/ls1intum/helios/webhook-listener:staging     "uvicorn app.main:ap…"   15 minutes ago   Up 15 minutes             0.0.0.0:4200->4200/tcp, :::4200->4200/tcp                                                        helios-webhook-listener-1
43bba36b647e   ghcr.io/ls1intum/helios/client:staging               "/docker-entrypoint.…"   15 minutes ago   Up 15 minutes             0.0.0.0:90->80/tcp, :::90->80/tcp                                                                helios-client-1
af2a9ccee144   postgres:16                                          "docker-entrypoint.s…"   15 minutes ago   Up 15 minutes             0.0.0.0:5432->5432/tcp, :::5432->5432/tcp                                                        helios-postgres-1
cf206e171655   nats:2.10.26-alpine                                  "docker-entrypoint.s…"   15 minutes ago   Up 15 minutes (healthy)   0.0.0.0:4222->4222/tcp, :::4222->4222/tcp, 0.0.0.0:8222->8222/tcp, :::8222->8222/tcp, 6222/tcp   helios-nats-server-1
1eda53002e85   nginx:latest                                         "/docker-entrypoint.…"   4 weeks ago      Up 15 minutes             0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp                         nginx
a15495daeb2e   gcr.io/cadvisor/cadvisor                             "/usr/bin/cadvisor -…"   2 months ago     Up 2 months (healthy)     0.0.0.0:9111->8080/tcp                                                                           cadvisor

All containers except nginx and cAdvisor are launched by Compose. The Compose file handles:

  • Application Server

  • Keycloak

  • Notification Service

  • Webhook Listener

  • Client (frontend)

  • PostgreSQL

  • NATS Server
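After a deployment, it can be useful to confirm that every Compose-managed container is running. The sketch below compares the names reported by docker ps against the names shown in the staging listing above; the names on production are assumed to be the same but may differ, and the helper name missing_containers is hypothetical.

```shell
# Container names as they appear in the staging `docker ps` listing above.
EXPECTED_CONTAINERS="helios-application-server-1 keycloak helios-notification-1 helios-webhook-listener-1 helios-client-1 helios-postgres-1 helios-nats-server-1"

# Hypothetical helper: read running container names from stdin (one per line)
# and print each expected name that is not among them.
missing_containers() {
  local running name
  running="$(cat)"
  for name in $EXPECTED_CONTAINERS; do
    printf '%s\n' "$running" | grep -qxF -- "$name" || echo "$name"
  done
}

# Usage on a server:
# docker ps --format '{{.Names}}' | missing_containers
```

Reading the names from standard input keeps the comparison independent of Docker, so the same function works on saved output as well.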

Additional Containers

  • cAdvisor

    • Installed by the ITG admins to feed metrics into Grafana dashboards.

    • Runs independently; not managed by the Helios Compose file.

  • nginx

    • Added manually to the same Docker network as the Compose stack.

    • Created with:

    docker run -d \
      --name nginx \
      --restart unless-stopped \
      -p 80:80 -p 443:443 \
      -v /etc/nginx/conf/nginx.conf:/etc/nginx/nginx.conf:ro \
      -v /var/lib/rbg-cert:/var/lib/rbg-cert:ro \
      --net helios-network \
      nginx:latest
    

    The nginx configuration files for each environment are in the repository root: nginx.prod.conf and nginx.staging.conf. There is no automation to copy these files to the server; you must manually copy the appropriate file to /etc/nginx/conf/nginx.conf on the server.

    • SSL/TLS Certificates:

      We are using SSL certificates provided by TUM, which are officially issued and valid for 1 year.

      The certificate files are symlinked to auto-generated paths within /var/lib/rbg-cert, and nginx is configured to use them directly. Because these are symlinks, nginx only needs to be restarted once a year—when the certificates are renewed—to pick up the updated files.

      Production

      ssl_certificate     /var/lib/rbg-cert/live/host:f:asevm84.cit.tum.de.fullchain.pem;
      ssl_certificate_key /var/lib/rbg-cert/live/host:f:asevm84.cit.tum.de.privkey.pem;
      

      Staging

      ssl_certificate     /var/lib/rbg-cert/live/host:f:asevm90.cit.tum.de.fullchain.pem;
      ssl_certificate_key /var/lib/rbg-cert/live/host:f:asevm90.cit.tum.de.privkey.pem;
      

      After each deployment from GitHub, the deployment script runs:

      docker restart nginx
      

      This ensures that nginx’s internal routing rules and certificate references are reloaded and point to the newly created container IPs.

    Warning

    The renewal of the certificates is handled by the TUM ITG team. Every year, we need to restart the nginx container to apply the new certificates:

    docker restart nginx
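To notice an upcoming renewal before the certificate expires, the remaining validity can be checked with OpenSSL's x509 -checkend option. This is a sketch only: it assumes openssl is available on the host and that the fullchain file is readable by the current user; the helper name cert_valid_for_days is hypothetical.

```shell
# Hypothetical helper: succeeds when the certificate in <file> is still valid
# <days> days from now (uses the OpenSSL x509 -checkend option).
cert_valid_for_days() {
  local cert_file="$1" days="$2"
  openssl x509 -checkend "$(( days * 86400 ))" -noout -in "$cert_file" >/dev/null
}

# Usage on the staging server (path taken from the nginx configuration above):
# cert_valid_for_days "/var/lib/rbg-cert/live/host:f:asevm90.cit.tum.de.fullchain.pem" 30 \
#   || echo "certificate expires within 30 days"
```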
    

Helios Network

ge89paj@helios-staging:/opt/helios$ docker network ls
NETWORK ID     NAME             DRIVER    SCOPE
bc2e43954dc6   bridge           bridge    local
c67bf6ea6aa7   helios-network   bridge    local
5180e745d32e   host             host      local
40c45d8673a4   none             null      local

The Compose file defines a custom network named helios-network (see the end of compose.prod.yml). All Helios containers (application server, Keycloak, notification service, webhook listener, client, PostgreSQL, NATS) connect to this network. The manually‐run nginx container must also join helios-network so that it can route traffic to and from these services.

Docker Volumes

ge89paj@helios-staging:/opt/helios$ docker volume ls
DRIVER    VOLUME NAME
local     helios_db-data
local     helios_nats-data

  • helios_db-data: Stores the PostgreSQL database data. Warning: Do not remove this volume, as there is currently no backup of the database.

  • helios_nats-data: Stores NATS JetStream data for event persistence. If you need to reclaim disk space, you can safely remove helios_nats-data; doing so will clear all persisted NATS state, but won’t impact the PostgreSQL data.