Local LLM Setup

TUMApply uses Spring AI for AI-powered features like job description generation and translation. To run these locally, you need a local LLM instance.

This guide explains how to set up a local LLM service for development using LM Studio.
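
For orientation: the server's local profile is expected to point Spring AI at this local endpoint. A minimal sketch using Spring AI's standard OpenAI-compatible properties (the exact keys and profile file used by TUMApply may differ; treat this as an assumption to check against the repository's config):

spring:
  ai:
    openai:
      base-url: http://localhost:1234 # LM Studio's OpenAI-compatible endpoint
      api-key: lm-studio # LM Studio ignores the key, but Spring AI requires a non-empty value
      chat:
        options:
          model: openai/gpt-oss-20b # must match the model loaded in LM Studio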

Prerequisites
  • A running server instance with the local profile
  • The latest version of LM Studio
  • Sufficient disk space for downloading LLM models (typically 10-40GB per model)

Setting up LM Studio

You can set up and use LM Studio either via the GUI or CLI as follows:

LM Studio GUI

1. Install LM Studio

Download LM Studio using the app installer from https://lmstudio.ai/download

2. Download the Model

  • Open the LM Studio GUI
  • Go into the Discover tab in the side navigation bar
  • Download the gpt-oss-20b model (this may take a while depending on your internet connection)
  • Make sure "Enable Local LLM Service" is active

3. Configure and activate the model

  • Go into the Developer tab in the side navigation bar
  • Open the "Select model to load" dropdown menu at the top (middle):
    • Enable "Manually choose model load parameters"
    • Click on "OpenAI's gpt-oss 20B"
    • Set the context length to 32000
    • Click the "Load model" button

4. Start the LMS server

Toggle the "Status" switch to "Running".

The LLM service should now be running on http://localhost:1234.

LM Studio CLI

1. Install LM Studio

Install LM Studio using Homebrew by running the command below. This will install the LM Studio app (GUI), which comes packaged with the CLI. For more information, see the LM Studio CLI documentation.

brew install --cask lm-studio

2. Download the Model

lms get openai/gpt-oss-20b

3. Configure and activate the model

lms load openai/gpt-oss-20b --context-length 32000

4. Start the LMS server

lms server start

The LLM service should now be running on http://localhost:1234.
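
You can also confirm from the CLI that the model is actually loaded; lms ps lists the models currently in memory:

lms ps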

Verify the Service Works

Verify that http://localhost:1234/api/v0/models lists the available models.

The model openai/gpt-oss-20b should appear in the list.
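
For example, from a terminal (no authentication is needed for LM Studio's built-in REST API):

curl http://localhost:1234/api/v0/models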

Testing the Integration

  1. Start the TUMApply server
  2. Call the storyWithStream API endpoint, i.e. GET /api/ai/generate?message={message}, either via Swagger at http://localhost:8080/swagger-ui.html or via clients like Postman (see the curl sketch after this list)

Note: Calling the API requires authentication.

  3. View the LMS server logs
    1. via the CLI: lms log stream
    2. or by opening the LM Studio GUI and going into the Developer view
  4. Verify that the logs
    1. include the prompt sent
    2. show progress status
    3. show the generated prediction
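
For a quick smoke test from a terminal, something like the following should trigger a generation (the Bearer token is a placeholder; obtain a valid token the same way your local client setup does, and the message text is just an example):

curl -H "Authorization: Bearer <access-token>" "http://localhost:8080/api/ai/generate?message=Tell%20me%20a%20short%20story"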

Lightweight Alternative

If gpt-oss-20b requires more RAM than your machine has, you can use a smaller model instead:

GUI: Search for google/gemma-3-1b in the Discover tab and download it.

CLI:

lms get google/gemma-3-1b
lms load google/gemma-3-1b --context-length 32000

Note: The lighter model will produce lower-quality results but is sufficient for local development and testing the AI integration.

Performance Considerations
  • Model Size: Larger models provide better results but require more RAM and processing power. The gpt-oss-20b model is a good balance for development. Use google/gemma-3-1b if your machine cannot handle it.
  • Context Length: The --context-length 32000 parameter controls how much text the model can process at once. Adjust it based on your hardware capabilities; see the example after this list.
  • GPU Acceleration: LM Studio automatically uses GPU acceleration when available (Metal on macOS, CUDA on Linux/Windows).
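
For example, if the full 32000-token context does not fit into memory on your machine, a smaller context window (the value below is just an illustration) reduces RAM usage:

lms load openai/gpt-oss-20b --context-length 8192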

🛠 Troubleshooting

Exit Code 6 When Loading a Model

If you encounter the following error:

🥲 Failed to load the model
Error loading model. (Exit code: 6)

Warning: This usually means the model cannot load because the required runtime engine is missing or broken.

✅ Fix: Repair the MLX Runtime (macOS Apple Silicon)
  1. Open LM Studio

  2. In the left menu, click LM Runtimes

  3. Under Runtime Extension Packs, find:

    LM Studio MLX

    (Engine used for MLX / Apple Silicon models)

  4. If LM Studio detects an issue, a Fix button will appear

  5. Click Fix to reinstall, update, or repair the engine

After the repair completes, try loading your model again.