Architecture
This page explains how Iris works at a conceptual level. For implementation details, see the Developer Guide.
High-Level Overview
Iris sits between Artemis and one or more Large Language Models (LLMs). The overall flow is:
- Artemis sends a request to Iris via a REST API (e.g., "the student asked a question in the exercise chat").
- An Iris pipeline processes the request — selecting the right strategy, gathering context, and orchestrating LLM calls.
- The LLM generates a response, potentially calling tools to gather additional information.
- Status callbacks return results to Artemis incrementally, so students see a streaming response.
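The flow above can be sketched in a few lines. This is an illustrative sketch only: the names (`StatusUpdate`, `ArtemisCallback`, `handle_chat_request`) are hypothetical stand-ins, not the actual Iris API, and the in-memory callback stands in for the HTTP status callbacks sent back to Artemis.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StatusUpdate:
    """One incremental status message (hypothetical shape)."""
    stage: str
    result: Optional[str] = None

class ArtemisCallback:
    """Stands in for the HTTP status-callback channel back to Artemis."""
    def __init__(self):
        self.updates = []

    def send(self, update: StatusUpdate) -> None:
        self.updates.append(update)  # a real callback would POST to Artemis

def handle_chat_request(question: str, callback: ArtemisCallback) -> None:
    # 1. Artemis' request arrives (here reduced to the question string).
    callback.send(StatusUpdate(stage="Gathering context"))
    # 2. A pipeline would select a strategy and orchestrate LLM calls here.
    callback.send(StatusUpdate(stage="Generating response"))
    answer = f"(LLM answer to: {question})"  # placeholder for a real LLM call
    # 3. The final callback carries the result back to Artemis.
    callback.send(StatusUpdate(stage="Done", result=answer))
```

Because each stage emits a callback as soon as it completes, Artemis can show the student progress (and later the streamed answer) instead of waiting for the whole pipeline to finish.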
Pipeline System
Iris uses a pipeline architecture where each type of interaction has its own pipeline. Examples include:
- Course Chat Pipeline — general course-related questions
- Exercise Chat Pipeline — programming and text exercise support
- Lecture Chat Pipeline — questions about lecture content
- Competency Generation Pipeline — generating learning objectives
- Ingestion Pipeline — processing lecture slides and transcripts for RAG
Each pipeline defines which LLM roles it needs (e.g., a primary chat model, a tool-calling model, a reranking model) and can declare dependencies on other pipelines. This makes the system modular: you can swap models per role, and pipelines can reuse shared logic without tight coupling.
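A declaration of roles and dependencies might look like the following sketch. The `PipelineSpec` class and the model identifiers are hypothetical; the real Iris configuration differs, but the idea is the same: models are bound to roles, not hard-coded into pipelines.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PipelineSpec:
    """Declares the LLM roles a pipeline needs and its dependencies (illustrative)."""
    name: str
    llm_roles: Dict[str, str]                    # role -> configured model id
    depends_on: List[str] = field(default_factory=list)

ingestion = PipelineSpec(
    name="ingestion",
    llm_roles={"embedding": "embedding-model-v1"},
)

exercise_chat = PipelineSpec(
    name="exercise-chat",
    llm_roles={"chat": "primary-chat-model", "tools": "tool-calling-model"},
    depends_on=[ingestion.name],  # retrieval relies on previously ingested content
)

# Swapping the model behind a role is a configuration change, not a code change:
exercise_chat.llm_roles["chat"] = "alternative-chat-model"
```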
Agent Execution Flow
For chat pipelines, Iris uses an agent-based execution model. Rather than making a single LLM call, the agent works iteratively:
1. Receive the conversation history and student context.
2. Decide whether additional information is needed (e.g., current code state, build errors, relevant lecture content).
3. Call tools to retrieve that information (code execution analysis, RAG retrieval, Artemis API queries, etc.).
4. Repeat steps 2–3 until the agent has enough context.
5. Generate the final response with citations and appropriate scaffolding level.
This tool-calling loop allows Iris to adapt its behavior dynamically. A simple greeting might require zero tool calls, while a complex debugging question might involve fetching code, running retrieval, and checking test results before responding.
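The loop can be reduced to a small sketch. Everything here is a stand-in: `decide` and `answer` represent LLM calls, the tool registry is a plain dict, and the bounded loop is an assumption about how runaway agents are prevented, not a description of Iris internals.

```python
from typing import Callable, Dict, List, Optional

def run_agent(
    question: str,
    tools: Dict[str, Callable[[], str]],
    decide: Callable[[str, List[str]], Optional[str]],  # LLM picks a tool, or None
    answer: Callable[[str, List[str]], str],            # LLM writes the reply
    max_steps: int = 5,
) -> str:
    context: List[str] = []
    for _ in range(max_steps):        # bounded, so a confused model cannot loop forever
        tool_name = decide(question, context)
        if tool_name is None:         # the agent has gathered enough context
            break
        context.append(tools[tool_name]())  # run the tool, keep its output
    return answer(question, context)

# Stub "LLM": fetch the student's code once, then answer using it.
tools = {"fetch_code": lambda: "def add(a, b): return a - b"}
decide = lambda q, ctx: "fetch_code" if not ctx else None
answer = lambda q, ctx: f"Found a bug in: {ctx[0]}"
reply = run_agent("Why does my add() fail?", tools, decide, answer)
```

A greeting would take the `tool_name is None` branch immediately (zero tool calls); the debugging stub above makes exactly one.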
Retrieval-Augmented Generation (RAG)
Iris uses RAG to ground responses in actual course content rather than relying solely on the LLM's training data. The RAG system has two phases:
Ingestion (Offline)
- Collect course materials — lecture slides, transcripts, FAQs.
- Chunk the content into semantically meaningful segments.
- Embed each chunk into a vector representation.
- Store the vectors in Weaviate (the vector database).
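The four ingestion steps can be sketched as follows. The chunker and embedding here are deliberately toy versions (fixed-size word windows and a letter-frequency vector), and a plain list stands in for Weaviate; real ingestion uses semantic chunking and a proper embedding model.

```python
def chunk(text: str, size: int = 40) -> list[str]:
    """Split text into fixed-size word windows (real chunking is semantic)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> list[float]:
    """Toy letter-frequency embedding, normalized to unit length."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def ingest(document: str, store: list) -> None:
    """Chunk, embed, and store; the list stands in for Weaviate."""
    for c in chunk(document):
        store.append({"text": c, "vector": embed(c)})
```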
Retrieval (At Query Time)
- Rewrite the student's query to improve retrieval quality.
- Retrieve the most relevant chunks from Weaviate using vector similarity search.
- Rerank the retrieved chunks to surface the best matches.
- Generate a response that incorporates the retrieved content with transparent citations.
This ensures that when a student asks about a concept covered in lectures, Iris can point to the specific slide or transcript segment rather than producing a generic explanation.
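The query-time steps can be sketched the same way. The `embed` function below mirrors the toy embedding from the ingestion sketch, `retrieve` does a brute-force cosine search over the in-memory store (Weaviate does this at scale), and `rerank` uses word overlap where Iris would use a reranking model; query rewriting is omitted for brevity.

```python
def embed(text: str) -> list[float]:
    """Toy letter-frequency embedding (same shape as the ingestion sketch)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))  # vectors are unit-length

def retrieve(query: str, store: list, top_k: int = 3) -> list[dict]:
    """Vector similarity search; the real system queries Weaviate."""
    qv = embed(query)
    return sorted(store, key=lambda e: cosine(qv, e["vector"]), reverse=True)[:top_k]

def rerank(query: str, candidates: list[dict]) -> list[dict]:
    """Toy rerank by exact word overlap; Iris would use a reranking model."""
    q_words = set(query.lower().split())
    return sorted(
        candidates,
        key=lambda e: len(q_words & set(e["text"].lower().split())),
        reverse=True,
    )
```

In a real pipeline, each returned chunk carries its source metadata (slide number, transcript timestamp), which is what makes the transparent citations possible.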
What's Next?
- EduTelligence Ecosystem — how Iris connects to other services
- Developer Guide — deep dive into implementation details