RAG Pipeline

Iris uses Retrieval-Augmented Generation (RAG) to ground LLM responses in actual course content. This page covers the ingestion and retrieval pipelines that power lecture and FAQ content lookup.

Architecture Overview

The RAG system has two phases:

Ingestion — Course content (lecture PDFs, transcriptions, FAQs) is processed, chunked, and stored as vectors in Weaviate.
Retrieval — At query time, the student's question is used to find relevant content, which is then provided to the LLM agent as tool output.

Ingestion Phase:
  Artemis → Iris API → Ingestion Pipeline → Weaviate

Retrieval Phase:
  Student Query → Query Rewriting → Vector Search → Reranking → Agent Context

Weaviate Collections

The vector database stores content in five collections, each defined as a schema in src/iris/vector_database/:

Collection	Schema File	Content
`Lectures`	`lecture_unit_page_chunk_schema.py`	Chunked text from lecture PDF pages
`LectureTranscriptions`	`lecture_transcription_schema.py`	Lecture video/audio transcriptions
`LectureUnitSegments`	`lecture_unit_segment_schema.py`	Summaries combining slide + transcription content
`LectureUnits`	`lecture_unit_schema.py`	Lecture unit metadata
`FAQs`	`faq_schema.py`	FAQ question-answer pairs

Each collection stores:

Vector embeddings — Generated using an embedding model (e.g., text-embedding-3-small).
Text content — The original text for display in responses.
Metadata — Course ID, lecture ID, page numbers, etc., used for filtering.

Schema Example

From lecture_unit_page_chunk_schema.py:

class LectureUnitPageChunkSchema(Enum):
    COLLECTION_NAME = "Lectures"
    COURSE_ID = "course_id"
    COURSE_LANGUAGE = "course_language"
    LECTURE_ID = "lecture_id"
    LECTURE_UNIT_ID = "lecture_unit_id"
    PAGE_TEXT_CONTENT = "page_text_content"
    PAGE_NUMBER = "page_number"
    BASE_URL = "base_url"
    PAGE_VERSION = "attachment_version"
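
As a minimal sketch of how such an enum might be used, the snippet below builds the property dictionary for one chunk, taking the property names from the enum values instead of repeating string literals. The `build_chunk_properties` helper and the sample values are hypothetical and only illustrate the idea; the enum is abbreviated to the members the example needs.

```python
from enum import Enum


class LectureUnitPageChunkSchema(Enum):
    # Abbreviated copy of the schema shown above.
    COLLECTION_NAME = "Lectures"
    COURSE_ID = "course_id"
    LECTURE_ID = "lecture_id"
    LECTURE_UNIT_ID = "lecture_unit_id"
    PAGE_TEXT_CONTENT = "page_text_content"
    PAGE_NUMBER = "page_number"


def build_chunk_properties(course_id, lecture_id, lecture_unit_id, page_number, text):
    """Hypothetical helper: key each value by the property name from the enum."""
    schema = LectureUnitPageChunkSchema
    return {
        schema.COURSE_ID.value: course_id,
        schema.LECTURE_ID.value: lecture_id,
        schema.LECTURE_UNIT_ID.value: lecture_unit_id,
        schema.PAGE_NUMBER.value: page_number,
        schema.PAGE_TEXT_CONTENT.value: text,
    }


props = build_chunk_properties(1, 2, 3, 4, "Binary search halves the range.")
```

Keeping the names in one enum means ingestion and retrieval code refer to the same property keys.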

Ingestion Pipelines

Lecture PDF Ingestion

Pipeline: LectureUnitPageIngestionPipeline (src/iris/pipeline/lecture_ingestion_pipeline.py)

This is the most complex ingestion pipeline. The flow is:

Receive PDF — Artemis sends a base64-encoded PDF via the webhook API.
Save to temp file — The PDF is decoded and saved to a temporary file.
Extract pages — PyMuPDF (fitz) extracts text and images from each page.
Image interpretation — If pages contain images, an LLM interprets the image content and merges it with the text.
Text chunking — RecursiveCharacterTextSplitter from LangChain breaks large pages into smaller chunks.
Generate embeddings — Each chunk is embedded using the configured embedding model.
Store in Weaviate — Chunks are batch-inserted into the Lectures collection with metadata.
Cleanup — The temporary PDF file is deleted.
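
The decode, chunk, and cleanup steps above can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the pipeline's code: `extract_pages` is a hypothetical callable standing in for the PyMuPDF extraction step, `chunk_text` is a fixed-size sliding window used in place of LangChain's RecursiveCharacterTextSplitter, and image interpretation, embedding, and the Weaviate insert are left out.

```python
import base64
import os
import tempfile


def save_pdf_to_temp(pdf_base64: str) -> str:
    """Decode a base64 payload and write it to a temporary .pdf file."""
    data = base64.b64decode(pdf_base64)
    fd, path = tempfile.mkstemp(suffix=".pdf")
    with os.fdopen(fd, "wb") as handle:
        handle.write(data)
    return path


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += chunk_size - overlap
    return chunks


def ingest_pdf(pdf_base64: str, extract_pages) -> list[str]:
    """Decode, extract page texts, chunk them, and always remove the temp file."""
    path = save_pdf_to_temp(pdf_base64)
    try:
        pages = extract_pages(path)  # hypothetical: returns one string per page
        return [chunk for page in pages for chunk in chunk_text(page)]
    finally:
        os.remove(path)


payload = base64.b64encode(b"%PDF-1.4 placeholder bytes").decode("ascii")
chunks = ingest_pdf(payload, extract_pages=lambda path: ["a" * 50])
```

Putting the cleanup in a `finally` block removes the temporary file even if extraction or chunking raises.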

The ingestion runs in a separate process managed by IngestionJobHandler. If the same lecture unit is re-ingested, the handler terminates the old process first:

class IngestionJobHandler:
    def add_job(self, process, course_id, lecture_id, lecture_unit_id):
        # If a job already exists for this lecture unit, terminate it
        old_process = self.job_list.get(course_id, {}).get(lecture_id, {}).get(lecture_unit_id)
        if old_process:
            old_process.terminate()
            old_process.join()
        # Start the new process
        process.start()
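
The excerpt above omits how `job_list` is populated. A self-contained sketch of the same pattern, including that bookkeeping, might look like the following. The class name `SimpleIngestionJobHandler`, the lock, and the `FakeProcess` stand-in are assumptions made for illustration; any object with `start`, `terminate`, and `join` (such as `multiprocessing.Process`) would fit.

```python
import threading


class SimpleIngestionJobHandler:
    """Sketch: keep at most one running job per lecture unit."""

    def __init__(self):
        # course_id -> lecture_id -> lecture_unit_id -> process
        self.job_list = {}
        self._lock = threading.Lock()

    def add_job(self, process, course_id, lecture_id, lecture_unit_id):
        with self._lock:
            units = self.job_list.setdefault(course_id, {}).setdefault(lecture_id, {})
            old_process = units.get(lecture_unit_id)
            if old_process is not None:
                # Stop the outdated ingestion before starting the new one.
                old_process.terminate()
                old_process.join()
            units[lecture_unit_id] = process
            process.start()


class FakeProcess:
    """Stand-in that records which process methods were called."""

    def __init__(self):
        self.events = []

    def start(self):
        self.events.append("start")

    def terminate(self):
        self.events.append("terminate")

    def join(self):
        self.events.append("join")


handler = SimpleIngestionJobHandler()
first, second = FakeProcess(), FakeProcess()
handler.add_job(first, course_id=1, lecture_id=2, lecture_unit_id=3)
handler.add_job(second, course_id=1, lecture_id=2, lecture_unit_id=3)
```

After the second call, `first` has been terminated and joined, and only `second` is registered for that lecture unit.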

Transcription Ingestion

Pipeline: TranscriptionIngestionPipeline (src/iris/pipeline/transcription_ingestion_pipeline.py)

Processes lecture video/audio transcriptions:

Receives transcription text from Artemis.
Segments and chunks the transcription.
Generates embeddings and stores them in the LectureTranscriptions collection.

FAQ Ingestion

Pipeline: FaqIngestionPipeline (src/iris/pipeline/faq_ingestion_pipeline.py)

Ingests FAQ entries:

Receives FAQ question-answer pairs from Artemis.
Embeds the combined question + answer text.
Stores in the FAQs collection.
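
To illustrate what "combined question + answer text" could mean, here is a small sketch. The text format, the property names, and the injected `embed` callable are all assumptions made for this example; the actual pipeline may combine and store the fields differently.

```python
def faq_embedding_text(question: str, answer: str) -> str:
    """Join question and answer into one string to embed (format is an assumption)."""
    return f"Question: {question.strip()}\nAnswer: {answer.strip()}"


def build_faq_record(faq_id, course_id, question, answer, embed):
    """Build one record; `embed` is any callable that maps text to a vector."""
    return {
        "faq_id": faq_id,
        "course_id": course_id,
        "question": question,
        "answer": answer,
        "vector": embed(faq_embedding_text(question, answer)),
    }


# Toy embedding for demonstration only: a one-dimensional vector of the text length.
record = build_faq_record(
    7, 1, " When is the exam? ", "In July.", embed=lambda text: [float(len(text))]
)
```

Embedding both fields together lets a search match either the wording of the question or the content of the answer.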

Lecture Update Ingestion

Pipeline: LectureIngestionUpdatePipeline (src/iris/pipeline/lecture_ingestion_update_pipeline.py)

Handles updates to already-ingested lectures:

Deletes existing chunks for the lecture unit.
Re-runs the full ingestion pipeline with the updated content.
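
The delete-then-reingest behavior can be pictured with an in-memory list standing in for the Lectures collection. The `replace_lecture_unit_chunks` function and the dictionary layout are hypothetical and only mirror the two steps above.

```python
def replace_lecture_unit_chunks(store, lecture_unit_id, new_texts):
    """Drop every chunk of the given lecture unit, then append the new chunks."""
    kept = [chunk for chunk in store if chunk["lecture_unit_id"] != lecture_unit_id]
    fresh = [
        {"lecture_unit_id": lecture_unit_id, "page_text_content": text}
        for text in new_texts
    ]
    return kept + fresh


store = [
    {"lecture_unit_id": 1, "page_text_content": "old slide 1"},
    {"lecture_unit_id": 1, "page_text_content": "old slide 2"},
    {"lecture_unit_id": 2, "page_text_content": "other unit"},
]
store = replace_lecture_unit_chunks(store, 1, ["new slide 1"])
```

Deleting first ensures that chunks from an older version of the unit cannot be returned alongside the updated content.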

Lecture Unit Deletion

Pipeline: LectureUnitDeletionPipeline (src/iris/pipeline/delete_lecture_units_pipeline.py)

Removes all stored vectors for deleted lecture units.

Retrieval Pipeline

Lecture Content Retrieval

Location: src/iris/retrieval/lecture/lecture_retrieval.py

The LectureRetrieval class is a SubPipeline that orchestrates multi-source retrieval:

class LectureRetrieval(SubPipeline):
    def __call__(self, query, course_id, chat_history, ...) -> LectureRetrievalDTO:
        # 1. Get lecture unit metadata
        lecture_unit = self.get_lecture_unit(course_id, lecture_id, lecture_unit_id)

        # 2. Rewrite the student query for better retrieval
        rewritten_query = self.rewrite_query(query, chat_history)

        # 3. Retrieve from three sources in parallel
        #    - Lecture page chunks
        #    - Lecture transcriptions
        #    - Lecture unit segments

        # 4. Rerank results using Cohere
        reranked_results = self.cohere_client.rerank(...)

        # 5. Return combined results as LectureRetrievalDTO

The retrieval process has four stages:

1. Query Rewriting

The student's query is rewritten by an LLM to be better suited for vector search. This uses prompts from lecture_retrieval_prompts.py:

Standard rewriting — Reformulates the question for semantic search.
Hypothetical answer generation — Generates a hypothetical answer that would appear in the lecture content (HyDE technique).
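
The two rewriting modes can be sketched as two prompts sent to the same model. The prompt texts below are illustrative placeholders, not the ones in lecture_retrieval_prompts.py, and `llm` is a hypothetical callable that takes a prompt string and returns the model's text.

```python
from typing import Callable

# Illustrative prompt templates (assumptions, not the project's prompts).
REWRITE_PROMPT = (
    "Rewrite the student's question as a standalone search query.\n"
    "Chat history:\n{history}\n"
    "Question: {query}"
)
HYDE_PROMPT = (
    "Write a short passage that could appear in the lecture material "
    "and that answers this question:\n{query}"
)


def build_search_texts(
    query: str, chat_history: list[str], llm: Callable[[str], str]
) -> dict[str, str]:
    """Return both a rewritten query and a hypothetical answer (HyDE) for search."""
    history = "\n".join(chat_history)
    return {
        "rewritten_query": llm(REWRITE_PROMPT.format(history=history, query=query)),
        "hypothetical_answer": llm(HYDE_PROMPT.format(query=query)),
    }


prompts_seen = []


def fake_llm(prompt: str) -> str:
    """Stub model that records the prompt and returns a fixed string."""
    prompts_seen.append(prompt)
    return "stub response"


texts = build_search_texts("What is it?", ["Student: we discussed hashing"], fake_llm)
```

The idea behind HyDE is that a hypothetical answer is phrased like the lecture content itself, so its embedding may sit closer to relevant chunks than the embedding of a short question.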

2. Multi-source Retrieval

Three sub-retrievers run in parallel using TracedThreadPoolExecutor:

Retriever	Collection	Returns
`LecturePageChunkRetrieval`	Lectures	Page text chunks with page numbers
`LectureTranscriptionRetrieval`	LectureTranscriptions	Transcription segments
`LectureUnitSegmentRetrieval`	LectureUnitSegments	Combined slide+transcription summaries

Each retriever performs vector similarity search filtered by course_id (and optionally lecture_id / lecture_unit_id).
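
The following sketch shows filtered similarity search running concurrently over several collections. It uses in-memory lists and the standard library's `ThreadPoolExecutor` as stand-ins for Weaviate and TracedThreadPoolExecutor; the `search` and `retrieve_all` functions and the object layout are assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(collection, query_vector, course_id, lecture_id=None, limit=2):
    """Filter by course_id (and optionally lecture_id), then rank by similarity."""
    candidates = [
        obj for obj in collection
        if obj["course_id"] == course_id
        and (lecture_id is None or obj["lecture_id"] == lecture_id)
    ]
    candidates.sort(key=lambda obj: cosine(obj["vector"], query_vector), reverse=True)
    return candidates[:limit]


def retrieve_all(collections, query_vector, course_id, lecture_id=None):
    """Run one search per collection concurrently and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(collections))) as pool:
        futures = {
            name: pool.submit(search, objects, query_vector, course_id, lecture_id)
            for name, objects in collections.items()
        }
        return {name: future.result() for name, future in futures.items()}


collections = {
    "Lectures": [
        {"course_id": 1, "lecture_id": 10, "vector": [1.0, 0.0], "text": "A"},
        {"course_id": 1, "lecture_id": 11, "vector": [0.0, 1.0], "text": "B"},
        {"course_id": 2, "lecture_id": 20, "vector": [1.0, 0.0], "text": "C"},
    ],
    "LectureTranscriptions": [
        {"course_id": 1, "lecture_id": 10, "vector": [0.9, 0.1], "text": "T"},
    ],
}
results = retrieve_all(collections, [1.0, 0.0], course_id=1)
```

Because the filter is applied before ranking, content from other courses never competes with the current course's chunks.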

3. Reranking

Retrieved page chunks and transcriptions are reranked using Cohere's reranker (rerank-multilingual-v3.5) to improve relevance ordering. Lecture unit segments are retrieved separately and are not reranked. The reranker is configured in llm_config.yml:

- id: cohere
  name: Cohere Client V2
  type: cohere_azure
  model: "rerank-multilingual-v3.5"
  endpoint: "your_cohere-endpoint"
  api_key: "..."
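
The data flow of this stage (page chunks and transcriptions merged and reranked, segments passed through untouched) can be sketched as follows. The word-overlap scorer is a deliberately naive stand-in for the Cohere reranker and does not reflect Cohere's API; all function names here are hypothetical.

```python
def overlap_score(query: str, text: str) -> float:
    """Naive relevance: share of query words that also occur in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, documents, top_n, score=overlap_score):
    """Order documents by descending score and keep the best top_n."""
    ranked = sorted(documents, key=lambda doc: score(query, doc["text"]), reverse=True)
    return ranked[:top_n]


def rerank_lecture_results(query, page_chunks, transcriptions, segments, top_n=3):
    """Rerank page chunks and transcriptions together; leave segments as retrieved."""
    merged = [dict(doc, source="page_chunk") for doc in page_chunks]
    merged += [dict(doc, source="transcription") for doc in transcriptions]
    return {"reranked": rerank(query, merged, top_n), "segments": segments}


result = rerank_lecture_results(
    "binary search complexity",
    page_chunks=[
        {"text": "sorting algorithms overview"},
        {"text": "binary search complexity is logarithmic"},
    ],
    transcriptions=[{"text": "today we cover binary search"}],
    segments=[{"text": "summary of unit 3"}],
    top_n=2,
)
```

Tagging each document with its source before merging makes it possible to split the reranked list back into page chunks and transcriptions afterwards.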

4. Result Assembly

Results are assembled into a LectureRetrievalDTO:

@dataclass
class LectureRetrievalDTO:
    lecture_unit_page_chunks: list[LectureUnitPageChunkRetrievalDTO]
    lecture_transcriptions: list[LectureTranscriptionRetrievalDTO]
    lecture_unit_segments: list[LectureUnitSegmentRetrievalDTO]

FAQ Retrieval

Location: src/iris/retrieval/faq_retrieval.py

FAQ retrieval works like lecture retrieval but queries the FAQs collection. The course chat and exercise chat pipelines use it when FAQ content is available.

Citation Generation

After the agent produces a response using retrieved content, the CitationPipeline (src/iris/pipeline/shared/citation_pipeline.py) generates citations that link back to specific lecture slides or pages. This runs as a post-processing step to provide source attribution.

The VectorDatabase Class

Location: src/iris/vector_database/database.py

The VectorDatabase class manages the Weaviate connection. The client is created once, under a lock, and shared by every VectorDatabase instance:

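import threading

import weaviate

# settings and the init_*_schema helpers are project-internal imports (omitted here).


class VectorDatabase:
    _lock = threading.Lock()  # guards creation of the shared client
    static_client_instance = None  # one Weaviate client shared by all instances

    def __init__(self):
        # Create the client only once; later instances reuse it.
        with VectorDatabase._lock: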
            if not VectorDatabase.static_client_instance:
                VectorDatabase.static_client_instance = weaviate.connect_to_local(
                    host=settings.weaviate.host,
                    port=settings.weaviate.port,
                    grpc_port=settings.weaviate.grpc_port,
                )
        self.client = VectorDatabase.static_client_instance
        self.lectures = init_lecture_unit_page_chunk_schema(self.client)
        self.transcriptions = init_lecture_transcription_schema(self.client)
        self.lecture_segments = init_lecture_unit_segment_schema(self.client)
        self.lecture_units = init_lecture_unit_schema(self.client)
        self.faqs = init_faq_schema(self.client)

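The shared client is created the first time VectorDatabase is instantiated. The collections are set up in the constructor: each init_* call creates its schema in Weaviate if it does not already exist.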
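
The shared-client pattern can be demonstrated without a running Weaviate instance. In this sketch, `SharedClientHolder` and the injected `connect` callable are hypothetical stand-ins for VectorDatabase and `weaviate.connect_to_local`.

```python
import threading


class SharedClientHolder:
    """Sketch: the first instance connects; later instances reuse that client."""

    _lock = threading.Lock()
    static_client_instance = None
    connect_calls = 0  # counts how often a connection was actually opened

    def __init__(self, connect):
        with SharedClientHolder._lock:
            if SharedClientHolder.static_client_instance is None:
                SharedClientHolder.static_client_instance = connect()
                SharedClientHolder.connect_calls += 1
        self.client = SharedClientHolder.static_client_instance


# object() stands in for a real client; each call would yield a distinct object.
first_holder = SharedClientHolder(connect=object)
second_holder = SharedClientHolder(connect=object)
```

Both holders end up with the same client object, and `connect` is called only once. The lock prevents two threads from opening separate connections at the same time.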
