Playground

Welcome to the Athena Playground Interface, a versatile tool designed for developing, testing, and evaluating Athena modules. This document provides an overview of the Playground’s features, illustrating its capabilities and how to use them effectively.

Base Configuration

The Base Configuration section is your starting point in the Athena Playground. Here, you connect to the Athena instance, monitor the health status of Athena and its modules, and set up your working environment. You can switch between example and evaluation datasets, and choose between Module Requests and Evaluation Mode for varied testing experiences.

Base Info Header Interface of the Athena Playground

Module Requests

This section is designed for testing individual requests to Athena modules, allowing you to observe and understand their responses in isolation. First, select a healthy module from the dropdown menu. Then, you can optionally provide a custom configuration to be used for all subsequent requests. Afterwards, you can test the following requests.

Module Requests: Select Module Interface of the Athena Playground

Get Config Schema

This feature enables you to fetch and view the JSON configuration schema of a module. It is useful for understanding which runtime configuration options a module accepts, for example before providing a custom configuration for subsequent requests.

Module Requests: Get Config Schema Request Interface of the Athena Playground
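
As a rough illustration, this request boils down to a single HTTP call. The base URL and the /config_schema path in the sketch below are assumptions for illustration, not the authoritative Athena API; the Playground issues the equivalent request for you.

```python
# Minimal sketch: fetching a module's JSON configuration schema.
# MODULE_URL and the /config_schema path are assumptions for illustration.
import json

import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

response = requests.get(f"{MODULE_URL}/config_schema", timeout=10)
response.raise_for_status()

schema = response.json()
print(json.dumps(schema, indent=2))  # inspect the module's runtime options
```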

Send Submissions

Send Submissions is a key feature for pushing exercise materials and submissions to Athena modules. It is a foundational step that allows modules to process and analyze the data for later use, such as generating feedback suggestions.

Module Requests: Send Submissions Request Interface of the Athena Playground
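
The sketch below shows what such a request could look like as a plain HTTP call. The /submissions endpoint and the payload fields are assumptions for illustration; the Playground builds the real payload from the selected dataset.

```python
# Minimal sketch: pushing an exercise and its submissions to a module.
# The endpoint path and payload field names are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {
        "id": 1,
        "type": "text",
        "title": "Essay on design patterns",
        "problem_statement": "Discuss two design patterns of your choice.",
    },
    "submissions": [
        {"id": 101, "exercise_id": 1, "text": "The observer pattern ..."},
        {"id": 102, "exercise_id": 1, "text": "The strategy pattern ..."},
    ],
}

response = requests.post(f"{MODULE_URL}/submissions", json=payload, timeout=30)
response.raise_for_status()  # the module typically just acknowledges receipt
```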

Select Submission

Selecting submissions is crucial for improving the efficiency of generated feedback suggestions. This feature allows a module to propose a specific submission, which can then be used to generate feedback suggestions. For instance, CoFee uses this to select the submission with the highest information gain so it can generate more relevant feedback suggestions for the remaining submissions.

Module Requests: Select Submission Request Interface of the Athena Playground
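
A minimal sketch of such a request, assuming a hypothetical /select_submission endpoint that answers with the ID of the proposed submission:

```python
# Minimal sketch: asking a module which submission should be assessed next.
# The endpoint path, payload fields, and response shape are assumptions.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission_ids": [101, 102, 103],  # submissions that still need feedback
}

response = requests.post(f"{MODULE_URL}/select_submission", json=payload, timeout=30)
response.raise_for_status()

selected_id = response.json()  # e.g. 102; a module may also signal no preference
print(f"Module proposes submission {selected_id}")
```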

Send Feedback

Send Feedback enables the transmission of (tutor) feedback to Athena modules. This feature is pivotal in creating a learning loop, where modules can refine their responses based on real feedback.

Module Requests: Send Feedback Request Interface of the Athena Playground
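
Sketched as a plain HTTP call, such a request might look like the following; the /feedbacks path and the feedback fields are assumptions for illustration.

```python
# Minimal sketch: forwarding tutor feedback on a submission to a module.
# The endpoint path and payload field names are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission": {"id": 101, "exercise_id": 1, "text": "The observer pattern ..."},
    "feedbacks": [
        {
            "title": "Missing trade-offs",
            "description": "Also discuss the drawbacks of the pattern.",
            "credits": -1.0,
        }
    ],
}

response = requests.post(f"{MODULE_URL}/feedbacks", json=payload, timeout=30)
response.raise_for_status()
```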

Generate Feedback Suggestions

This function is at the heart of Athena’s feedback mechanism. It responds with generated feedback suggestions for a given submission.

Module Requests: Generate Feedback Suggestions Request Interface of the Athena Playground
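
A minimal sketch of such a request, assuming a hypothetical /feedback_suggestions endpoint that returns a list of suggestion objects:

```python
# Minimal sketch: requesting feedback suggestions for a single submission.
# The endpoint path and response shape are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission": {"id": 103, "exercise_id": 1, "text": "The singleton pattern ..."},
}

response = requests.post(f"{MODULE_URL}/feedback_suggestions", json=payload, timeout=60)
response.raise_for_status()

for suggestion in response.json():  # assumed: a list of feedback dictionaries
    print(suggestion.get("title"), suggestion.get("credits"))
```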

Request Evaluation

Request Evaluation is essential for assessing the quality of feedback provided by Athena modules. It allows the comparison between module-generated feedback and historical tutor feedback, offering a quantitative analysis of the module’s performance.

Module Requests: Evaluation Request Interface of the Athena Playground
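
The sketch below illustrates the idea: the historical tutor feedback and the module’s suggestions are sent together so the module can score its own output. The /evaluation path and field names are assumptions for illustration.

```python
# Minimal sketch: comparing module-generated feedback with tutor feedback.
# The endpoint path and payload field names are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission": {"id": 101, "exercise_id": 1, "text": "The observer pattern ..."},
    "true_feedbacks": [{"title": "Missing trade-offs", "credits": -1.0}],      # tutor
    "predicted_feedbacks": [{"title": "Discuss drawbacks", "credits": -0.5}],  # module
}

response = requests.post(f"{MODULE_URL}/evaluation", json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # e.g. per-metric scores computed by the module
```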

Evaluation Mode

Evaluation Mode enables comprehensive evaluation and comparison of different modules through experiments.

Define Experiment

Define Experiment allows you to set up and customize experiments. You can choose the execution mode and exercise type and manage the training and evaluation data, laying the groundwork for structured, in-depth module comparison and analysis. Experiments can be exported and imported, allowing you to reuse them and share them with others as benchmarks.

Evaluation Mode: Define Experiment Interface of the Athena Playground
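
To give an idea of what an exported experiment definition might contain, here is a hedged sketch; all field names are assumptions for illustration, and the actual export format is defined by the Playground itself.

```python
# Minimal sketch of an exported experiment definition (field names assumed).
import json

experiment = {
    "executionMode": "batch",
    "exerciseType": "text",
    "exerciseId": 1,
    "trainingSubmissionIds": [101, 102],    # submissions with tutor feedback
    "evaluationSubmissionIds": [103, 104],  # suggestions are generated for these
}

with open("experiment.json", "w") as f:
    json.dump(experiment, f, indent=2)  # share as a benchmark, re-import later
```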

Configure Modules

Here, you can select and configure the modules for your experiment. This step is crucial for ensuring that each module is set up with the appropriate parameters for effective comparison and analysis. Module configurations can be exported and imported, allowing you to reuse them in other experiments and share them with others for reproducibility.

Evaluation Mode: Configure Modules Interface of the Athena Playground
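
As a hedged sketch, an exported set of module configurations could look like the following; the module names and option keys are assumptions for illustration and depend on each module’s configuration schema.

```python
# Minimal sketch of exported per-module configurations (names and keys assumed).
import json

module_configs = {
    "module_text_llm": {"model": "gpt-4o", "max_suggestions": 3},
    "module_text_cofee": {},  # empty: fall back to the module's defaults
}

with open("module_configs.json", "w") as f:
    json.dump(module_configs, f, indent=2)  # re-import in another experiment
```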

Conduct Experiment

You can conduct experiments with modules on exercises. This feature allows you to analyze module performance in generating and evaluating feedback on submissions. The interface is column-based, with the first column displaying the exercise details, the second column displaying the selected submission with historical feedback, and the next columns displaying the generated feedback suggestions from each module.

Currently, only the batch mode is supported, where all submissions are processed at once and the following steps are performed (sketched below):

  1. Send submissions
  2. Send feedback for training submissions, if there are any
  3. Generate feedback suggestions for all evaluation submissions
  4. Run the automatic evaluation
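
The sketch below expresses this order as plain HTTP calls against a single module. All endpoint paths and payload fields are assumptions for illustration; the Playground drives these steps through its interface.

```python
# Minimal sketch of the batch execution order (endpoints and fields assumed).
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module


def run_batch_experiment(exercise, training_submissions, evaluation_submissions):
    def post(path, body):
        return requests.post(f"{MODULE_URL}{path}", json=body, timeout=60)

    # 1. Send all submissions at once
    post("/submissions", {"exercise": exercise,
                          "submissions": training_submissions + evaluation_submissions})

    # 2. Send historical feedback for the training submissions, if there are any
    for submission in training_submissions:
        post("/feedbacks", {"exercise": exercise, "submission": submission,
                            "feedbacks": submission.get("tutor_feedback", [])})

    # 3. Generate feedback suggestions for every evaluation submission
    suggestions = {
        s["id"]: post("/feedback_suggestions",
                      {"exercise": exercise, "submission": s}).json()
        for s in evaluation_submissions
    }

    # 4. Run the automatic evaluation against the historical tutor feedback
    return {
        s["id"]: post("/evaluation",
                      {"exercise": exercise, "submission": s,
                       "true_feedbacks": s.get("tutor_feedback", []),
                       "predicted_feedbacks": suggestions[s["id"]]}).json()
        for s in evaluation_submissions
    }
```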

Additionally, you can annotate the generated feedback suggestions as a tutor would in the Artemis interface, by accepting or rejecting them.

The results, manual ratings, and automatic evaluation can be exported and imported, allowing you to analyze and visualize the results in other tools, or continue the experiment at a later time.

For Text Exercises

Evaluation Mode: Conduct Experiment Interface for a Text Exercise of the Athena Playground

For Programming Exercises

Expert Evaluation

Expert Evaluation is the process where a researcher enlists experts to assess the quality of feedback provided on student submissions. These experts evaluate how well the feedback aligns with the content of the submissions and predefined metrics such as accuracy, tone, and adaptability. The goal is to gather structured and reliable assessments to improve feedback quality or validate feedback generation methods.

The playground provides two key Expert Evaluation views:

  1. Researcher View: Enables researchers to configure the evaluation process, define metrics, and generate expert links.

  2. Expert View: Allows experts to review feedback and rate its quality based on the defined evaluation metrics.

Researcher View

The Researcher View is accessible in the playground below Evaluation Mode:

Location of the Researcher View

The researcher begins creating a new Expert Evaluation by choosing a name and uploading exercises with submissions and feedback. The researcher then defines the metrics, such as actionability and accuracy, each with a short and a long description. Based on these metrics, the experts will compare the different feedback types.

Defining metrics
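
For illustration, the metrics defined in this step could be represented roughly like this; the field names are assumptions, and the Playground stores them as part of the evaluation configuration.

```python
# Minimal sketch of Expert Evaluation metrics (field names assumed).
metrics = [
    {
        "title": "Actionability",
        "summary": "The feedback tells the student how to improve.",   # short description
        "description": "Rate how concretely the feedback guides the student "
                       "towards improving the submission.",            # long description
    },
    {
        "title": "Accuracy",
        "summary": "The feedback is factually correct.",
        "description": "Rate whether the feedback correctly reflects the content "
                       "of the submission and the grading instructions.",
    },
]
```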

Afterwards, the researcher adds a link for each expert participating in the evaluation. This link should then be shared with the corresponding expert. After finishing the configuration, the researcher can define the experiment and start the Expert Evaluation.

Define experiment

Warning

Once the evaluation has started, the exercises and the metrics can no longer be changed! However, additional expert links can be created.

Instead of uploading the exercises and defining the metrics separately, the researcher can also import an existing configuration at the top of the Researcher View.

After the evaluation has been started and the experts have begun to evaluate, the researcher can track each expert’s progress by clicking the Update Progress button. Evaluation results can be exported at any time during the evaluation using the Download Results button.

View Expert Evaluation progress

Expert View

The Expert View can be accessed through generated expert links. The Side-by-Side tool is used for evaluation.

Side-by-Side tool

Upon clicking the link for the first time, the expert is greeted by a welcome screen that introduces a short tutorial. The following steps are shown and briefly described:

  1. The expert first reads the exercise details to get familiar with the exercise. The details include the problem statement, grading instructions, and a sample solution.

  2. After understanding the exercise, the expert reads through the submission and the corresponding feedback.

  3. The expert then evaluates the feedback using a 5-point Likert scale based on the previously defined metrics.

  4. If the meaning of a metric is unclear, a more detailed explanation can be accessed by clicking the info icon or the Metric Details button.

  5. After evaluating all the different types of feedback, the expert can move on to the next submission and repeat the process. When ready to take a break, the expert clicks the Continue Later button, which saves their progress.