Playground

Welcome to the Athena Playground Interface, a versatile tool designed for developing, testing, and evaluating Athena modules. This document provides an overview of the Playground’s features, illustrating its capabilities and how to use them effectively.

Base Configuration

The Base Configuration section is your starting point in the Athena Playground. Here, you connect to the Athena instance, monitor the health status of Athena and its modules, and set up your working environment. You can switch between example and evaluation datasets, and choose between Module Requests and Evaluation Mode for varied testing experiences.

Base Info Header Interface of the Athena Playground

Module Requests

This section is designed for testing individual requests to Athena modules, allowing you to observe and understand their responses in isolation. First, select a healthy module from the dropdown menu. Then, you can optionally provide a custom configuration to be used for all subsequent requests. Afterwards, you can test the following requests.

Module Requests: Select Module Interface of the Athena Playground

Get Config Schema

This feature enables you to fetch and view the JSON configuration schema of a module. It is useful for understanding which runtime configuration options a module accepts, for example before providing a custom configuration for subsequent requests.

Module Requests: Get Config Schema Request Interface of the Athena Playground
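
As a rough illustration, this request boils down to a single HTTP call. The base URL and the /config_schema path in the sketch below are assumptions for illustration, not the authoritative Athena API; the Playground issues the equivalent request for you.

```python
# Minimal sketch: fetching a module's JSON configuration schema.
# MODULE_URL and the /config_schema path are assumptions for illustration.
import json

import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

response = requests.get(f"{MODULE_URL}/config_schema", timeout=10)
response.raise_for_status()

schema = response.json()
print(json.dumps(schema, indent=2))  # inspect the module's runtime options
```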

Send Submissions

Send Submissions is a key feature for pushing exercise materials and submissions to Athena modules. It is a foundational step that allows modules to process and analyze the data for later use, such as generating feedback suggestions.

Module Requests: Send Submissions Request Interface of the Athena Playground
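
The sketch below shows what such a request could look like as a plain HTTP call. The /submissions endpoint and the payload fields are assumptions for illustration; the Playground builds the real payload from the selected dataset.

```python
# Minimal sketch: pushing an exercise and its submissions to a module.
# The endpoint path and payload field names are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {
        "id": 1,
        "type": "text",
        "title": "Essay on design patterns",
        "problem_statement": "Discuss two design patterns of your choice.",
    },
    "submissions": [
        {"id": 101, "exercise_id": 1, "text": "The observer pattern ..."},
        {"id": 102, "exercise_id": 1, "text": "The strategy pattern ..."},
    ],
}

response = requests.post(f"{MODULE_URL}/submissions", json=payload, timeout=30)
response.raise_for_status()  # the module typically just acknowledges receipt
```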

Select Submission

Selecting submissions is crucial for improving the efficiency of generated feedback suggestions. This feature allows a module to propose a specific submission, which can then be used to generate feedback suggestions. For instance, CoFee uses this to select the submission with the highest information gain so it can generate more relevant feedback suggestions for the remaining submissions.

Module Requests: Select Submission Request Interface of the Athena Playground
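
A minimal sketch of such a request, assuming a hypothetical /select_submission endpoint that answers with the ID of the proposed submission:

```python
# Minimal sketch: asking a module which submission should be assessed next.
# The endpoint path, payload fields, and response shape are assumptions.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission_ids": [101, 102, 103],  # submissions that still need feedback
}

response = requests.post(f"{MODULE_URL}/select_submission", json=payload, timeout=30)
response.raise_for_status()

selected_id = response.json()  # e.g. 102; a module may also signal no preference
print(f"Module proposes submission {selected_id}")
```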

Send Feedback

Send Feedback enables the transmission of (tutor) feedback to Athena modules. This feature is pivotal in creating a learning loop, where modules can refine their responses based on real feedback.

Module Requests: Send Feedback Request Interface of the Athena Playground
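
Sketched as a plain HTTP call, such a request might look like the following; the /feedbacks path and the feedback fields are assumptions for illustration.

```python
# Minimal sketch: forwarding tutor feedback on a submission to a module.
# The endpoint path and payload field names are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission": {"id": 101, "exercise_id": 1, "text": "The observer pattern ..."},
    "feedbacks": [
        {
            "title": "Missing trade-offs",
            "description": "Also discuss the drawbacks of the pattern.",
            "credits": -1.0,
        }
    ],
}

response = requests.post(f"{MODULE_URL}/feedbacks", json=payload, timeout=30)
response.raise_for_status()
```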

Generate Feedback Suggestions

This function is at the heart of Athena’s feedback mechanism. It responds with generated feedback suggestions for a given submission.

Module Requests: Generate Feedback Suggestions Request Interface of the Athena Playground
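
A minimal sketch of such a request, assuming a hypothetical /feedback_suggestions endpoint that returns a list of suggestion objects:

```python
# Minimal sketch: requesting feedback suggestions for a single submission.
# The endpoint path and response shape are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission": {"id": 103, "exercise_id": 1, "text": "The singleton pattern ..."},
}

response = requests.post(f"{MODULE_URL}/feedback_suggestions", json=payload, timeout=60)
response.raise_for_status()

for suggestion in response.json():  # assumed: a list of feedback dictionaries
    print(suggestion.get("title"), suggestion.get("credits"))
```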

Request Evaluation

Request Evaluation is essential for assessing the quality of feedback provided by Athena modules. It allows the comparison between module-generated feedback and historical tutor feedback, offering a quantitative analysis of the module’s performance.

Module Requests: Evaluation Request Interface of the Athena Playground
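
The sketch below illustrates the idea: the historical tutor feedback and the module’s suggestions are sent together so the module can score its own output. The /evaluation path and field names are assumptions for illustration.

```python
# Minimal sketch: comparing module-generated feedback with tutor feedback.
# The endpoint path and payload field names are assumptions for illustration.
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module

payload = {
    "exercise": {"id": 1, "type": "text"},
    "submission": {"id": 101, "exercise_id": 1, "text": "The observer pattern ..."},
    "true_feedbacks": [{"title": "Missing trade-offs", "credits": -1.0}],      # tutor
    "predicted_feedbacks": [{"title": "Discuss drawbacks", "credits": -0.5}],  # module
}

response = requests.post(f"{MODULE_URL}/evaluation", json=payload, timeout=60)
response.raise_for_status()
print(response.json())  # e.g. per-metric scores computed by the module
```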

Evaluation Mode

Evaluation Mode enables comprehensive evaluation and comparison of different modules through experiments.

Define Experiment

Define Experiment allows you to set up and customize experiments. You can choose the execution mode and exercise type and manage the training and evaluation data, laying the groundwork for structured, in-depth module comparison and analysis. Experiments can be exported and imported, allowing you to reuse them and share them with others as benchmarks.

Evaluation Mode: Define Experiment Interface of the Athena Playground
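
To give an idea of what an exported experiment definition might contain, here is a hedged sketch; all field names are assumptions for illustration, and the actual export format is defined by the Playground itself.

```python
# Minimal sketch of an exported experiment definition (field names assumed).
import json

experiment = {
    "executionMode": "batch",
    "exerciseType": "text",
    "exerciseId": 1,
    "trainingSubmissionIds": [101, 102],    # submissions with tutor feedback
    "evaluationSubmissionIds": [103, 104],  # suggestions are generated for these
}

with open("experiment.json", "w") as f:
    json.dump(experiment, f, indent=2)  # share as a benchmark, re-import later
```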

Configure Modules

Here, you can select and configure the modules for your experiment. This step is crucial for ensuring that each module is set up with the appropriate parameters for effective comparison and analysis. Module configurations can be exported and imported, allowing you to reuse them in other experiments and share them with others for reproducibility.

Evaluation Mode: Configure Modules Interface of the Athena Playground
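
As a hedged sketch, an exported set of module configurations could look like the following; the module names and option keys are assumptions for illustration and depend on each module’s configuration schema.

```python
# Minimal sketch of exported per-module configurations (names and keys assumed).
import json

module_configs = {
    "module_text_llm": {"model": "gpt-4o", "max_suggestions": 3},
    "module_text_cofee": {},  # empty: fall back to the module's defaults
}

with open("module_configs.json", "w") as f:
    json.dump(module_configs, f, indent=2)  # re-import in another experiment
```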

Conduct Experiment

You can conduct experiments with modules on exercises. This feature allows you to analyze module performance in generating and evaluating feedback on submissions. The interface is column-based, with the first column displaying the exercise details, the second column displaying the selected submission with historical feedback, and the next columns displaying the generated feedback suggestions from each module.

Currently, only the batch mode is supported, where all submissions are processed at once and the following steps are performed (sketched below):

  1. Send submissions
  2. Send feedback for training submissions, if there are any
  3. Generate feedback suggestions for all evaluation submissions
  4. Run the automatic evaluation
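
The sketch below expresses this order as plain HTTP calls against a single module. All endpoint paths and payload fields are assumptions for illustration; the Playground drives these steps through its interface.

```python
# Minimal sketch of the batch execution order (endpoints and fields assumed).
import requests

MODULE_URL = "http://localhost:5001"  # assumed address of a running module


def run_batch_experiment(exercise, training_submissions, evaluation_submissions):
    def post(path, body):
        return requests.post(f"{MODULE_URL}{path}", json=body, timeout=60)

    # 1. Send all submissions at once
    post("/submissions", {"exercise": exercise,
                          "submissions": training_submissions + evaluation_submissions})

    # 2. Send historical feedback for the training submissions, if there are any
    for submission in training_submissions:
        post("/feedbacks", {"exercise": exercise, "submission": submission,
                            "feedbacks": submission.get("tutor_feedback", [])})

    # 3. Generate feedback suggestions for every evaluation submission
    suggestions = {
        s["id"]: post("/feedback_suggestions",
                      {"exercise": exercise, "submission": s}).json()
        for s in evaluation_submissions
    }

    # 4. Run the automatic evaluation against the historical tutor feedback
    return {
        s["id"]: post("/evaluation",
                      {"exercise": exercise, "submission": s,
                       "true_feedbacks": s.get("tutor_feedback", []),
                       "predicted_feedbacks": suggestions[s["id"]]}).json()
        for s in evaluation_submissions
    }
```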

Additionally, you can annotate the generated feedback suggestions as a tutor would in the Artemis interface, by accepting or rejecting them.

The results, manual ratings, and automatic evaluation can be exported and imported, allowing you to analyze and visualize the results in other tools, or continue the experiment at a later time.

For Text Exercises

Evaluation Mode: Conduct Experiment Interface for a Text Exercise of the Athena Playground

For Programming Exercises

Expert Evaluation

Expert Evaluation is the process where a researcher enlists experts to assess the quality of feedback provided on student submissions. These experts evaluate how well the feedback aligns with the content of the submissions and predefined metrics such as accuracy, tone, and adaptability. The goal is to gather structured and reliable assessments to improve feedback quality or validate feedback generation methods.

The playground provides two key Expert Evaluation views:

  1. Researcher View: Enables researchers to configure the evaluation process, define metrics, and generate expert links.

  2. Expert View: Allows experts to review feedback and rate its quality based on the defined evaluation metrics.

Researcher View

The Researcher View is accessible in the playground below Evaluation Mode:

Location of the Researcher View

The researcher begins creating a new Expert Evaluation by choosing a name and uploading exercises with submissions and feedback. The researcher then defines the metrics, such as actionability and accuracy, each with a short and a long description. Based on these metrics, the experts will compare the different feedback types.

Defining metrics
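
For illustration, the metrics defined in this step could be represented roughly like this; the field names are assumptions, and the Playground stores them as part of the evaluation configuration.

```python
# Minimal sketch of Expert Evaluation metrics (field names assumed).
metrics = [
    {
        "title": "Actionability",
        "summary": "The feedback tells the student how to improve.",   # short description
        "description": "Rate how concretely the feedback guides the student "
                       "towards improving the submission.",            # long description
    },
    {
        "title": "Accuracy",
        "summary": "The feedback is factually correct.",
        "description": "Rate whether the feedback correctly reflects the content "
                       "of the submission and the grading instructions.",
    },
]
```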

Afterwards, the researcher adds a link for each expert participating in the evaluation. This link should then be shared with the corresponding expert. After finishing the configuration, the researcher can define the experiment and start the Expert Evaluation.

Define experiment

Warning

Once the evaluation has started, the exercises and the metrics can no longer be changed! However, additional expert links can be created.

Instead of uploading the exercises and defining the metrics separately, the researcher can also import an existing configuration at the top of the Researcher View.

After the evaluation has been started and the experts have begun to evaluate, the researcher can track each expert’s progress by clicking the Update Progress button. Evaluation results can be exported at any time during the evaluation using the Download Results button.

View Expert Evaluation progress

Expert View

The Expert View can be accessed through generated expert links. The Side-by-Side tool is used for evaluation.

Side-by-Side tool

Upon clicking the link for the first time, the expert is greeted by a welcome screen that introduces a short tutorial. The following steps are shown and briefly described:

  1. The expert first reads the exercise details to get familiar with the exercise. The details include the problem statement, grading instructions, and a sample solution.

  2. After understanding the exercise, the expert reads through the submission and the corresponding feedback.

  3. The expert then evaluates the feedback using a 5-point Likert scale based on the previously defined metrics.

  4. If the meaning of a metric is unclear, a more detailed explanation can be accessed by clicking the info icon or the Metric Details button.

  5. After evaluating all the different types of feedback, the expert can move on to the next submission and repeat the process. When ready to take a break, the expert clicks the Continue Later button, which saves their progress.