AtlasML Module Reference
This comprehensive guide documents every module in the AtlasML codebase. Use this as a reference when reading code, adding features, or debugging issues.
Module Organization
atlasml/
├── app.py # Application entry point
├── config.py # Configuration management
├── dependencies.py # FastAPI dependencies (auth)
├── utils.py # Utility functions
├── clients/
│ └── weaviate.py # Weaviate database client
├── routers/
│ ├── health.py # Health check endpoint
│ └── competency.py # Competency endpoints
├── models/
│ └── competency.py # Pydantic data models
├── ml/
│ ├── embeddings.py # Embedding generation
│ ├── clustering.py # Clustering algorithms
│ ├── similarity_measures.py # Similarity computations
│ ├── centroid_similarity.py # Centroid calculations
│ ├── pipeline_workflows.py # ML workflow orchestration
│ ├── generate_competency_relationship.py # Relationship generation
│ └── feedback_loop.py # Feedback mechanisms
└── common/
└── exceptions.py # Custom exceptions
Core Application Modules
app.py - Application Entry Point
Purpose: Creates and configures the FastAPI application, registers middleware, routers, and manages lifecycle events.
Key Components:
RequestLoggingMiddleware
class RequestLoggingMiddleware(BaseHTTPMiddleware):
async def dispatch(self, request: Request, call_next):
# Logs request/response details
Logs every request with:
- HTTP method and path
- Request body (for POST requests)
- Response status code
- Processing duration
lifespan(app)
@asynccontextmanager
async def lifespan(app):
# Startup: Initialize Weaviate, log status
yield
# Shutdown: Close connections
Manages application lifecycle:
- Startup: Initialize Weaviate client, verify connection
- Shutdown: Close Weaviate connection gracefully
validation_exception_handler
@app.exception_handler(RequestValidationError)
async def validation_exception_handler(request, exc):
# Returns 422 with detailed validation errors
Handles Pydantic validation errors with detailed debugging info.
When to Modify:
- Adding global middleware
- Registering new routers
- Adding startup/shutdown tasks
- Custom exception handlers
File: atlasml/app.py
config.py - Configuration Management
Purpose: Loads and validates configuration from environment variables using Pydantic.
Key Classes:
APIKeyConfig
class APIKeyConfig(BaseModel):
token: str
Single API key definition.
WeaviateSettings
class WeaviateSettings(BaseModel):
host: str # e.g., "weaviate.example.com"
port: int # e.g., 443
api_key: str|None # optional API key
scheme: str # "http" or "https"
Weaviate connection parameters.
Settings
class Settings(BaseModel):
api_keys: list[APIKeyConfig] # API keys for auth
weaviate: WeaviateSettings # Weaviate config
sentry_dsn: str | None = None # Optional Sentry DSN
env: str = "development" # Environment name
@classmethod
def get_settings(cls, use_defaults: bool = False):
# Load from env or return defaults
Main settings container.
Key Functions:
def get_settings(use_defaults: bool = False) -> Settings:
# Returns singleton Settings instance
Environment Variables:
ATLAS_API_KEYS: Comma-separated list of API keysWEAVIATE_HOST,WEAVIATE_PORT,WEAVIATE_API_KEY(optional)SENTRY_DSN,ENV
When to Modify:
- Adding new configuration options
- Changing environment variable names
- Adding validation logic
File: atlasml/config.py
dependencies.py - Authentication Dependencies
Purpose: Provides FastAPI dependencies for API key authentication.
Key Components:
_get_api_key(request)
def _get_api_key(request: Request) -> str:
# Extracts Authorization header
# Raises 401 if missing
TokenValidator
class TokenValidator:
def __init__(self, api_keys: List[APIKeyConfig] = Depends(get_api_keys)):
self.api_keys = api_keys
async def __call__(self, api_key: str = Depends(_get_api_key)) -> APIKeyConfig:
# Validates API key
# Returns APIKeyConfig if valid
# Raises HTTPException(401) if invalid
Callable class for dependency injection.
validate_token(api_key, api_keys)
def validate_token(
api_key: str = Depends(_get_api_key),
api_keys: List[APIKeyConfig] = Depends(get_api_keys),
) -> APIKeyConfig:
# Functional alternative to TokenValidator
Usage Example:
@router.post("/suggest", dependencies=[Depends(TokenValidator)])
async def suggest_competencies(request: SuggestCompetencyRequest):
# Requires valid API key
When to Modify:
- Changing authentication mechanism
- Adding additional auth methods (JWT, OAuth)
- Custom authorization logic
File: atlasml/dependencies.py
Router Modules
routers/health.py - Health Check
Purpose: Provides a simple liveness/readiness endpoint for container orchestrators.
Endpoints:
@router.get("/")
async def health():
return []
- Path:
/api/v1/health/ - Method: GET
- Authentication: None
- Response:
[](200 OK)
Use Cases:
- Docker healthchecks
- Load balancer health checks
When to Modify:
- Adding detailed health status (database, external APIs)
- Changing response format
File: atlasml/routers/health.py
routers/competency.py - Competency Operations
Purpose: Handles all competency-related API endpoints.
Endpoints:
1. POST /suggest
async def suggest_competencies(
request: SuggestCompetencyRequest,
) -> SuggestCompetencyResponse:
# Suggests competencies based on description
Flow:
- Generate embedding for description
- Search Weaviate for similar competencies in course
- Rank by similarity
- Return top suggestions
2. POST /save
async def save_competencies(request: SaveCompetencyRequest):
# Saves or deletes competencies/exercises
Flow:
- Parse operation type (UPDATE/DELETE)
- For UPDATE: Generate embeddings, store in Weaviate
- For DELETE: Remove from Weaviate
- Return success
3. GET /relations/suggest/{course_id}
async def suggest_competency_relations(course_id: int) -> CompetencyRelationSuggestionResponse:
# Generates relationship suggestions
Flow:
- Fetch all competencies for course
- Analyze semantic relationships
- Generate candidate relations (MATCHES, EXTENDS, REQUIRES)
- Return relation graph
When to Modify:
- Adding new competency endpoints
- Changing suggestion algorithms
- Modifying response formats
File: atlasml/routers/competency.py
Data Model Modules
models/competency.py - Pydantic Models
Purpose: Defines API contracts and domain models using Pydantic.
Core Types:
OperationType
class OperationType(str, Enum):
UPDATE = "UPDATE"
DELETE = "DELETE"
Competency
class Competency(BaseModel):
id: int
title: str
description: Optional[str]
course_id: int
ExerciseWithCompetencies
class ExerciseWithCompetencies(BaseModel):
id: int
title: str
description: str
competencies: Optional[list[int]]
course_id: int
SemanticCluster
class SemanticCluster(BaseModel):
cluster_id: str
course_id: int
vector_embedding: list[float]
Request/Response Models:
SuggestCompetencyRequest
class SuggestCompetencyRequest(BaseModel):
description: str
course_id: int
SuggestCompetencyResponse
class SuggestCompetencyResponse(BaseModel):
competencies: list[Competency]
SaveCompetencyRequest
class SaveCompetencyRequest(BaseModel):
competencies: Optional[list[Competency]]
exercise: Optional[ExerciseWithCompetencies]
operation_type: OperationType
RelationType
class RelationType(str, Enum):
MATCHES = "MATCHES" # Similar content
EXTENDS = "EXTENDS" # Builds upon
REQUIRES = "REQUIRES" # Prerequisite
CompetencyRelation
class CompetencyRelation(BaseModel):
tail_id: int # Source competency
head_id: int # Target competency
relation_type: RelationType
CompetencyRelationSuggestionResponse
class CompetencyRelationSuggestionResponse(BaseModel):
relations: list[CompetencyRelation]
When to Modify:
- Adding new API endpoints (new request/response models)
- Changing data structures
- Adding validation rules
File: atlasml/models/competency.py
Client Modules
clients/weaviate.py - Weaviate Database Client
Purpose: Provides a high-level interface to Weaviate vector database with schema management and CRUD operations.
Key Classes:
CollectionNames
class CollectionNames(str, Enum):
EXERCISE = "Exercise"
COMPETENCY = "Competency"
SEMANTIC_CLUSTER = "SemanticCluster"
Defines available collections.
WeaviateClient
class WeaviateClient:
def __init__(self, weaviate_settings: WeaviateSettings = None):
# Initialize connection
# Ensure collections exist
Main client class with methods:
CRUD Operations:
def add_embeddings(
self,
collection_name: str,
embeddings: list[float],
properties: dict | None = None,
) -> str:
# Insert vector and properties
# Returns: UUID of inserted object
def get_embeddings(self, collection_name: str, id: str):
# Get object by UUID
# Returns: Object with vector and properties
def get_all_embeddings(self, collection_name: str) -> List[Dict[str, Any]]:
# Fetch all objects from collection
# Returns: List of objects with vectors
def get_embeddings_by_property(
self,
collection_name: str,
property_name: str,
property_value: int | str
) -> List[Dict[str, Any]]:
# Query by property filter
# Returns: Filtered objects
def search_by_multiple_properties(
self,
collection_name: str,
property_filters: dict
):
# Query with multiple filters (AND logic)
# Returns: Filtered objects
def update_property_by_id(
self,
collection_name: str,
id: str,
properties: dict,
vector: Optional[List[float]] = None,
) -> bool:
# Update object properties/vector
# Returns: True if successful
def delete_by_property(
self,
collection_name: str,
property_name: str,
property_value: str | int,
) -> int:
# Delete objects matching filter
# Returns: Number of deleted objects
Utility Methods:
def is_alive(self):
# Check if Weaviate is responsive
# Returns: bool
def close(self):
# Close Weaviate connection
def delete_all_data_from_collection(self, collection_name: str):
# Delete and recreate collection
def recreate_collection(self, collection_name: str):
# Create collection from schema
WeaviateClientSingleton
class WeaviateClientSingleton:
@classmethod
def get_instance(cls, weaviate_settings=None) -> WeaviateClient:
# Returns cached instance
Ensures only one client instance per settings configuration.
Helper Functions:
def get_weaviate_client(weaviate_settings=None) -> WeaviateClient:
# Convenience accessor for singleton
def create_weaviate_client(weaviate_settings=None) -> WeaviateClient:
# Create or fetch client
When to Modify:
- Adding new collections
- Changing schema definitions
- Adding specialized query methods
- Optimizing batch operations
File: atlasml/clients/weaviate.py
Machine Learning Modules
ml/embeddings.py - Embedding Generation
Purpose: Generate vector embeddings from text using OpenAI or local models.
Key Classes:
ModelDimension
class ModelDimension(Enum):
TEXT_EMBEDDING_THREE_SMALL = 1536
TEXT_EMBEDDING_THREE_LARGE = 3072
ALL_MINILM_L6_V2 = 384
Defines embedding dimensions for different models.
EmbeddingGenerator
class EmbeddingGenerator:
def __init__(self):
self._local_model = None
self._azure_client = None
@property
def local_model(self) -> SentenceTransformer:
# Lazy-load local model
@property
def azure_client(self) -> AzureOpenAI:
# Lazy-load Azure OpenAI client
def generate_embeddings_openai(self, description: str) -> List[float]:
# Generate embedding using Azure OpenAI
# Returns: 1536-dimensional vector
def generate_embeddings(self, description: str) -> List[float]:
# Generate embedding using local model
# Returns: 384-dimensional vector
Usage Example:
generator = EmbeddingGenerator()
embedding = generator.generate_embeddings_openai("Python programming")
# Returns: [0.123, -0.456, 0.789, ...] (1536 dims)
When to Use:
- OpenAI: Higher quality, API costs, requires internet
- Local: Free, offline, faster, slightly lower quality
When to Modify:
- Adding new embedding models
- Switching providers (Cohere, Hugging Face, etc.)
- Implementing caching
File: atlasml/ml/embeddings.py
ml/clustering.py - Clustering Algorithms
Purpose: Apply clustering algorithms to group similar items.
Key Functions:
apply_hdbscan
def apply_hdbscan(
matrix: np.ndarray,
eps: float = 0.5,
min_samples: int = 5,
metric: str = "euclidean",
min_cluster_size: int = 10,
) -> Tuple[np.ndarray, Optional[np.ndarray], Optional[np.ndarray]]:
# Apply HDBSCAN clustering
# Returns: (labels, centroids, medoids)
Parameters:
matrix: n x d embedding matrixeps: Max distance for neighborhoodmin_samples: Min samples in neighborhoodmetric: Distance metric (euclidean, cosine)min_cluster_size: Min cluster size
Returns:
labels: Cluster ID for each point (-1 = noise)centroids: Cluster center pointsmedoids: Representative points
apply_kmeans
def apply_kmeans(
matrix: np.ndarray,
n_clusters: int = 5,
random_state: int = 42,
) -> Tuple[np.ndarray, np.ndarray]:
# Apply K-Means clustering
# Returns: (labels, centroids)
apply_tsne
def apply_tsne(
matrix: np.ndarray,
n_components: int = 2,
perplexity: float = 5.0,
) -> np.ndarray:
# Reduce dimensionality for visualization
# Returns: Low-dimensional representation
When to Use:
- HDBSCAN: Variable cluster count, handles noise
- K-Means: Fixed cluster count, faster
- t-SNE: Visualization only (not for inference)
When to Modify:
- Adding new clustering algorithms
- Tuning hyperparameters
- Implementing custom distance metrics
File: atlasml/ml/clustering.py
ml/similarity_measures.py - Similarity Computations
Purpose: Compute similarity between vectors.
Key Functions:
compute_cosine_similarity
def compute_cosine_similarity(
vector1: List[float],
vector2: List[float]
) -> float:
# Compute cosine similarity
# Returns: Score between -1 (opposite) and 1 (identical)
Formula: cos(θ) = (A · B) / (||A|| * ||B||)
compute_euclidean_distance
def compute_euclidean_distance(
vector1: List[float],
vector2: List[float]
) -> float:
# Compute Euclidean distance
# Returns: Distance (0 = identical, higher = more different)
When to Use:
- Cosine: Semantic similarity (text, embeddings)
- Euclidean: Spatial distance (coordinates, features)
When to Modify:
- Adding new similarity metrics (Jaccard, Hamming, etc.)
- Optimizing computations (batch processing)
File: atlasml/ml/similarity_measures.py
ml/centroid_similarity.py - Centroid Calculations
Purpose: Update cluster centroids incrementally.
Key Functions:
def update_cluster_centroid_on_addition(
current_centroid: np.ndarray,
new_vector: np.ndarray,
cluster_size: int
) -> np.ndarray:
# Update centroid when adding a point
def update_cluster_centroid_on_removal(
current_centroid: np.ndarray,
removed_vector: np.ndarray,
cluster_size: int
) -> np.ndarray:
# Update centroid when removing a point
Use Cases:
- Incremental clustering updates
- Real-time feedback incorporation
- Avoiding full recomputation
File: atlasml/ml/centroid_similarity.py
ml/pipeline_workflows.py - ML Workflow Orchestration
Purpose: Orchestrates complex ML workflows that combine multiple operations.
Key Class:
class PipelineWorkflows:
def __init__(self, weaviate_client=None):
# Initialize with Weaviate client
def get_competency_suggestions(
self,
description: str,
course_id: int
) -> list[Competency]:
# Complete suggestion pipeline
def map_new_competency_to_exercise(
self,
exercise_id: int,
competency_id: int
):
# Associate competency with exercise
def save_competency(self, competency: Competency):
# Store competency with embedding
def save_exercise(self, exercise: ExerciseWithCompetencies):
# Store exercise with competencies
Workflow Example (Competency Suggestion):
- Generate embedding for description
- Fetch all competencies for course from Weaviate
- Compute cosine similarity with each
- Rank by similarity score
- Return top N suggestions
When to Modify:
- Adding new ML workflows
- Changing suggestion algorithms
- Implementing new features
File: atlasml/ml/pipeline_workflows.py
ml/generate_competency_relationship.py - Relationship Generation
Purpose: Generate semantic relationships between competencies.
Key Function:
def generate_competency_relationship(
competencies: list[Competency]
) -> list[CompetencyRelation]:
# Analyze competency pairs
# Generate relations (MATCHES, EXTENDS, REQUIRES)
# Returns: List of relationships
Relationship Types:
- MATCHES: Similar or equivalent content
- EXTENDS: Builds upon or expands
- REQUIRES: Prerequisite dependency
Algorithm (Simplified):
- For each competency pair:
- Compare embeddings (cosine similarity)
- Analyze descriptions (keyword matching)
- Determine relationship type based on similarity + patterns
- Filter weak relationships
When to Modify:
- Improving relationship detection accuracy
- Adding new relationship types
- Implementing ML-based classification
File: atlasml/ml/generate_competency_relationship.py
ml/feedback_loop.py - Feedback Mechanisms
Purpose: Incorporate user feedback to improve suggestions.
Key Functionality:
- Track which suggestions users accept/reject
- Adjust embeddings or ranking based on feedback
- Update cluster centroids
When to Modify:
- Implementing feedback collection
- Designing feedback-based learning algorithms
File: atlasml/ml/feedback_loop.py
Utility Modules
common/exceptions.py - Custom Exceptions
Purpose: Define custom HTTP exceptions for consistent error handling.
Key Classes:
class RequiresAuthenticationException(HTTPException):
def __init__(self):
super().__init__(
status_code=401,
detail={"type": "not_authenticated", "errorMessage": "Requires authentication"},
)
class PermissionDeniedException(HTTPException):
def __init__(self):
super().__init__(
status_code=403,
detail={"type": "not_authorized", "errorMessage": "Permission denied"},
)
Usage:
if not is_authorized(user):
raise PermissionDeniedException()
When to Modify:
- Adding new exception types
- Customizing error messages
- Adding error context
File: atlasml/common/exceptions.py
utils.py - Utility Functions
Purpose: Shared utility functions used across modules.
Typical Contents:
- String manipulation helpers
- Data transformation utilities
- Common validation functions
File: atlasml/utils.py
Module Dependencies
Import Graph
Next Steps
Now that you understand all the modules:
- API Endpoints: Learn how endpoints use these modules
- Weaviate Integration: Deep dive into the database client
- ML Pipelines: Understand ML workflow details
- Setup Guide: Set up your development environment
Use your IDE's "Go to Definition" feature to navigate between modules while reading this documentation!