Expanding Rhea for Automated Workflow Generation

Project Themes:

  • Automated Workflow Generation and Execution

Team Name: Rhea Workflow Team

Team Lead(s):

Suggested Team Members and Roles [4-6 members]

Name | Affiliation | Role / Expertise
Arvind Ramanathan | Argonne National Laboratory | AI Systems Integration Advisor
Chris Grams | Argonne National Laboratory, BV-BRC/ANL | Lead Developer, Galaxy Integration
Marius van den Beek | SCI-SCALE, BRC Analytics | Workflow Automation Engineer
Maulik Shukla | Argonne National Laboratory, BV-BRC | Use-case Alignment and Testing
Oleksandr Narykov | Argonne National Laboratory, BV-BRC/ANL | Workflow Automation Engineer

Project Summary

This project extends the Rhea platform, an MCP+RAG-based environment that dynamically serves scientific tools, to enable automated workflow generation that draws on Galaxy and BV-BRC functionality. By integrating BV-BRC’s pathogen data pipelines with Galaxy’s execution engine, the system will auto-generate end-to-end workflows for infectious disease analysis. The aim is to support rapid hypothesis testing, reproducibility, and scalable execution of complex analyses across both ecosystems.

Goals and Objectives

  • Goal 1: Integrate BV-BRC APIs and Galaxy tool registries into Rhea’s orchestration layer
  • Goal 2: Implement LLM-driven workflow synthesis that identifies compatible BV-BRC and Galaxy modules
  • Goal 3: Demonstrate automated workflow generation for at least two priority pathogens
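
As an illustration of what "compatible modules" in Goal 2 could mean in practice, the sketch below chains modules whose output format matches the next module's input format. It is a minimal sketch under stated assumptions, not Rhea's design: the Module fields, the module names, and the format labels are all hypothetical, and the LLM-driven step itself is not shown. A deterministic check of this kind could serve to validate a chain that an LLM proposes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical module descriptor; real BV-BRC/Galaxy metadata will differ."""
    name: str
    source: str    # "bv-brc" or "galaxy"
    consumes: str  # input data format (illustrative label)
    produces: str  # output data format (illustrative label)


def find_chain(modules, start_format, goal_format):
    """Breadth-first search for a shortest chain of modules that converts
    start_format into goal_format. Returns None if no chain exists."""
    queue = deque([(start_format, [])])
    seen = {start_format}
    while queue:
        fmt, path = queue.popleft()
        if fmt == goal_format:
            return path
        for module in modules:
            # A module is usable here if it accepts the current format and
            # leads to a format that has not been reached yet.
            if module.consumes == fmt and module.produces not in seen:
                seen.add(module.produces)
                queue.append((module.produces, path + [module]))
    return None


# Illustrative catalog only; names and formats are assumptions.
MODULES = [
    Module("read_trimming", "galaxy", "fastq", "fastq"),
    Module("genome_assembly", "bv-brc", "fastq", "fasta"),
    Module("genome_annotation", "bv-brc", "fasta", "gff"),
    Module("variant_calling", "galaxy", "bam", "vcf"),
]

if __name__ == "__main__":
    chain = find_chain(MODULES, "fastq", "gff")
    print([m.name for m in chain])
```

Matching on a single format label is a deliberate simplification; a real check would likely also need to account for parameters, multiple inputs, and datatype hierarchies.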

Approach

Rhea will use retrieval-augmented generation (RAG) to identify relevant BV-BRC tools and Galaxy workflows via semantic search and metadata tagging. The system will then employ an LLM-based synthesis engine to auto-compose workflow specifications (e.g., CWL/YAML) executable within Galaxy. The team will test interoperability, runtime efficiency, and reproducibility using selected priority pathogens.
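
The retrieval step described above can be sketched roughly as follows. This is a minimal sketch, assuming a small in-memory catalog: the ToolRecord type, the tool IDs, the descriptions, and the tags are hypothetical, and a bag-of-words cosine similarity stands in for the embedding-based semantic search a real RAG system would use. Metadata tags act as a hard filter before ranking.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToolRecord:
    """Hypothetical catalog entry; not an actual BV-BRC or Galaxy schema."""
    tool_id: str
    source: str  # "bv-brc" or "galaxy"
    description: str
    tags: frozenset = field(default_factory=frozenset)


def tokenize(text):
    """Lowercased word counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, catalog, required_tags=frozenset(), top_k=3):
    """Keep tools carrying all required tags, then rank by similarity."""
    query_vec = tokenize(query)
    candidates = [t for t in catalog if required_tags <= t.tags]
    scored = [(cosine(query_vec, tokenize(t.description)), t) for t in candidates]
    scored = [(score, tool) for score, tool in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tool for _, tool in scored[:top_k]]


# Illustrative catalog only; IDs, descriptions, and tags are assumptions.
CATALOG = [
    ToolRecord("genome_assembly", "bv-brc",
               "Assemble bacterial genome from sequencing reads",
               frozenset({"bacteria", "assembly"})),
    ToolRecord("genome_annotation", "bv-brc",
               "Annotate genes in an assembled genome",
               frozenset({"bacteria", "annotation"})),
    ToolRecord("fastqc", "galaxy",
               "Quality control report for sequencing reads",
               frozenset({"qc"})),
]

if __name__ == "__main__":
    hits = retrieve("assemble reads into a genome", CATALOG,
                    required_tags=frozenset({"bacteria"}))
    print([t.tool_id for t in hits])
```

In a fuller system the retrieved records would be passed to the LLM-based synthesis engine as context; that step, and the emission of a workflow specification, are outside the scope of this sketch.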

Data and Resources Required

Resource Type | Source / Link | Description / Purpose
Data | BV-BRC genomes, priority pathogen datasets | Training and workflow inputs
Tools / Services | BV-BRC APIs, Galaxy server | Workflow integration and execution
LLMs / AI Models | Rhea MCP model, GPT-5 | Tool reasoning and synthesis
Compute / Storage | Argonne HPC / Galaxy cluster | Execution backend

Expected Outcomes / Deliverables

  • Integrated prototype showing end-to-end automated workflow synthesis
  • Validation datasets
  • Interoperability report for BV-BRC + Galaxy toolchains

Potential Impact and Next Steps

This project will provide a foundation for automated workflow assembly in bioinformatics, enhancing reproducibility and scalability for BV-BRC researchers. Follow-up activities may include connecting Rhea to HiPerRAG for data retrieval and integrating Co-Scientist reasoning modules for workflow evaluation.

Technical Support Needed

  • GPU / LLM access
  • API keys
  • Mentor support

Additional Comments