Expanding Rhea for Automated Workflow Generation

Project Repository

Project Themes:

  • Automated Workflow Generation and Execution

Team Name: Rhea Workflow Team

Team Lead(s):

  • Name: Chris Grams, Oleksandr Narykov
  • Affiliation: Argonne National Laboratory, BV-BRC
  • Email: [To be added]

Suggested Team Members and Roles [4-6 members]

Name               | Affiliation                 | Role / Expertise
Chris Grams        | Argonne National Laboratory | Lead Developer, Galaxy Integration
Oleksandr Narykov  | Argonne National Laboratory | Workflow Automation Engineer
Arvind Ramanathan  | Argonne National Laboratory | AI Systems Integration Advisor
CEPI collaborators | -                           | Use-case Alignment and Testing

Project Summary

This project extends the Rhea platform, an MCP+RAG-based environment that dynamically serves scientific tools, to generate workflows automatically from Galaxy and BV-BRC functionality. By integrating BV-BRC’s pathogen data pipelines with Galaxy’s execution engine, the system will auto-generate end-to-end workflows for infectious disease analysis. This supports rapid hypothesis testing, reproducibility, and scalable execution of complex analyses across both ecosystems.

Goals and Objectives

  • Goal 1: Integrate BV-BRC APIs and Galaxy tool registries into Rhea’s orchestration layer
  • Goal 2: Implement LLM-driven workflow synthesis that identifies compatible BV-BRC and Galaxy modules
  • Goal 3: Demonstrate automated workflow generation for at least two priority pathogens
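
One way to picture the compatibility check in Goal 2 is as a search over data formats: two modules are compatible when one produces a format the other consumes. The sketch below is only an illustration under that assumption. The ToolSpec record, the tool names, and the format labels are hypothetical placeholders, not entries from the BV-BRC or Galaxy registries.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolSpec:
    """Hypothetical, simplified tool record; real registries carry richer metadata."""
    name: str
    source: str          # "BV-BRC" or "Galaxy"
    input_format: str
    output_format: str


def find_chain(tools, start_format, goal_format):
    """Breadth-first search for the shortest tool chain from start to goal format."""
    queue = deque([(start_format, [])])
    seen = {start_format}
    while queue:
        fmt, path = queue.popleft()
        if fmt == goal_format:
            return path
        for tool in tools:
            # A tool extends the chain if it consumes the current format
            # and leads to a format we have not reached yet.
            if tool.input_format == fmt and tool.output_format not in seen:
                seen.add(tool.output_format)
                queue.append((tool.output_format, path + [tool]))
    return None  # no chain of compatible tools exists


# Illustrative entries only; names and formats are placeholders.
TOOLS = [
    ToolSpec("genome_fetch", "BV-BRC", "genome_id", "fasta"),
    ToolSpec("gene_annotate", "BV-BRC", "fasta", "gff"),
    ToolSpec("read_align", "Galaxy", "fastq", "bam"),
    ToolSpec("feature_summary", "Galaxy", "gff", "tabular"),
]

chain = find_chain(TOOLS, "genome_id", "tabular")
chain_names = [f"{t.source}:{t.name}" for t in chain]
print(chain_names)
```

In this toy registry the chain crosses from BV-BRC tools into a Galaxy tool, which is the kind of cross-ecosystem pairing Goal 2 targets. An LLM-driven synthesizer would presumably reason over much richer metadata than a single input and output format per tool.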

Approach

Rhea will use retrieval-augmented generation (RAG) to identify relevant BV-BRC tools and Galaxy workflows via semantic search and metadata tagging. The system will then employ an LLM-based synthesis engine to auto-compose workflow specifications (e.g., CWL/YAML) executable within Galaxy. The team will test interoperability, runtime efficiency, and reproducibility using selected CEPI priority pathogens.
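
The retrieval step described above can be sketched roughly as follows. This is not Rhea's implementation: it swaps a bag-of-words cosine similarity in for a learned embedding model so that the example runs with the standard library alone, and the catalog entries, tags, and function names are all hypothetical. The retrieved tools are then formatted into a prompt that a synthesis LLM could receive; the LLM call itself is left out.

```python
import math
import re
from collections import Counter

# Hypothetical catalog entries; names, tags, and descriptions are illustrative only.
CATALOG = [
    {"name": "genome_assembly", "source": "BV-BRC", "tags": {"assembly"},
     "description": "assemble sequencing reads into contigs"},
    {"name": "genome_annotation", "source": "BV-BRC", "tags": {"annotation"},
     "description": "annotate genes and proteins on assembled contigs"},
    {"name": "variant_calling", "source": "Galaxy", "tags": {"variants"},
     "description": "call variants from reads aligned to a reference genome"},
]


def vectorize(text):
    """Bag-of-words term counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve(query, catalog, required_tags=None, top_k=2):
    """Filter by metadata tags, then rank the remainder by similarity to the query."""
    query_vec = vectorize(query)
    candidates = [e for e in catalog
                  if not required_tags or required_tags & e["tags"]]
    scored = [(cosine(query_vec, vectorize(e["description"])), e)
              for e in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for score, entry in scored[:top_k] if score > 0]


def build_prompt(request, tools):
    """Assemble the retrieved tools into context for a synthesis LLM."""
    lines = [f"- {t['source']}:{t['name']}: {t['description']}" for t in tools]
    return (f"Request: {request}\nCandidate tools:\n" + "\n".join(lines)
            + "\nCompose a workflow using only these tools.")


request = "annotate genes on contigs"
hits = retrieve(request, CATALOG)
prompt = build_prompt(request, hits)
print(prompt)
```

Restricting the prompt to retrieved candidates keeps the synthesis step grounded in tools that actually exist in the catalog. Turning the LLM's answer into a workflow specification that Galaxy can execute would be a separate step that this sketch does not cover.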

Data and Resources Required

Resource Type     | Source / Link                                   | Description / Purpose
Data              | BV-BRC genomes, CEPI priority pathogen datasets | Training and workflow inputs
Tools / Services  | BV-BRC APIs, Galaxy server                      | Workflow integration and execution
LLMs / AI Models  | Rhea MCP model, GPT-5                           | Tool reasoning and synthesis
Compute / Storage | Argonne HPC / Galaxy cluster                    | Execution backend

Expected Outcomes / Deliverables

  • Integrated prototype showing end-to-end automated workflow synthesis
  • Validation datasets
  • Interoperability report for BV-BRC + Galaxy toolchains

Potential Impact and Next Steps

This project will provide a foundation for automated workflow assembly in bioinformatics, enhancing reproducibility and scalability for CEPI and BV-BRC researchers. Follow-up activities may include connecting Rhea to HiPerRAG for data retrieval and integrating Co-Scientist reasoning modules for workflow evaluation.

Technical Support Needed

  • GPU / LLM access
  • API keys
  • Mentor support

Additional Comments