Project Themes:
- Automated Workflow Generation and Execution
Team Name: BRC Automated Workflow Group
Team Lead(s):
- Name: Clark Cucinell, Bob Olson
- Affiliation: Argonne National Lab
- Email: cucinell@anl.gov
Suggested Team Members and Roles [4-6 members]
| Name | Affiliation | Role / Expertise |
|---|---|---|
| Nicole Bowers | Argonne National Lab | Bioinformatician |
| Bruce Parello | Argonne National Lab | Computer Scientist |
| Chris Grams | Argonne National Lab | PhD Student, Computer Science |
Project Summary
Analyzing genomic data is essential for understanding infectious disease mechanisms and responses. However, constructing and executing complex bioinformatics workflows often requires specialized expertise and time-intensive manual effort. This project explores how generative AI models, guided through the Model Context Protocol (MCP), can automate and streamline these analyses. By integrating BV-BRC tools and other bioinformatics resources, the system aims to enable AI-assisted workflow generation, execution, and interpretation. The outcome will demonstrate how language models can enhance accessibility and efficiency in infectious disease genomics research.
Goals and Objectives
- Goal 1: Develop and demonstrate generative AI-driven creation of executable bioinformatics workflows using MCP-connected tools (e.g., BV-BRC APIs and external bioinformatics services).
- Goal 2: Successfully execute at least one end-to-end genomic analysis workflow generated by an AI model, including data retrieval, processing, and result interpretation.
- Goal 3: Evaluate and document the performance and accuracy of AI-generated workflows compared to manually constructed pipelines, identifying strengths, limitations, and opportunities for improvement.
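One possible way to make the Goal 3 comparison concrete is to score an AI-generated workflow against a manually constructed reference at the level of tool selection and step order. The sketch below is only an illustration under that assumption: the metric choice, the function name `compare_workflows`, and the tool labels in the usage note are hypothetical and are not part of the proposal or of BV-BRC.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class WorkflowComparison:
    precision: float        # share of AI-selected tools that the reference also uses
    recall: float           # share of reference tools that the AI workflow selected
    order_agreement: float  # share of shared tool pairs kept in the same relative order


def compare_workflows(generated: list[str], reference: list[str]) -> WorkflowComparison:
    """Compare two workflows given as ordered lists of tool names."""
    gen_set, ref_set = set(generated), set(reference)
    shared = gen_set & ref_set
    precision = len(shared) / len(gen_set) if gen_set else 0.0
    recall = len(shared) / len(ref_set) if ref_set else 0.0

    # Position of each tool's first occurrence in the generated workflow.
    gen_pos: dict[str, int] = {}
    for index, tool in enumerate(generated):
        gen_pos.setdefault(tool, index)

    # Shared tools in reference order, without duplicates.
    ref_order = [tool for tool in dict.fromkeys(reference) if tool in shared]
    pairs = list(combinations(ref_order, 2))
    concordant = sum(1 for a, b in pairs if gen_pos[a] < gen_pos[b])
    # With fewer than two shared tools there is no ordering to contradict.
    order_agreement = concordant / len(pairs) if pairs else 1.0

    return WorkflowComparison(precision, recall, order_agreement)
```

For example, comparing a generated workflow `["trim", "fastqc", "assemble", "blast"]` with a reference `["fastqc", "trim", "assemble", "annotate"]` gives a precision and recall of 0.75 each and an order agreement of 2/3, since one of the three shared pairs is swapped. Scores like these would complement, not replace, expert review of the outputs.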
Approach
- We will integrate LLMs (ChatGPT, Claude, Llama-Scout) with custom MCP servers to automatically generate, refine, and execute bioinformatics workflows. Using BV-BRC and other connected tools, the system will translate research questions and data inputs into executable analyses through MCP-based tool calls.
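To illustrate what an MCP-based tool call looks like on the wire, the sketch below builds a JSON-RPC 2.0 `tools/call` request, which is the message shape MCP uses when a client asks a server to run a tool. The tool name `submit_assembly_job` and its arguments are hypothetical placeholders; the actual tools exposed by the BV-BRC MCP servers are not specified in this proposal.

```python
import json
from typing import Any


def build_tool_call(request_id: int, tool_name: str, arguments: dict[str, Any]) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,       # tool advertised by the MCP server
            "arguments": arguments,  # tool-specific inputs chosen by the LLM
        },
    }
    return json.dumps(message)


# Hypothetical tool name and arguments, for illustration only.
example_request = build_tool_call(
    request_id=1,
    tool_name="submit_assembly_job",
    arguments={"reads_path": "/workspace/example/reads.fastq", "recipe": "auto"},
)
```

In practice an MCP client library would construct and send this message; writing it out by hand is only meant to show which parts the model fills in (the tool name and arguments) and which parts are fixed by the protocol.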
Milestones
- LLM Workflow Generation: Demonstrate that LLMs can correctly generate workflow outlines and tool selections in response to a defined infectious-disease research question.
- Workflow Refinement with Data Inputs: Enable LLMs to incorporate provided datasets (e.g., genomic or transcriptomic files) to refine workflow steps and parameters for data-driven analysis.
- Automated Execution via MCP: Implement and validate MCP tool calls that allow AI-generated workflows to execute directly, producing interpretable outputs from real input data.
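The milestones above can be sketched as a small validate-then-execute loop over an LLM-generated workflow outline. This is a minimal sketch under several assumptions: the step schema (`id`, `tool`, `params`, `depends_on`), the stub tools, and the placeholder accession are all hypothetical, and the stubs stand in for real MCP tool calls.

```python
from typing import Any, Callable

ToolFn = Callable[[dict[str, Any]], dict[str, Any]]


def validate_workflow(steps: list[dict[str, Any]], registry: dict[str, ToolFn]) -> list[str]:
    """Return a list of problems: unknown tools or dependencies not yet produced."""
    problems: list[str] = []
    seen: set[str] = set()
    for step in steps:
        if step["tool"] not in registry:
            problems.append(f"step {step['id']}: unknown tool {step['tool']!r}")
        for dep in step.get("depends_on", []):
            if dep not in seen:
                problems.append(f"step {step['id']}: depends on {dep!r} before it runs")
        seen.add(step["id"])
    return problems


def run_workflow(steps: list[dict[str, Any]], registry: dict[str, ToolFn]) -> dict[str, Any]:
    """Run steps in listed order, handing each step the outputs of its dependencies."""
    problems = validate_workflow(steps, registry)
    if problems:
        raise ValueError("; ".join(problems))
    results: dict[str, Any] = {}
    for step in steps:
        upstream = {dep: results[dep] for dep in step.get("depends_on", [])}
        inputs = {**step.get("params", {}), "upstream": upstream}
        results[step["id"]] = registry[step["tool"]](inputs)
    return results


# Stub tools standing in for MCP tool calls (hypothetical names and outputs).
def fetch_reads(inputs: dict[str, Any]) -> dict[str, Any]:
    return {"reads": f"{inputs['accession']}.fastq"}


def assemble(inputs: dict[str, Any]) -> dict[str, Any]:
    reads = next(iter(inputs["upstream"].values()))["reads"]
    return {"contigs": reads.replace(".fastq", ".contigs.fasta")}


STUB_REGISTRY: dict[str, ToolFn] = {"fetch_reads": fetch_reads, "assemble": assemble}

# A workflow outline of the kind an LLM might produce; the accession is a placeholder.
EXAMPLE_STEPS = [
    {"id": "fetch", "tool": "fetch_reads", "params": {"accession": "SRR0000000"}},
    {"id": "assemble", "tool": "assemble", "depends_on": ["fetch"]},
]
```

Calling `run_workflow(EXAMPLE_STEPS, STUB_REGISTRY)` runs both stub steps in order, while a workflow that names an unregistered tool or references a step before it runs is rejected before anything executes. Separating validation from execution in this way would let the team log LLM tool-selection errors independently of runtime failures.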
Data and Resources Required
| Resource Type | Source / Link | Description / Purpose |
|---|---|---|
| Data | BV-BRC Solr Data API, Workspace | Provides access to genomic and metadata resources for constructing and testing analytical workflows. |
| Tools / Services | BV-BRC Analysis Services, BV-BRC MCP Servers | Enable programmatic workflow generation, execution, and integration with BV-BRC’s bioinformatics tools. |
| LLMs / AI Models | Claude, ChatGPT, Llama-Scout, Argo | Generate, refine, and evaluate bioinformatics workflows and analysis strategies. |
| Compute / Storage | BV-BRC Workspace, Hosted LLM | Provide computational resources and data storage for running and managing automated analyses. |
Expected Outcomes / Deliverables
By the end of the Codeathon, the team expects to produce:
- Workflow Generation Prompts: A curated set of effective prompts and templates for generating bioinformatics workflows using LLMs.
- Model Performance Assessment: Documentation summarizing model strengths, limitations, and common failure modes when designing and executing analyses.
- AI-Generated Analysis Outputs: Example workflow results and interpretive summaries generated by the models, demonstrating end-to-end workflow execution.
Potential Impact and Next Steps
This project aims to demonstrate how AI-driven workflow automation can accelerate genomic analysis for infectious disease research and surveillance. By enabling LLMs to design and execute workflows through BV-BRC and MCP tools, researchers could generate insights from pathogen genomic data more rapidly, supporting faster response to emerging threats. The project also aims to advance AI/ML interpretability by documenting how language models make analytical decisions and identifying where human guidance remains essential. Together, these outcomes would contribute to public health preparedness by improving accessibility to complex analyses, reducing turnaround times for genomic insights, and offering a framework for training researchers in AI-assisted bioinformatics. Future work will focus on integrating workflow generation directly into the BV-BRC Copilot functionality.
Technical Support Needed
- GPU / LLM access
- API keys
Additional Comments