Project Theme
Outbreak Monitoring and Tracking
Project Summary
Infectious disease outbreaks require rapid detection and response, but information is scattered across news outlets, social media, official reports, and scientific publications. This project develops an AI-powered outbreak monitoring system that aggregates data from multiple sources, uses natural language processing to identify relevant signals, tracks geographic and temporal patterns, and provides real-time alerts and visualizations. By automating outbreak detection and tracking, we can enable faster public health response and better situational awareness.
Goals and Objectives
- Aggregate multi-source outbreak data from news, social media, WHO, CDC, and other sources
- Detect outbreak signals using AI-based filtering and relevance scoring
- Track geographic and temporal patterns with automated mapping and timeline generation
- Generate automated reports and alerts for emerging threats
Approach
Methods and AI/ML Approaches:
- Web scraping and API integration for data collection
- Natural Language Processing for text analysis and entity extraction
- Named Entity Recognition for pathogens, locations, dates, and case counts
- Time series analysis for trend detection and forecasting
- Anomaly detection for identifying unusual patterns
- LLMs for report summarization and synthesis
- Geographic visualization and mapping
Implementation Steps:
- Identify and integrate data sources (ProMED, HealthMap, news APIs, Twitter)
- Build data collection and preprocessing pipeline
- Implement NER for outbreak-relevant entities
- Develop filtering and relevance scoring models
- Create geographic and temporal tracking system
- Build dashboard for visualization and alerts
- Validate against known outbreak timelines
- Integrate with BRC genomic surveillance data
Data and Resources Required
| Resource Type | Source / Link | Description / Purpose |
|---|---|---|
| Data | ProMED, HealthMap, GISAID | Existing outbreak databases |
| Data | News APIs, Twitter API | Real-time information sources |
| Data | WHO, CDC, ECDC reports | Official outbreak data |
| Tools / Services | BV-BRC genomic data | Integration with sequence surveillance |
| LLMs / AI Models | GPT-4, Claude for summarization | Report generation |
| Tools / Services | BeautifulSoup, Scrapy | Web scraping frameworks |
| Tools / Services | Folium, Plotly | Mapping and visualization |
| Compute / Storage | Database for time series data | Store outbreak records |
Expected Outcomes / Deliverables
- Prototype outbreak monitoring dashboard with real-time data feeds
- Automated alert system for emerging threats
- Geographic and temporal visualizations of outbreak patterns
- Validation study comparing system detections with known outbreaks
- API for data access and integration with other systems
- Public repository with code and documentation
- Case study on recent outbreak detection
Potential Impact and Next Steps
Impact on:
- Infectious disease research: Provides early warning for emerging pathogens
- AI/ML automation: Demonstrates practical surveillance automation
- Public health preparedness: Enables faster outbreak detection and response
Next Steps After Codeathon:
- Expand to additional data sources and languages
- Add genomic surveillance integration for variant tracking
- Implement predictive models for outbreak trajectory
- Create mobile app for field epidemiologists
- Partner with public health agencies for deployment
Technical Support Needed
- API access to news and social media feeds
- Historical outbreak data for validation
- BRC genomic surveillance data
- Cloud infrastructure for continuous monitoring
- Mentor support from epidemiologists
Team Information
Teams will be formed during the Codeathon. Ideal team composition:
- Data Engineer: Data pipeline and integration
- NLP/ML Engineer: Text analysis and entity extraction
- Epidemiologist: Domain expertise and validation
- Frontend Developer: Dashboard and visualization
- Backend Developer: API and database management