Ohio Bioinformatics Consortium Challenge Problem for OCCBIO 2009
The Ohio Bioinformatics Consortium (OBC) strives to enhance educational opportunities and research infrastructure throughout the state to make Ohio a world leader in bioinformatics and to facilitate new discoveries in data-intensive biological research. We carry out this mission through several activities, including the following:
- Interacting with the bioinformatics-related industry to understand their needs.
- Developing bioinformatics infrastructure to facilitate new discoveries in biological research.
- Discovering new knowledge by utilizing bioinformatics capabilities.
- Organizing the annual Ohio Collaborative Conference on Bioinformatics (OCCBIO).
To help coordinate these activities, the OBC plans to define a series of Challenge Problem statements. Challenge Problems will address the current and anticipated needs of the bioinformatics-related industry. Members of the Ohio bioinformatics community are invited to develop solutions to the Challenge Problems, to present their solutions at OCCBIO, and to contribute their solutions to Ohio's bioinformatics infrastructure at the Ohio Supercomputer Center.
The first Challenge Problem has been provided by Dr. Victor Chan of the Air Force Research Labs. Solutions will be presented at OCCBIO 2009, and will be judged by a committee chaired by Dr. Chan. All solutions should be submitted using the normal manuscript process of OCCBIO (see http://www.occbio.org/2009/). The team or individual that develops the best solution will be awarded a prize. Students in the Choose Ohio First Scholarship Program are encouraged to participate in this activity. The Challenge Problem definition follows.
Title: Bioinformatics Architecture for Human Performance Optimization
Objective:
The objective of this effort is to develop an end-to-end bioinformatics solution that will enable scientists to:
- Extract publicly available biological information relevant to human performance for knowledge discovery;
- Optimize the designs of human physical and cognitive performance studies;
- Discover molecular correlates and signatures for optimized human system status and performance;
- Develop one or more models for predicting and monitoring changes in human system status and performance; and
- Identify potential targets for human performance optimization.
This bioinformatics solution is expected to significantly facilitate efforts by exercise physiologists or educators to assess, predict and enhance human physical and cognitive performance.
Background:
Human performance is influenced by three major factors: the performance of individuals, human-human interaction, and human-machine interaction. Although the interaction among these three factors will ultimately determine the overall performance of an individual, the system status of an individual probably plays the most significant role in that individual's performance. The human system is extremely complex: the human body has nine physiological systems (gastrointestinal, urinary, integumentary, nervous, endocrine, musculoskeletal, respiratory, circulatory and immune). The function of each of these systems involves the interaction between multiple organs, and the function of each of these organs involves the interaction between multiple cell types. At the cellular level, function depends on a network of cellular processes and pathways, which involve multiple classes of biomolecules: metabolites, proteins, and the genes encoding these proteins. Because human performance research is so complex and data-intensive, it is essential to develop a bioinformatics architecture that can capture and organize the concepts related to human performance and link isolated information from different domains to generate new knowledge about human performance. In addition, this bioinformatics architecture should contain tools for the analysis of high-dimensional data sets (e.g., -omics data sets) and for the identification of molecular correlates that are useful for phenotypic prediction and modulation. Such an architecture will therefore significantly advance the ultimate goal of human performance optimization.
Problem Statement:
Because of the scale of this problem, it is unlikely that a single research group will have sufficient resources and expertise to develop/implement all the tools of this architecture. Therefore, each PI should focus his/her effort on one area. To maximize the coverage of different areas of human performance and to minimize redundancy, a committee will be created to coordinate these efforts.
To ensure creativity in this research, there is no set rule governing how the research should be done, what approach/strategy should be used, or the scope of the proposed research, etc. Each PI is encouraged to develop a research plan tailored to his/her capability and expertise, provided that the research plan is in alignment with the overall goal of this challenge problem. Below is an example solely for the purpose of stimulating discussion and exchange of ideas.
The research plan may initially involve the collection and analysis of publicly available data sets that are relevant to human systems and human performance. For instance, a quick search of the NCBI Gene Expression Omnibus database indicates that the following relevant data sets exist:
| Cognition/Learning/Memory | ~140 data sets |
| Sleep/Circadian/Fatigue | ~170 data sets |
| Exercise/Physical Training | ~85 data sets |
| Cardiac Function | ~100 data sets |
| Pulmonary Function | ~90 data sets |
| Muscular Function | ~130 data sets |
| Immune Function | ~160 data sets |
| Renal Injury | ~45 data sets |
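A search like the one behind the counts listed above can be scripted against the NCBI E-utilities `esearch` endpoint, where the Entrez database name `gds` corresponds to GEO DataSets. The following is only a minimal sketch: the helper name `build_geo_search_url` and the search terms in `FOCUS_AREAS` are illustrative assumptions, not the queries used to produce the counts, and the code only builds the request URLs without contacting NCBI.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint; "gds" is the Entrez database for GEO DataSets.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_geo_search_url(term, retmax=20):
    """Return an esearch URL that looks up `term` in GEO DataSets.

    Hypothetical helper for illustration; it performs no network access.
    """
    params = {"db": "gds", "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


# Assumed search terms, one per focus area; refine them for a real study.
FOCUS_AREAS = {
    "Cardiac Function": "cardiac function",
    "Exercise/Physical Training": "exercise OR physical training",
}

urls = {area: build_geo_search_url(term) for area, term in FOCUS_AREAS.items()}
for area, url in urls.items():
    print(area, "->", url)
```

Each URL can then be fetched with any HTTP client. Results returned by such a query will generally differ from the approximate counts above, since they depend on the exact terms and on the date of the search.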
All these data sets are publicly available for download and analysis. Although not all of them are related to human systems and human performance, a conservative estimate suggests that at least 10% could be used for data mining to identify molecular correlates of human performance. Analysis of the combined data sets (meta-analysis) will likely provide a more reliable result than analysis of any single data set, because errors and noise occur at random and should affect only a small number of data sets, whereas genuine molecular correlates should be detected in the majority of them. Publicly available genotyping, proteomics or metabonomics data sets may also be included in the study, although the number of these data sets is quite small.
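The reasoning above, that noise appears in only a few data sets while real correlates recur in most of them, can be illustrated with a simple vote-counting scheme. This is a sketch of one possible meta-analysis strategy, not a prescribed method: the function name `majority_correlates`, the `min_fraction` threshold, and the gene labels are all made up for illustration.

```python
from collections import Counter


def majority_correlates(hits_per_dataset, min_fraction=0.5):
    """Return the genes reported in more than `min_fraction` of the data sets.

    `hits_per_dataset` is a list with one collection of gene identifiers per
    data set (for example, the genes called significant in that data set).
    """
    n = len(hits_per_dataset)
    # Count each gene at most once per data set.
    counts = Counter(gene for hits in hits_per_dataset for gene in set(hits))
    return sorted(gene for gene, c in counts.items() if c / n > min_fraction)


# Made-up example: GENE_A and GENE_B recur, the others appear only once.
datasets = [
    {"GENE_A", "GENE_B", "GENE_X"},
    {"GENE_A", "GENE_B"},
    {"GENE_A", "GENE_Y"},
    {"GENE_A", "GENE_B", "GENE_Z"},
]
print(majority_correlates(datasets))  # ['GENE_A', 'GENE_B']
```

Vote counting ignores effect sizes and sample sizes, so a real submission would likely prefer a statistical method that combines p-values or effect estimates across data sets.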
CHALLENGE 1A: Select an area of focus (e.g., cardiac function) from the above list of data sets (or any other areas relevant to human performance). Retrieve the relevant data sets from the public repositories. The use of a meta-analysis approach is especially encouraged.
CHALLENGE 1B: Collect and catalog publicly available bioinformatics tools that are suitable for data analysis. If your search does not find any suitable bioinformatics tools or you have already developed a tool for data analysis, you may use your own tool.
CHALLENGE 1C: Apply the bioinformatics tools identified in CHALLENGE 1B to analyze the data sets retrieved in CHALLENGE 1A. The goal of this challenge is to identify molecular correlates of human performance.
CHALLENGE 1D: Demonstrate the sensitivity and specificity of the molecular correlates identified in CHALLENGE 1C.
All four challenges (CHALLENGES 1A, 1B, 1C, and 1D) should be solved.
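For CHALLENGE 1D, sensitivity is the fraction of true positives that are correctly called, TP / (TP + FN), and specificity is the fraction of true negatives that are correctly called, TN / (TN + FP). The sketch below assumes that each sample has a known phenotype label and a call derived from the molecular correlates; the function name and the sample labels are hypothetical and the data are invented for illustration.

```python
def sensitivity_specificity(predicted, actual):
    """Return (sensitivity, specificity) for paired boolean calls and labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # true positives
    fn = sum(1 for p, a in pairs if not p and a)      # false negatives
    tn = sum(1 for p, a in pairs if not p and not a)  # true negatives
    fp = sum(1 for p, a in pairs if p and not a)      # false positives
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return tp / (tp + fn), tn / (tn + fp)


# Invented example: 4 samples with the phenotype, 6 without.
actual = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

sens, spec = sensitivity_specificity(predicted, actual)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

To avoid optimistic estimates, the samples used here should be independent of those used to identify the correlates in CHALLENGE 1C, for example a held-out data set.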
Before initiation of the research, a brief description of the research plan should be submitted to the committee for coordination of the overall effort of the groups involved. PIs may also contact the committee for assistance in developing and maturing the research plan. All correspondence with the committee should be directed to the following:
Name: CHAN, Victor T-W.
Address:
Applied Biotechnology Branch
Air Force Research Lab.
Bldg 837, 2729 R Street, Area B
Wright-Patterson Air Force Base, OH 45433-5707
Email: Victor.Chan@wpafb.af.mil
Telephone: (937) 904-9501
Fax: (937) 904-9610