Biostatisticians Tame the Human Side of Big Data
The Department of Biostatistics and Computational Biology plays an essential role in Medical Center research:
- From clinical trials of new therapies for neurological and heart diseases to epidemiological studies tracing the effects of mercury exposure on child development.
- From predicting suicide risk among veterans to studying the dynamics of complex cellular systems.
- In statistical modeling of how genomic, molecular, imaging, and environmental information influences these and other aspects of human health.
In all of these areas and more, the department’s faculty members provide expertise in study design and statistical analysis.
Beyond the support the department provides to other researchers, its members also initiate their own research on the development of novel statistical methods. Increasingly, in all of these efforts, the department is confronted by the challenges and opportunities of Big Data, says Robert Strawderman, who became chair in July.
The department, which has 29 faculty members, boasts a long record of methodological and collaborative research and of educating professionals in the use of statistics. When researchers want to assess the benefits of new therapies, for example, “we’re the ones who try to design the studies in such a way that allows an unbiased comparison,” Strawderman says.
When there are problems with the data (when subjects drop out of a study in midstream, for example), “we have to account for the potential impact of those types of events on the inferences you want to make. And then there’s the actual statistical analysis and reporting of the results. So we are directly involved in all phases of data collection and analysis,” Strawderman adds. What complicates that work is the sheer volume of data now accessible to researchers.
“There’s a lot going on in the department dealing with genome data in one way or another,” Strawderman notes. A genome is the complete set of genetic material for an organism. In personalized medicine, for example, the design of targeted therapies relies on biomarker information derived from the genome of each patient, and therefore on huge amounts of subject-specific “high-dimensional” data. “That’s the human side of Big Data,” Strawderman observes.
For another example, consider the recent focus on health care reform—on moving to evaluation-based health care models. “This is another area where there are potentially massive amounts of data to cope with from various sources,” Strawderman notes. “There are government and clinical database sources. And genomic, imaging, and cytometric data are increasingly available for individual patients. Integrating these data sources, each containing potentially large amounts of data per subject, is part of the Big Data challenge.”
Statistics developed as a science in the 20th century, focused on answering well-defined questions about a specified population using a modest amount of information sampled from many study subjects. Strawderman characterizes such data as having “a lot of rows with relatively few columns.” The norm for the 21st century continues to involve many study subjects, but now each subject has the potential to contribute massive amounts of data, i.e., many more columns than rows. “Such large-scale datasets and associated questions of statistical inference lie beyond the scope of standard methods of analysis,” comments Strawderman.
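A toy illustration, not drawn from the department's work, may help show why "more columns than rows" strains standard methods. The sketch below assumes a made-up dataset of two subjects with three measurements each; the names `X`, `y`, `predict`, `beta_a`, and `beta_b` are hypothetical. It shows two different sets of coefficients that reproduce the outcomes exactly, so the data alone cannot say which measurement matters.

```python
# Toy "wide" dataset: 2 subjects (rows), 3 measurements each (columns).
# All numbers are invented for illustration only.
X = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
]
y = [3.0, 5.0]  # one outcome per subject


def predict(X, beta):
    """Return the fitted value for each row: the dot product of the row with beta."""
    return [sum(x_ij * b_j for x_ij, b_j in zip(row, beta)) for row in X]


# Two different coefficient vectors (hypothetical values).
beta_a = [3.0, 5.0, 0.0]  # says the third measurement is irrelevant
beta_b = [1.0, 3.0, 2.0]  # says the third measurement matters

fit_a = predict(X, beta_a)
fit_b = predict(X, beta_b)

# Both reproduce the observed outcomes exactly, yet they tell different stories.
print("fit with beta_a:", fit_a)
print("fit with beta_b:", fit_b)
```

With fewer subjects than measurements, a perfect fit of this kind is not unique, which is one reason analysts of such data need extra assumptions or constraints beyond a standard least-squares fit.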
Complicating matters, not all information collected might be relevant, the manner by which data are obtained may be inherently biased, and patterns detected using automated methods are easily distorted by hidden factors not known to the analyst. Strawderman says, “Biostatisticians try to figure out how to formulate the relevant scientific questions and process this information in a way that continues to make some sense.”
The department includes two related centers and a division:
- The Center for Integrative Bioinformatics and Experimental Mathematics, with more than 30 members, is an interdisciplinary research group focused on providing bioinformatics and computational biology support for research in immunology and infectious diseases.
- The Center for Biodefense Immune Modeling, with 17 members, is developing models of the immune response to infection with influenza A, a potential bioterrorism agent and emerging pathogen. It is also modeling immune responses to influenza vaccinations.
- The Division of Psychiatric Statistics, with 13 members, supports innovative research collaborations and coordinated data gathering for studies of human behavior.
Department, center, and division faculty develop state-of-the-art methods and computational tools to query and analyze the increasingly complex data generated by URMC researchers.