Databrowsers

A scientific databrowser is the ultimate end product of our work. These web-based systems allow users to interactively interrogate their data, creating on-the-fly visualizations based on user input.


A databrowser is a web-based interface that allows non-technical users to interact with scientific data.

Making sense of complex datasets depends upon two very different kinds of machines. Silicon-based number crunchers (computers) perform complex mathematical calculations at lightning speed with essentially zero errors, while carbon-based pattern recognizers (our brains) detect visual patterns much faster than any computer and use these patterns to develop further questions about the data.

People enjoy looking at informative scientific graphics if the barrier to creating them is low. When this happens, our species' extraordinary capabilities as pattern recognizers enable us to convert what we see in excellent scientific graphics into a deeper understanding. The problem in many fields of science is that the barriers to creating excellent graphics are discouragingly high.

A bottleneck exists where information is transferred between number crunchers and pattern recognizers. It can take a large amount of time to organize, format and analyze data before generating the graphics that tell the story of the data. Often, the role of data management and analysis is handed over to computer experts rather than the scientist end users with a real interest in the data. With no easy way to create the graphics that they need, the ability of scientists, managers and interested members of the public to develop their intuition about a dataset is greatly impaired.

Scientific databrowsers attempt to solve this problem by hiding the details of data management and analysis while providing simple, intuitive interfaces to the kinds of analysis that are appropriate for a particular dataset. These analyses are typically vetted statistical routines, coded so that they can be driven by input from a web browser user interface. In this manner, end users, both experts and non-experts, can harness the power of (server-side) number crunchers as well as their own (client-side) pattern recognizers without having to learn the arcana of data management and scientific analysis software.

Building a Databrowser

The process of building a databrowser involves several steps:

  1. cleaning up any problems with the source data so that they are consistent and well organized
  2. writing code that allows vetted statistical analyses to be run interactively
  3. writing code to create high quality scientific graphics based on the results of the analysis
  4. embedding the analysis and visualization code in a web-server based databrowser engine
  5. creating a user interface that allows users to quickly and easily send requests to the analysis and visualization engine running on the server
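The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the engine behind the databrowsers described here: the dataset, the series names, the `ANALYSES` table and the `handle_request` function are all hypothetical, and the graphics step is reduced to returning the values a plot would need.

```python
import statistics
from urllib.parse import parse_qs

# Step 1 (assumed already done): a small, cleaned, hypothetical dataset.
DATA = {
    "series_a": {2001: 4.0, 2002: 6.0, 2003: 8.0},
    "series_b": {2001: 1.0, 2002: 3.0, 2003: 2.0},
}

# Step 2: the vetted analyses that users are allowed to request, by name.
ANALYSES = {
    "mean": statistics.mean,
    "median": statistics.median,
}


def handle_request(query_string):
    """Steps 4 and 5: validate a browser request and run the chosen analysis."""
    params = parse_qs(query_string)
    series = params.get("series", [""])[0]
    analysis = params.get("analysis", [""])[0]
    if series not in DATA:
        raise ValueError(f"unknown series: {series!r}")
    if analysis not in ANALYSES:
        raise ValueError(f"unknown analysis: {analysis!r}")
    values = list(DATA[series].values())
    result = ANALYSES[analysis](values)
    # Step 3 would draw a graphic here; this sketch returns the plot inputs.
    return {
        "series": series,
        "analysis": analysis,
        "years": sorted(DATA[series]),
        "result": result,
    }


if __name__ == "__main__":
    print(handle_request("series=series_a&analysis=mean"))
```

Restricting requests to a fixed table of named analyses is one way to ensure that only vetted routines can be run from the user interface.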

When properly designed, the code behind a good databrowser can encapsulate a huge amount of institutional memory about the scientific process. Ideally, databrowser graphics should be of high enough quality that they are immediately ready to be included in scientific publications.

Examples

Examples of our work are included below. Click on an image to go to a working example and learn more by interacting with the data.

Energy Import/Export

Access to fossil fuels is one of the most important issues of our time. The world's largest economies are extremely dependent upon imported supplies of oil and gas. Understanding who produces and consumes oil, coal and natural gas is critical today and will remain so in the years ahead.

This databrowser uses data from the BP Statistical Review and displays coal, oil and natural gas production and consumption timelines for each country in the database and for several political and geographic groupings of nations. Users can dynamically plot import/export curves to get a sense of who the major fossil fuel producers and consumers are and how this has changed in the last four decades.
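One simple way to derive an import/export curve from production and consumption timelines is to take their difference year by year. The sketch below assumes that definition, which ignores stock changes and may not match how the databrowser computes its curves. The function name `net_exports` and all numbers are hypothetical.

```python
def net_exports(production, consumption):
    """Return production minus consumption for each year present in both series.

    Positive values indicate a net exporter, negative values a net importer.
    """
    years = sorted(set(production) & set(consumption))
    return {year: production[year] - consumption[year] for year in years}


if __name__ == "__main__":
    # Invented values in arbitrary units, for illustration only.
    production = {2000: 100, 2001: 90, 2002: 80}
    consumption = {2000: 70, 2001: 90, 2002: 95, 2003: 99}
    print(net_exports(production, consumption))
```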

Mineral Production and Use

The United States Geological Survey (USGS) has the following to say regarding mineral resources within the United States:

"Mineral materials processed domestically accounted for more than $575 billion in the U.S. economy in 2007. U.S. manufacturers and consumers require increasing amounts of imported mineral materials. Making informed decisions about supply and development of mineral commodities that are critical to our economy and security requires current and reliable information about both mineral resources and the consequences of their development."

This databrowser uses data from USGS Data Series 140, which covers a wide range of minerals used in all areas of manufacturing, construction and agriculture. Several different visualization styles are available, each tailored to answer a specific set of questions regarding mineral use and availability. The goal of the US Minerals Databrowser is to make it easier to extract meaningful information from this valuable dataset.

Energy Futures

Futures contracts allow traders to buy and sell commodities for delivery at some future date at a predetermined price. A futures chain is created when prices for futures of subsequent months are chained together. In the current environment of volatile energy prices, energy futures chains provide a snapshot of the "emotional state" of the market -- the level of optimism or pessimism about near-term and long-term prospects for both the economy and the availability of energy resources such as oil and natural gas.
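As a minimal sketch of what "chaining" means, the code below orders one day's contract prices by delivery month. It assumes delivery months are written as "YYYY-MM" strings so that they sort chronologically. The function names and prices are hypothetical and are not taken from the databrowser or from NYMEX data.

```python
def build_chain(settlements):
    """Return (delivery_month, price) pairs ordered by delivery month."""
    return sorted(settlements.items())


def chain_spread(chain):
    """Return the price of the most distant contract minus the nearest one."""
    return chain[-1][1] - chain[0][1]


if __name__ == "__main__":
    # Invented prices for a single trading day, keyed by delivery month.
    settlements = {"2010-03": 75.25, "2010-01": 72.5, "2010-02": 74.0}
    chain = build_chain(settlements)
    print(chain)
    print(chain_spread(chain))
```

A positive spread means that distant delivery is priced above near delivery on that day, and a negative spread means the reverse.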

This databrowser uses daily prices for NYMEX energy futures from June 1, 2009 to yesterday's close. Users may select dates to compare market predictions from the past with the actual closing prices they were trying to predict. The data are presented in a manner that shows the weekly, monthly and quarterly variation in futures prices, providing an intuitive picture of volatility in energy markets.

EPA Probabilistic Survey Analysis

EPA's regional surveys were designed to collect data that are representative of conditions throughout the sampled area. Random site selection ensures that summary statistics derived from the data are unbiased and can be used to guide planning and management efforts at a regional scale.

This databrowser displays summary statistics for measures of biological condition, water chemistry, and site condition for wadeable streams. Users may test for correlation among response variables and human influence. Users may also subset the data and compare summary statistics and patterns of correlation according to year, state, stream order, or other factors.

This work was funded by the EPA Office of Environmental Information.

EPA Relative Risk Analysis

Relative risk is a statistical method widely applied in human health reporting to summarize and compare the risk of developing an illness for a given set of factors. This approach has been adapted to summarize the effects of different stressors on the environmental health of streams. Because streams experience a variety of stressors (e.g., increased nutrients, loss of riparian habitat, sediment), resource managers need a method to identify which threats present the greatest risk in order to implement effective programs to protect them. Relative risk summarizes the strength of the association between a stressor and an indicator of stream condition.
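In its basic form, relative risk is the ratio of two proportions: the share of sites in poor condition where a stressor is present, divided by the share in poor condition where it is absent. The sketch below uses plain site counts and ignores any survey weighting, so it illustrates the formula rather than the calculation used in the databrowser. The function name and the counts are hypothetical.

```python
def relative_risk(poor_with, good_with, poor_without, good_without):
    """Relative risk of poor condition when a stressor is present versus absent.

    Arguments are site counts from a 2x2 table of stressor status by condition.
    """
    n_with = poor_with + good_with
    n_without = poor_without + good_without
    if n_with == 0 or n_without == 0:
        raise ValueError("each stressor group needs at least one site")
    if poor_without == 0:
        raise ValueError("relative risk is undefined when the baseline risk is zero")
    risk_with = poor_with / n_with
    risk_without = poor_without / n_without
    return risk_with / risk_without


if __name__ == "__main__":
    # Invented counts: 30 of 40 stressed sites are in poor condition,
    # compared with 10 of 40 unstressed sites.
    print(relative_risk(30, 10, 10, 30))
```

A value above 1 indicates that poor condition is more likely where the stressor is present, and a value near 1 indicates little association.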

This databrowser links relative risk calculations to the national Wadeable Streams Assessment data set. Benthic macroinvertebrates were used to assess stream condition. The databrowser allows users to compare the relative importance of various stressors across political or ecological regions.

This work was funded by the EPA Office of Environmental Information.

Rheumatology

Rheumatoid arthritis is a chronic autoimmune disorder affecting millions of individuals. Various drug treatments are available to treat the disease. The recently introduced TNF-alpha inhibitors show promising results in clinical studies but command a high per-treatment price. In an effort to carefully evaluate the effectiveness of these drugs, several European countries have begun systematically tracking patient response to these treatments.

The data in the Rheumatology Databrowser come from a European database of patients undergoing therapy for rheumatoid arthritis. A version of this databrowser is presently in pre-production use in rheumatology hospitals served by our international healthcare partners. This databrowser is being evaluated by rheumatology researchers as a way to consistently apply vetted statistical analyses to a growing database of patient data.

This work was funded by the Danish medical informatics company ZiteLab.
