About the Rheumatology Databrowser

This page gives advice on using the Rheumatology Databrowser and interpreting results. You may show or hide any section by clicking on the section header.

More about databrowsers.

A databrowser is a web-based interface that allows non-technical users to interact with scientific data.

Making sense of complex datasets depends upon two very different kinds of machines. Silicon-based number crunchers (computers) perform complex mathematical calculations at lightning speed with essentially zero errors, while carbon-based pattern recognizers (our brains) detect visual patterns much faster than any computer and use these patterns to develop further questions about the data.

People enjoy looking at informative scientific graphics if the barrier to creating them is low. When this happens, our species' extraordinary capabilities as pattern recognizers enable us to convert what we see in excellent scientific graphics into a deeper understanding. The problem in many fields of science is that the barriers to creating excellent graphics are discouragingly high.

A bottleneck exists where information is transferred between number crunchers and pattern recognizers. It can take a large amount of time to organize, format and analyze data before generating the graphics that tell the story of the data. Often, the work of data management and analysis is handed over to computer experts rather than to the scientist end users with a real interest in the data. With no easy way to create the graphics that they need, the ability of scientists, managers and interested members of the public to develop their intuition about a dataset is greatly impaired.

Scientific databrowsers attempt to solve this problem by hiding the details of data management and analysis while providing simple, intuitive interfaces to the kinds of analysis that are appropriate for a particular dataset. These analyses are typically vetted statistical routines, coded so that they can be driven by input from a web browser user interface. In this manner, end users, both experts and non-experts, can harness the power of (server-side) number crunchers as well as their own (client-side) pattern recognizers without having to learn the arcana of data management and scientific analysis software.

Building a databrowser.

The process of building a databrowser involves several steps:

  1. cleaning up any problems with the source data so that they are consistent and well organized
  2. writing code that allows vetted statistical analyses to be run interactively
  3. writing code to create high-quality scientific graphics based on the results of the analysis
  4. embedding the analysis and visualization code in a web-server-based databrowser engine
  5. creating a user interface that allows users to quickly and easily send requests to the analysis and visualization engine running on the server
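
Steps 4 and 5 in the list above can be pictured as a small dispatcher that turns a browser request into a call to a vetted analysis. The following is only a minimal sketch under stated assumptions: the parameter names (analysis, hospital), the ANALYSES registry and the handle_request function are all hypothetical and are not taken from the actual databrowser engine.

```python
from urllib.parse import parse_qs

def count_visits(rows, params):
    """Hypothetical vetted analysis: count visits, optionally for one hospital."""
    hospital = params.get("hospital")
    return sum(1 for r in rows if hospital is None or r["hospital"] == hospital)

# Hypothetical registry: only analyses listed here can be requested.
ANALYSES = {"count": count_visits}

def handle_request(query_string, rows):
    """Parse a URL query string and run the analysis it names."""
    # parse_qs returns a list of values per key; keep the first of each.
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    name = params.pop("analysis", None)
    if name not in ANALYSES:
        raise ValueError(f"unknown analysis: {name!r}")
    return ANALYSES[name](rows, params)

# Made-up rows for illustration only.
rows = [{"hospital": "H1"}, {"hospital": "H1"}, {"hospital": "H2"}]
result = handle_request("analysis=count&hospital=H1", rows)
```

Restricting requests to a fixed registry is one way to ensure that users can only run analyses that have been vetted in advance.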

When properly designed, the code behind a good databrowser can encapsulate a huge amount of institutional memory about the scientific process. Ideally, databrowser graphics should be of high enough quality that they are immediately ready to be included in scientific publications.

Data used in this databrowser.

The data in the Rheumatology Databrowser come from a European database of patients undergoing therapy for rheumatoid arthritis. A version of this databrowser is presently in pre-production use in rheumatology hospitals serviced by our international healthcare partners. (No identifiable information has been retained and this databrowser serves only as an example of how databrowsers can be used in medical research.)

The source data for this example databrowser contain only a few variables associated with each patient-visit:

  • date
  • patient ID
  • hospital ID
  • drug treatment prescribed on this visit
  • evaluation of disease severity
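
As an illustration, one patient-visit with the five variables listed above could be represented by a record like the one below. This is a sketch only: the field names, the types and the use of None for a missing evaluation are assumptions, not the schema of the source database.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass(frozen=True)
class Visit:
    """One patient-visit; all field names are hypothetical."""
    visit_date: date
    patient_id: str
    hospital_id: str
    treatment: str              # drug treatment prescribed on this visit
    severity: Optional[float]   # None when no evaluation was recorded

# Made-up values for illustration only.
example = Visit(date(2010, 1, 15), "P001", "H01", "drug_a", 3.2)
unscored = Visit(date(2010, 4, 15), "P001", "H01", "drug_a", None)
```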

Interpreting results.

The Rheumatology Databrowser allows users to subset the full dataset according to various criteria and then categorize the resulting subset according to disease severity, hospital where treatment took place, or patient treatment. The subset and categorize selectors can be augmented with additional information such as gender, age, family history, smoker status, etc. when that information is available in the dataset.
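
The subset-then-categorize workflow described above might look roughly like this in code. It is a minimal sketch with made-up records; the function names subset and categorize and the dictionary keys are hypothetical and do not describe the databrowser's actual implementation.

```python
from collections import defaultdict

def subset(visits, **criteria):
    """Return the visits whose fields match every keyword criterion."""
    return [v for v in visits
            if all(v.get(field) == wanted for field, wanted in criteria.items())]

def categorize(visits, field):
    """Group visits by the value of one field, e.g. treatment or severity."""
    groups = defaultdict(list)
    for v in visits:
        groups[v.get(field)].append(v)
    return dict(groups)

# Made-up records for illustration only.
visits = [
    {"patient": "P1", "hospital": "H1", "treatment": "drug_a", "severity": "moderate"},
    {"patient": "P2", "hospital": "H1", "treatment": "drug_b", "severity": "severe"},
    {"patient": "P3", "hospital": "H2", "treatment": "drug_a", "severity": "remission"},
    {"patient": "P1", "hospital": "H1", "treatment": "drug_a", "severity": "remission"},
]

# Subset to one hospital, then categorize that subset by treatment.
h1_by_treatment = categorize(subset(visits, hospital="H1"), "treatment")
```

Additional selectors such as gender or smoker status would simply be further keys in each record, usable in both functions without changes.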

Only one visualization is currently available in this example databrowser and it gives three different views of how the chosen subset progresses with time.

boxplots

The bottom panel displays a boxplot diagram following the progress of patients' pain evaluation over three years of treatment. The pain score on the Y axis is broken into four color-coded categories: blue = remission; green = light symptoms; yellow = moderate symptoms; red = severe symptoms. The boxplot for a particular time interval represents all of the patient visits that occurred during that interval.
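
The numbers behind each box can be sketched as a five-number summary computed per time interval. The code below is an illustration under assumptions: visits are given as hypothetical (month, score) pairs, quartiles use the "inclusive" method of Python's statistics module, and the whiskers are taken to be the minimum and maximum, which may differ from the conventions used by the databrowser itself.

```python
from collections import defaultdict
from statistics import median, quantiles

def box_stats(scores):
    """Five-number summary for one box; expects at least two scores."""
    ordered = sorted(scores)
    q1, _, q3 = quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "median": median(ordered),
            "q3": q3, "max": ordered[-1]}

def box_stats_by_interval(visits):
    """Summarize all scored visits falling in each time interval."""
    by_interval = defaultdict(list)
    for month, score in visits:
        if score is not None:          # skip visits with no evaluation
            by_interval[month].append(score)
    # Intervals with fewer than two scores are left out of the summary.
    return {m: box_stats(s) for m, s in sorted(by_interval.items()) if len(s) >= 2}

# Made-up (month, score) pairs for illustration only.
visits = [(0, 1.0), (0, 2.0), (0, 3.0), (0, 4.0), (0, 5.0),
          (3, 2.0), (3, None), (3, 4.0), (6, 1.0)]
stats = box_stats_by_interval(visits)
```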

stacked barplot

The middle panel displays a stacked barplot showing the data from each time interval broken down into user-selected categories and converted to percentages. From this plot one can assess, at each time interval, the relative proportion of each category. The first plot below categorizes the data according to disease severity (pain score). White space at the top of each stacked barplot represents those visits where no pain evaluation was recorded. The second plot below categorizes the same data according to the prescribed drug treatment.
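
The conversion to percentages, including the white space left by unrecorded evaluations, can be sketched as follows. This is an assumed formulation rather than the databrowser's own code: visits are hypothetical (interval, category) pairs, and a category of None stands for a visit with no evaluation.

```python
from collections import Counter

def percentages_by_interval(visits):
    """Percent of all visits in each interval that fall into each category.

    Visits whose category is None count toward the interval total but not
    toward any category, so the bars can sum to less than 100.
    """
    totals = Counter()
    counts = {}
    for interval, category in visits:
        totals[interval] += 1
        if category is not None:
            counts.setdefault(interval, Counter())[category] += 1
    return {
        interval: {cat: 100.0 * n / totals[interval]
                   for cat, n in counts.get(interval, {}).items()}
        for interval in totals
    }

# Made-up visits for illustration only.
visits = [(0, "remission"), (0, "severe"), (0, "severe"), (0, None)]
pct = percentages_by_interval(visits)
white_space = 100.0 - sum(pct[0].values())
```

Because the denominator is every visit in the interval, the gap between the top of the stacked bar and 100 percent is exactly the share of visits without an evaluation.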

multi-histogram plots

The top panel presents the same raw data as the middle plot, but as a histogram without conversion to percentages. In this plot one sees how many visits fall into each category rather than relative percentages. This information allows one to get a gut feeling for the significance of the results displayed in each of the other panels. For instance, month 9 in this study was not part of the regular patient-visit protocol and has far fewer visits. The lower number of patients after month 12 reflects patients dropping out of the study for reasons that are not identified in the example dataset.