The Nightmare of Extracting and Interpreting Blood Pressure Data

Posted by Nicole Buechler on Mar 27, 2019 10:08:00 AM

Medical record data is full of hidden gotchas that a data professional can stumble into. My favorite example is Blood Pressure.

What Is Blood Pressure?

Blood Pressure is a measurement of the force that circulating blood exerts on the walls of your blood vessels as the heart moves it through the body. It is a key indicator of heart health, made up of two measurements: Systolic Pressure (the pressure inside your blood vessels when your heart beats) and Diastolic Pressure (the pressure inside your blood vessels when your heart rests between beats). In its most informative form, Blood Pressure is displayed to clinicians like this:

Systolic Blood Pressure / Diastolic Blood Pressure

To the untrained eye, this looks like a division math problem from elementary school. This string is an analyst’s nightmare!

  • Do you take the top number and divide it by the bottom number, and utilize the quotient as the numeric representation of blood pressure?
  • Can you consider Systolic or Diastolic Pressure numbers independently?
  • If I clean this data by removing punctuation, I am left with one 4-6 digit integer (120/80 becomes 12080). Can I use that?

Needless to say, it is confusing if you do not know much about medical data.
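To make the answer to those questions concrete, here is a minimal Python sketch that treats the display string as two separate integers rather than as a fraction or one long number. The function name, the `BloodPressure` type, and the 2-3 digit pattern are assumptions made for illustration, not something taken from a particular EMR.

```python
import re
from typing import NamedTuple, Optional


class BloodPressure(NamedTuple):
    """Hypothetical container for one systolic/diastolic pair (mmHg)."""
    systolic: int
    diastolic: int


# Assumption: each value is written as a 2-3 digit whole number.
_BP_PATTERN = re.compile(r"^\s*(\d{2,3})\s*/\s*(\d{2,3})\s*$")


def parse_blood_pressure(text: str) -> Optional[BloodPressure]:
    """Split a 'systolic/diastolic' string into two integers.

    Returns None when the text does not look like a blood pressure
    reading, instead of guessing at a value.
    """
    match = _BP_PATTERN.match(text)
    if match is None:
        return None
    return BloodPressure(int(match.group(1)), int(match.group(2)))


reading = parse_blood_pressure("120/80")
print(reading)                        # BloodPressure(systolic=120, diastolic=80)
print(parse_blood_pressure("12080"))  # None: the slash carried the meaning
```

Dividing the two numbers or stripping the slash both throw away the fact that each value is judged on its own.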

In What Way Is This Data Made Useful?

To help make this data more actionable, these two numbers are compared against reference ranges, sometimes stratified by age and gender. An elevated number for either reading can flag the whole blood pressure reading for a clinician, but a normal value for only one of the two does not indicate healthy blood pressure overall. Take the AMA guidelines, for example:

[Image: AMA blood pressure reference ranges]

This diagram sheds light on how the two values are used independently. In SQL terms, our logic for categorizing and color-flagging blood pressure would look like this:

[Screenshot: SQL logic for categorizing and color-flagging blood pressure]
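As a rough Python sketch of this kind of categorization logic (not the query from the screenshot), the function below checks the most severe category first and lets either number trigger a flag. The cutoffs are an assumption: they follow the widely published 2017 ACC/AHA adult categories and may differ from the chart referenced here. The color names are purely illustrative.

```python
# Assumed thresholds (mmHg), based on the 2017 ACC/AHA adult categories.
# Colors are hypothetical flag values chosen for illustration only.
FLAG_COLORS = {
    "Normal": "green",
    "Elevated": "yellow",
    "Hypertension Stage 1": "orange",
    "Hypertension Stage 2": "red",
    "Hypertensive Crisis": "dark red",
}


def categorize_blood_pressure(systolic: int, diastolic: int) -> str:
    """Return a category name for one systolic/diastolic pair.

    Checks run from most to least severe, so an elevated value in
    either reading is enough to move the pair into a higher category.
    """
    if systolic > 180 or diastolic > 120:
        return "Hypertensive Crisis"
    if systolic >= 140 or diastolic >= 90:
        return "Hypertension Stage 2"
    if systolic >= 130 or diastolic >= 80:
        return "Hypertension Stage 1"
    if systolic >= 120:
        # Diastolic is already known to be below 80 at this point.
        return "Elevated"
    return "Normal"


category = categorize_blood_pressure(118, 85)
print(category, FLAG_COLORS[category])  # Hypertension Stage 1 orange
```

Note that 118/85 is flagged even though the systolic value alone would look normal, which is why the two numbers cannot be collapsed into one.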

These are some of the analytic complexities of making blood pressure data actionable and useful, but let’s take it a step further and consider how we might see this data inside the underlying EMR data source.

How Do We See This Data Stored?

While a Blood Pressure reading’s most useful form is a string containing both systolic and diastolic readings, LOINC standards (LOINC Code 55284-4) note that it is inadvisable to report two observations in one record. Systems instead capture the data as two different integer readings, Systolic Blood Pressure (LOINC Code 8480-6) and Diastolic Blood Pressure (LOINC Code 8462-4). Now we know that the data tends to be stored as numbers, which should make an analyst jump for joy. This isn’t the whole story, though.

Here are two common table structures modeled from EMR databases, both capturing vital readings:

System 1

[Screenshot: System 1 table structure]

System 2

[Screenshot: System 2 table structure]

Does anything pop out at you? These examples highlight a common EMR design trend for vitals data: a vital reading is not given its own primary key. Instead, it is logged in association with its encounter. Vital data is treated as a subset of encounter information rather than as a health data entity in its own right. This introduces the concept of a vital set, which is the group of vital readings your doctor tends to take every time you visit (Height, Weight, Blood Pressure, O2 Saturation, Temperature, etc.). The vital set captures a specific set of vital readings and allows for one and only one reading per type.

Consider the Tilt Table Test, whose data is recorded in the table examples above. During this test, a person lies on a table that rotates from horizontal to vertical. Their blood pressure is recorded in the supine position, then immediately monitored for the next minute once the table is rotated to the vertical position.

How would a system capture this information? System 1 creates a second vital set for the additional blood pressure reading. This approach keeps the systolic and diastolic pair closely associated with each other, but the pair’s association with other vital data captured during that encounter is now at the encounter level rather than the vital set level. System 2 is in a long format and simply tacks on a new row. This model accounts for additional readings more fluidly, but the systolic and diastolic pair is not linked as closely. EncounterID, VitalTime, VitalPosition and any other columns available would need to be employed in order to make a systolic/diastolic link with confidence.
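For a long-format table like System 2, the pairing can be sketched in Python as a group-by on the linking columns. EncounterID, VitalTime and VitalPosition come from the description above; the `VitalType` and `VitalValue` column names, the "Systolic"/"Diastolic" labels, and the sample rows are assumptions made up for this illustration.

```python
from collections import defaultdict

# Hypothetical long-format rows, one reading per row.
rows = [
    {"EncounterID": 1, "VitalTime": "09:00", "VitalPosition": "Supine",
     "VitalType": "Systolic", "VitalValue": 118},
    {"EncounterID": 1, "VitalTime": "09:00", "VitalPosition": "Supine",
     "VitalType": "Diastolic", "VitalValue": 76},
    {"EncounterID": 1, "VitalTime": "09:01", "VitalPosition": "Upright",
     "VitalType": "Systolic", "VitalValue": 104},
    {"EncounterID": 1, "VitalTime": "09:01", "VitalPosition": "Upright",
     "VitalType": "Diastolic", "VitalValue": 68},
    {"EncounterID": 1, "VitalTime": "09:05", "VitalPosition": "Upright",
     "VitalType": "Systolic", "VitalValue": 110},  # no matching diastolic
]


def pair_blood_pressure(rows):
    """Pair systolic and diastolic rows that share the same linking key.

    Returns tuples of (EncounterID, VitalTime, VitalPosition, systolic,
    diastolic). Readings without a partner are left out rather than
    guessed at. If a key holds two readings of the same type, the later
    row wins; a real pipeline would want to detect that case.
    """
    grouped = defaultdict(dict)
    for row in rows:
        key = (row["EncounterID"], row["VitalTime"], row["VitalPosition"])
        grouped[key][row["VitalType"]] = row["VitalValue"]

    pairs = []
    for key in sorted(grouped):
        readings = grouped[key]
        if "Systolic" in readings and "Diastolic" in readings:
            pairs.append((*key, readings["Systolic"], readings["Diastolic"]))
    return pairs


pairs = pair_blood_pressure(rows)
for pair in pairs:
    print(pair)
```

With exact-match keys like these, readings taken a few seconds apart would not pair up, so a real source system may need a time tolerance or additional columns in the key.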

How Do We Match Vital Readings Together?

Now we see that while blood pressure data is most informative as a vital data pair, source systems do not store this data in its systolic/diastolic form. We as engineers need to come up with ways to confidently pair systolic and diastolic pressure readings when extracting this information from source systems.

Beyond pairing Blood Pressure data together, further complexity is introduced by the fact that reference ranges for vital data depend on other patient information such as age or gender, and sometimes even on other vital data such as height or weight. Consider BMI, which is calculated from both height and weight. We may not expect height to be measured more than once during a visit, but weight reasonably can be. This means that we have to account for the possibility that a single encounter will produce as many calculated BMI values as there are weight measurements.
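The BMI point can be illustrated with a short Python sketch using the standard formula, weight in kilograms divided by height in meters squared. The function name, the metric units, and the sample measurements are assumptions for illustration.

```python
from typing import List


def bmi_values(height_m: float, weights_kg: List[float]) -> List[float]:
    """Compute one BMI per weight measurement for a single encounter.

    BMI = weight (kg) / height (m) squared, rounded to one decimal.
    Assumes one height measurement is shared by every weight.
    """
    return [round(weight / height_m ** 2, 1) for weight in weights_kg]


# Hypothetical encounter: one height, two weight measurements.
encounter_bmis = bmi_values(1.75, [70.0, 70.5])
print(len(encounter_bmis))  # 2, one BMI per weight measurement
```

Each of those BMI values would then need to be matched to its own reference range, just like each blood pressure pair.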


Medical data is a reflection of human health, and it mirrors that complexity in both form and utility. We at Hart have come up with methods for extracting vital data as its own health data entity rather than as a child of an encounter, and for pairing it with other pieces of information to make it actionable for a clinician when viewed on the Compass Platform. Our knowledge of potential health data complexities, which can only be gained from deep experience with medical information, aids us in navigating every new source system we come across.

Topics: Data, Engineering