Where did this COVID stuff come from?

Good day to you all. This is post 5 for the new AstroVirtual Inc. blog.

Last week’s blogposts generated more response than I anticipated, including questions such as “Where did you get all of this COVID data?” and “Why did you decide to collect it in the first place?” Follow-up questions were “What did you find out?” and “What did you do then?” A couple of important questions also concerned the geospatial maps (the choropleths).

Well, those are great questions, and I’ll try to answer them to some degree here. Honestly, though, the answers are a by-product of what our AstroVirtual tools can provide rather than ‘what we do.’ More specifically, they are the kind of thing that InnovaScapes Institute does, so that website and blog are the more appropriate venue for such questions (https://www.innovascapesinstitute.com/).

The following material has not yet been published on that site, however, so we will provide a sketch here.

First, where did all of this COVID data originate?

Every Public Health department in every municipality in every country has the role and obligation to report infectious disease incidence and outcomes to its citizenry, and to the next level of Public Health department in its province, county, state or nation.

These numbers are reported regularly, and for pandemics such as the COVID experience, this is done frequently (daily, for the first three years, typically).  

These numbers are aggregated at higher levels by several services and then reported to the public through news outlets and aggregators. For the fast-moving COVID data, the goal quickly became to report daily at the county and state level for most nations, and outlets such as the New York Times and London’s BBC published daily updates. Other groups all over the world leaned in as well. The Johns Hopkins Coronavirus Resource Center (https://coronavirus.jhu.edu/map.html) became a primary validating source for global data. For the United States, USAFacts (https://www.usafacts.org) is a private “not-for-profit, non-partisan, civic initiative making government data easy for all.” Its tagline says: “Researchers. Analysts. Statisticians. Designers. No politicians. No one at USAFacts is trying to convince you of anything. The only opinion we have is that government data should be easier to access. Our entire mission is to provide you with facts about the United States that are rooted in data. We believe once you have the solid, unbiased numbers behind the issues you can make up your own mind.”
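For readers who want to see that roll-up concretely, here is a minimal Python sketch of county reports being summed into state and national totals. It is an illustration only: the `CountyReport` record, the `roll_up` helper, and all of the counts are hypothetical toy values, not real COVID data and not how any particular service actually does it.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class CountyReport:
    """One hypothetical daily report from a county health department."""
    state: str
    county: str
    cases: int
    deaths: int

def roll_up(reports):
    """Sum county reports into per-state totals and one national total."""
    by_state = defaultdict(lambda: {"cases": 0, "deaths": 0})
    for r in reports:
        by_state[r.state]["cases"] += r.cases
        by_state[r.state]["deaths"] += r.deaths
    national = {
        "cases": sum(s["cases"] for s in by_state.values()),
        "deaths": sum(s["deaths"] for s in by_state.values()),
    }
    return dict(by_state), national

# Toy reports for illustration only.
reports = [
    CountyReport("State X", "County A", 120, 3),
    CountyReport("State X", "County B", 80, 1),
    CountyReport("State Y", "County C", 50, 2),
]
by_state, national = roll_up(reports)
print(by_state["State X"])  # {'cases': 200, 'deaths': 4}
print(national)             # {'cases': 250, 'deaths': 6}
```

Each level only adds up what the level below reported, which is why the county rows are the most detailed data there is.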

InnovaScapes Institute, with AstroVirtual, collected, collated and configured this data from these and other sources daily for nearly three years, until home testing had become common and the Confirmed Cases statistic was no longer meaningful to collect, since most home test results never reached the official counts.

Second, why did you decide to collect it?

There are two parts to this question. First, why did you decide to begin, and second, why did you persevere once there were multiple sources for this data.  

Why begin?

I served on the Colorado Air Pollution Control Commission for two years following America’s first Earth Day. Told on the first day that Colorado was the second-worst state in America for emphysema death-rates, I asked which states made up the top ten, and also what caused the disease. Having just been diagnosed with emphysema myself, I served fully out of self-interest.

The curt answer was that smoking caused 80-90% of it, and bad urban air caused the rest. To a Caltech-trained scientist who’d studied with Arie Haagen-Smit (the original investigator of smog in Los Angeles), that seemed far-fetched. I’d never smoked, and none of the 25 largest cities in America (more than 500,000 people) was in any of the top ten emphysema states. Moreover, Utah was 15th on the list of ‘worst states’, and I knew from my Mormon wife that cigarettes weren’t allowed to be sold in Utah whatsoever.

Figure 1

The worst ten states in America for emphysema death-rate 1960-1969

Faced with my skepticism, department officials averred that Colorado’s statistics were high because the state had a long reputation for ‘clean air’, so people moved here from bad-air places like Chicago and padded our death statistics.

It was true that tuberculosis victims had long moved to Colorado for ‘clean air’, and the headquarters of the American Lung Association was here for that reason. Moreover, the prestigious National Center for Atmospheric Research was thirty miles from Denver, near the University of Colorado in Boulder.

Using the Department of Health datafiles, I compiled a death map of Colorado emphysema victims, county by county, covering 2,752 deaths. The results? Disappointing for the rationale being used. First, nearly all of the deaths were long-term Colorado residents; second, they lived largely in small high-altitude towns, nowhere near even Colorado’s ‘bad city air’; and third, more than one-third had not been heavy smokers.

Remember last week’s post, suggesting that comparative images are really useful?

Figure 2

Green areas are altitudes above 7,000 feet

Dark shading counties are the worst for emphysema per capita death-rates 

Several other findings can be drawn from these maps, not least the high death-rates in the farming communities of the counties in Colorado’s upper right-hand (northeastern) corner. These traced to two specific issues: (1) improper installation of, and inadequate worker protection in, alfalfa dehydrating plants, and (2) heavy smokers driving the then-new air-conditioned enclosed tractor cabs, which continually recirculated the smoker’s own air.

What we found was a problem that America’s vaunted Centers for Disease Control has taken more than fifty years to acknowledge: deep inversion layers in mountain valleys, when filled with toxic elements and small particulates that can penetrate deep into the lungs, are deadly over time. Something like 400% to 1,500% more deadly. Just 50 years to agree! The next figure (actually a collage of images) makes the point. One can imagine that this is the real culprit behind the ‘surprising finding’ of the past five years that up to two-thirds of lung cancer victims in places like China, Taiwan, and yes, even the US, aren’t smokers.

Figure 3

Among the true culprits of emphysema and COPD lung disease

While the Centers for Disease Control never bowed to this thesis, statisticians at the National Center for Health Statistics published a major paper in 2001 with a series of COPD choropleths that showed the pattern unmistakably. See Figure 4.

Subsequent studies, including CDC graphs, repeatedly verified this pattern, albeit with a gradual evolution in which much more of the country came to experience such debilitating results.

(FN: Jay H. Kim, Jimmie D. Givens, Jai W. Choi, “Geographic Distribution and Changes in COPD Mortality: United States, 1983-1997,” National Center for Health Statistics, Hyattsville, MD, presented at the American Statistical Society, August 2001.)

One particularly interesting Colorado study (2007) accepted that the victims were predominantly rural folk, but explained it away, without data, by arguing that rural people were older and hence more prone to dying anyway. The authors did conclude it was NOT smoking. They ‘thought’ it might be ‘inward migration of sick folk.’ Again, no data. Beautiful stupidity. If you want to read something truly asinine that still got published by a reputable organization, try this one for size. (FN: Ryan McGhee and Greg Kinney, “2007 Colorado Chronic obstructive pulmonary disease (COPD) Surveillance report,” Colorado COPD Institute, with the American Lung Association, Colorado Branch.)

Figure 4

COPD Death-rates by HSA (Health Service Area), 1983-1987

 

So, to finish the story about why I began to collect the data: I had long believed that we had found some singularly interesting things, lo, those many years ago in Colorado. That view had been reinforced at Stanford in 2009, when we hosted the Places and Spaces exhibit for Katy Börner (of Indiana University). There, I met Warren Muir, Executive Director, Division of Earth and Life Studies for the Institute of Medicine and National Research Council, Washington D.C.

At a private breakfast, over pecan rolls and decadent coffee at Davis Masten’s home, he asked why I had hosted the exhibit, one of those early-morning conversation gambits with no real purpose but to pass the time agreeably. I said that years earlier I had become a total fan of the kind of work that Edward Tufte later popularized with his acclaimed book, “The Visual Display of Quantitative Information” (1983). I told him that I’d solved the riddle of high death-rates in Colorado in 1971 with self-drawn maps, and it was like watching a largemouth bass take the lure in spring. He practically lunged at me, saying something like, “No, you couldn’t have.” Startled and shaken, I held my ground, and he fired back, “What did you find?” When I told him the thesis shown in Figure 3, he sat back down and said, “I found the same thing in 1974 during my PhD thesis, and no one would believe me!” Talk about vindication, at two levels. A very satisfying breakfast!

Hospitalized with COVID five times in four weeks starting March 6, 2020, and then confined to bed at home for several months, I at first bemoaned my fate, and then was thankful that I seemed to have weathered the storm without dying. Truthfully, it was quite a while before I realized that this was a golden opportunity for a data scarfer, for three reasons.

  1. First, a pandemic creates DYNAMIC data, changing rapidly on a daily basis in lots of places. My COPD study, like any such study then or since, was plagued by very slow processes: people take years to die from COPD after diagnosis.

  2. Second, the tools (for database construction, data graphing and mapping, and for that matter disease diagnostics) have progressed immensely. I had done the earlier work nearly a decade before Dan Bricklin created the first electronic spreadsheet; everything I did was hand-calculated on long sheets of paper, mostly written in pen rather than typed. And though I’d developed an early computer graphics display box, there were no application tools to show counties or other regional boundaries, not even the outlines of nations themselves.

  3. Third, the pandemic scared enough people quickly enough that health departments were forced to change their reporting modus operandi, moving from leisurely monthly or even weekly reports to daily ones. Moreover, the press was demanding the data and then publishing it, so that I, a citizen in a very remote part of America, could get real-time data.

So, after mulling it over, the choice seemed obvious. I needed to collect the data, a lot of it, and project it through the lens of geospatial mapping. Who knows what we might find?

As for our perseverance, we believed that the daily data flow would eventually cease and that the repositories distributing the data would wane. People would then lose the ability to ‘zero in’ on a particular date and locale, because only aggregated numbers would remain available. That has mostly proven true, and we think that AstroVirtual and InnovaScapes now have a nearly unique database of USA data, because we were diligent about collecting, compiling, collating, and saving some 20 million data entries daily. Test it yourself: try to find the data for Polk County, Iowa, for September 12-16, 2020, for example, or for your own community on any date in those three years.
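As a rough illustration of what ‘zeroing in’ on a date and locale means in practice, here is a minimal sketch using Python’s built-in `sqlite3` module. The `daily` table layout, the `lookup` helper, and the rows are all hypothetical: this is not the AstroVirtual/InnovaScapes schema, and the numbers are toy values, not real Polk County figures.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical layout: one row per county per day (not the actual
# AstroVirtual/InnovaScapes schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE daily (
           state  TEXT NOT NULL,
           county TEXT NOT NULL,
           day    TEXT NOT NULL,     -- ISO 8601 date, e.g. '2020-09-12'
           cases  INTEGER NOT NULL,
           deaths INTEGER NOT NULL,
           PRIMARY KEY (state, county, day)
       )"""
)

# Toy rows for ten consecutive days; NOT real Polk County figures.
start = date(2020, 9, 10)
rows = [
    ("IA", "Polk", (start + timedelta(days=i)).isoformat(), 1000 + 50 * i, 20 + i)
    for i in range(10)
]
conn.executemany("INSERT INTO daily VALUES (?, ?, ?, ?, ?)", rows)

def lookup(conn, state, county, first_day, last_day):
    """Return (day, cases, deaths) rows for one county, inclusive date range."""
    # ISO date strings sort chronologically, so BETWEEN works on TEXT.
    cur = conn.execute(
        "SELECT day, cases, deaths FROM daily "
        "WHERE state = ? AND county = ? AND day BETWEEN ? AND ? "
        "ORDER BY day",
        (state, county, first_day, last_day),
    )
    return cur.fetchall()

result = lookup(conn, "IA", "Polk", "2020-09-12", "2020-09-16")
for row in result:
    print(row)
```

Keeping one row per county per day, keyed on (state, county, day), is what makes this kind of lookup possible at all; once only aggregated totals survive, a query like this has nothing to retrieve.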

The follow-on questions I mentioned in the opening paragraph (What did you find out? And what have you done with what you discovered?) are perhaps the most important at this point. The first thing we found out was that choropleths are not that well known (certainly not by that name), and that using them correctly is non-trivial. We will have a blogpost about choropleth design and usage in due course (not right away). But they are incredibly valuable tools for showing easily discerned patterns in ‘big data’ that simply cannot be imagined or shown in any other medium of which we are aware.
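Pending that post, here is one small piece of the ‘non-trivial’ part: deciding which shade each region gets. The sketch below uses quantile classes, one common choropleth scheme, built on Python’s `statistics.quantiles` (Python 3.8+). It is an assumed example, not necessarily the classification used in our maps; `quantile_classes` is a hypothetical helper and the rates are toy values.

```python
from bisect import bisect_right
from statistics import quantiles

def quantile_classes(rates, k=5):
    """Assign each region a shade class from 0 (lightest) to k-1 (darkest).

    `rates` maps region name -> per capita rate. The k-1 cut points split the
    sorted rates into k groups of roughly equal size.
    """
    cuts = quantiles(rates.values(), n=k)  # k-1 cut points, ascending
    # bisect_right counts how many cut points lie at or below the rate.
    return {region: bisect_right(cuts, rate) for region, rate in rates.items()}

# Toy per capita rates for ten hypothetical regions.
toy_rates = {f"Region {i}": float(i) for i in range(1, 11)}
classes = quantile_classes(toy_rates)
print(classes["Region 1"], classes["Region 10"])  # 0 4
```

Quantile classes put roughly the same number of regions in each shade, which makes patterns easy to see but can exaggerate small differences; equal-interval classes are the usual alternative.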

I called an old HP colleague to ask about his work with Microsoft Power BI (BI stands for Business Intelligence). This tool, a major extension of Microsoft Excel spreadsheets, has an advertising blurb that reads: “Power BI is a unified, scalable platform for self-service and enterprise business intelligence (BI). Connect to and visualize any data.” There are competing tools out there, including Tableau, co-founded by Pat Hanrahan of Stanford and acquired by Salesforce in 2019. Katy Börner’s exhibit had featured multiple displays of complex Tableau analytics.

The first Power BI test was my insistence on a 2x2 matrix of choropleth maps: Confirmed Cases and Deaths on the top two maps, and under each the equivalent per capita data. I didn’t anticipate how dramatic this would prove to be. Figure 6 of last week’s blogpost (February 19) shows the data from the first time I’d actually seen these images side by side. When I realized that the entire US population was being spoon-fed the left-side image while the right-side image much more accurately showed the widespread nature of the rampaging pandemic, I felt that we had to step into action more forcefully. How? I still couldn’t walk to the mailbox, so I used the InnovaScapes blog (which should be, but isn’t, tied to the InnovaScapes website). That blog is at https://innovascapes.blogspot.com/, and if you trace back through its history, you will see that I first posted about these maps on April 29, 2020, and showed the first 2x2 matrix on May 7.
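Why the per capita row matters can be shown in a few lines of Python. The sketch below converts raw counts to rates per 100,000 residents (an assumed convention) and ranks two hypothetical counties both ways; the county names and numbers are invented toy values, not data from our maps.

```python
def per_capita(counts, population, per=100_000):
    """Convert raw counts to rates per `per` residents."""
    # Multiply before dividing so these toy numbers come out exact.
    return {region: counts[region] * per / population[region] for region in counts}

def ranked(values):
    """Return region names ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)

# Hypothetical toy numbers: a large urban county and a small rural one.
deaths = {"Big County": 900, "Small County": 60}
population = {"Big County": 3_000_000, "Small County": 100_000}

rates = per_capita(deaths, population)
print(ranked(deaths))  # ['Big County', 'Small County']
print(ranked(rates))   # ['Small County', 'Big County']
print(rates)           # {'Big County': 30.0, 'Small County': 60.0}
```

The big county ‘wins’ on raw deaths simply because it has thirty times the people; per capita, the small county is hit twice as hard. That is the gap between the top and bottom rows of the matrix.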

In that timeframe, I tried to interest our county commissioners in the issues, essentially urging them to keep the lockdowns in place longer than this red-necked county was inclined to do.

Figure 5

Tulare County per capita Death-rate map

Then, because the county health office had to do ‘contact tracing’ on my own activities in February, I began to delve more deeply into our local situation. The key finding was published on June 7 in an InnovaScapes post that posited the Tulare Agricultural Fair in February as a potential hotspot or super-spreader event, analogous to the more widely known Mardi Gras and the troubled ski resorts. There, for the first time, our county’s death-rate was shown to be 400% to 2,000% higher than that of any surrounding county, even though all eleven counties shared similar demographics.

I didn’t appreciate it at the time, but showing a map like Figure 5 tends to raise the question of ‘what else is going on?’ Those questions emerge more naturally when the full 2x2 matrix is shown, as we later described to the United Nations General Assembly.

Figure 6

Eleven adjacent agricultural California counties, 75% of California ag value

Figure 6 shows the full value of the 2x2 matrix described in Blogpost #4. Most of these counties shared a similar Confirmed Cases profile, and nearly all had similar per capita case-rates. Only four counties, Tulare and three of its neighbors, had a high number of deaths, as seen in the upper right map of the matrix. And even those other three shared the virtual absence of a high per capita death-rate common to the remaining seven. Only Tulare County, where the Ag Fair was held for a week, shows any differential death-rate.

There had to be a reason, right? And, upon investigation, yes, there was a solid reason. The Ag Fair was indeed a super-spreader event, just as the ski resorts were, just as Mardi Gras was, and just as the Sturgis motorcycle rally was. The difference is that it never got national press, so at this point it is one of those plausibly deniable events that Tulare officials, and certainly Ag Fair officials, will never admit. But 92 people died in Tulare County, half as many as in all the other ten counties combined. Said another way, with one-tenth of the population, this tiny rural county had one-third of the deaths.
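The arithmetic behind that last sentence can be made explicit. Assume (our reading, since the population share is not stated precisely) that ‘one-tenth the population’ means Tulare County’s share of the combined eleven-county population $P$, so $P_T = 0.1P$ and $P_O = 0.9P$, and write $D_T$ and $D_O$ for deaths in Tulare and in the other ten counties:

$$
D_T = 92, \qquad D_O = 2D_T = 184, \qquad \frac{D_T}{D_T + D_O} = \frac{92}{276} = \frac{1}{3}
$$

$$
\frac{D_T / P_T}{D_O / P_O} = \frac{D_T}{D_O}\cdot\frac{P_O}{P_T} = \frac{1}{2}\cdot\frac{0.9P}{0.1P} = 4.5
$$

Under that assumption, Tulare’s per capita death-rate was about 4.5 times the pooled rate of the other ten counties. That is a comparison against their combined average; county-by-county comparisons, like those in the June 7 post, can differ from it considerably.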

Comparative maps—helpful! 

Next post, we’ll move away from these medical discussions and look at some business scenarios. Stay tuned.
