by Kirby Phares
Over the 2019 Summer Term at Dartmouth College, I worked on the Eugenic Rubicon project under Dr. Jacqueline Wernimont and Dr. Meredith Ferguson. Drawing upon sterilization records, the Eugenic Rubicon project seeks to create a widely accessible digital interface that tells the story of forced sterilization in America during the 20th century. For the first half of the summer, I focused on gathering records of HIPAA laws for each state in the United States and researching the archived records of Vermont and New Hampshire state institutions that performed forced sterilization. I then shifted my focus to analyzing the dataset of 30,000 sterilization records and creating visualizations to best represent that data.
At the start of the summer, Caroline, Professor Wernimont, and I met to discuss what we might be interested in working on. Having never done research before this summer, I found the question unanswerable.
For a student beginning work on a research project such as Eugenic Rubicon, it is overwhelming to work alongside the people in the lab and with the material every day. My knowledge of the subject grew tenfold simply from reading the grant proposal, which is exactly why it was so hard to know what to do: I did not yet know what was important.
To introduce myself to the difficulties of research and the impact it can have, I began tracking down state HIPAA laws regarding the disclosure of an individual’s medical records after death. There is a federal law which mandates that the state can release medical records fifty years after the death of the individual. However, this serves only as a baseline: states have the power to lengthen the time until release. I was assigned the second half of the fifty states, spent almost two weeks researching, and found definitive and complete records for four of the twenty-five. Through this work, I discovered the extreme disorganization of government documents and their lack of transparency. In many cases, laws restated the HIPAA Privacy Rule verbatim, yet made no mention of the timeline for releasing medical records. The frustration of this task stemmed from the question of whether the state defaults to HIPAA or whether I was simply not “looking hard enough.” After a few weeks, I transitioned from searching the websites of each state’s laws to calling the offices of state medical boards. I received a response only from the Vermont Board of Medical Practice.
In mid-July, I was introduced to the data set of the sterilization records. During a team meeting, we scrolled through the extensive variables that the data set accounts for. For many of the questions, there was simply a check-box answer with no explanation. Seeing that people were sterilized by a quick check of a box labeled “Dementia” or “Alcoholic,” without any extensive rationale for why the decision to sterilize was made, was incredibly troubling for me. In my past experiences with data analysis, I was given mundane data sets, like election results or football statistics, and wrangled them to ultimately find “meaning” through a correlation or visualization. When you regularly work with large data sets, it is easy to become desensitized to what is behind the numbers. But as I looked through the data to find what I wanted to focus on, I found myself shying away from looking at each patient because I could not find a way to tell their story the way I felt they deserved. Therefore, I turned to those I saw as the “enemy”: the superintendents. I decided to analyze each superintendent and the choices they made concerning the sterilizations. I found that some superintendents tended to be drawn towards certain diagnoses, such as feeblemindedness. I found this striking because I could not logically conclude that one institution simply happened to have more feebleminded patients than any other.
Initially, this seemed like a simple visualization. However, as I took a closer look at the data, I realized that one superintendent could have multiple aliases, in a sense, so the data for a single individual was spread across several variations of his name. For example, N.E. Williamson was listed under multiple names; although he had 240 patients altogether, the entry “N.E. Williamson” accounted for only 40.
These inconsistencies were a result of human input, as the name of the superintendent was often written by hand on each patient record. To fix this, I used regular expressions so that the variations of a name could be quickly recognized as the same person and all of that information could be attributed to one individual. The final figure shows the count of different diagnoses each superintendent signed off on during their time at the institution. To decrease clutter, I only included superintendents with more than 100,000 patients. The total number of diagnoses exceeds the expected 30,000 because some patients had more than one diagnosis.
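A minimal sketch of this kind of name cleanup, written in Python, might look like the following. It is not the project's actual code: the name variants, the record layout, and the helper names `normalize_name` and `diagnoses_by_superintendent` are all assumptions made up for illustration, and the sample records are invented rather than taken from the data set.

```python
import re
from collections import Counter, defaultdict


def normalize_name(raw: str) -> str:
    """Collapse spelling variants of one name into a single key (hypothetical helper)."""
    name = raw.strip().lower()
    # Drop a leading title such as "Dr." or "Supt."
    name = re.sub(r"^(dr|supt|superintendent)\.?\s+", "", name)
    # Reorder "williamson, n.e." into "n.e. williamson"
    match = re.match(r"^([a-z'-]+),\s*(.+)$", name)
    if match:
        name = f"{match.group(2)} {match.group(1)}"
    # Replace periods with spaces, then collapse repeated whitespace
    name = re.sub(r"\.", " ", name)
    name = re.sub(r"\s+", " ", name).strip()
    # Join runs of single-letter initials: "n e williamson" -> "ne williamson"
    name = re.sub(r"\b([a-z]) (?=[a-z]\b)", r"\1", name)
    return name


def diagnoses_by_superintendent(records):
    """Count diagnoses per normalized superintendent name."""
    counts = defaultdict(Counter)
    for raw_name, diagnoses in records:
        counts[normalize_name(raw_name)].update(diagnoses)
    return counts


# Invented example records: (name as written on the form, diagnoses checked)
records = [
    ("N.E. Williamson", ["Feeblemindedness"]),
    ("N. E. Williamson", ["Other"]),
    ("Williamson, N.E.", ["Feeblemindedness", "Other"]),
    ("Dr. N E Williamson", ["Dementia"]),
]

counts = diagnoses_by_superintendent(records)
for superintendent, tally in counts.items():
    print(superintendent, dict(tally))
```

In this sketch all four spellings fall under one key, and the diagnosis total (five) is larger than the number of records (four) because one invented patient has two diagnoses, which is the same reason the figure's totals exceed the number of records. Real handwritten names would likely need more rules than these, and a manual review of the resulting groups.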
The second figure was inspired by the superintendent figure, because I noticed that a significant proportion of diagnoses across almost all superintendents was “Other”. All summer I tried to distance myself from the data, but the prevalence of “Other” completely occupied my thoughts. The list of specific diagnoses is in no way limited, yet for a significant number of individuals, “Other” is the reason they were sterilized. The figure shows “Other” compared to other diagnoses with high frequencies. The institutions shown are the top three institutions with the greatest number of patients, or more than 100,000, over the years 1920-1960.