eugenic rubicon – Digital Justice Lab

by Caroline Casey

As the summer draws to a close, I am wrapping up research I have contributed to the Eugenic Rubicon project under the direction of Dr. Jacque Wernimont and Dr. Meredith B. Ferguson at Dartmouth College. At Dartmouth, I am an undergraduate Quantitative Social Science major with an interest in the human dimensions of data. I hope to contribute to research that promotes health care access and holds power to account, so I was naturally drawn to Jacque’s project. Using my background in statistics and skill in R, my role this summer was to study the over 20,000 available records of patients who were sterilized in California from 1919 to 2014. My goal in this work is to balance the desire to summarize these records in a cohesive manner with the imperative to treat every record as the violent ethics violation that it was so as not to flatten the data in any way that minimizes what it represents. In providing a detailed account of my work on this project and the rationale behind my decisions, I hope to emphasize that my visualizations are one of many, many imperfect ways to represent this data and to highlight the tradeoffs associated with visualization. With that being said, I am excited about the visualizations I created this summer and hope that they draw attention to the history of sterilization in the United States.

Documented Reason for Sterilization by Year

Figure 2: Eugenics Rubicon data visualization

Figure 3: Eugenics Rubicon data visualization

The data I worked with in this project came from hand transcriptions of records found in California by xxx, which were entered into REDCap by a group of undergraduate researchers at the University of Michigan. I am not very familiar with REDCap because I never used it during my research, but as I understand it, it is a program that allows research on confidential documents such as medical records by deidentifying data and shifting dates. The data that I used was a CSV file exported from REDCap of deidentified data from the California files.

The data as I received it from Jacque was not particularly suitable for data visualization because it was not “tidy”. Each possible value of a variable was spread out into a column of its own, rather than there being a single column for the variable that holds the specific value for each record. I had to “tidy” this data before I could use it for the manipulations I wanted to do in R. I also renamed some of the variables, which initially had long names. For example, the observations for who consented to sterilization were of the form “Who.consented. . . .choice.Mother.” and I changed them to the form “Mother.”
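To make the tidying step concrete, here is a minimal sketch of the idea. My own work was done in R, so this Python version is only an illustration. The column prefix, the `record_id` key, and the rows are hypothetical stand-ins for the real export, and stripping the trailing period from each label is a choice made for this sketch.

```python
# Hypothetical wide-format rows: one 0/1 checkbox column per possible answer.
# The prefix imitates the long names described above; record_id is an assumed key.
PREFIX = "Who.consented....choice."

wide_rows = [
    {"record_id": 1, PREFIX + "Mother.": 1, PREFIX + "Father.": 0},
    {"record_id": 2, PREFIX + "Mother.": 0, PREFIX + "Father.": 1},
    {"record_id": 3, PREFIX + "Mother.": 1, PREFIX + "Father.": 1},
]


def tidy_checkboxes(rows, prefix, variable):
    """Return one long-format row per checked box, with a shortened label."""
    tidy = []
    for row in rows:
        for column, checked in row.items():
            if column.startswith(prefix) and checked:
                # Drop the long prefix and the trailing dot:
                # "Who.consented....choice.Mother." becomes "Mother"
                value = column[len(prefix):].rstrip(".")
                tidy.append({"record_id": row["record_id"], variable: value})
    return tidy


consent_long = tidy_checkboxes(wide_rows, PREFIX, "who_consented")
for row in consent_long:
    print(row)
```

A record with two boxes checked becomes two rows in the long format, so counting rows is no longer the same as counting records.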

I also found some errors in the years entered for some of the records. The vast majority of the records were from 1919 to 1952; however, there were a total of 8 entries from 922, 937, 939, 1022, 1834, and 4925. The one entry from 1834 may have been correct, but that would be surprising since it would have been almost 100 years before the next record entry. The other entries, while incorrect, were plausible mistakes (922 probably meant 1922, for example), so I chose not to remove them from the dataset. Instead, I left them in for visualizations that did not depend on year, and for visualizations over time I subset my observations to the years 1919 to 1952. In doing so, I also left out 16 observations that occurred in 1960, 1985, 1992, 1993, 1998, and 2014. This was to prevent the distortion caused by large amounts of whitespace between years that appeared when I first visualized the data with these years, as shown below.
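The subsetting described above can be sketched as follows. This is an illustration in Python rather than my R code, and the records and field names are invented for the example. The point is that the full set is kept for visualizations that ignore year, while a filtered copy is used for anything plotted over time.

```python
# Hypothetical records; only the year field matters for this sketch.
records = [
    {"record_id": 1, "year": 1922},
    {"record_id": 2, "year": 922},   # implausible year, kept exactly as entered
    {"record_id": 3, "year": 1935},
    {"record_id": 4, "year": 1985},  # outside the main period
    {"record_id": 5, "year": 4925},  # implausible year, kept exactly as entered
]

FIRST_YEAR, LAST_YEAR = 1919, 1952


def in_main_period(record):
    """True when the entered year falls inside 1919 to 1952, inclusive."""
    return FIRST_YEAR <= record["year"] <= LAST_YEAR


# Used for visualizations that do not depend on year: nothing is removed.
all_records = records

# Used for visualizations over time: only the main period.
by_year_records = [r for r in records if in_main_period(r)]

# Set aside rather than deleted, so they can be summarized separately later.
set_aside = [r for r in records if not in_main_period(r)]

print(len(all_records), len(by_year_records), len(set_aside))
```

No entered year is corrected in this sketch; guessing that 922 means 1922 would be a separate, explicit decision.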

While I believe this was the best way to visualize changes over time, it does leave some records out. I didn’t have enough time this summer, but I would love to create a more qualitative summary of these later records in the future.

1935 Analysis

To decide what to focus on in 1935, I looked at the sterilizations by institution in 1934 and 1935. When I looked at these years, I saw that Sonoma and Patton had the most sterilizations in 1935 by far, followed by Napa and Stockton. All of those institutions had dramatic increases in sterilizations from 1934 to 1935.

Then I looked at superintendents for those years to see if there were any specific superintendents who oversaw a large number of sterilizations, particularly any new superintendents. G.M. Webster oversaw 973 sterilizations in 1935, followed by F.O. Butler who oversaw 664 sterilizations in 1935. I was stunned by the number of sterilizations that both of them oversaw, and it made me wonder what they were diagnosing in 1935 and if it was different than what they diagnosed in 1934.

When I looked at the number of diagnoses under G.M. Webster from 1934 to 1935, I saw that he oversaw the diagnosis of “Hebephrenic” for the first time in 1935, and 245 people received that diagnosis. I also saw that the number of people diagnosed with “Dementia” under him increased from 18 to 373 from 1934 to 1935. When I looked at the data on the sterilizations Butler oversaw, I noticed that his diagnoses of “Feebleminded” increased from 108 to 312 from 1934 to 1935, and that his diagnoses of “Physically Negative” increased by 118 between those years. I have many questions about how and why the diagnoses changed that much in that year. I would love to do more research on these diagnoses as well as on G.M. Webster and F.O. Butler. With the data, however, I decided to visualize the changes in the 1930s for all diagnoses.
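The year-over-year comparison above comes down to counting diagnoses per superintendent in each year and putting the two counts side by side. A minimal sketch in Python (not the R code I used) is below. The rows are invented placeholders, the superintendent labels are hypothetical, and none of the counts correspond to the real records.

```python
from collections import Counter

# Invented placeholder rows (superintendent, year, diagnosis); not real records.
rows = [
    ("Superintendent A", 1934, "Dementia"),
    ("Superintendent A", 1935, "Dementia"),
    ("Superintendent A", 1935, "Dementia"),
    ("Superintendent A", 1935, "Hebephrenic"),
    ("Superintendent B", 1934, "Feebleminded"),
    ("Superintendent B", 1935, "Feebleminded"),
]


def diagnosis_change(rows, superintendent, year_from, year_to):
    """Return {diagnosis: (count in year_from, count in year_to)}."""
    before, after = Counter(), Counter()
    for name, year, diagnosis in rows:
        if name != superintendent:
            continue
        if year == year_from:
            before[diagnosis] += 1
        elif year == year_to:
            after[diagnosis] += 1
    # A diagnosis that appears in only one of the two years gets a 0 for the other.
    return {d: (before[d], after[d]) for d in sorted(set(before) | set(after))}


change_a = diagnosis_change(rows, "Superintendent A", 1934, 1935)
print(change_a)
```

A diagnosis that shows up for the first time in the later year appears with a count of 0 for the earlier year, which is how a newly used diagnosis would stand out.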

I created a few figures based on this information to visualize how diagnoses changed in the 1930s. These figures clearly show the dramatic increases in diagnoses of “Hebephrenic”, “Dementia”, and “Feebleminded”, as well as the increase in the number of “Physically Negative” diagnoses. These visualizations show that G.M. Webster and F.O. Butler’s changes in the number of sterilizations they oversaw stand out in the data for that year. They show a peak in all diagnoses for 1935, indicating that 1935 was an important year beyond the changes in the actions of these two superintendents.

by Kirby Phares

Over the 2019 Summer Term at Dartmouth College, I worked on the Eugenic Rubicon project under Dr. Jacqueline Wernimont and Dr. Meredith Ferguson. Drawing upon sterilization records, the Eugenic Rubicon project seeks to create a widely accessible digital interface that will tell the story of forced sterilization in America during the 20th century. For the first half of the summer, I focused on gathering records of HIPAA laws for each state in the United States and researching the archived records of Vermont and New Hampshire state institutions that performed forced sterilization. I then shifted my focus to analyzing the dataset of 30,000 sterilization records and creating visualizations to best represent that data.

At the start of the summer, Caroline, Professor Wernimont, and I met to discuss what we might be interested in working on. Having never done research before this summer, I found the question unanswerable.

When you begin work on research such as Eugenic Rubicon as a student, it is overwhelming to work around the people in the lab and with the material every day. My knowledge of the subject grew tenfold simply from reading the grant proposal, which is why it was so hard to know what to do: I did not yet know what was important.

To introduce myself to the difficulties of research and the impact it can have, I began tracking down state HIPAA laws regarding the disclosure of an individual’s medical records after death. There is a federal law which mandates that the state can release medical records fifty years after the death of the individual. However, this only serves as a baseline, so states have the power to lengthen the amount of time before release. I was assigned the second half of the fifty states, spent almost two weeks researching, and found definitive and complete records for four of the twenty-five. Through this work, I discovered the extreme disorganization of government documents and their lack of transparency. In many cases, laws restated the HIPAA Privacy Rule verbatim, yet there was no mention of the timeline for releasing medical records. The frustration of this task stemmed from the question of whether the state defaults to HIPAA or whether I was not “looking hard enough.” After a few weeks, I transitioned from searching the websites of each state’s laws to calling the offices of state medical boards. I only received a response from the Vermont Board of Medical Practice.

In mid-July, I was introduced to the data set of the sterilization records. During a team meeting, we scrolled through the extensive list of variables that the data set accounts for. For many of the questions, there was simply a check-box answer, and an explanation was missing. To see that people were sterilized by a quick check of a box labeled “Dementia” or “Alcoholic”, without any extensive rationale for why the decision for sterilization was made, was incredibly troubling for me. In my past experience with data analysis, I was given mundane data sets, like election results or football statistics, and wrangled them to ultimately find “meaning” through a correlation or visualization. When you regularly work with large data sets, it is easy to get desensitized to what is behind the numbers. But as I looked through the data to find what I wanted to focus on, I found myself shying away from looking at each patient because I could not find a way to tell their story the way I felt they deserved. Therefore, I turned to those I saw as the “enemy”: the superintendents. I decided to try to analyze each superintendent and the choices they made concerning the sterilizations. I found that some superintendents tended to be drawn towards certain diagnoses, such as feeblemindedness. I found this striking because I could not logically conclude that one institution just happened to have more feebleminded patients than any other.

Initially, this seemed like a simple visualization. However, as I took a closer look at the data, I realized that one superintendent could have multiple aliases, in a sense, so the data for one individual was spread among different variations of his name. For example, N.E. Williamson was listed under multiple names, and thus although he had 240 patients altogether, “N.E. Williamson” would have accounted for only 40.

Figure 1: Eugenics Rubicon data visualization

These inconsistencies were a result of human input, as the name of the superintendent was often written on each patient record. To fix this, I used regular expressions so that variant spellings could be quickly recognized as the same person and all of that information could be attributed to one individual. The final figure shows the count of different diagnoses each superintendent signed off on during their time at the institution. To decrease clutter, I only included superintendents with more than 100,000 patients. The total number of diagnoses exceeds the expected 30,000 because some patients had more than one diagnosis.
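The name cleanup can be sketched roughly as follows, using Python's `re` module for illustration; it is not the exact code or patterns I used. The variant spellings below are hypothetical examples of the kind of inconsistency described above, and matching on surname alone is a simplifying assumption.

```python
import re
from collections import Counter

# Hypothetical variant spellings; the real records contain their own variations.
raw_names = [
    "N.E. Williamson",
    "N. E. Williamson",
    "Dr. Williamson",
    "williamson, N.E.",
    "F.O. Butler",
    "F. O. Butler",
]

# Illustrative patterns, each mapped to one canonical spelling.
CANONICAL = [
    (re.compile(r"williamson", re.IGNORECASE), "N.E. Williamson"),
    (re.compile(r"butler", re.IGNORECASE), "F.O. Butler"),
]


def canonical_name(raw):
    """Map a raw entry to its canonical name, or return it trimmed if nothing matches."""
    for pattern, name in CANONICAL:
        if pattern.search(raw):
            return name
    return raw.strip()


counts = Counter(canonical_name(name) for name in raw_names)
print(counts)
```

Matching on surname alone would wrongly merge two different people who share a surname, so patterns like these need to be checked against the institution and years attached to each record.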

The second figure was inspired by the superintendent figure because I noticed that a significant proportion of diagnoses across almost all superintendents was “Other”. All summer I tried to distance myself from the data, but the instance of “Other” completely occupied my thoughts. The list of specific diagnoses is in no way limited, yet for a significant number of individuals, “Other” is the reason they were sterilized. The figure shows “Other” compared to other diagnoses with high frequencies. The institutions shown are the top three institutions with the greatest number of patients, or more than 100,000, over the years 1920-1960.
