by Caroline Casey
As the summer draws to a close, I am wrapping up the research I have contributed to the Eugenic Rubicon project under the direction of Dr. Jacque Wernimont and Dr. Meredith B. Ferguson at Dartmouth College. At Dartmouth, I am an undergraduate Quantitative Social Science major with an interest in the human dimensions of data. I hope to contribute to research that promotes health care access and holds power to account, so I was naturally drawn to Jacque’s project. Using my background in statistics and my skills in R, I spent this summer studying the more than 20,000 available records of patients who were sterilized in California from 1919 to 2014. In this work, my goal is to balance the desire to summarize these records cohesively with the imperative to treat every record as the violent ethical violation it was, so that I do not flatten the data in a way that minimizes what it represents. By giving a detailed account of my work on this project and the rationale behind my decisions, I hope to emphasize that my visualizations are only one of many, many imperfect ways to represent this data, and to highlight the tradeoffs involved in visualization. That said, I am excited about the visualizations I created this summer and hope that they draw attention to the history of sterilization in the United States.
The data I worked with in this project came from hand transcriptions of records found in California by xxx, which were entered into REDCap by a group of undergraduate researchers at the University of Michigan. I am not personally very familiar with REDCap because I never used it myself during my research, but as I understand it, it is a program that lets researchers work with confidential documents such as medical records by deidentifying data and shifting dates. The data I used was a CSV file exported from REDCap containing deidentified data from the California files.
The data as I received it from Jacque was not well suited to visualization because it was not “tidy”. Each possible value of a variable was spread across a column of its own, rather than the variable occupying a single column that holds the specific value for each record. I had to “tidy” this data before I could perform the manipulations I wanted to do in R. I also renamed some of the variables that initially had long names. For example, the columns recording who consented to sterilization had names of the form “Who.consented. . . .choice.Mother.”, and I shortened them to the form “Mother.”
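The tidying itself was done in R, and that code is not reproduced here. As a rough illustration of what “tidying” a checkbox variable means, the plain-Python sketch below turns one-column-per-option data into one row per record and ticked option. The column prefix, the `record_id` field, and the assumption that a ticked box is exported as `"1"` or `"Checked"` are hypothetical stand-ins, and the two records are made up.

```python
def tidy_checkbox(records, id_field, prefix, new_name, checked=("1", "Checked")):
    """Reshape one-column-per-option checkbox fields into long ("tidy") rows.

    Every column whose name starts with `prefix` is treated as one option of
    the same variable (an assumption about the export's naming). Each ticked
    option becomes its own row: {id_field: <record id>, new_name: <label>}.
    """
    tidy_rows = []
    for rec in records:
        for column, value in rec.items():
            if not column.startswith(prefix):
                continue  # column belongs to some other variable
            if str(value).strip() not in checked:
                continue  # this option was not ticked for this record
            # Shorten e.g. "Who.consented....choice.Mother." to "Mother"
            label = column[len(prefix):].strip(".")
            tidy_rows.append({id_field: rec[id_field], new_name: label})
    return tidy_rows


# Made-up example records (hypothetical column names and values)
records = [
    {"record_id": "A1",
     "Who.consented....choice.Mother.": "1",
     "Who.consented....choice.Father.": "0"},
    {"record_id": "A2",
     "Who.consented....choice.Mother.": "1",
     "Who.consented....choice.Father.": "1"},
]
for row in tidy_checkbox(records, "record_id", "Who.consented....choice.", "consented_by"):
    print(row)
# {'record_id': 'A1', 'consented_by': 'Mother'}
# {'record_id': 'A2', 'consented_by': 'Mother'}
# {'record_id': 'A2', 'consented_by': 'Father'}
```

A record with two ticked options appears twice in the long form. In R, the same reshaping is commonly done with tidyr's `pivot_longer()`.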
I also found errors in the years entered for some of the records. The vast majority of the records were from 1919 to 1952; however, there were a total of 8 entries from the years 922, 937, 939, 1022, 1834, and 4925. The one entry from 1834 may have been correct, but that would be surprising, since it would have been almost 100 years before the next record. The other entries, while incorrect, were plausible mistakes (922 probably meant 1922, for example), so I chose not to remove them from the dataset. Instead, I kept them for visualizations that did not depend on year and restricted the observations to 1919 through 1952 for visualizations that did. In doing so, I left out 16 observations from 1960, 1985, 1992, 1993, 1998, and 2014. When I first visualized the data with these years included, the long gaps between them created large stretches of whitespace that distorted the plot, as shown below.
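A minimal sketch of this year subsetting, in plain Python rather than the R actually used: records inside an inclusive year window are kept for year-based plots, while the rest are tallied so the exclusions can be reported. The `year` field name is an assumption, as is the requirement that every year parse as an integer; the sample years are illustrative only.

```python
from collections import Counter


def subset_years(records, start=1919, end=1952, year_field="year"):
    """Split out records whose year falls in [start, end] (inclusive).

    Returns (kept, excluded), where `excluded` is a Counter of out-of-range
    years. The input list is not modified, so the full data can still feed
    plots that do not depend on year.
    """
    kept = []
    excluded = Counter()
    for rec in records:
        year = int(rec[year_field])  # assumes every year parses as an integer
        if start <= year <= end:
            kept.append(rec)
        else:
            excluded[year] += 1
    return kept, excluded


# Illustrative records only: typo-like years, in-range years, a later year
sample = [{"year": y} for y in ["1922", "922", "1935", "1960", "4925", "1952"]]
kept, excluded = subset_years(sample)
print([r["year"] for r in kept])   # ['1922', '1935', '1952']
print(sorted(excluded.items()))    # [(922, 1), (1960, 1), (4925, 1)]
```

Keeping the original list intact mirrors the choice described above: implausible years stay available for year-independent summaries and are dropped only where the time axis matters.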
While I believe this was the best way to visualize changes over time, it does leave some records out. I didn’t have time to do so this summer, but in the future I would love to create a more qualitative summary of these later records.
To decide what to focus on in 1935, I looked at sterilizations by institution in 1934 and 1935. Sonoma and Patton had by far the most sterilizations in 1935, followed by Napa and Stockton, and all four institutions saw dramatic increases in sterilizations from 1934 to 1935.
Next, I looked at the superintendents for those years to see whether any individual superintendents oversaw a large number of sterilizations, particularly any who were new. G.M. Webster oversaw 973 sterilizations in 1935, followed by F.O. Butler with 664. I was stunned by the number of sterilizations each of them oversaw, and it made me wonder what they were diagnosing in 1935 and whether it differed from what they had diagnosed in 1934.
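Both comparisons, by institution and by superintendent, come down to counting records per group within a year. The sketch below shows one way to do that with Python's `collections.Counter`, plus a helper that ranks groups by their change between two years. The field names (`year`, `institution`, `superintendent`) and the four toy records are hypothetical; this is not the analysis code behind the figures.

```python
from collections import Counter


def counts_by(records, key, year, year_field="year"):
    """Count records for each value of `key` (e.g. institution) in one year."""
    return Counter(rec[key] for rec in records if int(rec[year_field]) == year)


def change_between(records, key, year_a, year_b):
    """List (group, count_a, count_b, change) sorted by largest increase.

    Ties are broken alphabetically so the output order is deterministic.
    """
    a = counts_by(records, key, year_a)
    b = counts_by(records, key, year_b)
    rows = [(g, a[g], b[g], b[g] - a[g]) for g in set(a) | set(b)]
    return sorted(rows, key=lambda row: (-row[3], row[0]))


# Four made-up records with hypothetical institutions and superintendents
toy = [
    {"year": "1934", "institution": "Institution A", "superintendent": "Superintendent X"},
    {"year": "1935", "institution": "Institution A", "superintendent": "Superintendent X"},
    {"year": "1935", "institution": "Institution A", "superintendent": "Superintendent X"},
    {"year": "1935", "institution": "Institution B", "superintendent": "Superintendent Y"},
]
print(change_between(toy, "institution", 1934, 1935))
# [('Institution A', 1, 2, 1), ('Institution B', 0, 1, 1)]
print(counts_by(toy, "superintendent", 1935).most_common())
# [('Superintendent X', 2), ('Superintendent Y', 1)]
```

Because a `Counter` returns 0 for a missing key, a group that appears in only one of the two years still gets a well-defined change.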
When I looked at the diagnoses under G.M. Webster from 1934 to 1935, I saw that the diagnosis “Hebephrenic” appeared under him for the first time in 1935, with 245 people given that diagnosis. The number of people diagnosed with “Dementia” under him also increased from 18 to 373 from 1934 to 1935. In the sterilizations Butler oversaw, his diagnoses of “Feebleminded” increased from 108 to 312 from 1934 to 1935, and his diagnoses of “Physically Negative” increased by 118. I have many questions about how and why the diagnoses changed so much in that year, and I would love to do more research on these diagnoses as well as on G.M. Webster and F.O. Butler. With the data available, however, I decided to visualize how all diagnoses changed over the 1930s.
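The per-superintendent comparison of diagnoses can be sketched the same way: tally diagnoses for one superintendent in each of two years and flag any diagnosis that appears only in the later year, as “Hebephrenic” did under Webster in 1935. This plain-Python sketch assumes one diagnosis per record and uses hypothetical field names and placeholder labels, not the project's actual data or code.

```python
from collections import Counter


def diagnosis_shift(records, superintendent, year_a, year_b):
    """Compare diagnosis counts under one superintendent between two years.

    Returns (diagnosis, count_a, count_b, is_new) tuples sorted by diagnosis,
    where is_new is True for a diagnosis present in year_b but absent in year_a.
    Assumes one diagnosis per record.
    """
    def tally(year):
        return Counter(
            rec["diagnosis"]
            for rec in records
            if rec["superintendent"] == superintendent and int(rec["year"]) == year
        )

    a, b = tally(year_a), tally(year_b)
    return [
        (dx, a[dx], b[dx], a[dx] == 0 and b[dx] > 0)
        for dx in sorted(set(a) | set(b))
    ]


# Made-up records with placeholder diagnosis labels
toy = [
    {"year": "1934", "superintendent": "X", "diagnosis": "Diagnosis 1"},
    {"year": "1935", "superintendent": "X", "diagnosis": "Diagnosis 1"},
    {"year": "1935", "superintendent": "X", "diagnosis": "Diagnosis 1"},
    {"year": "1935", "superintendent": "X", "diagnosis": "Diagnosis 2"},
    {"year": "1935", "superintendent": "Y", "diagnosis": "Diagnosis 2"},
]
for row in diagnosis_shift(toy, "X", 1934, 1935):
    print(row)
# ('Diagnosis 1', 1, 2, False)
# ('Diagnosis 2', 0, 1, True)
```

If records can carry several diagnoses, tidying them into one row per diagnosis first (as with the consent checkboxes) keeps this counting logic unchanged.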
I created a few figures based on this information to visualize how diagnoses changed in the 1930s. These figures clearly show the dramatic increases in diagnoses of “Hebephrenic”, “Dementia”, and “Feebleminded”, as well as the increase in “Physically Negative” diagnoses. They show that the changes in the number of sterilizations overseen by G.M. Webster and F.O. Butler stand out in the data for that year. They also show a peak across all diagnoses in 1935, indicating that 1935 was an important year even beyond the actions of these two superintendents.