COVID Black (Purdue University), a Black Digital Humanities (Black DH) collective and early-response taskforce on Black health and data, and the Digital Justice Lab (part of Dartmouth’s Digital Humanities and Social Engagement Cluster) are pleased to announce a strategic collaboration to develop several small-scale digital projects that will address Black communities impacted by COVID-19. By connecting COVID Black’s expertise in theories and methods around data and the Black lived experience with the Digital Justice Lab’s expertise in computational tools, the collaboration represents a progressive and community-centered means for effecting positive change in how Black health data is collected and disseminated. We hope that the joint effort between COVID Black and the Digital Justice Lab will serve as a model for other cross-organizational collaborations around a shared vision and common goals.
As the summer draws to a close, I am wrapping up research I have contributed to the Eugenic Rubicon project under the direction of Dr. Jacque Wernimont and Dr. Meredith B. Ferguson at Dartmouth College. At Dartmouth, I am an undergraduate Quantitative Social Science major with an interest in the human dimensions of data. I hope to contribute to research that promotes health care access and holds power to account, so I was naturally drawn to Jacque’s project. Using my background in statistics and skill in R, my role this summer was to study the over 20,000 available records of patients who were sterilized in California from 1919 to 2014. My goal in this work is to balance the desire to summarize these records in a cohesive manner with the imperative to treat every record as the violent ethics violation that it was so as not to flatten the data in any way that minimizes what it represents. In providing a detailed account of my work on this project and the rationale behind my decisions, I hope to emphasize that my visualizations are one of many, many imperfect ways to represent this data and to highlight the tradeoffs associated with visualization. With that being said, I am excited about the visualizations I created this summer and hope that they draw attention to the history of sterilization in the United States.
The data I worked with in this project
came from hand transcriptions of records found in California by xxx,
which were entered into REDCap by a group of undergraduate researchers
at the University of Michigan. I am not very familiar with REDCap
because I never used it during my research, but as I understand it,
it is a program that allows researchers to work with confidential
documents such as medical records by deidentifying data and phase
shifting dates. The data that I used was a CSV file from REDCap of
deidentified data from the California files.
The data as I received it from Jacque was
not particularly suitable for data visualization because it was not
“tidy”. Each possible answer for a variable was spread out in a column
of its own, rather than there being one column for the variable that
holds the specific answer for each record. I had to “tidy” this data
before I could use it for the manipulations I wanted to do in R. I also
renamed some of the variables, which initially had long names. For
example, the observations for who consented to sterilization
were of the form “Who.consented. . . .choice.Mother.” and I changed them to the form “Mother.”
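As a rough illustration of the tidying step described above, here is a minimal sketch in Python (my actual work was done in R, and this is not that code). The sample rows, the `record_id` field, and the assumption that a checked box is stored as "1" are all hypothetical; only the column-name pattern comes from the data.

```python
# Hypothetical wide-format rows: one column per possible answer,
# with "1" assumed to mean the box was checked.
PREFIX = "Who.consented. . . .choice."

wide_rows = [
    {"record_id": "1", PREFIX + "Mother.": "1", PREFIX + "Father.": "0"},
    {"record_id": "2", PREFIX + "Mother.": "0", PREFIX + "Father.": "1"},
]


def tidy_consent(rows, prefix=PREFIX):
    """Collapse the one-column-per-answer layout into one row per
    (record, answer) pair, shortening the long names along the way."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column.startswith(prefix) and value == "1":
                # "Who.consented. . . .choice.Mother." -> "Mother"
                label = column[len(prefix):].rstrip(".")
                tidy.append({"record_id": row["record_id"],
                             "consenter": label})
    return tidy


tidy_rows = tidy_consent(wide_rows)
```

Each record ends up with a single `consenter` value (or several rows, if more than one box was checked), which is the shape most plotting tools expect.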
I also found some errors in the years entered for some of
the records. The vast majority of the records were from 1919 to 1952;
however, there were a total of 8 entries from 922, 937, 939, 1022, 1834,
and 4925. The one entry from 1834 may have been correct, but that
would be surprising since it would have been almost 100 years before the
next record entry. The other entries, while incorrect, were plausible
mistakes (922 probably meant 1922, for example), so I chose not to
remove them from the dataset. Instead, I left them in for visualizations
that were not dependent on year and, for those that were, subset my
observations to the years 1919 to 1952. When I did this, I also chose to
leave out 16 total observations that occurred in 1960, 1985, 1992, 1993,
1998, and 2014. This was to prevent the distortion caused by large
amounts of whitespace between years, which I saw when I first visualized
the data with these years included, as shown below.
While I believe this was the best way to
visualize changes over time, it does leave some records out. I didn’t
have enough time this summer, but I would love to create a more
qualitative summary of these later records in the future.
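The year handling described above can be sketched as follows. This is an illustrative Python version rather than my original R code, and the sample records and the `year` field name are hypothetical. The point is that out-of-range records are set aside for year-based plots, not deleted from the dataset.

```python
def split_by_year(records, first=1919, last=1952):
    """Separate records inside the plotting window from those outside it.

    Nothing is discarded: the full list stays available for
    visualizations that do not depend on year.
    """
    in_window, outside = [], []
    for rec in records:
        if first <= rec["year"] <= last:
            in_window.append(rec)
        else:
            outside.append(rec)
    return in_window, outside


# Hypothetical sample: one likely typo (922), two in-range years,
# and one much later record (1985).
sample = [{"year": 922}, {"year": 1922}, {"year": 1935}, {"year": 1985}]
in_window, outside = split_by_year(sample)
```

Keeping the `outside` list around also makes it easy to come back to the later records for a separate, more qualitative summary.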
1935 Analysis
To decide what to focus on in 1935, I looked
at the sterilizations by institution in 1934 and 1935. When I looked at
these years, I saw that Sonoma and Patton had the most sterilizations in
1935 by far, followed by Napa and Stockton. All of those institutions
had dramatic increases in sterilizations from 1934 to 1935.
Then I looked at superintendents for those
years to see if there were any specific superintendents who oversaw a
large number of sterilizations, particularly any new superintendents.
G.M. Webster oversaw 973 sterilizations in 1935, followed by F.O. Butler,
who oversaw 664 sterilizations in 1935. I was stunned by the number of
sterilizations that both of them oversaw, and it made me wonder what
they were diagnosing in 1935 and whether it was different from what they
diagnosed in 1934.
When I looked at the number of diagnoses
under G.M. Webster from 1934 to 1935, I saw that he oversaw the
diagnosis of “Hebephrenic” for the first time in 1935, when 245
people received it. I also saw that the number of people diagnosed with
“Dementia” under him increased from 18 to 373 from 1934 to 1935. When I
looked at the data on the sterilizations Butler oversaw, I noticed that
his diagnoses of “Feebleminded” increased from 108 to 312 from 1934 to
1935, and that his diagnoses of “Physically Negative” increased by 118
between those years. I have many questions about how and why the
diagnoses changed that much in that year. I would love to do more
research on these diagnoses as well as on G.M. Webster and F.O. Butler.
With the data I had, however, I decided to visualize the changes in the
1930s for all diagnoses.
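The year-over-year comparison described above amounts to counting diagnoses per superintendent per year and subtracting. Below is a minimal Python sketch of that idea (my analysis itself was done in R). The field names and the toy records are hypothetical placeholders, not real counts from the dataset.

```python
from collections import Counter


def diagnosis_counts(records, superintendent, year):
    """Count each diagnosis signed off by one superintendent in one year."""
    return Counter(
        r["diagnosis"]
        for r in records
        if r["superintendent"] == superintendent and r["year"] == year
    )


def diagnosis_change(records, superintendent, year_from, year_to):
    """Difference in diagnosis counts between two years.

    A diagnosis that is absent in one of the years counts as zero there,
    so a diagnosis appearing for the first time shows up as a pure increase.
    """
    before = diagnosis_counts(records, superintendent, year_from)
    after = diagnosis_counts(records, superintendent, year_to)
    return {d: after[d] - before[d] for d in sorted(set(before) | set(after))}


# Toy records for a hypothetical superintendent "A".
toy = [
    {"superintendent": "A", "year": 1934, "diagnosis": "Dementia"},
    {"superintendent": "A", "year": 1935, "diagnosis": "Dementia"},
    {"superintendent": "A", "year": 1935, "diagnosis": "Dementia"},
    {"superintendent": "A", "year": 1935, "diagnosis": "Hebephrenic"},
    {"superintendent": "A", "year": 1935, "diagnosis": "Hebephrenic"},
]
change = diagnosis_change(toy, "A", 1934, 1935)
```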
I created a few figures based on this
information to visualize how diagnoses changed in the 1930s. These
figures clearly show the dramatic increases in diagnoses of
“Hebephrenic”, “Dementia”, and “Feebleminded”, as well as the increase
in the number of “Physically Negative” diagnoses. These visualizations
show that the changes in the number of sterilizations G.M. Webster and
F.O. Butler oversaw stand out in the data for that year. They also
show a peak in all diagnoses for 1935, indicating that 1935 was an
important year beyond the changes in the actions of these two
superintendents.
Over the 2019 Summer Term at Dartmouth College, I
worked on the Eugenic Rubicon project under Dr. Jacqueline Wernimont and
Dr. Meredith Ferguson. Drawing upon sterilization records, the Eugenic
Rubicon project seeks to create a widely accessible digital interface
which will tell the story of forced sterilization in America during the
20th century. For the first half of the summer, I focused on gathering
records of HIPAA laws for each state in the United States and researching
the archived records of Vermont and New Hampshire state institutions which
performed forced sterilization. I then shifted my focus to analyzing
the dataset of 30,000 sterilization records and creating visualizations
to best represent that data.
At the start of the summer, Caroline, Professor
Wernimont, and I met to discuss what we might be interested in working
on this summer. Having never done research before this summer, I found
the question unanswerable.
When beginning work on research such as Eugenic Rubicon
as a student, it is overwhelming to work around the people in the lab
and work with the material every day. My knowledge of the subject grew
tenfold simply from reading the grant proposal, which shows how little I
knew going in. That is why it was so hard to know what to do: I did not
yet know what was important.
To introduce myself to the difficulties of research and
the prospective impact it can have, I began tracking down state HIPAA
laws regarding the disclosure of an individual’s medical records after
death. There is a federal law which mandates that the state can release
medical records fifty years after the death of the individual. However,
this serves only as a baseline: states have the power to lengthen the
amount of time before release. I was assigned the second half of the
fifty states, spent almost two weeks researching, and found definitive
and complete records for four of the twenty-five.
Through this work, I discovered the extreme disorganization of
government documents and their lack of transparency. In many cases, laws
restated the HIPAA Privacy Rule verbatim, yet there was no mention of
the timeline for releasing medical records. The frustration of this task
stemmed from the question of whether the state defaults to HIPAA or if I
was not “looking hard enough.” After a few weeks, I transitioned from
searching the websites of each state’s laws to calling the offices of
state medical boards. I only received a response from the Vermont Board
of Medical Practice.
In mid-July, I was introduced to the data set of the
sterilization records. During a team meeting, we scrolled through the
extensive variables which the data set accounts for. For many of the
questions, there was simply a check-box answer and an explanation was
missing. To see that people were sterilized by a quick check of a
box labeled “Dementia” or “Alcoholic”, without any extensive rationale for
why the decision for sterilization was made, was incredibly troubling
for me. In my past experiences with data analysis, I was given mundane
data sets, like election results or football statistics, and wrangled
them to ultimately find “meaning” through a correlation or visualization.
When you regularly work with large data sets, it is easy to get
desensitized to what is behind the numbers. But as I looked through the
data to find what I wanted to focus on, I found myself shying away from
looking at each patient because I could not find a way to tell their
story the way I felt they deserved. Therefore, I turned to those I saw as
the “enemy”: the superintendents. I decided to try to analyze each
superintendent and the choices they made concerning the sterilizations.
I found that some superintendents tended to be drawn towards certain
diagnoses, such as feeblemindedness. I found this striking because I
could not logically conclude that one institution just happened to have
more feebleminded patients than any other.
Initially, this seemed like a simple visualization. However, as I took a closer look at the data, I realized that one superintendent could have multiple aliases, in a sense, so the records for one individual were spread among different variations of his name. For example, N.E. Williamson was listed under multiple names, and thus although he altogether had 240 patients, “N.E. Williamson” would have accounted for only 40.
These inconsistencies were a result of human input, as the name of the superintendent was often written by hand on each patient record. To fix this, I used regular expressions so that name variants could be quickly recognized as the same person and all of that information could be attributed to one individual. The final figure shows the count of different diagnoses each superintendent signed off on during their time at the institution. To decrease clutter, I only included superintendents with more than 100,000 patients. The diagnoses total more than the 30,000 records because some patients had more than one diagnosis.
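To give a sense of the name-matching step, here is a minimal Python sketch of one way to collapse spelling variants with a regular expression. It is not the exact set of patterns I used in R, and the variant spellings in the example are hypothetical illustrations of the kinds of inconsistencies hand entry produces.

```python
import re
from collections import Counter


def canonical_name(raw):
    """Reduce a hand-entered superintendent name to one comparable key.

    Periods and commas become spaces, case is ignored, and single-letter
    initials are joined and moved to the front, so "N. E. Williamson" and
    "Williamson, N.E." produce the same key.
    """
    parts = re.sub(r"[.,]", " ", raw).lower().split()
    initials = "".join(p for p in parts if len(p) == 1)
    rest = " ".join(p for p in parts if len(p) > 1)
    return f"{initials} {rest}".strip()


# Hypothetical variants of the same name as they might appear on records.
entries = ["N.E. Williamson", "N. E. Williamson", "NE Williamson",
           "Williamson, N.E.", "F.O. Butler"]
patients_per_superintendent = Counter(canonical_name(e) for e in entries)
```

A simple key like this will not catch misspellings of the surname itself, so any automatic grouping should still be checked by eye before the counts are attributed to one person.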
The second figure was inspired by the superintendent
figure because I noticed that a significant proportion of diagnoses across
almost all superintendents was “Other”. All summer I tried to distance
myself from the data, but the instance of “Other” completely occupied my
thoughts. The list of specific diagnoses is in no way limited, yet for a
significant number of individuals, “Other” is the reason they were
sterilized. The figure shows “Other” compared to other diagnoses with
high frequencies. The institutions shown are the top three institutions
with the greatest number of patients or more than 100,000 over the years
1920-1960.