Undergraduate Researcher Projects
Anticipated Projects for 2026
Dramatic Characters and Forms of Identity
Faculty PI: Mark Algee-Hewitt, Department of English
How do character relationships evolve over the course of dramatic texts? Are there stable relationship structures that clearly articulate power dynamics, information flow, and plot importance across different genres and periods of drama? And how separable are character dynamics from plot mechanisms? This project, a collaboration between the Stanford Literary Lab and David Bamman’s research group at UC Berkeley, builds on past work by both groups to create a pipeline for automatically annotating and analyzing a corpus of English-language drama. Previous work by Bamman’s group on video analysis used image and sound recognition to identify relationships in individual films and annotate speakers and receivers of dialogue. Here, we ask if similar techniques could be applied to play scripts, which lack the visual cues that video analysis depends on. Once we establish a pipeline for identifying character relationships in our corpus of 3963 plays from the sixteenth century to the present, we will use this information to find “stock” relationship dynamics that repeat across different dramas at scale. Our project will then explore how these types of relationships can transcend character identity in order to subvert audience expectations: for example, how relationship forms that are typical of an eighteenth-century heterosexual couple can be redeployed in same-sex relationships by nineteenth-century authors to covertly queer-code characters.
Students will work with an interdisciplinary team from the Literary Lab to parse the raw XML of the corpus and build BERT-based models to identify relationship structures and local outliers where relationship forms do not match the stated character identities within the drama.
Style in the Age of AI
Faculty PI: Mark Algee-Hewitt, Department of English
One of the primary assumptions in contemporary literary criticism is the association between style and meaning: how something is written affects what it can mean. This project puts this connection between style and meaning to the test through a series of experiments that seek to define the relationship between meaning and expression outside of the domain of creative fiction or poetry. Style, we argue, inheres at the disciplinary level: when students are trained as historians or anthropologists, a key, but frequently unspoken, component of that training is in writing in the style of the discipline. Our project asks whether these stylistic differences are detectable at the scale of the individual text such that we can recognize whether a piece of writing belongs to psychology or sociology based not on the semantics of its language, but purely on the grammatical and syntactical features of the writing. Beyond being about the human mind or society, is there something inherently recognizable in how articles in these disparate areas are written? Recent advances in generative AI provide a new platform for experimentation: using a series of carefully designed prompts and training texts for fine-tuning models, we can hold content relatively stable while varying the style of the writing, thereby generating corpora through which we can ask these questions.
Students in the project will design a series of structured prompts using an API to a frontier generative AI model in order to create a corpus of multi-disciplinary writing over constant subjects. They will then learn the basics of text analysis as they explore how content features change as the style of the text is varied.
Mapping Multinational Ghosts
Faculty PI: Mark Algee-Hewitt, Department of English
This project lies at the intersection between spatial humanities and cultural studies. The ghost story is a powerful multi-national signifier, revealing key cultural assumptions and anxieties. But in addition to the kinds of ghosts that vary between cultures and nationalities, the placement of ghosts is also at issue. In this work we ask whether there are identifiable patterns in where ghosts are represented as haunting across cultures and languages. Beginning with a multi-lingual corpus of ghost stories from the US, England, Singapore, China, Japan, and India, we will use NLP techniques to identify the ghosts in each story, before extracting them as entities and geocoding their location within the narrative framework of the story. We will then be able to create GIS maps of the locations where these ghosts can be found, layering on demographic, population, and geo-spatial information to determine the factors that affect the location of hauntings. Are ghosts primarily rural or urban? Are they associated with particular peoples or types of settlements?
Students will learn the basics of multi-lingual digital humanities, including ingesting and analyzing texts across five different languages. They will also play a pivotal role in extracting spatial information from the texts and will learn to encode and display this information within a GIS system, creating map-based visualizations that demonstrate cultural patterns in ghost stories.
Computational Focalization
Faculty PIs: Mark Algee-Hewitt, Department of English and Paula Moya, Department of English
Focalization is the localization of narrative perspective to a single character’s viewpoint. Whether in first or third person, limited or omniscient, narrative in novels is most often confined to a particular identifiable character, which remains consistent throughout the text. We are accustomed, for example, to reading a novel from the perspective of the protagonist. Over the twentieth century, however, there has been a marked increase in multi-focalized novels: texts wherein the focalization jumps from character to character, frequently without being identified as such through textual apparatus. Although finding its origins in modernist experimentation, this literary device has moved outside of purely experimental texts and is now a growing feature of mainstream literary fiction. In this project we investigate the rise of multi-focal novels by automating the discovery of perspectival shifts across a massive corpus (n~10,000) of twentieth-century novels. Can we identify shifts in focalization computationally? And if so, can we trace the history of this device across the twentieth century? Finally, is this device more frequent in literature written by non-US authors such that the rise of multi-focalized novels is a function of the rise in immigrant fiction in North America?
Students will learn the basics of narrative theory and focalization as they work with the project PIs to develop and train a model that detects the character focalizing different portions of a novel. They will then apply this model across a corpus of texts and work with the research team to analyze the causal mechanisms behind the recent rise of multi-focalized narratives.
Mapping Shared Sacred Sites
Faculty PI: Anna Bigelow, Department of Religious Studies
Despite the existence of numerous shared sites of religious observance across the world, they remain largely unknown. Shared sacred sites are “holy” for members of multiple religious groups (which may also be ethnically or nationally distinct) and serve not only as places where people come together to respect the site in various ways, but also as sites where they are forced, by their coexistence, to mediate and negotiate their diversity and differences. This ethos of sharing has been customary throughout the world and throughout history. This project proposes to restore accounts of cohabitation, hospitality, and tolerance to the historical record, taking their place alongside the better-known examples of communal strife and interreligious antagonism.
Visible Bodies
Faculty PI: Joel Cabrita, Department of History
The Visible Bodies project investigates the invisibility of African female writers within twentieth-century literary canons by highlighting the work and legacy of South African scholar and activist Regina Twala (1908–1968). Despite her prolific output, including four unpublished manuscripts, Twala’s contributions have remained largely overlooked. The project seeks to recover and reinterpret her writings while building a broader foundation for the study of women-authored texts from this period.
Students will work with an interdisciplinary team of historians, literary scholars, and archivists to create a critical digital archive displaying the key work of largely unknown 20th-century African female writers. They will engage in archival research, digitization, and the curation of primary source materials, contributing to an open scholarly resource aimed at both researchers and the general public, with a particular emphasis on accessibility for audiences across the African continent.
Martin Luther King, Jr. Digitization Project
Faculty PI: Lerone Martin, Departments of African and African American Studies & Religious Studies
The mission of the Martin Luther King Jr. Research and Education Institute is to preserve and promote the work and legacy of MLK. We are currently working on a unique project: making our archival holdings of MLK, one of the most iconic individuals of the 20th century, accessible online to a 21st-century public. The Papers of Martin Luther King, Jr. project began in 1985. It is a comprehensive collection of King's most significant correspondence, sermons, speeches, published writings, and unpublished manuscripts. Seven volumes (documenting 1929 to 1962) have been published, with some content available online, and the eighth is in the works. Each volume contains approximately 180 documents. They have become essential reference works for researchers and have influenced scholarship about King and the movements he inspired. However, these large books are pricey and not accessible to all. We intend to build a searchable database and accompanying website that would enable scholars and the public to access, analyze
Mapping Stanford Global Studies
Faculty PI: Jisha Menon, Department of Theater and Performance Studies & Director of the Stanford Global Studies Division
This project, developed in collaboration with Stanford Global Studies (SGS), seeks to design an interactive map that makes visible the division’s extensive international engagements. By mapping activities such as fieldwork, internships, language study, and conference participation by students, faculty, visiting scholars, and postdoctoral fellows, the project explores how geospatial representation can communicate institutional networks, flows of knowledge, and sites of exchange.
Students will engage in the full research process, assisting in collecting, cleaning, and structuring data for visualization, designing metadata schemas, and developing interpretive frameworks that situate global academic activity in spatial and historical context. Using a web-based mapping platform such as ArcGIS StoryMaps, students will gain hands-on experience in digital storytelling, spatial data visualization, and the communication of research impact on a global scale.
SILICON: Advancing Digitally Disadvantaged Languages in the 21st Century
In the 21st century, language death and digital exclusion have become linked in a mutually reinforcing cycle of marginalization and extinction. The gap that separates the top 100 languages from Digitally Disadvantaged Languages (DDLs) is steadily becoming a chasm, and the ramifications of this widening divide are profound. 6000-plus DDLs confront an existential crisis, predominantly among minority and indigenous communities: either we change business as usual, or language death will resolve the problem on its own. SILICON (Stanford Initiative on Language Inclusion and Conservation in Old and New Media) is committed to making all DDLs usable in digital environments. Our initiative is unique in seeking to combine human-centered approaches with technological innovation to change the current trajectory of language inclusion. SILICON is now in year two of its existence, having launched a successful internship and practitioners program to help advance Digitally Disadvantaged Languages worldwide. Key milestones in 2024-25 will be to (a) scale up and accelerate the process of digital inclusion for at-risk and under-resourced languages and (b) intensify and deepen community outreach to ensure a digital age that addresses deeply felt community needs.
Early Cape Travelers
Faculty PI: Grant Parker, Departments of African and African American Studies, and of Classics
This project canvasses maps, both historical and contemporary, to make sense of human geographies at the Cape of Good Hope -- in other words, the relation of humans to physical landscape. We'll focus on the early colonial period, roughly the late 1600s and 1700s, when the Cape was under Dutch control. The project continues and integrates earlier CESTA-supported projects, in which our team mapped the hinterland journeys of Hendrik Swellengrebel (1776-77), Francois LeVaillant (1781-83), and Peter Kolbe. Some guiding questions are as follows: How can we map early European journeys (for which we have extensive written records) in relation to locations as they are now known? How can we triangulate these early European travel narratives with archaeological sites and with indigenous Khoe place names?
The next stage of the Early Cape Pathways project will be to make detailed use of aggregative studies – especially E.L. Mossop's Old Cape Highways and V.S. Forbes' Pioneer Travellers in South Africa: a geographical commentary upon routes, records, observations and opinions of travelers at the Cape, 1750-1800 – to see to what degree known travel routes converge, and to what degree they coincide with known routes and locations. The immediate target audience for this will be scholars but eventually our findings will generate assets we'll tailor for use in schools as well as the non-profit tourism sector.
Indigenous South African Archives Portal
Faculty PI: Grant Parker, Departments of African and African American Studies, and Classics
This project aims to create a portal where the cultural artifacts, languages, and archives related to the San and Khoi peoples of Western South Africa can be explored within one platform. We will build on existing projects centered on digitized archives of cultural artifacts, maps, and a variety of texts about the San and Khoi peoples. We will catalog and create an exhibit for interview videos of San and Khoi peoples. The portal will include a dedicated database for documenting and preserving languages spoken by the San and Khoi peoples, with a particular focus on those at risk of being lost. This linguistic database will feature audio recordings, phonetic transcriptions, and translations, providing valuable resources for language revitalization and research. By cataloging vocabulary, grammar structures, and traditional expressions, the database aims to support efforts to keep these languages alive for future generations. The portal will prioritize accessibility and user-friendly navigation, allowing users to explore these rich collections in a cohesive and immersive environment. It will include interactive elements, such as multimedia timelines and geographical overlays, to provide contextual understanding of the histories and traditions of the San and Khoi peoples. Additionally, we plan to incorporate educational resources and storytelling features to foster a deeper appreciation and understanding of the cultural significance of these artifacts and oral histories.
Aramaic Video Games
Faculty PI: Michael Penn, Department of Religious Studies
Throughout much of pre-modernity, the most geographically expansive church was not the Roman Catholic Church or the Byzantine Orthodox Church, but rather churches which reached from modern-day Turkey, throughout the Middle East, across Afghanistan, down to India, up to Tibet, and into China. For these churches the primary language of Christian scholarship and liturgy was the lingua franca of the late ancient Middle East, a dialect of Aramaic known as Syriac. Yet a focus on more western branches of Christianity has meant that Syriac Christians have essentially been written out of history.
Today, about ten million modern Christians trace their lineage back to the Syriac churches of old. Until recently most of them remained in the lands of their tradition’s birth: modern-day Iran, Iraq, Syria, Israel, Palestine, Lebanon, and eastern Turkey. During the First World War a genocide targeted Syriac Christians in the Ottoman Empire, killing between 250,000 and 500,000 of them. This was followed by periods of discrimination and persecution in Iran, Iraq, and Turkey. The twenty-first century is not looking much better. The chaos following the second Iraq war decimated the Syriac churches in Iraq and led to massive emigration and dislocation. The civil war in Syria has been even more destructive to these communities and their patrimony. Their congregations have been targeted by Islamists, their churches have been destroyed by the Islamic State movement, and their leaders have been kidnapped and killed.
But those interested in reclaiming the history of Syriac Christianity keep running into the same problem. For professors, students, and modern heritage communities, there are minimal pedagogical resources for teaching or learning this under-resourced language. For example, although there are a few textbooks on first-year Syriac, there isn’t a single textbook for intermediate or advanced study. Over the last three years CESTA interns have been changing that. Based on best practices in language pedagogy, our team uses gamification as an effective means to acquire and drill the Syriac language, and our current set of grammar and vocabulary games can be found at syriacverbtutorial.org. This summer interns will finalize these games and perhaps even add some new ones. Of particular import is improved game design and graphics. In order to spend a summer designing video games to help save an endangered language, one should have proficiency in web development, especially implementing web graphics, animations, and sound in JavaScript, CSS/HTML, and frameworks like React.
Spatial History of Ninth-Century Iraqi Saints
Faculty PI: Michael Penn, Department of Religious Studies
In the middle of the ninth century an Iraq-based bishop named Thomas of Marga wrote what may be the world’s longest alumni newsletter. That is, Thomas wanted to narrate the miraculous deeds of everyone connected to his home monastery. The resulting Book of Governors contains 622 different characters and is nothing short of a sprawling mess. Its tales jump from abbots to teleporting trees, caliphs to petrified dragons, bishops to a temporarily resurrected dog, and often defy a linear read. But this messiness makes Thomas’s writings particularly amenable to digital analysis. Previous interns have worked on and become co-authors of articles and academic presentations regarding the character networks of this set of medieval saint tales written by Christians living under early Islam. This summer's intern will build on preliminary work our team has done regarding the geography of Thomas's narrative. They will consolidate a list of toponyms, georeference them, and produce digital maps and GIS data visualizations with the goal of journal publication. The ideal candidate will have previous GIS experience. Reading knowledge of French is not required but is a plus.
Mapping Circuits of Enslavement and Liberation
Faculty PI: Richard Roberts, Department of History
The Stanford Liberation Project (SLP) continues to build a spatial and relational database documenting individuals who sought freedom from enslavement. The SLP team, including undergraduate interns, has already published two articles co-authored with student researchers from previous summer programs at CESTA and has another currently under review. Project outputs have included a special issue on the ethics of naming enslaved individuals and a high-school world-history curriculum unit. This year, the team will focus on mapping the circuits of enslavement and liberation and finalizing data entry for individuals seeking their freedom. Students will participate in data entry, verification, historical mapping, and visualization design, contributing to a significant digital history resource.
The Modernist Archives Publishing Project (MAPP)
Faculty PI: Alice Staveley, Department of English
This project extends the Modernist Archives Publishing Project’s (MAPP) exploration of how computation can illuminate the cultural, economic, and literary history of modernist publishing. The team is developing an AI-driven transcription workflow to interpret handwritten nineteenth- and twentieth-century financial records from the bookselling industry. Building on an implementable model created by previous CESTA interns, the project combines text segmentation, Gemini- and ChatGPT-informed transcription, and “human in the loop” editing and correction. The student researcher will help refine and extend these models, contributing both technical and interpretive insight. Ideal candidates have advanced computational skills and an interest in how digital tools can make previously unreadable historical materials accessible at scale; students without prior experience will become familiar with transcription practices and the process of transforming archival materials into data. Through this work, students will gain experience applying AI methods to archival research, engaging in interdisciplinary analysis, and contributing as co-authors on a stand-alone digital humanities paper intended to come from this work.
Handwriting Analysis through New Directions in Manuscript studies with AI and Digital Environments (HANDMADE)
Faculty PI: Elaine Treharne, English Department
Handwriting Analysis through New Directions in Manuscript studies with AI and Digital Environments (HANDMADE) combines the domain expertise of medieval manuscript scholars with contemporary AI platforms to create a new environment for the analysis of manuscript handwriting. Despite recent advances in automated handwriting recognition, the success rates of these engines remain poor, due in large part to the lack of input from manuscript scholars. As such, not only is the accurate transcription of manuscript materials out of reach, but so is any analysis of the handwriting itself. In this project, we take a new approach to this problem, fine-tuning an AI engine not to transcribe handwritten documents, but to analyze a specific set of features that gives information beyond the text about the manuscript. Students will participate in fine-tuning the algorithm, including annotating medieval manuscripts under the direction of the project PI, training the model, and assessing the results. Drawing on Stanford’s Parker Collection of digitized manuscripts, students will be exposed to paleography, book history, AI prompting, and model training and assessment.
Charting the Ottoman Empire: Power, Wealth, and Death in the Age of Revolutions
Faculty PI: Ali Yaycioglu, Department of History
This project explores the financial and political networks of the Ottoman Empire between 1750 and 1850 through detailed analysis of the fiscal codex MAD 9726. The research team is developing a relational database to trace connections among actors, institutions, and systems of wealth and governance, drawing on unique Ottoman accounting techniques. Students will assist with data preparation, categorization of unstructured data, and network analysis using digital tools. Familiarity with Python and experience in network analysis are preferred, and basic SQL skills are helpful but not required. The project provides an opportunity to gain hands-on experience in digital historical research, relational data modeling, and the analysis of early modern financial and political systems.
Capturing the K-pop Sound
Faculty PI: Dafna Zur, Department of East Asian Literatures and Cultures
This project explores whether machine learning can capture and distinguish the musical signatures that define the K-pop sound. By analyzing songs from major entertainment companies such as SM, JYP, and YG, the research examines how sonic features vary across production houses and generational eras, and what these variations reveal about the evolution of style within global pop music. Using computational models trained on audio features like spectrograms and frequency patterns, the project seeks to identify measurable patterns that correspond to cultural and aesthetic shifts within K-pop’s history. In doing so, it connects technical methods in data science with broader questions about creativity, standardization, and identity in the global music industry.
Mapping the K-pop Sound
Faculty PI: Dafna Zur, Department of East Asian Literatures and Cultures
This project develops an interactive platform that visualizes the relationships among K-pop artists, companies, and musical styles. Using data derived from audio analysis and machine learning, the project seeks to represent how sound, production, and era intersect in shaping K-pop’s evolution. The resulting visualization or website will invite users to explore stylistic patterns across generations and companies, offering new ways to understand the genre’s development and global influence through interactive design and digital storytelling.