How do you feel about Växsjö?: Representing and Marginalizing Places in a Collection Based Research
Join us for a CESTA Tuesday lunch seminar on October 8th, from 12 to 1:15 pm, featuring Love Börjeson (Head of R&D and KBLab, National Library of Sweden) and Martin Malmsten (Senior Data Scientist at KBLab). Their talk is titled "How do you feel about Växsjö?: Representing and Marginalizing Places in a Collection Based Research." They will present research that uses collection-based language models to explore cultural and linguistic dynamics in the collections of the National Library of Sweden, with a particular focus on data related to places. They will show how they identify places that are marginalized in the archives, and who is doing the writing in those archives. The event will take place in person at Wallenberg Hall, Room 433A, and virtually. Lunch will be provided to in-person participants. Please RSVP here for both in-person and virtual participation.
Paper Abstract
KBLab, a research infrastructure at the National Library of Sweden (KB), has released a series of collection-based (L)LMs, trained to counteract the marginalization of the Swedish language in LLM applications in society. KBLab’s models are released openly and are the most used (L)LMs in Sweden, but, much to our surprise, they are still underused in research on KB’s own collections. To develop KBLab as a research infrastructure, we are now building geospatially augmented access to the collections, using KBLab’s models to identify topics, sentiments, places, and their interrelations in the National Library of Sweden’s collections. From a research infrastructure perspective, we will discuss how we do this and what considerations we make to maximize the benefit to researchers. We will also showcase some early results. From a research perspective, we will discuss how the results can be used to investigate which parts of Sweden, and hence which variations of Swedish language and culture, are marginalized in KB’s collections and, consequently, in KBLab’s (L)LMs. The questions we will address are: Who gets to talk about the place they live in? Who, in contrast, is being talked about, and by whom? And who is left out completely? How do these patterns change over time?
KBLab’s models are published and released here: https://huggingface.co/KBLab
The work we present is at an early stage. Previous publications that may be of interest:
M Malmsten, C Haffenden, L Börjeson (2022) Hearing voices at the national library: a speech corpus and acoustic model for the Swedish language. arXiv preprint arXiv:2205.03026.
L Börjeson, C Haffenden, M Malmsten, F Klingwall, E Rende, R Kurtz, … (2024) Transfiguring the Library as Digital Research Infrastructure: Making KBLab at the National Library of Sweden. College & Research Libraries 85 (4), 564-582.
M Malmsten, L Börjeson, C Haffenden (2020) Playing with Words at the National Library of Sweden: Making a Swedish BERT. arXiv preprint arXiv:2007.01658.
About the Speakers
Love Börjeson (PhD) is head of R&D and KBLab at the National Library of Sweden. He is an affiliated researcher at the Stockholm School of Economics and was previously a postdoc at the Computational Social Science Lab at Stanford University.
Martin Malmsten is a senior data scientist at KBLab at the National Library of Sweden and has trained some of the lab’s most used models for text and sound.