23 - 26 August 2021
National Library of Estonia
Tallinn, Estonia and online
REGISTRATION IS CLOSED!
Due to the COVID-19 situation, physical participation in the event will require a valid COVID-19 passport or certificate.
Welcome to the Baltic Summer School of Digital Humanities “Digital Methods in Humanities and Social Sciences”.
This year’s summer school is a joint venture of two summer schools: Digital Methods in Humanities and Social Sciences (University of Tartu) and the Baltic Summer School of Digital Humanities (National Library of Latvia). Lecturers and students will gather at the National Library of Estonia in Tallinn and/or online.
The summer school aims to provide an introduction to data analysis and methodological principles for working with (digital) data in humanities and social sciences. There are four keynote lectures by leading experts in their fields and nine practical workshops focusing on, for example, data analysis and visualisation, automatic text indexing and social media analysis.
PhD and MA students and supervisors from all fields of humanities and social sciences are welcome. Researchers and specialists working in relevant positions as well as guests from memory institutions are also invited to attend.
2 ECTS can be awarded for full participation, which means attending the keynote lecture and one workshop every day. PhD students presenting in the Student Sessions workshop can earn an additional 1 ECTS. Attendance is verified by signing the participation list or through the participation logs of the conference platform.
Participation in the summer school is free of charge.
For participants who are members of the organising graduate schools, accommodation will be arranged. Other participants will need to organise their own accommodation.
The summer school is organised on the initiative of the Centre for Digital Humanities and Information Society at the University of Tartu (CDHIS) and the National Library of Estonia, with support from the Graduate School of Linguistics, Philosophy and Semiotics and the Graduate School of Culture Studies and Arts.
Additional information:
summerschool@nlib.ee
Monday, 23 August
08:45 – 09:45 Coffee / Registration
09:45 – 10:00 Opening of the Summer School, Main Conference Hall
10:00 – 11:30 Plenary (60+30 min)
11:35 – 13:30 Workshops
13:30 – 14:30 Lunch break
14:30 – 16:00 Workshop
16:00 – 16:30 Coffee
16:30 – 18:00 Workshops continue
19:00 – 22:00 Welcome reception at the Tallinn Town Hall
Tuesday, 24 August
08:45 – 09:30 Coffee / Registration
09:30 – 11:00 Plenary (60+30 min)
Link to the recording: https://youtu.be/lVzPAYtaKmk
11:05 – 13:00 Workshops
13:00 – 14:00 Lunch break
14:00 – 15:30 Workshop
15:30 – 16:00 Coffee
16:00 – 17:30 Workshops continue
Wednesday, 25 August
08:45 – 09:30 Coffee / Registration
09:30 – 11:00 Plenary (60+30 min)
Link to the recording: https://youtu.be/gC0wohPv6VE
11:05 – 13:00 Workshops
13:00 – 14:00 Lunch break
14:00 – 15:30 Workshop
15:30 – 16:00 Coffee
16:00 – 17:30 Workshops continue
Thursday, 26 August
08:45 – 09:30 Coffee / Registration
09:30 – 11:00 Plenary (60+30 min)
Link to the recording: https://youtu.be/nWIHoG5OQqQ
11:05 – 13:00 Student sessions
13:00 – 14:00 Lunch break
14:00 – 15:00 Digital Methods in a GLAM
Link to the recording: https://youtu.be/W2jyfAySWyA
15:00 – 15:30 Coffee
15:30 – 17:00 Student session
17:00 – 17:30 Closing, Small Conference Hall
*All times are given in Estonian local time (GMT+3).
Richard McElreath - Causal Thinking for Descriptive Research
23 August 10:00 - 11:30
Causal inference is hard, and everyone knows it. It is less recognized that descriptive and comparative scholarship also rely upon causal inference. How data are sampled and curated influences how we should process the data, in order to accurately describe or compare the people, times, and places of interest. I'll present some examples to illustrate the problems that ignoring causal structure can create, along with some solutions.
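To make the sampling point concrete, here is a toy Python sketch of our own (not taken from the talk). When one group is over-represented in a sample, the naive mean misdescribes the population; post-stratification, i.e. averaging within groups and then weighting each group by its known population share, corrects for this. The data and the 50/50 population shares are invented, and the correction only holds under the assumption that, within each group, the people sampled resemble those who were not.

```python
from collections import defaultdict


def naive_mean(records):
    """Plain average of all observed values, ignoring who ended up in the sample."""
    return sum(value for _, value in records) / len(records)


def poststratified_mean(records, population_shares):
    """Average within each group, then weight each group mean by its population share."""
    by_group = defaultdict(list)
    for group, value in records:
        by_group[group].append(value)
    return sum(
        share * sum(by_group[group]) / len(by_group[group])
        for group, share in population_shares.items()
    )


# Hypothetical data: urban respondents make up 80% of the sample but only
# 50% of the population, and they score higher on average.
sample = [("urban", 4.0)] * 80 + [("rural", 2.0)] * 20
shares = {"urban": 0.5, "rural": 0.5}

print(naive_mean(sample))                   # 3.6, pulled towards the over-sampled group
print(poststratified_mean(sample, shares))  # 3.0
```

The fix depends on knowing how the sample was drawn: if selection also depended on the measured value itself, reweighting by group would not be enough.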
Bio: I am an evolutionary ecologist who studies humans. My main interest is in how the evolution of fancy social learning in humans accounts for the unusual nature of human adaptation and the extraordinary scale and variety of human societies. Humans are more widespread and successful than any other vertebrate. At the same time, humans are unlike any other animal in that we cooperate in very large groups of unrelated individuals. My colleagues and I use formal evolutionary models, experiments and ethnographic fieldwork to address these puzzles.
Mila Oiva - Uncovering the Formation of Fake History Narratives
24 August 9:30 - 11:00
We are living in an era of abundant, fast-circulating and easily distorted information. Among other stories, nation-bound historical narratives shape our identities and worldviews, and through them have the potential to shake politics and international relations. Popular historical stories circulating on the World Wide Web contain a diversity of historical narratives, including ones that consciously contest academic research and spread simplified understandings based on conspiracy theories, and that can thus be defined as ‘pseudohistory’. This plenary introduces an ongoing project that seeks to understand how pseudohistorical content develops in the current era. The project explores this global topic through a case study of pseudohistorical narratives on Russian medieval history circulating on the Russian-language web. The data comprise 1.5 million websites, blog posts and discussion forum posts addressing the origins of the Russian state in the Middle Ages. The project uses text reuse detection, network analysis and topic modeling to detect the structures and evolutionary dynamics of fake historical narratives.
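The abstract names text reuse detection without describing the project's pipeline. As a generic illustration only, one common baseline compares the sets of overlapping word n-grams ("shingles") in two texts using Jaccard similarity. The sketch below assumes word trigrams and simple lower-casing; both are our choices, not the project's, and the example sentences are made up.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lower-cased word n-grams ("shingles") in a text."""
    words = re.findall(r"\w+", text.lower())
    # Texts shorter than n words yield an empty set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 if both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reuse_score(text_a: str, text_b: str, n: int = 3) -> float:
    """Similarity of two texts based on overlapping word n-grams."""
    return jaccard(shingles(text_a, n), shingles(text_b, n))


text_a = "the quick brown fox jumps over the lazy dog"
text_b = "the quick brown fox leaps over the lazy dog"
print(reuse_score(text_a, text_b))  # 4 shared trigrams out of 10 distinct ones
```

At the scale of 1.5 million documents, comparing every pair directly is impractical, so real systems typically add indexing or hashing steps on top of this basic idea.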
Mila Oiva (milaoiva@tlu.ee) is a cultural historian enthusiastic about identifying patterns of transnational circulation of knowledge and ideas over the long term. Her current projects concern the circulation of pseudohistorical narratives on the World Wide Web in the 2000s, newsreel production in the Soviet Union in the 20th century, and the circulation of news in the 19th century. She works as a Senior Research Fellow at CUDAN Open Lab at Tallinn University. Her recent publications include Digital Readings of History: History Research in the Digital Era (Helsinki: Helsinki University Press, 2020; open access, co-edited with Mats Fridlund and Petri Paju), Yves Montand in the USSR: Cultural Diplomacy and Mixed Messages (Palgrave Macmillan, 2021; co-authored with Hannu Salmi and Bruce Johnson) and “Topic Modeling Russian History,” in The Palgrave Handbook of Digital Russia Studies, edited by Daria Gritsenko, Marielle Wijermars and Mikhail Kopotev (Palgrave Handbooks. London: Palgrave Macmillan, 2021).
Link to the recording: https://youtu.be/lVzPAYtaKmk
Jonas Nölle - Virtual Reality: A new tool for studying human behaviour in the lab
25 August 9:30 - 11:00
In recent years, virtual reality (VR) equipment has become much more accessible and affordable for consumers as well as researchers. In this keynote, I will introduce how this technology can be used in the humanities and social sciences to study human behaviour directly in the lab. In the past, experimental rigour and ecological validity have often been conceived as opposite ends of a methodological continuum, where researchers had to sacrifice ecological validity to ensure sufficient control. However, VR experiments can provide participants with immersive and realistic tasks while still allowing researchers to tightly control all the variables involved. I will show some examples from my own research on language evolution that apply this approach. I used interactive VR experiments to study the impact of the environment on spatial referencing systems. In the real world, cross-linguistic variation in these systems has been proposed to interact with the local environment (such as mountains or rivers), but also with other sociocultural variables. My experiments showed that participants’ spatial referencing strategies were systematically affected by the topographic environment while they solved a collaborative task in environments with different affordances. Beyond that, I will discuss the further potential of VR for similar studies of linguistic diversity, cultural evolution and the digital humanities more generally, and provide some pointers on how to set up such experiments using modern hardware and 3D engines.
Jonas Nölle (jonas.noelle@glasgow.ac.uk) is a cognitive scientist and evolutionary linguist. He is currently a research fellow at the Social Psychophysics lab at the University of Glasgow in Scotland, where he investigates multimodal communication and specifically the role of facial expressions in social interactions as part of Rachael Jack's Facesyntax ERC project. Previously he worked with the Interacting Minds Centre at Aarhus University and completed a PhD at the Centre for Language Evolution in Edinburgh, where he pioneered the use of interactive VR experiments to study language and communication in realistic and immersive laboratory settings. His research focuses on the study of communication in task-oriented face-to-face interaction, the emergence and evolution of novel communication systems, and the interaction of culture, cognition and the environment in shaping conventions and human behaviour.
Link to the recording: https://youtu.be/gC0wohPv6VE
Dong Nguyen - NLP for robust and reliable measurements
26 August 9:30 - 11:00
We are increasingly using computational text analysis to explore humanities and social science questions. A common and crucial step, then, is measuring social or cultural concepts using computational methods. Recent advances in Natural Language Processing (NLP) are promising. However, there’s increasing evidence that NLP systems are brittle and that results are sensitive to small design choices. In this talk I will discuss two recent studies looking at these topics. I’ll focus on the measurement of biases in NLP and on a recently developed test suite to interrogate hate speech detection systems.
Dong Nguyen is an assistant professor at the department of Information and Computing Sciences at Utrecht University (NL). Previously she was a research fellow at the Alan Turing Institute (UK). She holds a PhD from the University of Twente (NL) and a master’s degree from Carnegie Mellon University (USA). She is interested in developing NLP methods to explore questions from the social sciences and the humanities. At Utrecht she leads the NLP and Society Lab.
Link to the recording: https://youtu.be/nWIHoG5OQqQ
Linda Freienthal – The KRATT at NLE - how did we do it
26 August 14:00-15:00
In this lecture I will introduce TEXTA and give an overview of a project we carried out last year for the National Library of Estonia, describing how we developed our solution for automatically proposing subject indices (keywords) for books.
Moderator: Peeter Tinits
Link to the recording: https://youtu.be/W2jyfAySWyA
Introduction to R and Tidyverse
Lecturer: Peeter Tinits (University of Tartu)
Date: 23 August
Room: Small Conference Hall
Description
R is a scripting language often used for data processing in the humanities and social sciences. It makes it possible to produce analyses as a reproducible workflow that is transparent to readers and easy to update. We will start with the very basics of R and RStudio and quickly work our way up to simple data processing with the Tidyverse packages. Tidyverse is a set of packages that aims to make R easy to use, especially for beginners. We will learn basic R syntax, data manipulation and data summaries in the Tidyverse style.
We will rely on personal laptops in this tutorial, so you will need to install R (https://www.r-project.org) and RStudio (https://www.rstudio.com) a few days beforehand. Short instructions will be shared.
If you do not have previous experience in R, this workshop is a requirement for attending other workshops using R at this summer school.
References:
Wickham, Hadley, and Garrett Grolemund (2017). R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.
About the instructor
Peeter Tinits is a digital humanities specialist at the University of Tartu and teaches various digital humanities courses. His own research has covered the spelling standardization of Estonian, the rise of environmentalism in the 20th century, and structural changes in film production crews. He is a firm believer that anyone can learn to code and that the humanities have a lot to gain from adopting reproducible research practices.
Introduction to natural language processing using Pandas, Spacy and Stanza
Lecturer: Kristiina Vaik (University of Tartu)
Date: 23 August
Room: Corner Hall
Description
This workshop introduces Python, an alternative programming language that is popular in natural language processing. Python has a simple syntax and transparent semantics and is widely used for analyzing, understanding and deriving information from structured and unstructured data. The course will start with a basic introduction to Python, quickly going through topics such as syntax, variables, data structures, conditionals, loops and input/output (IO). We will continue with an introduction to Pandas, a powerful Python data analysis toolkit used for data exploration and manipulation. Finally, I will introduce spaCy and Stanza. Both are free, open-source libraries with many built-in capabilities for text processing, e.g. noise removal and tokenization. Additionally, we will see how to apply spaCy's and Stanza's pretrained models to different downstream tasks, e.g. morphological and syntactic parsing and named entity recognition.
I will provide the students with Jupyter notebooks containing the code used in this tutorial. The course assumes basic knowledge of Python (syntax, data types and variables) and builds on that. I recommend using your own laptop; instructions on which packages to install will be shared beforehand.
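As a small taste of the plain-Python basics (functions, regular expressions, lists), here is a naive, standard-library-only sketch of noise removal and tokenization. It is our own simplification for illustration: spaCy and Stanza rely on language-specific rules and trained models and handle far more cases than these few regular expressions.

```python
import re


def remove_noise(text: str) -> str:
    """Drop HTML-like tags and URLs, then collapse runs of whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)       # HTML-like tags such as <p>
    text = re.sub(r"https?://\S+", " ", text)  # URLs up to the next whitespace
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Naive tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


raw = "<p>The library is at Tõnismägi 2, Tallinn.</p> Map: https://example.org/map"
tokens = tokenize(remove_noise(raw))
print(tokens)
```

Because Python 3 regular expressions are Unicode-aware by default, `\w+` keeps Estonian letters such as õ and ä inside a single token.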
About the instructor
Kristiina Vaik is a Ph.D. student at the University of Tartu. She has worked as a programmer in the Natural Language Processing Research Group at the University of Tartu and as a data analyst at TEXTA.
Dataset creation, publishing and maintenance: Practices, solutions and open questions
Lecturer: Niko Partanen (University of Helsinki)
Date: 23 August
Room: Auditorium 2.0
Description
The publication of research data has become increasingly commonplace and required in recent years, but many of the related practices are still evolving, and there is a great deal of variation between scientific fields. There are, however, some basic principles we can follow to ensure that our materials are organized in a way that suits both our own research and further distribution. In this workshop, we will use the FAIR principles as a general guideline, but we will focus primarily on practical, tested solutions that are currently available, while reflecting on them in the wider context of FAIR.
In the workshop, we will go through the basic premises of dataset creation and documentation, with a focus on general project management and organization. We will also discuss issues related to good documentation and reproducibility in a wider context. We will study version control and use GitHub's Zenodo integration as one publishing mechanism. We will also discuss options for storing closed datasets, and how to approach the licensing and reuse of different materials.
By the end of the workshop, each participant will be familiar with essential data preparation and publication methods that are currently available. We will also reserve time to discuss the nuances of the individual datasets participants have been working with.
About the instructor
Niko Partanen is a PhD student at the University of Helsinki. He has been working for the last ten years with documentation and description of endangered Uralic languages spoken in Russia, especially the varieties of Komi. He has also worked extensively with language technology and natural language processing, with a particular focus on integrating these methods into language documentation workflows. Partanen collaborates with a variety of archives in the digital preservation of legacy data.
Exploring and visualizing your data using R
Lecturer: Andres Karjus (University of Edinburgh)
Date: 24 August
Room: Small Conference Hall
Description
Big data is everywhere and holds unprecedented potential for humanities and social science research, and for understanding our complex, ever-changing world in general. But understanding big data is hard, unless you have the right tools. In this workshop, we’ll explore and dissect various real-world datasets using R, an excellent programming language for anything related to statistics and data science. If you've never written code before, this is your opportunity to learn this superpower through practical exercises with clear outcomes, primarily in the form of visualizations. We will mostly use the ggplot2 R package and its add-ons, starting out with basic examples like scatterplots and time series. We will also look into a few other packages for creating networks and maps, as well as interactive and animated plots. But with great power comes great responsibility, so we will also spend some time discussing the ethics of data visualization and approaches to making sure your graphs don't mislead your audience.
About the instructor
Andres Karjus is a research fellow at the ERA Chair for Cultural Data Analytics (CUDAN) at Tallinn University. He obtained his PhD in evolutionary linguistics from the University of Edinburgh in 2020, and holds degrees in linguistics (BA, MA) and computer science (MSc). He uses R daily in his research and has been teaching R workshops and courses since 2015.
Twitter: https://twitter.com/AndresKarjus
Personal website: https://andreskarjus.github.io
An introduction to doing research using social media data
Lecturer: Tuomo Hiippala (University of Helsinki)
Date: 24 August
Room: Auditorium 2.0
Description
This workshop introduces key issues in doing research using social media data. We will discuss the kinds of research questions that may be pursued using social media data; map access to data as of 2021; consider ethical questions related to social media research; and experiment with applicable computational methods from the fields of natural language processing and computer vision. The workshop involves some programming in Python using Jupyter Notebooks, an interactive environment running in a web browser.
About the instructor
Tuomo Hiippala is Assistant Professor in English Language and Digital Humanities at the University of Helsinki, Finland. His research interests include multimodal communication and urban multilingualism.
Computational harmonization of bibliographic data
Lecturers: Leo Lahti (University of Turku), Iiro Tiihonen (University of Helsinki)
Date: 24 August
Room: Corner Hall (online only)
Description
The workshop is an introduction to the computational harmonisation of bibliographic metadata.
Bibliographic metadata is a valuable source of historical and cultural information. However, it is often the product of a long and shifting process, which leaves the data coded with various differing notations and conventions. Its full use as cultural heritage or research material is often impossible without harmonisation: the process of standardising, converting and cleaning the data. Using R and focusing on Estonian and Finnish bibliographic metadata, the workshop aims to show why harmonisation matters and to demonstrate its application in practice.
As harmonisation is a vast and often content-specific topic, we aim to combine a general motivation for the often fundamental role of harmonisation with practical examples based on real bibliographic data and workflows used in practice. We provide an overview of the process used to harmonise the historical bibliographic metadata of the National Library of Finland (Fennica) and a more focused hands-on example using Estonian bibliographic data.
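To give a feel for what harmonisation means in practice, here is a minimal Python sketch (the workshop itself uses R) that maps free-text publication-year fields to integer years. The input notations are hypothetical examples of the kind of variation found in catalogue records, not taken from Fennica or the Estonian data, and a real workflow would need many more rules.

```python
import re
from typing import Optional

ROMAN = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'MDCCL' to an integer."""
    total = 0
    for i, ch in enumerate(numeral):
        value = ROMAN[ch]
        # A smaller numeral before a larger one is subtracted (the 'C' in 'CM').
        if i + 1 < len(numeral) and value < ROMAN[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total


def harmonise_year(raw: str) -> Optional[int]:
    """Map a free-text publication year to a four-digit year, or None if unknown."""
    text = raw.strip().upper()
    # Arabic years 1000-2099 embedded in brackets, question marks, 'c.' etc.
    match = re.search(r"\b(1[0-9]{3}|20[0-9]{2})\b", text)
    if match:
        return int(match.group(1))
    # A bare Roman numeral, optionally in brackets or followed by a full stop.
    roman = re.fullmatch(r"[\[\s]*([MDCLXVI]+)[\]\.\s]*", text)
    if roman:
        return roman_to_int(roman.group(1))
    return None  # e.g. '[17--]' or 's.a.': leave for manual review


for value in ["[1750]", "1750?", "c. 1750", "MDCCL", "[17--]", "s.a."]:
    print(repr(value), "->", harmonise_year(value))
```

Returning `None` rather than guessing keeps uncertain values visible for manual review instead of silently inventing precision.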
About the instructors
Iiro Tiihonen is an incoming PhD student in history at the University of Helsinki. He has a background in both the humanities (MA, history) and data analysis (MSc, applied mathematics), and his academic focus is on applying bibliographic metadata to the quantitative study of the early modern period.
Leo Lahti is an associate professor in data science and computational humanities at the University of Turku and a long-time member of the Helsinki Computational History Group. Lahti received his doctoral degree in machine learning/bioinformatics from Aalto University, Finland, in 2010. His team's current research focuses on the computational analysis of complex natural and social systems. More information is available on the research homepage: datascience.utu.fi
Introduction to linear mixed models and Bayesian inference
Lecturer: Bodo Winter (University of Birmingham)
Date: 25 August
Room: Small Conference Hall
Description
In this workshop, we’ll learn to use the R package brms to fit linear models and linear mixed effects models in a Bayesian framework. As many fields (including linguistics and psychology) are moving away from thinking about data analysis in terms of significance tests, this workshop prepares you for the future. The workshop will teach you two things: first, the fundamentals of statistical modelling, with a focus on how to interpret linear models; second, the fundamentals of Bayesian inference. Instructions and materials will be released closer to the start of the workshop.
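brms itself is an R package that fits models with Stan, so the following is only a language-neutral illustration of the core Bayesian idea, posterior ∝ prior × likelihood. It uses grid approximation for a single proportion with a flat prior; the data (7 successes in 10 trials) are made up, and this is a standard textbook technique rather than part of the workshop materials.

```python
from math import comb


def grid_posterior(k: int, n: int, grid_size: int = 1001) -> tuple[list[float], list[float]]:
    """Grid approximation of the posterior for a binomial proportion p
    under a flat prior: multiply prior by likelihood, then normalise."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    prior = [1.0] * grid_size  # flat prior on [0, 1]
    # Binomial likelihood of k successes in n trials; comb() cancels on normalising.
    likelihood = [comb(n, k) * p**k * (1 - p) ** (n - k) for p in grid]
    unnormalised = [pr * lk for pr, lk in zip(prior, likelihood)]
    total = sum(unnormalised)
    return grid, [u / total for u in unnormalised]


grid, post = grid_posterior(k=7, n=10)
posterior_mean = sum(p * w for p, w in zip(grid, post))
print(round(posterior_mean, 3))  # 0.667
```

With a flat prior the exact posterior is Beta(k + 1, n − k + 1), so the grid result can be checked against its mean (k + 1)/(n + 2) = 8/12. Tools like brms replace the grid with sampling, which scales to models with many parameters.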
About the instructor
Bodo Winter is a UKRI Future Leaders Fellow, a Senior Lecturer at the University of Birmingham, UK, and Editor-in-Chief at the journal Language and Cognition. He has written a textbook “Statistics for linguists: An introduction using R” and has extensive experience running workshops on statistical modelling.
Twitter: https://twitter.com/BodoWinter
Personal website: https://bodowinter.com/
Introduction to the Annif automated indexing tool
Lecturers: Osma Suominen, Mona Lehtinen, Juho Inkinen (National Library of Finland)
Date: 25 August
Room: Corner Hall (online only)
Description
Many libraries and related institutions are looking at ways of automating their metadata production processes, for example by adopting AI technology. In this hands-on tutorial, participants will be introduced to the multilingual automated subject indexing tool Annif (annif.org) as a potential component in a library’s metadata generation system. By completing exercises, participants will gain practical experience in setting up Annif, training algorithms using example data, and using Annif to produce subject suggestions for new documents via the command-line interface, the web user interface and the REST API provided by the tool. The tutorial will also introduce the corpus formats supported by Annif so that participants will be able to apply the tool to their own vocabularies and documents.
The tutorial will be organized using the flipped classroom approach: participants are provided with a set of instructional videos and written exercises, and are expected to attempt them on their own time before the tutorial event, starting at least a week in advance. The event itself will be dedicated to solving problems, answering questions and getting a feel for the community around Annif.
Participants are instructed to use a computer with at least 8 GB of RAM and at least 20 GB of free disk space to complete the exercises. The organizers will provide the software as a preconfigured VirtualBox virtual machine. Alternatively, Docker images and a native Linux install option are provided for users familiar with those environments. No prior experience with the Annif tool is required, but participants are expected to be familiar with subject vocabularies (e.g. thesauri, subject headings or classification systems) and subject metadata that reference those vocabularies.
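For the REST API route, a request could be built with Python's standard library as sketched below. Several details are assumptions to verify: an Annif instance started locally with `annif run` on port 5000, the hypothetical project ID `yso-en`, and the endpoint `/v1/projects/{project_id}/suggest` taking form fields `text` and `limit` and returning JSON with a `results` list. Check these against the API documentation of your installed Annif version before relying on them.

```python
import json
import urllib.parse
import urllib.request

ANNIF_URL = "http://localhost:5000"  # assumption: a local `annif run` instance
PROJECT_ID = "yso-en"                # hypothetical project ID


def build_suggest_request(text: str, limit: int = 5) -> urllib.request.Request:
    """Build a form-encoded POST request to the (assumed) suggest endpoint."""
    url = f"{ANNIF_URL}/v1/projects/{PROJECT_ID}/suggest"
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_suggestions(payload: str) -> list[tuple[str, float]]:
    """Extract (label, score) pairs from a JSON response with a 'results' list."""
    return [(hit["label"], hit["score"]) for hit in json.loads(payload)["results"]]


def suggest(text: str, limit: int = 5) -> list[tuple[str, float]]:
    """Send the request and parse the answer (requires a running Annif server)."""
    with urllib.request.urlopen(build_suggest_request(text, limit)) as response:
        return parse_suggestions(response.read().decode("utf-8"))
```

With a server running, `suggest("Digital humanities methods for libraries")` would return (label, score) pairs; without one, only the request construction and the parsing can be exercised.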
Workshop materials
Exercises and introductory videos can be found in the Annif-tutorial GitHub repository.
The tutorial materials have been created in collaboration with Anna Kasprzik and Moritz Fürneisen of ZBW - Leibniz Information Centre for Economics in Germany.
About the instructors
Osma Suominen works as an Information Systems Specialist at the National Library of Finland. He is the original developer of Annif and is currently leading the automated cataloguing project where Annif is being developed and deployed. He has a doctoral degree in Media Technology (Aalto University) and has a long experience with semantic web technologies, vocabulary services and metadata processes.
Mona Lehtinen is an Information Specialist at the National Library of Finland. She works with the Annif project and is happy to tackle various tasks such as project coordination, community and corpora building and testing the new features of Annif.
Juho Inkinen works as an Information Systems Specialist at the National Library of Finland. His tasks include developing Annif and taking care of the Annif instances hosted on the sites of the National Library.
Solving research questions through computational thinking around experimentation with digital collections / data
Lecturer: Mahendra Mahey (GLAM Labs)
Date: 25 August
Room: Auditorium 2.0
Description
This workshop aims to help you take pragmatic steps towards examining, analysing and solving your research question(s) ‘computationally’, using the related data and digital collections.
You will get an opportunity to tell the group about your research question and present your data. We will then collectively give you some feedback and suggest some practical steps in moving your project forward, breaking things down into manageable steps and examining if any could harness the power of computation. You will then get a chance to start on this journey in this workshop and report back later on any progress or challenges faced.
Don’t have your own data? No problem! We can ensure you get ‘hands-on’ experience of working with cultural heritage data and digital collections and provide some of the typical challenges so that you get a chance to safely experiment and play.
Drawing on hundreds of digital projects, Mahendra will discuss challenges he has faced when experimenting with digital collections: exploring them, finding patterns and making new discoveries within the data. He will also invite workshop participants to contribute feedback from their own wealth of experience.
The workshop will conclude with reflections from the participants and feedback on how to move your enquiry forward and how to harness the power of thinking computationally for future research problems and contexts.
About the instructor
Mahendra Mahey has a background of working with people, digital technology and data as a manager, educator, adviser and community builder in Cultural Heritage, Further and Higher Education for researchers, educators, librarians and businesses both in the UK and internationally.
For the last 8 years, he has been helping scholars, artists, entrepreneurs, educators and innovators to work with cultural heritage data while working as the manager of British Library Labs. He has worked with colleagues to bring together national, state, university and public Galleries, Libraries, Archives and Museums (GLAMs) that are planning or already have experimental digital ‘Labs’. The GLAM Labs network aims to share expertise, knowledge and experience in order to build better ‘Labs’ for their organisations and users. Personal website: http://mahendramahey.com/
Student session: research methods troubleshooter
Date: 26 August
Room: Main Conference Hall (only on-site)
Description
This year, we will be hosting a student research session on the last day of the summer school. During the session, each participant will have a chance to present their own work. Two presentation formats are available. You can either introduce your project and brag about a nice solution you came up with to solve a difficult problem (which might be useful for somebody else as well), or, you can present an unresolved problem to the audience for public troubleshooting and brainstorming. The session will be moderated by invited interdisciplinary researchers with expertise in various qualitative and quantitative methods.
There will be no workshops on the final day, so everyone can join in the discussion! 1 ECTS will be awarded to those presenting their work (in either format). However, we encourage everybody to join the audience and participate in the brainstorming. The details of this event will depend on the number of interested participants; if you are interested in presenting, please indicate so in the registration form and we will contact you with further information.
If you have any questions about this event, please contact Mariann Proos: mariann.proos@ut.ee
This year the Summer School will take place in hybrid form, both on-site and online. Please be advised that the physical event is subject to change due to the COVID-19 situation.
The physical venue is the National Library of Estonia (Tõnismägi 2, Tallinn). The Library can easily be reached by car, by public transport and on foot; it is only a short walk from Tallinn Old Town. This will be the last chance to see the Library building as it is today and has been for the last 30 years: from January 2022, its doors will be closed for renovations for up to 4-5 years.
The online event will be hosted on the Cisco Webex conferencing platform. For detailed information and guidelines, please refer to the Online event section.
If the situation allows, workshops will take place in rooms with sufficient power outlets for laptops. Public Wi-Fi is available in all areas. Each attendee must bring their own laptop to the workshops. The organizers will contact participants before the event about the necessary applications.
Lunch is not provided, as there are several restaurants, cafes and pubs nearby that have reasonably priced (4-6 €) daily offers. There is also a cafeteria in the Library.
COVID-19
Due to the constraints set by the Estonian government, a valid Covid-19 passport or certificate is needed to attend the physical event of the Summer School. This means that you need to be either
Please be advised that all participants, lecturers, staff and caterers have to present a certificate to the organisers. Since the Summer School is a 4-day event, certificates will be checked every day, each time you come to the event. Anyone who cannot prove their infection safety cannot participate in the event.
All participants coming to the National Library of Estonia are advised to arrive at the entrance reasonably early. For this reason, we are opening the doors a little earlier, at 8:45 am.
More information about Estonia and COVID-19 restrictions: https://www.kriis.ee/en/restrictions-force-estonia-starting-march-11
Eesti Rahvusraamatukogu (National Library of Estonia)
Narva mnt 11, 15015 Tallinn
+372 630 7100
info@rara.ee
rara.ee