Official event website: https://www.clarin.eu/event/2023/clarin-and-libraries-2023-large-language-models-and-libraries
The workshop builds upon the first CLARIN and Libraries workshop, held in The Hague in May 2022.
This year's workshop will investigate further areas of collaboration between CLARIN-related initiatives and libraries with a special emphasis on building (large) language models in and in cooperation with libraries. The workshop will bring together for the second time a group of people associated with both CLARIN (or other research infrastructures) and libraries. Whereas the first CLARIN and Libraries workshop was particularly concerned with digital content delivery for researchers, the main theme of the second workshop will be large language models and library collections, e.g. technical challenges in building such models and legal implications of model training and use.
The host, the National Library of Norway (NLN), has since 2005 digitised its entire text collections, which at present amount to a corpus of 160 billion words for Norwegian, and has built large language models for text (BERT, GPT-2, T5) and speech (wav2vec, Whisper) on these collections. There will be keynotes from the National Libraries of Norway and Germany on the technical aspects of building such models in a library setting, as well as a keynote from the Swedish National Library on the legal aspects of building large language models.
Participation in the workshop is by invitation. If you are interested in attending, please contact your national coordinator or clarin@clarin.eu. The venue (National Library of Norway, Henrik Ibsens gate 110, Oslo) is located very close to the Nationaltheatret train station. Directions to the venue can be found on the event website.
12:00 - 13:15 | Lunch (Cafeteria, National Library of Norway) |
13:15 - 13:30 | Welcome |
13:30 - 15:00 | Introduction to CLARIN and Libraries, wrap-up from last year’s workshop (15 mins); Tour de table: introduction and points for discussion (45 mins); Library collections as data (Sally Chambers) |
15:00 - 15:30 | Break |
15:30 - 17:00 | Large language models at the National Library of Norway (Javier De La Rosa); Large language models at the German National Library (Peter Leinen); Discussion: technical aspects (chair: Andreas Witt) |
17:00 - 17:30 | Sensitive Data in HPC – How secure can it be? Is secure data processing in shared computing environments a dream? (Martin Matthiesen) |
19:00 | Evening social dinner (Avalon, Munkedamsveien 31, Oslo) |
9:30 - 10:30 | Lightning Talks: Participants who have registered for a lightning talk (see separate invitation by e-mail) will have the possibility to introduce their own projects and resources. |
10:30 - 11:00 | Break |
11:00 - 12:00 | Legal aspects of large language models in libraries (Jerker Rydén); Discussion: legal aspects (chair: Andreas Witt) |
12:00 - 13:00 | Lunch (Cafeteria, National Library of Norway) |
Address
National Library of Norway
Henrik Ibsens gate 110
0255 Oslo
Norway
Official event website: https://data.europa.eu/en/news-events/news/are-you-student-or-academic-sign-workshop-how-use-open-data-your-research
Join our workshop tailored to students and academics to learn how to use open data from data.europa.eu for your research.
On Thursday 19 October from 10.00 to 12.00 CET, data.europa.eu will host its first workshop series on ‘How to use open data for your research’. This series, designed for students and academics, is also open to anyone interested. The goal is to showcase the relevance and importance of open data for academic purposes. During this workshop, you will learn how to use open data from our portal for research and see examples from universities that have already embraced open data in their daily operations.
The workshop will start with an introduction by the Publications Office of the EU on open data and data.europa.eu. Following that, a hands-on walkthrough of the portal will demonstrate how to search efficiently for specific datasets. Furthermore, three universities - KU Leuven, University of Amsterdam, and University of Naples L'Orientale - will present practical examples of how open data can be used to conduct research.
These universities offer three distinct perspectives on using open data in research. The KU Leuven Research Repository gives students access to open research data through its database. The University of Amsterdam offers a practical example of how linked open data can drive research, and the UniOR NLP Research Group of the University of Naples L'Orientale will showcase a real-life example of exploiting open data for applications.
Following the informative university presentations, a short practical assignment will be given to help put the learnings into practice. To wrap up the practical assignment, you will have the opportunity to share your insights and ask your questions during the Q&A session.
Do you want to improve your digital skills and learn about the relevance of open data for academic purposes with real-life examples? Click here to register for the workshop.
Official event website: https://cji.uniri.hr/en/hr/konferencija/clarc2023/
The Center for Language Research at the Faculty of Humanities and Social Sciences in Rijeka, Croatia is organizing the international scientific conference CLARC 2023 (cji.uniri.hr/clarc2023) entitled "Language and Language Data". You are invited to participate by presenting a scientific paper and/or organizing a panel on a relevant topic (extended deadline until July 31). We welcome high-quality presentations on empirical, theoretical, and methodological issues in science related to the theme of "Language and Language Data".
CLARC2023 will cover various topics in the fields of language technology and linguistics. Participants will have the opportunity to engage in panels and lectures that encompass areas such as computational linguistics and natural language processing, corpus linguistics, phonetics, phonology, morphology, syntax, sociolinguistics, pragmalinguistics, discourse analysis, language acquisition, psycholinguistics, neurolinguistics, language and cultural studies, language data from legal and economic perspectives, language and tourism, as well as medicine and public health.
The conference will feature inspiring keynote speakers, including Maja Miličević Petrović from the University of Bologna, Marko Tadić from the University of Zagreb, Tony Veale from University College Dublin (UCD), and Nikola Ljubešić from the Jožef Stefan Institute.
Additionally, various events will be organized, such as a panel section on transforming language and linguistics studies with the results of the UPSKILLS project (Upgrading the Skills of Linguistics and Language Students), a round table on career transitions from linguistics to data science, and a round table on linguistics and large language models. Participants will also have the opportunity to attend the co-occurring event RIKON 2023, the largest fantasy convention in Croatia with a rich tradition.
Please read the details about submissions, the deadline for abstracts and full papers, as well as information about registration fees on the official website of CLARC 2023: https://cji.uniri.hr/clarc2023
We look forward to your valuable presence at this conference, which brings together experts in the fields of language technology and linguistics.
Official event website: http://www.digitalhumanities.lv/bssdh/2023
When: 25 July - 28 July 2023
Where: Riga, National Library of Latvia and Online
Language: English
Duration: 4 days
Credits: 3 ECTS
Fee: 30 EUR*
Welcome to the annual Baltic Summer School of Digital Humanities (BSSDH 2023)!
This year’s programme offers must-have introductory courses for digital humanists and digital social scientists who wish to come to grips with programming with Python, collecting web data, and network visualization. For the first time, an advanced Python class is also available. The course is co-taught by an international team of researchers and practitioners of digital humanities and digital social sciences coming from Latvia, Finland, Sweden, and Poland. As always, this will be a great opportunity to meet colleagues and mentors from other countries and explore different perspectives!
The programme includes workshops and lectures in:
PARTICIPANTS
The Baltic Summer School of Digital Humanities is aimed at students and researchers in the humanities and social sciences, as well as library and archive professionals. There are no prerequisites for participation, as the course does not require any background in DH computing. The working language of the summer school is English. After successful completion of the summer course, students will be awarded 3 ECTS by the University of Latvia.
REGISTRATION FEE*
30 EUR includes: Full access to all lectures and workshops, onsite or online. Food and drinks during lunch and coffee breaks for onsite students. Access is free of charge for assistants of workshops and other volunteers. Contact dh@lnb.lv to apply!
REGISTRATION FORM
https://reg.lnb.lv
SUPPORTERS
State Research programme
Nr. VPP-IZM-DH-2020/1-0002
ORGANIZERS
BSSDH 2023 is organized jointly by the National Library of Latvia, the Institute of Literature, Folklore and Art (University of Latvia), and the Faculty of Social Sciences of the University of Latvia.
CONTACT US
Coordinator Anda Baklāne
anda.baklane@lnb.lv
dh@lnb.lv
+371 29 143 299
Official event website: https://dh2023.adho.org/
July 10-14 2023, Graz | Austria
We are looking forward to welcoming you to Graz, Austria from the 10th to 14th July 2023!
Registration for the conference is now open at: https://www.conftool.pro/dh2023/
DIGITAL HUMANITIES (DH) is at the intersection of computing or digital technologies and the disciplines of the humanities. It involves the development and use of digital resources and methods in the humanities, as well as the analysis of their application. DH scholarship means collaborative, transdisciplinary, and computationally engaged research, teaching, and publishing.
The Alliance of Digital Humanities Organizations (ADHO) promotes and supports digital research and teaching across all humanities disciplines, acting as a community-based advisory force, and supporting excellence in research, publication, collaboration and training.
The annual ADHO Digital Humanities Conference is the central and largest event of the international DH community and unites scholars from across the globe, presenting them with a unique opportunity for the exchange of their work and ideas and the fostering of future collaborations.
The conference theme, “Collaboration as Opportunity”, showcases transdisciplinary and transnational collaboration, with a special focus on the South-Eastern European DH community. It will explore how mutual empowerment and collaboration of neighboring countries – regardless of continent and geopolitical placement – can transform regional hubs of expertise into international networks of excellent research, to the benefit of the global DH community.
Official event website: https://www.rara.ee/en/events/digital-memory-2023/
https://www.nlib.ee/et/sundmused/digimalu-seminar-2023
The next Digital Memory seminar will take place in the Architecture Center (Tallinn, Põhja pst 27a) on March 30 from 10:00 to 19:00. This time the focus is on creative use of digital collections.
Inspiration comes from the presentations by Annika Rockenberger (Oslo University), Fredrik Norén (Umeå University), Vojtěch Malínek and Tomasz Umerle (Polish Academy of Sciences), Jessica Wevers and Rianne Koning (Royal Library of the Netherlands), Sophie Hammer and Martin Krickl (National Library of Austria), and Thomas Padilla (Internet Archive). All presenters join the seminar online, and their talks are in English.
In addition, there will be a discussion panel on the future of creative use of digital collections and a practical workshop. Both will take place on site and in Estonian.
The day ends with the presentation of the newly developed RaRa virtual lab platform.
We ask that you register for the seminar no later than 23.03 at 18.00 so that we know how to account for food.
You are welcome to share the invitation with your colleagues!
10.00 Welcome | Moderators for the day: Laura Nemvalts (Specialist of Digital Humanities at the National Library of Estonia) and Krister Kruusmaa (Data Scientist at the National Library of Estonia)
10.05 Opening words | Janne Andresoo (General Director at the National Library of Estonia)
10.10–11.40 Inspiration panel (all the presenters join the seminar through Zoom; in English)
Annika Rockenberger (University of Oslo Library) – “Sharing is Caring. Digital Research Support, Skills Development & Special Collections Digitisation at the University of Oslo Library”
The presenter talks about a core value of libraries: sharing. Sharing knowledge and information, but also skills, methods, and research activities. The University of Oslo Library is Norway's oldest university library and, until 1999, served as Norway's National Library, too. Now, it is home to a vast collection of research literature and several unique special collections of mostly non-Norwegian origin. With a recent strategic re-orientation, the library is focusing on digitisation in its various aspects. Their leading thought is: “How can we share our various collections – and smaller, more dispersed collections at the faculties and institutes at the University of Oslo – with our researchers and students here in Oslo, in Norway, and beyond: with the international scientific and cultural heritage community?”. They employ a three-fold approach:
Jessica Wevers and Rianne Koning (Royal Library of the Netherlands) – “Creative experiments by students”
KB collaborated with the art academy, creating artistic exhibitions in the national library with the digitized Alba Amicorum collections. Additionally, they created a living library by building interactive installations from KB's digital collections together with students from the Delft University of Technology.
Sophie Hammer and Martin Krickl (Austrian National Library) – “ONB Labs Artistic Experiments – Artists engaging with digital collections of the Austrian National Library”
ONB Labs is the platform of the Austrian National Library for the scientific and creative use of digital collections. In addition to generally opening up selected digital collections as images, texts and metadata, the ONB Labs have actively sought exchange with young as well as established artists since their launch in 2018. This talk discusses the process and results of the ONB Labs' recent 'artistic experiments': three programs that invited artists to creatively and critically engage with the library's digital collections, organised in the course of the EU co-funded project "Open Digital Libraries".
11.40–12.10 Coffee break
12.10–13.10 Discussion panel | “About the future of creative use of digital collections – where could we move on and what should we do about it” | Moderated by Peeter Tinits (Head Specialist of Digital Humanities at the National Library of Estonia) (in Estonian)
Panelists: Kadri Vare (Institute of the Estonian Language), Mikk Meelak (Estonian Academy of Arts, Platvorm), Mirjam Rääbis (Estonian National Heritage Board) and Indrek Ibrus (Tallinn University)
13.10–14.00 Lunch
14.00–15.30 Inspiration panel (all the presenters join the seminar through Zoom; in English)
Fredrik Norén (Umeå University) – “Swedish Riksdag 1867–2022: An Ecosystem of Linked Open Data”
The parliament has the power to transform society’s future. Its documents constitute a democratic resource for the present day that, in turn, can be used by researchers to remodel our understanding of the past. In this talk, the presenter will introduce a newly funded research infrastructure project to enhance the possibility of exploring the Swedish parliamentary past. The purpose of the project is to (1) create a database of all members of parliament since 1867 and (2) link members to the speeches they gave, motions they wrote, committees they were part of, and – if they were part of the government – bills they were responsible for.
Vojtěch Malínek (Polish Academy of Sciences) – “Current Situation in European Bibliographies for the Humanities”
The presentation will introduce the activities and outputs of the Bibliographical Data Working Group (BDWG) of the DARIAH-ERIC consortium. Established in 2019, the BDWG has gathered scholars, IT developers and data curators interested in bibliographical data curation and research. The main output of the BDWG's efforts is the white paper "An Analysis of the Current Bibliographical Data Landscape in the Humanities. A Case for the Joint Bibliodata Agendas of Public Stakeholders" (2022), which maps the current situation and trends in the European landscape of bibliographies for the humanities. Another project related to the BDWG community is the portal Literarybibliography.eu, which aims to present European literary bibliographies in a joint interface. At the moment, three national datasets (Czech, Polish and Finnish) are available and systematically harmonised, while a joint authority file is being created based on the re-use of Wikidata. Last but not least, a short overview of the Czech Literary Bibliography research infrastructure and its internationally relevant activities will be given.
Thomas Padilla (Internet Archive)
Increasingly, libraries, archives, and museums seek to support computational use of their collections as data. This talk will share lessons learned from a national effort to support small, medium, and large GLAM institutions in the development of collections amenable to computational research and pedagogy.
15.30–17.00 | Workshop | Participants reflect on the ideas received during the day and ground the inspiration gathered | Regina Tagger and Margus Veimann (Service Designers at the National Library of Estonia) (in Estonian)
17.00–19.00 | Presentation of the RaRa virtual lab platform by the National Library of Estonia team (in Estonian) followed by snacks and wine
*The first part of the day will be recorded.
Official event website: https://blogs.bl.uk/digital-scholarship/2023/03/bl-labs-symposium-2023-programme-and-speakers-announced.html
Mar 30, 2023 02:00 PM in London
The annual British Library Labs Symposium is back! Once again we are celebrating creative and inspiring work that uses the British Library’s digital collections and data for new purposes ranging from research and art to science, community and learning. Following the webinar, there will be an opportunity to exchange ideas at a networking reception in the Library. You can register for both events here.
The first part of the Symposium will look into the new possibilities emerging in the world of Virtual and Mixed Reality, and the innovative uses of GLAM data in aiding new types of creativity.
In the second half we will delve into the world of data science and AI, including real-life case studies of projects using British Library data.
We look forward to also telling you about our future plans for the BL Labs. Your ideas are very welcome too!
For the full programme, and further information on our speakers, visit: https://blogs.bl.uk/digital-scholarship/2023/03/bl-labs-symposium-2023-programme-and-speakers-announced
We do hope you will join us and look forward to seeing you soon!
Official event website: https://culturedigitalskills.org/webinars/webinar-introduction-to-centre-for-digital-skills-in-visual-and-material-culture/
Please join us at our webinar Introduction To Centre For Digital Skills In Visual And Material Culture. To register:
The webinar will take place on the 24th of March 2023 at 10:00 am (GMT).
The webinar introduces our AHRC-funded project (https://culturedigitalskills.org/) on piloting a regional training centre for upskilling the Arts and Humanities community in creating, managing, and using multidimensional (2D/3D) digital media.
In this one-hour introductory session, we will discuss needs and plans to co-develop the centre with colleagues who specialise in visual and material culture research in the Higher Education sector, Galleries, Libraries, Archives and Museums (GLAM), and the creative industries to improve their current and future digital skills.
Please register and share with colleagues and interested parties.
Dr Myrsini Samaroudi
Research Fellow
Centre for Secure, Intelligent and Usable Systems (CSIUS)
School of Architecture, Technology and Engineering
University of Brighton
23 - 26 August 2021
National Library of Estonia
Tallinn, Estonia and online
REGISTRATION IS CLOSED!
Due to the COVID-19 situation, physical participation in the event will require a valid COVID-19 passport or certificate.
Welcome to the Baltic Summer School of Digital Humanities “Digital Methods in Humanities and Social Sciences”.
This year’s summer school is a joint venture of two summer schools: Digital Methods in Humanities and Social Sciences (University of Tartu) and Baltic Summer School of Digital Humanities (National Library of Latvia). The lecturers and students will gather at the National Library of Estonia, Tallinn, and/or online.
The summer school aims to provide an introduction to data analysis and methodological principles for working with (digital) data in humanities and social sciences. There are four keynote lectures by leading experts in their fields and nine practical workshops focusing on, for example, data analysis and visualisation, automatic text indexing and social media analysis.
PhD and MA students and supervisors from all fields of humanities and social sciences are welcome. Researchers and specialists working in relevant positions as well as guests from memory institutions are also invited to attend.
2 ECTS can be awarded for full participation, which means attending the keynote lecture and one workshop every day. An additional 1 ECTS can be awarded to PhD students presenting in the Student Sessions workshop. Attendance is verified by signing the participation list or by the participation logs of the conference platform.
Participation in the summer school is free of charge.
For participants who are members of the organising graduate schools, accommodation will be arranged. Other participants will need to organise their own accommodation.
The summer school is organised on the initiative of the Centre for Digital Humanities and Information Society at the University of Tartu (CDHIS) and the National Library of Estonia, with support from the Graduate School of Linguistics, Philosophy and Semiotics and the Graduate School of Culture Studies and Arts.
Additional information:
summerschool@nlib.ee
Monday, 23 August
08:45 – 09:45 Coffee / Registration
09:45 – 10:00 Opening of the Summer School, Main Conference Hall
10:00 – 11:30 Plenary (60+30 min)
11:35 – 13:30 Workshops
13:30 – 14:30 Lunch break
14:30 – 16:00 Workshop
16:00 – 16:30 Coffee
16:30 – 18:00 Workshops continue
19:00 – 22:00 Welcome reception at the Tallinn Town Hall
Tuesday, 24 August
08:45 – 09:30 Coffee / Registration
09:30 – 11:00 Plenary (60+30 min)
Link to the recording: https://youtu.be/lVzPAYtaKmk
11:05 – 13:00 Workshops
13:00 – 14:00 Lunch break
14:00 – 15:30 Workshop
15:30 – 16:00 Coffee
16:00 – 17:30 Workshops continue
Wednesday, 25 August
08:45 – 09:30 Coffee / Registration
09:30 – 11:00 Plenary (60+30 min)
Link to the recording: https://youtu.be/gC0wohPv6VE
11:05 – 13:00 Workshops
13:00 – 14:00 Lunch break
14:00 – 15:30 Workshop
15:30 – 16:00 Coffee
16:00 – 17:30 Workshops continue
Thursday, 26 August
08:45 – 09:30 Coffee / Registration
09:30 – 11:00 Plenary (60+30 min)
Link to the recording: https://youtu.be/nWIHoG5OQqQ
11:05 – 13:00 Student sessions
13:00 – 14:00 Lunch break
14:00 – 15:00 Digital Methods in a GLAM
Link to the recording: https://youtu.be/W2jyfAySWyA
15:00 – 15:30 Coffee
15:30 – 17:00 Student session
17:00 – 17:30 Closing, Small Conference Hall
*All events take place in local Estonian time (GMT+3).
Richard McElreath - Causal Thinking for Descriptive Research
23 August 10:00 - 11:30
Causal inference is hard, and everyone knows it. It is less recognized that descriptive and comparative scholarship also rely upon causal inference. How data are sampled and curated influences how we should process the data, in order to accurately describe or compare the people, times, and places of interest. I'll present some examples to illustrate the problems that ignoring causal structure can create, along with some solutions.
Bio: I am an evolutionary ecologist who studies humans. My main interest is in how the evolution of fancy social learning in humans accounts for the unusual nature of human adaptation and the extraordinary scale and variety of human societies. Humans are more widespread and successful than any other vertebrate. Simultaneously, humans are unlike any other animal in that we cooperate in very large groups of unrelated individuals. My colleagues and I use formal evolutionary models, experiments and ethnographic fieldwork to address these puzzles.
Mila Oiva - Uncovering the Formation of Fake History Narratives
24 August 9:30 - 11:00
We are living in an era of abundant, fast-circulating and easily twisted information. Among all other stories, nation-bound historical narratives shape our identities and worldviews and, through that, have the potential to shake politics and international relations. Popular historical stories circulating on the World Wide Web contain a diversity of historical narratives, including ones that consciously contest academic research and spread simplified understandings based on conspiracy theories, and that can thus be defined as ‘pseudohistory’. This plenary introduces an ongoing project that seeks to understand the process of pseudohistorical content development in the current era. The project explores the global topic through a case study of pseudohistorical narratives on Russian medieval history circulating on the Russian-language web. The data comprise 1.5 million websites, blog posts and discussion forum posts addressing the origins of the Russian state in the Middle Ages. The project uses text reuse detection, network analysis and topic modeling to detect the structures and dynamics of the evolution of fake historical narratives.
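As a rough illustration of what text reuse detection can look like in practice, the sketch below compares documents by their shared word n-grams ("shingles") and scores each pair with Jaccard similarity. This is a generic textbook technique chosen for illustration, not the project's actual method; the function names, the threshold value and the three toy documents are all hypothetical.

```python
import re
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def reuse_pairs(docs, n=3, threshold=0.2):
    """Yield (id_a, id_b, score) for document pairs sharing enough shingles."""
    sets = {doc_id: shingles(text, n) for doc_id, text in docs.items()}
    for (id_a, set_a), (id_b, set_b) in combinations(sets.items(), 2):
        score = jaccard(set_a, set_b)
        if score >= threshold:
            yield id_a, id_b, score


# Hypothetical toy documents, invented for illustration only.
docs = {
    "blog": "The ancient state was founded by a forgotten northern tribe",
    "forum": "Some say the ancient state was founded by a forgotten tribe of giants",
    "news": "The city council approved a new budget for road repairs",
}

for id_a, id_b, score in reuse_pairs(docs):
    print(id_a, id_b, round(score, 2))
```

Comparing every pair of documents is only workable for small collections; at the scale of millions of web pages, an approximate indexing scheme would presumably be needed instead.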
Mila Oiva (milaoiva@tlu.ee) is a cultural historian enthusiastic about identifying patterns of transnational circulation of knowledge and ideas in a long temporal perspective. Her current ongoing projects are related to circulation of pseudohistorical narratives in the world wide web in the 2000s, newsreel production in the Soviet Union in the 20th century, and circulation of news in the 19th century. She works as a Senior Research Fellow at CUDAN Open Lab at Tallinn University. Her most recent publications include for example Digital Readings of History. History Research in the Digital Era. Helsinki: Helsinki University Press, 2020 (open access, co-edited together with Mats Fridlund and Petri Paju), Yves Montand in the USSR: Cultural Diplomacy and Mixed Messages. Palgrave Macmillan, 2021 (coauthored with Hannu Salmi and Bruce Johnson) and “Topic Modeling Russian History.” In The Palgrave Handbook of Digital Russia Studies, edited by Daria Gritsenko, Marielle Wijermars, and Mikhail Kopotev. Palgrave Handbooks. London: Palgrave Macmillan, 2021.
Link to the recording: https://youtu.be/lVzPAYtaKmk
Jonas Nölle - Virtual Reality: A new tool for studying human behaviour in the lab
25 August 9:30 - 11:00
In recent years, virtual reality (VR) equipment has become much more accessible and affordable for consumers as well as researchers. In this keynote, I will introduce how this technology can be used in the humanities and social sciences to study human behaviour directly in the lab. In the past, experimental rigour and ecological validity have often been conceived as the opposites of a methodological continuum, where researchers would have to sacrifice ecological validity to assure sufficient control. However, VR experiments can provide participants with immersive and realistic tasks that nevertheless allow researchers to tightly control all variables involved. I will show some examples from my own research in the area of language evolution that applies this approach. I used interactive VR experiments to study the impact of the environment on spatial referencing systems. In the real world, cross-linguistic variation has been proposed to interact with the local environment (such as mountains or rivers), but also other sociocultural variables. My experiments were able to show how participants’ spatial referencing strategies were systematically affected by the topographic environment while they had to solve a collaborative task in environments with different affordances. Beyond that, I will discuss the further potential of VR for similar studies into linguistic diversity, cultural evolution and the digital humanities more generally and provide some pointers on how to set up such experiments using modern hardware and 3D engines.
Jonas Nölle (jonas.noelle@glasgow.ac.uk) is a cognitive scientist and evolutionary linguist. He is currently a research fellow at the Social Psychophysics lab at the University of Glasgow in Scotland, where he investigates multimodal communication and specifically the role of facial expressions in social interactions as part of Rachael Jack's Facesyntax ERC project. Previously he worked with the Interacting Minds Centre at Aarhus University and completed a PhD at the Centre for Language Evolution in Edinburgh, where he pioneered the use of interactive VR experiments to study language and communication in realistic and immersive laboratory settings. His research focuses on the study of communication in task-oriented face-to-face interaction, the emergence and evolution of novel communication systems, and the interaction of culture, cognition and the environment in shaping conventions and human behaviour.
Link to the recording: https://youtu.be/gC0wohPv6VE
Dong Nguyen - NLP for robust and reliable measurements
26 August 9:30 - 11:00
We are increasingly using computational text analysis to explore humanities and social science questions. A common and crucial step, then, is measuring social or cultural concepts using computational methods. Recent advances in Natural Language Processing (NLP) are promising. However, there’s increasing evidence that NLP systems are brittle and that results are sensitive to small design choices. In this talk I will discuss two recent studies looking at these topics. I’ll focus on the measurement of biases in NLP and on a recently developed test suite to interrogate hate speech detection systems.
Dong Nguyen is an assistant professor at the department of Information and Computing Sciences at Utrecht University (NL). Previously she was a research fellow at the Alan Turing Institute (UK). She holds a PhD from the University of Twente (NL) and a master’s degree from Carnegie Mellon University (USA). She is interested in developing NLP methods to explore questions from the social sciences and the humanities. At Utrecht she leads the NLP and Society Lab.
Link to the recording: https://youtu.be/nWIHoG5OQqQ
Linda Freienthal – The KRATT at NLE - how did we do it
26 August 14:00-15:00
In this lecture I will introduce TEXTA and give an overview of a project we did last year for the National Library of Estonia. We will describe the process of developing our solution for automatically proposing subject indices (keywords) for books.
Moderator: Peeter Tinits
Link to the recording: https://youtu.be/W2jyfAySWyA
Introduction to R and Tidyverse
Lecturer: Peeter Tinits (University of Tartu)
Date: 23 August
Room: Small Conference Hall
Description
R is a scripting language often used for data processing in humanities and social sciences. It provides the means to produce analyses as a reproducible workflow that is transparent to readers and easy to update. We will start with the very basics of R and RStudio, and quickly work our way through to simple data processing via Tidyverse packages. Tidyverse is a set of packages that aims to make R easy to use, especially for beginners. We will learn basic R syntax, data manipulation and overviews in Tidyverse style.
We will rely on personal laptops in this tutorial, so you will need to install R (https://www.r-project.org) and RStudio (https://www.rstudio.com) a few days beforehand. Short instructions will be shared.
If you do not have previous experience in R, this workshop is a requirement for attending other workshops using R at this summer school.
References:
Grolemund, Garrett, and Wickham, Hadley (2017) R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O'Reilly Media.
About the instructor
Peeter Tinits is a digital humanities specialist at the University of Tartu, and teaches various digital humanities courses. His own research has been on spelling standardization of Estonian, the rise of environmentalism in the 20th century, and structural changes in film production crews. He is a firm believer that anyone can learn to code, and the humanities have a lot to gain from adopting reproducible research practices.
Introduction to natural language processing using Pandas, spaCy and Stanza
Lecturer: Kristiina Vaik (University of Tartu)
Date: 23 August
Room: Corner Hall
Description
This workshop aims to introduce an alternative programming language used in natural language processing - Python. Python has a simple syntax and transparent semantics and is widely used for analyzing, understanding, and deriving information from structured and unstructured data. This course will start with a basic introduction to Python: we will quickly go through topics such as syntax, variables, data structures, conditionals, loops, and IO. We will continue with an introduction to Pandas, a powerful Python data analysis toolkit used for data exploration and manipulation. Finally, I will introduce spaCy and Stanza. Both are free open-source libraries with many built-in capabilities for text processing, e.g., noise removal, tokenization, etc. Additionally, we shall see how to apply spaCy's and Stanza's pre-built models for different downstream tasks, e.g., morphological and syntactic parsing, named entity recognition, etc.
I will provide the students with Jupyter notebooks containing the code used in this tutorial. This course will assume basic knowledge of Python (syntax, data types and variables) and build on that. I recommend using your own laptop; instructions on which packages to download will be shared beforehand.
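As a rough taste of the text-processing steps mentioned above, the sketch below performs noise removal, tokenization and a word count in plain Python. It uses only the standard library and a deliberately naive regular expression; it is not the spaCy or Stanza API, and the function names `clean` and `tokenize` as well as the sample sentence are made up for illustration.

```python
import re
from collections import Counter

def clean(text):
    """Noise removal: drop URLs and collapse runs of whitespace."""
    text = re.sub(r"https?://\S+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

def tokenize(text):
    """Naive tokenization: lowercase runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())

# Made-up sample text, for illustration only.
sample = "Python is simple.  See https://www.python.org for more: Python is widely used!"
tokens = tokenize(clean(sample))
counts = Counter(tokens)  # word -> number of occurrences
print(counts.most_common(3))
```

Libraries such as spaCy and Stanza replace the naive regular expression with trained, language-specific models, which is what makes tasks like morphological analysis and named entity recognition feasible.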
About the instructor
Kristiina Vaik is a Ph.D. student at the University of Tartu. She has worked as a programmer in the Natural Language Processing Research Group at the University of Tartu and as a data analyst at TEXTA.
Dataset creation, publishing and maintenance: Practices, solutions and open questions
Lecturer: Niko Partanen (University of Helsinki)
Date: 23 August
Room: Auditorium 2.0
Description
The publication of research data has become increasingly commonplace and required in recent years, but many of the related practices are still evolving, and there is a great deal of variation between scientific fields. There are, however, some basic principles we can aim to follow in order to ensure that our materials are organized in a way that is suitable for our research use and for further distribution. In this workshop, we will use FAIR principles as a general guideline, but we will focus primarily on practical and tested solutions that are available at the moment, while reflecting on them in the wider context of FAIR.
In the workshop, we will go through the basic premises of dataset creation and documentation, with a focus on general project management and organization. We will also discuss issues related to good documentation and reproducibility in a wider context. We will study version control, and use GitHub's Zenodo integration as one publishing mechanism. We will also discuss the possibilities for storing closed datasets, and how we generally have to approach the licensing and reuse of different materials.
At the end of the workshop, each participant will be familiar with essential data preparation and publication methods that can be used at the moment. We will also reserve time to discuss the nuances of individual datasets the students have been working with.
About the instructor
Niko Partanen is a PhD student at the University of Helsinki. He has been working for the last ten years with documentation and description of endangered Uralic languages spoken in Russia, especially the varieties of Komi. He has also worked extensively with language technology and natural language processing, with a particular focus on integrating these methods into language documentation workflows. Partanen collaborates with a variety of archives in the digital preservation of legacy data.
Exploring and visualizing your data using R
Lecturer: Andres Karjus (University of Edinburgh)
Date: 24 August
Room: Small Conference Hall
Description
Big data is everywhere, holds unprecedented potential for humanities and social science research, and in general for the understanding of our complex ever-changing world. But understanding big data is hard. Unless you have the right tools. In this workshop, we’ll be exploring and dissecting various real world datasets using R, an excellent programming language for doing anything related to stats and data science. If you've never written code before in your life, this is your opportunity to learn (this superpower) through practical exercises with clear outcomes, primarily in the form of visualizations. We will mostly be using the ggplot2 R package and its addons, starting out with basic examples like scatterplots and time series. We will also look into a few other packages for creating things such as networks and maps, as well as interactive and animated plots. But, with great power comes great responsibility: so we will also spend some time discussing the ethics of data visualization, and approaches to making sure your graphs don't mislead your audience.
About the instructor
Andres Karjus is a research fellow at the ERA Chair for Cultural Data Analytics (CUDAN) at Tallinn University. He obtained his PhD in evolutionary linguistics from the University of Edinburgh in 2020, and holds degrees in linguistics (BA, MA) and computer science (MSc). He uses R daily in his research and has been teaching R workshops and courses since 2015.
Twitter: https://twitter.com/AndresKarjus
Personal website: https://andreskarjus.github.io
An introduction to doing research using social media data
Lecturer: Tuomo Hiippala (University of Helsinki)
Date: 24 August
Room: Auditorium 2.0
Description
This workshop introduces key issues in doing research using social media data. We will discuss the kinds of research questions that may be pursued using social media data; map access to data as of 2021; consider ethical questions related to social media research; and experiment with applicable computational methods from the fields of natural language processing and computer vision. The workshop involves some programming in Python using Jupyter Notebooks, an interactive environment running in a web browser.
Recommended readings
About the instructor
Tuomo Hiippala is Assistant Professor in English Language and Digital Humanities at the University of Helsinki, Finland. His research interests include multimodal communication and urban multilingualism.
Computational harmonization of bibliographic data
Lecturers: Leo Lahti (University of Turku), Iiro Tiihonen (University of Helsinki)
Date: 24 August
Room: Corner Hall (online only)
Description
The workshop is an introduction to the computational harmonisation of bibliographic metadata.
Bibliographic metadata is a valuable source of historical and cultural information. However, it’s often the result of a long and shifting process, resulting in data coded with various differing notations and conventions. Its full utilisation as cultural heritage or research material is often impossible without harmonisation - the process of standardising, converting and cleaning the data. Using R and focusing on Estonian and Finnish bibliographic metadata, the workshop aims to motivate the importance of harmonisation and to demonstrate its application in practice.
As harmonisation is a vast and often content-specific topic, we aim to combine a general motivation for the often fundamental role of harmonisation with practical examples based on real bibliographic data and workflows used in practice. We provide an overview of the process used to harmonise the historical bibliographic metadata of the Finnish National Library (Fennica) and a more focused hands-on example using Estonian bibliographic data.
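To make the idea of harmonisation concrete, here is a minimal sketch of standardising two fields, place and year of publication. It is written in Python rather than the R used in the workshop, and it is not the Fennica workflow: the records, the variant table and the function names are all made up for illustration.

```python
import re

# Hypothetical lookup of historical name variants; a real project would curate this table.
PLACE_VARIANTS = {"reval": "Tallinn", "tallinn": "Tallinn", "dorpat": "Tartu", "tartu": "Tartu"}

def harmonise_year(raw):
    """Return the first plausible four-digit year in a raw field, or None."""
    match = re.search(r"(?<!\d)(1\d{3}|20\d{2})(?!\d)", raw)
    return int(match.group(1)) if match else None

def harmonise_place(raw):
    """Strip brackets and punctuation, then map known variants to one form."""
    key = re.sub(r"[^\w\s]", "", raw).strip().lower()
    return PLACE_VARIANTS.get(key, raw.strip())

# Made-up records imitating typical cataloguing notations.
records = [
    {"place": "[Reval]", "year": "1637."},
    {"place": "Dorpat :", "year": "c1802"},
    {"place": "Tallinn", "year": "[s.a.]"},
]
harmonised = [
    {"place": harmonise_place(r["place"]), "year": harmonise_year(r["year"])}
    for r in records
]
for row in harmonised:
    print(row)
```

Values that cannot be interpreted, such as the undated third record, are kept as missing (`None`) rather than guessed, so that later analysis can treat them explicitly.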
About the instructors
Iiro Tiihonen is a PhD student elect of history at the University of Helsinki. He has a background both in the humanities (M.A., history) and data analysis (M.Sc., applied mathematics), and his academic focus is on the application of bibliographic metadata to quantitatively study the early modern period.
Leo Lahti is associate professor in data science & computational humanities at the University of Turku and long-time member of Helsinki Computational History Group. Lahti got his doctoral degree in machine learning / bioinformatics from Aalto University, Finland, in 2010. The current research of the team focuses on computational analysis of complex natural and social systems. More information at the research homepage: datascience.utu.fi
Introduction to linear mixed models and Bayesian inference
Lecturer: Bodo Winter (University of Birmingham)
Date: 25 August
Room: Small Conference Hall
Description
In this workshop, we’ll be learning to use brms to fit linear models and linear mixed effects models in a Bayesian framework. As many different fields (including linguistics, psychology etc.) are moving away from thinking about data analysis in terms of significance tests, this workshop prepares you for the future. The workshop will teach you about two things: First, the fundamentals of statistical modelling, with a focus on how to interpret linear models. Second, the fundamentals of Bayesian inference. Instructions and materials will be released closer to the start of the workshop.
About the instructor
Bodo Winter is a UKRI Future Leaders Fellow, a Senior Lecturer at the University of Birmingham, UK, and Editor-in-Chief at the journal Language and Cognition. He has written a textbook “Statistics for linguists: An introduction using R” and has extensive experience running workshops on statistical modelling.
Twitter: https://twitter.com/BodoWinter
Personal website: https://bodowinter.com/
Introduction to the Annif automated indexing tool
Lecturers: Osma Suominen, Mona Lehtinen, Juho Inkinen (National Library of Finland)
Date: 25 August
Room: Corner Hall (online only)
Description
Many libraries and related institutions are looking at ways of automating their metadata production processes, for example through the adoption of AI technology. In this hands-on tutorial, participants will be introduced to the multilingual automated subject indexing tool Annif (annif.org) as a potential component in a library’s metadata generation system. By completing exercises, participants will get practical experience in setting up Annif, training algorithms using example data, and using Annif to produce subject suggestions for new documents using the command line interface, the web user interface and the REST API provided by the tool. The tutorial will also introduce the corpus formats supported by Annif so that participants will be able to apply the tool to their own vocabularies and documents.
The tutorial will be organized using the flipped classroom approach: participants are provided with a set of instructional videos and written exercises, and are expected to attempt to complete them on their own time before the tutorial event, starting at least a week in advance. The actual event will be dedicated to solving problems, asking questions and getting a feeling of the community around Annif.
Participants are instructed to use a computer with at least 8 GB of RAM and at least 20 GB of free disk space to complete the exercises. The organizers will provide the software as a preconfigured VirtualBox virtual machine. Alternatively, Docker images and a native Linux install option are provided for users familiar with those environments. No prior experience with the Annif tool is required, but participants are expected to be familiar with subject vocabularies (e.g. thesauri, subject headings or classification systems) and subject metadata that reference those vocabularies.
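For readers curious what requesting subject suggestions over a REST API might look like, the following Python sketch builds a request and parses a response. Everything specific in it is an assumption to be checked against Annif's own API documentation: the endpoint path `/v1/projects/<project>/suggest`, the `text` and `limit` parameters, the `results`/`label`/`score` response fields, the local address `http://localhost:5000` and the project name `my-project`. The response body shown is made up.

```python
import json
import urllib.parse
import urllib.request

def build_suggest_request(base_url, project_id, text, limit=5):
    """Build (but do not send) a POST request for subject suggestions."""
    # Assumed endpoint layout; verify against the tool's API documentation.
    url = f"{base_url}/v1/projects/{urllib.parse.quote(project_id)}/suggest"
    body = urllib.parse.urlencode({"text": text, "limit": limit}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")

def top_labels(response_json):
    """Extract (label, score) pairs from a response body, best score first."""
    results = json.loads(response_json).get("results", [])
    return sorted(((r["label"], r["score"]) for r in results),
                  key=lambda pair: pair[1], reverse=True)

# Hypothetical server address and project name.
request = build_suggest_request("http://localhost:5000", "my-project", "A book about libraries")
# To send it against a running instance:
#   with urllib.request.urlopen(request) as resp:
#       print(top_labels(resp.read()))

# Made-up response body, shaped like the assumptions stated above.
fake = '{"results": [{"label": "libraries", "score": 0.42}, {"label": "books", "score": 0.61}]}'
print(top_labels(fake))
```

Keeping request construction separate from response parsing means the parsing logic can be tried out on saved responses without a running server.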
Workshop materials
Exercises and introductory videos can be found in the Annif-tutorial GitHub repository.
The tutorial materials have been created in collaboration with Anna Kasprzik and Moritz Fürneisen of ZBW - Leibniz Information Centre for Economics in Germany.
About the instructors
Osma Suominen works as an Information Systems Specialist at the National Library of Finland. He is the original developer of Annif and is currently leading the automated cataloguing project where Annif is being developed and deployed. He has a doctoral degree in Media Technology (Aalto University) and has a long experience with semantic web technologies, vocabulary services and metadata processes.
Mona Lehtinen is an Information Specialist at the National Library of Finland. She works with the Annif project and is happy to tackle various tasks such as project coordination, community and corpora building and testing the new features of Annif.
Juho Inkinen works as an Information Systems Specialist at the National Library of Finland. His tasks include developing Annif and taking care of the Annif instances hosted on the sites of the National Library.
Solving research questions through computational thinking around experimentation with digital collections / data
Lecturer: Mahendra Mahey (GLAM Labs)
Date: 25 August
Room: Auditorium 2.0
Description
This workshop attempts to help you take some pragmatic steps towards how your research question(s) and related data/digital collections may be examined, analysed and solved ‘computationally’.
You will get an opportunity to tell the group about your research question and present your data. We will then collectively give you some feedback and suggest some practical steps in moving your project forward, breaking things down into manageable steps and examining if any could harness the power of computation. You will then get a chance to start on this journey in this workshop and report back later on any progress or challenges faced.
Don’t have your own data? No problem! We can ensure you get ‘hands-on’ experience of working with cultural heritage data and digital collections and provide some of the typical challenges so that you get a chance to safely experiment and play.
Mahendra will examine and discuss challenges he has faced when experimenting with digital collections through different kinds of experiments in exploring, finding patterns and making new discoveries within data through hundreds of digital projects. He will also invite the workshop participants to contribute their wealth of experience in providing feedback too.
The workshop will conclude with reflections from the delegates and feedback on how to move forward in your enquiry and learn the power of thinking computationally for future research problems and contexts.
See a more detailed workshop description here.
About the instructor
Mahendra Mahey has a background of working with people, digital technology and data as a manager, educator, adviser and community builder in Cultural Heritage, Further and Higher Education for researchers, educators, librarians and businesses both in the UK and internationally.
For the last 8 years he has been helping scholars, artists, entrepreneurs, educators and innovators to work with cultural heritage data while working as the manager of British Library Labs. He has worked with colleagues to bring together national, state, university and public Galleries, Libraries, Archives and Museums (GLAMs) that are planning or already have digital experimental ‘Labs’. The GLAM Labs network aims to share expertise, knowledge and experience in order to build better ‘Labs’ for their organisations and users. Personal website: http://mahendramahey.com/
Student session: research methods troubleshooter
Date: 26 August
Room: Main Conference Hall (only on-site)
Description
This year, we will be hosting a student research session on the last day of the summer school. During the session, each participant will have a chance to present their own work. Two presentation formats are available. You can either introduce your project and brag about a nice solution you came up with to solve a difficult problem (which might be useful for somebody else as well), or, you can present an unresolved problem to the audience for public troubleshooting and brainstorming. The session will be moderated by invited interdisciplinary researchers with expertise in various qualitative and quantitative methods.
There will be no workshops on the final day so everyone can join in on the discussion! 1 ECTS will be awarded to those presenting their work (in either format). However, we encourage everybody to come join the audience and participate in the brainstorming. The details of this event will depend on the number of interested participants; if you are interested in presenting, please indicate so in the registration form and we will contact you with further information.
If you have any questions about this event, please contact Mariann Proos: mariann.proos@ut.ee
This year the Summer School will take place in hybrid form - physically and virtually. Please be advised that the physical event is subject to change due to the COVID-19 situation.
The physical venue is the National Library of Estonia (Tõnismägi 2, Tallinn). The Library can be easily accessed by car, by public transport and on foot. It is only a short walk away from Tallinn Old Town. This will be the last chance to see the Library building as it is today and has been for the last 30 years. Starting from January 2022, its doors will be closed for renovations for 4-5 years.
The online event will be hosted on the Cisco Webex conferencing platform. For detailed information and guidelines, please refer to the section Online event.
Workshops, if the situation allows, will take place in rooms that have a sufficient number of power outlets for laptops. Public wifi is available in all areas. To attend workshops, each attendee must bring their own laptop. The organizers will contact participants before the event about any applications they need to install.
Lunch is not provided, as there are several restaurants, cafes and pubs nearby that have reasonably priced (4-6 €) daily offers. There is also a cafeteria in the Library.
COVID-19
Due to the constraints set by the Estonian government, a valid COVID-19 passport or certificate is needed to attend the physical event of the Summer School. This means that you need to be either
Please be advised that all participants, lecturers, staff and caterers have to provide a certificate to the organisers. Since the Summer School is a 4-day event, the certificates will be checked every day and each time you come to the event. Persons who cannot prove their infection safety cannot participate in the event.
All participants coming to the National Library of Estonia are advised to arrive at the entrance reasonably early. For this reason, we will open the doors a bit earlier, at 8:45 am.
More information about Estonia and COVID-19 restrictions: https://www.kriis.ee/en/restrictions-force-estonia-starting-march-11
Use of Digital Cultural Heritage in Research and Education
4-6 December 2019
National Library of Estonia, Tallinn, Estonia
The theme of the DH Estonia 2019 conference is the use of digital cultural heritage, especially in the field of education. We are witnessing the new habits and behaviours of contemporary internet users in the wide variety of digital environments. Users have become producers, taking over the production of online content.
The conference aims to focus on how user habits, needs and requirements could influence the practices of memory institutions, such as museums, archives and libraries, that are striving to increase the volume of digitised sources of cultural heritage. We hope to examine the use and design of cultural heritage collections through a variety of disciplines ranging from art to history, from museum studies to media studies, and from archival studies to literary studies.
Presentations and discussions on all areas associated with digital humanities, digital memory and digital culture are welcome. We wish to share and exchange novel and innovative ideas, experiences and practices in the field of digital humanities in Estonia and the neighbouring countries.
The conference is organised by the National Library of Estonia, the Estonian Literary Museum and the Estonian Society for Digital Humanities.
Supporters:
National Library of Estonia
Narva Road 11, 15015 Tallinn
+372 630 7100
info@rara.ee
rara.ee/en