CLARIN and Libraries 2023: Large Language Models and Libraries

CLARIN, National Library of Norway

01.10.2023

Official event website: https://www.clarin.eu/event/2023/clarin-and-libraries-2023-large-language-models-and-libraries

The workshop builds on the first CLARIN and Libraries workshop, held in The Hague in May 2022 (see here).

This year's workshop will explore further areas of collaboration between CLARIN-related initiatives and libraries, with a special emphasis on building (large) language models in, and in cooperation with, libraries. For the second time, it will bring together people associated with both CLARIN (or other research infrastructures) and libraries. Whereas the first CLARIN and Libraries workshop focused on digital content delivery for researchers, the main theme of the second workshop will be large language models and library collections, including the technical challenges of building such models and the legal implications of model training and use.

The host, the National Library of Norway (NLN), has been digitising its entire text collection since 2005. The resulting corpus currently amounts to 160 billion words of Norwegian, and NLN has built large language models for text (BERT, GPT-2, T5) and speech (wav2vec, Whisper) on it. There will be keynotes from the National Libraries of Norway and Germany on the technical aspects of building such models in a library setting, and a keynote from the National Library of Sweden on the legal aspects of building large language models.

Participation in the workshop is by invitation. If you are interested in attending, please contact your national coordinator or clarin@clarin.eu. The venue (National Library of Norway, Henrik Ibsens gate 110, Oslo) is very close to the Nationaltheatret train station. Directions to the venue can be found here.

Programme

Tuesday 5 December 2023

12:00 - 13:15	Lunch (Cafeteria, National Library of Norway)
13:15 - 13:30	Welcome
13:30 - 15:00	Introduction to CLARIN and Libraries, wrap-up from last year’s workshop (15 mins)
	Tour de table: introduction and points for discussion (45 mins)
	Library collections as data (Sally Chambers)
15:00 - 15:30	Break
15:30 - 17:00	Large language models at the National Library of Norway (Javier De La Rosa)
	Large language models at the German National Library (Peter Leinen)
	Discussion: technical aspects (chair: Andreas Witt)
17:00 - 17:30	Sensitive Data in HPC – How secure can it be? Is secure data processing in shared computing environments a dream? (Martin Matthiesen)
19:00	Evening social dinner (Avalon, Munkedamsveien 31, Oslo)

Wednesday 6 December 2023

09:30 - 10:30	Lightning talks: participants who have registered for a lightning talk (see separate invitation by e-mail) can introduce their own projects and resources.
10:30 - 11:00	Break
11:00 - 12:00	Legal aspects of large language models in libraries (Jerker Rydén)
	Discussion: legal aspects (chair: Andreas Witt)
12:00 - 13:00	Lunch (Cafeteria, National Library of Norway)

Address
National Library of Norway
Henrik Ibsens gate 110
0255 Oslo
Norway
