Official event website: https://www.clarin.eu/event/2023/clarin-and-libraries-2023-large-language-models-and-libraries
The workshop builds upon the first CLARIN and Libraries workshop, held in The Hague in May 2022 (see here).
This year's workshop will investigate further areas of collaboration between CLARIN-related initiatives and libraries with a special emphasis on building (large) language models in and in cooperation with libraries. The workshop will bring together for the second time a group of people associated with both CLARIN (or other research infrastructures) and libraries. Whereas the first CLARIN and Libraries workshop was particularly concerned with digital content delivery for researchers, the main theme of the second workshop will be large language models and library collections, e.g. technical challenges in building such models and legal implications of model training and use.
The host, the National Library of Norway (NLN), has since 2005 digitised its entire text collections, which at present amount to a corpus of 160 billion words for Norwegian, and has built large language models for text (BERT, GPT-2, T5) and speech (wav2vec, Whisper) on these collections. There will be keynotes from the National Libraries of Norway and Germany on the technical aspects of building such models in a library setting, as well as a keynote from the Swedish National Library on the legal aspects of building large language models.
Participation in the workshop is by invitation. If you are interested in attending, please contact your national coordinator or clarin@clarin.eu. The venue (National Library of Norway, Henrik Ibsens gate 110, Oslo) is very close to the Nationaltheatret train station. Directions to the venue can be found here.
12:00 - 13:15 | Lunch (Cafeteria, National Library of Norway) |
13:15 - 13:30 | Welcome |
13:30 - 15:00 | Introduction to CLARIN and Libraries, wrap-up from last year’s workshop (15 mins); Tour de table: introduction and points for discussion (45 mins); Library collections as data (Sally Chambers) |
15:00 - 15:30 | Break |
15:30 - 17:00 | Large language models at the National Library of Norway (Javier De La Rosa); Large language models at the German National Library (Peter Leinen); Discussion: technical aspects (chair: Andreas Witt) |
17:00 - 17:30 | Sensitive Data in HPC – How secure can it be? Is secure data processing in shared computing environments a dream? (Martin Matthiesen) |
19:00 | Evening social dinner (Avalon, Munkedamsveien 31, Oslo) |
9:30 - 10:30 | Lightning Talks: Participants who have registered for a lightning talk (see separate invitation by e-mail) will have the opportunity to introduce their own projects and resources. |
10:30 - 11:00 | Break |
11:00 - 12:00 | Legal aspects of large language models in libraries (Jerker Rydén); Discussion: legal aspects (chair: Andreas Witt) |
12:00 - 13:00 | Lunch (Cafeteria, National Library of Norway) |
Address
National Library of Norway
Henrik Ibsens gate 110
0255 Oslo
Norway