Datasets overview

Currently, Digilab primarily includes datasets created from the digital collections of the National Library of Estonia. Digilab also includes data from the Estonian Subject Thesaurus (EMS), and other cultural heritage datasets will be added in the future.

RaRa datasets consist of three main components: the digital archive DIGAR, DIGAR Estonian Articles (DEA), and the Estonian National Bibliography (ENB). DIGAR contains various types of data, such as books, periodicals, maps, sheet music, and postcards, which can be accessed separately in Digilab. DEA consists mainly of newspaper texts but also includes more recent periodicals. ENB contains metadata about publications published in or related to Estonia. The datasets in Digilab are created to provide direct access to the data behind the user interface of the digital collections. The datasets and associated information are continually updated.

The National Library of Estonia's digital archive DIGAR

DIGAR (www.digar.ee/arhiiv/en) is the digital archive of the National Library of Estonia. It provides access to e-books, newspapers, magazines, maps, sheet music, photos, postcards, posters, illustrations, audiobooks, and music files. Books and periodicals are mostly in PDF or EPUB format, image materials in JPEG, and audio recordings in WAV.

DIGAR Estonian Articles DEA

DIGAR Estonian Articles (dea.digar.ee/?l=en) provides access to digitally born and digitized newspapers published in Estonia throughout history, as well as Estonian-language publications from abroad. It includes newspapers, journals, and ongoing publications registered in the annual publication "Estonian National Bibliography. Periodicals" since 2017.

The portal allows users to browse publications, search for content within newspapers, read full-text articles, add keywords to articles, create lists of found articles, and send them via email. Users can also share discovered information on social networks and perform other actions.

Access is provided to newspapers published since 2014, journals and ongoing publications since 2017, and partially to older newspapers. The portal is updated daily. Older newspapers (1821-2013) are gradually added according to a conversion plan.

Estonian National Bibliography

The Estonian National Bibliography database ERB (www.ester.ee/search~S95*eng), referred to in this overview as ENB, registers data about national publications. National publications include all publications published in Estonia, in any language, and publications in Estonian published abroad, including works by Estonian authors and their translations, regardless of physical format (paper or electronic). The principles for compiling the ERB are defined in the document "Principles of National Bibliography Compilation." The database is updated continuously, with new data added at least once a week.

During registration, a detailed description of each publication is created from the information contained in the publication itself. The description includes the title; the individuals and organizations responsible for the publication; publisher and printer details; edition information; the physical description (pages, dimensions, etc.); and any series the publication belongs to. Search features such as keywords, subject indexes, and standardized forms of the names of related individuals and organizations are also added to the description.

All data in the database adhere to international standards:

  • ISBD (International Standard Bibliographic Description) – for descriptive data;
  • AACR2 (Anglo-American Cataloguing Rules 2) – for search features;
  • UDC (Universal Decimal Classification) – for subject indexes;
  • MARC21 – used as the data exchange format.
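Because the open data is exchanged as MARC21 (serialised as MARC21XML in the downloads and OAI-PMH feeds listed below), a few lines of standard-library Python are enough to read values out of a record. This is a minimal sketch: it assumes the standard Library of Congress MARCXML namespace, and the sample record is hypothetical; which tags a given ENB record actually carries should be checked against the data.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional

# Standard MARCXML namespace published by the Library of Congress.
NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def controlfield(record: ET.Element, tag: str) -> Optional[str]:
    """Text of a control field such as 001 (control number), or None if absent."""
    field = record.find(f"marc:controlfield[@tag='{tag}']", NS)
    return field.text if field is not None else None


def subfields(record: ET.Element, tag: str, code: str) -> List[str]:
    """All values of subfield `code` across every occurrence of data field `tag`."""
    values = []
    for field in record.findall(f"marc:datafield[@tag='{tag}']", NS):
        for sub in field.findall(f"marc:subfield[@code='{code}']", NS):
            values.append((sub.text or "").strip())
    return values


# Hypothetical minimal record; real ENB records carry many more fields.
SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">000000001</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Example title</subfield>
  </datafield>
</record>"""

record = ET.fromstring(SAMPLE)
print(controlfield(record, "001"), subfields(record, "245", "a"))
```

In an OAI-PMH response the MARC record is nested inside the OAI envelope, so it can be located first with `root.iter("{http://www.loc.gov/MARC21/slim}record")` and then passed to these helpers.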

The open data in ENB is grouped by material type: books, periodicals (journals, newspapers, continuing resources), maps, scores, video recordings, audio recordings, image materials, and multimedia resources. Book data is further divided into Estonian-language and foreign-language books.

Data on persons and organisations (collectives) from the Estonian National Bibliography are also available separately.


OCR Text Corrections
Text Corrections of Newspapers Created as Collaborative Work in DEA

Over the years, nearly 500 users have contributed to correcting the texts in DIGAR, and as a result of their efforts, more than 800,000 lines have been corrected. This dataset contains a selection of pairs of original texts and their corrections.

The dataset, along with its documentation, can be found here: https://zenodo.org/records/13325713

The corrections were made collaboratively by users in the DIGAR environment.

Preprocessing

Because text corrections in the DIGAR archive are stored as change logs, the original text had to be reconstructed by replacing each corrected part with its original content. The resulting pairs were then heavily filtered: only pairs that meet all of the following criteria are included:

  • The corrected text contains at least 80% alphabetical characters.
  • The difference in length between the original texts and the corrected texts does not exceed 5%.
  • The relative Levenshtein distance between the two texts is at least 0.1.

These criteria are used to exclude texts that are partially edited, contain too many numbers, lists, or other non-alphabetical symbols, or where significant parts have been deleted or added (often to correct segmentation errors).
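The three inclusion criteria can be expressed as a small filter. This is a minimal sketch, not the pipeline used to build the dataset: how the percentages were normalised is not documented here, so the code assumes that the alphabetic share is computed over non-whitespace characters, that the length difference is relative to the original text, and that the relative Levenshtein distance is divided by the length of the longer text.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the dynamic-programming row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def alpha_share(text: str) -> float:
    """Share of alphabetic characters among non-whitespace characters (assumption)."""
    chars = [c for c in text if not c.isspace()]
    return sum(c.isalpha() for c in chars) / len(chars) if chars else 0.0


def keep_pair(original: str, corrected: str,
              min_alpha: float = 0.8,
              max_len_diff: float = 0.05,
              min_rel_distance: float = 0.1) -> bool:
    """Apply the three inclusion criteria to one (original, corrected) pair."""
    if not original or not corrected:
        return False
    # 1. The corrected text is at least 80% alphabetic.
    if alpha_share(corrected) < min_alpha:
        return False
    # 2. Lengths differ by at most 5%, relative to the original (assumption).
    if abs(len(original) - len(corrected)) / len(original) > max_len_diff:
        return False
    # 3. Relative Levenshtein distance, normalised by the longer text (assumption), is >= 0.1.
    distance = levenshtein(original, corrected)
    return distance / max(len(original), len(corrected)) >= min_rel_distance


print(keep_pair("Tbe qnick brown fox jnmps", "The quick brown fox jumps"))
```

The quadratic dynamic-programming distance is adequate for line-sized pairs; for hundreds of thousands of pairs a compiled implementation such as the one in the rapidfuzz package would be considerably faster.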

Quality Assessment

Because the corrections were created collaboratively, they may contain errors and should not be treated as ground truth. To give a rough overview of the quality of the corrected texts, both the original and the corrected texts were processed with GPT-4o mini, which assigned each a readability score from 1 to 5 using the following prompt:

The following is the OCR output from a digitized historical Estonian newspaper from {year}. Analyze the text placed after "TEXT" and decide if it is reasonably free of OCR errors. Return a rating on the scale of 1 to 5.

5 - The text is clear and readable. It may contain unusual spellings and use of punctuation throughout, but there are no distorted words.
4 - The text is readable, but contains some distortions of alphabetical characters. These distortions do not impede understanding the text at any given point.
3 - The text is readable with minor difficulties. Words and phrases may be noticeably distorted.
2 - The text is only readable with great difficulties. All or almost all sentences contain severe errors that make it very hard to understand.
1 - The text is unreadable. It contains mostly gibberish and random symbols, almost no words are recognizable.

If you are hesitating between 4 and 5, it is probably a 5. If you are hesitating between 2 and 3, it is probably a 2.

Note: the use of "w" instead of "v" and "=" instead of "-" are elements of historical orthography an do not count as errors.

Do not reply anything else than a number from 1 to 5, unless explicitly asked to do so.

TEXT:
{ocr_transcription}
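The prompt is a template with two placeholders, {year} and {ocr_transcription}. The sketch below fills them with Python's `str.format` and checks that the model's reply is a single rating; the template text is copied verbatim from above (including its original wording), while the helper names are hypothetical and the call to GPT-4o mini itself is omitted because the client setup used for this dataset is not documented here.

```python
import re
from typing import Optional

# Verbatim copy of the assessment prompt quoted above.
PROMPT_TEMPLATE = """The following is the OCR output from a digitized historical Estonian newspaper from {year}. Analyze the text placed after "TEXT" and decide if it is reasonably free of OCR errors. Return a rating on the scale of 1 to 5.

5 - The text is clear and readable. It may contain unusual spellings and use of punctuation throughout, but there are no distorted words.
4 - The text is readable, but contains some distortions of alphabetical characters. These distortions do not impede understanding the text at any given point.
3 - The text is readable with minor difficulties. Words and phrases may be noticeably distorted.
2 - The text is only readable with great difficulties. All or almost all sentences contain severe errors that make it very hard to understand.
1 - The text is unreadable. It contains mostly gibberish and random symbols, almost no words are recognizable.

If you are hesitating between 4 and 5, it is probably a 5. If you are hesitating between 2 and 3, it is probably a 2.

Note: the use of "w" instead of "v" and "=" instead of "-" are elements of historical orthography an do not count as errors.

Do not reply anything else than a number from 1 to 5, unless explicitly asked to do so.

TEXT:
{ocr_transcription}"""


def build_prompt(year: int, ocr_text: str) -> str:
    """Fill the two placeholders of the template."""
    return PROMPT_TEMPLATE.format(year=year, ocr_transcription=ocr_text)


def parse_rating(reply: str) -> Optional[int]:
    """Return the rating if the reply is exactly one digit from 1 to 5, else None."""
    match = re.fullmatch(r"\s*([1-5])\s*", reply)
    return int(match.group(1)) if match else None
```

Rejecting anything other than a bare digit (instead of searching the reply for the first number) makes replies that ignore the "only a number" instruction visible rather than silently scored.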

Books
Books metadata from ENB and DIGAR

Estonian National Bibliography

Books in Estonian

A subset of the Estonian National Bibliography containing the metadata of books in Estonian. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=raamat&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography – books in Estonian [Dataset]. https://doi.org/10.5281/zenodo.8228805
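The OAI-PMH endpoint listed above can be harvested with the standard library alone. This sketch follows the OAI-PMH 2.0 convention of paging with a resumptionToken, where follow-up requests carry only the verb and the token; whether and how this Repox endpoint pages large sets is an assumption to verify. The other data.digar.ee ListRecords URLs on this page differ only in their set and metadataPrefix values, so the same functions apply to them.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import Iterator, List, Optional, Tuple

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
BASE_URL = "https://data.digar.ee/repox/OAIHandler"


def list_records_url(set_spec: str, prefix: str, token: Optional[str] = None) -> str:
    """First-page URL, or a follow-up URL carrying only the resumption token."""
    if token:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "set": set_spec, "metadataPrefix": prefix}
    return BASE_URL + "?" + urllib.parse.urlencode(params)


def parse_page(xml_bytes: bytes) -> Tuple[List[ET.Element], Optional[str]]:
    """Split one ListRecords response into its records and the next token."""
    root = ET.fromstring(xml_bytes)
    records = root.findall("oai:ListRecords/oai:record", OAI_NS)
    token_el = root.find("oai:ListRecords/oai:resumptionToken", OAI_NS)
    token = None
    # An empty resumptionToken element marks the last page.
    if token_el is not None and token_el.text and token_el.text.strip():
        token = token_el.text.strip()
    return records, token


def harvest(set_spec: str, prefix: str) -> Iterator[ET.Element]:
    """Yield every record of a set, following resumption tokens."""
    token = None
    while True:
        url = list_records_url(set_spec, prefix, token)
        with urllib.request.urlopen(url, timeout=60) as response:
            records, token = parse_page(response.read())
        yield from records
        if not token:
            break


# Example (network access required):
# for record in harvest("raamat", "marc21xml"):
#     ...
```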

Foreign language books

A subset of the Estonian National Bibliography containing the metadata of books in languages other than Estonian. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=muukeelne&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography – foreign language books [Dataset]. https://doi.org/10.5281/zenodo.8228821

Works in public domain

A subset of the Estonian National Bibliography containing the metadata of works in the public domain. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=vabakasutus&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography – works in public domain [Dataset]. https://doi.org/10.5281/zenodo.8228830

National Library of Estonia digital archive's metadata

Books

The metadata of the books from DIGAR is accessible in the following formats:

  • In the OSF repository as TSV and JSON files (NOTE: the metadata currently cannot be downloaded via the links; to obtain the dataset, write to the Digilab team at digilab@rara.ee).
  • Via the OAI-PMH protocol in XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=book&metadataPrefix=edm
Dataset's distribution in time, language and topics
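DIGAR's OAI-PMH feeds use metadataPrefix=edm (Europeana Data Model), in which descriptive values are typically Dublin Core elements such as dc:title, dc:creator or dc:language. The sketch below collects every value of one Dublin Core element anywhere in a response, which avoids depending on the exact nesting of the EDM classes; the sample document is hypothetical, and which elements DIGAR actually populates should be checked against real records.

```python
import xml.etree.ElementTree as ET
from typing import List

DC = "http://purl.org/dc/elements/1.1/"


def dc_values(xml_text: str, element: str) -> List[str]:
    """Collect every dc:<element> value anywhere in an EDM/RDF document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.iter(f"{{{DC}}}{element}") if el.text]


# Hypothetical minimal EDM fragment for illustration only.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:edm="http://www.europeana.eu/schemas/edm/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <edm:ProvidedCHO rdf:about="#example">
    <dc:title>Example title</dc:title>
    <dc:language>et</dc:language>
  </edm:ProvidedCHO>
</rdf:RDF>"""

print(dc_values(SAMPLE, "title"), dc_values(SAMPLE, "language"))
```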

Standards

The metadata of the standards from DIGAR is accessible in XML format via the OAI-PMH protocol: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=standard&metadataPrefix=edm

Periodicals
Periodicals metadata from ENB, DIGAR and DEA

Estonian National Bibliography

Periodicals

A subset of the Estonian National Bibliography containing the metadata of periodicals. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=perioodika&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography periodicals [Dataset]. https://doi.org/10.5281/zenodo.8228827

National Library of Estonia digital archive's metadata

Journals (–2016)

DIGAR comprises journals up to 2016. Their metadata is accessible in the following formats:

  • As TSV and JSON files
  • Via the OAI-PMH protocol in XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=journal&metadataPrefix=edm
Dataset's distribution in time, language and topics

Serial publications

The metadata of serial publications from DIGAR is accessible in XML format via the OAI-PMH protocol: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=serials&metadataPrefix=edm

DIGAR Estonian Articles

Newspapers and journals (2017–)

DEA contains journals from 2017 onwards. Their metadata is accessible in MARC21XML format via the OAI-PMH protocol: https://dea.digar.ee/cgi-bin/dea-oaiserver?verb=ListRecords&metadataPrefix=marc21

To access the full text of newspapers, you can use the Jupyter Notebook-based access point here.

A periodically updated overview of the DEA text collection can be seen here.
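Unlike the data.digar.ee requests, the DEA OAI-PMH request names no set and uses the prefix marc21 rather than marc21xml. To check which formats an OAI-PMH endpoint offers before harvesting, the protocol's standard ListMetadataFormats verb can be used. This is a sketch: the sample response is hypothetical, and what the DEA server actually returns has not been verified here.

```python
import urllib.parse
import xml.etree.ElementTree as ET
from typing import List

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
DEA_ENDPOINT = "https://dea.digar.ee/cgi-bin/dea-oaiserver"


def verb_url(base: str, verb: str) -> str:
    """URL for an argument-free OAI-PMH verb such as Identify or ListMetadataFormats."""
    return base + "?" + urllib.parse.urlencode({"verb": verb})


def metadata_prefixes(xml_bytes: bytes) -> List[str]:
    """metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(xml_bytes)
    path = "oai:ListMetadataFormats/oai:metadataFormat/oai:metadataPrefix"
    return [el.text for el in root.findall(path, OAI_NS) if el.text]


# Hypothetical response, for illustration only.
SAMPLE = b"""<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>marc21</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

print(verb_url(DEA_ENDPOINT, "ListMetadataFormats"), metadata_prefixes(SAMPLE))
```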

Graphics
Graphical objects metadata from ENB and DIGAR

Estonian National Bibliography

Maps

A subset of the Estonian National Bibliography containing the metadata of maps. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=kaardid&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography maps [Dataset]. https://doi.org/10.5281/zenodo.8228811

Graphic material

A subset of the Estonian National Bibliography containing the metadata of graphic material. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=piltteavikud&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography – graphic material [Dataset]. https://doi.org/10.5281/zenodo.8228809

National Library of Estonia digital archive's metadata

Maps

The metadata of the maps from DIGAR is accessible in the following formats:

  • As TSV and JSON files
  • Via the OAI-PMH protocol in XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=map&metadataPrefix=edm
Dataset's distribution in time, language and topics

Posters

The metadata of the posters from DIGAR is accessible in XML format via the OAI-PMH protocol: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=poster&metadataPrefix=edm

Postcards

The metadata of the postcards from DIGAR is also accessible in XML format via the OAI-PMH protocol: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=postcard&metadataPrefix=edm

Sound
Sound recordings metadata from ENB and DIGAR

Estonian National Bibliography

Sound recordings

A subset of the Estonian National Bibliography containing the metadata of sound recordings. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=helisalvestised&metadataPrefix=marc21xml

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography – sound recordings [Dataset]. https://doi.org/10.5281/zenodo.8228834

Sheet music

The metadata of the sheet music from the ENB is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=noodid&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography sheet music [Dataset]. https://doi.org/10.5281/zenodo.8228832

National Library of Estonia digital archive's metadata

Sound recordings

The metadata of the sound recordings from DIGAR is accessible in XML format via the OAI-PMH protocol: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=soundrecording&metadataPrefix=edm

Sheet music

The metadata of the sheet music from DIGAR is accessible in XML format via the OAI-PMH protocol: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=sheet_music&metadataPrefix=edm

Multimedia
Multimedia metadata from ENB

Estonian National Bibliography

Multimedia

A subset of the Estonian National Bibliography containing the metadata of multimedia works. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=multimeedia&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography multimedia [Dataset]. https://doi.org/10.5281/zenodo.8228819

Video

A subset of the Estonian National Bibliography containing the metadata of video recordings. It is accessible:

  • In Zenodo repository as a TSV file
  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=video&metadataPrefix=marc21xml
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography video [Dataset]. https://doi.org/10.5281/zenodo.8228836

Persons and Organisations
People and collectives metadata from ENB

Estonian National Bibliography

Persons

The metadata of the persons from the ENB is accessible:

  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=person&metadataPrefix=marc21xml

Organisations

The metadata of the organisations from the ENB is accessible:

  • As a ZIP file in MARC21XML format
  • Via the OAI-PMH protocol in MARC21XML format: https://data.digar.ee/repox/OAIHandler?verb=ListRecords&set=organization&metadataPrefix=marc21xml
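All the data.digar.ee subsets above differ only in their set name and metadata prefix (marc21xml for ENB subsets, edm for DIGAR subsets). Collecting the set names in one table makes it easy to build the first request for any subset. The set names and prefixes are copied from the URLs on this page; the dictionary keys and the function name are hypothetical labels chosen for this sketch.

```python
import urllib.parse

OAI_BASE = "https://data.digar.ee/repox/OAIHandler"

# ENB subsets, requested with metadataPrefix=marc21xml.
ENB_SETS = {
    "books in Estonian": "raamat",
    "foreign language books": "muukeelne",
    "public domain works": "vabakasutus",
    "periodicals": "perioodika",
    "maps": "kaardid",
    "graphic material": "piltteavikud",
    "sound recordings": "helisalvestised",
    "sheet music": "noodid",
    "multimedia": "multimeedia",
    "video": "video",
    "persons": "person",
    "organisations": "organization",
}

# DIGAR subsets, requested with metadataPrefix=edm.
DIGAR_SETS = {
    "books": "book",
    "standards": "standard",
    "journals": "journal",
    "serial publications": "serials",
    "maps": "map",
    "posters": "poster",
    "postcards": "postcard",
    "sound recordings": "soundrecording",
    "sheet music": "sheet_music",
}


def first_page_url(source: str, subset: str) -> str:
    """First ListRecords request for an ENB or DIGAR subset."""
    if source == "ENB":
        set_spec, prefix = ENB_SETS[subset], "marc21xml"
    elif source == "DIGAR":
        set_spec, prefix = DIGAR_SETS[subset], "edm"
    else:
        raise ValueError(f"unknown source: {source!r}")
    query = urllib.parse.urlencode(
        {"verb": "ListRecords", "set": set_spec, "metadataPrefix": prefix})
    return f"{OAI_BASE}?{query}"


print(first_page_url("ENB", "maps"))
```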

Estonian National Bibliography
Metadata of all ENB subsets

Estonian National Bibliography

The complete Estonian National Bibliography dataset contains the metadata of all subsets. It is accessible:

  • In Zenodo repository as a TSV file
Dataset's distribution in time, language and topics

How to cite the dataset:

National Library of Estonia. (2023). Estonian National Bibliography [Dataset]. https://doi.org/10.5281/zenodo.8228794
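The Zenodo TSV files can be explored without extra dependencies using Python's csv module. The sketch below counts rows per value of one column. The column names of the ENB exports are not listed on this page, so the "language" column and the file name are hypothetical; the default csv quoting rules are also an assumption, and very long fields may require raising `csv.field_size_limit`.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def count_by_column(tsv: TextIO, column: str) -> Counter:
    """Count rows per distinct value of `column` in a tab-separated file."""
    reader = csv.DictReader(tsv, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not in header {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# With a real download (file and column names are hypothetical):
# with open("enb.tsv", encoding="utf-8", newline="") as f:
#     print(count_by_column(f, "language").most_common(10))

sample = io.StringIO("id\tlanguage\n1\test\n2\tger\n3\test\n")
print(count_by_column(sample, "language"))
```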

Estonian Subject Thesaurus EMS
A thesaurus-structured keyword glossary covering all subjects

The Estonian Subject Thesaurus is a universal controlled vocabulary in Estonian for indexing and searching various library materials.
The official name of the thesaurus is "Eesti märksõnastik" and its official abbreviation is EMS. In English the thesaurus is called "Estonian Subject Thesaurus".

The subject terms from EMS are used:
- in the online catalogue ESTER
- in the database of Estonian articles ISE
- in the union catalogue URRAM of the Estonian public libraries
- in various other catalogues and bibliographic databases of Estonia.

EMS includes about 61 000 preferred and nonpreferred terms. 

The Estonian Subject Thesaurus is managed by the Estonian Libraries Network Consortium; areas of responsibility are divided among the participating libraries.

Format: Machine-readable MARC21:
Request syntax: https://ems.elnet.ee/teenus.php?vorming=M&sona=[keyword]
Sample request: https://ems.elnet.ee/teenus.php?vorming=M&sona=kaubamärgid

Format: Human-readable MARC21:
Request syntax: https://ems.elnet.ee/teenus.php?vorming=i&sona=[keyword]
Sample request: https://ems.elnet.ee/teenus.php?vorming=i&sona=kaubamärgid

Format: MarcXML:
Request syntax: https://ems.elnet.ee/teenus.php?vorming=X&sona=[keyword]
Sample request: https://ems.elnet.ee/teenus.php?vorming=X&sona=kaubamärgid

Format: Multiple-word subject terms:
Request syntax: https://ems.elnet.ee/teenus.php?vorming=I&sona=[keyword]+[keyword]
Sample request: https://ems.elnet.ee/teenus.php?vorming=I&sona=asutuste+arhiivid

All subject terms containing a string (truncation mark %):
Request syntax: https://ems.elnet.ee/teenus.php?vorming=I&sona=%[keyword]%
Sample request: https://ems.elnet.ee/teenus.php?vorming=I&sona=%arhiiv%
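The keyword queries above can be built with `urllib.parse.urlencode`, which percent-encodes Estonian letters as UTF-8, turns spaces into "+" (as in the multiple-word example), and encodes a literal "%" as "%25". The sketch assumes the service decodes "%25" back to the truncation mark, as a standard query-string parser would, and that responses are UTF-8; the helper names are hypothetical.

```python
import urllib.parse
import urllib.request

SERVICE = "https://ems.elnet.ee/teenus.php"


def keyword_url(term: str, vorming: str = "M") -> str:
    """Keyword query URL; vorming is one of the format codes listed above."""
    # quote_plus rules: ' ' -> '+', 'ä' -> '%C3%A4', '%' -> '%25'.
    return SERVICE + "?" + urllib.parse.urlencode({"vorming": vorming, "sona": term})


def containing(fragment: str) -> str:
    """Wrap a string in the truncation mark to match all terms containing it."""
    return f"%{fragment}%"


def fetch(url: str) -> str:
    """Download a response (network access required; UTF-8 is an assumption)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


print(keyword_url("kaubamärgid"))
print(keyword_url("asutuste arhiivid", "I"))
print(keyword_url(containing("arhiiv"), "I"))
```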

Format: MARC21 authority record in machine-readable format (see https://www.loc.gov/marc/specifications/):
Request syntax: https://ems.elnet.ee/id/[keyword ID]#marc21
Sample request: https://ems.elnet.ee/id/EMS007185#marc21

Format: MARC21 record in human-readable format (MARC21-I):
Request syntax: https://ems.elnet.ee/id/[keyword ID]#marc
Sample request: https://ems.elnet.ee/id/EMS007185#marc

Format: Record in MarcXML format (see http://www.loc.gov/standards/marcxml/):
Request syntax: https://ems.elnet.ee/id/[keyword ID]#xml
Sample request: https://ems.elnet.ee/id/EMS007185#xml

If the ID is not in a valid format, the response is “Terms not found”:
Sample request: https://ems.elnet.ee/id/midaiganes#marc21
If the ID is in a valid format but no such ID exists or the term has been deleted, the response is 0:
Sample request: https://ems.elnet.ee/id/EMS999999#marc21
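The /id/ links select the format with the part after "#". Browsers and HTTP libraries do not send URL fragments to the server, so a script fetching these links presumably receives the same response whatever the fragment says; the teenus.php?id= form listed next passes the format as a query parameter and is likely the safer choice for programmatic access. The sketch also checks the ID shape before sending a request. The pattern "EMS followed by six digits" is an assumption inferred from the example IDs on this page, not a documented rule, and the function names are hypothetical.

```python
import re
import urllib.parse

# Assumption: inferred from examples such as EMS007185, EMS005160 and EMS999999.
EMS_ID_PATTERN = re.compile(r"EMS\d{6}")
SERVICE = "https://ems.elnet.ee/teenus.php"


def is_well_formed(ems_id: str) -> bool:
    """True if the ID matches the assumed EMS + six digits shape."""
    return EMS_ID_PATTERN.fullmatch(ems_id) is not None


def record_url(ems_id: str, vorming: str = "M") -> str:
    """Query-string form of an ID lookup (vorming M, I or X as listed below)."""
    if not is_well_formed(ems_id):
        raise ValueError(f"not a well-formed EMS ID: {ems_id!r}")
    return SERVICE + "?" + urllib.parse.urlencode({"id": ems_id, "vorming": vorming})


print(record_url("EMS005160"))
```

A well-formed ID can still point to a deleted or non-existent term, so the response body should be checked as well.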

Format: Machine-readable MARC21:
Request syntax: https://ems.elnet.ee/teenus.php?id=[ID]&vorming=M
Sample request: https://ems.elnet.ee/teenus.php?id=EMS005160&vorming=M

Format: Human-readable MARC21:
Request syntax: https://ems.elnet.ee/teenus.php?id=[ID]&vorming=I
Sample request: https://ems.elnet.ee/teenus.php?id=EMS005160&vorming=I

Format: MarcXML:
Request syntax: http://ems.elnet.ee/teenus.php?id=[ID]&vorming=X
Sample request: http://ems.elnet.ee/teenus.php?id=EMS005160&vorming=X

A field (except 00X fields) consists of the field number, two indicator positions, and sub-fields containing the data. An empty indicator position is marked by a slash. A sub-field code consists of the sign $ followed by a letter or digit. The most important elements are described below; for details, see https://www.loc.gov/marc/authority/ecadhome.html

LDR  leader, e.g. 00000nza2200000n%00

001  control number which is EMS ID, e.g. EMS167171

003  code of the organisation that assigned the control number: ErEMS

008  fixed-length field for coded information; the first 6 positions give the date the record was created (yymmdd), e.g. 130823|n|anznnbabn||n|

040  record compilation details, which do not vary: $aErEMS$best$cErEMS$fems

072  7 number of the EMS subject field to which the term belongs, followed by the EMS code, e.g. $a53$2ems. The field is repeatable.

150  authorised thematic term, e.g. $ainfokeskkond

151  authorised geographic term, e.g. $aAbja-Paluoja

155  authorised form term, e.g. $aõigusaktid 

450, 451, 455  nonpreferred terms (synonyms) for authorised subject terms, e.g. 450 $ainforuum; 451 $aAbja; 455 $anormatiivaktid

450, 451, 455 9 English-language equivalents, e.g. 450 9 $ainformation environment; 451 9 $aNarva river; 455 9 $alegal acts 

550, 551, 555  related subject terms, with their URLs in sub-field $0; the type of relation is coded in sub-field $w:

$wg – broader term

$wh – narrower term

$w missing – other semantic connection

For example:

150  $aalalõualuu

450  $amandibula

450  9$amandible

550 $wg$alõualuud$0https://ems.elnet.ee/id/EMS029481

550  $wh$aalalõuapõnt$0https://ems.elnet.ee/id/EMS149978

550 $aalalõualiiges$0https://ems.elnet.ee/id/EMS147267 

670    source, e.g. $aRegio Eesti Teede Atlas,Regio, 1998.

680  explanation with the sub-field symbol $i, e.g. $iIsikute, organisatsioonide ja süsteemide kogum, milles kogutakse, töödeldakse ja levitatakse infot. Hõlmab ka informatsiooni ennast.
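The field layout above, and in particular the $w coding of 550/551/555 relations, lends itself to a small parser for human-readable MARC21 lines. This sketch assumes that the first three characters of a line are the tag, that everything from the first "$" onward is sub-field data, and that "$" never occurs inside a value; indicators are ignored. The two-digit year in field 008 is expanded with Python's own `%y` rule (69 to 99 become 19xx, 00 to 68 become 20xx), which is an assumption rather than an EMS convention.

```python
from datetime import date, datetime
from typing import List, Tuple

Subfields = List[Tuple[str, str]]
RELATION = {"g": "broader", "h": "narrower"}  # $wg and $wh as described above


def parse_field(line: str) -> Tuple[str, Subfields]:
    """Split e.g. '550 $wg$alõualuud$0https://...' into the tag and (code, value) pairs."""
    line = line.strip()
    tag = line[:3]
    start = line.find("$")
    if start == -1:
        return tag, []
    parts = line[start:].split("$")[1:]
    return tag, [(p[0], p[1:].strip()) for p in parts if p]


def relation(subfields: Subfields) -> str:
    """Relation type of a 55X field: broader, narrower or other (no $w)."""
    for code, value in subfields:
        if code == "w":
            return RELATION.get(value, "other")
    return "other"


def record_created(field_008: str) -> date:
    """Creation date from the first six positions (yymmdd) of field 008."""
    return datetime.strptime(field_008[:6], "%y%m%d").date()


tag, subs = parse_field("550 $wg$alõualuud$0https://ems.elnet.ee/id/EMS029481")
print(tag, subs, relation(subs))
print(record_created("130823|n|anznnbabn||n|"))
```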

The subject thesaurus is constantly updated and can be downloaded in both MARC21 and MARCXML formats.

Machine-readable MARC21: link

Human-readable MARC21: link

MarcXML: link

