Nineteenth-century law journals now available in full text on DLC
April 09, 2018
A vast collection of historical legal journals has recently been made accessible to the public on the Digital Libraries Connected (DLC) platform, with searchable full-text content, browsable digitised images and carefully curated structural and meta-data.
The project, entitled Legal Journals of the 19th Century (Juristische Zeitschriften des 19. Jahrhunderts) and funded by the German Research Foundation (DFG), has provided online access to a vast collection of legal journals through the kleio database since 2002. Seventy-five journals were selected, compiled in uninterrupted series, supplemented with structural and meta-data, and published. (For more on the selection criteria, see the project summary.) The internationally recognised collection comprises these 75 journals, with a total of 1320 volumes and 635 752 pages. As of 2017, full-text content generated by OCR was integrated into the collection, and this content was also supplemented with hand-curated structural and meta-data. Merging the older and newer data necessitated migrating the content to a new database platform that could accommodate the greatly increased volume and make the full-text content searchable.
(Technically speaking, migrating the data entailed converting the structural data, which were in EBIND/SGML, and the OCR data, which were in Abbyy-XML, into TEI/XML. The parallel structural and full-text data in TEI were then fused together. Bibliographical meta-data were exported from the library catalogue before being entered into the DLC system along with the digitised images and the structural and meta-data in TEI.)
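The merging step described above can be illustrated with a minimal Python sketch. It is not the project's actual conversion code. It assumes a simplified, namespace-free TEI layout in which the structural file holds `<div>` elements containing `<pb n="..."/>` page markers, and the full-text file holds `<pb n="..."/>` markers each followed by the `<p>` elements of that page. The function names `page_texts` and `merge_tei` and both sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical structural TEI: articles that only list their page breaks.
STRUCTURE_TEI = (
    '<TEI><text><body>'
    '<div type="article" n="A"><pb n="1"/><pb n="2"/></div>'
    '<div type="article" n="B"><pb n="3"/></div>'
    '</body></text></TEI>'
)

# Hypothetical full-text TEI: OCR text as a flat sequence of pages.
FULLTEXT_TEI = (
    '<TEI><text><body>'
    '<pb n="1"/><p>Text of page 1</p>'
    '<pb n="2"/><p>Text of page 2</p>'
    '<pb n="3"/><p>Text of page 3</p>'
    '</body></text></TEI>'
)


def page_texts(fulltext_root):
    """Map each page number to the <p> elements that follow its <pb> marker."""
    pages = {}
    current = None
    for el in fulltext_root.iter():
        if el.tag == "pb":
            current = el.get("n")
            pages[current] = []
        elif el.tag == "p" and current is not None:
            pages[current].append(el)
    return pages


def merge_tei(structure_xml, fulltext_xml):
    """Insert the OCR paragraphs of each page after the matching <pb>
    marker in the structural document and return the merged root."""
    structure = ET.fromstring(structure_xml)
    pages = page_texts(ET.fromstring(fulltext_xml))
    # Take a snapshot of the <div> list because the tree is modified below.
    for div in list(structure.iter("div")):
        merged = []
        for child in list(div):
            merged.append(child)
            if child.tag == "pb":
                # Pages without OCR text simply contribute nothing.
                merged.extend(pages.get(child.get("n"), []))
        for child in list(div):
            div.remove(child)
        div.extend(merged)
    return structure


if __name__ == "__main__":
    root = merge_tei(STRUCTURE_TEI, FULLTEXT_TEI)
    print(ET.tostring(root, encoding="unicode"))
```

Keying the merge on the page number keeps the hand-curated structure authoritative and treats the OCR text as an overlay. Real TEI files use the TEI namespace and richer page identifiers, so a production version would need to handle both.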
The process was completed on 8 March 2018, and we are proud to announce the publication of the Legal Journals of the 19th Century (German-language link) on the DLC system. Functions available in DLC include displaying the digitised images and the accompanying full text, reading and browsing through the works, navigating through the tables of contents, searching through the full texts and much more.
The full-text data may contain errors, owing to the lack of manual correction, the quality of the source material and the fact that many titles are printed in Fraktur script. Even so, the collection represents a great quantity of text (c. 1.5 GB plaintext and over 215M tokens) in languages varying by domain. Anyone interested in using the full-text data as a single corpus is welcome to contact us for further information.
Everyone is welcome to make use of the collection, and we are happy to provide further information.
Click here to view the collection.