Commons:Structured data/WMSE white paper on Structured Data on Commons/Fataburen


Case study 4: Fataburen

In a nutshell

  • We worked with a set of digitized articles, each of which had a corresponding Wikidata item.
  • Whether, and to what degree, information can be duplicated between SDC and Wikidata is an important issue to discuss.

Wikimedia Sverige has done several projects focused on books and other documents shared by libraries and other GLAMs in Sweden. These projects have involved creating Wikidata items for the publications and uploading openly licensed content to Wikimedia Commons, as well as working with Wikisource in Swedish. We looked back at some of this material in order to improve its structured data, taking advantage of the synergies between Wikimedia Commons and Wikidata that SDC enables.

Fataburen – journal articles on Commons and Wikidata

The article Allmogeforskningen och etnologien has both a scan on Commons and a Wikidata item.
In 2018, we worked with a collection of out-of-copyright digitized articles from the journal Fataburen (Q10494316), published by the Nordic Museum (Q1142142). We created Wikidata items for the articles and uploaded 200 scans to Wikimedia Commons. In 2021, these articles became the focus of our SDC project.

What is distinctive about these articles is that each document file corresponds to a Wikidata item that contains its metadata, which made for an interesting case study. This situation, where the same book, manuscript, painting or other artwork has both a Commons file and a Wikidata item, is not uncommon when working with GLAM collections. It has raised questions about the relationship between Wikidata and Structured Data on Commons, which we believe will become more and more important as more users, especially GLAM staff, start contributing to SDC. We know from following community discussions and other contributors' experiences that this question comes up often and can cause confusion: when annotating a digitized work of art or similar object that has a Wikidata item, what data goes where? And what about objects without Wikidata items? Should every work of art or other object whose existence can be confirmed, for example by a museum catalog, get its own Wikidata item?

In the discussion "Original work and digital representation", several users bring their perspectives to this topic. It is also brought up in Sandra Fauconnier's article "Structured Data on Commons and GLAM: open questions and fresh challenges", where she notes that making a clear distinction between an object and its digital representation is of crucial importance to GLAM partners. She also remarks that data duplication (storing two copies of the same information, on both Wikidata and Wikimedia Commons) can be useful, because it makes it easier for Commons users to find and discover images they are interested in. At the same time, it brings risks: data that is unclear and out of sync, mistakes on either side that are harder to discover, and two communities maintaining similar data in parallel.

In any case, the property digital representation of (P6243) is what we use to connect a digital representation of an object, such as a book or artwork, to its Wikidata item. Working with this particular dataset, the Fataburen articles, we decided to duplicate some of the data, but not all of it, as presented in the diagram below.

Our reasoning was that a certain level of data duplication was beneficial overall, as it makes it easier for users to query for interesting articles, such as those authored by a certain writer or those in Swedish. Combining results from Wikidata and Wikimedia Commons in the Wikimedia Commons Query Service requires more skill than querying one platform at a time, which raises the entry threshold for new users.
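The difference in query complexity can be illustrated with a minimal sketch. It assumes that the Wikimedia Commons Query Service exposes SDC statements through the usual `wdt:` and `wd:` prefixes and can federate to the Wikidata endpoint at `https://query.wikidata.org/sparql`. The function names are hypothetical, Q9027 is assumed to be the Wikidata item for Swedish, and the use of published in (P1433) to tie articles to the journal is an assumption about how the Wikidata items are modelled, not something stated above. The code only builds the query strings; it does not send them.

```python
# Sketch only: builds two SPARQL query strings for comparison.
# Running them requires the Wikimedia Commons Query Service, which is not called here.

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"  # assumed federation target
FATABUREN = "Q10494316"  # journal item, as given in the text
SWEDISH = "Q9027"        # assumed Wikidata item for the Swedish language


def commons_only_query(language_qid: str = SWEDISH) -> str:
    """Query that relies on the language statement duplicated on the file.

    Note: this matches any Commons file with both statements,
    not only Fataburen scans.
    """
    return f"""
SELECT ?file ?article WHERE {{
  ?file wdt:P6243 ?article ;
        wdt:P407 wd:{language_qid} .
}}
""".strip()


def federated_query(journal_qid: str = FATABUREN,
                    language_qid: str = SWEDISH) -> str:
    """Query that reads the metadata from Wikidata through federation."""
    return f"""
SELECT ?file ?article WHERE {{
  ?file wdt:P6243 ?article .
  SERVICE <{WDQS_ENDPOINT}> {{
    ?article wdt:P1433 wd:{journal_qid} ;
             wdt:P407 wd:{language_qid} .
  }}
}}
""".strip()


if __name__ == "__main__":
    print(commons_only_query())
    print()
    print(federated_query())
```

The first query needs only the statements stored on the file, while the second needs a `SERVICE` clause and some knowledge of how the Wikidata items are modelled, which is the extra skill referred to above.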

Results


The ca. 200 digitized articles from the Nordic Museum's Fataburen collection were provided with SDC statements. The most important statement added in this project was digital representation of (P6243), as it links the document to the corresponding Wikidata item. Some of the information in the Wikidata items was duplicated on the files, namely author (P50), language of work or name (P407) and URN-NBN (P4109). While duplicating information this way is not optimal, it lowers the threshold for users searching for the files. We hope this case will contribute to the discussion on data duplication between SDC and Wikidata, as the question will always arise when dealing with files, such as artworks and documents, whose subjects are notable enough to have Wikidata items.
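As an illustration of what such a set of statements might look like for a single file, the sketch below builds a payload in the Wikibase JSON claim format. It is not the tooling used in the project. The helper names are hypothetical, the item IDs and the URN in the usage example are placeholders rather than real Fataburen data, and how the payload is submitted (for example through the `wbeditentity` API action) is left out.

```python
import json


def item_claim(property_id: str, item_qid: str) -> dict:
    """Statement whose value is a Wikidata item (e.g. P6243, P50, P407)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(item_qid[1:]),  # "Q42" -> 42
                    "id": item_qid,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def string_claim(property_id: str, value: str) -> dict:
    """Statement whose value is a plain string (e.g. the URN-NBN identifier)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {"type": "string", "value": value},
        },
        "type": "statement",
        "rank": "normal",
    }


def fataburen_statements(article_qid: str, author_qid: str,
                         urn_nbn: str, language_qid: str = "Q9027") -> dict:
    """Collect the four statements described above for one file.

    Q9027 is assumed to be the Wikidata item for Swedish.
    """
    return {
        "claims": [
            item_claim("P6243", article_qid),  # digital representation of
            item_claim("P50", author_qid),     # author (duplicated from Wikidata)
            item_claim("P407", language_qid),  # language of work or name
            string_claim("P4109", urn_nbn),    # URN-NBN
        ]
    }


if __name__ == "__main__":
    # Placeholder values only; these are not real Fataburen identifiers.
    payload = fataburen_statements("Q111111111", "Q222222222",
                                   "urn:nbn:se:example-0000")
    print(json.dumps(payload, indent=2))
```

Only the first statement is strictly needed to link the file to its Wikidata item; the other three are the duplicated metadata that make the file findable without a federated query.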