User:GFontenelle (WMF)/Sandbox

From Wikimedia Commons, the free media repository


Finding data

This page brings together several ways and tools to find or search data available from files with Structured Data on Commons.

If you are looking for more options, there is also a list of tools to contribute data and make batch uploads to Structured Data on Commons. Another option is this list of tools in the Wikimedia ecosystem.

How to search Structured Data on Commons

Structured Data on Commons statements can be searched from the Commons search bar. This works with both the new Commons search (Search media) and the previous search (Search).

This is done with the haswbstatement search keyword, which finds files that carry a specific Structured Data on Commons statement.

Digital representation of

If you want to search for a file in which the digital representation of (P6243) is the painting The Kiss (Q698487), you should search for: haswbstatement:P6243=Q698487
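The search term above can also be assembled programmatically. The following is a minimal sketch, not an official client: the helper names haswbstatement_term and search_url are hypothetical, and the sketch assumes the standard index.php search endpoint of Commons with its search parameter.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumption: the regular MediaWiki search entry point on Commons.
COMMONS_INDEX = "https://commons.wikimedia.org/w/index.php"


def haswbstatement_term(property_id: str, value_id: Optional[str] = None) -> str:
    """Build a haswbstatement search term (hypothetical helper).

    With only a property, the term matches files that have any value for it;
    with a value, it matches files with that exact statement.
    """
    if value_id is None:
        return f"haswbstatement:{property_id}"
    return f"haswbstatement:{property_id}={value_id}"


def search_url(query: str) -> str:
    """Return a Commons search URL for the given query string."""
    return f"{COMMONS_INDEX}?{urlencode({'search': query})}"


# Files whose "digital representation of" (P6243) is The Kiss (Q698487)
term = haswbstatement_term("P6243", "Q698487")
url = search_url(term)
```

Opening the resulting URL in a browser should give the same results as typing the term into the search bar.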

Categories on Commons

There is also a way to search Structured Data on Commons statements in different Commons categories, using the search bar.

  1. Add "incategory:"
  2. Follow it with the name of the category, replacing each space with an underscore (_). Example: Images_from_the_Rijksdienst_voor_het_Cultureel_Erfgoed
  3. Add "haswbstatement:" and the property you wish to search. Example: haswbstatement:P180

The complete query should look like this: incategory:Images_from_the_Rijksdienst_voor_het_Cultureel_Erfgoed haswbstatement:P180
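The three steps above can be sketched as a small function. This is only an illustration; the function name category_statement_query is hypothetical.

```python
def category_statement_query(category: str, property_id: str) -> str:
    """Combine incategory: and haswbstatement: into one search query.

    Spaces in the category name are replaced with underscores,
    as described in step 2.
    """
    category_term = "incategory:" + category.strip().replace(" ", "_")
    statement_term = f"haswbstatement:{property_id}"
    return f"{category_term} {statement_term}"


query = category_statement_query(
    "Images from the Rijksdienst voor het Cultureel Erfgoed", "P180"
)
```

The resulting string can be pasted into the Commons search bar as is.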

Wikimedia Commons Query Service

The Wikimedia Commons Query Service is still in beta and is based on the Wikidata Query Service. It uses Wikibase, and the Wikidata Query Help provides the documentation needed to use it, except for the M IDs, which are described below.

M-IDs

The only Commons-specific part of the Wikimedia Commons Query Service is the M-ID, a unique identifier for each file on Wikimedia Commons. M-IDs are equivalent to Q IDs (or QIDs) on Wikidata.

Find M IDs

Individual files

To find the M ID of an individual image, open its file page and find the Concept URI option in the left-hand menu. Right-click it and copy the link; this link contains the M ID, which consists of the letter M followed by a number.
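Once the Concept URI has been copied, the M ID can be pulled out of it with a short script. The sketch below assumes that the URI ends in /entity/ followed by the M ID; the function name and the example number M12345 are hypothetical.

```python
import re
from typing import Optional

# Assumption: a Concept URI ends with "/entity/M<digits>".
M_ID_PATTERN = re.compile(r"/entity/(M\d+)$")


def m_id_from_concept_uri(uri: str) -> Optional[str]:
    """Return the M ID found at the end of a Concept URI, or None."""
    match = M_ID_PATTERN.search(uri.strip())
    return match.group(1) if match else None


# Hypothetical example URI, not a reference to a specific file
m_id = m_id_from_concept_uri("https://commons.wikimedia.org/entity/M12345")
```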

Multiple files

The PetScan tool can be used to find M IDs for all the files in a category on Wikimedia Commons. Find the name of the Commons category and choose the following options:

  1. Language = commons
  2. Project = wikimedia
  3. Categories = Name of the category (replacing spaces in the name with underscores _). Example: Files_from_the_Helsinki_City_Museum
  4. Combination = Intersection
  5. Go to the Page properties tab and, under Namespaces, select the File box.

In the results, Page ID is the M ID (the letter M must be added by the user). The results can be either copied manually or, under the Output tab, there is a range of options for export.
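Adding the letter M to a list of exported page IDs is easy to automate. The sketch below is illustrative only: both function names are hypothetical, and the sdc: prefix used for the optional SPARQL VALUES clause is an assumption about how files are addressed in the Wikimedia Commons Query Service.

```python
from typing import Iterable, List


def page_ids_to_m_ids(page_ids: Iterable[object]) -> List[str]:
    """Prefix each page ID from the PetScan results with the letter M."""
    return [f"M{int(page_id)}" for page_id in page_ids]


def sparql_values_clause(m_ids: Iterable[str], variable: str = "file",
                         prefix: str = "sdc") -> str:
    """Format M IDs as a VALUES clause (the prefix is an assumption)."""
    terms = " ".join(f"{prefix}:{m_id}" for m_id in m_ids)
    return f"VALUES ?{variable} {{ {terms} }}"


# Hypothetical page IDs, as they might be copied from the Page ID column
m_ids = page_ids_to_m_ids([123, "456"])
clause = sparql_values_clause(m_ids)
```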

SPARQL examples

More Commons query examples are available on this page.

Digital Representations of "David" by Michelangelo

Files with digital representation of (P6243) set to David (Q179900)

# Digital depictions of "David" by Michelangelo
#defaultView:ImageGrid
SELECT ?file ?image WHERE {
  ?file wdt:P6243 wd:Q179900 . 
  ?file schema:contentUrl ?url .
  # workaround to show the images in an image grid
  bind(iri(concat("http://commons.wikimedia.org/wiki/Special:FilePath/", wikibase:decodeUri(substr(str(?url),53)))) AS ?image)
}

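The image grid workaround in the query above rewrites each upload URL into a Special:FilePath address. The Python sketch below mirrors that BIND expression step by step; it assumes, as the query does, that the file name starts at character 53 of the content URL. The function name and the example file name are hypothetical.

```python
from urllib.parse import unquote

FILE_PATH_BASE = "http://commons.wikimedia.org/wiki/Special:FilePath/"

# SPARQL substr() is 1-based, so position 53 is Python index 52.
FILENAME_START = 53


def file_path_iri(content_url: str) -> str:
    """Mirror concat(base, decodeUri(substr(str(?url), 53)))."""
    encoded_name = content_url[FILENAME_START - 1:]
    return FILE_PATH_BASE + unquote(encoded_name)


# Hypothetical content URL following the usual upload path layout
image = file_path_iri(
    "https://upload.wikimedia.org/wikipedia/commons/a/ab/Some%20File.jpg"
)
```

The fixed offset only works while every content URL shares the same prefix length, which is why the query's own comment calls it a workaround.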

Files with multiple "Digital Representations of" statements

The digital representation of (P6243) property has a "single value constraint": each file can be a digital representation of only one Wikidata item (use depicts (P180) if the image shows more than one object). The query below finds violations of this constraint.

SELECT ?file (COUNT(?value) AS ?count)  {
  ?file wdt:P6243 ?value .
} 
GROUP BY ?file 
HAVING ( ?count > 1 ) 
ORDER BY DESC(?count)
LIMIT 100

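The same check can be run locally on exported data. The sketch below applies the grouping, filtering, ordering and limit of the query above to a list of (file, value) pairs; the function name and the sample data are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def single_value_violations(statements: Iterable[Tuple[str, str]],
                            limit: int = 100) -> List[Tuple[str, int]]:
    """Return (file, count) pairs for files with more than one P6243 value."""
    counts = defaultdict(int)
    for file_id, _value in statements:      # GROUP BY ?file, COUNT(?value)
        counts[file_id] += 1
    violations = [(f, c) for f, c in counts.items() if c > 1]  # HAVING > 1
    violations.sort(key=lambda item: item[1], reverse=True)    # ORDER BY DESC
    return violations[:limit]                                   # LIMIT


# Hypothetical (file, value) pairs
sample = [
    ("M1", "Q1"), ("M1", "Q2"),
    ("M2", "Q3"),
    ("M3", "Q4"), ("M3", "Q5"), ("M3", "Q6"),
]
result = single_value_violations(sample)
```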

Tools for findability

This table brings together tools that allow users to find or search Structured Data on Commons statements.

Tool name / code repository | Maintainer | What it does | Main category | Tracked / issues | Status
Tool name / code repository | Maintainer | What it does | Main category | Phab / issue | Not started yet


Handy links


Audiences

Impact and benefits for the Wikimedia movement

The project will affect the Wikimedia movement in the following ways:

  1. Categories and metadata can be created in multiple languages, so that volunteers with different language skills can work together more easily, and files can be found in languages other than English. Multilingual categories on Commons have been a long-standing request from the Commons community.
  2. Wikimedia Commons becomes a lot friendlier and more usable to developers. Structured Commons provides a new infrastructure of fine-grained APIs and other machine-readable endpoints, so that developers both within and outside the Wikimedia community can create consistent, reusable and reliable software that helps with editing, reusing and analyzing Commons media and its associated data. Without structured data, such tools rely on short-term solutions that break or produce bad data when MediaWiki core changes or when the volunteer community updates wikitext or categories.
  3. When it becomes easier to search Wikimedia Commons - in multiple languages! - Wikimedia contributors can more effectively illustrate Wikimedia projects such as Wikipedia. Without structured data, Wikipedians need to know English, need to know the category system on Commons well, and/or need to know the specific terms with which the files are described by uploaders, in order to be able to find suitable illustrations on Commons.
  4. Structured data allows for easier and simpler partnerships with content providers, especially knowledge institutions and organizations with media collections (such as cultural institutions or GLAMs). Without structured data, mass uploads of larger sets of well-described media files to Commons are technically complicated, even with relatively user-friendly tools like Pattypan. With structured data, the precise and complex metadata of files in institutional databases can more easily be integrated into Commons, also on a large scale.

Impact and benefits for other organizations

  1. With structured data, Wikimedia Commons gains a large, and highly valued, new advantage for partner organizations who donate media: it will finally become possible to follow, and review, changes that have happened to 'their' media on Commons, such as improvements and translations of the metadata. When Wikimedia Commons has refined, structured APIs, it is also possible to import these changes to institutions' own catalogues again. In this way, the Wikimedia community does not only receive materials from GLAMs around the world, but it is also able to give back, in the form of improved and updated metadata, in a clean and consistent format.
  2. Structured data also makes Wikimedia Commons more attractive for knowledge institutions around the world, because a structured environment aligns much better with the advanced metadata in the specialized repositories that such institutions have built during the last decades. Better search and findability of media on Commons also provides a greater incentive to share collections there. Without structured data, the main incentive for institutions to upload to Commons is the volume of Wikipedia page views from pages that contain their media files. By improving Commons itself, expanding the way people can search for images and reuse them, we greatly expand the usefulness of Commons, also of those files that are not used as an illustration on Wikipedia.
  3. Many knowledge organizations, especially in regions like South and Southeast Asia, Latin America and Africa, don't have support from online cultural aggregators like Europeana, Trove and DPLA, and sometimes don't even have the technical capacity to host their own digitized collections. Especially with structured data, Wikimedia Commons can fill this gap, becoming a de facto hosting platform and aggregator for cultural media across the world - a reliable venue for sharing cultural heritage content in free formats and under free licenses.

Impact and benefits for re-use of Commons media across the web

  1. Structured data on Commons makes it easier to dynamically re-use and embed Wikimedia Commons content with proper attribution: because the data behind media is provided in a structured form, via detailed APIs, many content management systems and platforms (such as Drupal and WordPress) can develop embed tools and plugins that help their end users to use media from Commons, while correctly complying with our licensing.
  2. The vocabulary for describing media files (such as creators, institutions, depicted people, places, animals, plants, buildings, historical events…) is drawn from Wikidata. There, these concepts are linked with the wider internet via identifiers. This allows for cross-internet discovery of relationships between media files - a foundational principle of the semantic web and Linked Open Data.
  3. With structured data, the content on Wikimedia Commons can be archived more faithfully and more consistently by the Internet Archive and other digital archiving services, assuring longevity of that content, even if Wikimedia projects disappear. Digital archiving of media files becomes easier and more precise when their associated metadata is properly structured.