Commons talk:Structured data/Archive 2021

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

What is correct way to describe coordinates from multiple sources?

I have currently set of photos from Finna which will have coordinate information from

Finna
- Generally coordinates informed by GLAM.
- Coordinates of the photographer OR Coordinates of the depicted object (can be either and it cannot be known from the data)
- Derived from street address or location by name (can be quessed from the data)
Digitransit Address Lookup.
- This is used also for crosschecking the Finna's location data
Ajapaik
- GPS data of the position photographer when the rephotograp has been taken
- Manual geotagging: P625)) (P625))) OR coordinates of the point of view (P1259) + heading (P7787)

So, if I want to add coordinate information and I would like to store also information where the information comes from then i should add some source qualifier for the data? In example:

coordinate location (P625) OR coordinates of the point of view (P1259) with criterion used (P1013)

Also should I use item of the service (ie. Finna (Q18760310), Ajapaik (Q28845848)) as a value for P1013? --Zache (talk) 10:39, 11 January 2021 (UTC)

Possibly relevant: phab:T253053. Jean-Fred (talk) 16:39, 11 January 2021 (UTC)

Adding perceptual hashes to SDC

Hi, I just calculated the pHash values using imagehash library for the photos from Finna for detecting the duplicates. I was thinking to add the hash values of the Finna photos to the SDC. (Example: [1]; See also longer writing for how imagehashes works can be found here User:Fae/Imagehash) Another solution could be registering unique properties for each hashing algorithms (ie own property for sha1, md5, blockhash.io, phash, averagehash etc) but using checksum (P4092)+determination method (P459) is how it is currently done in Wikidata items --Zache (talk) 10:41, 18 January 2021 (UTC)

I made a Property proposal for this as it feels that it would be a bad ide to use checksum (P4092)+determination method (P459) as a long term solution. --Zache (talk) 20:19, 22 January 2021 (UTC)

Questions still unanswered

I'm still waiting for User:RIsler (WMF) - or anyone - to answer the points raised in February, in Commons:Village pump/Archive/2020/02#Misplaced invitation to "tag" images. Those being the points he said were "not being ignored"; and including (but not only) "requests to show where there is consensus for the [tagging] tool to operate, or to use depicts statements in the manner it is [and] requests to explain how the tool, or the invitation to tag, can be turned off." Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:31, 22 December 2020 (UTC)

@RIsler (WMF): I think a core issue is that currently the invitation message is not worded the way the Commons community wants it to be worded. Text that's displayed on WikiCommons should conform to the wishes of the community. Can you point to the page in which the current message gets stored and which can be edited by WikiCommons users or admins to change the message to explain the nature of using the depict statement better to users? ChristianKl (talk) 14:32, 22 December 2020 (UTC)
- Patches for the text of system messages can be submitted for review for change through gerrit. In this case, for English, the file in question would be: https://github.com/wikimedia/mediawiki-extensions-UploadWizard/blob/master/i18n/en.json (search for "cta" to find the strings). As for the wishes of the community, those vary, and changes to current messaging would need to reflect that to pass review swiftly. For stats on how other community users currently use the SuggestedTags tool, see statistics here. Additionally, due to some previous community discussions, we've already changed the Popular queue to prioritize images that are not categorized and therefore in need of a bit of metadata. This alleviates the issue where some community members had concerns about less specific tags added to media that already had lots of detailed/specific metadata including categories. RIsler (WMF) (talk) 21:15, 22 December 2020 (UTC)
  - And still User:RIsler (WMF) has not answered the questions (the ones I put to him in February, which he says he is not ignoring). Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 10:21, 23 December 2020 (UTC)
  - @RIsler (WMF): Which community discussions make you believe that only adding suggesting incorrect statements to images that don't already have statements alleviates concerns of the community that you are encouraging users to add incorrect statements?

The problem isn't about how specific tags are but whether the "tags" are made with approriate Wikidata properties. The term tag as it's used on websites like Flickr includes tags that have nothing to do with depicts (P180). ChristianKl (talk) 02:01, 24 December 2020 (UTC)

I kind of agree that depicts (P180) statement name might be confusing when applied to lets say sound recordings, but I am not aware of other Wikidata properties for describing content of the file. Do you mean that for example we can use inscription (P1684) for free-text inscriptions, etc.? I also agree that flickr uses tags differently, as you can tag things not visible, but I do not think we ever described our depicts (P180) statement as equivalent to flickr tags. ChristianKl, can you provide examples of what you mean. --Jarekt (talk) 14:05, 24 December 2020 (UTC)

https://www.wikidata.org/wiki/Wikidata:Property_proposal/tag was a property proposal for a property that holds tags. It had examples such as File:Родовые столбы сэргэ.jpg-tag-shamanism, that don't really fit into depicts but if we call it tagging we encourage uses to apply such terms with depicts. ChristianKl (talk) 22:19, 25 December 2020 (UTC)

Updating messaging

Currently, the messages that speak about tags seem problematic:

"mwe-upwiz-mv-cta-description": "Commons has a new tool that will suggest tags for images you upload if you haven't already added tags. When you confirm accurate tags, you're helping make images easier for everyone to search for.",
"mwe-upwiz-mv-cta-checkbox-label": "Yes, I'd like to get notifications when my uploads have tags that are ready for my review",
"mwe-upwiz-mv-cta-final-cta": "Ready to start tagging right away? Give the tool a try by tagging popular images now!",
"mwe-upwiz-mv-cta-user-preference-unset": "You will no longer receive notifications to tag your uploads.",

An improvement might be:

"mwe-upwiz-mv-cta-description": "Commons has a new tool that will suggest metadata for images you upload if you haven't already added metadata. When you confirm accurate metadata, you're helping make images easier for everyone to search for.",
"mwe-upwiz-mv-cta-checkbox-label": "Yes, I'd like to get notifications when my uploads have metadata that is ready for my review",
"mwe-upwiz-mv-cta-final-cta": "Ready to start adding metadata right away? Give the tool a try by adding metadata to popular images now!",
"mwe-upwiz-mv-cta-user-preference-unset": "You will no longer receive notifications to add metadata to your uploads.",

Maybe in some cases there should be structured before metadata. What do you think about the messages? ChristianKl (talk) 02:18, 24 December 2020 (UTC)

@ChristianKl: just wanted you to know that Foundation staff were mostly off work through the beginning of January, so your suggestion hasn't been reviewed or discussed yet. Keegan (WMF) (talk) 15:24, 4 January 2021 (UTC)

Why not create a form of wiki that doesn't require proof of what's said in the article and would leave the reader to prove it themselves? — Preceding unsigned comment was added by 213.205.198.41 (talk) 11:08, 2 February 2021 (UTC)

That seems off-topic. If it is relevant, draw me the connection.
What use, exactly, is a site where people are free to go and tell lies? Oh, wait, that would be Parler. - Jmabel ! talk 13:08, 2 February 2021 (UTC)

Off topic it is but I don't have a clue how to post this idea in the correct place. Its just that Wikipedia started out as a place that wasn't trusted and now a person can only write or edit it if they are into the complexity or complicated process. There needs to be a place were numpties like me can put stuff on wiki something without being jumped on (wiki bollox might be a good name. As I said its up to the reader to prove for themselves if its true. No editing or deleting just add to the article. Peoples contributions would be traceable by their IP address (although I don't know what that is). Anyhoo just an idea. Back to the way it was. Good or bad I don't know. Thanks. — Preceding unsigned comment was added by 213.205.198.41 (talk) 14:52, 2 February 2021 (UTC)

Commons Structured Data SQL access?

Is Commons Structured Data available on the replica database servers on wmf cloud? Could anyone give me a pointer on how to extract commons structured data on a larger scale (e.g. getting the about 16 million camera locations). the wb_property_info table on commonswiki.analytics.db.svc.eqiad.wmflabs commonswiki_p is empty. Where is that data? I'd like to avoid running SPARQL queries via HTTP, unless the performance is comparable to SQL queries. --Dschwen (talk) 19:56, 28 January 2021 (UTC)

It's been several years since I used these, but https://dumps.wikimedia.org is an easy way of doing it.

Recommend ethical caution with mass analysis of location or other embedded data. This may have side-effects of outing or doxxing people with data they did not think would be revealing or may have forgotten, or not even have realized was being embedded by their camera. --Fæ (talk) 21:08, 28 January 2021 (UTC)

Thanks Fæ! Point well taken, although I don't think the WikiMiniAtlas will add that much more exposure to those coordinates (I wish it did :-D). I can only display a very limited amount of images on the map at a given scale, so to find more obscure images you'll have to zoom in a lot (-> low chance of discovery for a specific image). --Dschwen (talk) 21:12, 28 January 2021 (UTC)

Hmm, it seems that the dumps don't offer anything beyond the direct SQL access I have to the commons replica (which does not seem to contain the structured data). Or is there a separate dump for the structured data, that I'm not seeing? --Dschwen (talk) 21:15, 28 January 2021 (UTC)

There are RDF dumps at https://dumps.wikimedia.org/other/wikibase/commonswiki/ (dumps.wikimedia.org → Other files → Structured Data dumps from Commons). —Tacsipacsi (talk) 21:44, 28 January 2021 (UTC)

Ah, ok. In the meantime

SELECT ?file ?image ?camera ?object ?heading WHERE {
  ?file (wdt:P625|wdt:P1259) ?location.
  ?file schema:contentUrl ?url .
  OPTIONAL { ?file wdt:P625 ?object. }
  OPTIONAL { ?file wdt:P1259 ?camera. }
  OPTIONAL { ?file wdt:P7787 ?heading. }
  bind(wikibase:decodeUri(substr(str(?url),53)) AS ?image)
} LIMIT 1000

would be the first stab at a SPARQL query. --Dschwen (talk) 22:40, 28 January 2021 (UTC)

30 GB compressed?! Holy cow! I guess I'll stick with querying the SQL replicas for external links for now. Looks like not all pics with location templates have coordinate structured data yet anyways. --Dschwen (talk) 22:53, 28 January 2021 (UTC)

Also, to process the images I need image metadata, such as filetype, size in bytes, width and height. Those don't seem to be in the structured data, meaning I'd have to do SQL queries on top of the coordinate extraction... 16million separate ones, since I cannot join SQL and SPARQL queries. --Dschwen (talk) 22:58, 28 January 2021 (UTC)

I do not think it will be much use to you but there is also some API that can be useful SQL based coordinates or SDC for a given file. --Jarekt (talk) 01:15, 29 January 2021 (UTC)

Oh and SQL query for coordinates I use is this one although I never used it on a large scale. --Jarekt (talk) 01:19, 29 January 2021 (UTC)

Thanks Jarekt, but I've already run something like that. It takes about 5h for 16 million coordinates. I have the externallinks based extraction down I think (https://github.com/dschwen/wikiminiatlas_servers/blob/master/maintenance/newextract.py). I was looking for a fast way to query structured data. --Dschwen (talk) 02:21, 29 January 2021 (UTC)

You can do something like this in tools to extract the coordinates from the mediainfo dumps

grep_coordinates.sh
#!/bin/sh
bzip2 -dc /public/dumps/public/other/wikibase/commonswiki/latest-mediainfo.nt.bz2 |head -n 1000 |grep -P "P625|P1259|P7787" | bzip2 > coordinates.out.bz2

and then

> jsub -once -j y -cwd -N grep_coordinates -mem 768m -o grep_coordinates.log nice ./grep_coordinates.sh

--Zache (talk) 10:09, 29 January 2021 (UTC)

Ok, I'll try that (minus the "head -n 1000" :-)) --Dschwen (talk) 15:40, 29 January 2021 (UTC)

This looks like a good start. I get stuff like

<https://commons.wikimedia.org/entity/M76> <http://www.wikidata.org/prop/direct/P1259> "Point(6.5702 52.9913)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<https://commons.wikimedia.org/entity/M76> <http://www.wikidata.org/prop/P1259> <https://commons.wikimedia.org/entity/statement/M76-2DA5F4AA-859C-4D02-8C80-6FE2A1A6A5C7> .
<https://commons.wikimedia.org/entity/statement/M76-2DA5F4AA-859C-4D02-8C80-6FE2A1A6A5C7> <http://www.wikidata.org/prop/statement/P1259> "Point(6.5702 52.9913)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<https://commons.wikimedia.org/entity/statement/M76-2DA5F4AA-859C-4D02-8C80-6FE2A1A6A5C7> <http://www.wikidata.org/prop/statement/value/P1259> <https://commons.wikimedia.org/value/09d7bd5c6a8f8857017b5dab9ef1573f> .
<https://commons.wikimedia.org/entity/M1209> <http://www.wikidata.org/prop/direct/P1259> "Point(6.617 51.9415)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<https://commons.wikimedia.org/entity/M1209> <http://www.wikidata.org/prop/P1259> <https://commons.wikimedia.org/entity/statement/M1209-9E9B2D2D-EE93-4D62-8CDA-9ABB136E006E> .
<https://commons.wikimedia.org/entity/statement/M1209-9E9B2D2D-EE93-4D62-8CDA-9ABB136E006E> <http://www.wikidata.org/prop/statement/P1259> "Point(6.617 51.9415)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<https://commons.wikimedia.org/entity/statement/M1209-9E9B2D2D-EE93-4D62-8CDA-9ABB136E006E> <http://www.wikidata.org/prop/statement/value/P1259> <https://commons.wikimedia.org/value/0902c17312d60e3205c2834a2f3f554a> .
<https://commons.wikimedia.org/entity/M1210> <http://www.wikidata.org/prop/direct/P1259> "Point(6.61582 51.942)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<https://commons.wikimedia.org/entity/M1210> <http://www.wikidata.org/prop/P1259> <https://commons.wikimedia.org/entity/statement/M1210-082B5CF8-A6DF-4098-9591-88FAA29E8DEF> .
<https://commons.wikimedia.org/entity/statement/M1210-082B5CF8-A6DF-4098-9591-88FAA29E8DEF> <http://www.wikidata.org/prop/statement/P1259> "Point(6.61582 51.942)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> .
<https://commons.wikimedia.org/entity/statement/M1210-082B5CF8-A6DF-4098-9591-88FAA29E8DEF> <http://www.wikidata.org/prop/statement/value/P1259> <https://commons.wikimedia.org/value/dfa770e5941ee77b9f58e109b6d8e3a9> .

where the M???? identifies the particular image. And as I just realized the number behind the M is the page_id. Ok, I can make that work. Instead of bzipping I'd pipe that straight into python script for parsing and write the info out to my database for further processing. --Dschwen (talk) 15:49, 29 January 2021 (UTC)

And with

grep -P "statement/P625|direct/P625|statement/P1259|direct/P1259|qualifier/P7787"

I can reduce the noise to a minimum. Or am I missing important info that way? --Dschwen (talk) 15:52, 29 January 2021 (UTC)

Looks like "direct" and "statement" are redundant. Odd. --Dschwen (talk) 21:10, 29 January 2021 (UTC)

This is what I've come up with bases on the suggestions above: https://github.com/dschwen/wikiminiatlas_servers/blob/master/maintenance/parse_csd_dump.py (I just pipe the unzipped dump straight into that script and it populates a database table with the coordinates.) --Dschwen (talk) 22:41, 29 January 2021 (UTC)

Note that the statements for one media file are not necessarily clumped together in the dump. Unfortunately that means the current script produces key errors when inserting the second set of statements for one media file after inserting data from other files in between. One solution I see to that is to extract everything in to a python dictionary in memory to consolidate all claims, the alternative is changing the database schema to insert each statement/claim separately. --Dschwen (talk) 14:59, 2 February 2021 (UTC)

Dschwen As far as I understand it there should be a single camera coordinate statement per file. A file might have several object coordinates but a single camera coordinate, and attempts to add a second {{Location}} template will result in errors. Also coordinates of the point of view (P1259) has single-value constraint (Q19474404) So I would assume that there would be very few (if any) files with multiple coordinates. I tried to write a query to search for files multiple P1259, but could not get them to work (see d:Wikidata:Request_a_query#Help_needed_with_Commons_query). --Jarekt (talk) 19:02, 2 February 2021 (UTC)

Yeah, good point about multiple object coords. If I just wanted to extract the camera coordinate it would probably be easier, but I'd like to have the object coordinate (if it's unique) as a fallback. --Dschwen (talk) 20:12, 2 February 2021 (UTC)

This is where a JSON dump similar to the one for Wikidata would probably come handy. The Wikidata JSON dump contains one entity per line that can be interpreted as a standalone JSON object itself, and then be filtered easily (see here). For simple parsing tasks, the JSON dump is IMO by far the most useful, particularly in comparison to the RDF dumps; no idea why there is none for SDC. —MisterSynergy (talk) 00:55, 3 February 2021 (UTC)

Dschwen I was wrong and apparently we have many thousands files with duplicated coordinates, some of the conflicting with each other. The query below (crafted by User:Dipsacus fullonum here)

SELECT ?file
{
	?file p:P1259 ?coord1 , ?coord2 .
    FILTER (?coord1 != ?coord2)
} Limit 10000

Try it!

finds at least 10k of files with single-value constraint violation for coordinates of the point of view (P1259). Looking at random sampling of the file histories I mostly see 2 issues:

Files like File:BlueBellHillA229 0019.JPG that have 2 identical copies of all SDC metadata. Both added in the same second by BotMultichill (@Multichill: ). I run into almost identical issue while using QuickStatements to add Wikimedia VRTS ticket number (P6305), which took a while to clean up, so it might be some database issue.
Files like File:Mains of Findochty - geograph.org.uk - 248887.jpg that have 3 sets of P1259. Two of them (one wrong) added by User:ZI Jony with QuickStatements half a year after Multichill added coordinates.

I think both of the issues need to be corrected. Any idea how to proceed? --Jarekt (talk) 03:46, 3 February 2021 (UTC)

To be precise, the query only finds max 5K different files as each file is in the results at least twice (all permutations of each statement with P1259 bound to ?coord1 or ?coord2 is found) If you want to find all concerned files, you can do it with a series of queries each testing a different subset of all files. A way to split the files into usable subsets is to use the file dimensions. You can for instance use schema:height for the height. By using the rangeSafe hint it is fast to select images in a given height interval. So to get all files with multiple P1259 statements you can use:

SELECT DISTINCT ?file
{
    ?file schema:height ?h .
    hint:Prior hint:rangeSafe true .
    FILTER ( 0 < ?h && ?h <= 100 )

	?file p:P1259 ?coord1 , ?coord2 .
    FILTER (?coord1 != ?coord2)
}

Try it!

SELECT DISTINCT ?file
{
    ?file schema:height ?h .
    hint:Prior hint:rangeSafe true .
    FILTER ( 100 < ?h && ?h <= 200 )

	?file p:P1259 ?coord1 , ?coord2 .
    FILTER (?coord1 != ?coord2)
}

Try it!

SELECT DISTINCT ?file
{
    ?file schema:height ?h .
    hint:Prior hint:rangeSafe true .
    FILTER ( 200 < ?h && ?h <= 300 )

	?file p:P1259 ?coord1 , ?coord2 .
    FILTER (?coord1 != ?coord2)
}

Try it! etc. If any happens to timeout, try a little smaller intervals. --Dipsacus fullonum (talk) 05:22, 3 February 2021 (UTC)

Thanks. Yeah, with 16 million files with locations having bad data in 1% would already be 16,000 files to fix, which seems like a daunting task to do manually. The duplicate viewpoint could be consolidated automatically if the coordinates are identical. Playing around with the WikiMiniAtlas I've come across tons of pictures in the oceans that are a result of (for example)

Extremely coarse coordinates (only degrees near India)
Swapped lat/lon in the Golf of Oman (pictures that are supposed to be in Europe)
Flipped longitude sign in the Indian Ocean (for pictures that are supposed to be in south America)

There likely is lots more to fix. The WMA makes it easy, by showing the thumbnails directly on the map, which helps make a decision whether a file belongs to the location it's listed at. I've manually corrected one or two hundred files already, but it is time consuming, as you have to remove the SDC claims first and then edit the template. Tool assist would be nice here as well. --Dschwen (talk) 05:45, 3 February 2021 (UTC)

Converting the GLAM's subjects tags to P180 values?

Do we have opinnion on which granularity we should target in P180 values? It would be possible to do translation from subject tags used by Finnish GLAMs to Wikidata items using Finto ontologies. However, it is possible that there will be a lot of quite random information tags too if it is done automatically.

In example "wooden house" in first example photo which is in background and not a key element. Also it would be likely that human would add wikidata item of Ratakadun poliisiasema (Q98432303) instead of poliisilaitokset as there is wikidata item for that specific police station.

In second photo we can see that @Apalsola: already added values Tervahovi (Q11897001) and barrel (Q10289) + of (P642) = tar (Q186209) to P180 values. This so well defined information that it is out of reach of any automatic conversion.

So my question is that when I am adding the tags, should i try to focus on very specific tags or should I add all of the tags which I can translate to wikidata which would be more general? Also do we know which strategy would be better for Media search development?

Photograph in Finna	Subjects jugend -> Yso 8952 -> Art Nouveau (Q34636) puurakennukset -> Yso p3650 -> wooden house (Q279118) poliisi -> Yso p3398 -> police (Q35535) poliisilaitokset -> Yso p13271 -> ??? ~~Helsinki, Ratakatu 8, 10, 12~~ (address) ~~1910 -luku~~ (date)
https://www.finna.fi/Record/musketti.M012:HK19700502:254 Photograph in Finna ]	Subjects aitat -> yso p11033 -> granary (Q114768) terva -> yso p19346 -> tar (Q186209) tervatynnyrit -> ??? työvaatteet -> p9195 -> workwear (Q828980) varastot -> yso p21311 -> warehouse (Q181623) ~~Toppila, Oulu~~ (address) ~~1898~~ (date)

--Zache (talk) 09:29, 8 February 2021 (UTC)

alt text property proposal

In case you missed the notice on the village pump a couple days ago: there's a Wikidata property proposal for adding an alt text field to the structured data of images. On one hand this would enable all kinds of programmatic workflows, like Toolforge tools for creating or translating alt text for images; would allow using alt text for images outside wikitext; and would probably improve the search result ranking of images. On the other hand it is being argued that alt text by its nature cannot be centralized since it always needs to reflect article context. More opinions on the matter would be welcome. --Tgr (talk) 14:05, 3 March 2021 (UTC)

Query structured data and Wikidata

In order to showcase publicly the stuff we do with SPARQL on query.wikidata.org (or embed it at some place), it would be necessary to query the author and license from the images. At the moment AFAIK you can display P18 only without any copyright info which makes all the learnings and cool stuff a very private fun thing (if you don't restrict on public domain paintings) :-| – Now that we have all the structured information on Commons, has this been done already? A query to merge Wikidata items with Structured Commons? I understand it's a different Wikibase instance, would it be thinkable at all? Any approaches under way? Thanks! (please ping me for an answer) --Elya (talk) 20:49, 3 March 2021 (UTC)

Captions in wrong langcode

like special:diff/435428822, a string of Cyrillic letters entered in a Latin language (english). these obvious errors should be somehow automatically detected or at least tagged for review and correction.--RZuo (talk) 11:56, 25 February 2021 (UTC)

Done @RZuo: That caption was nonsense and obviously not in good faith so I removed it. Thanks for pointing it out! --Sabelöga (talk) 14:55, 25 February 2021 (UTC)

@Sabelöga:

Not done, the question was not to remove that specific caption (anyone can do that), but to prevent/tag such edits automatically.

@RZuo: Wikidata abuse filter #33 does something like this, maybe it could be adopted to Commons. I’m not an admin myself, so I can’t edit abuse filters, but hopefully you can find someone able and willing to help at COM:AN. —Tacsipacsi (talk) 23:17, 26 February 2021 (UTC)

@Sabelöga: it was a good-faith edit, but just entered in a wrong langcode.

a request for a filter was submitted but not followed up so far: Commons_talk:Abuse_filter/Archive_2021/03#New_AF_request:_purely_non-Latin_alphabets_entered_into_Latin-script-language_captions.--RZuo (talk) 11:47, 14 March 2021 (UTC)

How can I add SDC to Categories?

Can I define a depicts (P180) for a category? E.g. for single topic categories. How does this technically work? --Herzi Pinki (talk) 13:01, 15 March 2021 (UTC)

SDC is only for files. Category might be connected to Wikidata and you could add depicts there. --Jarekt (talk) 13:12, 15 March 2021 (UTC)

Proposal: Move captions from file information to structured data tab

Tracked in Phabricator
Task T276718

Currently we have two tabs on files on Commons:

File information to display information about the file generally in a infobox template format
Structured data to see and edit structured data in a key/value format

Currently the captions are in the "file information" tab. This is weird because these captions are part of the structured data. These captions cause visual clutter. I propose that the captions are moved to the structured data tab. I opened phab:T276718 for that. Please comment here to see if we have community consensus for this. Multichill (talk) 18:33, 7 March 2021 (UTC)

In my opinion, you should set up several tabs and group the content by topic. However, more than 5 to 7 tabs would be confusing.--XRay 💬 19:16, 7 March 2021 (UTC)

I agree with XRay, personally I would (also) prefer to see a separate tab for them, but then also to have more (written) information in that tab. If that ain't an option on the table then I do support the move to the SD tab. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:14, 7 March 2021 (UTC)

At least one thing gives me hesitancy about this: I rarely pay attention to the structured data tab because, being a human rather than a bot, I rarely find information there that I can't glean as well or better from the wikitext. However, I really do like to have an easy way to see and remove the incorrect captions that are frequently added to my photos. So perhaps this should be a user option? - Jmabel ! talk 02:57, 8 March 2021 (UTC)

In short, is this one more case of trying to move information from wikitext off to the SCD…? -- Tuválkin ✉ ✇ 13:13, 8 March 2021 (UTC)
- No it's not. You seem to have completely misunderstood the proposal. This is about a change how the user interface looks. Multichill (talk) 18:40, 8 March 2021 (UTC)

That’s a relief, then. As for any extra tabs displaying info that’s not in the file page’s wikitext, I’m okay with anything, provided I get to hide them as I currently do. -- Tuválkin ✉ ✇ 19:23, 8 March 2021 (UTC)

Sounds reasonable, as captions are usually more or less redundant with the description of {{Information}}. --Zolo (talk) 18:18, 9 March 2021 (UTC)

Whatever the result of this proposal, IMHO there should be a gadget (opt-in) to show the structured data tab as the main/primary tab, if a (robotic?) user wants it that way. Strakhov (talk) 19:20, 9 March 2021 (UTC)

Agree --Jarekt (talk) 13:14, 15 March 2021 (UTC)

While the captions are indeed structured data, they also represent descriptive information in a format that is more human readable than the rest of the structured data fields, which is why they are in the main "file information" tab. For users who are not fluent in English, the caption may also be the most easily accessible descriptive information available in their language. I do think that clearer modeling of how to use the "caption" field would benefit everybody. If you could say more about how having captions on the main page gets in the way of your workflows or of easily viewing file information, or if they represent some other challenge I don't understand, I'd love to hear about it. We're certainly open to making a change if users would prefer to move the captions to the structured data tab. However, the team has other priorities right now, so I'm not sure when we would be able to focus on it. (I've also said this in phab:T276718.) CBogen (WMF) (talk) 20:43, 15 March 2021 (UTC)

@CBogen (WMF): the caption from the structured data provide more or less the same information as the description field of {{Information}}. Actually, when the field from {{Information}} is left empty, it just shows the structured data caption. Showing the caption above {{Information}} is usually useless. --Zolo (talk) 17:00, 24 March 2021 (UTC)

Is it possible or is it planned to have references for facts in SDC?

Hi all

Is it possible or is it planned to have references for facts in SDC? I mean in a similar way to how Wikidata provides a space for references for facts? I realise that a lot of SDC will be self evident from the image e.g number of sugar cubes or other depictions. However some facts would benefit from having some kind of reference e.g to confirm an image depicts a specific location or person (this could help with miss identification). Also to identify where structured data has been imported from an external source e.g when an image has been imported from a museum website.

Thanks

John Cummings (talk) 19:39, 15 March 2021 (UTC)

@John Cummings: See phab:T230315

Tracked in Phabricator
Task T230315

I believe technically you can add them (eg by bot, QuickStatements etc), you just can't see them, apart from in queries. I am definitely one of those who thinks it may often be very useful to record where a statement has come from. Jheald (talk) 14:13, 16 March 2021 (UTC)

Thanks very much Jheald, very helpful to know that other people think this would be useful and it has been documented as a request. John Cummings (talk) 14:50, 16 March 2021 (UTC)

Text about Image Annotator

@GFontenelle (WMF): [2] The sentence you added is not targeted for translation.--Afaz (talk) 12:38, 26 March 2021 (UTC)

bad request

File:Tram in Kärntner Ring, at twilight (Vienna, Austria).jpg is linked via SDC with <title of image> (https://www.wikidata.org/wiki/Special:EntityPage/M84681576), which gives a bad request. No idea. --Herzi Pinki (talk) 22:38, 26 March 2021 (UTC)

@Herzi Pinki: Where? The link is obviously wrong (it should be https://commons.wikimedia.org/wiki/Special:EntityPage/M84681576), but I can’t find it anywhere on the SDC tab. —Tacsipacsi (talk) 00:09, 28 March 2021 (UTC)

if you edit the above image, you will have below: Wikidata entities used in this page where there currently is a single entry:

Tram in Kärntner Ring, at twilight, photograph taken from the back window of another tram (Vienna, Austria): Title, Other (Statements)

exhibiting the described behaviour. best --Herzi Pinki (talk) 05:37, 28 March 2021 (UTC)

@Herzi Pinki: I see. You didn’t mention edit interface, so I thought you’re referring to the page that appears if I click the link, i.e. the view (non-edit) interface. Seems like phab:T250611 and/or phab:T240358. —Tacsipacsi (talk) 20:03, 28 March 2021 (UTC)

sounds as if this is the problem. I thought as the problem is occurring for quite a while, nobody saw it until now. AGF = nobody saw it; ¬ AGF = nobody considered it important enough to care for it. (I had an issue from 2015 which was triaged last week). Thanks for identifying the suitable tasks. (I thought Wikidata entities used in this page to be independent from the view of the page, sorry) --Herzi Pinki (talk) 20:23, 28 March 2021 (UTC)

Geograph restarted as structured data upload

I restarted the Geograph upload, see Special:ListFiles/GeographBot. All metadata is stored as structured data and {{Geograph from structured data}} is used to display it in the wikitext. This way content and presentation are separated. Multichill (talk) 11:17, 10 April 2021 (UTC)

Wikitext is not presentation. -- Tuválkin ✉ ✇ 12:33, 10 April 2021 (UTC)
- Thank you for this extremely helpful and constructive feedback. Multichill (talk) 12:41, 10 April 2021 (UTC)

You’re an admin, Multichill. At least be polite. -- Tuválkin ✉ ✇ 20:23, 10 April 2021 (UTC)

SDC at GLAMHack 2021 on 16th and 17th April + can we request a Query Service dump request on a specific day?

Hi all

This coming Friday and Saturday (16th and 17th) GLAMHack 2021 (organised by Beat Estermann) is taking place (its free).

Please come and take part, play around with content and data and make something fun, you don't have to be technical (I'm not). Its a nice excuse to talk to other Wikimedia people since we can't hang out like normal.

I'm organising a Structured Data on Commons hackathon team for the event where we will play around with Commons, data and try to make new and exciting things, I'll be providing some basic SDC materials for people who are new to it. Register for free here

One of the things that would be really helpful to make this event successful is if we could request that the dump which the Commons Query Service runs off is updated on the night of the 16th so that any additional content we want to add can be queried on the second day.

Thanks very much John Cummings (talk) 20:13, 10 April 2021 (UTC)

Clothing and costume accessories

Do we have best practices for deescribing clothing and costume accessories in photos and artworks? "Wears" seems logical but causes an error. Do we prefer "shown with features"?

Compare:

Thanks - PKM (talk) 00:38, 17 April 2021 (UTC)

I've fixed the constraint of wears (P3828). However, that information shouldn't be at Commons anyway for both mentioned cases but rather at the Wikidata items for the paintings. Best --Marsupium (talk) 08:21, 17 April 2021 (UTC)

Easy enough to remove it. But are we really saying we don't want to be able to search clothing details from portraits in Commons? (This is why SDC confuses me.) - PKM (talk)

For Commons itself, the most useful thing here would be an ImageNote placed precisely on the item of clothing in question. - Jmabel ! talk 15:32, 21 April 2021 (UTC)

A painting can't wear anything! It can depict a person wearing something, but if I look at the current examples for the use of wears (P3828) it is always associated with the actual person and not with a depiction of the person. So in this case it would be "Elisabeth I" wearing the "Three Brothers jewel" on George Gower's "The Ermine Painting" Sorry, no Wikidata Item for the "Three Brothers Jewel", so I can't show how this would work. --Wuselig (talk) 18:42, 21 April 2021 (UTC)

Modeling picture taken in a certain municipality

What is the best way to model the relationship between the picture and the municipality? (see [3] for background)--So9q (talk) 06:54, 21 April 2021 (UTC)

Please let me just add a bit of context: the picture depicts a shelter which is in that municipality. The municipality has a Wikidata item, but the shelter does not and probably never will. Thanks all in advance! Syced (talk) 11:20, 21 April 2021 (UTC)

The shelters in Wikimedia Commons will all have a wikidata item if I get to decide. I have cleared it with the wikidata community. The only thing we need is an external source of truth that we can link to. One Swedish municipality Uppsala has shared their shelters as open data: https://opendata.uppsala.se/datasets/vindskydd-2. They can be imported as wikidata items because they are useful for Wikivoyage. See example use here: https://en.wikivoyage.org/wiki/H%C3%A4rn%C3%B6sand--So9q (talk) 13:02, 21 April 2021 (UTC)

Project Grant application for SDC support in OpenRefine: feedback and endorsements welcome

Hello everyone! Since 2019, it is possible to add structured data to files on Wikimedia Commons (SDC = Structured Data on Commons). But there are no very advanced and user-friendly tools yet to edit the structured data of very large and very diverse batches of files on Commons. And there is no batch upload tool yet that supports SDC.

The OpenRefine community wants to fill this gap: in the upcoming year, we would like to build brand new features in the open source OpenRefine tool, allowing batch editing and batch uploading SDC :-) As these are major new functionalities in OpenRefine, we have applied for a Project Grant. Your feedback and (if you support this plan) endorsements are very welcome. Thanks in advance, and many greetings – Pintoch (as OpenRefine developer) and SFauconnier (talk) 09:24, 16 March 2021 (UTC) (aka Spinster, as member of the OpenRefine steering committee)

QuickStatements is a batch upload tool which does work for SDC. Strobilomyces (talk) 11:40, 16 March 2021 (UTC)

Hi Strobilomyces, it is correct that you can batch edit SDC QuickStatements (with workarounds), but the last time I tried to use it, it definitely did not support uploading new files. In the project grant application we do talk about QuickStatements and other batch tools that already exist, what they can and cannot do, and why we think it would be very valuable to have extended SDC functionalities in OpenRefine. SFauconnier (talk) 13:11, 16 March 2021 (UTC)

Hi. Thank you for your answer. Uploading new files and fairly arbitrary wikitext through batch would be very useful for me. I thought that was an independent question from editing SDC. When I looked at the existing tools I thought they were too restrictive on the Wikitext which could be loaded, but perhaps I should look again. I certainly hope that your upload tool will also support Wikitext. Strobilomyces (talk) 16:04, 16 March 2021 (UTC)

Help:Gadget-ACDC is a gadget to add a collection of structured data statements to a set of files. Why do we need a new tool instead of better support of existing ones? --Schlurcher (talk) 16:09, 16 March 2021 (UTC)

I really like ACDC and use it a lot, but it runs in the browser with all the limitations and problems this has. I also used QuickStatements for Wikidata and for my usecases it was good. I never tried OpenRefine. --GPSLeo (talk) 16:45, 16 March 2021 (UTC)

I think that we are comparing apples and oranges here when we are comparing Quickstatements, SDC and Open refine. Open Refine is tool for human-assisted or automatic data matching and data conversion which can store the result to Wikidata (or to SDC i hope). I personally use python for this but I can perfectly understand why somebody would like to use some higher level tool for that. Zache (talk) 12:11, 6 May 2021 (UTC)

digital representation of (P6243)

I'm not so sure about using digital representation of (P6243). I think it's the right thing to do with digitized postcards or photographed paintings. It may also be the case with sculptures and statues in museums. However, an uncertainty begins here. For sculptures in the public area (as in File:Madonna_an_der_Aa_-_Muenster_-_2021.jpg) I find it rather inappropriate. Then one would have to treat building photos (like File:Langer Eugen.jpg) in the same way. Or automobiles. Or persons. And then the question arises, only if it is the only object or also if it is a compilation (like File:Bonn-Gronau Post Tower Schürmann-Bau Langer Eugen Luftaufnahme 2015-05.jpg). --XRay 💬 08:40, 3 May 2021 (UTC)

When WCQS will be out of beta?

Do we have any idea when WCQS would be out of beta? I think that the main feature what I would be searching is to be able query it without OAUTH. Zache (talk) 12:14, 6 May 2021 (UTC)

Thanks for checking in on this. The Search Platform team has recently created the necessary tickets for productionizing WCQS (including authentication) and will be starting this work in the next few weeks, with the aim of fully moving WCQS to production by mid-summer (actual timeline may change). MPham (WMF) (talk) 21:14, 18 May 2021 (UTC)

Media search feature suggestion (Redlinks)

In the old search if I would search for "Category:Nonexistent category" it would say "There doesn't appear to be a page with that name" and add a redlink where I was able to create this, I would like to suggest for this to also be added to the Structured Data on Wikimedia Commons (SDC) Media search feature. Overall I like it and I like how it's "a Google-clone", however, I do not like that like external search engines it is mostly content consumption based rather than content creation based (as the old search was more). In general I think that this new search engine is superior, but I do would like to see the old features be added to this one as well --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:29, 23 May 2021 (UTC)

any tools to remove bad SDC statements

Category:Artworks_with_wrong_Wikidata_item has today bunch of files with bad digital representation of (P6243). The files seem to have a mess of bad templates, and then based on them a bot added bad digital representation of (P6243). There are bunch of tools to add SDC, but is there any way to remove all P6243 from sculpture photos in this category without going into individual files? I know one way with QuickStatements tool but it is rather painful. --Jarekt (talk) 17:43, 28 May 2021 (UTC)

ACDC tool perhaps? Fuzheado (talk) 17:36, 25 June 2021 (UTC)

The impact of Suggested Edits on Commons: findings and discussion

Hi everyone. I've just posted some data, findings, and recommendations regarding Suggested Edits, which may be of interest to watchers of this page: User:Rhododendrites (WMF)/Suggested Edits. Feedback, thoughts, and questions welcome on the talk page. See also the main notification, with key points, on the village pump. --Rhododendrites (WMF) (talk) 18:26, 10 June 2021 (UTC)

Two developer positions related to OpenRefine and SDC (paid, freelance, part-time, remote)

Hello everyone! As we have been awarded the grant that I mentioned here earlier, OpenRefine will soon start developing Structured Data on Commons functionalities. For this purpose, we now have two Junior Developer job openings (paid contractor positions; part-time, fully remote). Obviously Wikimedia-savvy developers are strongly encouraged to apply:

Junior Developer - Wikimedia Development (6 months, from September 2021 till February 2022)
Junior Developer - OpenRefine Development (8 months, from November 2021 till June 2022)

All the best, SFauconnier (talk) 14:33, 9 July 2021 (UTC)

First time I ever heard of OpenRefine. One click and lo: Yes, it is yet another Google washout — couldn’t make up this stuff. That’s what all the work we’ve been doing in Wikipedia, Commons, and Wikitionary is for, folks: We create an amazingly sucessful new kind of online resource through a truely innovative form of online peer collaboation, only to have the donations it accrues being syphoned off to rescue aborted bad ideas discarded off the table of corporate incompetence. Huzzah. -- Tuválkin ✉ ✇ 15:14, 9 July 2021 (UTC)

New user script for WD linking to WCQS

See https://www.wikidata.org/wiki/User:So9q/DepictsThisSense.js I first tried to embed an iframe but got an error because of the X-HTTP-HEADER being "DENY" on WCQS so I made it a link instead, see https://www.wikidata.org/w/index.php?title=User:So9q/DepictsThisSense.js&oldid=1475881792 for a version with the error.--So9q (talk) 10:46, 8 August 2021 (UTC)

A picture is worth a thousand words

If you want a very succinct way of conveying to your average Commons user who doesn't understand what "Depicts" is about, I found it while visiting https://isa.toolforge.org/ and saw the following Commons image at the top of that page:

(click to enlarge)

That one photo really clarifies the purpose of "Depicts", at least to me. If that image was displayed via a prominent "See an example" button near where you add "Depicts" data, I think that would help reduce confusion to those unfamiliar with SDC (like me). Itsfullofstars (talk) 06:21, 7 August 2021 (UTC)

And it would be much better, if all the information would be set at the image. ;-) --XRay 💬 12:00, 8 August 2021 (UTC)

Good catch! I didn't check to see if any of the info in the image had actually been entered by anyone. Someone has recently filled in the missing info, probably due to your comment. - Itsfullofstars (talk) 17:21, 9 August 2021 (UTC)

Wikimania 2021 session

Hi everyone! There will be a session during Wikimania 2021 about Structured Data on Commons. Find the abstract, schedule, and other details for the session below.

Abstract

The development of Structured Data on Commons, or SDC, took place between 2017 and 2019. Since then, it has been an important feature to make media files on Wikimedia Commons more discoverable and available for more people, in different languages. It is transforming Commons into a multilingual, accessible, and machine-readable platform, as well as making its files and data more interconnected, findable, and reusable.

In this presentation, we plan to provide a background on Structured Data on Commons, share some GLAM-Wiki initiatives using SDC, give an update on the future direction of the project, and have a conversation about metadata modeling and the use of the depicts (P180) property.

Schedule

Introductions and background on SDC - Andrew Lih
Structured data adoption in the community - Giovanna Fontenelle
Update on future direction of SDC - Carly Bogen
Discussion about modelling and depicts (discussion will follow to Unconference - August 15 21:00 - 21:30 UTC, floor 1, table A)

Details:

Structured data on Commons: today and tomorrow

Speakers: Andrew Lih, Fiona Romeo, Giovanna Fontenelle & Carly Bogen
Date: Sunday, August 15
Time: 19:45 - 20:45 UTC
Place: Building 5
Language: English
Format: Roundtable
Assume basic understanding (90% of the community)

Find further details on the page for the page for the session. We hope you can join us! --GFontenelle (WMF) (talk) 21:08, 11 August 2021 (UTC)

How do I get the WikiData page or identifier for a file that has structured data?

Example: File:Mikoyan-Gurevich MiG-35 MAKS'2007 Pichugin.jpg Intralexical (talk) 15:41, 29 June 2021 (UTC)

[4] (action info) --XRay 💬 16:06, 29 June 2021 (UTC)

Even easier is the Concerp URI link in the Tools section in the left column. The Mxxxxx in the end is the identifier, in your example https://commons.wikimedia.org/entity/M5561527 -> M5561527. Ainali (talk) 18:13, 29 June 2021 (UTC)

Hello @Intralexical, XRay, and Ainali: ~~from my point of view, a single file does not have a unique wikidata object ID. Instead,~~ a file might have multiple wikidata objects assigned using SDC.

For example, the given image above currently has assigned the object d:Q220529. All images with this object can be found with search for haswbstatement:P180=Q220529 (motive: Q220529)

SDC als can be queried using SPARQL. Examples: User:Stefan Kühn/SDC. --M2k~dewiki (talk) 18:52, 29 June 2021 (UTC)

Just saw on Commons:SPARQL_query_service, that The only Commons specific part of the Query Service are M IDs, which are a unique identifier for each file on Commons, equivalent to Q IDs in Wikidata. --M2k~dewiki (talk) 18:57, 29 June 2021 (UTC)

@M2k~dewiki: which was exactly my answer: the M-id on Commons, which is the identifier for the file. Ainali (talk) 19:33, 29 June 2021 (UTC)

Also noted that the (least defacto standard) is that commons media id is M + page_id of the image. Ie, it can be generated from page-id:s. --Zache (talk) 14:13, 30 June 2021 (UTC)

Was the Minefield tool already mentioned? https://hay.toolforge.org/minefield/ - --OlafJanssen (talk) 14:06, 17 August 2021 (UTC)

Search structured data

Hi, Can a search feature be added whereby you can search captions?, We have an IP who thinks it's clever to to add Opel <model> to non-Opel cars[5] and my guess is that they're probably doing this to a lot of images,
I did try searching "caption / es Opel Corsa" but obviously got nothing back
Thanks, –Davey2010^Talk 14:03, 16 August 2021 (UTC)

You can search captions by using the keyword hascaption in the search bar. For example, hascaption:es Opel Corsa. Hope that helps. CBogen (WMF) (talk) 14:13, 16 August 2021 (UTC)

Brilliant thank you so much CBogen (WMF) - You're a life saver! :), Much appreciated, Thanks, –Davey2010^Talk 15:35, 16 August 2021 (UTC)

References to Structured Data on Commons

Hi everyone! In the last few weeks, a greater interest in SDC references was raised in the community. Especially on Telegram groups and on a Phabricator ticket (T230315). As part of the GLAM team, we understand that if this feature is developed it would be very useful to have more GLAMs sharing their content on wiki, by allowing a better provenance system and for campaigns, such as 1Lib1Ref.

We already have the answer that some users find this important, through the ticket on the phabricator, but the Structured Data team would like to better understand if this is, in fact, the interest of Commons users as well. Please, share your thoughts here and let it be known if that is in fact true and we might have this feature developed at some point. Thanks! -- GFontenelle (WMF) (talk) 21:00, 18 May 2021 (UTC)

Feedback from Andrew Lih

This issue has become a point of discussion largely because of my recent proposal for METbot on Wikidata to add depicts statements, wholesale, from The Met Museum's tagging project. So I will try to lay out the full spectrum of issues as I see them now. It's pretty vast, so buckle up.

Summary: To be responsible adders of depiction info into both Wikidata and Commons, we wanted to add attribution information to the depicts (P180) statements. We had two choices:

Reference/sources statement: determination method (P459) -> Metropolitan Museum of Art Tagging Initiative (Q106429444) (example)

Some others have suggested stated in (P248) as an alternative, which would make this differ from the qualifier approach below.

Qualifier: determination method (P459) -> Metropolitan Museum of Art Tagging Initiative (Q106429444) (example)

Issue: Some highly notable artworks from The Met might have a Wikidata item, so it makes sense to add this depiction information (and attribution info) in Wikidata. But if an artwork is not highly notable, the artwork would have a file in Commons, and via SDC, we could add the depiction info (and attribution info) there. Here's the problem - SDC doesn't have reference statements, or at least they are not visible in the UI. Since SDC is running Wikibase on the back-end, you actually can add reference statements to SDC via a tool like Quickstatements (example) or via API. But ordinary users have no way of viewing or editing these. For all practical purposes, they are invisible, and not useful, in Commons.

Solutions: After some discussion, we had some choices:

Qualifier only. Do we just stick to the qualifier solution for both Commons and Wikidata? That would make it consistent across both projects and easier to do Wikibase Cirrus Search and SPARQL queries. But the Wikidata METbot discussion showed some folks were not happy with this, and that using the reference statement was a Wikidata practice that should be adhered to.
Qualifier on Commons; Reference on Wikidata. The downside here is that we would have a system where Wikidata does this one way, and SDC, with effectively no reference statements, would need to do it by another. Being able to search across Wikidata and SDC for all attributed P180 statements would be complicated by the fact that we have two different practices. This would be true not just for The Met, but for any mass additions of P180 statements across the projects.
Qualifier AND References on Wikidata; Qualifers on Commons. This is the maximalist approach, of trading off data being replicated in multiple places (generally an undesirable practice in data science) for user-friendliness (being more discoverable and expected across projects). To be clear, we have our fair share of duplicated data in Wikidata in the service of usability. For example, as a best practice, we repeat collection (P195)/inventory number (P217) and the inversion of that with inventory number (P217)/collection (P195) in the interests of discoverability. In general, if there is a pragmatic reason for a decision in the interest of usability, we don't necessarily stay completely strict about the data modeling. Would a double addition of a reference and qualifier statement fall into this category?

There's no easy answer to these. But here is a possible summary matrix:


Approach for P180 attribution	Wikidata	Commons	User experience	Challenges
Reference statement only	Good	Bad	Not practical, as references are functionally invisible on Commons	Not practical, as Commons has no reference statements
Qualifier statement only	Fair/Bad	Good	Good for Commons, but inconsistent with expected Wikidata practices	Wikidata users expect attribution in references
Qualifier and reference statements	Good/Fair	Good	Metadata found in expected places for all	Keeping replicated data consistent across projects

Other factors. Here are some other things to factor into the decision making process:

Qualifiers have elevated status and references do not. Qualifiers added to SDC and Wikidata are indexed by Wikibase Cirrus Search, whereas reference statements are not, making them second class citizens in Wikibase. With qualifiers, very fast lookups are possible with the basic search interface. That is, with a qualifier approach entering this into the search box on Commons or Wikidata: haswbstatement:P180=Q2934[P459=Q106429444] works fast and precisely to return all goats depicted in artworks, according to The Met tagging project. If this was only a reference statement, you would need to use the full SPARQL query engine.

Scalable approaches for future attribution metadata. Right now most of the discussions on how to properly attribute metadata have been focused around a particular database (a GLAM API or database dump) or a reference URL (a published web page). But we have already encountered situations where depiction information is being added by a "tool" and not a firm "reference" per se, such as via the Wikidata Distributed Game, ISA, Wiki Art Depiction Explorer, Wikidata Image Positions or even through AI/machine learning engines. Does that affect how we decide how to model attribution metadata? In these cases, is a "reference" statement the right direction? This may argue for a more general qualifier as a way to record "how" P180 was arrived at, and not just "who said that."

An extended conversation around this issue can be found here at the bot proposal page and in the GLAM newsletter:

The Future. As a more general observation - the more we put off trying to find a long-term, cooperative decision making process for Commons and Wikidata modeling practices, the worse we are going to be. Those of us working on GLAM topics tend to be the ones first encountering the harsh realities in this area. We are the ones doing the most with object records and mass systematic uploads of images with rich metadata, with one foot in each domain of Commons and Wikidata.

Thanks. -- Fuzheado (talk) 15:49, 19 May 2021 (UTC)

To be clear, the above post is primarily documenting the dynamics we see today as I try to navigate how to implement a solution that's useful right now. It is not necessarily arguing for or against a particular implementation of references in SDC in the immediate future. But hopefully, it does shine a light on the difficulties of dealing with this. Myself: I'm somewhat undecided on next steps. Working with references in Wikidata/SPARQL has always been a bit obtuse (using prov:wasDerivedFrom), and since these are not indexed by Wikibase Cirrus Search, I'm unclear about the future of this approach. -- Fuzheado (talk) 16:17, 19 May 2021 (UTC)

Feedback from John Cummings

Copied from Phabricator so its in the same place as other's feedback

I think this would be useful for a number of reasons, I'm basing this on 10 years of working as Wikimedian in Residence for cultural institutions, UN agencies and parts of the EU. The main use case is from my perspective is for any content created by external organisations, which runs to 10s of millions of files on Commons. Many of these organisations share quite extensive metadata with their content way beyond depicts, copyright and author. The main benefits I see are the same as for references on Wikipedia, verifiability and credit.

Wikipedia Allowing users to know that the metadata comes from an organisation creates a level of trust in the information. I think SDC could be widely used and useful on Wikipedia but without references to provide verifiability it seems unlikely it will get used, in the same way Wikidata data without references are blocked on English Wikipedia infoboxes in a lot of situations. Another benefit for Wikipedia specifically is to make creating Wikipedia articles for things depicted on Commons (eg an object in a museum) easier because the references which are collated in SDC can most probably be reused on Wikipedia.

Organisations sharing content Many organisations adopt an open license specifically so they can share it on Wikimedia projects, most of my job in the UN the last 5 years has been around helping orgs adopt open licenses. Generally speaking organisations who share content on Commons want recognition and metrics around page views and a clear delineation between their content and Wikimedia community contributions to avoid confusion from readers. Have references in SDC will give the organisations credit for the metadata they share and reduce concerns about their content being confused with community contributions which may be incorrect. It will also encourage them to start using Wikidata and SDC on their own website eg providing multilingual labels. There's an extra barrier to them adopting open licenses with the CC0 license for SDC statements, generally organisations are willing to share under CC BY or SA for content but CC0 is difficult because is doesn't by its nature give them credit for their content. We get around this with Wikidata because we can say 'there will be references so people can see you added this data'. Generally speaking 'please can you spend a significant amount of time to understand and change your license so you can share your content with us, we won't give you credit for any of it' is really not going to work.

John Cummings (talk) 10:21, 20 May 2021 (UTC)

An additional thought that's not on Phabricator, WMF have just announced the new Commons search has been implemented which uses SDC to suggest search results to help people find information. Directing people to information they cannot verify because there are no references doesn't feel very great. John Cummings (talk) 10:34, 20 May 2021 (UTC)

Feedback from PKM

I agree we should capture references in SDC, and if we're going to do that, I would prefer us to add a proper UI for entering and editing references as we do in Wikidata. - PKM (talk) 20:56, 20 May 2021 (UTC)

Other priorities first

If I look at the current state of SDC, it's immature and developing at best. Time and effort should be invested to make it grow for example by making files more findable and making it easier to curate files. Commons doesn't really use references at the moment so I'm very reluctant to adding a new feature while we're still sorting out the basics. One thing with setting priorities is to assign high priority to some things and lower priority to other things. I don't think we should give this priority. Is this feature really helping Commons? I'm not convinced. Multichill (talk) 18:32, 21 May 2021 (UTC)

I would understand that to some extent, if the level of effort is high—but from what I am understanding, it wouldn't necessarily be a heavy lift to implement this suggestion, since the feature is already standard in Wikidata. If that is the case, I don't really see how implementing it would be taking the focus away from higher priority tasks (whatever that might be). Dominic (talk) 00:37, 26 May 2021 (UTC)

Feedback from Dominic Byrd-McDevitt

I agree strongly with Andrew's assessment, in which he largely focuses on issues around attribution and standardization, but I want to add a couple more potential benefits for references in SDC. First, it's worth pointing out that references also help user-contributed statements stand out from those added by bot from a data source, and help reduce potential points of conflict between the Wikimedia community and institutions. If I am importing descriptive metadata from a repository, I would prefer to reference that source URL (and institution as publisher) in all the statements associated with that import. When synchronizing the metadata in the future, I would then only make changes to statements using that repository as reference, to ensure I never remove or overwrite an editor's contribution. This is how the Cleveland Museum of Art's bot on Wikidata is programmed to work, for example. The other benefit of references is that they typically (for URLs) use an "access date" qualifier, which allows users to assess how recently the statement was checked against the source, and is also useful for workflows when bulk updating from a source (like institutional metadata) that can change at any time. Dominic (talk) 13:58, 1 June 2021 (UTC)

Initial assessment

Thank you very much to you all for sharing your feedback! We discussed this issue yesterday and made an initial assessment, both in terms of effort and concrete actions. I don't think we can provide you with a definitive answer about it yet, but we're evaluating it.

What we assessed for now is that it is a complex task, because it depends heavily on which kind of inputs we're dealing with, and will also require some significant re-design before we can accurately estimate the workload, and prioritize it among the other significant projects in our backlog. We can promise you an update in the coming weeks. Sannita (WMF) (talk) 17:58, 23 June 2021 (UTC)

Update

Tracked in Phabricator
Task T230315

The Structured Data team will be working on implementing SDC references in the coming weeks. You can follow along with designs and updates on Phabricator. CBogen (WMF) (talk) 15:34, 18 August 2021 (UTC)

Capturing original captions attached to illustrations

How would one capture the concept of the original caption of an image? For example, consider the file File:The Chaldean Account of Genesis (1876) - illustration - page after 306.png. The original caption from 1876 is Oannes. From Nimroud Sculpture. However, the caption at Commons is more likely to be something like "Illustration from page after 306 of The Chaldean Account of Genesis depicting Oannes."

Also, some images have an "id" in the original work like "Figure 1", which is a little difference from a caption.

What is the correct SDC to use here? Inductiveload (talk) 14:39, 19 August 2021 (UTC)

Add title (P1476) with the caption. The file caption "Illustration ..." is still the file (!) caption, not the subject caption. --XRay 💬 15:03, 19 August 2021 (UTC)

@XRay: thank you for the clarification!

What is the "perfect" subject caption (i.e. entry in the "captions" list in the SDC File Information" tab) in this case?

And is there a specific property for the text like "Figure 1"? Inductiveload (talk) 15:23, 19 August 2021 (UTC)

There is a good documentation: Commons:File captions. For me (as photographer) the photograph has to be described first. The subject or the subjects are part of the description. IMO SDC is the way to describe the photograph and not (only) one of the subjects. There is still a discussion who to model the data: Commons:Structured data/Modeling/Depiction. --XRay 💬 17:14, 19 August 2021 (UTC)

@XRay: then I guess that "Illustration, from page after 306 of The Chaldean Account of Genesis, depicting Oannes." is a tolerable caption in this case? Inductiveload (talk) 17:26, 19 August 2021 (UTC)

How should I judge that? From my point of view, that's fine. --XRay 💬 17:42, 19 August 2021 (UTC)

@XRay: I'm looking for a fairly definitive statement of best practices so I can write documentation for "ideal" SDC for this kind of image. I don't know who to ask or where to ask except here. Inductiveload (talk) 17:57, 19 August 2021 (UTC)

Add structured data from an Excel sheet

Hi, I just finished a script for adding structured data to files on Commons from an Excel sheet, see https://github.com/KBNLwikimedia/SDoC/tree/main/writeSDoCfromExcel

I wrote this because I couldn't find an exisitng bulk script/tool that works with Excel (but I might have overlooked that, any pointers welcome), and OpenRefine does not yet seem to support SDoC.

I was wondering if it is suitable to be included on the SDoC homepage, eg under the heading "Tools to add structured data to files"

Best, --OlafJanssen (talk) 14:00, 17 August 2021 (UTC)

Although i'm not an Excel user, i'm sure this is really useful for people. Thanks!

Husky ^{(talk to me)} 10:47, 26 August 2021 (UTC)

Depictor

Hey everyone, i've made a new tool that lets you easily add depicts statements using a game-like interface. Currently the default is to give you people from a random birth year that have a connected Commons category and media files without a depicts statement. But you're able to customise it using specific QID numbers, categories or you can even write a custom SPARQL query. It contains a leaderboard and keyboard shortcuts and it's designed to be used on mobile, so you can add statements when waiting on the bus or when you've got a couple of minutes to spare.

Try Depictor here.

Husky ^{(talk to me)} 10:50, 26 August 2021 (UTC)

Unknown creators

At Wikidata:Wikidata:WikiProject_Visual_arts/Item_structure#Use_of_creator_(P170)_in_uncertain_cases, it's instructed to use the following construction in the case of "unknown" creators:

creator (P170) → unknown value
- object has role (P3831) → anonymous (Q4233718)

However, there is no option for unknown value when setting creator (P170) at Commons: phab:F34606911. How should this information be captured? This is a very common case when the author of an image is not stated in the source material (for example the illustration could either be by the author, or produced by an anonymous contract artist for the publisher). Inductiveload (talk) 18:04, 19 August 2021 (UTC)

According to phab:T239172 it sounds like one uses some value for this case? Presumably object has role (P3831) → anonymous (Q4233718) still applies as a qualifier? Inductiveload (talk) 18:22, 19 August 2021 (UTC)

@Inductiveload: the concept is "somevalue" which has "unknown value" as label on Wikidata and "some value" as label here. The "somevalue" means that we don't have an item for the concept or we don't know what the item is for the concept. It's used a lot on Commons see this example. Multichill (talk) 17:21, 17 September 2021 (UTC)

Extracting structured data from a file's Artwork template and re-uploading as WD statements (using OpenRefine?)

Hello, I work at Auckland Museum, and we're running a number of ISA campaigns at the moment. I'm trying to extract the results of that work, partly in order to round-trip the data back to our system.

I worked up this SPARQL query with some help from people over at Request a query:

SPARQL query for all files in '1930s photographs in Auckland Museum' category

SELECT ?pageid ?file ?title
(GROUP_CONCAT(DISTINCT ?mi_label;separator="; ") as ?miLabel)
(GROUP_CONCAT(DISTINCT ?en_label;separator="; ") as ?enLabel)

WITH {
  SELECT * WHERE {
    SERVICE wikibase:mwapi
    {
      bd:serviceParam wikibase:api "Generator" .
      bd:serviceParam wikibase:endpoint "commons.wikimedia.org" .
      bd:serviceParam mwapi:gcmtitle "Category:1930s photographs in Auckland Museum" .
      bd:serviceParam mwapi:generator "categorymembers" .
      bd:serviceParam mwapi:gcmtype "file" .
      bd:serviceParam mwapi:gcmlimit "max" .
      ?title wikibase:apiOutput mwapi:title .
      ?pageid wikibase:apiOutput "@pageid" .
    }
    BIND (URI(CONCAT('https://commons.wikimedia.org/entity/M', ?pageid)) AS ?file)
  }
} AS %files

WITH {
  SELECT ?file ?title ?depicts WHERE {
    INCLUDE %files .
    ?file wdt:P180 ?depicts .
  }
} AS %depictions

WHERE {
  INCLUDE %depictions .
  INCLUDE %files .
  SERVICE <https://query.wikidata.org/sparql> {
      OPTIONAL {?depicts rdfs:label ?mi_label . filter(lang(?mi_label) = "mi")}
      OPTIONAL {?depicts rdfs:label ?en_label . filter(lang(?en_label) = "en")}
  }
}

GROUP BY ?pageid ?file ?title

Try it!

It obtains all the wdt:P180 labels for files tagged using the ISA Tool for campaign #150. I've loaded the results into OpenRefine, where I hope to be able to extract the contents of each page's {{Artwork}} template within its filedesc, using the parse API (action=parse) with prop=wikitext.

Using Procession on rural road led by musicians in kilts (AM 81493-1).jpg (ID 65558537; API call) as an example:

Wikitext from API call

{"*":"== {{int:filedesc}} ==\n{{Artwork\n| description = {{en|1=Bridal party is entering through a small door. The bride is led by her father (possibly). There are women gathered who are not bridesmaids.}}\n| title = (Bridal party entering a church on wedding day)\n| artist = Collins, Tudor Washington, 1898-1970, photographer\n| date = {{other date|circa|1930}}s.\n| place of creation = \n| source = {{Images from Auckland Museum|section=library|object=photography|id=74819}}\n[http://api.aucklandmuseum.com/id/media/p/b17463186a8257c65a2b0c8b5f1ec90d412d3180 Photo]\n| accession number = 74819 (object number)\n| object type = \n| technique = Silver gelatin dry plate\n| dimensions = \n| institution = {{Institution:Auckland War Memorial Museum}}\n| permission = This image has been released as \"CCBY\" by Auckland Museum. For details refer to the [[Commons:Batch_uploading/AucklandMuseumCCBY|Commons project page]].\n| credit line = \n| notes = \n| other_versions = \n<gallery>\nBridal party entering a church on wedding day (AM 74819-2).jpg\n</gallery>\n}}\n\n== {{int:license-header}} ==\n{{CC-BY-4.0|1=Auckland Museum}}\n\n[[Category:Images uploaded by Fæ]]\n[[Category:Wedding photographs in Auckland Museum]]\n[[Category:1930s photographs in Auckland Museum]]\n[[Category:Tudor Washington Collins]]\n[[Category:Images from Auckland Museum]]"}

This Wikitext contains within its Artwork template another template, {{Images from Auckland Museum}}, which contains some data I need: section (library), object (photography), and id (81493). (The first two of these are misnamed: "section" ought to be "department", and "object" ought to be called "collecting area". I think if I were to change these on the template it would mess things up though.)

Elsewhere in the Artwork template are various other data points such as the production/publication date, a description, and the accession number (which in this case, for internal administrative reasons at the Museum, is the same as the ID, and, to digress, is unique only within its object dataset—for example, there is an iD 81493 in our manuscripts collection as well).

My questions are:

Does anyone have experience parsing Wikitext such as this in OpenRefine? Do I just need to use regex?
Can this data be added back as structured data? For example, could the description be added as the caption?
We have an Auckland Museum person ID (P7298), but not one for objects in our collection yet. ~~Should I propose that one be created, so that the data from the Artwork template, such as the creation date, department, collecting area, and accession number, can be added?~~ (I've started a discussion about the property).
There is data in our system that is not in the Artwork template, such as the Reference Number.
There are bound to be discrepancies between what is in Commons and what is in our system, for example modified or newly added descriptions, or works for which we have identified creators since the upload to Commons.

Finally, forgive my ignorance, but since Commons files themselves don't have corresponding Wikidata entities, how is this structured data, which lives in Wikidata, connected to them?

Thank you. — Hugh (talk) 00:40, 14 September 2021 (UTC)

Tracked in Phabricator
Task T289971

Kia ora @HughLilly: very nice to meet you! 😄 Thank you so much for this detailed and well-informed and insightful question!

I'm Sandra and I work with OpenRefine on development of Structured Data on Commons support in the tool (funded by a Wikimedia Foundation grant). Basically, what you would like to do is indeed convoluted today, but will be possible around January-February 2022. We are actively working on brand new functionality that makes it possible to retrieve Wikitext from a set of Commons files, massage that Wikitext, and translate it to structured data. It is not yet possible to edit the structured data of Wikimedia Commons files with OpenRefine, but we will make that possible as well.

Unfortunately this means that (to my knowledge) today, you indeed still need to work in the convoluted way you describe (via API and regex and data wrangling)... maybe others have tried it before and can give you some more detailed tips.

And re: Finally, forgive my ignorance, but since Commons files themselves don't have corresponding Wikidata entities, how is this structured data, which lives in Wikidata, connected to them? - that is a very sensible and good question. It basically works through federation: every file on Wikimedia Commons has its own entity (M-id) which points to Wikidata entities through statements stored here on Wikimedia Commons. This blog post explains a bit more. And this image is slightly outdated but should also show the general principle. (I hope this does answer your question - if not, feel free to ask further!)

Again, very nice to meet you. Don't hesitate to ping and/or contact me with any further questions. SFauconnier (talk) 08:19, 15 September 2021 (UTC)

Tēnā koe, @SFauconnier: ! Nice to meet you too. Thanks for the great reply here, and good to follow you on Twitter now too.

I had seen the proposal (and job ads) about adding SDC to OpenRefine, and thought this would, when it's released, be the only way to do what I want. I would love to get involved in any way I can—is there a way I could test early versions of OR, for example?

Thanks for the link to the blog post and image; that does clear it up nicely. There are 131 of our collection items in WD; I hadn't realised so many of them already have WD entities, and some aren't connected to images, which I am sure exist in Commons. Perhaps there is some work I can do there too, in the meantime. There's definitely structured data for them that exists only in the Artwork template; maybe I'll move it over.

I like the example of File:Gottfried Lindauer (1839-1926) - Terewai Horomona (b.1866) - RCIN 406702 - Royal Collection.jpg, where the WMC page simply has {{Artwork}}, and all the data is controlled in WD.

(As an aside, I have opened a feature request on the openrefine-wikibase reconciliation service repo regarding filtering claims by language.)

Thanks again, and great to know work is being done to connect the dots. I'm looking forward to your talk (which thankfully is at a reasonable hour for me—7pm!).— Hugh (talk) 06:03, 16 September 2021 (UTC)

Hello @HughLilly: Thanks for replying! It would actually be really great if we can ask you to help us with feedback and testing every now and then. I have just created a signup page for folks who want to be notified about that - feel free to subscribe there.

I also like the example you are providing with the very simple {{Artwork}} template. We are definitely considering to encourage use of such simple templates in the new OpenRefine workflow. FWIW, there's a (very drafty) document in which we are already collecting some thoughts and examples. If you like, feel free to take a look and add feedback!

All the best! SFauconnier (talk) 08:28, 21 September 2021 (UTC)

Thanks, @SFauconnier; I've signed up. I watched your talk (on replay) and found it useful—although I did already know quite a lot about OpenRefine. The document is very interesting and I will set some time aside later in the week; I'm sure I will have some suggestions. Yes, the bare Artwork template with all data in WD/SDC certainly looks the gold standard to aim for. We at the museum are very excited by all the work in this area! We've just discovered the locator-tool (example file) and it seems promising, though not as user-friendly for entry-level users, perhaps, as ISA. —Hugh (talk) 01:19, 22 September 2021 (UTC)

p.s. do you know anything about the Gadget-TabularImportExport.js tool? I've activated it but it doesn't appear in my sidebar. I may try to contact the author as it seems like it would be useful. —Hugh (talk) 01:20, 22 September 2021 (UTC)

MediaSearch functions (All pages Vs. specific pages)

One thing to note about the MediaSearch search engine is that it appears very image-centric, for example a random search will immediately showcase images first, which is generally fine, but it might be wise to create an "All pages" category, and then change the default search for registered users to "All pages". While I really like the MediaSearch search engine I think that it could be improved in a number of ways.

A couple of days ago or perhaps last week I came across the old "Wikimedia Discovery" search engine concept (or the "Knowledge Engine"), which was unfortunately scrapped over a transparency scandal. But its GUI seems to be a bit more user friendly than what we have now. We could try to implement more features that help users "discover" more content like having special tabs for "Nearby" (which I believe is already an option or button somewhere), having a page for ~~nee~~ new Quality Images (QI's) and Valued Images (VI's), as well as having "personalisable tabs" in the future such as the ability to add "Books", "Newspapers", and other specific media (these would be accessible in settings for registered users and cookies for unregistered users). --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:23, 4 October 2021 (UTC)

Regarding the finding of "books" and "newspapers" and other categories, Wikidata has the option "this is an instance of X", users can simply define a PDF file as being a "book" and it would then be searchable. The current PDF search simply includes all files with that file type and doesn't discriminate based on anything else. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 21:27, 4 October 2021 (UTC)

AbuseFilter and SDC

As far as I can see AbuseFiter does not support SDC in any way. On the other hand, from my experience I can say that we have a huge problem with IP editors – it is hard to find a constructive SDC related edit made by an IP. Usually they remove valid data or add nonsensical data (too general at best). Do you know if ABF is going to somehow support SDC? Perhaps SDC team is going to take different measures, if any, to solve the problem of IP vandalism which seems to consume a lot of volunteer's work? --jdx ^Re: 05:08, 11 November 2021 (UTC)

Special:AbuseFilter/216 seems to be on SDC? Jean-Fred (talk) 12:27, 11 November 2021 (UTC)

Yes, but it is just workaround which relies on appropriately formatted summary. Will not work if a vandal uses e.g undo or popups. Or a custom tool. --jdx ^Re: 12:57, 11 November 2021 (UTC)

AbuseFilter was originally designed to work with "single-slot page" wikitext (which is e.g. parseable using regular expressions), but SDC are stored as JSON on a "multi-slot page". There hacks which make it possible to parse SDC within AbuseFilters, but it looks very dirty. --Matěj Suchánek (talk) 11:19, 14 November 2021 (UTC)

Statistics

Do we have any statistics regarding SD? Do we have any tool to browse SD statistics?--Juandev (talk) 21:43, 14 November 2021 (UTC)

Image description

The use of this template appears to prevent adding in descriptions from Geograph which are not picked up when the file is transferred. Compare this upload with the original and you will see that there are three lines of text description below the image title which are missing and it is not clear how they can be added manually. Lamberhurst (talk) 12:38, 16 November 2021 (UTC)

@Lamberhurst: that's intentional. For some images this contains whole essays not relevant to Commons. Also not sure how relevant it is in this case. You can always update the caption to improve it. Multichill (talk) 14:30, 24 December 2021 (UTC)

Property for storing the software used for creating this file

So far, I've uploaded three XCF files (link link link) that I created using GIMP. My question is: Is there any property for stating that those XCF files were created using GIMP?

A user that creates a SVG/PNG/JPG file using Inkscape, Krita or other graphics software might have the same question.

Rdrg109 (talk) 22:36, 16 December 2021 (UTC)

CC-0 conflicts with PD statement

Hi, Thousands of images from the Cleveland Museum of Art were added (great!) by bots and people (e.g. File:Alfred Stieglitz - Spring Showers, New York - Cleveland Museum of Art.jpg). However there are a number of issues:

A new wikidata item was created for each of the work of art creating many duplicates in WD (bad);
The images were added as image (P18), but the WD Q number was not added in the Artwork template (bad);
These files are usually not linked to the creator, either with a Creator template or a category (bad);
These images were release as CC-0 by the museum, so a CC-0 license was added to each file (good, specially for 3D works). For 2D works, it would have been better to also add a PD template corresponding to the status of the work. A CC-0 statement was added in SDC, but this conflicts with a PD statement (bad).

@Multichill and Madreiling: What's the solution for these conflicting statements? Can this be fixed by a bot? Thanks, Yann (talk) 10:14, 13 November 2021 (UTC)

I think the problem here is not so much conflicting statements, as differing targets of the statements. SDC is, in practice, usually conflating the media file and the depicted work for images from museum collections. In this case, the museum is trying to say that the digital representation is released under CC0 by the institution. This causes property constraint violations, because the Wikidata property (correctly, from a legal perspective) thinks you can't use a copyright license on something that is simultaneously described as in the public domain. Dominic (talk) 00:46, 7 December 2021 (UTC)

@Dominic: Basically, the museum statement is useless, as they don't own the copyright. IMO, it would be better to show the real copyright status of these files (PD because of law) rather that the museum statement. Regards, Yann (talk) 22:35, 24 December 2021 (UTC)

@Yann: we encourage GLAMs to release all their images under CC0. For 3D works because otherwise we can't use it, for 2D works as a fallback to remove any doubt that the image is in fact free.

When you running into a 2D work which is in the public domain and it has only CC0, you can wrap it in {{Licensed-PD-Art}} (example). This way you can add the appropriate PD-art tag and you preserve the original license as a fallback. We've been doing it like this in wikitext for years.

SDC generally follows established practice, just tries to model it as data. So don't remove the existing copyright status (P6216) -> copyrighted, dedicated to the public domain by copyright holder (Q88088423) & copyright license (P275) -> Creative Commons CC0 License (Q6938433) statements, but add an additional copyright status (P6216) -> public domain (Q19652) statement as "Prominent" (preferred rank). Just like in the wikitext, we override the copyright status. We do have to add some determination method (P459) qualifiers. I have to search what to put in these again. Multichill (talk) 12:40, 28 December 2021 (UTC)

@Multichill: OK thanks. That's clear. However it doesn't work. We can't have 2 copyright status (P6216) for one file. Error message: This property should only have a single value with the same set of qualifiers for these properties. Yann (talk) 12:52, 28 December 2021 (UTC)

@Yann: yeah, that was the one I was still looking for. Multichill (talk) 13:12, 28 December 2021 (UTC)

Request for Comments: adding description metadata to map images

The case:

We have in Commons a set of Spain's National Topographic Map (Q5995033), 7847 ones exactly,
In WD we have 6980 municipality of Spain (Q2074737) elements.
thanks to the Geographic Nomenclature of Municipalities and Population Entities (Q95877977) dataset we can map Spanish municipalities with their related map
after some discussion we concluded it has no sense to create new elements for the maps in Wikidata and instead do the interlinking in SDC.
so I've made a proof of concept project with OpenRefine:
- QS batch: 71573
- example: File:MTN25-0970c2-2009-El Hacho.jpg
- added metadata for the image
relevant properties used:

The question:

How do you see it? Is this a correct practice for SDC?

I could make the upload right now (in practice applied to near 3000 files, the most recent releases of each map zone) but I would want feedback as it is my first SDC project.

Thanks. —Ismael Olea (talk) 13:30, 24 December 2021 (UTC)

@Olea: thanks for bringing this up here. This is very useful to finally get Commons:Structured data/Modeling/Maps filled.

I noticed attribution only license (Q98923445) being used and traced that to here. Can you remove that one while you're at it?
depicts (P180) usage looks correct to me, but digital representation of (P6243) seems incorrect. This is not a "Representación digital fiel de un objeto u obra bidimensional." of a municipality.
You should probably use named place on map (P9664) too.
Not sure about the current instance of (P31) targets. topographic map (Q216526) seems correct, but why add version, edition or translation (Q3331189)? Some constraint tripping up?
Instead of part of (P361) you should probably only use part of the series (P179) as this appears to be a series of maps
You added language of work or name (P407) twice
Not sure about the contents of main subject (P921). Isn't geography (Q1071) & Spain (Q29) a bit broad?
I would add date depicted (P2913) too. Very useful property in the context of maps.
Coordinate properties seem to be missing. Do you have something in the metadata?
Maybe we should also record something about the projection used?

Multichill (talk) 14:14, 24 December 2021 (UTC)

(edition conflict with Multichill) @—Ismael Olea:

I typically use named place on map (P9664) to link places included ("named") in a particular map sheet (towns, mountains, mountain ranges, natural parks, and, even, buildings, rivers and so on (subclasses of geographical feature (Q618123)). But whether in National Topographic Map (Q5995033) map sheets are "mentioned" the localities or the municipalities is a moot question. I typically use in these maps (when populated places and P9664 are involved) the item of the locality rather than the one belonging to the municipality.
Using depicts (P180) in map files is a moot question for me. IMHO when places are "explicitly mentioned", P9964 is a better choice. Maybe, if the approach of using "P9964" with localities is followed, it would be OK using P180 with the municipalities depicted, instead.
I would not use digital representation of (P6243), since IMHO it would necessary linking the item of the particular "map/work" (File:MTN25-0914c3-2012-Benejuzar.jpg is a digital representation (P6243) of "MTN25, hoja 0914, cuadrante 3, year 2012 edition" (these items do not exist so far).
WRT main subject (P921), that property could probably be used to link the item of the map sheet title.

Regards. Strakhov (talk) 14:29, 24 December 2021 (UTC)

It's a great idea to relate MTN25 maps to geographical information from IGN.
My personal wish list is an easy link from municipalities to maps where they are included even if no populated place is located on that map. The reason why is that many times what is been looked for is not a populated place, but something else. For instance, you want to know about "Partida del Alimoche" and you know that it is in Campillo de Llerena. It is dificult to find. But if you knew that Campillo de Llerena municipality is covered by several MTN25 maps, you could start searching for it. One problem is that the map where El Alimoche is placed [6] does not include any populated place (as listed by IGN, I'm sure some people live in the scatered cortijos, but they are nor on the lists). So I wish we had a P on Wikidata saying Campillo de Llerena municipality is covered by all these maps in Commons; an inverted P9664. Or something like that: a few clicks, no Computese writing.
I think that all editions of MTN25 should be included as they are useful historical sources. For instance, comparing the mentioned map with its 1999 edition it can be seen that some access lanes have developed around Córdoba-Badajoz-Portugal Gas Pipeline. In urban and suburban areas evolutions can be far more significant.
B25es (talk) 10:53, 25 December 2021 (UTC)

As mentioned my method only works processing elements with the INE municipality code (P772) property set. Any other approach should be by hand, AFAIK. Sorry. —Ismael Olea (talk) 11:09, 25 December 2021 (UTC)

@Multichill: , @Strakhov: I'll answer both.

* I noticed attribution only license (Q98923445) being used and traced that to here. Can you remove that one while you're at it?

Not sure to understand. If you mean to remove from the template, I don't think I could it with QuickStatements. But if you mean not to add as SDC, yes, I could, but I think it's not the goal, am I right?

* depicts (P180) usage looks correct to me, but digital representation of (P6243) seems incorrect. This is not a "Representación digital fiel de un objeto u obra bidimensional." of a municipality.

Ok.

* You should probably use named place on map (P9664) too.

Ok.

* Not sure about the current instance of (P31) targets. topographic map (Q216526) seems correct, but why add version, edition or translation (Q3331189)? Some constraint tripping up?

I really don't have an strong opinion on neither both. The idea is to describe the original map, which is a printed edition. But really don't know what is the best practice in SDC.

* Instead of part of (P361) you should probably only use part of the series (P179) as this appears to be a series of maps

Ok.

* You added language of work or name (P407) twice

Ok.

* Not sure about the contents of main subject (P921). Isn't geography (Q1071) & Spain (Q29) a bit broad?

I wasn't sure about how broad should be. I feel very confortable removing those. Ok.

* I would add date depicted (P2913) too. Very useful property in the context of maps.

I could use it, but it would be the same date used at publication date (P577). Not sure about keep using point in time (P585). So, what would be the better practice?

* Coordinate properties seem to be missing. Do you have something in the metadata?

If you are thinking in the coordinates of the digitized map, no. But if you mean the localities stated in depicts (P180), yes, I could.

* Maybe we should also record something about the projection used?

I think I have the information but I can't find any WD property related with map projections :-m

* I typically use named place on map (P9664) to link places included ("named") in a particular map sheet (towns, mountains, mountain ranges, natural parks, and, even, buildings, rivers and so on (subclasses of geographical feature (Q618123)). But whether in National Topographic Map (Q5995033) map sheets are "mentioned" the localities or the municipalities is a moot question. I typically use in these maps (when populated places and P9664 are involved) the item of the locality rather than the one belonging to the municipality.

There is no problem linking Spain localities using Geographic Nomenclature of Municipalities and Population Entities (Q95877977) as long each element includes INE municipality code (P772)

For geographic names is more tricky, the Basic Geographical Gazetteer of Spain (Q106767497) maps each entry to their MTN map and uses their own ID's. The problem here is AFAIK there is no way to relate that ID with WD elements.

* [...] if the approach of using "P9964" with localities is followed, it would be OK using P180 with the municipalities depicted, instead.

I really agree.

* I would not use digital representation of (P6243), since IMHO it would necessary linking the item of the particular "map/work" (File:MTN25-0914c3-2012-Benejuzar.jpg is a digital representation (P6243) of "MTN25, hoja 0914, cuadrante 3, year 2012 edition" (these items do not exist so far).

Ok.

* WRT main subject (P921), that property could probably be used to link the item of the map sheet title.

I disagree. The map name seems to be more or less arbitrary. In this precise example the map is the main representation of Alamedilla (Q1631944) municipality but it's named for a really small locality.

I've updated the SDC according my answers —Ismael Olea (talk) 11:07, 25 December 2021 (UTC)

@Olea: I'd say most of "localities in Spain" items (unlike the municipality ones) do not have a INE municipality code (P772) in Wikidata, if that's what you meant by "as long each element includes código INE (P772)". Related to this, there's an unsolved dilemma pending in Wikidata since 2018 ([7]). De hecho ha pasado tanto tiempo que creo que yo mismo he cambiado de opinión desde entonces. Strakhov (talk) 11:20, 25 December 2021 (UTC)

Yep, the coding of P772 is an arid issue I don't pretend to solve soon :-D —Ismael Olea (talk) 11:26, 25 December 2021 (UTC)

Entonces si vas a utilizar información de items con código INE para agregar en masa statements de Structured Data, lo que haría es usar items de número "corto" (5 dígitos) para P180 y quizás dejaría la parte de P9964 (y las "localidades", les correspondan a éstas códigos de 9 u 11 dígitos) para más adelante.

Respecto a main subject (P921) como se ha dicho no agregaría nada (sobre todo teniendo en cuenta que al menos a mí ya se me empiezan a cargar relativamente lento los datos estructurados en ejemplos como el de File:MTN25-0970c2-2009-El Hacho.jpg), pues son datos bastante superfluos/arbitrarios, además de menos necesarios/útiles una vez usados esos items también en P180.

Respecto a las fechas, decir que no sé cuán necesario será usar "fecha representada" una vez usada "fecha de publicación" (un mapa puede publicarse en el año 2000 y representar el siglo XIX y creo que en tal caso la fecha más relevante vendría a ser la representada y no la de publicación), pero si se sigue esa lógica más innecesario me parece repetir la fecha con la propiedad "fecha". Saludos. Strakhov (talk) 11:39, 25 December 2021 (UTC)

@Strakhov

about the excess of data, this is why I'm asking here looking for a reference practice. I don't have any other verbosity interest;
about dates, now I see your point, in this example date depicted (P2913) and publication date ( P577) are the same and I missed it. I understand the relevance of date depicted (P2913);
about using publication date ( P577) is because bibliographic methods (like made from material (P186), for example); I'm not an expert and I don't have a clear criteria;
in any case, I think we can all agree point in time (P585) is unnecessary, right?

—Ismael Olea (talk) 22:00, 25 December 2021 (UTC)

I would probably just leave all three date statements. Depending on the context either can be useful and this way we offer maximum flexibility. Multichill (talk) 18:49, 27 December 2021 (UTC)

Commons talk:Structured data/Archive 2021

What is correct way to describe coordinates from multiple sources?

Adding perceptual hashes to SDC

Questions still unanswered

Updating messaging

Commons Structured Data SQL access?

Converting the GLAM's subjects tags to P180 values?

alt text property proposal

Query structured data and Wikidata

Captions in wrong langcode

How can I add SDC to Categories?

Proposal: Move captions from file information to structured data tab

Is it possible or is it planned to have references for facts in SDC?

Text about Image Annotator

bad request

Geograph restarted as structured data upload

SDC at GLAMHack 2021 on 16th and 17th April + can we request a Query Service dump request on a specific day?

Clothing and costume accessories

Modeling picture taken in a certain municipality

Project Grant application for SDC support in OpenRefine: feedback and endorsements welcome

digital representation of (P6243)

When WCQS will be out of beta?

Media search feature suggestion (Redlinks)

any tools to remove bad SDC statements

The impact of Suggested Edits on Commons: findings and discussion

Two developer positions related to OpenRefine and SDC (paid, freelance, part-time, remote)

New user script for WD linking to WCQS

A picture is worth a thousand words

Wikimania 2021 session

How do I get the WikiData page or identifier for a file that has structured data?

Search structured data

References to Structured Data on Commons

Feedback from Andrew Lih

Feedback from John Cummings

Feedback from PKM

Other priorities first

Feedback from Dominic Byrd-McDevitt

Initial assessment

Update

Capturing original captions attached to illustrations

Add structured data from an Excel sheet

Depictor

Unknown creators

Extracting structured data from a file's Artwork template and re-uploading as WD statements (using OpenRefine?)

MediaSearch functions (All pages Vs. specific pages)

AbuseFilter and SDC

Statistics

Image description

Property for storing the software used for creating this file

CC-0 conflicts with PD statement

Request for Comments: adding description metadata to map images

Navigation menu

Search