Commons talk:Structured data/Archive 2023

This is an archive of past discussions. Do not edit the contents of this page. If you wish to start a new discussion or revive an old one, please do so on the current talk page.

Question about geo-coordinates

Should I add geo-coordinates to the Description part of a file as well as to the Structured data part? Or is one enough and is there an automatic exchange from one to the other? JopkeB (talk) 06:31, 13 January 2023 (UTC)

In theory, structured data eventually picks it up from the wikitext. - Jmabel ! talk 16:11, 13 January 2023 (UTC)
And in practice? Can I only add geo coordinates to the description part and after a while they will be copied to the structured data of the file? --JopkeB (talk) 07:23, 14 January 2023 (UTC)
- @JopkeB: Eventually, but don't hold your breath. I've seen times where it happened years after the fact. - Jmabel ! talk 08:08, 14 January 2023 (UTC)
  Thanks, Jmabel, for your advice. So it might be best to always add both geo-coordinates. JopkeB (talk) 11:47, 14 January 2023 (UTC)

Still a question: What P-code should I use? When I use P625 - coordinate location I get an error message: "The property geografische locatie should not be used on this type of entity, the only valid entity type is Wikibase-item." The only other I see is P1259 - coordinates of the point of view, but that is not what I want, I want to show the coordinates of the object. --JopkeB (talk) 15:24, 14 January 2023 (UTC)

@JopkeB: Then you'd have to put that particular object in a depicts (P180) statement and attach the coordinate location (P625) as a qualifier to the object. Of course, if (for example) it is a building with a Q-item of its own, then the coordinate location (P625) belongs on that item in Wikidata itself. - Jmabel ! talk 18:43, 14 January 2023 (UTC)
Thanks, Jmabel, but unfortunately this gives the same error message. I just found out that SchlurcherBot automatically (probably within hours) copies {{Object location dec}} from the description part to coordinates of depicted place (P9149) in the structured data part. So my problem/question has been solved:
- the correct P-code for object location is coordinates of depicted place (P9149)
- yes, there is an automatic exchange from the description tab to the structured part.
JopkeB (talk) 08:02, 15 January 2023 (UTC)
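
For illustration, the two coordinate statements discussed here could be represented in Wikibase JSON roughly as follows. This is only a minimal sketch: the helper name `coordinate_claim`, the sample coordinates, and the default precision are assumptions made for the example, not something taken from SchlurcherBot.

```python
import json

# Globe identifier for Earth, as used in Wikibase globe-coordinate values.
WIKIDATA_EARTH = "http://www.wikidata.org/entity/Q2"


def coordinate_claim(prop, lat, lon, precision=1e-5):
    """Build a statement dict with a globe-coordinate value (hypothetical helper)."""
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "type": "globecoordinate",
                "value": {
                    "latitude": lat,
                    "longitude": lon,
                    "precision": precision,  # assumed precision in degrees
                    "globe": WIKIDATA_EARTH,
                },
            },
        },
    }


# Sample coordinates are made up for the example.
object_location = coordinate_claim("P9149", 52.3811, 4.6372)  # depicted place
camera_location = coordinate_claim("P1259", 52.3805, 4.6360)  # point of view

print(json.dumps(object_location, indent=2))
```

Keeping the two properties separate mirrors the conclusion of this thread: P1259 describes where the camera was, P9149 where the depicted object is.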
I am adding {{Location}} (not the {{Object location}}) and the bot typically adds the point of view coordinates to the structural data within a few days. The object location indeed gets inferred from the object coordinates in Wikidata.--Ymblanter (talk) 19:13, 14 January 2023 (UTC)
I think this indeed may be the best way, Ymblanter, because Wikishootme only looks at the geo data in coordinates of the point of view (P1259); files with only coordinates of depicted place (P9149) are not shown (which I think is odd, but that is not the discussion here). Wikimap does show both geo-locations, but only from the tab Wiki information, not from the tab Structured data. --JopkeB (talk) 12:36, 15 January 2023 (UTC)
Actually, the best way is to show both (in many cases the precise depicted locations such as single houses or trees are not notable and their coordinates cannot be imported from Wikidata), but the upload wizard prompts for the coordinates of the point of view, and I am too lazy to add an additional set of coordinates after the upload. Ymblanter (talk) 12:41, 15 January 2023 (UTC)
Yes, you are right, add both. JopkeB (talk) 04:50, 16 January 2023 (UTC)

This section is resolved and can be archived. If you disagree, replace this template with your comment. JopkeB (talk) 09:54, 15 January 2023 (UTC)

Scanned page of a book: Depicts page(Q1069725) or book(Q571)?

For ex https://commons.wikimedia.org/wiki/File:Voyage_ou_il_vous_plaira_1843_(144926870).jpg I would say it's depicting a page? (if the book was closed then maybe book would be applicable?) Curious what you guys think Thibaultmol (talk) 23:32, 25 July 2022 (UTC)

Made a cross reference to this question on Wikidata:Property talk:P180#Open questions on how to use this property on Commons talk:Structured data. -- Juergen 217.61.195.60 23:02, 22 October 2022 (UTC)

Neither of those should be used for Depicts. Book or page is the medium being used. Consider a file that was taken with a camera. We don't use Depicts→Photo. We use Instance of→digital photograph. -Senator2029 06:12, 4 January 2023 (UTC)

Proposal to remove duplicate information from file description templates that is already stored in structured data

For several years now bots operated by Multichill (talk · contribs) and myself have been focusing on replicating information from file descriptions in structured data. Since then, adoption of and template support for structured data have significantly improved. The next step would be to remove the duplicate information from file descriptions and use the information in structured data instead. If the information is only removed from template parameters that are automatically recovered by the corresponding template from associated structured data, there will be no visual change to the file description page. Please share your thoughts on a corresponding proposal at: Commons:Bots/Requests/SchlurcherBot11 --Schlurcher (talk) 22:17, 1 November 2022 (UTC)

I am in principle in favor of storing information in one place and showing it in other places, and of getting rid of duplicate information. My question: can someone who is not familiar with structured data still change (all the) information in the file description page, even when that information is stored in structured data? JopkeB (talk) 06:52, 13 January 2023 (UTC)

Interwikilinking, descriptions and pinging

Hello, someone hanging around to reach out? Since a couple of days, I have problems with all the above. Did something change? To give an ʽFile:Histoire du tissu ancien à l'exposition de l'Union centrale des arts décoratifs (1883) (14597541669).jpg|example] to show how it shows when I wish to add hard returns or wikilink to the file. Also, I cannot ping anymoreː ̺ping|Vystoskyˌ

thank you so much for your time. Lotje (talk) 09:50, 9 January 2023 (UTC)

Didn't you accidentally activate "Input tools"? At the top of the page, try clicking the language name you use, then Input settings and Disable input tools. --Matěj Suchánek (talk) 07:30, 10 January 2023 (UTC)

Thank you so much Matěj Suchánek, indeed, going to the top of the page I was able to deactivate the input tools. I think it is okay now. Cheers. Lotje (talk) 09:11, 10 January 2023 (UTC)

Creators with Wikidata items

See Commons:Bots/Work requests#Creators with Wikidata items --Nintendofan885^{T&Cs apply} 13:06, 6 December 2022 (UTC)

Now archived, but still needs doing. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:33, 27 January 2023 (UTC)

located in the administrative territorial entity

When adding this property to an image taken in the US should the value be the county the photo was taken in or should it be the state it was taken in? Trade (talk) 15:03, 27 January 2023 (UTC)

@Trade: The most local administrative entity possible. County (or, in Louisiana, parish) always wins over state. Often a city, census-designated place, etc. will be even more specific. In the weird case of New York City, where five boroughs/counties are within the city, the borough is preferred; in other weird cases like Bothell, Washington, where a city spans two counties, this would ideally be double-valued, indicating both city and county. But if none of that is known, state beats having nothing. - Jmabel ! talk 16:09, 27 January 2023 (UTC)

@Trade: according to Commons:Structured data/Modeling/Location, it's not clear that P131 is intended to be used like that at all. Strictly speaking all files are located in the Wikimedia servers. Strakhov (talk) 19:10, 27 January 2023 (UTC)

I assume there would have been constraints if that wasn't intended Trade (talk) 19:45, 27 January 2023 (UTC)

@Strakhov: it's not about the location of the file, it's about the location represented in the image. - Jmabel ! talk 01:03, 28 January 2023 (UTC)

located in the administrative territorial entity (P131) should only be used for the lowest administrative level with a self-administration (NUTS 4/5 or LAU 1/2). In most cases this will be the municipality. location (P276) can be used for more specific location without any kind of autonomy like neighborhoods. GPSLeo (talk) 07:06, 28 January 2023 (UTC)

@Jmabel: . Of course I'm aware you are trying to "structure" the location of places depicted in photos. I'm just pointing out how current guidelines suggest P1071, P7108 and P180 should be used for this. You are certainly free to not follow them. Apart from that, I refer to Multichill's comment here. Strakhov (talk) 08:09, 28 January 2023 (UTC)

For UK images bulk-imported from Geograph, I think Multichill has been using location of creation (P1071) to indicate the administrative entity where an image has been taken; as distinct from location of the point of view (P7108) if we can identify a specific structure the pic has been taken from (eg bridge, viaduct, etc).

I know Multichill has also been adding the administrative entity as a depicts (P180) value. I am uncertain as to whether that is appropriate, given that (even for an image say of a valley, never mind an image just of a brick) the image will be at most depicting only a small fraction of the administrative entity as a whole. But that can perhaps be an open topic for discussion. Jheald (talk) 13:24, 28 January 2023 (UTC)

I think that's a good approach (location of creation (P1071) as a property to indicate places (mostly "administrative", and the most precise) where the picture was taken, and location of the point of view (P7108) for buildings, mountains, bridges (or administrative territorial entities when used in art works depicting places, too)). On the contrary, IMHO depicts (P180) should only be used when the depicted item is depicted in a significant manner, not an infinitesimal fraction of it (with regard to municipalities, valleys may be OK with P180 (or may not), but bricks, portraits of people, or doors certainly would not be OK). Strakhov (talk) 13:42, 28 January 2023 (UTC)

@Trade: these bulk edits are just plain incorrect. All images in this search should, as Jheald pointed out, use location of creation (P1071). Maybe use a bot to fix these? I fixed the constraint.

@Jheald: someone (forgot who) was very insistent on me adding depicts (P180) with the (broad) location. Of course you can always replace it with the more specific item like pushing it down the category tree. For example if this file would have depicts (P180) -> Haarlem (Q9920) you can replace it with depicts (P180) -> Grote Kerk (Q1545193) because Grote Kerk (Q1545193) is linked to Haarlem (Q9920) through located in the administrative territorial entity (P131). We don't want "COM:OVERDEPICTS" :-) (free after COM:OVERCAT). Multichill (talk) 13:48, 28 January 2023 (UTC)
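
The "push it down the tree" idea above can be sketched in a few lines. This is a toy illustration, not an existing tool: the function name `prune_overdepicts` and the hand-written P131 mapping are assumptions; in practice the parent links would have to come from Wikidata.

```python
def prune_overdepicts(depicts, located_in):
    """Drop depicts values that are P131 ancestors of another depicts value.

    depicts:    list of QIDs currently used as depicts (P180) values
    located_in: dict mapping a QID to the QIDs it is located in (P131)
    """
    def ancestors(qid):
        # Walk the P131 chain upwards; the seen set guards against cycles.
        seen = set()
        stack = list(located_in.get(qid, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(located_in.get(parent, ()))
        return seen

    redundant = set()
    for qid in depicts:
        redundant |= ancestors(qid)
    return [qid for qid in depicts if qid not in redundant]


# Toy data based on the example in the comment above:
# Grote Kerk (Q1545193) is located in Haarlem (Q9920).
located_in = {"Q1545193": ["Q9920"]}
print(prune_overdepicts(["Q9920", "Q1545193"], located_in))  # ['Q1545193']
```

A file that only has the broad value keeps it; the broad value is dropped only once a more specific item linked to it is present.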

@Multichill: someone (forgot who) was very insistent on me adding depicts (P180) with the (broad) location: then they were insistently wrong. - Jmabel ! talk 17:08, 28 January 2023 (UTC)

I disagree with that classification. It might be very imprecise, but it's not wrong otherwise I wouldn't have added it to several million files. Found it at Commons talk:Geograph Britain and Ireland/Reverse geocoding#Testing_in_Devon.

File:The bridge at Poolewe - geograph.org.uk - 4342059.jpg is now the most recent upload. Doesn't that depict Gairloch (Q68815558)? Just like Category:Gairloch (civil parish) can be replaced with a more specific category, Gairloch (Q68815558) can be replaced with a more specific item. Multichill (talk) 17:20, 28 January 2023 (UTC)

@Multichill: On the other hand, would we put depicts = human on an image of a fingernail ?

For me I think there are a couple of things this raises: (a) we need to be able to distinguish when what is being depicted is all/most of the depicts (P180) value (so the P180 has a 'good' value) from when what is being depicted is only a fraction of the depicts (P180) value (so the P180 value could be improved). For this second case it may be useful to include a P180 value as a placeholder to indicate a slot for description there that is potentially capable of improvement (eg Gairloch (civil parish) -> Gairloch -> some part of Gairloch -> ...); but if we do this it seems to me important at least to qualify it with something like applies to part = somevalue depicted part (P5961) = somevalue, to distinguish that usage from the case of the statement value being a 'good' P180 value.

(b) there is the question of whether this use is redundant and/or inappropriate if the essentially same rather approximate location information is already being carried elsewhere by a location of creation (P1071) statement. Is flagging the existence of a slot to be filled a good enough reason to duplicate it with a P180 statement ?

Also perhaps (c) Is the P180 statement necessary if we want the image to be included in the returns of an eg "Show me all images in Yorkshire" request ? Or would the presence of a 'somewhere in Yorkshire' P1071 statement be enough to capture searches for the keyword "Yorkshire" and/or get it to register in sensible ways to query for it ?

I don't have decisive answers, just that these seem to be some things to think about for discussion. Jheald (talk) 18:04, 28 January 2023 (UTC)

It is useful for cases like "I would like to geotag all of the photos in Yorkshire which don't have geotags". Of course it can also be done using Petscan and categories, but if we think that we should replace categories with SDC, you cannot easily do this. -- Zache (talk) 18:28, 28 January 2023 (UTC)

FWIW I don't think we should replace categories with SDC. I think each complement the other, and each can assist in populating the other, and both have a role to play (and should continue to be enhanced and invested in), both now and long-term. The question here though is: does the P180 tag help us to do something the P1071 doesn't ? Jheald (talk) 18:51, 28 January 2023 (UTC)

Still, what are we doing if a photographer was located in one administrative unit and the object is located in another one? I have for example nice pictures of Olympic Peninsula (Washington, US) taken from Canada.--Ymblanter (talk) 13:18, 29 January 2023 (UTC)

I guess that would be P7108 for Canada Trade (talk) 18:08, 29 January 2023 (UTC)

Thanks. Ymblanter (talk) 00:48, 30 January 2023 (UTC)

Using categories as a proxy for P180 depicts values

I have been testing the idea that it would be possible to derive depicts (P180) values for files from categories using Lua. The basic idea is that a template would be added to categories, making the derived values visible on the category page. There would also be an invisible link to store the values so they can be accessed programmatically and exported from the external links database table. It would also mean that the template would be added to 100k - 500k categories at minimum (1M is the maximum).

Using standard templates and Lua modules allows users to report incorrect values to the template's talk page or fix problems by creating and updating wikidata items for categories. This would allow the system to scale, as most improvements come from updating the Lua module code, which would affect multiple categories simultaneously. Corner cases could be fixed by adding wikidata items.

As a result, the suggested values could be used as expected values for machine vision. So, the machine vision problem would be: "Is there a thing defined by the suggested value in the photo?" Exact options are much less prone to false positives than trying blindly to figure out with machine vision what is in the photo.

This makes it possible to scale adding SDC P180 values to tens of millions of images.

Code

Examples

In any case, what do you think about the idea? Is it good/bad/dead-end etc.? Personally I think that populating P180 values on the photos must be mainly automatic because of the vast number of images, so the question is how to do it. --Zache (talk) 12:14, 28 January 2023 (UTC)

Only one of the six images in Category:Salt Lake City 1999 Tornado depicts the tornado. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:20, 28 January 2023 (UTC)

That is the reason why the final values for photos come in conjunction with other methods such as machine vision, i.e. multiple separate methods to get a usable value. -- Zache (talk) 12:24, 28 January 2023 (UTC)

Something I have wondered for a while is whether we could helpfully use a "possible value, but confirmation required" rank of truthiness, alongside the existing "preferred", "normal", and "deprecated" -- to indicate both potential values inferred from categories, and potential values inferred from machine vision, both of which probably need manual or other additional confirmation.

A "poor man's way" to do this given that we don't have such a rank might be to add such values with rank = "deprecated", with reason for deprecated rank (P2241) = "inferred value, confirmation needed". Then values would be accessible by tools or queries that specifically looked for them, but not by regular tools or queries otherwise.
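
As a rough sketch of what such a statement could look like in Wikibase JSON: the item for "inferred value, confirmation needed" does not exist as far as this discussion shows, so `REASON_UNCONFIRMED` below is a hypothetical placeholder, and the helper names are made up for the example.

```python
def item_snak(prop, qid):
    """Build a snak whose value is a Wikibase item (hypothetical helper)."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "id": qid},
        },
    }


# Hypothetical placeholder: no real item is known for this reason.
REASON_UNCONFIRMED = "Q_INFERRED_NEEDS_CONFIRMATION"


def unconfirmed_depicts(qid):
    """depicts (P180) statement parked at deprecated rank until confirmed."""
    return {
        "type": "statement",
        "rank": "deprecated",
        "mainsnak": item_snak("P180", qid),
        "qualifiers": {"P2241": [item_snak("P2241", REASON_UNCONFIRMED)]},
    }


def visible_to_regular_tools(statement):
    """Regular consumers are assumed to skip deprecated statements."""
    return statement["rank"] != "deprecated"


suggestion = unconfirmed_depicts("Q136893")  # Turku Castle
print(visible_to_regular_tools(suggestion))  # False
```

Confirming a suggestion would then amount to changing the rank to "normal" and dropping the qualifier, which also leaves an entry in the edit history.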

Eg it's quite common to find "view from" images in categories (--> location of the point of view (P7108)) in addition to "view of" images (--> depicts (P180)).

I do think a way to indicate on-wiki potential values that need confirmation is something we strongly need (and something that ideally should have been in place before so much computer vision tagging was rolled out). Jheald (talk) 13:34, 28 January 2023 (UTC)

Yeah, though I was thinking with this that we would have something where we can do mass changes to suggested values just by editing Lua module code or changing Wikidata values or updating categories, so that we don't need to edit files one by one and leave an edit in their edit history each time. After confirmation (either by a human or by another method) the value would become something that is stored in the version history. -- Zache (talk) 13:59, 28 January 2023 (UTC)

@Zache: Not so that we need to edit files one by one and it leaves edit to their edit history -- Bots. That's what bots are for, and why we have them. Also IMO yes, we very much do want to leave an edit in their edit history. Any statement on any page should have its footprint in the audit trail, so we can always ask when the statement was added and by who. Secondly, to be useful these statements need to be in the SPARQL copy of the data. And statements only get changed or added there when an edit is made. Jheald (talk) 16:43, 28 January 2023 (UTC)

One other thing to think about is that we similarly would also need bots to be ready to remove such statements if the categorisation of an image was later changed (ie due to a changed identification of what was being depicted). Bots to add statements are quite often made. Bots to remove them (or, at least, flag the statements as needing a re-check) maybe a bit less so. But such bots are also needed. Jheald (talk) 16:58, 28 January 2023 (UTC)

@Jheald about first comment. In this case the information is comparable to other modules. The changes are done to separate entities (ie. to Wikidata items, to categories, to Lua module which is rendering the information) and not directly to the page where information is visible.

About the second, sure, I think that when the statement is saved to the photo there should be documentation of what the source and determination method are. In this case it could be: Module:P180fromCategory with value https://commons.m.wikimedia.org/wiki/Category:Turku_Castle#P180=Q136893 AND Salesforce LAVIS Image Classification "Turku Castle" with 87% confidence using CLIP model and ImageNet dataset.
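
To show how a link of this shape could be read back by a tool, here is a minimal sketch. It assumes the suggested value is always encoded as a `#P…=Q…` fragment on a `/wiki/` URL, as in the example above; the function name `parse_suggestion` is made up.

```python
import re
from urllib.parse import unquote, urlsplit

# Assumed fragment format: "<property id>=<item id>", e.g. "P180=Q136893".
FRAGMENT_RE = re.compile(r"^(P\d+)=(Q\d+)$")


def parse_suggestion(url):
    """Return (page title, property id, item id), or None if the URL does not match."""
    parts = urlsplit(url)
    match = FRAGMENT_RE.match(parts.fragment)
    if not match or not parts.path.startswith("/wiki/"):
        return None
    title = unquote(parts.path[len("/wiki/"):]).replace("_", " ")
    return title, match.group(1), match.group(2)


url = "https://commons.m.wikimedia.org/wiki/Category:Turku_Castle#P180=Q136893"
print(parse_suggestion(url))  # ('Category:Turku Castle', 'P180', 'Q136893')
```

Links that do not carry such a fragment are ignored, so ordinary external links on a category page would not be mistaken for suggestions.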

About SPARQL. There are multiple things which would be nice to get inside the SPARQL endpoint (for example the whole cirrus search index for a start, or categories). However, development is rather slow, and personally I have started to think that if something is needed, then one should build things that can be loaded into one's own SPARQL endpoint and queried with federated queries. As long as the Blazegraph transition has not been solved, Wikimedia's SPARQL endpoint development is dead in the water.-- Zache (talk) 18:05, 28 January 2023 (UTC)

@Zache: My point was, that for something to be available from within SPARQL, the most straightforward way to get it there is with an actual SDC statement. And I think having things available from SPARQL is important, because for most people this is the way to query SDC that is most available, most accessible, most documented, and most similar to querying wikidata that they may already be familiar with. If the information is there as an actual SDC statement, it means that it is there where people will look for it, where people will see it, where people will query for it.

Yes, in principle it would be nice if more information was accessible from within a SPARQL query in a more virtual way. In fact category membership is (or has been) available, both as a SERVICE and as a separate SPARQL database that can be federated (though I'm not sure if that is still up at the moment) -- though so far only for information about category membership of categories, not individual images.

I don't actually see Blazegraph's EOL issues as such a deal-breaker here. Blazegraph has the capability to do most of what we want (and does it), both for WCQS here and with WDQS for wikidata, so I think actually we are very little limited if there is no longer any Blazegraph dev team, because we don't much seem to need one. Category information for files could be implemented either as a SERVICE or as an additional federate-able database without needing any code patches from Blazegraph. The issue is more that the Search engineering team (WMF) doesn't have any spare resources to do it (and doesn't have Commons as a priority), while the Wikidata team (WMDE) sees Commons as out of its scope (?).

Finally, as for determination method, determination method (P459) is available and IMO absolutely should be included in the reference for a bot-added edit. IMO it's crazy that the CV-assisted edits weren't (and aren't ?) being documented in this way. determination method (P459) = "inferred from category" would IMO be 100% legitimate to add in the referencing, ideally perhaps with inferred from (P3452) = <category item> -- except that sadly most categories don't (and won't) have items; but a Wikipage-URL valued referencing property could be made instead. Jheald (talk) 19:21, 28 January 2023 (UTC)

@Jheald The main problem is that editing everything using bots doesn't scale. Bots cannot add new values fast enough, and it doesn't scale in terms of human labor either, as running such bots needs a substantial amount of time and effort. For example, the most active SDC edit bot, SchlurcherBot, does 2M edits per month. For comparison, Commons gets about 1M new files per month, and there are currently 90M files. The simplest case for adding values is media type (P1163), where the value comes directly from the database, with only one possible well-defined value. However, at the current editing speed it will take five years to add P1163 values to the remaining 67M files, even if SchlurcherBot did nothing else. P1163 is also something that SDC could read directly from the MediaWiki database without needing a bot as a middleman, if there were any developer resources to implement it. In my proposal, I proposed something like that. It could be done centrally inside Wikimedia Commons, and it would be useful for classifying tens of millions of files in a limited timeframe. It would also allow refining the results without needing to edit a massive number of files one by one again when something changes. So, the idea is not to get end values which can be used directly in SDC but to add an intermediate step for classifying the photos. Zache (talk) 15:56, 30 January 2023 (UTC)
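
A quick back-of-the-envelope check of these figures, as a sketch only: it assumes every bot edit clears exactly one file from the backlog and that every new upload also needs the statement. The function name is made up.

```python
import math


def months_to_clear(backlog, edits_per_month, new_files_per_month=0):
    """Months needed to clear a backlog at a constant net editing rate."""
    net = edits_per_month - new_files_per_month
    if net <= 0:
        return math.inf  # the backlog never shrinks
    return backlog / net


backlog = 67_000_000    # files still missing P1163, per the comment above
bot_rate = 2_000_000    # SchlurcherBot edits per month
upload_rate = 1_000_000  # new files per month

ignoring_uploads = months_to_clear(backlog, bot_rate)
with_uploads = months_to_clear(backlog, bot_rate, upload_rate)

print(f"ignoring new uploads: {ignoring_uploads / 12:.1f} years")
print(f"counting new uploads: {with_uploads / 12:.1f} years")
```

Under these assumptions the backlog alone takes just under three years, and about five and a half years once new uploads are counted, which is in the same range as the five-year estimate.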

These are interesting thoughts (and thanks @Pigsonthewing: for pointing me towards this discussion). I have four points to contribute:

Linking files to Wikidata items via categories was a big reason for the multi-million-edit work that I did to match Commons categories with Wikidata items (and this is still ongoing, please help!). There have even been proposals that SDC could replace Commons categories in the future, although that still feels like it's a long time away - but perhaps this work helps us work towards that.
@PMG: has already been contributing hundreds of thousands of depicts (P180) values via semi-automated editing. I'm in awe of their efforts here, and I think they could provide a lot of valuable input into this discussion.
Bots edit at the rate they are allowed to do so (both due to community and server constraints). There are ways of speeding them up. If we wanted to copy all category uses (where they are matched against Wikidata items) to SDC depicts values, we could probably do that within a month. The question is whether that *should* happen.
The bottom line is accuracy: if we're going to make changes by bot, Lua, gadgets, or otherwise, they need to be 99.9% accurate. How do we ensure that in this case?

Thanks. Mike Peel (talk) 21:18, 3 February 2023 (UTC)

This is an answer to: The bottom line is accuracy: if we're going to make changes by bot, Lua, gadgets, or otherwise, they need to be 99.9% accurate. How do we ensure that in this case? -- If the use case is generating likely P180 property values for categories using Lua (example: Category:Salt Lake City 1999 Tornado), it doesn't need to be 99.9% accurate if we can respond to errors by fixing the Lua module code in a timely manner. Errors can also be fixed by creating a Wikidata item for the category, which would give exact values. -- About mass creating actual Wikidata items or adding P180 values directly to files: I don't think that we have a single method for 99.9% accuracy, but we can minimize errors by combining techniques. For that, we can use machine vision to confirm what is in the picture, GPT-style parsing of category names or descriptions, and checking the category's topic by running machine vision against multiple images in the category and finding common denominators. However, the tasks are too complex to be done in one step, so we need to split them into smaller tasks and solve them one by one. -- Zache (talk) 16:36, 4 February 2023 (UTC)

@Zache: From the talk of "fixing the Lua module code" in that answer, I infer that you are still thinking about what we might call 'virtual' statements -- statements that are not 'real' statements, in the sense that they could not be queried or searched, would not be in the actual wikibase, would not be accessible from SPARQL; but would be able to be displayed in some form, some how.

My sticking point is still this: I'm still at a loss to know, what would be the use of such things? Jheald (talk) 19:14, 4 February 2023 (UTC)

Yes, virtual statements. Access to the data would be via SQL on the Toolforge database, using the external links table. It would allow us to access the data in bulk and in near real-time. External links can also be accessed through the MediaWiki API, which Pywikibot and bots can use. It would be possible to access external links using the Mwapi service in SPARQL, but because of Mwapi performance limitations, the use cases would be very narrow. (If the idea works, the logical next steps would be to use a separate database table instead of the external links table and add the ability to create dynamic SDC values using Lua code. However, this would be outside of this proposal's scope. This proposal focuses on using the structures we already have, as it is something that we can realistically implement.) The resulting data itself can be used to narrow down false positives from other tools. For example, if one wants to create a tool that detects P180 values using machine vision, then one can change the query "what is in the photo?" to the question "Is there an object $P180 in the photo?". However, we do not know how far we can get on automation before we have the data and have tried it. My best guess is that it will significantly reduce the false positive rate from automated machine learning tools if we can combine multiple methods. -- Zache (talk) 09:32, 5 February 2023 (UTC)
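
The combination step described above might look roughly like this. It is a sketch under stated assumptions: the vision scores are supplied by some external classifier, the 0.8 threshold is arbitrary, and the second QID is a hypothetical placeholder.

```python
def confirm_suggestions(suggested, vision_scores, threshold=0.8):
    """Split category-derived suggestions into confirmed and needs-review.

    suggested:     QIDs suggested for a file from its categories
    vision_scores: dict mapping QID to the classifier's confidence (0..1) for
                   the question "is there an object <QID> in the photo?"
    """
    confirmed, needs_review = [], []
    for qid in suggested:
        score = vision_scores.get(qid, 0.0)  # no score counts as unconfirmed
        if score >= threshold:
            confirmed.append(qid)
        else:
            needs_review.append(qid)
    return confirmed, needs_review


# Q136893 (Turku Castle) with the 87% confidence quoted earlier in the thread;
# "Q_EXAMPLE" is a hypothetical second suggestion with a low score.
suggested = ["Q136893", "Q_EXAMPLE"]
vision_scores = {"Q136893": 0.87, "Q_EXAMPLE": 0.12}

print(confirm_suggestions(suggested, vision_scores))
```

Because the classifier is only asked about values the category already suggests, it never proposes values of its own; a low score sends the suggestion to human review rather than discarding it.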

Info from PMG

Hello. I am a user who marks photos. I mostly use Depictor and Wikicrowd, and when a category has more than 20 images I speed things up using the AC/DC gadget (for fewer than 20 images I found that it's faster to mark using Depictor). I think that the owners of these tools (@Husky: and Adam Shorland) could also be very helpful for this discussion. Adam Shorland published statistics showing that the share of "images in category" that are "images that contain what is mentioned in the name of the category" can vary from 95% (is this a photo of fireworks?) to 3% (is this a photo of a covered bridge?).

I am fine with the "let's use bots" idea, but there is one "but". I am a software tester in real life, and like every good tester I want to point out some potential problems. Most of them are connected to what exactly people put in categories, which can result in poor descriptions (en:Garbage in, garbage out). When I make my edits I intentionally avoid such areas, because they are very problematic and result in many images not being marked even if they are in the correct category.

Artists. Categories of artists who make something physical (painters, writers, architects, pipe organ makers) are terrible. Usually it's like this: there is one photo of the specific artist and, for example, 100 pictures of some of his/her art. Example: Category:Joseph Callinet, Category:Edmond Alexandre Roethinger, Category:Bolesław Biegas.
Companies - this is a very difficult topic, because I am not sure how to correctly mark a photo of a company. Is the main building of the company fine? Is the main product of the company (an image of a can for CocaCola)? What would you accept for "is this a photo of Ford Company"? Example: Category:Manufacture d'Orgues Thomas
Mountains/national parks. This is always difficult for me. Should I mark such a file as Category:Bory Tucholskie National Park?
Events: Category:Procession de la Sainte Coiffe. Is a photo of one person also a photo of the whole procession/event?
People like Obama. On Wikicrowd only 28.29% of images were marked "Yes, on this image there is Obama". There are many images that were made at events with Obama, but they don't show Obama.
Monuments - there are many situations where people mix up "this is a photo of this monument" and "this is a photo of a person/event that was close to the monument". Example: Category:United States Navy Memorial and this photo.
Concerts - many times there is a category for some band playing a concert, and the photo shows people listening to this band (Category:IRA (Polish heavy metal band) in 2015 and File:Band-IRA fans 0367.JPG as an example).
Bands - many times there is a category for a band, but you see images of individuals.
Racing drivers. Many times we can see his car, but not him. I spent a lot of time trying to solve this issue, and Depictor was the tool that I used to mark only those images that show the specific driver. Example: Category:Marco Andretti, Category:Alex Garcia
Fictional characters. I am not sure how Category:Jasmine (Disney) should work, as it is fictional.
Additionally Category:Love and other feelings. Weather stuff.

You can ask "OK, PMG, then what is good for marking such images?" It's: cemeteries, churches, sportsmen/sportswomen, ships, paintings, actors, fountains, military people (but not military units). If you want more info I am happy to share my experience in this subject, please ask. PMG (talk) 17:20, 6 February 2023 (UTC)

I don't have much to add to the discussion; I think PMG lined it out quite nicely here. In general I would say that a blanket 'let's add ML tags to images' is a bad idea because there is so much nuance in all the different topics. One approach that I think could work is using it more as a suggestion, and then it could also work in conjunction with a tool like Depictor. Run an algorithm on a bunch of images, get back some suggestions, and then use those together with a human reviewer to make sure the tag is actually correct.

Husky (talk to me) 12:03, 10 February 2023 (UTC)

AI generated images

What is the correct way to show that an AI generated image was created using DALL-E? I don't believe simply P180 (Depicts) would be appropriate Trade (talk) 18:29, 3 February 2023 (UTC)

A very good question; you mean, this is like indicating that an image is a drawing or painting, and then, e.g. an oil painting? Ziko van Dijk (talk) 18:42, 3 February 2023 (UTC)

The best way to indicate that an image is AI generated would be by using genre (P136) with artificial intelligence art (Q65066631) as the value. @Ziko: --Trade (talk) 21:25, 3 February 2023 (UTC)

fabrication method (P2079) ? Jheald (talk) 19:17, 4 February 2023 (UTC)

Another option might be something like instance of (P31) = computer-generated imagery (Q6002306); but I am a bit wary of recommending this.

A query https://w.wiki/6JKe breaking down a random set of 100,000 cases (out of 23,211,627 total https://w.wiki/6JKs) shows that instance of (P31) seemingly can be used in this way (although seemingly not yet with Q6002306 as a value). But, from Wikidata experience, there may be good reasons to prefer, where possible, expressing information using attribute = value, rather than instance of (P31) with an ever-larger number of subclasses and case-classes.

Also the item computer-generated imagery (Q6002306) isn't so good -- at the moment it seems to be doing double duty for both (i) the general art of using computers to make works, and (ii) an individual item of computer-assisted work. This double use isn't good, and should be cleared up. Jheald (talk) 19:42, 4 February 2023 (UTC)

References

Yesterday I found this edit at one of my images: Revision of 728226300. The references were new to me, but I guess they exist now. As a reference, however, I find a template that is used only internally confusing for users. It is not something to look up or check. Even otherwise, I find the references rarely helpful. In any case, references should be sources that are visible and understandable to users as well. --XRay 💬 06:51, 12 February 2023 (UTC)

failed-save,[object Object],[object Object],[object Object]

Constantly getting my edits ruined by this error message is getting real tiring. Trade (talk) 21:10, 16 February 2023 (UTC)

"failed-save" is a negative response from the server that should usually be accompanied by an error message. "[object Object]" is usually an indication of a programming error. So I guess when there is something wrong server-side, the client-side handling is also broken. It would be good if you wrote down the steps to reproduce and reported the problem, so that it can be looked at. --Matěj Suchánek (talk) 17:58, 25 February 2023 (UTC)

I believe the problem is caused when one of the pictures is protected against being edited by non-admins. Trade (talk) 22:15, 25 February 2023 (UTC)

I tried to reproduce this on File:FISHERMAN.jpg (BTW why has this picture been protected infinitely for edit warring?), but I don’t have any edit button in the SDC tab. So either you used a non-standard tool you forgot to mention, or something else is the culprit. —Tacsipacsi (talk) 16:17, 27 February 2023 (UTC)

Some Exif time fields are valuable for detecting errors

Some Exif time fields might be worth putting into the SDC, as one can detect lagging GPS coordinates. Jidanni (talk) 05:30, 3 March 2023 (UTC)

I usually prefer to add the GPS information from the file page to the SDC. If there is no GPS information in the file page, I would also advise to first transition it there. --Schlurcher (talk) 16:27, 3 March 2023 (UTC)

Redundancy

Is structured data supposed to be redundant - that is having more general descriptions in addition to specific? For example I see a photo of Barack Obama already with structured data "Barack Obama", and users add structured data "man" and "human"; for a photo of Chicago in addition to "Chicago" also "United States". I know that this is considered inappropriate in Commons Categories. Is it otherwise for structured data? Wondering, -- Infrogmation of New Orleans (talk) 21:07, 23 March 2023 (UTC)

@Infrogmation: sadly, as far as I can tell, there is no clear consensus about this. Apparently, computational difficulties in moving down the hierarchy in wikibase searches have driven things toward at least a lot of people going for more of a "tagging" approach here. - Jmabel ! talk 22:58, 23 March 2023 (UTC)

Relevant discussion from December 2022. Given that Commons' primary mission is to make media files available for people to use them, I'm starting to think that we should try to view things from a common re-user's perspective when we discuss these things. In that regard, I consider our Category system a great failure, as we tuck away useful files in subcategories of subcategories, never to be found by anyone who does not know their way around. Maybe a good rule of thumb is asking the question: "If I was searching for this term, would I be surprised to find this file?" before adding a depicts (P180) statement – and if the answer is "no" just go ahead. El Grafo (talk) 06:54, 24 March 2023 (UTC)

El Grafo - if our category system is any sort of "failure", IMO it is because it is so very hidden from casual users. I've encountered some history researchers who were familiar with the existence of Wikimedia Commons for some time before discovering categories - and then being very excited with all the material they could suddenly find. Commons default search function is IMO quite poor. -- Infrogmation of New Orleans (talk) 18:08, 31 March 2023 (UTC)

People can't seem to agree on how to handle things like countries, cities and administrative units in photos. Trade (talk) 23:10, 6 April 2023 (UTC)

Can anyone explain what is going on here?

https://commons.wikimedia.org/w/index.php?title=File:A_Gatherer_of_Faggots_at_Neah_Bay.jpg&action=history

It seems like some sort of edit war over which "depicts" are "prominent," but in some cases I see the same account on both sides of the edit war! And what is "campaign239@ISA"? An awful lot of distraction in my watchlist as this goes on. - Jmabel ! talk 04:35, 22 March 2023 (UTC)

It is a campaign as part of the WikiGap campaign. Commons:ISA Tool/Challenges ❙❚❚❙❙ GnOeee ❚❙❚❙❙ ✉ 06:41, 23 March 2023 (UTC)

The campaigns have also caused considerable problems in some cases in the past. On the one hand, money is offered and that seems to motivate some. On the other hand, changes are made without sense and reason, just to get the edits. The tool itself works inadequately (keyword: qualifier) and overall it is more about quantity than quality. --XRay 💬 09:40, 23 March 2023 (UTC)

Wait a second, we are paying people to make statements? Trade (talk) 21:48, 6 April 2023 (UTC)

I spotted some useless edits too, i.e. making all statements prominent even though all statements are more or less equal and there is no need to make them all prominent. Raymond (talk) 10:40, 23 March 2023 (UTC)

This is continuing. Any idea why this one file is getting so many dubious edits? Perhaps it should be protected for a while. - Jmabel ! talk 14:58, 24 March 2023 (UTC)

Again very useless edits. Neither table, nor mask nor microphone are prominent parts of the image. Raymond (talk) 19:15, 29 March 2023 (UTC)

This is continuing on File:A Gatherer of Faggots at Neah Bay.jpg and is reaching the point of absurdity. - Jmabel ! talk 15:02, 30 March 2023 (UTC)

It's continuing here too :-( Raymond (talk) 06:42, 31 March 2023 (UTC)

Here too. I left a note on the Commons talk:ISA Tool/Challenges page as well, since it seems to be the source of the issues in the file I linked. (Thanks Raymond for the redirect to this discussion) Clay (talk) 14:52, 31 March 2023 (UTC)

Cross-posted at Commons talk:ISA Tool#Issues with rank flipping to gain contest edits as well, since there seems to have been some discussion there about the same issue some months ago. Clay (talk) 15:05, 31 March 2023 (UTC)

Annnnnnd again. 2nd time this day :-( Raymond (talk) 15:48, 31 March 2023 (UTC)

I've noticed the same pattern on several other files, including File:Mother Earth.jpg. It's getting ridiculous. - Eureka Lott 21:52, 31 March 2023 (UTC)

Even though the challenge ended at the end of March, User:Mutasim Elmahadi is continuing to make these pointless edits. There are several warnings on the user's talk page, but they've been unresponsive so far. Is further action needed to stop this? - Eureka Lott 13:43, 3 April 2023 (UTC)

T321272 is probably describing the same pattern (gaming the system by toggling the rank normal/preferred on depicts). Lucas Werkmeister (talk) 21:39, 6 April 2023 (UTC)

Yes, exactly the very same situation. That sucks really bad. We have to discontinue entirely such drives until this Phab is implemented. Anthere (talk) 22:21, 11 April 2023 (UTC)

Maybe we should just cancel the whole thing? Trade (talk) 21:57, 6 April 2023 (UTC)

Location of creation

If you look at how we currently categorize our photos by topic, we generally have:

What or who we see in the photo. In structured data we use depicts (P180) for this (usage 11M+)
When the photo was taken. In structured data we use inception (P571) for this (no usage data: not in search index, query seems to time out)
Where the photo was taken. In structured data we use location of creation (P1071) for this (usage 8M+)

I'm working on the location of creation (P1071) at the moment. I'm doing some data clean up after some of our earlier experiments to get the data in line with Commons:Structured data/Modeling/Location. location of creation (P1071) should contain the most specific location. So for example this photo should only have location of creation (P1071) set to Elswout (Q2278595) and not also to Bloemendaal (Q9908), North Holland (Q701), etc. We should be able to derive that based on Wikidata. We want this to work for both query and search:

Query: a precise request for information retrieval made to a database or information system. For example: query for location of creation is Haarlem
Search: a more free-form way to search unstructured data. For example: search for Haarlem

In both cases we want to have recursion. For queries that is already possible. For example, if I want to have all photos created in North Holland (Q701), I first ask the Wikidata query engine for all locations in North Holland and then combine that here (version combined with coordinates so you get a nice map). You can just change the item in these queries to get it for your favorite area; the map query is quite useful for spotting mistakes.

Query is nice, but quite hard to use and very strict. We also want search! Search will only find things that are in the index (contents for the example image). So to be able to find things, we have to add it to the index. We do that by adding it to the wikitext. I've been doing that for a while for Geograph. See for example this file where the "Place of creation" field is populated with a tree from the location to the country. As a pilot I updated some of the monument templates with similar logic. See for example this image where the location tree is included. I intend to expand on this to make it more visible. That would make it more valuable to add location of creation (P1071) because it increases the ability to find the image. Multichill (talk) 16:19, 8 April 2023 (UTC)
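The recursion idea described above can be sketched offline in a few lines of Python. This is only an illustration under stated assumptions: the `PARENT` mapping is a hand-written stand-in for what a Wikidata query would return, the file names are made up, and the function names are hypothetical. Only the item IDs come from this discussion.

```python
# Hypothetical child -> parent "located in" hierarchy, written by hand.
# In practice this would come from a Wikidata query, not a literal dict.
PARENT = {
    "Q2278595": "Q9908",  # Elswout -> Bloemendaal (assumed for illustration)
    "Q9908": "Q701",      # Bloemendaal -> North Holland
}

# Made-up files, each with one most-specific location of creation (P1071).
FILES = {
    "File:Example A.jpg": "Q2278595",
    "File:Example B.jpg": "Q9908",
    "File:Example C.jpg": "Q1212",  # Montana, not part of the hierarchy above
}


def is_within(location, region):
    """Walk up the hierarchy from location; True if region is reached."""
    seen = set()
    while location is not None and location not in seen:
        if location == region:
            return True
        seen.add(location)  # guard against accidental cycles in the data
        location = PARENT.get(location)
    return False


def files_created_in(region):
    """Return all files whose P1071 value lies in region, directly or not."""
    return sorted(name for name, loc in FILES.items() if is_within(loc, region))


print(files_created_in("Q701"))
```

Because each file stores only the most specific location, the broader regions are derived by walking the hierarchy rather than by adding extra statements to the file.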

Note (to everybody) that both location of creation (P1071) and location of the point of view (P7108) exist. I would suggest reserving location of the point of view (P7108) for when we know there was some particular very specific location the image was taken from -- eg from the top of a particular hill; from a particular building; from a particular bridge, etc.

IMO location of creation (P1071) is useful because it allows us to specify a region in which the image was taken, while leaving P7108 clear to indicate if a more sharply-defined place is known.

I would also suggest that if a location of creation (P1071) statement is present, that value should not generally appear as the value of a depicts (P180) statement. (Not sure if that is already implicit in what User:Multichill suggests above).

Counts: we currently have 2,813,420 files using location of the point of view (P7108) (https://w.wiki/6Yzx), and 8,216,139 using location of creation (P1071) https://w.wiki/6Y$2

That number for P7108 may be larger than perhaps expected; but on investigation it corresponds to 99.8% of uses having "location of point of view" = a value from class "space laboratory" (ie value = ISS), with the remaining ~7000 files all looking to have P7108 values from classes which do seem pretty localised (https://w.wiki/6Y$4). -- Jheald (talk) 23:05, 8 April 2023 (UTC)

I'd also think location of the point of view (P7108) would often be useful for a photo taken from across a body of water such as a river or lake, where very often the photographer is standing in a different jurisdiction than the thing photographed. - Jmabel ! talk 04:51, 9 April 2023 (UTC)

Yes, for example for all the images in categories like Category:Views from Tokyo Tower or Category:Views from KölnTriangle or Category:Views from the CN Tower. --XRay 💬 07:39, 9 April 2023 (UTC)

I'm still looking for a good property to add informations like "Taken in Category:Glacier National Park". --XRay 💬 07:44, 9 April 2023 (UTC)

@XRay: use location of creation (P1071) -> Glacier National Park (Q373567). Multichill (talk) 14:23, 10 April 2023 (UTC)

Too simple. What about Yellowstone National Park (Q351), and how to additionally specify Montana, Wyoming or Idaho? And what about the other national parks and nature reserves in more than one state or district? --XRay 💬 18:25, 10 April 2023 (UTC)

So what's the problem with having both location of creation (P1071) -> Glacier National Park (Q373567) and location of creation (P1071) -> Montana (Q1212)? Properties don't have to be single-valued. - Jmabel ! talk 18:39, 10 April 2023 (UTC)

Good idea. --XRay 💬 18:53, 10 April 2023 (UTC)

IMO location of creation (P1071) and location of the point of view (P7108) have different meanings. IMO location of creation (P1071) is for streets, cities, districts, countries and other regions, location of the point of view (P7108) for views from towers, bridges, ships and other "views from …" cases. BTW: A building is not necessarily within one district only. --XRay 💬 07:49, 9 April 2023 (UTC)

BTW: IMO location (P276) was a good place for nature reserves, national parks, buildings (if the building interior was shown). The nature reserve is not necessarily shown, so location (P276) was a good solution. I am very sorry that this feature is now removed. --XRay 💬 07:55, 9 April 2023 (UTC)

At least for protected areas we also have located in protected area (P3018). But for all other location information I would also prefer to use location (P276) and located in the administrative territorial entity (P131) for the administrative region. GPSLeo (talk) 17:32, 9 April 2023 (UTC)

@XRay and GPSLeo: the structured data here applies to the files, so for this file the structured data is saying "this photo was taken in Elswout (Q2278595)". Elswout (Q2278595) has a location (P276) and located in the administrative territorial entity (P131). It would be redundant and incorrect to add these to the file. Why would you want to do that? Multichill (talk) 14:23, 10 April 2023 (UTC)

Because queries considering such a structure would be very complex and might time out in many cases. It is the same reason why Wikidata has country (P17) even though it is redundant to located in the administrative territorial entity (P131). GPSLeo (talk) 16:10, 10 April 2023 (UTC)

I agree, GPSLeo. Redundancy across different properties can't be a problem and can't be a reason to remove properties. A search for haswbstatement:P276 is not the same as one for haswbstatement:P131. If both have the same value, that isn't a problem. But it is a problem to have to search several different properties to find the one with this value. --XRay 💬 16:38, 10 April 2023 (UTC)
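To make the point about searching several properties concrete, here is a minimal Python sketch that builds one `haswbstatement` search string per candidate property. The helper names are hypothetical, and the property list is just the set mentioned in this thread; the keyword syntax follows the forms already used on this page (`haswbstatement:P276` and `haswbstatement:P180=Q1329910`).

```python
def haswbstatement(prop, value=None):
    """Build a haswbstatement search term for a property, optionally with a value."""
    if value is None:
        return f"haswbstatement:{prop}"
    return f"haswbstatement:{prop}={value}"


def searches_for_value(value, props):
    """One search term per property that might hold the given value."""
    return [haswbstatement(prop, value) for prop in props]


# Glacier National Park (Q373567) could sit under any of these properties,
# so a searcher has to try each of them in turn.
for term in searches_for_value("Q373567", ["P1071", "P276", "P131"]):
    print(term)
```

Each printed term would have to be run as a separate search, which is the inconvenience described above when the same kind of information is spread over several properties.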

One other thing to think about may be how location of creation (P1071) is used on images of artworks. If someone goes to a gallery and takes a photo of a painting or a sculpture, I suspect location of creation (P1071) may be used for the district where the photo is taken. On the other hand, if we have an image of a scan of an engraving, will location of creation (P1071) be used for where the scan is made? Or where the engraving was originally made, eg maybe Paris or Amsterdam? I am not sure what people will do in such a case, but we should probably make some guidance, otherwise there might be an uncomfortable transition between one kind of usage and another. Jheald (talk) 19:05, 9 April 2023 (UTC)

Lua module on other wikis?

Is there an equivalent to Module:Wd or Module:WikidataIB to retrieve structured data from a file on Wikimedia Commons and display it in the text on another wiki? E.g. something like {{#invoke:XYZ|getPropOfProp|File:Inhibition mechanism (irreversible).svg|P275|P1813}} would return "CC BY 4.0". Any ideas? I want to implement something automatic for populating template:attrib in image captions on Wikiversity, but I've no Lua skills. T.Shafee(Evo﹠Evo) talk 03:08, 30 March 2023 (UTC)

Not possible I’m afraid, other wikis don’t have access to the structured data on Commons. Lucas Werkmeister (talk) 20:22, 30 March 2023 (UTC)

@Lucas Werkmeister Do you happen to know if there's any future prospect of other wikis getting access to structured data on Commons? Is it that there isn't current demand, or are there fundamental limitations? Thanks! T.Shafee(Evo﹠Evo)^talk 06:30, 24 April 2023 (UTC)

@Evolution and evolvability: I would say it’s at least blocked on significant, though not fundamentally insurmountable, technical challenges. (Other Wikimedia wikis would have to become Wikibase clients not only of Wikidata but also of Commons; I can’t actually find a task for that, surprisingly – the closest I found is T325955 where it’s somewhat implied.) Lucas Werkmeister (talk) 12:50, 24 April 2023 (UTC)

How far away are we from Lua clients being able to use SPARQL queries to fetch data instead of direct SDC access? Zache (talk) 08:37, 26 April 2023 (UTC)

Even further away than that, I would say. (When using a SPARQL query, there’s no way to know when the page needs to be rerendered because the underlying data changed.) Lucas Werkmeister (talk) 18:41, 28 April 2023 (UTC)

Errors

https://commons.wikimedia.org/w/index.php?search=haswbstatement%3AP180%3DQ1329910&title=Special:Search&profile=advanced&fulltext=1&ns0=1&ns6=1&ns9=1&ns12=1&ns14=1&ns100=1&ns106=1 shows 70 images with https://www.wikidata.org/wiki/Q1329910 (view, MySQL view), but people who added those structured data, apparently confused that with https://www.wikidata.org/wiki/Q2075301 (view, scenery). How to easily change them? Bennylin (yes?) 12:37, 12 April 2023 (UTC)

Same with Trees (Q15755136). --XRay 💬 15:56, 12 April 2023 (UTC)

Another one: root (linguistic), applied to plant roots. Bennylin (yes?) 17:12, 12 April 2023 (UTC)

You can use PetScan (example) to change it. The example is made using these settings:

Select commons as target wiki (from categories tab)
Select file as namespace (from page properties tab)
Select search parameter and set "commonswiki" as target wiki (from other sources tab)
RUN SEARCH
On the right side of the screen, above the search results, is "command list: P31:Q5 to add; -P31:Q5 to remove"
Click start QS
Log in to QuickStatements in a new tab
Close the QuickStatements tab
Add this text to the command list field:
-P180:Q1329910
P180:Q2075301

Click start QS again
Click Run in quickstatements

-- Zache (talk) 19:36, 12 April 2023 (UTC)
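For anyone who needs to do the same swap for several item pairs, the two-line command list from the steps above can be generated with a small Python helper. This is only a sketch: the function name is hypothetical, and it merely reproduces the text format shown above (`-P180:Q…` to remove, `P180:Q…` to add) for pasting into the PetScan command list field.

```python
def replace_value_commands(prop, old_value, new_value):
    """Return the command list text: remove the old value, add the new one."""
    return "\n".join([
        f"-{prop}:{old_value}",  # leading minus removes the statement
        f"{prop}:{new_value}",   # no prefix adds the statement
    ])


# The case from this thread: "view" (MySQL view) was used instead of "view" (scenery).
print(replace_value_commands("P180", "Q1329910", "Q2075301"))
```

The output still has to be pasted and run by hand as described above; the helper only saves retyping the two lines.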

But we also need a solution that such statements are not added again. GPSLeo (talk) 19:57, 12 April 2023 (UTC)

Upcoming work on data modeling and Lua templates documentation

Tracked in Phabricator
Task T335910

Hi all, as some of you may know, I have been involved in adding SDC functionalities to OpenRefine (editing SDC, and batch uploading files with SDC).

For batch upload tools that support structured data, like OpenRefine, it is very helpful to:

Have a few widely supported 'default' SDC data models that we can offer as options (as default templates) to all users
Avoid duplicating data between Wikitext and structured data - it's clearer for uploaders if this is not necessary

In this sense, the work by @Jarekt and @Multichill to develop Lua-powered file information templates (I call them "minimal Wikitext templates") has been tremendously useful; I have included the widely usable ones ({{Information}}, {{Artwork}} - where the work has a Wikidata item, {{Art Photo}} - where the work has a Wikidata item) already in OpenRefine.

I've been in touch with quite a few people doing batch uploads with OpenRefine. Observing their work has brought other information templates to my attention that would benefit from such a "minimal, fully Lua-powered Wikitext" approach (with data models that then can be offered by default in tools like OpenRefine). These include {{Photograph}}, {{Specimen}}, {{Book}}, {{Artwork}} (but in situations where the work has no Wikidata item, and may probably never have one)... To help move these use cases forward, I plan to spend a bit of time at the Wikimedia hackathon (next week in Athens) writing (more extensive) data modeling proposals for such situations, and documenting the existing, broadly applicable "minimal Wikitext" templates a bit better. Here's a Phabricator ticket for that work (which can continue after the hackathon): https://phabricator.wikimedia.org/T335910 I hope this makes sense, and welcome any thoughts and suggestions here, or on Phabricator. Help and input are very welcome. Cheers, Spinster (talk) 15:06, 12 May 2023 (UTC)

I would be happy if the capabilities of the search engine would be improved. For example, images where Dietrich-Bonhoeffer-Kirche (Q1223624) is set as depicted statement should also be found when searching for church building (Q16970). --XRay 💬 16:35, 12 May 2023 (UTC)

I agree. If I'm not mistaken, @Multichill has started working on ways to expose this kind of information to the search engine index - first for Rijksmonumenten as he's very familiar with that topic. See an example file here (the added lines e.g. depicts instance of expose the information for search) Spinster (talk) 17:19, 12 May 2023 (UTC)

This information can help the search engine, but it should be inside the box, and it does not fix the problem at the source. The index of the search engine itself should contain this information without such additives. --XRay 💬 17:48, 12 May 2023 (UTC)

If the WMF would fund a team to implement a solution for recursive search of structured data, this could be fixed within a couple of months. But the structured data team is gone and the Commons:Product and technical support for Commons 2022-23 team also seems to be gone. GPSLeo (talk) 18:27, 12 May 2023 (UTC)

These are not good conditions. The framework for structured data is still incomplete. It follows that inadequate substitute solutions are now being created. Redundant data is already being considered in the rule set for the depicted statement. I find something like this catastrophic. --XRay 💬 18:56, 12 May 2023 (UTC)

+1 - Jmabel ! talk 20:05, 12 May 2023 (UTC)

Faceted, SDC-powered MediaSearch for Wikimedia Commons

Tracked in Phabricator
Task T337106

Browsing through the archives of this talk page, I noticed that we have discussed faceted search for Commons a few times, but no general Phabricator ticket had been created for that yet. I have created one: https://phabricator.wikimedia.org/T337106 Opinions and feedback are welcome both there and here. Cheers, Spinster (talk) 11:46, 20 May 2023 (UTC)

Superfluous depict statements

A bot set gable (Q1161370) (Revision of 750048576), but crow-stepped gable (Q1939660) already exists. IMO it is not good adding superfluous statements. I understand the problem, but a solution should be found elsewhere. To improve search results, the search engine should be improved. Superfluous statements only lead to confusion. --XRay 💬 17:34, 22 May 2023 (UTC)

BTW: See COM:DEPICTS, What items not to add --XRay 💬 17:36, 22 May 2023 (UTC)

I found the hint These generic "tags" should not currently be added if more specific depicts statements already exist. Is there any running discussion or voting? --XRay 💬 06:52, 25 May 2023 (UTC)

Question about the meaning ... checksum

I've found new items added to images: checksum (P4092). I certainly know how to determine checksums and what they are used for. However, I wonder what use they have in the SDC. Can anybody help/explain? --XRay 💬 15:59, 2 June 2023 (UTC)

If we had this on all files this could be used for duplicate detection or to check if a file download was successful. But of course this only works if every file has such a statement. GPSLeo (talk) 15:36, 4 June 2023 (UTC)
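The two uses mentioned above (checking a download and spotting exact duplicates) can be sketched with Python's standard `hashlib`. This is an illustration only: the function names and the in-memory example data are hypothetical, and SHA-1 is chosen here simply because it is one of the hash functions named in this thread.

```python
import hashlib
from collections import defaultdict


def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 checksum of the given bytes as a hex string."""
    return hashlib.sha1(data).hexdigest()


def download_is_intact(data: bytes, expected_checksum: str) -> bool:
    """Compare a downloaded file's checksum with the recorded one."""
    return sha1_hex(data) == expected_checksum.lower()


def exact_duplicates(files: dict) -> list:
    """Group file names whose content is byte-for-byte identical."""
    groups = defaultdict(list)
    for name, data in files.items():
        groups[sha1_hex(data)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up file contents standing in for real downloads.
files = {"a.jpg": b"same bytes", "b.jpg": b"same bytes", "c.jpg": b"other bytes"}
print(exact_duplicates(files))
```

Note that this only finds byte-identical files; a resized or re-encoded copy gets a completely different checksum.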

It is a part of the meta data too. OK, with SDC you can compare with SPARQL. Is this really interesting for users? --XRay 💬 16:19, 4 June 2023 (UTC)

I think the idea is to move all metadata to SDC. Not necessarily for queries but for regular API requests. That you have one API request structure for all information. GPSLeo (talk) 08:46, 5 June 2023 (UTC)

@GPSLeo: since the term "metadata" is variously used: are you saying there is an intention of completely removing wikitext from file pages, or something short of that? - Jmabel ! talk 16:10, 5 June 2023 (UTC)

My opinion: We are still many years away from removing the wikitext. There is still so much in there that cannot be mapped at the moment. --XRay 💬 16:18, 5 June 2023 (UTC)

You can check if the file is broken/original with md5/sha1. However, for duplicate detection it is less useful than something like w:Perceptual hashing. I am currently running an imagehash dhash and phash index in MariaDB on Toolforge for duplicate detection of Finnish photos, but my index contains only 3M-4M of Commons' 90M photos. Afaik those could in theory be useful in a SPARQL index, but currently there is no meaningful way to do the Hamming distance queries. -- Zache (talk) 17:57, 5 June 2023 (UTC)
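For readers unfamiliar with the Hamming distance queries mentioned above, here is a minimal pure-Python sketch. It assumes perceptual hashes are already available as integers (for example 64-bit dhash or phash values); the sample hash values, the file names and the threshold are made up for illustration and are not taken from any real index.

```python
from itertools import combinations


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer hashes differ."""
    return bin(a ^ b).count("1")


def near_duplicates(hashes: dict, max_distance: int) -> list:
    """Return name pairs whose hashes differ in at most max_distance bits."""
    pairs = []
    for (name_a, hash_a), (name_b, hash_b) in combinations(sorted(hashes.items()), 2):
        if hamming_distance(hash_a, hash_b) <= max_distance:
            pairs.append((name_a, name_b))
    return pairs


# Made-up 16-bit "hashes"; real perceptual hashes are usually 64 bits or more.
hashes = {"a.jpg": 0xFF00, "b.jpg": 0xFF01, "c.jpg": 0x00FF}
print(near_duplicates(hashes, max_distance=2))
```

The pairwise loop is quadratic in the number of files, which is part of why such a comparison is hard to express as an ordinary index lookup.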

Qualifier incomprehensible

I tried, but my English is obviously not good enough. What does the qualifier object has role (P3831)/subject of this media's item-level description that is not depicted in the media (Q117973992) mean? --XRay 💬 18:06, 22 May 2023 (UTC)

@XRay: The idea seems to be (according to the usage instructions for Q117973992) that if the "main subject" (main subject (P921)) of the photo is not something actually depicted, you use that qualifier and value. It's not entirely clear to me what they have in mind, but I'd guess this would be something like a picture of a family member of someone famous, where the famous person might be considered the main subject, but not depicted, or similarly for a document which is of importance only because of who it is about (with that person being the main subject), or maybe a wall text for a painting (where the painting is the unshown main subject), etc. But I don't think it's terribly well described, and even as a native English speaker I could well have failed to understand the intent. - Jmabel ! talk 20:04, 22 May 2023 (UTC)
- Thank you. Altogether there are a number of mysterious things which are incomprehensible for third parties (and me). The qualifier above can be found, for example, at File:Münster,_Historisches_Rathaus,_Giebelfiguren_--_2019_--_3573.jpg. Have a look at main subject (P921): an item gable (Q1161370) was added by a bot (User:DPLA bot, a bot for uploading images from Digital Public Library of America). The item has the qualifier object has role (P3831)/subject of this media's item-level description that is not depicted in the media (Q117973992), and a depict statement crow-stepped gable (Q1939660) exists. BTW: It's a photograph taken by me, not from the Digital Public Library of America. The same bot added the superfluous depict statement gable (Q1161370) (Revision of 765793772, crow-stepped gable (Q1939660) already exists) with the qualifier based on heuristic (P887)/inferred from DPLA subject term (Q114065533) as reference. In my view, such additions are unnecessary and incomprehensible to third parties. --XRay 💬 05:10, 23 May 2023 (UTC)
- If the main subject is not depicted, then it is not the main subject. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:58, 4 June 2023 (UTC)
  - Yes, but depicts (P180) and main subject (P921) are not both depiction items. main subject (P921) for example can be more abstract. The meaning may seem similar, but it is not identical. --XRay 💬 13:09, 4 June 2023 (UTC)
    - I made no attempt to argue that they are the same. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:33, 13 June 2023 (UTC)

Ordering

There are a lot of statements of SDC, but how to order these statements? IMO this should be done in a common way, not for each file. --XRay 💬 06:29, 22 July 2023 (UTC)

Hi, this is defined globally in this file: MediaWiki:Wikibase-SortedProperties. Please feel free to supplement the current first version. I've recently made a table with most frequent properties in use which might be helpful. Regards, Schlurcher (talk) 15:47, 22 July 2023 (UTC)

Thank you. The list is quite short after all. Thank you for the link of the most used statements. --XRay 💬 19:11, 22 July 2023 (UTC)

SDC for Commons Categories

Hi, there are now a couple of thousand files that use Commons category (P373) in the structured data, see here [1]. I don't understand the added value of this replication. The standard arguments for using structured data (accessibility, searchability, queryability, etc.) do not seem to apply to SDC giving the Commons categories. For one, there are better and more efficient tools to handle categories. Also, Commons categories are mainly about the hierarchy, which is exactly what is lost in using structured data. Any concerns about actively discouraging the use of Commons category (P373) in the structured data? @F. Riedelio: Most of these seem to have been added by you. Could you please share your thoughts? Thanks --Schlurcher (talk) 11:54, 27 July 2023 (UTC)

Answered

Unfortunately, I don't understand the error message or the help for this. Why does the name of the category on Wikimedia Commons, which contains files for the object, have special problems? If the property P373 has been applied incorrectly by me, it can also be deleted. F. Riedelio • 💬 13:06, 28 July 2023 (UTC)

@F. Riedelio Well, properties for SDC are the same properties as those on Wikidata. Not all properties that are useful on Wikidata are useful for SDC. Commons category (P373) is a very important property on Wikidata, where it is used to connect categories at Commons to topics on Wikipedia and other projects. It just doesn't make much sense to use it for files on Commons.

@Schlurcher Categories and depicts (P180) statements to some degree already duplicate each other; I see no good reason to add another layer to that. Do we have a way to block-list certain properties from being used on media files? There is a bunch of properties that should never be used on media files (things like Wikidata item of this property (P1629), for example). Might make sense to prevent them from e.g. showing up in the search results when clicking "Add Statement" on a file page ... El Grafo (talk) 09:02, 31 July 2023 (UTC)

P373 is entirely pointless nowadays (just follow sitelinks), and uses here really should just be removed. Thanks. Mike Peel (talk) 10:03, 31 July 2023 (UTC)

Thanks! I've included a corresponding table in Commons:Structured data/Modeling/Meta. @El Grafo: I'm not sure if or how it is possible to restrict certain of the properties from use. Maybe others can answer. What I can do is enforce this through my bot, i.e., configure it to remove all such statements if it sees them. --Schlurcher (talk) 11:13, 31 July 2023 (UTC)

@Mike Peel: P373 is pointless on Commons, but in Wikidata it is still useful for a Q-item that has a separate item for categories. It lets you get directly from the Q-item to the relevant Commons category without detouring through either that separate Wikidata item or a Wikipedia article or a Commons gallery page. Probably not that important for a piece of software, but very useful for a human who is trying to navigate by hand. - Jmabel ! talk 16:35, 31 July 2023 (UTC)

@Jmabel: It might be a convenience, but it is duplication, and we don't do that for Wikipedia categories. But definitely not useful here! Thanks. Mike Peel (talk) 20:47, 31 July 2023 (UTC)

@Mike Peel: Wikidata has quite a few conveniences/duplications. Think of has part(s) (P527) and part of (P361), where we are always supposed to use both reciprocally, even though in theory we could have only one and calculate the other. - 15:23, 1 August 2023 (UTC)

I think that if we want to expose the categories via SDC mediainfo we should do it automatically using code, not manually using bots or by hand. --Zache (talk) 15:55, 1 August 2023 (UTC)
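For illustration, here is a minimal sketch of how a bot could locate Commons category (P373) statements on a file before removing them. It assumes the file's MediaInfo JSON has already been fetched and that statements are grouped by property ID under a `statements` key. The function name and the sample entity are hypothetical, and this is not SchlurcherBot's actual code.

```python
def claim_ids_for_property(entity, property_id="P373"):
    """Return the IDs of all statements on `entity` that use `property_id`."""
    statements = entity.get("statements", {})
    # Defensive assumption: an entity without statements may carry
    # an empty list here instead of a dict.
    if not isinstance(statements, dict):
        return []
    return [claim["id"] for claim in statements.get(property_id, []) if "id" in claim]


# Hypothetical, heavily trimmed MediaInfo entity, for illustration only.
sample_entity = {
    "id": "M1",
    "statements": {
        "P373": [{"id": "M1$aaaa", "mainsnak": {"property": "P373"}}],
        "P180": [{"id": "M1$bbbb", "mainsnak": {"property": "P180"}}],
    },
}

print(claim_ids_for_property(sample_entity))
```

The returned statement IDs are what a removal step would then act on; how the removal itself is done is left out here.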

Disambiguation of Wikidata items

Tracked in Phabricator
Task T262142

On Tuesday, I opened a discussion on Wikidata, which is here. In brief, the problem is one I encounter when uploading a lot of photographs of Dutch streets on Commons using the upload wizard. After filling in all necessary fields, I am prompted to fill in P180 ("depicts") for the Structured Commons. In my case, this is the name of the street. If the name is relatively common, "my" street does not show up in the dropdown menu of 7(?) items that I get. On Wikidata, I proposed to add (by bot) something like "Streetname, Foocity" in the field "also known as". This would solve the problem here: typing something like "Streetname, F" would give me a more targeted list in the dropdown menu. The proposal is still open, but at this point it is clear it is not going to get consensus (please do not try to influence it, this is not the point of my message here). The only other way to solve the problem I see is from the Commons side - to add a function "more" to the dropdown menu, which would bring the next seven items and so on, so that eventually I would get to the item I need. Would the Commons community support this? If not, I will probably stop filling in P180 and recommend removing the step from the upload workflow, because it becomes a horrible waste of time with only a minimal benefit. Ymblanter (talk) 16:31, 14 September 2023 (UTC)

I would have no problem with that. - Jmabel ! talk 18:20, 14 September 2023 (UTC)
I would have also no problem with adding a “more” option to the dropdown, but I fear it’s out of the control of the Commons community. —Tacsipacsi (talk) 22:34, 15 September 2023 (UTC)
I made that proposal (on someone else's behalf) in 2020. --Matěj Suchánek (talk) 08:02, 16 September 2023 (UTC)
Thank you, it looks like we are stuck here and there is no way forward. Ymblanter (talk) 09:19, 16 October 2023 (UTC)

It only shows 7 possibles. You end up having to search on Wikidata. Secretlondon (talk) 23:02, 7 October 2023 (UTC)

Qualifier for inscriptions

Hi! I've seen qualifiers like inscription (P1684) (at depicts (P180)). A language is needed to set this qualifier. But what about inscriptions with digits only or in an unknown language? --XRay 💬 07:52, 16 October 2023 (UTC)

I would say digits would work for any language which uses them (though filling in "depicts" with only digits would be weird). Concerning an unknown language, I am not sure: How are you going to fill in "depicts" with an unknown language? Or in any language you do not speak? How do you know it is not nonsense? Ymblanter (talk) 09:19, 16 October 2023 (UTC)

An unknown language may be present if, for example, only individual letters can be read from an inscription. And please have a look at {{Inscription}}. You can use '?' for an unknown language and '~' for digits. I can't find a similar expression for the qualifiers. --XRay 💬 09:45, 16 October 2023 (UTC)

BTW: In the {{Inscription}} template, a '/' can be used for multilingual inscriptions. --XRay 💬 09:46, 16 October 2023 (UTC)

The statements on d:Property:P1684 already provide an example for an unknown language: use the language code und. (I hope the SDC interface allows that, Wikidata clearly does so.)

The case of digits is trickier. Probably the easiest (if applicable) is to use the language of the work/author; this is the language the author likely pronounced the number while writing it down (if they pronounced it). For example, if the inscription is C. Monet 1896, I’d use French because Monet was French. If there’s no clue, maybe I’d use und (Undetermined) here as well. Another possibility is mul (Multiple). —Tacsipacsi (talk) 16:12, 17 October 2023 (UTC)

The example at P1684 is well known, but SDC does not support it. A language is required. --XRay 💬 17:12, 18 October 2023 (UTC)

und is a valid ISO 639-3 language code. If the SDC software doesn’t support valid ISO 639 codes, it’s broken and should be fixed. (SDC software being broken is neither new nor surprising, though…) —Tacsipacsi (talk) 19:36, 19 October 2023 (UTC)

Oh. I'll try to use it. Maybe another code is good for digits: zxx. --XRay 💬 14:56, 20 October 2023 (UTC)

I don’t think zxx is appropriate for numbers, they are very much linguistic content. To illustrate: depending on the language, the same number may be written as 3 (English), ٣ (Arabic), ३ (Marathi), ༣ (Lhasa Tibetan), Ⅲ (Latin) etc. —Tacsipacsi (talk) 21:17, 21 October 2023 (UTC)

OK. But two others will help in other cases: und (see above) and mul if there are multiple languages. --XRay 💬 08:15, 22 October 2023 (UTC)

Yes, those are useful indeed, although multiple languages can be split up into multiple inscription (P1684) qualifiers in certain situations (e.g. if the same text is repeated in multiple languages). —Tacsipacsi (talk) 13:16, 22 October 2023 (UTC)
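To make the language-code options discussed above concrete, here is a minimal sketch of an inscription (P1684) qualifier in Wikibase's JSON shape, where a monolingual text value carries a `text` and a `language`. The helper name and the second sample text are hypothetical. Whether the SDC interface actually accepts `und` is disputed in this thread, so this only shows the data shape, not a guaranteed-valid edit.

```python
import json


def inscription_qualifier(text, language="und"):
    """Build a P1684 qualifier snak holding a monolingual text value."""
    return {
        "snaktype": "value",
        "property": "P1684",
        "datavalue": {
            "type": "monolingualtext",
            "value": {"text": text, "language": language},
        },
    }


# Language of the author, as suggested for "C. Monet 1896".
monet = inscription_qualifier("C. Monet 1896", language="fr")

# Only individual letters readable: fall back to "und" (undetermined).
# The text "A B" is a made-up placeholder.
fragment = inscription_qualifier("A B")

print(json.dumps(monet, ensure_ascii=False, indent=2))
```

An inscription repeated in several languages would then be several such qualifiers, one per language, rather than a single `mul` value.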

Discussion at Commons:Village pump#Blockers to automated import of structured data

You are invited to join the discussion at Commons:Village pump#Blockers to automated import of structured data. {{u|Sdkb}} ^talk 21:44, 9 November 2023 (UTC)

No metadata about color scheme

Hi. We recently had one query request over at Wikidata where a contributor wanted to look for P18 statements with black and white images, including monochromatic images such as sepia toned images. Unfortunately it doesn't seem like this is included in the structured data as of yet, but it would be immensely helpful in locating P18 images that could use a modern version. Is adding color (P462) claims on Commons images acceptable? And if so, how would you want it modelled? @Bouzinac: Infrastruktur (talk) 16:03, 16 November 2023 (UTC)

I think color (P462) itself is probably best used to describe an image's contents, as a qualifier to depicts (P180) (e.g. to signify that the depicted bus is red). El Grafo (talk) 16:25, 16 November 2023 (UTC)

@Infrastruktur: genre (P136) with either monochrome photography (Q91079944) or black-and-white photography (Q3381576), the latter being more specific. I'd rather have a property based on photographic technique (Q1439691), but it's just a Q-item without a corresponding property. - Jmabel ! talk 18:22, 16 November 2023 (UTC)

Thanks. That seems like a good way to do it. Infrastruktur (talk) 18:53, 16 November 2023 (UTC)
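As a sketch of the modelling suggested above, the following builds a genre (P136) statement pointing at black-and-white photography (Q3381576) in Wikibase's JSON shape for item-valued statements. The helper name is hypothetical, and nothing here submits an edit; it only shows what such a statement looks like as data.

```python
def item_statement(property_id, item_id):
    """Build a normal-rank statement whose value is a Wikidata item."""
    if not item_id.startswith("Q"):
        raise ValueError(f"expected a Q-id, got {item_id!r}")
    return {
        "type": "statement",
        "rank": "normal",
        "mainsnak": {
            "snaktype": "value",
            "property": property_id,
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(item_id[1:]),
                    "id": item_id,
                },
            },
        },
    }


# genre (P136) = black-and-white photography (Q3381576)
bw_statement = item_statement("P136", "Q3381576")
print(bw_statement["mainsnak"]["datavalue"]["value"])
```

Using monochrome photography (Q91079944) instead only changes the second argument.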

Structured Data in the image page

Answered

Is there a way to lookup structured data in the page of an image? Regards, --Antoine2711 (talk) 16:53, 15 December 2023 (UTC)

Please clarify "lookup"? Do you mean to get them populated in page descriptions? Template:Structured Data is a good starting point to see that in action. --Schlurcher (talk) 17:38, 15 December 2023 (UTC)

@Antoine2711: If you just mean that you want to see the structured data for the file page, there are two side-by-side tabs, "File information" (the default) and "Structured data". Click the latter to see structured data. The only exception is the caption, which is technically structured data but shows on the "File information" tab. - Jmabel ! talk 20:36, 15 December 2023 (UTC)

I have pictures of puppets, and in structured data, I pushed the height, width, and weight of the said puppet. I would like to show that in the description. I looked at https://commons.wikimedia.org/wiki/Template:Structured_Data; it shows some declarations, but not the one I want.

I would like to retrieve that specific information like this: «{{Structured Data|P2048}}» to display: « Height: 12 cm », when in English, and « Hauteur : 12 cm » when in French.

Regards, Antoine2711 (talk) 21:15, 15 December 2023 (UTC)

@Antoine2711: It would be really helpful if you would link to a specific image as an example. - Jmabel ! talk 22:27, 15 December 2023 (UTC)

https://commons.wikimedia.org/wiki/File:TI_FlorientBeauregard_face.jpg

So I would like this data to appear in the main page, in the description. Note that my data is a qualifier of the depicted items in SDC, so it's far. Also note that this data is in the WD item, so I could want to fetch it from there also…

https://www.wikidata.org/wiki/Q107553062

Regards, Antoine2711 (talk) 23:38, 15 December 2023 (UTC)

@Antoine2711: I strongly suspect nothing of the sort exists. Certainly nothing general in Category:Structured Data on Commons templates. {{Information}} and {{Artwork}} both can do something of the sort, but I suspect it was implemented in a manner not available to ordinary users. - Jmabel ! talk 03:54, 16 December 2023 (UTC)

Actually, I'm almost getting my answer with

https://commons.wikimedia.org/wiki/Module:Wikidata

and

https://commons.wikimedia.org/wiki/Module:WikidataIB

But, I'm fairly new to Modules and Lua. I'll keep searching and testing.

This works: {{#invoke:Wikidata|formatStatementsE|item=Q107553062|property=P31}}: puppet and fictional character

And this also: {{#invoke:Wikidata|formatStatementsE|item=Q107553062|property=P2048}}: 152.5 centimetre

Regards, Antoine2711 (talk) 04:10, 16 December 2023 (UTC)

@Antoine2711: If you want to use the Commons Structured Data directly, you can do:

{{label|P2048|link=-}}{{colon}}{{#invoke:WikidataIB|getQualifierValue|qid=M107619255|P180|pval=Q107553062|qual=P2048|osd=no|fwd=ALL|noicon=true|linked=no}}. This gives: height: 152.5 centimetre.

The number in M107619255 is the page id of File:TI_FlorientBeauregard_face.jpg. --LennardHofmann (talk) 13:13, 16 December 2023 (UTC)
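As a sketch of the page-ID relationship just described, the following derives the M-id from a file's page ID. It assumes the page ID is looked up with the MediaWiki API's `action=query`, whose default JSON output keys pages by page ID. The function names are hypothetical, and the sample response is a hand-written, trimmed stand-in rather than a recorded API reply.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def pageid_query_url(file_title):
    """Build the API URL that returns basic page info for `file_title`."""
    params = {"action": "query", "titles": file_title, "format": "json"}
    return f"{API}?{urlencode(params)}"


def mediainfo_id(page_id):
    """The MediaInfo entity ID is the letter M followed by the page ID."""
    return f"M{int(page_id)}"


def mediainfo_id_from_response(response):
    """Extract the M-id from a parsed action=query response.

    Raises KeyError if the page is missing (no "pageid" in the entry).
    """
    page = next(iter(response["query"]["pages"].values()))
    return mediainfo_id(page["pageid"])


# Hand-written stand-in for the parsed JSON response (assumption).
sample_response = {
    "query": {
        "pages": {
            "107619255": {
                "pageid": 107619255,
                "ns": 6,
                "title": "File:TI FlorientBeauregard face.jpg",
            }
        }
    }
}

print(pageid_query_url("File:TI_FlorientBeauregard_face.jpg"))
print(mediainfo_id_from_response(sample_response))
```

The resulting ID is what goes into the `qid=` parameter of the WikidataIB call shown above.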

Yes, that is the answer to my original question. Thanks a lot.

I have new questions now! ;-)

How can I get an uppercase letter at the start of the label?
{{#invoke:WikidataIB|getQualifierValue|qid=M107619255|P180|pval=Q107553062|qual=P2048|osd=no|fwd=ALL|noicon=true|linked=no}} shows in French as 12,5 centimètres, but {{#invoke:Wikidata|formatStatementsE|item=Q107553062|property=P2048}} shows in French as 12.5 centimètres (dot instead of a comma). Must the module Wikidata be fixed?
{{#invoke:WikidataIB|getValue|qid=M107619255|P180}} this doesn't work, to just get the value of the property. Why?
Another question I have is: if there is no value, how can I prevent this line from appearing? I guess it uses the {{#if|}} magic, but I don't know how to use it.

Thanks LennardHofmann for providing me the answer I needed. I was close, but you showed me more. Antoine2711 (talk) 14:31, 16 December 2023 (UTC)

@Antoine2711

You can uppercase the first letter with the magic word {{ucfirst:}}
Yes, French uses the comma as the decimal separator
See Module:WikidataIB#Parameter_sets
Use Template:If then show, e.g. {{If then show|{{#invoke:WikidataIB|getValue|qid=M107619255|P180|ps=1}}||{{ucfirst:{{label|P180|link=-}}}}{{colon}}}}

LennardHofmann (talk) 13:29, 17 December 2023 (UTC)
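The logic of the last item can be sketched in Python: show the label and value only when a value exists, and capitalize the first letter of the label. This is only a model of the behaviour seen in the example above, not the templates' actual implementation; the function names are hypothetical, and a plain ": " stands in for {{colon}}, which is a simplification.

```python
def ucfirst(text):
    """Uppercase only the first character, leaving the rest unchanged."""
    return text[:1].upper() + text[1:]


def if_then_show(value, otherwise="", prefix="", suffix=""):
    """Return prefix + value + suffix if value is non-empty, else `otherwise`."""
    value = value.strip()
    return f"{prefix}{value}{suffix}" if value else otherwise


def labelled_line(label, value):
    """Render 'Label: value', or nothing at all when there is no value."""
    return if_then_show(value, "", ucfirst(label) + ": ")


print(labelled_line("height", "152.5 centimetre"))
print(repr(labelled_line("height", "")))
```

Because the label is passed as the prefix rather than written outside the call, it disappears together with the value when the value is empty.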

LennardHofmann You have been so helpful, thank you very much. I'm grateful. If you ever need help/information about OpenRefine, you let me know. It's a tool to push mass of data to Wikimedia Commons or Wikidata, without coding APIs. Best Regards, Antoine2711 (talk) 16:59, 17 December 2023 (UTC)

Is any of this documented somewhere that I should have been able to find it? I spent 20 minutes looking for it before concluding incorrectly that it wasn't publicly available.
Very odd that what we invoke is called "Wikidata", given the repeated insistence that although SDC uses Wikibase, it is not Wikidata. - Jmabel ! talk 17:54, 16 December 2023 (UTC)
True that WikibaseIB would be a better name. What does the IB stand for anyway? Antoine2711 (talk) 18:35, 16 December 2023 (UTC)
IB = Infobox? --Matěj Suchánek (talk) 10:44, 19 December 2023 (UTC)

@Jmabel: The documentation I found is there:

Module:Wikidata

Module:WikidataIB

Regards, Antoine2711 (talk) 04:27, 17 December 2023 (UTC)
Neither of which was in Category:Structured Data on Commons or any of its subcats. I'll fix that. - Jmabel ! talk 07:10, 17 December 2023 (UTC)

No, I guess I won't. No idea how to add a category to those. Can someone who knows how please add Category:Retrieving Structured Data on Wikimedia Commons to those two module pages? - Jmabel ! talk 07:16, 17 December 2023 (UTC)

What properties to use to mark media files as l-r-stereoscopic, having nude content, having sexual content?

I would like to add properties to my uploaded files for stereoscopic content, for containing nude depictions, and for containing sexual depictions. What properties would I use for that? C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 22:08, 16 December 2023 (UTC)

Stereoscopy as a technique is stereo photography (Q17165350), but oddly there doesn't seem to be a property for photographic technique (Q1439691).

For the others, I'm guessing they can be done with depicts: nudity (Q10791), sexual intercourse (Q5873), cunnilingus (Q8402), fellatio (Q8401), etc. - Jmabel ! talk 23:55, 16 December 2023 (UTC)

While adding depicts is useful (and as time allows I will do that), I was looking more for a type of rating. DSA/DMA is coming and WM is affected by it. Ideally I would like to add the information in the form that a future MW content rating system will need, or at least in a way that can be automagically translated into the form that future system will need. C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 11:18, 17 December 2023 (UTC)

I have just created stereo photograph (Q123906082), which may be used with has characteristic (P1552). Categorisation of pornographic images in the manner suggested would require a wider discussion and consensus on Commons. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 21:31, 17 December 2023 (UTC)

There is also a whole set of projections that really should be modelled, which I think it would be good to have a discussion about.

For instance, stereoscopy can be lr, rl, tb, interlaced, red/green shift etc. And for 360 photography/videography you have projections like equirectangular, cylindrical, cube, eac etc, which in turn can be encoded with stereoscopy as well. I've been meaning to open a discussion about this, but haven't gotten around to it yet. —TheDJ (talk • contribs) 10:08, 19 December 2023 (UTC)

Agree with that. Also wiggle stereoscopy. But the majority of stereoscopic images are probably lr: either PD stereo cards from the past, or images from @VasuVR and me. C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 19:47, 23 December 2023 (UTC)

Thank you for tagging me in this discussion, @C.Suthorn. I have tagged my earlier stereoscopic photographs with stereoscopy (Q35158) under "depicts" (P180), which may not be really appropriate (example: Flowers at Bristol). Any final suggestion can be applied to all of my stereoscopic L-R images, which are in the category Category:Files (stereoscopic) by VasuVR. Please tag me with final decisions and suggestions. Thank you. VasuVR (talk, contribs) 08:59, 24 December 2023 (UTC)

Depicts is certainly wrong for photographic technique. - Jmabel ! talk 09:45, 24 December 2023 (UTC)