Commons talk:Structured data/Archive 2017

From Wikimedia Commons, the free media repository

It's alive!

Hey folks :)

The Wikidata team has been working on structured data support for Wikimedia Commons for a while now. A lot of work had to be done in the background to make Wikibase (the software powering Wikidata) ready for this. We've made a lot of progress and can now show the very first results. What we have so far is a new entity type (next to item and property) called mediainfo. Mediainfo entities will hold the data about a media file.

We have set up a demo system with the current state and will update it as we make progress. The current state is still extremely basic but I'd rather show you progress very early than when it is all done and can't be changed anymore. We are still quite a bit away from a deployment to Commons.

What works so far:

  • You can upload a file and you will get a link to the associated mediainfo page.
  • You can click the link to the associated mediainfo page. At this point it does not exist yet in the database but you will see an empty "virtual" mediainfo entity. When you add a label or description it is properly created in the database.
  • You can add statements to existing mediainfo entities.
  • You can create and edit the mediainfo entity using the same API as for items.
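
To make the last point concrete, here is a minimal sketch of what editing a mediainfo entity "using the same API as for items" might look like. It only builds the POST parameters for Wikibase's `wbeditentity` action and sends nothing. The entity ID `M13`, the label text and the token value are placeholders, and it is an assumption that the demo system accepts mediainfo IDs in the `id` parameter the same way it accepts item IDs.

```python
import json


def build_edit_request(entity_id, language, label, csrf_token):
    """Build POST parameters for a Wikibase `wbeditentity` call.

    Nothing is sent over the network; the caller would POST the returned
    dict to the wiki's api.php endpoint.
    """
    # Same JSON shape that is used when setting a label on an item.
    data = {"labels": {language: {"language": language, "value": label}}}
    return {
        "action": "wbeditentity",
        "id": entity_id,            # assumption: an "M..." ID is accepted here
        "data": json.dumps(data),
        "token": csrf_token,        # placeholder, a real CSRF token is required
        "format": "json",
    }


params = build_edit_request("M13", "en", "Lighthouse photo", "PLACEHOLDER-TOKEN")
print(params["action"], params["id"])
```

Adding a label first matches the behaviour described above, where the "virtual" entity is only created in the database once a label or description is saved.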

What doesn't work yet:

  • A lot :D
  • You can't create the mediainfo entity by adding a statement. It will fail.

What we will work on next:

  • Make the creation of the mediainfo entity work when adding a statement first. (smallish amount of work)
  • Make it possible to use the items and properties from Wikidata in the statements. (largish amount of work)
  • Integrate the mediainfo entity in the file page directly so you don't have to go to a different page to view the data. (huge amount of work)

You can find:

There is still a lot of work to do but we're making progress towards helping Commons store structured data \o/

Cheers --Lydia Pintscher (WMDE) (talk) 15:59, 28 July 2016 (UTC)

Thanks Lydia, looks great. Maybe be careful to distinguish between the actual position of the object and the camera position; we use different templates for these on Commons.--Ymblanter (talk) 16:16, 28 July 2016 (UTC)
That is actually a good point. We could in the future do this with two properties or use qualifiers. It's either way flexible enough to accommodate that. --Lydia Pintscher (WMDE) (talk) 16:18, 28 July 2016 (UTC)
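
The two options mentioned here (two properties, or one property with qualifiers) could be sketched roughly as below. This is only an illustration: the property names, the qualifier name and the coordinates are hypothetical placeholders, not real Wikidata or Commons identifiers.

```python
from dataclasses import dataclass, field

# All names below are hypothetical placeholders, not real property IDs.
OBJECT_LOCATION = "object-location"
CAMERA_LOCATION = "camera-location"
COORDINATES = "coordinates"
APPLIES_TO = "applies-to"


@dataclass
class Statement:
    property_id: str
    value: tuple  # (latitude, longitude), illustrative numbers only
    qualifiers: dict = field(default_factory=dict)


# Option A: one property per kind of position.
two_properties = [
    Statement(OBJECT_LOCATION, (10.0, 20.0)),
    Statement(CAMERA_LOCATION, (10.1, 20.1)),
]

# Option B: a single property, with a qualifier saying what it refers to.
with_qualifiers = [
    Statement(COORDINATES, (10.0, 20.0), {APPLIES_TO: "object"}),
    Statement(COORDINATES, (10.1, 20.1), {APPLIES_TO: "camera"}),
]


def camera_position(statements):
    """Return the camera position under either modelling, or None."""
    for s in statements:
        if s.property_id == CAMERA_LOCATION:
            return s.value
        if s.property_id == COORDINATES and s.qualifiers.get(APPLIES_TO) == "camera":
            return s.value
    return None


print(camera_position(two_properties))
print(camera_position(with_qualifiers))
```

Either shape carries the same information; the difference is whether a consumer looks at the property or at the qualifier to tell the two positions apart.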
Thanks to everyone who worked on making this possible. The future of Commons looks awesome. Léna (talk) 16:17, 28 July 2016 (UTC)
Thank you :) --Lydia Pintscher (WMDE) (talk) 16:18, 28 July 2016 (UTC)
Same as Léna − thanks for the work :) Can’t wait to explore the possibilities further! :) Jean-Fred (talk) 16:24, 28 July 2016 (UTC)
+1. The future is here, and it is called mediainfo. Wittylama (talk) 16:25, 28 July 2016 (UTC)
Splendid news, thank you. First impressions are good, and I look forward to using it in production. However, one thing struck me immediately: there's no thumbnail of the image on the data page! I trust that a solution for this is in hand? Andy Mabbett (talk) 16:31, 28 July 2016 (UTC)
It will become one page in the future so there should be no need for a thumbnail then. I'll see how hard it is to add one in the meantime. --Lydia Pintscher (WMDE) (talk) 16:32, 28 July 2016 (UTC)
\o/ Rama (talk) 16:36, 28 July 2016 (UTC)
@Lydia Pintscher (WMDE): Is there community consensus to put this stuff on the file description page directly? --Steinsplitter (talk) 16:49, 28 July 2016 (UTC)
Would you rather have it as a separate page in the future? Please also keep in mind that this is very early. There will be many design and functionality changes still before I consider it ready for Commons. --Lydia Pintscher (WMDE) (talk) 16:52, 28 July 2016 (UTC)
Likely. I posted a link on COM:VP for larger community input regarding this :-).--Steinsplitter (talk) 16:55, 28 July 2016 (UTC)
Personally, I would strongly prefer to have the structured data fields directly on the file page. After all, structured data will most likely replace most of our current Information template. --Sebari (talk) 21:43, 28 July 2016 (UTC)
  • I was slow to understand what is happening here and I had to look at this for a while before I got it. Lydia already explained this, but for anyone who needs the same thing explained again in another way, here is my attempt: Previously, file descriptions on Commons were free text which was stored in templates on Commons. Now to a limited extent, and coming more in the future, there are options to replicate all of the file descriptions in Commons in Wikidata. This means that instead of filling in multiple fields in a template, the same information can be put into Wikidata. When information is in Wikidata a user completes any number of single form fields, rather than working with the file description as one free text form. To see an example, consider this lighthouse photo. Note that there is no published file description in the file, and obviously, Commons needs to have these. But instead of the file description, one can click through to the MediaInfo listing. In that listing, all the information which a Commons file ought to have is there as structured data, which is obviously the best way to store this kind of information. In the future, this MediaInfo data will be published on Commons file pages, and also, there will be more options to enter more MediaInfo data for any file. Blue Rasberry (talk) 17:49, 28 July 2016 (UTC)
If you want information about a painting from Wikidata to show up in the file page of the photo of that painting here then you need to adapt or create a template here. --Lydia Pintscher (WMDE) (talk) 08:26, 29 July 2016 (UTC)
This is great news. Congratulations to you all for this step and looking forward to all the possibilities that this line of work opens. :) --LZia (WMF) (talk) 21:27, 28 July 2016 (UTC)
Very good to hear, Lydia, this has the potential to greatly enhance Commons. One question: Will we be able to use Wikidata items directly in statements or will we have to duplicate Wikidata items on Commons? --Sebari (talk) 21:45, 28 July 2016 (UTC)
That's a good question. In the example image, it has a "depicts" field with "Lighthouse", and a new item Q3 was created for "Lighthouse", which doesn't at present link in any way to the Wikidata equivalent entry. More likely in practice you'd want the item to be "Poolbeg Lighthouse" which would be an instance of Lighthouse, so you'd end up building an entire topic hierarchy, matching Commons categories and Wikidata items. --ghouston (talk) 01:11, 29 July 2016 (UTC)
The idea is that it'll use the items from Wikidata. Making that possible is still quite a bit of work but it is what we are going to work on next. I have only created the items on the demo system now to show where it is going. --Lydia Pintscher (WMDE) (talk) 08:26, 29 July 2016 (UTC)
It's also creating an item Q2 for the photographer. Such items will find a match in Wikidata in some cases, but not others, depending on whether the photographers meet Wikidata's notability requirements --ghouston (talk) 01:43, 29 July 2016 (UTC)
Yeah we will create a new data type that lets you link to an item on Wikidata, a Commons user page, a flickr user page and more to cover that. --Lydia Pintscher (WMDE) (talk) 08:26, 29 July 2016 (UTC)
Wow, this is awesome, Thanks, -- Bodhisattwa (talk) 23:13, 28 July 2016 (UTC)
Very good news, and a great motivation for all the users who work on today's wikitext metadata of files to make it more structured by using templates for all types of information. This way we will one day be able to transfer these data into the new system. Cheers, --Arnd (talk) 09:09, 29 July 2016 (UTC)
Awesome!! I'm giddy with excitement :D Sending many good vibes to the team for the next steps. Spinster (talk) 19:05, 29 July 2016 (UTC)
Excellent! Can't wait for the next steps :-) Raymond 20:02, 29 July 2016 (UTC)
Belated congratulations to everyone who made this possible, this is going to be real big :) --DarTar (talk) 18:43, 30 July 2016 (UTC)
@Lydia Pintscher (WMDE): So you plan to replace the file description pages completely? What happens with custom licenses, custom user templates, custom information templates etc.? Will the content be stored here on commons or on wikidata? --Steinsplitter (talk) 16:13, 31 July 2016 (UTC)
If I understand it correctly, the structured data will be just another section on the file description page, similar to EXIF data. At least I hope that is the case. In that case we could use that data in our trusted, old Information template and we could have a default {{Structured License}} template that uses the license information from the structured data. So an override with a custom template would still be possible. --Sebari (talk) 16:32, 31 July 2016 (UTC)
I don't have all the answers yet. What I do know is: The data will be stored here on Commons. The data should be in the file description page. The existing stuff in the file description page needs to live on at the very least for the (probably long) time of migration. --Lydia Pintscher (WMDE) (talk) 17:10, 31 July 2016 (UTC)
This all sounds great @Lydia Pintscher (WMDE): ! There is tons of interest for this, and as always let me know how we can reach out and help. Astinson (WMF) (talk) 14:28, 3 August 2016 (UTC)
Extremely enthusiastic, and curious about how to proceed. In which ways could we take part in the development or experimenting? Best, Susannaanas (talk) 10:25, 4 August 2016 (UTC)
The best thing is to play with what is there already and see if that is going in the right direction and if you can model what you want to model. I'm also always grateful for weird edgecases that we might overlook. --Lydia Pintscher (WMDE) (talk) 13:17, 4 August 2016 (UTC)
License and copyright fields can be tricky. What would you do with an item in the public domain, since public domain isn't a license? Can you specify the license / public domain status / Freedom of Panorama status of an object depicted in an image, as well as a license for (say) a photograph itself? Can you deal with compound license tags like File:A Safe Escort (18850019652).jpg, which uses {{Licensed-PD-Art|PD-old-auto-1923|cc-by-sa-2.0|deathyear=1927|attribution=[https://www.flickr.com/people/31363949@N02 Leonard Bentley]}}? --ghouston (talk) 01:44, 5 August 2016 (UTC)
We have done some modelling based on difficult cases we could find together with Stephen from WMF's legal team and they all seem possible to solve with the means we have in Wikidata. But feel free to do some modeling yourself on the test system. You can create items and properties there as you want. I created a license property but it could be called differently for example. It can also have different values and it can have qualifiers to give additional information. --Lydia Pintscher (WMDE) (talk) 13:39, 5 August 2016 (UTC)
I think it will be possible, but will require the use of several fields in some cases, as well as a template that examines which fields are present and gives the information for each license and how they interact. That template could be quite complex. --ghouston (talk) 03:22, 6 August 2016 (UTC)
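
As a rough illustration of the "template that examines which fields are present" idea, the sketch below takes a dictionary of licensing fields and reports on whichever ones exist. The field names are hypothetical, the example values are taken from the compound tag quoted above, and a real template would have to say much more about how the entries interact.

```python
def describe_licensing(fields):
    """Return display lines for whichever licensing fields are present.

    `fields` is a plain dict; every key name used here is hypothetical.
    """
    lines = []
    work_status = fields.get("depicted_work_status")
    file_license = fields.get("file_license")
    attribution = fields.get("attribution")

    if work_status:
        lines.append(f"Depicted work: {work_status}")
    if file_license:
        lines.append(f"This file: {file_license}")
    if file_license and attribution:
        lines.append(f"Attribution: {attribution}")
    if work_status and file_license:
        # A real template would explain the interaction here.
        lines.append("Two separate entries apply to this file.")
    if not lines:
        lines.append("No licensing information recorded.")
    return lines


compound = {
    "depicted_work_status": "PD-old-auto-1923",
    "file_license": "cc-by-sa-2.0",
    "attribution": "Leonard Bentley",
}
for line in describe_licensing(compound):
    print(line)
```

Keeping the status of the depicted work and the license of the file in separate fields is what lets the display logic handle simple and compound cases with the same code.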
Great news, my (late) congratulations! Thanks to Blue Rasberry for the explanation. In general, there are a lot of things to fix with regard to Commons. But it is a very good feeling to see progress in this difficult and sensitive core of the wiki. Ziko (talk) 23:21, 5 August 2016 (UTC)
Neat! --MZMcBride (talk) 23:33, 13 August 2016 (UTC)
Wikidata now has a class of properties for describing media items (https://www.wikidata.org/wiki/Q28464773). This would eventually include all properties that describe Commons entities, such as "photographer", "date taken", "depicts", and so on. Some properties already exist and are members of the class Q28464773. Runner1928 (talk) 22:11, 24 March 2017 (UTC)
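
One way to list the current members of that class is a query against the Wikidata Query Service. The sketch below only builds the SPARQL text and the request URL and sends nothing. It assumes that membership in the class is expressed with "instance of" (P31), which may not match how every property is actually tagged.

```python
from urllib.parse import urlencode

WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def media_property_query(class_qid="Q28464773"):
    """Build a SPARQL query listing properties that are instances of the class."""
    return (
        "SELECT ?property ?propertyLabel WHERE {\n"
        f"  ?property wdt:P31 wd:{class_qid} .\n"  # assumption: P31 marks membership
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def query_url(query):
    """Return a GET URL that asks the query service for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = media_property_query()
print(query)
print(query_url(query))
```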
Looks good, but I don't really have a clear idea of how that gets used on Commons. Would someone be prepared to set up a fully Wikidata-linked image as an example, at least to the extent to which that is possible so far? MichaelMaggs (talk) 07:39, 25 March 2017 (UTC)

Grant to fund work on Structured Data on Commons

Hi all, the WMF and WMDE just announced funding for work on Structured data on Commons via a grant from the en:Sloan Foundation. You can find the announcement at the Wikimedia blog. More information about the grant is at Commons:Structured data/Sloan Grant. If you have questions, please join us here, Astinson (WMF) (talk) 20:25, 9 January 2017 (UTC)

As that's a self-link to this page, I'm presuming that questions should be added below. --MichaelMaggs (talk) 08:20, 10 January 2017 (UTC)
@MichaelMaggs: Fixed right: I was using a similar message for different pages. Astinson (WMF) (talk) 00:26, 11 January 2017 (UTC)

Pages update

Could you update the pages in this cluster, since you link here from everywhere and mention the Sloan Foundation grant here? New information and old/obsolete information are mixed together in this cluster.--Juandev (talk) 12:32, 22 February 2017 (UTC)

@Juandev: Sorry for not responding sooner to this: we were waiting to bring the rest of the team on-board, to ensure that the workplan was a shared understanding among all the teams involved. User:SandraF (WMF) plans to work on overhauling these pages to better reflect that process. The goal is to have more published within the next few months, which better reflects our plan. Astinson (WMF) (talk) 15:22, 5 July 2017 (UTC)

CLs are hiring!

Hey everyone, apologies for the cross-posting, we're just too excited!

We're looking for a new member for our team [0], who'll dive right away in this promising Structured Data project. Is our future colleague hiding among the tech ambassadors, translators, GLAM people, community members we usually work with? We look forward to finding out soon. So please, check the full job description [1], apply, or tell anyone who you think may be a good fit. For any questions, please contact me personally (not here). Thanks! --Elitre (WMF) (talk) 12:07, 10 March 2017 (UTC)

Product is also hiring!

Additional cross posting, additional excitement. The Wikimedia Foundation is hiring a product manager for multimedia support/features and, most immediately, the structured data on commons project. We think it would be tremendously helpful if that person was already a member of the Wikimedia movement. 

If you have product management experience and are interested in learning more, please see the job description and apply here.  Please copy this notice anywhere else you think interested community members might see it.  

Thanks in advance, Jkatz (WMF) (talk) 02:04, 11 March 2017 (UTC)

Wikicentric and universal properties

On wikidata, I left a suggestion months ago to clearly separate (graphically or spatially) the properties related to wiki-related issues (for example "monument of wikilovesmonument Italy") from those related to the item itself.

On commons this aspect is mainly related to the categories such as "file uploaded by X" and "file uploaded during activity Y" (WLM, edit-a-thon etc.). They are very important for wikimetrics, monitoring of community activities and similar, but IMHO they should not be mixed up with the standard properties and information of the files. Also, there is some confusion about the categories of wiki events and so on: such a mess of different naming and categorization that we can hardly fix it manually anymore. Similarly, the information about SUL users and the related categories of living Wikipedians and Wikimedians should be carefully inserted in the category scaffolding when a category about one of those people is created.

Please, when you are creating the scaffolding of your system, try at a certain point to find a right balance between wikicentric properties and information and "universal ones". It could be one of the tiny details where you can actually pave the way for further improvement on wikidata.--Alexmar983 (talk) 09:03, 15 April 2017 (UTC)

I was asked to find the link. Here it is--Alexmar983 (talk) 02:52, 5 July 2017 (UTC)
Also User:Quiddity here is another link d:Wikidata:Project_chat/Archive/2016/06#Wiki-related_properties. Other users were User:Ymblanter, User:Jura1 and User:Innocent bystander.
Personally, I still think that separating what exists as a property because of the wiki ecosystem (platforms, chapters and user groups, SULs) from what exists because of the file itself (and could be used similarly on totally new archives of files that organize metadata) is important. It can be a choice in the software architecture or in the display on the page (position or background color), but they are different classes of concept.--Alexmar983 (talk) 03:00, 5 July 2017 (UTC)
There are Commons-centric properties and Wikinews etc-centric properties on Wikidata, yes. But those are not the only user-created wikis there. OSM and IMDb are other "wikis" with specially designed properties. Should they be treated in the same way? And wikis that use flagged revisions in an alien language, like dewiki, should those be handled differently from the Libris catalogue, which I can easily change by an email to the Royal Library? I think I have made more such changes to Libris than I have to Wikinews, and definitely more than to de.wikipedia. I think you have to be specific about why and where you want to draw the line. That the commons-category-property has survived is because there is no simple 1:1-relation between WP-articles and Commons-categories. It was sad that Wikidata was implemented in the way it was. It should never have been introduced as a problem-solver for the Interwiki. -- Innocent bystander (talk) 07:32, 5 July 2017 (UTC)
Of course you have to be specific; that's what a discussion should be about, drawing a line based on previous experience. Now, my experience with newbies and content-related users is that some people just want "work to do on content" and don't care about the hidden wiki levels. For example, you have hidden categories here displayed in small fonts; they don't look at them and they don't care if they exist. My other experience is that for other users such categories are related to maintenance, and they are delicate: if you just add them here and there instead of thinking of a good architecture based on clear principles (chapter- and organization-related, user-related, related to the work on the file), you might more easily create a backlog. For example, "photos from Wiki Loves something" that are still not properly reviewed after years, or photos from edit-a-thons or meetings that are not marked as such, which makes it more difficult for a chapter volunteer to monitor them (e.g. some copyright tag from inexperienced users). More people could actually review the backlog, but only if you provide the right working environment: here you have the information you need to make the file useful or verified (add the date, add the description, add the category about the content), there the information that helps you manage the wiki-related work you want or need to do. Which file needs to be cut? Which files were uploaded by the user who has the highest percentage of erased files so far? How many files were uploaded because of a WIR program, and how can I get a list of them with a simple query? Can we suggest standardized descriptions for the most common modifications on upload (e.g. "rotated pictures, contrast modified...")? Which media files from those uploaded in the year 200X are still unused on other projects?
There are many possible scenarios along this direction, and it is only partially explored land. If we add some structure, then it should also cover this aspect, trying to bring all previous attempts at wiki-related information together in a solid infrastructure. If it is well planned, we can make it more and more refined without being too chaotic. But if someone else, in his or her experience, thinks that a "big cauldron" approach is inevitable or not so bad or not worth the cost/benefit so far, fair enough. We just add more and more new properties (or whatever the structure will be based on) when we need them for some wiki-related activity and wait for them to find their own hierarchy with a bottom-up approach.--Alexmar983 (talk) 08:44, 5 July 2017 (UTC)
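
The separation argued for in this thread could look something like the sketch below, which keeps "wiki-centric" statements apart from "universal" ones and answers one of the maintenance questions (files uploaded during a given activity) with a simple filter. All property names, file names and values are made up for illustration; nothing here reflects an actual Commons or Wikidata schema.

```python
# Hypothetical set of properties that only exist because of the wiki ecosystem.
WIKI_CENTRIC = {"uploaded-during", "uploaded-by", "review-status"}

# Made-up example data: file name -> {property: value}.
files = {
    "File:Example A.jpg": {"depicts": "lighthouse", "uploaded-during": "WIR program"},
    "File:Example B.jpg": {"depicts": "bridge", "uploaded-during": "edit-a-thon"},
    "File:Example C.jpg": {"depicts": "mansion", "uploaded-by": "ExampleUser"},
}


def split_statements(statements):
    """Split one file's statements into (universal, wiki_centric) dicts."""
    universal = {p: v for p, v in statements.items() if p not in WIKI_CENTRIC}
    wiki_centric = {p: v for p, v in statements.items() if p in WIKI_CENTRIC}
    return universal, wiki_centric


def files_from_activity(all_files, activity):
    """List the files whose wiki-centric data names the given activity."""
    return sorted(
        name
        for name, statements in all_files.items()
        if split_statements(statements)[1].get("uploaded-during") == activity
    )


print(files_from_activity(files, "WIR program"))
```

Whether the split lives in the software architecture or only in how the page is displayed is exactly the open question in the discussion; the sketch assumes nothing more than a fixed list of wiki-centric properties.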
@Alexmar983: Sorry for missing the earlier comment here. At first, the early adoption plan will be to use existing properties and data structures from Wikidata so as not to recreate existing structure, and allow for cross-wiki querying, etc. What the community will need to figure out through the course of the project, is how to either a) adapt those properties effectively for this new environment or b) how to make sure that the Commons can have additional strength of data beyond what has been anticipated by those earlier structures. This will all require community discussion, something that we are beginning to outline with User:SFauconnier (WMF). Astinson (WMF) (talk) 15:18, 5 July 2017 (UTC)

Linked data in the watchlist?

Yesterday there was a big change in how my watchlist functions. Previously, items appeared on my watchlist when their wikicode changed (e.g., description, categories, geolocation, and the like). I think items also appeared when they were moved, deleted, or other file-level actions. This makes sense:

  • June 29, 2017 06:58 File:American Swedish Institute Turnblad Mansion.jpg‎ (diff | hist) . . (-64)‎ . . Martin Urbanec (talk | contribs) (Cat-a-lot: Removing from Category:Uploaded with Mobile/Android (Jul 2016 - Jun 2017))

But now I see watchlist changes appearing when there are changes to Wikidata items linked to Commons files. For instance:

  • June 29, 2017 D 02:43 File:Fotothek df roe-neg 0006508 010 Besucher auf der Herbstmesse 1953, im Hintergrund der Pavillon des VEB Carl Zeiss.jpg‎‎ (3 changes | history) . . (0)‎ . . [Andrei Stroe‎ (3×)]
    • m D 02:43 Q1731 (diff | hist) . . Andrei Stroe (talk | contribs) (‎Created claim: Property:P17: Q7318; ‎Created claim: Property:P17: Q55300; ‎Created claim: Property:P17: Q16957)
    • m D 02:41 Q1731 (diff | hist) . . Andrei Stroe (talk | contribs) (‎Created claim: Property:P17: Q156199; ‎Changed claim: Property:P17: Q693562; ‎Created claim: Property:P17: Q153015; ‎Created claim: Property:P17: Q43287; /* wbsetcla...)
    • m D 02:37 Q1731 (diff | hist) . . Andrei Stroe (talk | contribs) (‎Changed claim: Property:P17: Q183; ‎Created claim: Property:P17: Q693562)

I don't have anything in my common.js; I didn't enable ExpandedWatchlist in my preferences. I do have Wdsearch turned on, but its Preferences line item says it only affects search results (not the watchlist). So now I have lots of questions:

  • why did the watchlist functionality change?
  • Wikidata is now an option in the Watchlist options, but it's unchecked by default. Can we set those checkboxes to user-specific defaults?
  • how does Commons know which Wikidata items to place into the watchlist?

In short, this was a surprising change and I must have missed documentation about it. I like the concept, but I have to say that some of the Wikidata items linked from the Commons watchlist don't seem like reasonable matches. E.g., File:Beaver Creek concrete bridge. Black Hills of South Dakota - NARA - 283680.jpg links to Q61 (Washington, DC). How does that help a user who's watching that image? Best, Runner1928 (talk) 18:39, 29 June 2017 (UTC)

Crossposting to Commons:Village Pump. Runner1928 (talk) 21:22, 30 June 2017 (UTC)
FYI, this has been discussed: on enwiki Astinson (WMF) (talk) 16:13, 5 July 2017 (UTC)

Welcoming Sandra Fauconnier, our new Structured Data community liaison

The Technical Collaboration team is very happy to welcome Sandra Fauconnier, our new community liaison focusing on the Structured Data program. Sandra will support the collaboration between the communities (Commons, Wikidata, GLAM…)  and the product development teams involved at the Wikimedia Foundation and Wikimedia Germany. The plan is to improve Wikimedia Commons allowing users to better view, translate, search, edit, curate and use media files. To achieve that, the Commons backend will be migrated to Wikibase, the same technology used for Wikidata. Many other features and pieces are part of this plan. In the near future, as the first prototypes and tests start to emerge, Sandra will also drive the engagement with new individual content contributors, existing and new GLAM organizations, and developers interested in exploring the possibilities of the new platform. You can find more details here.--Qgil-WMF (talk) 11:52, 3 July 2017 (UTC)

Thanks for introducing me! Everyone: do get in touch with me with requests, questions, compliments, tips, wishes and other types of feedback. SandraF (WMF) (talk) 07:04, 12 July 2017 (UTC)

New step towards structured data for Commons is now available: federation

Hello all,

As you may know, WMF, WMDE and volunteers are working together on the structured data for Commons project. We’re currently working on a lot of technical groundwork for this project. One big part of that is allowing the use of Wikidata’s items and properties to describe media files on Commons. We call this feature federation. We have now developed the necessary code for it and you can try it out on a test system and give feedback.

We have one test wiki that represents Commons (http://structured-commons.wmflabs.org) and another one simulating Wikidata (http://federated-wikidata.wmflabs.org). You can see an example where the statements use items and properties from the faked Wikidata. Feel free to try it by adding statements to some of the files on the test system. (You might need to create some items on http://federated-wikidata.wmflabs.org if they don’t exist yet. We have created a few for testing.)

If you have any questions or concern, please let us know. Thanks, Lea Lacroix (WMDE) (talk) 13:02, 6 July 2017 (UTC)

Hi, I uploaded a file for a test. There are a lot of things to set up or copy over there to be able to have a realistic test environment (templates, Javascript, interwikis, etc.). In Wikidata, the system proposes properties suitable for an item. That's most useful. It doesn't work there now. Regards, Yann (talk) 13:28, 6 July 2017 (UTC)
  • Thanks for the demo. It does show an aspect of the design that's interesting and potentially problematic. The Commons structured data consists of MediaInfo items, such as M13, which are attached to a particular file. All of the item properties are in Wikidata, such as Q15. This only works as long as the items are "notable" for Wikidata purposes. We are assuming in that example file that the photographer is notable. In the final system, it's proposed that the MediaInfo fields such as "Photographer" will hold either a link to wikidata, or text strings etc., in the case of non-notable people (and presumably likewise if extended to topics such as buildings, devices etc.) This will cause some complexity, both for interpreting the values by software, and for maintenance. When entering data, users will presumably be expected to find an appropriate match on Wikidata, and if not found, enter the data directly instead. --ghouston (talk) 00:19, 7 July 2017 (UTC)
  • The fields for people, photographers etc., will be a special case handled with a new field type T127929. This will hold a string for their attribution "name" and a "smart URI" that will be either a link to a wikidata item or a link to some other place such as their Commons user page or Flickr user page. It will be the link that's used to identify people uniquely (which should work alright for Wikidata items, but will likely have problems in some other cases). Possible maintenance issues would arise, if for example somebody had created a Wikidata item for themselves, uploaded a few thousand files to Commons over a few years, then their Wikidata item is deleted as non-notable. We'd be left with bad links. Alternatively, we may have files for a photographer who initially isn't notable, but then somebody creates a Wikipedia entry for them so we'd want to switch all their links to Wikidata. --ghouston (talk) 00:41, 7 July 2017 (UTC)
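
A value of the proposed field type could be pictured as a name plus a link, with the link doing the identifying. The sketch below is a guess at that shape, not the design tracked in T127929: the class name, the category labels and the host checks are all assumptions made for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Attribution:
    name: str  # the attribution "name" shown to readers
    uri: str   # the "smart URI" identifying the person


def link_kind(attribution):
    """Classify where the URI points (labels are illustrative only)."""
    parsed = urlparse(attribution.uri)
    host = parsed.netloc.lower()
    if host == "www.wikidata.org" and parsed.path.startswith(("/wiki/Q", "/entity/Q")):
        return "wikidata-item"
    if host == "commons.wikimedia.org" and parsed.path.startswith("/wiki/User:"):
        return "commons-user"
    if host == "www.flickr.com":
        return "flickr-user"
    return "other"


def same_person(a, b):
    """Identity is decided by the link, not by the displayed name."""
    return a.uri == b.uri


flickr = Attribution("Leonard Bentley", "https://www.flickr.com/people/31363949@N02")
commons = Attribution("Example", "https://commons.wikimedia.org/wiki/User:Example")
print(link_kind(flickr), link_kind(commons))
```

The maintenance problems described above show up directly in this model: if a Wikidata item is deleted, or a photographer later gets one, every stored `uri` for that person has to be rewritten for `same_person` to keep working.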
Coordinates (p5) should be separated into coordinates of the displayed scene/object and coordinates of the camera.--Alexmar983 (talk) 05:54, 7 July 2017 (UTC)
Yeah those properties were just created as simple examples. At the end of the day the editors on Commons and Wikidata will be in charge of the properties and how to use them. I realize there is more complexity to it than the example can cover. --Lydia Pintscher (WMDE) (talk) 08:45, 7 July 2017 (UTC)
I mainly see 2D pictures.
I hope you can soon add more examples with different media. One of the problems of Commons is how difficult it is, for example, to look for a movie to watch or a song to hear in one's free time (I do educational stuff, but it would be a good hook for my friends: a link to a database of old movies to watch). They are scattered everywhere, and even where there is some categorization, the categories contain everything and are never precise.
More generally, among the types of media to test besides images from different sources I would suggest:
a document, that is something that is not at the wikisource level but close to that such as a wall plate or a letter, a case when a possible property is its text;
a silent movie, including the information of the text cards;
a WMF video, with dubbing in many languages (and how the files relate);
a derivative file of some type, with the information of the derivation. For example "nuance corrected" or "cut border by x %", such information should also be encoded in some type of standardized property;
a ppt presentation or similar (software with a version);
Whatever is presented to the public should have been tested against all the previous examples and should rest on a truly general architecture, IMHO.--Alexmar983 (talk) 08:30, 7 July 2017 (UTC)
Yeah we need to cover all the other file types as well. What we have so far is independent of the file type however as it is just the really basic groundwork. Once things get more interesting we absolutely need to make sure this works not just for images. --Lydia Pintscher (WMDE) (talk) 08:45, 7 July 2017 (UTC)
Sure, I was just listing the types of examples I would need to show to the people I know in the future. IMHO, if you want content-related users to really embrace this idea and understand from their perspective (which is not very wikidata-oriented) one of the reasons it is necessary and why it can change the future workflow, you should show a set with more than photos: a bunch of files that, even with a small fraction of the future properties, can show how adaptive the storage can become. Most of the issues of the volunteers with whom I interact come mainly from such types of media. They feel that the infrastructure for archiving simple photos here is a little bit mismanaged but still not so bad (currently, it simply absorbs a lot of manpower to make it work), while it's the rest that really looks below its potential.--Alexmar983 (talk) 11:10, 7 July 2017 (UTC)
To complicate things, we also now have the rather new Data: namespace that holds tabular data as well as map data. Technically, these are not uploaded files but data stored in plain text in a specialized wiki page (as far as I understand it), but they need similar Metadata. If we introduce a radically new system like StructuredData, it would probably make sense to think about how to include these things there. Both .tab and .map are still in a relatively early stage of development (they don't even support Categories at the moment), so I guess it would make sense for the different teams involved to get together as soon as possible. --El Grafo (talk) 13:18, 7 July 2017 (UTC)
I did not know that. Talking about specific types of file, I am also trying to follow the OSM-wiki interaction (OSM is a big thing in Italy) and now apparently we have a lot of pictures created from OSM maps too. If they are "structured" some day, it would be nice to include map information (version of OSM, from coordinates X to Y, on date Z).
For me in any case the point is: for users focusing on the most common files (png, jpg, etc), automation of processes is probably something more important than structured data to simplify their life. Of course the aspects are related but they don't get it immediately, so if you want to really sell the "structured" part to some of them, you should try to highlight the future role in speeding up maintenance work and that's why I have suggested, see above, to show that you think of some architecture for properties more related to the "wiki" workflow.
At the same time, you can also focus on the minority of motivated content-oriented users who are dedicated to specific aspects and are always looking for a friendly "ecosystem" to explore their potential, that is, people who want a better way to store files with data such as memorials, maps, charts, dub files etc...
After the preliminary tests, the examples of future phases should try to speak to at least one of those two groups. They are the most probable frontrunners when things get real.--Alexmar983 (talk) 15:39, 7 July 2017 (UTC)
@Alexmar983: We agree on the importance of the existing workflows and of simplifying that work and/or making it more dynamic. Wikimedia Deutschland has already done research on this (existing research done by Wikimedia Deutschland), and our plan is to start by supporting functionality that improves existing workflows and meets the needs of the community. Additionally, we are going to propose some further research for that stakeholder group: the folks who curate Commons are the first community that we want to ensure gets support. Right now, there are a lot of different tools and scripts that fill particularly vital organization and maintenance roles, and they are hard for new contributors to learn on the way to becoming regular maintainers.
As for content-oriented communities, right now we have identified GLAMS and Wiki Loves Monuments as first major programmatic communities. Are there others we should be paying attention to? Astinson (WMF) (talk) 14:32, 11 July 2017 (UTC)
(edit conflict) I think that is quite enough to start with, you are doing fine. I can think of a bunch of minor user groups and their wishes. I can contact them one by one and find some subgroups they are part of and some links. Without thinking too much, I suggest you also contact Wikisource users. As I said in an email on the WMF international ML, the problem is that the boundary between a "book" and a small document is arbitrary. As a result, many structured descriptions of tombstones, epitaphs or one-page documents are missing on Wikisource (for a language like Latin, that does not make a correct corpus IMHO), but they could be considered documents. That's a gap, and there are users interested in that gap. Similarly, I can think of the m:Wikimedia genealogy project. Basically they wanted a good genealogy site, but in the end a good genealogy project means structured information, which is also a very refined architecture of properties that describe the content of photos and scans of old documents. I can also contact users who are experts on very old movies; it is time to have a real library. So I would also ask all local projects related to creative content such as music and filmography.--Alexmar983 (talk) 15:48, 11 July 2017 (UTC)
@Astinson (WMF): That's an interesting document, thanks for linking it. --El Grafo (talk) 15:21, 11 July 2017 (UTC) (Ironically, I never would've found it otherwise, since it wasn't in Category:Structured data) :-P

Welcome Amanda Bittaker as Program Manager for Structured Commons

I’m excited to let you all know that Amanda Bittaker (User:Abittaker (WMF)) is joining the Audiences (formerly Product) team at the Wikimedia Foundation as the Program Manager for the Structured Data on Commons program. She will be working closely with teams from the Wikimedia Foundation, Wikimedia Deutschland, and the communities to complete the Alfred P. Sloan Foundation grant, expanding the capabilities of Commons to make it easier for people and institutions to find, share, and reuse Commons content. You can learn more about Amanda in the post on Commons-l. Astinson (WMF) (talk) 17:08, 11 July 2017 (UTC)

Template:Category definition that only takes a Q-item

Hello,

Doing quite a lot of photography of museum exhibit objects, I have used Template:Category definition quite a lot. When I started taking an interest in Wikidata and describing the objects there too, I noticed that information was being duplicated. So I drafted User:Rama/Catdef, which uses a Lua script to provide an equivalent to Template:Category definition that only takes a Q-item as an argument and fills in all the rest using the information available on Wikidata.

Examples of categories:

If anybody has remarks or ideas for improvements, I would be delighted to hear them. And ultimately, if the idea seems a good one, pass this on for proper implementation.

Cheers and good continuation! Rama (talk) 19:28, 2 August 2017 (UTC)

Excellent, very useful for my photos of GLAM objects too. Raymond 21:20, 2 August 2017 (UTC)

Re-organisation of / updates to these info pages

Hi everyone! After Wikimania I expect that I will spend some time updating and improving the info pages about Structured Data on Commons. Who would like to help? And which information should definitely (not) be in there?

Here's a first (draft) checklist of information that I think would be useful to have.

  • About the project
    • What is Structured Data on Commons?
    • Why is it important? Why do we care?
    • For whom do we do this (the stakeholders)?
    • Frequently Asked Questions (FAQ)
    • A glossary. What do all these strange words mean?
  • The communities
    • Sign up and stay up to date: newsletter subscription and archives + other information channels
    • A task force of volunteers who support Structured Data on Commons
    • Feedback area
    • How to contribute: current tasks
  • Team
  • Planning
  • ... ?

All comments and feedback welcome. Greetings! SandraF (WMF) (talk) 15:07, 17 July 2017 (UTC)

  • Stakeholders! Certainly the proper term to use, in context. This is a project and the community members have a vested interest in the outcome. Sensible enough.
Greetings, SandraF! A lot of this information is present, but not all of it is here in the portal, I think. I know that there are dormant systems in place for the newsletter, the potential for various task forces once their goals have been identified (which ties into how to contribute). I'm looking forward to reviving all this old work and putting it to good use. Glad to see you working in this area. Keegan (WMF) (talk) 22:26, 17 July 2017 (UTC)
  • Sure, Tuvalkin, I'm not a big fan of jargon overall either. Sometimes, telling people my job title makes me shudder a little on the inside :) However, I guess more to my point, what would be a better term - understood by all parties - to encompass everyone involved in this project? We have the Commons community-at-large, individual Commons contributors, bot operators, GLAM-wiki community members, GLAMs as an institutional whole, GLAMs as individual institutions, Wikimedia chapters and affiliates, Wikimedia Deutschland tech, Wikimedia Foundation tech, etc. We've got to use a word for all of these partners. Perhaps partners would be a better substitute, though it doesn't capture everything in the meaning. Do you have a suggestion? Whatever word we use, we're going to be using it a lot, and for a long time. I certainly don't want you annoyed by the word "stakeholders" for years :) Keegan (WMF) (talk) 00:29, 18 July 2017 (UTC)
  • The description of the Commons community-at-large above reminds me of how several otherwise separated Populations of a species can be considered to be part of a larger Metapopulation when there is some kind of interaction at a higher level of organization. So personally, I'd go for Commons Metacommunity – does that sound geeky enough for everyone? :-p --El Grafo (talk) 07:44, 18 July 2017 (UTC)
  • Another kind-of-synonym would be allies but that has a very external feel to it. As Keegan describes, it's about the Commons community AND many other people and groups who (want to) work with Commons. I kind of like metacommunity but it's quite geeky indeed, and maybe hard to translate (but the same might be true for stakeholders). SandraF (WMF) (talk) 10:43, 18 July 2017 (UTC)
  • On second thought, maybe something like StructuredData metacommunity would be more appropriate, since there's a) more to SD than Commons and b) more to commons than SD. --El Grafo (talk) 15:24, 18 July 2017 (UTC)
  • (Edit conflict) I’m loving "metacommunity" — geekspeak is the way to go because that’s geeks who both create and enjoy free knowledge (almost by definition). The word "stakeholders" is divisive and utilitarist, and reeks of biz-speak (reminds of shareholders!), and its translation, as said, might be problematic. The translation of "metacommunity", on the other hand, should be straightforward for languages that can routinely assimilate or calque faux Greek/Latin words ("μετα" + "COMMVNITAS"). -- Tuválkin 11:47, 18 July 2017 (UTC)
Metacommunity sounds like an obscure word creation intended to divide our community into ("better"?) users interested in meta affairs and "average" ("less important"?) users. Please use established and widely understood terminology, i.e. stakeholders. As a native speaker of German I believe that the term stakeholder is actually used quite frequently even in non-business German. FDMS 4 12:44, 22 August 2017 (UTC)
A stakeholder seems more appropriate. We have angry mobs carrying stakes who enjoy building stakes for the WMF to burn on. :D Or alternatively. "actors", cause we all are fake. Or collaborators, since clearly since last week nazis are a thing again.. Time to move on to the next bikeshed yet ? I'd love deep marine blue... —TheDJ (talkcontribs) 12:00, 22 August 2017 (UTC)

An example from real life

I leave this comment here, maybe it can be useful.

I found this file. I would like to find the female equivalent, and that is quite time-consuming. I hope that in the long term, with structured data, such tasks become somewhat simpler.

That's one of the many things I expect from structured Commons in the long term: a shortcut for this type of action.--Alexmar983 (talk) 05:18, 5 September 2017 (UTC)

  • I found it in a few seconds with a single click: Category:PE&D Infographic Icon set, referenced on this filepage (grownups have the cat list at the top of file pages) and including the requested akin file as one of its 28 content members. Categories, unlike Wikidata, are part of real life, not vaporware. If you want to find examples to justify the appalling budget differences between the two, you need to try harder. (Or just don’t justify; nobody’s expecting it, anyway.) -- Tuválkin 16:19, 5 September 2017 (UTC)
Tuválkin, sorry if I ping you, if you are a person that does like to ping because he does not like to be pinged, let me know.
You mean the same category I clicked on and then found the only female icon inside, which is not in the same "style" as this one (the proportions of the head and the arm, for example). I would never use them as a pair on a toilet sign, for example, but I usually have high standards in graphics. So this is not what I was actually looking for, and then, since the string name was different and I could not be sure that was the only "counterpart", I spent 5 minutes looking for a better female equivalent in all possible subcategories, and then for a file with a woman and a man together (like this one I found later), until I decided to recreate it myself from scratch, because I already had enough shortcuts for my work and it was no big deal.
In any case I thought: if the properties (color, format, content) were more organized, I could easily have said "show me all the black and white svg files in this category with this content (by the same author)", for example, and instead of five minutes it would have taken 30 seconds and been more effective than opening parent categories just to be sure. That is, if I could have mastered such a query, which I kind of know how to do, because they are like the ones on Wikidata, which I suggest you discover if you don't know them. They are easy; I show and teach them to newbies and they learn fast, as soon as I can show one of their many uses. And there are so many uses...
That's what I was trying to express: that sometimes there are special relationships between files due to file format, style, content and so on that we need to be sure to manage precisely. I just liked it because it was a particularly simple way to embody the concept, that's it, so I told myself, why not share it? After so many years I still feel this way.
Actually, I also wrote a more detailed explanation about that (and other examples, such as maps of a specific style), but then I removed it and stuck to the core idea, knowing that some newbies would have been dissatisfied even when looking for a not-perfect match.
That's the style I use to communicate with newbies, that I prefer to keep because when I don't sound "smart" to my "full potential", I connect better to them and as a side-effect I can also evaluate people from their reactions, like I am doing with you right now.
Also, next time I teach newbies how to use categories I won't mind saying "they are not vaporware"; I like that expression very much... unfortunately, most newbies accept concepts like Wikidata AND Commons categories at the same time. They are so different from you, I guess. That's the problem with your attitude, Tuvalkin: user demographics are against it. If you want to prove some point you should try harder, not tell people not to try on their own. They might ignore you. I am probably ignoring you, sorry.--Alexmar983 (talk) 11:12, 6 September 2017 (UTC)
  • Aww, cute. Never mind: I’m ignoring you, too. Meanwhile Wikidata is still newbie-unfriendly (Q-number wha…?) vapourware (does nothing except consume huge amounts of funding). Keep having fun with what-if features, I’m back to work in actual media curation using tools that work. -- Tuválkin 23:42, 6 September 2017 (UTC)
Tuválkin, I work with real stuff, you know? There are currently 30 active users who use Wikidata to find missing images in their area, the ones you really find in front of you here... there are properties that help to organize the WikiLovesMonuments lists that people use to locate sites and upload files during this month, and so on. Structured data already has an influence on Commons. Not only in the perception of what could be, but real influence. Of course I can't argue against some sort of faith with logic. Newbies have no prejudice, so they learn Wikidata, they learn Commons, and then they feel that something structured on Commons is fine. Plus I don't get the problem of funding: expert Commons users could (should) have asked for money to implement a structured Commons architecture years ago; they simply missed the train.--Alexmar983 (talk) 02:31, 7 September 2017 (UTC)
No, Tuválkin, I said I would ignore your advice, not you as a whole. I'd never do that except in extreme cases; it's not wiki. --Alexmar983 (talk) 02:50, 7 September 2017 (UTC)
BTW I notice only now that you said you were actually ignoring me... and you thought that in that general way as a reaction to what you were assuming about my words, so why did you reply? Sometimes I feel that we don't need more grownup solutions, we need more grownups in general. I usually have discussions on wiki that remind me of when I interact with young kids as a volunteer.--Alexmar983 (talk) 13:27, 7 September 2017 (UTC)
Can we please keep this collegial ? —TheDJ (talkcontribs) 11:36, 14 September 2017 (UTC)
Exactly. Not "I'm ignoring you" except in great necessity. "Thanks for the suggestion, which clearly has certain advantages, but I still prefer the advantages of my earlier plan" uses a great many more words, but is probably more accurate and surely far less irritating. Jim.henderson (talk) 14:28, 17 September 2017 (UTC)
He ignored what I said, since the first reply. Mine was "vaporware", his vision was "real". If you want a productive environment, go to the source of the dysfunction, so one step above in this case. I don't see you commenting on his behaviour, so it's also up to people like you if users like me are left alone to face this sort of situation. And it's not their fault, as you would like to imply; it's mostly the fault of people like you. Next time try to teach him something as well; he really needs a little bit more. More than I do.--Alexmar983 (talk) 21:03, 17 May 2020 (UTC)

Presentation adds little

User:SandraF_(WMF) added a link to her Wikimania presentation.[1] Following the link leads to a PDF file of slides for the talk. The slides have little content about structured data. The slides do not include what was said at the presentation; the viewer is left to guess. Slides inviting people to breakout sessions carry no content for this page. Were the sessions good, or were they a bust? The slide about "Easier to Search" is too busy and does not show what is going on. The slides about "Reusable" and "Structured" do not convey anything. Commons is already reusable. "Templates" existed on Commons long before the structured data proposal. What does a birthday cake have to do with anything? The two slides on timelines have some content, but they are two slides out of many. In addition, timelines are discussed on the overview subpage and the proposal. The solicitations for help are already a tabbed subpage. We don't need a slide pointing back to this page.

I reverted stating the presentation was too sparse.[2]

User:Yann reverted me without giving a reason.[3]

The presentation adds very little to this page and should be reverted. The overview subpage gives a better notion of what is going on; readers should go there rather than a slide show.

Glrx (talk) 17:35, 3 October 2017 (UTC)

Glrx, you didn't give a reason either for removing Sandra's post. Sorry, but your removal looked like vandalism to me. Did you ask here first? Regards, Yann (talk) 21:31, 3 October 2017 (UTC)
Yes, I did give a reason in the edit line: "presentation is too sparse". I explained the sparseness above.
No, I didn't ask here first. Since when is it required to ask before removing en:WP:COI material that says nothing? en.WP has en:WP:BRD that puts the burden on those trying to add such material.
When did Commons blow off some notion of en:WP:AGF? Assume it is vandalism without thinking about it? You haven't commented on her slide show; does that mean you haven't bothered to look at it yet?
Glrx (talk) 00:58, 4 October 2017 (UTC)
I think your feedback is very valid, Glrx - the presentation slides are indeed very minimal for those who didn't see it live. I removed it; images from the presentation will serve better as illustrations for a general (written) project intro. SandraF (WMF) (talk) 09:03, 5 October 2017 (UTC)
Thank you. FWIW, I know you have a difficult task. Glrx (talk) 14:53, 5 October 2017 (UTC)

Differentiating community engagement from "stakeholder" engagement

For those advocating for Wikidata, please keep in mind that making sweeping changes to Wikimedia Commons without attempting to first gain a local consensus is not going to go happily. In September 2017 there were planned to be a number of activities to gather feedback and information, yet as the most active uploader of GLAM related images I have not been invited to provide any feedback or comment. The only change I've experienced is Jarekt mass removing geo-coordinates from Commons category templates without any pre-existing consensus, and as a result being both ignored and then attacked for insisting that a pre-existing consensus is necessary per our community agreed policies.

Scanning through the descriptions of the structured data project, the only mention of community issues is to describe them as "politics" which need to be "managed". Based on these words, the plan appears to be setting the project up to either steamroller over the local community to achieve its funded objectives, or to fund a political campaign to demolish perceived (unfunded) opposition, rather than working with the widest possible community to gain support.

In the project documents, an analysis of previously raised issues seems absent. One such issue is that deleting metadata from Wikimedia Commons and migrating as much as possible to Wikidata not only permanently removes the metadata from local searches of wikitext, but introduces a change to the default copyright release of contributions to Commons from CC-BY-SA to CC-0. Legally a change of copyright will not be possible without new releases of any metadata that is created using judgement rather than simple measurement. This means that descriptive text, transcriptions, translations, estimated dates, geolocation based on judgement, etc. should not be mass migrated to Wikidata without detailed copyright assessment, and in some cases may remain impossible to migrate without ignoring the right of attribution and creating a future legal risk for the project.

Today there have been several emails ("Survey for GLAMs about batch uploads to Wikimedia Commons") encouraging GLAM professionals to complete a survey for their experiences and opinions. The largest number of GLAM uploads from the widest number of sources can be found at User:Fæ/Project_list, only a handful of those projects were completed within a partnership with a GLAM institution, and all relied on an unpaid non-professional to do the uploads and choose the technical methods to do it. The survey will skip feedback from those years of experience. I think that misses out on obvious value, but this design highlights which stakeholders are considered important to managers of the project. Based on actions so far, it's not the unpaid volunteer community.

Thanks --Fæ (talk) 11:53, 11 October 2017 (UTC)

We should probably not mix up the community's usage of wikidata to render content from templates, with how we want to add structured data storage to Commons, which is what this page is about. While both have areas where they touch or interact, they really are quite different.
Can you explain: "but introduces a change to the default copyright release of contributions to Commons from CC-BY-SA to CC-0" ? I don't entirely see how that comes into play. If I include a reference (or a weblink as that's a more familiar concept to most) to a CC-0 Wikidata definition into Commons, how does my action of adding this reference into a CC-by-sa work, change the copyright release ? Only if you literally copy paste a copyrightable amount of information into Wikidata, that should be a problem right ? Are we planning that anywhere ? Or will we be changing the license of Commons content to CC-0 ? —TheDJ (talkcontribs) 15:55, 11 October 2017 (UTC)
If the Wikidata record is verifiably correct as CC-zero, then using the data on Commons is not an issue. Going the other way, i.e. to create a Wikidata record based on extracts from Wikimedia Commons is the issue. A real, if convoluted, example is the work done a couple of years ago, using UK Ordnance Survey data via their API to geolocate images and deduce County names and hence categories. The resulting sets of data (the mass of identified coordinates) are a derived work from their database. Even if I as the creator of the image pages on Commons were to create or add to associated Wikidata records, the onus remains on us (Wikimedia projects) to continue to attribute the OS in line with their attribution requirement, which cannot be assured once the data is published as CC-zero. This may seem a technical point, but there is a presumption that all single "data" items cannot be copyrighted, however when dealing with masses of data handled by bots, we are talking databases where even something as granular as geo-locations can be reverse engineered by later commercial organizations to, say, draw maps of counties which are actually copyrighted by the Ordnance Survey, or to deduce maps of Post Codes which are copyright of the Royal Mail.
Your other questions do not seem directed at me. --Fæ (talk) 16:30, 11 October 2017 (UTC)
Fæ, are you worrying about w:Sui generis database right, and that the collection of data on Commons pages might be treated as a database, so that moving that information to another database within the Wikipedia universe might be an issue? If you think that might be a legal issue, then you should check with the foundation lawyers, who I am sure considered that before approving the creation of Wikidata. Maybe someone should look at it again before the creation of Structured data on Commons. You kind of lost me with the other examples. Are you saying that some geographical locations of places on Commons are copyrighted by Royal Mail or UK Ordnance Survey? I do not even know how to mark coordinates that are copyrighted by others. Do we have any templates for that? --Jarekt (talk) 17:11, 11 October 2017 (UTC)
I am surprised that you are no longer choosing to ignore me. I am not going to talk about your recent actions, you can revert them per BRD if you want me to believe that you are going to work collegiately rather than acting as if you run Commons. Should you wish to ask WMF legal, or any other lawyer, for opinions, you should do that before you take action as an individual which is easy to legally interpret as the cause of damages.
With regard to your mention of sui generis, my example has direct and specific claims of copyright as well as a database right, follow the link. There is no doubt whatsoever that in UK law, maps in any form are protected by long established copyright law without any need to start hypothetical debate about what is a database. --Fæ (talk) 18:19, 11 October 2017 (UTC)
I am still failing to comprehend your point. Are you saying that the coordinates of cities and other locations which are stored in Commons category pages are copyrighted by some third party, because of a "long established copyright law", and need to display the attributions described in the link you provided? If so we should mark them somehow. Sui generis is the only law I am aware of that can make a pair of coordinates of some place copyrightable. Also I do not feel a need to check with lawyers about the legality of copying data from Wikimedia projects to Wikidata. I do not believe there are any legal issues there; however, based on your post, I was under the impression that you needed reassurance. --Jarekt (talk) 22:04, 11 October 2017 (UTC)
Mass exports from a CC-BY-SA environment to a CC-0 database require confirmation that there are no remaining BY or SA requirements on what is being copied or moved. If you don't know whether the original source of the geodata was a database such as Ordnance Survey, you must analyse the data first. Had you been collegiate enough to gain a consensus for your actions, the benefits would include a published record of discovery and having volunteer colleagues help with analysis. The metadata on Commons may not be adequately attributed, but that is fixable as traceability remains under the indefinite CC-BY-SA license, but under CC0 there is no long term way to maintain traceability or even withdrawal, once large scale commercial reuse applies.
By the way, sui generis is not a law, it is a vague description of "other stuff". Specifically database rights are not esoteric "other stuff", they are well established under European Law and UK law. Your dismissive approach to the copyright of maps data is arrogant and risky for the project in the light that as an administrator you appear to have god-like carte blanche to make mass changes without proper consultation, or bothering with small things like checking copyright. --Fæ (talk) 09:10, 12 October 2017 (UTC)
The data being copied from Commons, Wikisource or Wikipedias to Wikidata is not eligible for copyright, as basic data like dates of birth or death, identifiers like VIAF, populations or locations of towns and cities, etc. are not something you can copyright. That is why people can migrate it to Wikidata from all those projects. --Jarekt (talk) 20:21, 12 October 2017 (UTC)
Hello all! Just a general heads up that indeed, community feedback procedures and input requests have not been active yet. I can imagine that you are curious and worried if things are going well... Continuous community feedback, interaction and input is definitely central to this project, we as a team are all dedicated to that, and things in this area will start moving quite soon. The reason why updates and feedback requests have been coming in slowly: we're still getting up to speed as a new team, doing mainly background work (including the GLAM research, of which the survey is a part). I myself was only working part time till 1.5 weeks ago, and I made the mistake of overestimating the amount of work I could do in that time, for which I feel guilty and I do apologize - I know there's a need for more interaction with you all! By the way, I'm finishing new draft info pages (and glossary and FAQ) that will be posted here in the upcoming weeks, that everyone can help improve, and that hopefully address some of the questions asked above. Thanks for assuming good faith :-) SandraF (WMF) (talk) 09:05, 12 October 2017 (UTC)
  • Fae's initial comments are broad and imply further issues. Maybe SD is a freight train headed for disaster. SD identified Risk 1 as high; Fae's comment is that Risk 1 is being ignored so far. For something on the metadata issue, see d:Wikidata:WikiProject sum of all paintings#Metadata!, but even that section is too simple wrt copyright. Fae is correct about metadata such as "descriptive text"; a description pulled from within a museum's CC BY-SA 3.0 TIFF file of a CC0 Renaissance-era painting is fair game on Commons but not on Wikidata. The SD overview page suggests that "24 million language-identified descriptions" and 750,000 photograph descriptions can be moved to Structured Data. Yes, that's true if that SD stays on Commons with appropriate attribution, but those descriptions cannot be moved to Wikidata because they are not CC0. If I do a Structured Data Query on Commons and pull up one of those descriptions, then the attribution must come with it. Wikidata avoided the attribution problem by requiring contributions be CC0. Is SD imposing the same CC0 requirement? The problem is even deeper than attribution. Does SD gain anything from the text descriptions of paintings and photographs? They won't help with SPARQL queries such as ?item wdt:P180 wd:Q7228600. Glrx (talk) 19:20, 12 October 2017 (UTC)
Glrx, SD data will remain on Commons and will not be moved to Wikidata. I do not know what license will be used for SD data. That is probably a question for User:SandraF (WMF) and the foundation legal team. Much of the file metadata like date of upload, date the photo was taken, uploader's name, license used, etc. seems like stuff not eligible for copyright. File descriptions on the other hand will need to be CC-BY-SA. So my guess is that the database will be either CC-BY-SA or a mix of CC-BY-SA and CC-ZERO. We might also have different treatment of past and future file descriptions. Sandra, is that something your team thought about? --Jarekt (talk) 20:21, 12 October 2017 (UTC)
I don't think you understand Fae's DB copyvio argument. If "much of the file metadata ... seems like stuff not eligible for copyright" were true, then compilations and databases could never be copyrighted in any jurisdiction. It takes sweat of the brow to compile a database. Some jurisdictions award a copyright for that effort. That does not mean it is reasonable to take a European museum's database, load it onto Commons while acknowledging its copyright, and then say hey, this data resides in Florida, so US law applies, and we can vacuum all the facts and publish them as CC0. If the US has a treaty, then the US respects the foreign power's copyright laws. The laws may be absurd, but that is not the issue here. Glrx (talk) 22:10, 12 October 2017 (UTC)
Glrx, I am no expert on copyrights, but as far as I can deduce the only copyright-like laws related to databases are w:Sui generis database rights, and they definitely should be followed. I am sure someone with a legal background will look over current plans for SD, as I am sure someone did before the foundation approved the creation of Wikidata and the years-long migration of CC-BY-SA data from wikipedias to CC-ZERO Wikidata. My assumption was that stuff being migrated is not eligible for copyright and we do not worry about Sui generis laws when moving data from one end of the Wikimedia universe to another. However if someone understands legal issues related to Wikidata and SD, I will be interested in reading about them. --Jarekt (talk) 23:51, 12 October 2017 (UTC)
Then evaluate your position for a contradiction. If w:Sui generis database right should be followed, and such data exists on a WP under a CC BY copyright, then where is the authority that allows that data to be copied and changed to CC0? The reason cannot be that contributors did database copyvios in the past, so it is OK to continue to make new database copyvios. Glrx (talk) 00:16, 13 October 2017 (UTC)
"no database right exists in the United States", and Wikimedia is based and publishing in the United States, so it may not be an issue. --ghouston (talk) 01:27, 13 October 2017 (UTC)
Hey all, just a quick note that the team is going to look at this closer and come back with more info. It personally seems good to me that this is examined by someone with due legal experience. SandraF (WMF) (talk) 08:22, 13 October 2017 (UTC)
A credible expert opinion should be from someone independent of the WMF. No WMF legal adviser can publish an opinion which may harm the interests of the WMF. Undermining a $3m value project is a risk they are paid to avoid. -- (talk) 09:12, 13 October 2017 (UTC)
If the WMF violates copyright and can be sued for violating copyright that's not in the interest of the WMF. If you see a different problem than that, what kind of problem do you see with copyright? ChristianKl (talk) 13:58, 13 October 2017 (UTC)
This is why the WMF is not legally responsible for content, either on Commons or Wikidata. If anyone gets sued, it should be the uploader. -- (talk) 14:09, 13 October 2017 (UTC)

IRC office hour next week!

Which data/information goes where? Something to look at during the next IRC office hour

Hi everyone! I warmly welcome you to attend the next IRC office hour: next Tuesday 21 November at 18:00 UTC in #wikimedia-office webchat. Amanda, Ramsey and I will be there to talk about the work of the last months, and to explain what's coming. I'm also thinking of having a closer look at the last version of the project roadmap, and to look at the 'what data goes where?' graphic together. Any other topics that people are interested in? Bring your questions to the table next week. The chat log will be published afterwards for those who could not attend. All the best! SandraF (WMF) (talk) 08:30, 15 November 2017 (UTC)

The (slightly chaotic) log of the office hour can be read here. Thanks to everyone who participated and stay tuned for the next edition (still TBA)! SandraF (WMF) (talk) 19:49, 21 November 2017 (UTC)

User story: Our app needs a way to localize Commons categories

The "Get involved" page talks about collecting user stories, so here is mine :-)

Fewer than 1 in 4 people in the world can understand basic English. So we developed an app that allows non-English speakers to upload pictures to Commons (with all of the necessary anti-selfie/copyvio countermeasures, yes). The app lets them select categories for the uploaded picture, but unfortunately these categories are currently all in English, so the users do not understand them. More importantly, the search bar does not return the right things: for instance, typing "España" in our app's category search bar does not make the "Spain" category appear. The problem is even worse for languages with non-Latin alphabets (only 36% of the world population uses the Latin alphabet). It is not just our app; I believe all of Commons has the same problem.

So, a big expectation from a Structured Commons would be that categories have a name in several languages, not just English. The category search API should also allow for localized search.

The workaround for now is to take the appropriate label from the category's Wikidata item, for instance "विकिपीडिया:श्रेणी" from https://www.wikidata.org/wiki/Q6741108, but it only has labels for very famous things.

Cheers! Syced (talk) 02:21, 21 November 2017 (UTC)
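
The workaround described above (taking the label from the category's Wikidata item) could be sketched roughly as follows. This is only an illustration, not the app's actual code: it assumes the standard Wikidata `wbgetentities` API module, the helper names `build_label_url` and `pick_label` are hypothetical, and `SAMPLE_RESPONSE` is a hand-written stand-in that contains only the Hindi label quoted above, not real API output.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_label_url(item_id, languages):
    """Build a wbgetentities request URL that asks only for labels."""
    params = {
        "action": "wbgetentities",
        "ids": item_id,
        "props": "labels",
        "languages": "|".join(languages),
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def pick_label(response, item_id, languages):
    """Return the first label found in the preferred language order, else None."""
    labels = response.get("entities", {}).get(item_id, {}).get("labels", {})
    for lang in languages:
        if lang in labels:
            return labels[lang]["value"]
    return None


# Hand-written sample shaped like a wbgetentities response (assumption:
# only the Hindi label mentioned in the post is included here).
SAMPLE_RESPONSE = {
    "entities": {
        "Q6741108": {
            "labels": {
                "hi": {"language": "hi", "value": "विकिपीडिया:श्रेणी"},
            }
        }
    }
}

url = build_label_url("Q6741108", ["es", "hi"])
label = pick_label(SAMPLE_RESPONSE, "Q6741108", ["es", "hi"])
```

Returning `None` when no label exists in any requested language mirrors the limitation mentioned above: items for less famous things often have no label in the user's language, so the caller still needs a fallback such as the English category name.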

Hi Syced! Thank you for bringing this up. Yes, translatability of metadata is what Structured Commons is all about. However, we focus our attention on structured metadata, not categories (see the FAQ on categories for some explanation why - the main general reason is that categories are a very imperfect way to structure information, and refined structured data is a long-term solution to make content on Commons more precise, machine-readable, thoroughly translatable and re-usable). So I'd suggest to describe this particular user story as follows: As an uploader of media files in the Commons Android application, I must be able to add information about my uploaded files in my own language, using multilingual terminology that makes it possible for speakers of other languages to find and understand my uploads as well. This would be closely related to user stories that we currently have on page 19, page 20 and page 26 of the current user story document. In short: we hope to solve the need for multilingual description (and findability) of files in a much better way than with categories.
What I have done for now: I have created a first rough Phabricator task to track the integration of structured data in the Commons Android app in the future. I will also be very happy to make sure that you and other developers involved in the app are kept up to date about the new 'clean' and multilingual structured data API, and to help integrate it into upcoming versions of the app. I can imagine we could give this some special attention during hackathons and other developer outreach programmes, too. I'd like to hear your thoughts! SandraF (WMF) (talk) 11:34, 22 November 2017 (UTC)

Tool list

I got a newsletter, where I am invited to add and comment on the important tools (but at the moment, I cannot edit that Google Spreadsheet). And at the same time, we are invited to leave the list of tools here on Commons in a table. Isn't this duplicated work? Will someone take care to move tools from the list on Commons to the one in the Google Spreadsheet and vice versa?--Juandev (talk) 10:08, 22 November 2017 (UTC)

Hey Juandev and others - I was a bit too careful with the spreadsheet but made it fully editable now (and made a copy just in case things go wrong). Feel free to add input there now.
  • The spreadsheet is really huge and contains more than 100 tools. In that spreadsheet I would like to indicate and choose the most important tools. Think of it as an internal document to narrow things down.
  • The tool page here on wiki then tracks the tools that we all find most important.
  • It's my task to keep an eye on this. I'll do my very best but don't hesitate to ping me if I miss something.
I hope this works? Warmly, SandraF (WMF) (talk) 11:56, 22 November 2017 (UTC)
  • @SandraF (WMF): What’s the goal of this list? What are these tools being ranked/filtered by “importance” for? -- Tuválkin 07:03, 26 November 2017 (UTC)
    Hey Tuvalkin: we want to know which tools are most crucial to experienced Commons (and Wikidata) users for doing important daily work on Commons. Then we know which most urgently need active support, so that we can help them to be adapted for structured data on Commons as quickly and smoothly as possible. I can base this on my own experience as an active tool user (I, myself, would say tools like VisualFileChange, QuickStatements and Mix'n'Match are really impactful to make structured data on Commons work well in the future) but I'd really like to base this decision on many people's input. If you have ideas on how to prioritize the tools better than in the spreadsheet, let me know! SandraF (WMF) (talk) 07:41, 27 November 2017 (UTC)
  • @SandraF (WMF): Also Cat-a-lot and HotCat, of course. As for ideas to make Commons work better, here’s one: Defund Wikidata, kill the whole thing and burn it to the ground then use the freed monies to invest in durable, sustainable infrastructure (e.g., crazy idea: what about having a server farm in a place that’s not bound to flood in the next couple decades?). It’s not gonna happen, I know: There’s simply too much cash from donations by people who love Wikipedia, so paradoxically it “needs” to be used to destroy Wikipedia (and Wiktionary and Commons — the rest are kinda meh). But you asked, so I answered. -- Tuválkin 07:59, 27 November 2017 (UTC)
Another point of view: I am not a real Commonist, so from my point of view as a mere consumer Wikimedia Commons appears to be practically defunct in the current form, and solutions such as “add more servers” do not improve anything. The way content is managed here is simply not powerful enough to deal with the amount of files this project meanwhile has, so a fundamental technological upgrade is utterly necessary.
One might now of course question whether “structured commons” is the correct approach then. When things are changed there is always a risk that something could go wrong, or that some stakeholders are not happy with it. However, given the fact that we already have a successful Wikidata which shows the potential of the proposed solution, it really appears to be the right thing to do at Commons as well. Social components of the transition have to be considered as well of course. WMF seemingly does so by appointing a community liaison (SandraF) and seeking input for important steps during the development and transition process. —MisterSynergy (talk) 08:36, 27 November 2017 (UTC)

Beta experiment or permanent change?

I am wary about the attempted change on Wikimedia Commons. Structured Commons... I am unsure how prepared most experienced editors and newcomers are for Structured Commons. However, a permanent change to Structured Commons is something that we may not be ready for. Why not use Structured Commons as a beta experiment (or some sort of beta) indefinitely, until there is consensus to permanently use Structured Commons? George Ho (talk) 03:31, 26 November 2017 (UTC)

May you please elaborate for readers, Tuvalkin? Thank you. George Ho (talk) 07:07, 26 November 2017 (UTC)
  • Hm, I mean that I agree with what you said above…? That this is a move that risks jeopardizing the wealth of experienced users’ own experience, bringing back the community to a blank slate of 100%-newbs, with the concomitant loss of effectiveness at all levels of Commons work? -- Tuválkin 07:25, 26 November 2017 (UTC)
  • Not true, at least for one case: Geolocation data is being syphoned away to Wikidata off from category pages, with actual latitude and longitude being replaced (not just complemented) with some opaque gobbledygook that means nothing by itself. This matter is being discussed elsewhere; in my opinion it needs to stop, and blanket reversion of that vandalism must be actioned. -- Tuválkin 08:04, 27 November 2017 (UTC)
@Tuvalkin: The geolocation data can be stored in Wikidata thanks to a series of templates, that is not a part of the Structured Data for Commons project (see Category:Geocoding templates). If you object to the removal of geocoordinates in favor of Wikidata, you should voice your concerns about this at Commons talk:Geocoding. --MB-one (talk) 08:40, 27 November 2017 (UTC)
@MB-one: Been there, done that, got the tee-shirt and the mug (which are de minimis). What I meant to say was that, just like geolocation data once present in the wikicode of category pages is being syphoned to Wikidata and replaced with illegible gibberish, we can expect that such stunts will be done in/with Structured Commons. The statement that «information or functionalities» will not be taken «away from the project» is certainly sincere, but it is also likely incorrect: We’re used to the WMF messing up seriously interface features that are important for our volunteer work just because some overpaid twit needs to replace a round blue button with a square green switch in order to fluff up their resumé as UI designer extraordinaire — we can therefore expect/fear that a complete overhaul of Commons will ruin everything for (almost) everybody. -- Tuválkin 09:06, 27 November 2017 (UTC)
So what is your solution, I ask you? Or do you claim that Commons is perfect and that we should never change anything about it, ever? —TheDJ (talkcontribs) 15:48, 27 November 2017 (UTC)
To paraphrase an old saying, on Commons change is the only constant. In the early days file pages did not use any infoboxes, then we got {{Information}} and {{Artwork}} templates. Then we got into internationalizing as much of a file page as possible, so it is accessible to the widest possible audience. That pushed us into using more and more i18n templates, many of which are maintained by me. One side-effect of adding dozens of i18n templates to a file page is that it makes the page hard for humans to read and correct. Now we are capable of tapping into the metadata stored on Wikidata, like in case of place locations or information about people or taxa, which nicely separates metadata from the page i18n source-code. Structured Data will allow us to keep metadata in human-readable form while keeping control over file metadata on Commons, so there is no danger of data siphoning to other projects, because it will stay on Commons. My preference would be to remove data stored in Structured Data from file description text field, to prevent duplication resulting in conflicting information, when people correct wrong data in one place but leave it in a second place. However, that is a topic that will be decided by future discussions. --Jarekt (talk) 16:45, 27 November 2017 (UTC)
We old baby-boomers are getting stiff in the mind; nothing odd about that. Innovations like this will give us difficulties and look at first like gibberish, just as HTML looked like gibberish to our elders, twenty years ago. No use trying to hold back everyone else for us, however; we'll eventually climb aboard the streetcar with a little help from stronger young hands. More specifically, we'll need a more sophisticated interface. I have confidence that such a thing will be made; not so much confidence in how soon it will be. — Preceding unsigned comment added by Jim.henderson (talk • contribs) 07:18, 16 December 2017‎ (UTC)

Descriptions

Hi,

I can see the MediaInfo page example here, but the associated file page is so unhelpful that I'm uncertain whether this will be just more headache or a revolution for Commons.

Can you provide a working example file page compliant with the current Commons standard (with {{Information}} and a good license template), so we can have a more concrete example of how it will work when this comes to Commons? Thanks. — regards, Revi 08:34, 22 November 2017 (UTC)

Hello -revi! The MediaInfo page example is an early test of the technical infrastructure. The actual data in there is indeed not correct at all, it's intended to be a technical tryout. From the other (design and information) side, we will also work on design sketches, of which you can see one very early first attempt in this slide of a recent presentation held at WikidataCon. In the upcoming time, we will present and refine these designs further and further in consultation with you, the community. How metadata from {{Information}} and other templates (including licensing) will fit in there, depends on how the community wants to model this. SandraF (WMF) (talk) 11:49, 22 November 2017 (UTC)
Yup, I know that the file in federated-commons.wmflabs is obviously a dummy, but what I’d like to see, sooner rather than later, is a real mockup of the proposed changes your team has in mind. Without a working example, again, we don’t know if this will do anything to Commons. I’m still not sure what this will do to Commons. — regards, Revi 19:21, 9 December 2017 (UTC)

Structured data from Trove citations?

I won't pretend I fully understand the ambitions of the Structured Data project (but I get the general idea of it being Wikidata for Commons). I use the Upload Wizard for my uploads, which include out-of-copyright photos from local libraries (mostly from the State Library of Queensland) that I find using Trove (the Australian library catalogue aggregator). Each entry in Trove has a Wikipedia citation generated for it which can be copied and pasted into Wikipedia as citations and further reading.

I use the Trove citations as the "source" field when I upload images that I find through Trove. E.g. File:Hugh Milman circa 1890.jpg has the source:

|source=Hugh Milman, ca. 1890. John Oxley Library, State Library of Queensland. Retrieved on 19 November 2017.

Is it possible to mine these Trove citations for Structured Data? Kerry Raymond (talk) 21:27, 13 December 2017 (UTC)

Hi Kerry Raymond! The exact way in which information in Commons templates can be converted to structured data is still a bit difficult to predict (it is up to the Commons community to decide how to do this). To my own limited knowledge (I work a lot on artworks in my volunteer time), it is best if you currently do uploads to Commons which use existing {{Creator}} and {{Institution}} templates as much as possible. My first impression is that the above (Wikipedia-style) citation template might be a bit on the complex side to convert to structured data, because it combines information about the author, the institution, publication date and the source URL in plain text and in formats that are not very common on Commons. I'd like to hear opinions from others. As a little exercise, I manually edited this file towards making it (in my opinion) more easily convertible to structured data. Best! SandraF (WMF) (talk) 13:43, 19 December 2017 (UTC)
I'm a bit confused by your reply. The Trove citation above clearly separates out all of those fields you mention in the source (it just does not render well on Commons, but a structured commons tool would surely not work off the render, but the underlying source), e.g.
{{Cite web | author1=Unidentified | title=Hugh Milman, ca. 1890 | publication-date=1890 | publisher=John Oxley Library, State Library of Queensland | url=http://trove.nla.gov.au/work/153917805 | accessdate=19 November 2017 }}
Your example about File:Walter Bentley as Hamlet did not use Trove's Wikipedia template, so it's entirely different. If I had uploaded it using Trove's Wikipedia template, it would be
{{Citation | author1=Falk, Sydney | title=Actor, Walter Bentley, playing Hamlet | publisher=John Oxley Library, State Library of Queensland | url=https://trove.nla.gov.au/work/153915054 | accessdate=20 December 2017 }}
My point is that we have Trove which automatically supplies Wikipedia citations for all the books, photos, etc. in the collection. These are extensively used on en.Wikipedia and are very time efficient (just copy from Trove and paste into Commons), so using these on Commons too makes a lot of sense for anyone uploading, and they are amenable to automated extraction. I think hoping people will do more work *manually* in order to make the data more amenable to Structured Commons seems unlikely to be a winner. I think having something like Citoid that works with popular photo sources (e.g. Flickr) is more likely to be used.
As an aside, is this Talk page the best place to discuss the topic? It would seem so, but it's clearly pretty quiet here, making me wonder if the conversation is elsewhere. Kerry Raymond (talk) 21:54, 19 December 2017 (UTC)
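
The point about working from the underlying source rather than the rendered text can be illustrated with a minimal sketch. It assumes a flat template call with no nested templates, wikilinks, or pipes inside values, which holds for the two Trove citations quoted above but not for wikitext in general; the function name `parse_template` is hypothetical, and a real tool would more likely use a full wikitext parser.

```python
def parse_template(wikitext):
    """Split a flat {{Name | key=value | ...}} call into (name, params).

    Assumption: no nested templates or pipes inside parameter values.
    """
    inner = wikitext.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    # Drop the surrounding braces, then split on the parameter separator.
    parts = [part.strip() for part in inner[2:-2].split("|")]
    name = parts[0]
    params = {}
    for part in parts[1:]:
        if "=" in part:
            # Split on the first "=" only, so values may contain "=".
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return name, params


# The Trove-generated citation quoted earlier in this thread.
TROVE_CITATION = (
    "{{Cite web | author1=Unidentified | title=Hugh Milman, ca. 1890 "
    "| publication-date=1890 "
    "| publisher=John Oxley Library, State Library of Queensland "
    "| url=http://trove.nla.gov.au/work/153917805 "
    "| accessdate=19 November 2017 }}"
)

name, params = parse_template(TROVE_CITATION)
```

After this call, `params` holds the author, title, publication date, publisher, URL and access date as separate fields. Mapping those strings to Wikidata items (for example, the publisher to the State Library of Queensland) is a separate step that this sketch does not attempt.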
Hi Kerry Raymond! My apologies - I do understand that my answer was confusing, seeing that I gave an example which did not include a {{Cite web}} template. It is of course very good and convenient to use that template widely on Wikipedia, but it is relatively uncommon on Commons (according to the Petscan tool, it is used around 17,000 times, which is relatively rare compared to the 1.5 million times that the {{Photograph}} template is used). It is most likely that frequently used templates on Commons will be the first candidates for conversion to structured data. Also, having authorship and source information included in plain text is one step in the right direction, but for conversion to structured data it is better if there is already some kind of Wikidata connection for each bit of information in the template (that makes it machine-readable and much more easily convertible). This is the case for many {{Institution}} templates, including the {{Institution:State Library of Queensland}} template which has a little Wikidata icon indicating that there is a Wikidata link present. In any case, I'm including your example in the interesting Commons files page, and I have also alerted the community focus group for Structured Commons to weigh in. (And maybe lack of broader response is also due to the approaching holiday period?) SandraF (WMF) (talk) 14:53, 20 December 2017 (UTC)
I can't see any way to select the Photograph template in an Upload; it seems to use Information template by default, so I have no idea where those Photograph templates are all coming from. I am not in a position to change the way the Trove citations are structured. If there is a desire for structured data (and I can see the value in that), then the Upload has to be designed to construct that template instead of the Information template. Kerry Raymond (talk) 22:38, 20 December 2017 (UTC)
My looking around shows there are uses of Cite journal, Cite book, Cite news, Citation, too. I think what appears to be missing in the design thinking here is the workflow of the Wikipedian. They are already constructing citations for the Wikipedia article so it's logical to want to re-use those citations when taking images from the same source. For example, I've uploaded individually over 1300 images in the Category:Images from the Queensland Heritage Register, a slow process I assure you. If I could use the template Cite QHR as I do on en.WP, it would be a lot less work for me (copy and paste). The integration between Wikipedia and Commons is really poor, the templates are different, the categories are different etc. This makes the workflow inefficient and needlessly wastes people's time. Kerry Raymond (talk) 23:47, 20 December 2017 (UTC)
Thanks for your candid input, Kerry. You touch on a lot of things we are working on. We don't have specific solutions yet, but one thing I can say is that one of our top priorities is finding ways to make it much easier to add accurate metadata to uploads, and also reducing the need for templates at all. Addressing some of these long-standing issues will come as part of the package of adding Structured Data since enhancing the user experience is needed to make the most out of Structured Data features anyway. You'll start to see this reflected in our first new UI designs, which will be available for community feedback in the first few months of 2018. RIsler (WMF) (talk) 02:34, 22 December 2017 (UTC)