Commons talk:Structured data/Archive 1

Initial feedback

Greetings!

What do you think of the Structured Data project? What parts do you like? What could be improved? Anything missing?

Please share your insights here, to help plan our next steps. Thanks for joining this discussion! Fabrice Florin (WMF) (talk) 08:02, 22 August 2014 (UTC)

What do you like most?

What parts of this project seem most useful to you?

Killing mw:Extension:CommonsMetadata! Anomie (talk) / BJorsch (WMF) (talk) 14:20, 22 August 2014 (UTC)
I'm hoping this will replace various categories like Category:Taken with Pentax K-50 that currently have to be manually curated to match the EXIF metadata (or other machine-readable metadata). I'm also hoping that in cases where there are multiple copies of the same image in different formats (such as TIFF and JPEG, or DNG if that is ever supported), this will allow the multiple copies to be treated more as a "unit". That said, I'm afraid that even after reading the slide deck and various other pages, I'm still uncertain about exactly what will and won't be possible, so I'm just hoping/guessing about these things. --Ppelleti (talk) 02:21, 23 August 2014 (UTC)
Maintenance of the site currently requires a bit of guesswork as to whether a file is likely to be properly licensed. There are 3 or 4 pieces of absolutely essential information a piece of media must have to be sure it is licensed correctly. Anything that makes finding, comparing, and changing that information easier will hopefully encourage more people to undertake maintenance of the site, which is terminally understaffed because of how obtuse everything is. –⁠moogsi (talk) 01:39, 24 August 2014 (UTC)
The 'killer app' is topic-based search, with progressive refinement as the user inputs additional topics. This should be the first focus of the project, albeit with an eye to establishing structures that will also facilitate other uses. Better multilingualisation would also be a valuable bonus. There should also be other advantages from the facility to be able to write and run queries on a structured database -- for example, perhaps to enable a user on the fly to easily change the way a category is sorted (eg perhaps by date); or filtered (eg perhaps by quality); or to auto-suggest the most useful topics (eg 'Edinburgh' or 'watercolour') or types of topic (eg 'place' or 'medium' or 'original artist') for further refinement, based on the incidence of those topics in the set of files currently under consideration, and estimated information gain from them. Jheald (talk) 10:41, 25 August 2014 (UTC)
Replacing intersection categories like "Paintings by Jan Steen with Fur" that make it harder to see artists' work rather than easier. - PKM (talk) 05:45, 3 September 2014 (UTC)

Localisation. We could have every user see the file properties in their own language, just by translating the property labels. Edits in one language would show up and be accessible to users in other languages.

What could be improved?

How could we improve this first proposal?

Referring to the "API Class Diagram" in the slides: Shouldn't "FileMetadata" have an attribute "Contributor", and "Work" have an attribute "Creator"? - Looking through the slides I got the impression that the contributor of the file to Commons and the creator of the work (which can be identical but don't have to be) are being mixed up. I guess it would be good if we could start with clear definitions and a sound data model. --Beat Estermann (talk) 06:57, 25 August 2014 (UTC)
- For this to really happen, we would need very clear explanations in all languages about the metadata collected, such as e.g. "Creator of the uploaded image" versus "Creator of the item depicted (if any; check 'unknown' if one exists but is not known, check 'anonymous' if one exists but refuses to be named)" etc. --109.44.2.248 06:50, 18 October 2014 (UTC)
What is a "Workflow"? How is this different from a "User story"? Is the experience of somebody browsing the site a "workflow", even though they might not consider themselves working? Or does "workflow" refer more to what volunteers do: uploading new images; better describing / organising / categorising / tagging / presenting them; monitoring / policing / maintaining them, etc. The term "workflow", and what it is that you are looking for beyond the existing "user stories", is not clear. Jheald (talk) 10:57, 25 August 2014 (UTC)
- A user story describes what an actor/person wants to achieve and for which reasons; a workflow also describes the consistent steps that are required to achieve something. —TheDJ (Not WMF) (talk • contribs) 13:42, 28 August 2014 (UTC)
There should be much more clarity on the general helicopter-view of the principles behind what information should be stored where. Information relevant to images will be stored in three places: the file page, Wikidata, and the Commons Wikibase. In particular d:Wikidata:Notability is useful, and should probably continue in much its present form -- there is value in being able to download a dump of Wikidata, and have it largely contain items or concepts relevant to the real world, and not be bloated to a much larger size with items only relevant to Wikimedia. The implication is that information which is primarily relevant to the specific file on Commons should usually not be stored on Wikidata. Information stored on Wikidata should relate to the real world, and usually be relevant (at least potentially) to multiple files on Commons -- eg the biographical details of a painter; or information about a real-world topographical feature; or a particular old-Master work hanging in a particular gallery. The "multimedia data list" spreadsheet, linked to in the article, appears to get this badly wrong, and should be revised.

At least in the initial phases, the information stored on Commons Wikibase should focus on what is likely to be searched, or filtered by, or queried, per file. These should be the priorities, accepting that for the foreseeable future much other information -- not just EXIF information -- is likely to remain on the filepage. Jheald (talk) 11:22, 25 August 2014 (UTC)

Related to the above, the information stored in particular places is likely to break into several different, almost independent, chunks -- eg on Commons Wikibase one might have (a) topics, (b) legal-related, (c) ... etc. Identifying early these ways to break the total information down into groups of fields that hang together, such that each group can be treated almost independently of the rest, would be a good step towards making the overall analysis easier. Jheald (talk) 17:36, 31 August 2014 (UTC)

(Added) One thing that in particular is needed is a clearer line on when real-world objects underlying Commons images -- eg physical old photographs, individual map sheets, manuscript folios, etc -- should (or should not) get their own first-rank items on Wikidata. Will such objects automatically qualify for items, even if this will massively increase the number of Q-numbers on Wikidata? See eg d:Wikidata_talk:WikiProject_Visual_arts#Wikidata:Notability_and_artwork, and subsequent sections on that talk page, for some example-cases and discussion. Also this thread on multimedia-l/wikidata-l, though with little follow-up so far. Jheald (talk) 09:34, 6 October 2014 (UTC)[reply]

The proposal should recognise that increasingly detailed ontologies for likely subject matter have been (or are being) already developed on Wikidata -- see in particular d:Wikidata:WikiProject Visual arts/Item structure, d:Wikidata:WikiProject Books, d:Wikidata:WikiProject Source MetaData, as well as of course ontologies for people, places, artistic movements, etc, etc. This boat has already sailed, and to the extent that the initiative is working with Wikidata, it needs to work with what is already there (subject to amendment / modification / evolution, etc). However, most of the initiative's work is likely not to be on Wikidata, but on the Commons Wikibase. Jheald (talk) 11:35, 25 August 2014 (UTC)[reply]
The proposal needs to think harder about denormalisation / duplication of information between Wikidata and Commons Wikibase -- especially the key "topics" associated with a file needed to make it searchable / discoverable. For example, an image of the Mona Lisa should also be discoverable starting with a topic search for "Leonardo da Vinci". Canonically, the information as to who was the painter of the Mona Lisa should be stored on the Mona Lisa item on Wikidata. However, I suspect that the Q-number for "Leonardo da Vinci" should also exist as a topic associated with the file in the file's topic list on Commons Wikibase -- presumably with an attached property 'relevance to the file in question' binding a Q-number signifying "painter of the underlying image". So this information will have been denormalised between Wikidata and Commons Wikibase, and that denormalisation will require maintenance, eg in case of edits at either end.

As further denormalisation, the item for Mona Lisa on Wikidata should probably contain a list of topics that the Commons Wikibase item for any file representing the painting should include.

All this denormalisation may seem very messy, but the alternative -- having to recurse a tree down through Wikidata to identify all the topics related to a particular topic (eg all the paintings painted by Leonardo), each time that topic is searched, to build a ranked hits set, also seems awkward (even with caching) -- as well as losing the flexibility of just being able to associate an arbitrary topic with an arbitrary file and an optional "how this relates" property. Jheald (talk) 12:04, 25 August 2014 (UTC)
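The trade-off described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the Q-numbers are placeholders (not the real Wikidata ids), the `WIKIDATA` dictionary stands in for canonical statements, and `FileTopics`, `denormalise` and `search` are hypothetical names, not part of any proposed API. It shows the painter being copied into the file's topic list so that a search needs only a flat lookup, at the cost of re-running the copy whenever the canonical item changes.

```python
from dataclasses import dataclass, field

# Placeholder ids for illustration; NOT the real Wikidata Q-numbers.
MONA_LISA = "Q9001"
LEONARDO = "Q9002"

# Hypothetical stand-in for canonical statements stored on Wikidata:
# item -> {relationship: [target items]}.
WIKIDATA = {MONA_LISA: {"painter": [LEONARDO]}}


@dataclass
class FileTopics:
    """Hypothetical per-file topic list as it might sit on Commons Wikibase."""
    filename: str
    depicts: str                                 # item for the underlying work
    topics: dict = field(default_factory=dict)   # Q-number -> how it relates


def denormalise(entry, wikidata):
    """Rebuild the file's topic list from the canonical item.

    Re-running this after an edit on the Wikidata side is the
    maintenance step that denormalisation requires.
    """
    entry.topics = {entry.depicts: "underlying image"}
    for relation, targets in wikidata.get(entry.depicts, {}).items():
        for qid in targets:
            entry.topics[qid] = f"{relation} of the underlying image"


def search(entries, topic):
    """Flat lookup: no walk through the Wikidata graph at query time."""
    return [e.filename for e in entries if topic in e.topics]


photo = FileTopics("Mona_Lisa.jpg", depicts=MONA_LISA)
denormalise(photo, WIKIDATA)
print(search([photo], LEONARDO))   # the file is found via its painter
```

If the painter statement on the canonical item were later edited, the file would keep matching the old topic until `denormalise` is run again, which is the maintenance burden in question.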

Expansion of these thoughts on topics and searching in this message on multimedia-l. (some follow-ups also on wikidata-l). Jheald (talk) 09:43, 6 October 2014 (UTC)

What is the use-case for a new per-file view, that will apparently be a rival to the existing file page? The team has already been burned recently when MediaViewer was positioned as an alternative/replacement to the file description page. MV is now being re-positioned with a much leaner interface, no longer trying to replicate so much of what is presented on the file description, but instead as an alternative way to view information that is on the Wikipedia article page (with eg the caption from the article page, rather than the description from the file page), and very little additional information, so that for further information the reader is now going to be strongly encouraged to click through to the file description page.

Yet, despite that experience, and the cry that went up from Commons users and GLAMs that users seeking full information should be directed as quickly as possible to the file description page, it would seem that the team is now preparing another parallel alternative to the file description page view. Is this sensible? Is it necessary? What is the use-case for it?

An alternative, instead of having a separate page as a reader's typical view on the structured data, would be closer integration and evolution of the existing file description page -- so that the normal place to read structured data would be in fields on that page, and (one) normal way to edit it would be through templates on that page. (Obviously there would be others, presumably including a direct API, and views like topic-tagging games working through the API). Indeed it probably may well make sense to preserve the metaphor of editing the wikitext of this page as a viable API for bots to read and write the structured data -- editing wikitext is very lightweight, requires minimal additional modification for bots that are already written to edit filepages, and would make it easy to combine edits to the structured data with other edits to the filepages. The wikitext being served or retrieved from the bots -- probably something like {{topics | Q1234 | Q5678 | Q91011 }}, need not actually live on the page, but could be generated on the fly, and then if changed parsed on the fly, by the server, as required. Such an interface to the Commons Wikibase would help to anchor the Structured Data initiative as an evolution from what we have at the moment, rather than a revolution. Jheald (talk) 12:42, 25 August 2014 (UTC)[reply]
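As a rough illustration of the "wikitext as an API" idea above, the following Python sketch renders a topic list into the `{{topics | ... }}` form quoted in the comment, parses it back, and computes what changed. The function names and the regular expression are hypothetical; nothing here reflects an existing MediaWiki or Wikibase interface, and real wikitext parsing would need to handle far more cases than this pattern does.

```python
import re

# Matches the hypothetical {{topics | Q1 | Q2 }} template with flat parameters.
TOPICS_RE = re.compile(r"\{\{\s*topics\s*((?:\|[^{}|]*)*)\}\}")


def render_topics(qids):
    """Generate the wikitext on the fly from the stored topic list."""
    if not qids:
        return "{{topics}}"
    return "{{topics | " + " | ".join(qids) + " }}"


def parse_topics(wikitext):
    """Return the list of Q-numbers, or None if no topics template is found."""
    match = TOPICS_RE.search(wikitext)
    if match is None:
        return None
    return [p.strip() for p in match.group(1).split("|") if p.strip()]


def diff_topics(old, new):
    """Return (added, removed) so a save could be turned into store updates."""
    return sorted(set(new) - set(old)), sorted(set(old) - set(new))


stored = ["Q1234", "Q5678"]
served = render_topics(stored)                 # what a bot would read
edited = served.replace("Q1234", "Q91011")     # what a bot might save back
added, removed = diff_topics(stored, parse_topics(edited))
print(added, removed)
```

The point of the design is that a bot already written to edit file pages only needs to read and write one more template, while the server remains free to store the topics however it likes.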

Incidentally, I suspect it is both unlikely and unhelpful to suggest that Structured Data may replace file pages, or lead to them containing just a no-parameter {{Information}} template, as the article seems to imply at one point; so that even as a long term vision it should probably be disavowed. There is simply too much information on file pages, stored in a myriad of bespoke templates like {{Infobox aircraft image}}, as at eg File:AT-6C Harvard IIA NZ1056.jpg, that is not going to be expressed as simple triplet references to Wikidata items, and is primarily going to need to remain searchable as plain text. The same may often go for source information about images, and source templates being used on them, eg {{British Library image}}, for use of which see eg File:Cuthbert discovers piece of timber - Life of St. Cuthbert (late 12th C), f.45v - BL Yates Thompson MS 26.jpg or File:Digging the Cane-holes - Ten Views in the Island of Antigua (1823), plate II - BL.jpg or File:Bay Owl - 51 drawings of birds and mammals at Bencoolen, Sumatra (c.1824) - BL NHD 47-34.jpg that in this case can contain link-backs to several out of up to a dozen different catalogues, some links for the file, some for the underlying image (probably not individually notable for WD), and some for the underlying book or manuscript or collection; each with appropriate different linking text. Yes, such templates could do with standardisation and generalisation; yes, probably it does make sense to have containers on Commons Wikibase for linked data; but at the end of the day, a template like this is probably as good a way as any to control the display of that information -- using the traditional flexibility and editability and light-weightness of wikitext. Jheald (talk) 13:19, 25 August 2014 (UTC)[reply]
Getting file-description information onto Commons looks like it's going to become a lot harder. At the moment, if I'm manually uploading an image (or series of images) which may have reasonably detailed object information, I will typically use Upload wizard to create the most basic file page, and then copy-and-paste either a blank {{Artwork}} template pro-forma, or an {{Artwork}} template from a similar image that only needs a few fields changed. Alternatively, projects like the Commons:British Library/Mechanical Curator collection often use ingestion templates, like Commons:British Library/Mechanical Curator collection/script, which generates a fully made-out {{Artwork}} template from specific domain-relevant information.

It is the text-like interface of wikitext that makes this easy. All this would appear to be about to become much harder, because copy-and-paste from existing examples will no longer be possible, and ingestion templates will be a lot harder to write. So I worry that it's about to become a lot more cumbersome to add the equivalent of {{Artwork}}-like information to an image. Wizards aren't everything. Text-like APIs can have their moments too. Jheald (talk) 11:18, 6 October 2014 (UTC)[reply]

Keeping the community updated. When I look at the Roadmap, I see a point "community discussions" in the first phase but nothing comparable thereafter. If there's anything we should have learnt from the MediaViewer debacle, it's that people react very strongly when they get the feeling that something has been done behind their back. This set of pages and the newsletter are a good and important step in the right direction. Please use these channels extensively and maybe consider posting the newsletter to the village pumps as well to get the people who may have already forgotten about this. In addition to that, I think there should be community feedback rounds at every major milestone to gather opinions from more than the (probably) rather few that are interested enough to constantly follow the development process. I know that providing a constant stream of information for "outsiders" means a lot of work, but it's time well spent if it avoids a rude awakening of the kind we experienced with MV. And remember that the changes this project aims to make go much further down to the core of how Commons works than MV … --El Grafo (talk) 08:48, 14 October 2014 (UTC)

Anything missing?

Did we forget anything important?

Ten days ago I posted this question: Commons_talk:Wikidata#I_seem_to_suck_at_RTFM... (about Wikidata documentation). The answer was a bit disappointing. I am super interested in the structured data project and would love to be one of the consumers of the data, but in order to get developers excited the documentation needs some love. This would also have a positive impact at this stage of the discussion, as it would help make informed decisions if one knows what the capabilities of Wikidata are, and how the data can be utilized. Cheers Dschwen (talk) 00:38, 23 August 2014 (UTC)

A link to the post on Wikidata, for context. I'm not sure what we can do from the WMF end to help out, Dschwen, but proper documentation is definitely significant and I'll put this in my list of things to look into in the coming months. Keegan (WMF) (talk) 06:47, 24 August 2014 (UTC)

Ways the community can get involved now. Anything that depends on Commons Wikibase will of course have to wait until that system is closer to being spec'd out and delivered by the developers. But anything that only depends on Wikidata the community can start to work on now. As a place to gather, a user group is now live on Wikidata, d:Wikidata:WikiProject Structured Data for Commons. In particular, the community can start to work, with the WikiProjects already on Wikidata, on designing or improving the ontologies that will be used to describe the subjects of files and categories here on Commons. It can start to design (and soon implement) templates to go at the top of Commons categories and Commons galleries, drawing their data from the corresponding items on Wikidata. It can start to design and test (but not yet implement) replacement versions of templates used in file properties birdcages, drawing data from specified non-corresponding items on Wikidata. It can start to populate Wikidata using information from Commons pages, so stress testing the existing ontologies for adequateness. In this way the data will be there on Wikidata, and the templates to read it live and operational on Commons, ready to help the developers as they get closer to bringing the Commons Wikibase part of the project to fruition. Jheald (talk) 14:25, 25 August 2014 (UTC)[reply]

Categories. How should a category like Category:Images released by British Library Images Online be represented? It's a complete, closed set of images relating to a particular image release. On the other hand, like so many Commons categories, (eg so many intersection categories), it currently would fail d:Wikidata:Notability, as really only defining a particular Flickr set, rather than anything independently notable in the real world. Similarly, a category containing a particular set of scans from a book. The book itself might be notable, but surely not one particular set of scans.

Yet it is useful to give categories an identifier, so that one can test for membership; and to hold properties for it, eg which book the scans are from, what the source was of them etc; and one might want to run queries relating particularly to that set. So it would be useful to have an item for the category.

So where to put this item? One option would just be to throw WD's notability test to the winds, and give each and every Commons category a Wikidata item. But really, that's adding a huge number of junk objects to WD's database. The more natural alternative would seem to be to create items for these categories on Commons Wikibase, that would be tied to items on WD for categories relating to things that really are notable. Jheald (talk) 14:50, 25 August 2014 (UTC)

See also d:Wikidata_talk:WikiProject_Structured_Data_for_Commons#Categories_and_tags for further discussion, including the idea that one should be able to associate rules with categories, to make categories auto-updating -- so that if the right topic/relationship were added to a file's topics entry, that pinged a particular rule, the file would automatically be added to the rule's category. Conversely, it would also be nice to highlight on the category view which members did or did not match such a rule -- the topics and wikidata relationships for such files could then be manually investigated, which might be useful for database improvement if topic information were initially weak. Jheald (talk) 17:06, 29 August 2014 (UTC)[reply]
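The rule idea can be sketched in a few lines of Python. This is only a sketch under stated assumptions: a rule is taken to be a single (topic, relationship) pair bound to a category, and the names `Rule`, `categories_for` and `audit`, as well as the sample data, are hypothetical. The first function derives category membership from a file's topics; the second splits the existing members of a category into those that match a rule and those that do not, which is the highlighting described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: a file with this topic/relationship belongs in category."""
    category: str
    topic: str
    relation: str


def categories_for(file_topics, rules):
    """file_topics is a set of (topic, relation) pairs for one file."""
    return {r.category for r in rules if (r.topic, r.relation) in file_topics}


def audit(category, members, topics_by_file, rules):
    """Split current members into rule-matching and non-matching files."""
    matched, unmatched = [], []
    for name in members:
        cats = categories_for(topics_by_file.get(name, set()), rules)
        (matched if category in cats else unmatched).append(name)
    return matched, unmatched


rules = [Rule("Scans from Book X", topic="Q-book-x", relation="scanned from")]
topics_by_file = {
    "page1.jpg": {("Q-book-x", "scanned from")},
    "page2.jpg": set(),   # topic information still missing
}
print(audit("Scans from Book X", ["page1.jpg", "page2.jpg"], topics_by_file, rules))
```

Files in the unmatched list are the ones whose topics would be worth investigating by hand.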

Terms of use. I see the proposal includes a "terms of use" field, but the semantics of this aren't very clear. My use case: A few years ago I tried to write a user script that would highlight images that were using the "link=" parameter when the license required the file page link for license terms and/or attribution (public domain and CC0 don't require a link, most other licenses require the link for both notice of license terms and attribution, and a few require the link only for attribution). But it was too difficult to keep up with, since the only way to determine this was to look for the file page to be in one of an ever-changing list of categories. Anomie (talk) 14:42, 27 August 2014 (UTC)[reply]
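The use case above could be served by a machine-readable field roughly like the one in this Python sketch. It assumes that "terms of use" can be reduced to flags saying what the file-page link is needed for; the `LinkNeeded` flags, the `TERMS` table and the two `example-...` license ids are hypothetical placeholders. Only the public domain and CC0 rows restate what the comment says; the other rows stand in for the two remaining groups it describes without naming specific licenses.

```python
from enum import Flag, auto


class LinkNeeded(Flag):
    """What the link back to the file page is required for (hypothetical field)."""
    NONE = 0
    LICENSE_NOTICE = auto()
    ATTRIBUTION = auto()


# Illustrative table; real values would come from structured license data.
TERMS = {
    "PD": LinkNeeded.NONE,
    "CC0": LinkNeeded.NONE,
    "example-notice-and-attribution": LinkNeeded.LICENSE_NOTICE | LinkNeeded.ATTRIBUTION,
    "example-attribution-only": LinkNeeded.ATTRIBUTION,
}


def link_override_is_problem(license_id, uses_link_param):
    """True if an image uses link= although its license needs the file-page link.

    Raises KeyError for a license id that is not in the table.
    """
    return uses_link_param and bool(TERMS[license_id])


print(link_override_is_problem("CC0", True))
print(link_override_is_problem("example-attribution-only", True))
```

With such a field, a user script would no longer have to track an ever-changing list of categories.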

What about non-Commons file uploads? Looking at this again, I see that the proposal is to store the data on Commons itself. Which is good, but what about third-party wikis or Wikimedia wikis with an EDP, who could very likely find good use for structured data for their own locally-uploaded files as well? Anomie (talk) 11:25, 1 September 2014 (UTC)

Any idea how files with multiple copyrights will be handled? Here's an example: File:Josef_Suk_-_Meditation_Op_35a.ogg. Kaldari (talk) 06:18, 18 September 2014 (UTC)
- It will be possible to just add several licensing statements. Each license could then have qualifiers for example to indicate where a certain license applies or which parts of the work it applies to. --Lydia Pintscher (WMDE) (talk) 09:43, 18 September 2014 (UTC)
  - I just typed the following for a new section below before I realized this one. I don't want to rephrase the whole thing, so please ignore the fact that it's partially redundant:

Under some circumstances, it may be necessary to specify different licenses (or license-like tags) for the photograph and the object depicted in it. The most prominent case would probably be statues, where the usual "this is a faithful reproduction of a 2D-PD-work, so the whole thing is PD" doesn't work anymore. In those cases, we need a license tag for the statue (e.g. {{PD-old}} or {{FOP-Germany}}) AND a license for the photograph of it (e.g. {{PD-self}} or just the standard {{CC-BY-SA}}). We also need a way to figure out if a) the sculptor or b) the photographer or c) both must be credited by re-users. See also: Template talk:Art Photo#Issue with MediaViewer. --El Grafo (talk) 15:45, 7 October 2014 (UTC)

Just another example for a more complex situation: 18th century PD original by A, 2012 bronze reproduction covered by FOP by B, photograph by me (CC-0 in this case, though). --El Grafo (talk) 08:39, 10 October 2014 (UTC)
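The "several licensing statements with qualifiers" approach, applied to the three-layer example above, might look like this Python sketch. The `LicenseStatement` record and its fields are hypothetical, and the `credit_required` values are illustrative inputs only, not legal determinations; in practice they would have to be derived from the license terms themselves.

```python
from dataclasses import dataclass


@dataclass
class LicenseStatement:
    """One licensing statement plus qualifiers (hypothetical structure)."""
    license: str
    applies_to: str        # qualifier: which layer of the work this covers
    author: str
    credit_required: bool  # illustrative flag, not a legal determination


statements = [
    LicenseStatement("PD-old", "18th century original", "A", False),
    LicenseStatement("FOP", "2012 bronze reproduction", "B", True),
    LicenseStatement("CC0", "photograph", "photographer", False),
]


def credits_required(stmts):
    """Authors a re-user would have to credit, in statement order."""
    return [s.author for s in stmts if s.credit_required]


def licenses_by_layer(stmts):
    """Map each layer of the work to the license that covers it."""
    return {s.applies_to: s.license for s in stmts}


print(credits_required(statements))
print(licenses_by_layer(statements))
```

Keeping one statement per layer lets a tool answer "who must be credited" without parsing free-text license templates.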

Vandal-fighting. Concerns about this became very clear in the first IRC Q&A, particularly in the contributions from User:Steinsplitter. Changes on WikiData pages -- either changes to item properties, eg who is pointed to as the "Creator" of an object; or changes to item labels in different languages -- will be directly reflected on File pages on Commons, but without any corresponding changes of Wikitext. A small number of vandal-fighters depend very heavily on automated tools to keep the much larger number of anonymous IP vandals in check. It is therefore essential that such tools can easily pick up and reverse vandalistic edits.

This appears to be a growing concern about Wikidata, shared across a number of client wikis, which may be starting to seriously hamper uptake of Wikidata-sourced content. So for example, in recent days there has been concern about a vandalistic label change on a reasonably high-profile mathematics topic which went un-noticed for four days; and about how to protect high-value items on Russian wikipedia from vandalism.

One suggestion has been to activate flagged revisions on Wikidata for high-sensitivity items; but there appear to be a number of issues with this. Another suggestion (see this diff) was to add an option to MediaWiki to highlight changed template values arising from changes on Wikidata, by exposing these values as comments added on-the-fly to the wikitext, and to trap and pass on any attempted reverts to Wikidata. This could allow vandals to be fought on Wikidata through existing vandal-fighting software on client wikis.

Either way, it is not enough to think that just propagating relevant Wikidata changes to a Commons watchlist, for manual investigation, is enough. Effective vandal-fighting relies on automated tools, so thought is needed as to how to integrate Wikidata and CommonsData changes with the tools being used by vandal fighters on Commons. Jheald (talk) 10:26, 6 October 2014 (UTC)[reply]

I think that's not only relevant for vandal fighting: If anyone makes any changes to my works or their metadata, I want to be notified, so I can a) revert bogus edits, b) ask if I don't understand the edit or c) thank the editor for a useful contribution. Also I think it can be very educational for newbies if they are pointed to the adjustments more experienced users may make to their uploads (at least I found that useful when I was one). --El Grafo (talk) 15:45, 7 October 2014 (UTC)

Derivative works. I haven't seen anything about this in the spreadsheet: I think it would be very useful to link the Commons file(s) a file was derived from in a different manner than files that were just copied from somewhere. This could then also be used in the other direction, automatically creating "derivative works of this file" listings that currently have to be maintained by hand. However, the most important point here is probably that we have a lot of files that are derivatives of derivatives of derivatives of […] and we are not at all capable of tracking that appropriately. An example would be a map in language A that was translated from the same map in language B, which was vectorized from a map in a raster format, which was created on an empty basemap and filled with data from an institution. In the chain of derivative works, it is all too common that information about e.g. which base map was used and who authored it is not carried forward to the map in language A – and believe me, this is just a simplified example. See also bugzilla:67283. --El Grafo (talk) 15:45, 7 October 2014 (UTC) (Does this make any sense to you? If not please ask and I'll try to rephrase)
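The map example above can be sketched as a small graph problem in Python. The assumption is that each file stores only its direct "derived from" links; the file names, the `derived_from` mapping and both function names are hypothetical. Walking the links recovers the whole chain for attribution, and inverting them produces the "derivative works of this file" listing automatically.

```python
from collections import deque

# Hypothetical direct "derived from" links, one entry per file.
derived_from = {
    "map_lang_A.svg": ["map_lang_B.svg"],
    "map_lang_B.svg": ["map_raster.png"],
    "map_raster.png": ["empty_basemap.png", "institution_data.csv"],
}


def ancestors(name, links):
    """All sources reachable from a file, nearest first; safe against cycles."""
    seen, order, queue = {name}, [], deque([name])
    while queue:
        current = queue.popleft()
        for parent in links.get(current, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


def derivatives_of(name, links):
    """Direct derivatives of a file, obtained by inverting the stored links."""
    return sorted(child for child, parents in links.items() if name in parents)


print(ancestors("map_lang_A.svg", derived_from))
print(derivatives_of("map_raster.png", derived_from))
```

With the chain available, the basemap and its author can be surfaced on the map in language A even though that file only records its immediate source.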

Tools for uploaders. I am very interested in this project, but I am concerned about making it harder for power contributors to add files to Commons. I do this a lot, using the Artwork template (so not the Upload Wizard). The Wikidata interface for editing is much harder to use than my current process (which involves a number of keyboard macros that paste in the syntax for the various templates I use regularly). If the understanding is that improvements to the Upload Wizard to handle other templates than Information would happen before or in tandem with this project, we should call that out. I can't be the only editor who would conclude that contributing would be harder in the new model, although the experience for users would be improved on all the ways listed above. - PKM (talk) 02:03, 14 October 2014 (UTC)[reply]
Thanks for calling that out. This was brought up during the recent meeting in Berlin, and an initial idea to support these workflows is to have custom "upload profiles" that power users can create and edit, perhaps in their user preferences. That way, you could fill the form once, save it into a custom profile, and the next time you could just reuse all those fields with one click. This is just an initial exploration, but the point is, you're not the only one using the "old" or "basic" upload form, and Structured Data will have to support this workflow. Guillaume (WMF) (talk) 08:52, 14 October 2014 (UTC)

Help text. A template consists of the data and some help text, for example what a license allows. I don't see how that could be done with Wikidata as it is now. I think a help text (data type: string?) should be attached to the property or a group of properties, which would be helpful for Wikidata too. --Goldzahn (talk) 17:08, 16 October 2014 (UTC)

Alt text. One data element I think deserves consideration in the Structured data effort is alt text (W:Wikipedia:Alternative text for images), which is text to be read, say by a screen reader, for people with visual disabilities. Current dogma on Wikipedia is that alt text should be written on a per-article basis, with different text for each use of an image. But it seems to me a generic text description of an image would be a valuable default when article-specific text is not available. I don't have any statistics, but my impression is that only a tiny fraction of articles now have alt text. By making provision for alt text in Structured data, in multiple languages, of course, I think we could provide better support for readers who can benefit from alt text. A corpus of alt text might have other uses as well, say for searching image types.--agr (talk) 16:22, 6 November 2014 (UTC)

Camera heading. The Structured Data List spreadsheet currently contains 3 entries marked as "geotag": Latitude, Longitude and Elevation. Many of the geocoded files at Commons also have information about the direction the camera was looking when taking the picture. This information is displayed on the file description page through the {{Location}}-Template (example). It is also used in the maps you can reach through the links provided by the template next to View this and other nearby images (the geocommons database seems to be down atm, though). I think including this information would be much more important than information about elevation. --El Grafo (talk) 12:08, 13 November 2014 (UTC)

Questions

Here are some open questions; your answers can help the engineering team. It's been proposed to form dedicated workgroups to investigate these questions over time. If you are interested in joining a particular workgroup, please sign up.

Workflows: Which workflows should be supported first?

Topic-based search -- it's the big one. Jheald (talk) 14:53, 25 August 2014 (UTC)
Bulk uploading from GLAMs, eg the existing GWToolset community, would also be a good priority -- these are very large quantities of high quality, high interest images, often with high-quality existing catalogue metadata. These are images that could be "born structured"; it would prove, in real-world use, the ability to get data into the system; rapidly populate any test database; help discover awkward edge cases in the metadata; and of course the GWToolset already has developers, paid for by a consortium of European WMF chapters. One particular issue at the moment is identifying appropriate categories for images, which often needs its own sweep post-upload. Such metadata may include tags, which at the moment are not a good match for our category system; matching these to topics instead might be quite useful. (And in the other direction, topics we can identify may be valuable to the source institutions as tags). Jheald (talk) 15:02, 25 August 2014 (UTC)
Image set presentation and refinement -- the ability to present an image set (from a category or a search); to sort it in multiple, probably context-specific ways -- eg for engravings of old buildings, one might want by building construction date; by original engraving publication date; by upload date; alphabetically by filename; alphabetically by name of building; geographically by proximity, ... etc; and to filter it in multiple ways, including suggesting what filters might be interesting, in a context-specific way, based on the topics associated with the images, to produce a more refined image set. For existing categories, it would probably be nice to have this as a widget inside the existing category view. For sets of search hits, a presentation with a similar classic wikitext-style gallery presentation? Or something new? Jheald (talk) 15:22, 25 August 2014 (UTC)[reply]
- I would like to see a flexible light box tool that would allow for quick visual sorting. It should have options to display a category with its sub categories and show, perhaps in a side bar, a list of target categories or data groups. It should be possible to select multiple images, one or several at a time, and then assign them to a target, either as a move or as an add on, or remove them from the target.--agr (talk) 16:28, 6 November 2014 (UTC)[reply]
Related / similar images -- again, drawing from the topic database, rather than just existing category membership. Probably illustrate topics relating to the original image by showing small-number samples, and get the user to check which one or ones they want to intersect, showing a sample of the results at each stage, effectively building up a topic query. Don't want to just show the most similar images -- instead want to give the user a sense of spread of available images in each topic dimension. Jheald (talk) 15:55, 25 August 2014 (UTC)[reply]
Support for 3rd party upload tools -- Many of the more experienced Commons users heavily rely on 3rd party Upload tools and I guess they would get pretty upset if they were forced to use the UW instead (at least I would). So please get the developers of those tools into the boat as early as possible to facilitate a smooth transition. --El Grafo (talk) 14:45, 10 October 2014 (UTC)[reply]
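
As a rough illustration of the image set refinement idea above (suggesting filters from the topics present in the current set, then sorting the refined set), here is a minimal Python sketch. The `MediaFile` record, its fields, and the sample files are all hypothetical, and the "even split" ranking is only a crude stand-in for the information-gain estimate mentioned earlier, not a proposed implementation.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    """Hypothetical record for one file; field names are illustrative only."""
    name: str
    upload_year: int
    topics: set = field(default_factory=set)


def suggest_filters(files, limit=3):
    """Rank topics by how evenly they would split the current set.

    A topic carried by every file (or by none) cannot narrow the set,
    so topics present on about half of the files are ranked first.
    """
    counts = Counter(topic for f in files for topic in f.topics)
    total = len(files)
    useful = [(t, n) for t, n in counts.items() if 0 < n < total]
    # Sort by distance from an even split, then by name for a stable order.
    useful.sort(key=lambda item: (abs(item[1] - total / 2), item[0]))
    return [topic for topic, _ in useful[:limit]]


def refine(files, topic):
    """Keep only the files tagged with the chosen topic."""
    return [f for f in files if topic in f.topics]


# Made-up sample data.
files = [
    MediaFile("a.jpg", 2012, {"Edinburgh", "engraving"}),
    MediaFile("b.jpg", 2010, {"Edinburgh", "watercolour"}),
    MediaFile("c.jpg", 2014, {"Edinburgh", "engraving", "church"}),
    MediaFile("d.jpg", 2011, {"Edinburgh", "watercolour"}),
]

suggestions = suggest_filters(files)
by_upload = sorted(refine(files, "engraving"), key=lambda f: f.upload_year)
```

Here "Edinburgh" is never suggested because every file already carries it; any other sort key (construction date, filename, proximity) could be swapped into the `sorted` call in the same way.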

Structure: How do we define a basic data structure that's flexible but not redundant?[edit]

For topics, as suggested above, you perhaps want the Q-number of a topic, then probably a static property ("relation to the file"), then the Q-number of a type of relationship. (The latter would be optional). There may be other properties you want to define locally on the topic Q-number as well. There should also be some note of whether the topic and its relationship had been imputed through a chain of properties and Q-numbers on Wikidata, which would potentially be vulnerable to any changes in that chain, so would need to be tracked; or whether the topic had been contributed independently, eg through a tagging game. It's probably also useful to distinguish which topic Q-numbers represent unique things, and which represent types of things. Jheald (talk) 15:40, 25 August 2014 (UTC)[reply]
For creators, there are a number of issues I can think of with the sub-attributes like "possibly", "follower of", "and workshop", and "formerly attributed to". Relationships like "created by follower of" make sense on the artwork in WD but perhaps not on the artist. Also, "formerly attributed to" should have fields for who made the attribution, when, and where. Do these all belong in WD? - PKM (talk) 20:58, 8 October 2014 (UTC)[reply]
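
To make the topic structure suggested above a little more concrete, here is a minimal Python sketch of one possible record shape. The class and field names are hypothetical, and the Q-numbers are placeholders rather than real Wikidata items; it only shows the pieces named in the comment (topic, optional relationship type, provenance, unique thing vs. type of thing).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Provenance(Enum):
    IMPUTED = "imputed"          # derived through a chain of Wikidata properties
    INDEPENDENT = "independent"  # contributed directly, e.g. via a tagging game


@dataclass(frozen=True)
class TopicStatement:
    """One 'relation to the file' entry; names are illustrative only."""
    topic: str                          # Q-number of the topic
    relation: Optional[str] = None      # optional Q-number of the relationship type
    provenance: Provenance = Provenance.INDEPENDENT
    is_unique_thing: bool = True        # False when the topic is a type of thing

    def needs_tracking(self) -> bool:
        """Imputed statements depend on Wikidata chains that may change."""
        return self.provenance is Provenance.IMPUTED


# Placeholder Q-numbers, not real items.
statements = [
    TopicStatement("Q100", relation="Q200", provenance=Provenance.IMPUTED),
    TopicStatement("Q300", is_unique_thing=False),
]
to_track = [s.topic for s in statements if s.needs_tracking()]
```

Keeping provenance on each statement means a change in the underlying Wikidata chain only requires re-checking the imputed entries, not the independently contributed ones.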


Research: How can we measure and validate each feature?[edit]


Platform: What modules should be built first? Why?[edit]


Features: Which features should be developed first? Why?[edit]

(1) Migrating the license and author to Wikidata. Right now the methods for pulling this data are very fragile (and poorly advertised), making it difficult for 3rd party re-users to adequately meet the licensing conditions (without a lot of manual work). (2) Implement tags/topics. The current category+search system doesn't work very well for helping people to find media they are looking for, especially if they aren't English speakers. Kaldari (talk) 21:07, 22 August 2014 (UTC)[reply]
Ability to present structured data through templates on file description pages, and edit the data through the same templates. Jheald (talk) 18:26, 25 August 2014 (UTC)[reply]
Ability to present structured data as wikitext (eg apparent template arguments) in the source of file description pages, and capture attempted changes to that wikitext (eg by bots). Jheald (talk) 18:26, 25 August 2014 (UTC)[reply]
(As suggested above) Ability to associate rules with categories, to make categories auto-updating -- so that if the right topic/relationship were added to a file's topics entry, which pinged a particular rule, that file would automatically be added to the rule's corresponding category. (These would be alongside files that were already in such a category -- not all categories would have such a rule; and even in categories that did, not all files (especially initially) would necessarily derive their category membership from it.) It would also be nice to conversely be able to highlight on the category view which members did or did not match such a rule -- the topics and wikidata relationships for such files could then be manually investigated, which might be useful for database improvement if topic information were initially weak. This would allow the topic system and the category system to develop together organically. Jheald (talk) 17:06, 29 August 2014 (UTC)[reply]
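
The rule-driven categories described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `CategoryRule` class, the `(topic, relation)` tuples, and the identifiers such as `"Q-church"` are all hypothetical placeholders, not an existing MediaWiki or Wikibase feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CategoryRule:
    """Hypothetical rule: files with this topic/relationship join the category."""
    category: str
    topic: str      # placeholder topic identifier
    relation: str   # placeholder relationship identifier

    def matches(self, statements):
        return (self.topic, self.relation) in statements


def apply_rule(rule, file_statements, manual_members):
    """Return (members, unmatched).

    members   -- files matched by the rule plus files already in the category
    unmatched -- members that do not satisfy the rule, to be highlighted
                 for manual investigation
    """
    matched = {name for name, st in file_statements.items() if rule.matches(st)}
    members = matched | manual_members
    return members, members - matched


rule = CategoryRule("Category:Example churches", "Q-church", "depicts")
file_statements = {
    "a.jpg": {("Q-church", "depicts")},
    "b.jpg": {("Q-village", "depicts")},
    "c.jpg": set(),
}
members, unmatched = apply_rule(rule, file_statements, manual_members={"b.jpg"})
```

In this made-up data, `a.jpg` joins the category through the rule, while `b.jpg` stays in it as a manually added member and is flagged as not matching.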

Migration: How do we coordinate the data migration?[edit]

<your comment and user name> — Preceding unsigned comment added by Seddon (WMF) (talk • contribs) 02:18, 26 October 2016 (UTC)[reply]

Overhauling these pages[edit]

In case you didn't notice, work is being done to turn this into a proper hub for communication and information as this project gets underway. Fleshing out the FAQ and issues pages, among others, is still very much a work in progress. Keegan (WMF) (talk) 13:44, 10 October 2014 (UTC)[reply]

@Keegan (WMF): You may want to shift the existing content on this page to a new free-standing project page, with its own talk page, to clear this one for something more like a conventional talk page. Jheald (talk) 15:57, 10 October 2014 (UTC)[reply]

Or on second thoughts, maybe not. This is the page people know; and much of the above is germane to the project as a whole. Jheald (talk) 16:30, 10 October 2014 (UTC)[reply]

@Jheald: It's true we're missing the "Discussion" label in the navigation tabs. It should be along shortly with a recursive link to this page if you're on the main project page. This page will serve as the centralized discussion page. I'm not sure whether to collapse, remove, keep, or move the above questions from Fabrice just yet. The plan is to flesh this all out by the end of next week after we have office hours again; it might be good to discuss this with yourself and others then. Thanks for signing up for the newsletter immediately, the first edition should be going out a week from Monday :) I'm headed home from Berlin tomorrow and will be taking a couple of recovery days from a busy week. A fresh start on Tuesday. Keegan (WMF) (talk) 23:44, 10 October 2014 (UTC)[reply]

user interface[edit]

File:Wikidata within commons.jpg shows how it could look if Wikidata and Wikitext come together on one page. The new Wikidata infos are at the bottom. I think infos like licensing, metadata, etc. should be deleted from the 23 million pictures only step by step. --Goldzahn (talk) 15:38, 16 October 2014 (UTC)[reply]

Information like metadata and licensing is vital to curating a repository of media. What structured data plans to offer is organizing it and presenting it in a consistent manner to humans and machines alike. That mockup is just, as you say, how it could look. It's one of many design possibilities :) Keegan (WMF) (talk) 19:58, 22 October 2014 (UTC)[reply]

The way this screenshot was put together — the JPEG format, the mismatching stitching, the chalkboard-style “clean up” of some areas… all that tells me all I need to know about how this whole thing is being done. Well, it rather simply confirms what has been hinted at before, and I'm more and more convinced that Wikidata, behind its all-business façade, is a mishmash of hack jobs whose only outcome will be to paralyze what has been done so far in Commons (which is far from perfect or ideal, of course). It will fail spectacularly due to any of its many shortcomings, it will grind to a halt, the culprits will flee to comfy jobs in some for-profit outfit (or not…), and Commons will probably have to be taken offline for a while and rebuilt from an old fork. I hope I’m wrong about this. -- Tuválkin ✉ 06:01, 15 November 2014 (UTC)[reply]

How_to_handle_new_files_missing_information_template[edit]

Please see Commons:Village_pump#How_to_handle_new_files_missing_information_template --Jarekt (talk) 20:15, 3 November 2014 (UTC)[reply]

I am looking into tagging files as Category:Media missing infobox template or Category:Pages using Information template with parsing errors. The first category is for files without any infobox template, and the second is for files with an {{Information}} template which is not rendered properly (likely due to bracket imbalance). We should expect to find ~700k such files. --Jarekt (talk) 17:39, 6 November 2014 (UTC)[reply]
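
For illustration only, the two-way tagging described above could be approximated with a check like the following Python sketch. It is a deliberate simplification and not how the actual tagging was done: it only counts `{{` and `}}` pairs, ignores `<nowiki>` sections and comments, and the `INFOBOX_TEMPLATES` list is an assumed, incomplete stand-in for the real set of infobox templates.

```python
def braces_balanced(wikitext: str) -> bool:
    """Return True if every '{{' has a matching '}}' and vice versa."""
    depth = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}":
            depth -= 1
            if depth < 0:      # a close with no preceding open
                return False
            i += 2
        else:
            i += 1
    return depth == 0


# Assumed, incomplete list of infobox template names (lowercase).
INFOBOX_TEMPLATES = ("information", "artwork")


def maintenance_category(wikitext: str):
    """Pick a maintenance category for a file page, or None if it looks fine."""
    lowered = wikitext.lower()
    if not any("{{" + name in lowered for name in INFOBOX_TEMPLATES):
        return "Category:Media missing infobox template"
    if not braces_balanced(wikitext):
        return "Category:Pages using Information template with parsing errors"
    return None
```

For example, `maintenance_category("{{Information|description={{en|A church}}")` reports a parsing error because the outer template is never closed.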

Folksonomy in, ontology out[edit]

Made me think of Commons, of course: https://cyber.law.harvard.edu/hoap/Intro_to_TagTeam --Nemo 06:22, 3 December 2014 (UTC)[reply]

Templates to rebuild using LUA[edit]

Hi everyone, the Wikidata team has been silently working on arbitrary access to data here on Commons. That's a huge step in the right direction for us because that means we can finally really start using data from Wikidata. I think the first step should be to rebuild {{Creator}} and {{Institution}} in Lua to grab data from Wikidata if no local data is available. That's probably quite a bit of work. First try is already at d:Module:Institution, testing can be done at test.wikipedia.org. Who wants to help? @Jarekt: you seem to be the only one here working on Lua... Multichill (talk) 20:46, 18 January 2015 (UTC)[reply]

Not the only one, User:RP88 was quite busy lately and User:Sn1per was also great help jump-starting the work on porting {{Other date}} to Lua. Yes we could use more people writing Lua. If I am ever done with Module:Complex date, rewriting {{Creator}} and {{Institution}} could be an interesting next project, but if someone else wants to spearhead it that would be great too. --Jarekt (talk) 22:27, 18 January 2015 (UTC)[reply]

Hello, I saw Jarekt's ping. I've done a little Lua here on Commons, most recently I did Module:Ordinal / Module:I18n/ordinal. The last couple of days I've been spending a little time investigating the structure and dependencies of {{Artwork}} with the intention of determining what it would take to reimplement it in Lua. Since that is a huge, almost unapproachable project, it was my plan to start by reimplementing some of the child nodes of {{Artwork}} in Lua. It was my plan to start with {{Size}}, but converting {{Creator}} and {{Institution}} seem like laudable goals. I'll read up on wikidata. Is there any particular reason you are implementing it at d:Module:Institution instead of Module:Institution? —RP88 (talk) 23:50, 18 January 2015 (UTC)[reply]

Oh wait, now I see why you suggested testing on test.wikipedia.org, it looks like the necessary support for access to arbitrary wikidata has not yet been deployed to Commons. —RP88 (talk) 02:30, 19 January 2015 (UTC)[reply]

I have had a first shot at Module:Artwork (without Wikidata functionalities atm), it seems more straightforward than {{Creator}} and {{Institution}}. --Zolo (talk) 09:40, 20 January 2015 (UTC)[reply]

Sorry Jarekt, RP88 and Zolo. I was traveling so I missed your replies. I created phab:T89594 to track progress. Let's see how that works. I'll add tasks to it. Multichill (talk) 13:10, 15 February 2015 (UTC)[reply]
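
The fallback behaviour discussed in this thread (use local template data, and grab data from Wikidata only if no local data is available) can be sketched language-neutrally. The real modules would be written in Lua; the Python below is just an illustration, the plain dictionary `wikidata_values` stands in for whatever the Wikidata lookup returns, and all field names and values are made up.

```python
def resolve_fields(local_params, wikidata_values):
    """Prefer non-empty local template parameters; fall back to Wikidata.

    local_params    -- parameters given in the template call on Commons
    wikidata_values -- values fetched from the linked item (assumed to be
                       already flattened to strings for this sketch)
    """
    resolved = dict(wikidata_values)
    for key, value in local_params.items():
        # An empty or whitespace-only parameter counts as "no local data".
        if value and value.strip():
            resolved[key] = value.strip()
    return resolved


# Made-up example values.
local = {"name": "Example Museum", "location": "  "}
from_wikidata = {"location": "Example City", "website": "https://example.org"}

fields = resolve_fields(local, from_wikidata)
```

With these inputs the blank local `location` is ignored, so the Wikidata value is used, while the local `name` is kept.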

Layer structure: depiction vs. depicted subjects, observer location vs. observed location[edit]

In the case of imaging works (photos, realistic drawings or paintings, scans) and other derivative works, we need to distinguish between data about the depicting file itself and data about the depicted subjects. Attributes such as "author", "licence (permission)", "location+coordinates" and "description" are different for the file itself and for the depicted or even scanned work.

I propose to separate the information about depicted subjects into separate data sets, which would be attached as a secondary layer to the primary data set about the file.

Example – photographic reproduction of a painting of a village with a church

Primary layer (the file itself):

Title: Kidlington in Winter by John Michelangelo
Description:
- Observer location and heading: camera coordinates, camera heading, (building-2, institution linked there)
- Depicted subjects: (work-1, object location included there).
- Circumstances: regular visit of the gallery during (action-3)
- Record technique: from the tripod, under daylight (+file metadata)
Type of work: Photographic reproduction of two-dimensional work
Work location: default (Wikimedia servers, file link)
Creation date: (date when the photo was taken)
Last version date: (date of the current version)
Author: User1
Uploader: User1
Upload date:
First published: (book-9, date and publisher of the photo included there), page number
Licence: not applicable (not a creative work), derivative work of (work-1)

Secondary layer (the depicted painting work-1):

Title: Kidlington in Winter
Description:
- Observer location and heading: (coordinates of the painter stand and his heading), (village-4)
- Depicted subjects: (village-4), (church-5 including location), (person-6)
- Circumstances: in winter, during painter's February 1865 travel to England
- Record technique: Oil painting on wood panel
Type of work: Painting
Work location: (building-2), (exact coordinates of the painting), (floor number), (room name), using time qualifiers if the work is moved
Creation date: (date when the painting was painted)
Last version date: (event-11, including date of last modification or restoration of the work)
Display date: (if it does not follow from the qualifiers of the location)
First published: (event-10, first event or publication which published the work, including date and circumstances)
Author: (author-7)
Owner: (institution-8) (using qualifiers if the owner is changed)
Licence: PD-Old-UK

Third layer: (action-3), (village-4), (church-5), (person-6), (author-7), (institution-8), (book-9) can simply link to a Wikidata item, or can be described as a separate data set in Commons (if they are not notable enough to have their own Wikidata item). We can have specific forms for some frequent types of subjects: painting, sculpture, building and its address, street, road or railway line, settlement, administrative unit, natural formation or area, person, institution, vehicle or machine, event or incident, etc. The simplest solution would be to use a format compatible with Wikidata items for all layers. Of course, filling in the form should be as easy and simple as possible, without needless jumping between windows, and in most cases it would not be as complicated as in the example.

Another interesting idea would be to interconnect the description (list of depicted subjects) with annotations, but this idea is not quite compatible with the layer structure described above.

Am I understandable enough, or should I add some other examples for other types of files? --ŠJů (talk) 07:12, 29 June 2015 (UTC)[reply]

I think that's pretty clear and I totally agree that support for multiple layers of works is a very important thing to implement. Not only for reproductions of paintings or photographs of statues, but also derivative works of other images at Commons or collages of multiple pictures. For notable artworks that already have an item at Wikidata (like Mona Lisa), much of this could be directly drawn from there, though (see also Wikidata:WikiProject sum of all paintings). --El Grafo (talk) 09:58, 29 June 2015 (UTC)[reply]
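
The layered example from this thread can be written down as a small nested data structure. The Python sketch below is only one possible reading of the proposal: the `Layer` class and its fields are hypothetical, plain strings such as `"village-4"` stand in for third-layer items (Wikidata links or local data sets), and the values are taken from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class Layer:
    """One layer of description; field names are illustrative only."""
    title: str
    type_of_work: str
    author: str
    licence: str
    # Depicted subjects: nested Layer objects for works, or plain
    # identifiers for third-layer items such as places or persons.
    depicted: list = field(default_factory=list)


def licence_chain(layer):
    """Collect (title, licence) for the file and every depicted work below it."""
    chain = [(layer.title, layer.licence)]
    for subject in layer.depicted:
        if isinstance(subject, Layer):
            chain.extend(licence_chain(subject))
    return chain


painting = Layer(
    title="Kidlington in Winter",
    type_of_work="Painting",
    author="author-7",
    licence="PD-Old-UK",
    depicted=["village-4", "church-5", "person-6"],
)
photo = Layer(
    title="Kidlington in Winter by John Michelangelo",
    type_of_work="Photographic reproduction of two-dimensional work",
    author="User1",
    licence="not applicable",
    depicted=[painting],
)

chain = licence_chain(photo)
```

Walking the layers this way keeps the licence of the file and the licence of the depicted work separate, which is the distinction the proposal is after; collages or derivative works would simply have more than one nested `Layer`.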

Europeana white paper draft[edit]

Interesting: Best practices for multilingual access to digital libraries is accepting comments for a few more days and overlaps considerably with our experience. For instance, the recommendation to describe objects with identifiers which are available in multiple languages (such as VIAF items) is very similar to the Wikidata system of item label display. --Nemo 07:18, 30 July 2015 (UTC) — Preceding unsigned comment added by Seddon (WMF) (talk • contribs) 02:13, 26 October 2016 (UTC)[reply]
