Commons talk:Flickypedia/Data Modeling

From Wikimedia Commons, the free media repository

Discussion, in general

George was just now talking on the Commons Photographers' User Group about wanting some informal discussion of the mapping of description & tags from Flickr. I'd very much like to be included in those discussions. - Jmabel ! talk 16:53, 5 August 2023 (UTC)[reply]

Likewise. See also phabricator:T339902. Tags are not what the depicts statement is for. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:36, 5 August 2023 (UTC)[reply]
Hi @Jmabel and @Pigsonthewing. Thanks for tuning into the presentation on Saturday. Glad you're here.
I want to show my "colours" up front and say I'm generally very supportive of folksonomic classification systems, and I'd argue you can see their great success demonstrated on Flickr, with almost 20 years of organic and emergent tagging bearing superb search fruit.
But! I've also reviewed that phabricator:T339902 conversation and the references, and can tell there's lots of WM community frustration that these folksonomic descriptors are also being used in a place you and others feel should be very structured. It also sounds like there's frustration that you weren't a part of the design/development choices made when implementing the so-called "tag" UI which if I'm interpreting the conversations correctly is the place where people are being asked to actually fill out a depicts statement. (Is that right?)
I'm not sure we're going to be able to calm that frustration as we develop Flickypedia, but what I hope we can do is figure out how we can explain what depicts means to people using Flickypedia, as we also borrow their tags, to see if we can make some of those more formal depicts statements I think you're after during that process.
@Pigsonthewing I looked at Commons talk:Structured data/Computer-aided tagging/Archive 2020#Bad tags, nagging, and no tags but couldn't get a clear read on what you do actually think depicts should be - is it something perhaps a bit more like named entities?
Thanks!
(And let me also apologise in advance if I'm slow to reply. I'm still developing my WM-based conversation muscles!) Ukglo (talk) 09:48, 7 August 2023 (UTC)[reply]
And by the way, here's me in 2007 giving a talk at SxSW about taxonomy v folksonomy, so, I've been thinking about all this for a while! (Which I found with machine, tags, george, oates.) ;-) Ukglo (talk) 10:08, 7 August 2023 (UTC)[reply]
@Ukglo: This is an asynchronous medium, so take as long as you like before replying. Jmabel, I, and anyone else who cares to (and you!) can subscribe and get notifications if and when there is a follow-up post.
In short, "Depicts" statements should only be used for the most specific available item. Suppose we have a picture of a car (automobile). It's fine for someone to tag it as "depicts=automobile". Later, someone else may change that to "=Ford Escort", and later still to "=Mk II Ford Escort Estate". But at each change we don't need to keep the less specific value, much less add "vehicle", "metal", "rubber" and so on. That's what's currently being done, contrary to community wishes. Tagging is a good thing, and there's a place for it - I've also given talks promoting it - but it's not the same as saying what an image depicts (the picture in your postscript does not depict "tags", nor "machine").
More specifically it would be unfortunate for you and your colleagues to spend time and effort encoding a method for importing tags into depicts statements, if the WMF do change their approach, as I hope they will, and such "tags" end up being removed from "Depicts" statements. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:03, 7 August 2023 (UTC)[reply]
"In short, "Depicts" statements should only be used for the most specific available item." - Great - and why is that? Ukglo (talk) 14:07, 7 August 2023 (UTC)[reply]
I note with interest that the sugar cube image used as the basic example on the main SDC page contains "Mathematics" as a depicts value - is that a mistake? Ukglo (talk) 14:09, 7 August 2023 (UTC)[reply]
I would consider that a mistake. - Jmabel ! talk 17:01, 7 August 2023 (UTC)[reply]
Looks like it was added 2 days ago. General rule of thumb on wikis: if something strikes you as dubious, check to see whether it came from a recent edit. User:XRay, this change came from your bot. I don't have any idea what drove the change. Do you think this is correct, or do you agree with me that this was an error? - Jmabel ! talk 17:06, 7 August 2023 (UTC)[reply]
A Cube is a mathematical object. But you're right, this should be fixed. (I'll fix it.) --XRay 💬 17:11, 7 August 2023 (UTC)[reply]
And now, seconds later, it's fixed. Bot too. --XRay 💬 17:13, 7 August 2023 (UTC)[reply]
@Jmabel Yay! I did think it was weird - thanks for showing me the process. Ukglo (talk) 13:03, 8 August 2023 (UTC)[reply]
I think the thing I'll be most interested to talk with you about is what a minimum set of desirable structured Flickr-specific data will be. We'll have a decision to make as the project team on whether we're able to incorporate SDC in our V1 (hoped-for release this year, as I said on the call).
I've mapped a short list using the SDC info, as listed on the Data Modeling page. Does that look ok/good/right to you? Ukglo (talk) 15:02, 7 August 2023 (UTC)[reply]

Creator

Hi @Ukglo, your talk from yesterday brought me here. Thank you so much. I would love to help with the mapping of Flickr data to SDC.

Here are some ideas regarding the mapping for the "Creator" info: Ideally, the new tool could query Wikidata for a unique item with the respective P3267 statement (Qxxxx → P3267 → [Flickr user ID]) and then just create a statement P170 → Qxxxx. The "some value" placeholder together with qualifiers (P3267, P2093) should be used only if no Wikidata item exists for the Flickr ID; this avoids redundant information on each file. Since P3267 creates a URL to the Flickr user page, an additional P2699 qualifier seems redundant as well.

Cheers MB-one (talk) 21:51, 6 August 2023 (UTC)[reply]

Hi @MB-one - if I'm reading your suggestion correctly, I would wonder how many Flickr users have Wikidata entries... my guess would be not very many. You're right though, my initial suggestions may have too much redundancy - is that always undesirable? Ukglo (talk) 09:51, 7 August 2023 (UTC)[reply]
While not that many Flickr users have Wikidata entries, some do, and it has been a royal pain having to go through manually or in semi-automated ways to fix this when it comes up. It's easy to do a lookup in Wikidata to see if the Flickr ID is associated with a Q-item, so it should be easy to get this right when the case arises. - Jmabel ! talk 17:36, 7 August 2023 (UTC)[reply]
I'll make a note of this - it's a great example of a small move we could implement that will save labour later. I'm also now curious how many Flickr users have Q-items :-) Ukglo (talk) 15:09, 8 August 2023 (UTC)[reply]
Sorry - forgot to mention that we'd also try to make the connection between
Ukglo (talk) 15:11, 8 August 2023 (UTC)[reply]
I now know this! As of just now, there are ~2600 Wikidata entities that include a Flickr photo ID. Alexwlchan (talk) 22:06, 10 November 2023 (UTC)[reply]
In my personal opinion such redundancy should be avoided, as it can lead to more work without creating additional value. But as you said, probably only a small number of Flickr users have their own Wikidata entry (yet). So that would be another point of discussion: should we allow and enable on-the-fly creation of new Wikidata entries within the tool? My personal stance would be that at a certain number of transferred images by the same Flickr user (say 100+), a dedicated Wikidata entry is warranted. But it should be discussed in Wikidata as well. MB-one (talk) 19:44, 7 August 2023 (UTC)[reply]
Hi @MB-one, thanks for this suggestion! I’ve incorporated this into the current Flickypedia data model.
In particular, it searches by both Flickr user ID (e.g. 64018555@N03 → Governor of Maryland (Q693032)) and username (e.g. ianemes → Ian Emes (Q5981474)). If it finds a match on either of them, it creates a statement P170 → Qxxx instead of the "some value" placeholder.
(I’ve been testing the SDC mapper on a variety of Flickr photos, and already come across a couple of randomly-chosen examples which had a Wikidata entity!)
I don’t think Flickypedia is the right place to create new Wikidata entities – we’re not going to build a better interface for creating new entities than Wikidata itself. I like the idea of counting photos transferred and using that as a rule of thumb for whether somebody needs a Wikidata entry, but that might be better off outside Flickypedia – there are plenty of photos in Wiki Commons which will never touch Flickypedia. Alexwlchan (talk) 14:03, 12 October 2023 (UTC)[reply]
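The creator mapping discussed in this section could be sketched roughly as follows. This is only an illustration under assumptions, not Flickypedia's actual code: the function names are hypothetical, the SPARQL query is one possible way to do the P3267 lookup, and the dictionaries follow the general shape of Wikibase statement JSON as I understand it. The display name passed for P2093 is a placeholder.

```python
from typing import Optional


def build_creator_query(flickr_user_id: str) -> str:
    """Build a SPARQL query for items whose Flickr user ID (P3267) matches.

    LIMIT 2 is enough to tell "no match", "unique match" and "ambiguous" apart.
    """
    # Escape backslashes and quotes so the ID is a safe SPARQL string literal.
    escaped = flickr_user_id.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item wdt:P3267 "{escaped}" . }} LIMIT 2'


def _string_snak(prop: str, value: str) -> dict:
    """Helper (hypothetical): a snak holding a plain string value."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {"type": "string", "value": value},
    }


def build_creator_statement(
    qid: Optional[str], flickr_user_id: str, display_name: str
) -> dict:
    """Return a creator (P170) statement.

    If a unique Wikidata item was found, point straight at it.
    Otherwise fall back to "some value" with P3267 and P2093 qualifiers.
    """
    if qid is not None:
        mainsnak = {
            "snaktype": "value",
            "property": "P170",
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "numeric-id": int(qid[1:]),
                    "id": qid,
                },
            },
        }
        return {"mainsnak": mainsnak, "type": "statement", "rank": "normal"}

    return {
        "mainsnak": {"snaktype": "somevalue", "property": "P170"},
        "qualifiers": {
            "P3267": [_string_snak("P3267", flickr_user_id)],
            "P2093": [_string_snak("P2093", display_name)],
        },
        "type": "statement",
        "rank": "normal",
    }
```

With this shape, a file whose uploader has a Q-item carries a single P170 → Qxxxx statement and no per-file qualifiers, which is the redundancy saving MB-one describes.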

Three systems for metadata, not two

I think it is important to recognize up front that, in terms of metadata, there are really three systems here, not two.

  1. Flickr metadata
  2. Structured data for Commons (SDC), including depicts (P180)
  3. Wikitext on Commons, including categories
    • EXIF data implicitly makes it into the UI for this as read-only data, no action needed beyond uploading the photo.

The existing Flickr2Commons completely ignores SDC, and does a fairly decent job translating Flickr metadata to Wikitext on Commons. I would hope that this project will concern itself with both:

  • Providing decent support for SDC (which so far is what this page seems to be about) and
  • Providing improved support for wikitext on Commons.

I can see how it is tempting to focus on SDC, but the reality is that the bulk of Commons users (at least human Commons users, vs. bots) are much more focused on Wikitext and categories than on SDC and depicts. While any degree of support for SDC and "depict" will be welcomed, it should be an absolute requirement that support for Wikitext and categories be at least as good as in Flickr2Commons. I can pretty much guarantee that this is absolute baseline for community acceptance of this tool.

(Also, an aside on vocabulary: when Commons users refer to metadata, they often mean only EXIF and the like: metadata within the file itself. Yes, this is an unconventional usage, but be aware of it: some users know the term only in that context, so there will inevitably be confusion, and you should be prepared to define, and to reiterate, how the term is being used here. Yes, I consider the broad usage here more correct, but that's because, unlike most Commons users, I'm a computer professional, not someone who learned their vocabulary in the Commons context.) - Jmabel ! talk 17:34, 7 August 2023 (UTC)[reply]

Just a quick note to say thank you very much for writing your thoughts down on all this. I am digesting it, and will definitely come back to you with questions/comments.
As I suggested above, I think our challenge for V1 of Flickypedia is finding a small SDC move (together, I hope!) that can demonstrate its potential, like making sure we are supporting the creation of the minimum fieldset.
I was also heading towards working explicitly with a single category in this first version, and that would be a category to gather all the images that come through Flickypedia in one place, like Commons:Files uploaded through Flickypedia (or similar).
Depicts... is way more complicated and I would like to talk more about that! Ukglo (talk) 13:08, 8 August 2023 (UTC)[reply]
Oops - I think I meant Category:Files uploaded through Flickypedia. Ukglo (talk) 14:01, 8 August 2023 (UTC)[reply]
Convenience link: Category:Files uploaded through Flickypedia. (We pretty much always link category names, even if they do not yet exist, so that if/when they do there will be a working link) - Jmabel ! talk 15:07, 8 August 2023 (UTC)[reply]
Right now, Flickr2Commons lets a user specify the categories. This includes setting categories for all of the images in a mass upload (e.g. from an album) and retaining categories by default when you upload a series of individual images. (In the latter case, it is often useful that the default is to keep the same ones, but small edits to the category set are easy to make.) I would say that for me losing that feature would be very close to a dealbreaker, and I'd be very surprised if that were not the case for the majority of people using the current tool. I do not think it is reasonable to lose basic capabilities of the existing tool.
If I'm transferring/uploading 100 images from an album, which have their most important categories in common, it is tremendously easier to specify those categories as part of the transfer/upload than to have to deal with it later. Yes, it can be done, but consider this analogy on Flickr itself: leaving that out is as if you couldn't add tags as part of uploading images. - Jmabel ! talk 15:07, 8 August 2023 (UTC)[reply]
Thanks @Jmabel - @Pigsonthewing had the same feature request, to retain the capacity to add at least 1+n categories as you upload, so we've noted that. I understand the utility you're describing. Ukglo (talk) 13:11, 15 August 2023 (UTC)[reply]
If you do want to do something with depicts, then the (sometimes buggy) Android app (there is no iPhone equivalent) has a good interface for picking a depicts subject for an image. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:51, 10 August 2023 (UTC)[reply]

Two folksonomies in Commons

There are really two different folksonomies involved in Commons, and they have rather different rules (including rules of thumb). One is depicts (P180) in SDC (hereafter "Depicts"); the other is Commons:Categories. The most obvious difference is that Commons categories are very often intersections (e.g. Category:Destroyed buildings in Seattle); this is something Wikidata (and hence SDC) tries to avoid. However, there are other important differences:

  • The category system is much broader than Depicts. Depicts is intended to refer only to what is visible in the image; anything else should be handled by other properties. A good example of something that should almost never be "depicts" is location. depicts (P180) Seattle (Q5083) might be good for a satellite photo showing most or all of the city, but normally we would have location of creation (P1071) Seattle (Q5083). Similarly for date of photo (inception (P571)), creator (P170), collection (P195), part of the series (P179), genre (P136) and probably some other things I'm not immediately thinking of.
  • The category system is pretty mature. There are, of course, disputes now and then, but I would say without hesitancy that there is 95% agreement on almost everything important about categories among experienced Commons users. A lot of this is unwritten community consensus, but the consensus tends to be pretty strong. The biggest remaining area of disputes is between "lumpers" and "splitters" and, secondarily and related, about how strict we are about COM:OVERCAT, which says that we should almost always use only the most specific category available, and none of its ancestors. I'll get into this more below.
  • Depicts is less mature. We recently had quite a fiasco when someone ran a contest intended to get more "depicts" onto photos and literally the majority of edits made in the contest were deemed "bad" by more experienced users; I believe that after a few attempts at rule changes, the contest was cut short. Also, the Depicts part of Commons:Suggested Edits has been very controversial, with many experienced users including myself considering it an overall liability (to be fair, some are happier with it, but I don't think anyone considers Suggested Edits a raging success). Among other things, there is not a solid consensus on whether COM:OVERCAT applies. In principle, if only by analogy with Wikidata, it should, but there appear to be practical considerations because present-day tools don't seem to be very efficient about traversing the hierarchy of instance of (P31) and subclass of (P279). So there is a bit of a theory-vs.-practice fight here.

Also: both of these differ from Flickr tags in that we are trying to build a strong community consensus on how they are used, whereas Flickr tags are basically laissez-faire. - Jmabel ! talk 18:07, 7 August 2023 (UTC)[reply]
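As a rough illustration of the "most specific only" rule (COM:OVERCAT for categories, and the car example given earlier for Depicts), here is a minimal sketch. The hierarchy is a hand-written toy dictionary, not real Wikidata or Commons data, and `most_specific` is a hypothetical helper name. As noted above, whether this rule should apply to Depicts is not settled, and editors sometimes keep a broader value on purpose.

```python
def most_specific(values, parents):
    """Drop every value that is an ancestor of another value in the list.

    `parents` maps a term to its direct parent terms (toy data here).
    """

    def ancestors(term):
        # Walk upwards through the hierarchy, collecting all ancestors.
        seen = set()
        stack = list(parents.get(term, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(parents.get(parent, ()))
        return seen

    redundant = set()
    for value in values:
        redundant |= ancestors(value)
    # Preserve the original order of the surviving values.
    return [value for value in values if value not in redundant]


# Toy hierarchy, for illustration only.
TOY_PARENTS = {
    "Mk II Ford Escort Estate": ["Ford Escort"],
    "Ford Escort": ["automobile"],
    "automobile": ["vehicle"],
}

print(most_specific(["vehicle", "automobile", "Ford Escort"], TOY_PARENTS))
```

Unrelated values (for example a person and a building in the same photo) are untouched, because neither is an ancestor of the other.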

Hi @Jmabel - @Pigsonthewing and I had a good chat yesterday about this. We're considering proposing a new property specifically for Tag, or possibly Flickr Tag. That'll help people out there not put laissez-faire tags into Depicts, I think. Ukglo (talk) 15:46, 10 August 2023 (UTC)[reply]
(Sorry - should have proofread! - I meant a new Wikidata property, Tag or Flickr Tag.) Ukglo (talk) 15:47, 10 August 2023 (UTC)[reply]
There's also hashtag (P2572), into which a (very much) earlier proposal for a "Flickr tag" property was merged. I've demonstrated how that could be used in this series of edits; there are no doubt arguments for both that method of working and a discrete property. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:31, 10 August 2023 (UTC)[reply]
Maybe hashtag (P2572) and a qualifier? The qualifier could be stated in (P248) Flickr (Q103204)? - Jmabel ! talk 23:17, 10 August 2023 (UTC)[reply]
But a Flickr tag is not a hashtag :-) Ukglo (talk) 09:52, 11 August 2023 (UTC)[reply]
The name of the property isn't crucial; it has the alias "tag" (they could be swapped), and I added the alias "Flickr tag" recently. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:04, 11 August 2023 (UTC)[reply]
I think the name of the property is important, and I feel "Flickr Tag" should be discrete from other types of keywords. There are billions of them, and by sheer scale alone warrant a specific situation. Ukglo (talk) 13:14, 15 August 2023 (UTC)[reply]
It would also save the additional qualifying step(s) that appear to be required in the more general case to specify that a particular tag is a Flickr tag, whether that's a human or machine additive step. Ukglo (talk) 13:15, 15 August 2023 (UTC)[reply]
Note also that the associated pages - e.g. https://hashtags-hub.toolforge.org/Uruguay - include links to the Flickr page for the tag. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 16:57, 11 August 2023 (UTC)[reply]
That's what I did in the example I linked to, above. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:04, 11 August 2023 (UTC)[reply]

For bringing tags into the wikitext, please see Template:Flickr Tags. That can be used at the tail end of the "description" portion of {{Information}} and other similar templates. For example,

{{Flickr Tags|Bangkok|temple}}

produces

Flickr Tags: Bangkok temple.

Jmabel ! talk 23:37, 10 August 2023 (UTC)[reply]

What does "bringing tags into the wikitext" mean? I think Andy mentioned something like this the other day.
Would we be doing that so the tags become searchable? If that's the singular reason, I'm not sure about it. That is removing metadata structure (by merging a tags field into a description field). But! Perhaps I'm misunderstanding. Ukglo (talk) 09:52, 11 August 2023 (UTC)[reply]
There's an example of the above template on File:TheEvens1.jpg. Yes, strings added that way become searchable. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 15:06, 11 August 2023 (UTC)[reply]
(cross-posted) It does make the tags searchable, but it also makes them visible on the normal page where someone looks at the photo.
Actually, now that you mention it, "description" is not ideal (though it's often been used this way; as on Flickr, the description field is a bit wide open). Best practice in {{Information}} is probably to use "other fields". Then you can do something like
other_fields= {{Information field|name=Flickr tags|value={{Flickr Tags|Bangkok|temple}} }}
which (when used in the Information template) will produce
(in left column) Flickr tags
(in right column) Flickr Tags: Bangkok temple.
Here's an example of someone having used it that way manually: File:1959 Danelectro U1.jpg.
You actually can use multiple {{Information field}} templates within "other fields", so it would also be possible to handle something else this way, if desired. Jmabel ! talk 15:13, 11 August 2023 (UTC)[reply]
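For anyone generating this wikitext programmatically, a minimal sketch of building the "other fields" value shown above might look like this. The function name is hypothetical, and the sketch simply rejects tags containing characters that would interfere with template syntax rather than trying to escape them.

```python
def flickr_tags_field(tags):
    """Return an {{Information field}} wrapping {{Flickr Tags}}, or "" if no tags.

    Assumption: tags are plain strings. Tags containing template syntax
    characters are rejected instead of being escaped.
    """
    cleaned = [tag.strip() for tag in tags if tag.strip()]
    if not cleaned:
        return ""
    for tag in cleaned:
        if any(ch in tag for ch in "|={}"):
            raise ValueError(f"tag needs escaping before use in a template: {tag!r}")
    inner = "{{Flickr Tags|" + "|".join(cleaned) + "}}"
    return "{{Information field|name=Flickr tags|value=" + inner + " }}"


print(flickr_tags_field(["Bangkok", "temple"]))
```

The result can be assigned to the other_fields parameter of {{Information}}; several such fields can be concatenated there.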
Cool - why does it print "Flickr tags" again in the data bit on the right of that example?
Our challenge is to make a piece of software that is easy for people to use without having to learn intricacies of the Wikimedia Commons or Wikidata data structures. And, in particular, if they are Flickr users, they will be familiar with the idea of lightweight tag use, and probably expect something similar. Ukglo (talk) 13:18, 15 August 2023 (UTC)[reply]

Lumpers and splitters

The biggest ongoing dispute about Commons categories is what I would call "lumpers" vs. "splitters" and, secondarily and related, about how strict we are about COM:OVERCAT. There is no question that Commons propagates categories far more readily than most of its sister wikis (we have at least twice as many categories as the English-language Wikipedia). I think there are several reasons for this:

  1. Almost any time we have content that corresponds to a Wikipedia article it is acceptable to make a category here (if only for purposes of interwiki linking), even though the average Wikipedia article will not have a category of its own. Thus, we are likely to have a separate category for any politician, athlete, musician, etc. who would merit an article (but not a category) on Wikipedia. This is entirely uncontroversial. The only reason I say "almost" is that if an image were to be used to illustrate, say, en:Argumentation theory, that wouldn't merit a category: the illustration is used for convenience and isn't really closely related to the topic.
  2. Our coverage is broader. For example, many people who don't merit a Wikipedia article have a Commons category (I'm a good example of that myself); similarly for events like an individual parade. Again, no controversy here.
  3. Because images, unlike Wikipedia articles, cannot be merged, we end up with a lot of something where Wikipedia has only one. Again, no controversy here.
  4. Related to that last: "splitters" win out over "lumpers" more here than in Wikipedia. So we end up with some very specific, narrow categories like Category:Female violinists from the United States or Category:Male artificial red hair in South Korea.

That last can be quite controversial (and I'm thankful that we don't end up with Category:Red-haired female violinists from the United States). I personally am more of a lumper: I can't for the life of me see why someone's gender, nationality, and the fact that they play the violin belong in a single category. But, COM:OVERCAT means that anything in these categories won't be directly in more general ones. If these categories are themselves large, that's probably OK, but I've seen situations where a category is split into a dozen or more subcategories, none of which have more than half a dozen members, so a group of maybe 70 photos is scattered across 15 categories, making it much harder to browse.

SDC and Wikidata largely avoid any equivalent of "intersection" categories, other than in their Q-items about Wikimedia categories (which are used mainly for interwiki connections). Thus the fact that someone is female and the fact that she is a violinist would be indicated by entirely separate statements (as would nationality, but that would never show up in Depicts, just in a Wikidata item).

Further remark on COM:OVERCAT: here's an example where I go somewhat against a strict interpretation of COM:OVERCAT, and I think there would be consensus (though not unanimity) in my direction. For File:Seattle - looking north on Fifth Avenue from just south of Pine - 2016.jpg, technically Category:Fifth Avenue, Seattle is redundant to Category:The Westin Seattle and Category:Nordstrom flagship store. However, Fifth Avenue itself seems to me to be the main subject of the photo, so I've kept it as a category. - Jmabel ! talk 18:42, 7 August 2023 (UTC)[reply]

Hidden categories

@Ukglo: Just wanted to draw your attention to hidden categories, in case you aren't already aware of them: "non-topical" categories used for internal purposes. For example, Category:Flickr review needed, Category:Flickr images reviewed by FlickreviewR, Category:Flickr images reviewed by FlickreviewR 2, and Category:Flickr images reviewed by trusted users are all hidden categories. Very useful for bot-driven processes, but also for things needing human review. - Jmabel ! talk 18:59, 7 August 2023 (UTC)[reply]

We are anticipating creating a Category:Uploads from Flickypedia (or something like it) so all uploads are gathered in this way - should this be a hidden one? We'd also want the UI to allow addition of at least one user-specific Category. Ukglo (talk) 15:48, 10 August 2023 (UTC)[reply]
This would probably be a hidden category, yes. But it's easy to toggle any category to hidden or back. There is no difference at the point where the category is applied to the image's page. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:47, 10 August 2023 (UTC)[reply]
These can be exposed using Preferences - Appearance - Show hidden categories. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 18:47, 10 August 2023 (UTC)[reply]

Dates

The wikitext portion of Commons has quite a sophisticated ability to model dates, almost certainly encompassing and exceeding what Flickr has. When precise date (or date/time) isn't known, {{Other date}} and {{Circa}} should allow whatever Flickr has to be transferred without loss of information (and it also automatically accommodates different languages):

  • {{circa|1964}}: circa 1964
    date QS:P,+1964-00-00T00:00:00Z/9,P1480,Q5727902
  • {{other date|decade|1960}}: 1960s
    date QS:P,+1960-00-00T00:00:00Z/8
  • {{other date|between|1963-11-20|1963-11-31}}: between 20 November 1963 and 1 December 1963
    date QS:P,+1963-11-00T00:00:00Z/10,P1319,+1963-11-20T00:00:00Z/11,P1326,+1963-11-31T00:00:00Z/11
  • {{other date|after|1918}}: after 1918
    date QS:P,+1918-00-00T00:00:00Z/7,P1319,+1918-00-00T00:00:00Z/9
  • {{other date|between|{{circa|1918}}|1923}}: between circa 1918 and 1923
    date QS:P,+1918-00-00T00:00:00Z/9,P1480,Q5727902

Wikidata's approach (which I believe is all accessible in SDC, though I might be mistaken) is documented at wikidata:Help:Dates. - Jmabel ! talk 19:17, 7 August 2023 (UTC)[reply]

Hi @Jmabel, this was useful, thanks. I've not got around to modelling Flickypedia dates in Wikitext yet, but I am starting to do them in the structured data, and I spent a lot of time reading that Wikidata help page.
"Date uploaded" is pretty easy, because Flickr always has a UTC YYYY-MM-DD HH:MM:SS timestamp for that. Maps straight across.
"Date taken" is a bit more fiddly – Flickr has a couple of levels of granularity for this date. We're mapping them as follows:
  • Y-m-d H:i:s ~ a date with precision 11/day
  • Y-m ~ a date with precision 10/month
  • Y ~ a date with precision 9/year
  • Circa … ~ this is always a year value, so we create a date with precision 9/year with a qualifier sourcing circumstances (P1480) → circa (Q5727902)
Alexwlchan (talk) 14:07, 12 October 2023 (UTC)[reply]
Sounds reasonable. - Jmabel ! talk 17:25, 12 October 2023 (UTC)[reply]
For dates with times the other thing to consider is timezones: they'll get saved in SDC (presumably inception (P571)) as UTC, even though they almost always will not be (and it won't be simple to determine what timezone a photo was taken in, although if there are coordinates it's at least possible). This problem is hardly unique to Flickr photos of course, so maybe nothing special needs to be done here. I do think it'd be great if the values added could be accurate though! Sam Wilson 01:45, 19 October 2023 (UTC)[reply]
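The "date taken" granularity mapping described in this thread could be sketched as below. This is an illustration under assumptions, not Flickypedia's actual code: I am assuming the Flickr API's numeric granularity codes (0 for Y-m-d H:i:s, 4 for Y-m, 6 for Y, 8 for circa), using inception (P571) as the target property as suggested above, and following the general shape of Wikibase time values. The function name is hypothetical. The time of day is dropped, which sidesteps the timezone question rather than solving it.

```python
import re

# Proleptic Gregorian calendar model used in Wikidata time values.
GREGORIAN = "http://www.wikidata.org/entity/Q1985727"

# Only the date part is needed; a regex tolerates "00" months or days.
_TAKEN = re.compile(r"^(\d{4})-(\d{2})-(\d{2})")


def date_taken_statement(taken: str, granularity: int, prop: str = "P571") -> dict:
    """Map a Flickr 'date taken' plus granularity code to a statement dict."""
    match = _TAKEN.match(taken)
    if match is None:
        raise ValueError(f"unrecognised date: {taken!r}")
    year, month, day = match.groups()

    if granularity == 0:        # Y-m-d H:i:s -> day precision
        time, precision = f"+{year}-{month}-{day}T00:00:00Z", 11
    elif granularity == 4:      # Y-m -> month precision
        time, precision = f"+{year}-{month}-00T00:00:00Z", 10
    elif granularity in (6, 8): # Y or circa -> year precision
        time, precision = f"+{year}-00-00T00:00:00Z", 9
    else:
        raise ValueError(f"unknown granularity: {granularity}")

    statement = {
        "mainsnak": {
            "snaktype": "value",
            "property": prop,
            "datavalue": {
                "type": "time",
                "value": {
                    "time": time,
                    "precision": precision,
                    "timezone": 0,
                    "before": 0,
                    "after": 0,
                    "calendarmodel": GREGORIAN,
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }

    if granularity == 8:
        # Circa dates get sourcing circumstances (P1480) -> circa (Q5727902).
        statement["qualifiers"] = {
            "P1480": [{
                "snaktype": "value",
                "property": "P1480",
                "datavalue": {
                    "type": "wikibase-entityid",
                    "value": {"entity-type": "item", "numeric-id": 5727902, "id": "Q5727902"},
                },
            }]
        }
    return statement
```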

A couple of examples

We’re starting to build the data modelling in Flickypedia now, and we have some examples of the output. We’d love to hear any feedback on what we’ve got so far.

There are some JSON examples here (these show the sort of thing we'd be passing to the wbeditentity API; they aren’t images on Commons yet): https://gist.github.com/alexwlchan/48f78a2ac5798d289795e7914e47c125

(Creating these JSON examples isn’t hard, so if there are other examples it’d be useful for you to see, please ask.)

And examples in Wikimedia Commons itself:

Alexwlchan (talk) 13:49, 12 October 2023 (UTC)[reply]

This looks pretty good! I have taken the liberty of vastly simplifying the Information template, so that it draws most of the data from structured data. This will prevent data drift/duplication and will probably also keep the upload process easier on your side.
Data modeling looks OK to me. I haven't seen your specific construction for the source yet (different URLs for the photo page and the actual location of the image), but I can see its usefulness. Maybe I missed documentation on that. Spinster (talk) 17:26, 12 October 2023 (UTC)[reply]
@Alexwlchan: on that last example, may I presume that you also plan to do a wikitext equivalent for all that info, presumably using {{Information}}? If anything, that is the more important side, aimed more at humans and less at data-scrapers. - Jmabel ! talk 17:30, 12 October 2023 (UTC)[reply]
@Jmabel Ah yes, this is just the structured data mapping! I’m going to look at the Wikitext equivalent later.
I started with the structured data because that’s the bit I’m least familiar with and I left a TODO comment in my code to come back and sort out the Wikitext later. Alexwlchan (talk) 13:28, 13 October 2023 (UTC)[reply]
I want to emphasize that it's not necessary at all to duplicate the structured data into strings in Wikitext. When using a simplified {{Information}} template as in the edit I did here, the file will show information from the structured data to humans, to the Wikimedia commons search index, and to external search engines, automatically, multilingually, (!) and there will be no data duplication or data drift. That's the good thing about the Lua-driven templates that have been built by folks like Jarekt and Multichill. Spinster (talk) 07:26, 20 October 2023 (UTC)[reply]
For inspiration, you can also look at the way Multichill has done recent uploads from geograph.org.uk - for instance, look at this file with its Wikitext and structured data. I can imagine Flickypedia would stay close to this approach too, with a similar data model and simple Wikitext template. Maybe you can even closely copy and modify the Geograph from structured data template for Flickypedia purposes. Spinster (talk) 13:11, 20 October 2023 (UTC)[reply]
That's fine for such simple descriptions, but much more problematic when the descriptions get complicated, and it makes it very difficult for an average user if they need to elaborate an inadequate simple description. I can't immediately think of a good example of this on a file imported from Flickr, but consider something like File:Elliott Bay sunset, probably before 1889 - DPLA - cd9cd454e8804395a00fe402fc987c12 (page 1).jpg, File:1st Ave. S. looking north from S. Washington St., ca. 1876 - DPLA - 571301e7640245dfce8110b0e1b41c2c.jpg, or File:Parkour 01-1.jpg. - Jmabel ! talk 18:11, 20 October 2023 (UTC)[reply]

Not bringing over titles and descriptions?

If I understand correctly, it seems to me that this will literally make this a less useful tool than what we have now with Flickr2Commons. If you are not bringing over titles, does that mean you are going to use random nonsense filenames? - Jmabel ! talk 17:50, 12 October 2023 (UTC)[reply]

Or are you just saying it won't be in the structured data, which is entirely reasonable? - Jmabel ! talk 15:25, 13 October 2023 (UTC)[reply]

Coordinates and captured with

We would like to check if these two properties should be brought back by Flickypedia through structured data on Commons: coordinates of the point of view (P1259) and captured with (P4082). Would those be good additions, as they are widely used on Commons?

It would definitely save time for users who routinely add them. "Camera location" in the template already comes through Upload Wizard automatically, and coordinates of the point of view (P1259) is a property that is used a lot in SDC and that users and bots regularly add to files.

captured with (P4082) is trickier, as Flickr has a lot of images not taken by modern cameras (e.g. glass negatives). However, on Commons, users have been adding metadata even about modern cameras, such as captured with (P4082) -> Nikon D4 (Q1968415) (as per this example, SD tab).

Any thoughts would be very helpful. If you can, let us know what you think, please! GFontenelle (WMF) (talk) 03:43, 18 October 2023 (UTC)[reply]

coordinates of the point of view (P1259), absolutely. Also very useful to get it into the wikitext as {{Location}}.
captured with (P4082): I'd call this "mostly harmless, but usually not all that useful." I know there are people on Commons who really like to add that stuff, and I welcome one of them to weigh in. I personally find it almost never of interest unless an unusual camera is involved, and pretty much a liability when it just says what device scanned a film photo. - Jmabel ! talk 05:54, 18 October 2023 (UTC)[reply]
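
For anyone wondering what these two statements look like as data, here is a minimal sketch of claims in the general Wikibase JSON shape. This is not Flickypedia's code: the function names, the default precision, and the sample coordinates are assumptions for illustration only.

```python
# Sketch only: builds statement dicts in the general Wikibase JSON shape.
# Function names and the default precision are hypothetical.

def coordinates_claim(latitude: float, longitude: float, precision: float = 1e-5) -> dict:
    """Statement for coordinates of the point of view (P1259)."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P1259",
            "datavalue": {
                "type": "globecoordinate",
                "value": {
                    "latitude": latitude,
                    "longitude": longitude,
                    "altitude": None,
                    "precision": precision,
                    # Q2 is Earth
                    "globe": "http://www.wikidata.org/entity/Q2",
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


def captured_with_claim(camera_qid: str) -> dict:
    """Statement for captured with (P4082), pointing at a camera item."""
    return {
        "mainsnak": {
            "snaktype": "value",
            "property": "P4082",
            "datavalue": {
                "type": "wikibase-entityid",
                "value": {
                    "entity-type": "item",
                    "id": camera_qid,
                    "numeric-id": int(camera_qid[1:]),
                },
            },
        },
        "type": "statement",
        "rank": "normal",
    }


# Sample values: arbitrary coordinates, and the Nikon D4 item mentioned above.
location = coordinates_claim(47.6, -122.3)
camera = captured_with_claim("Q1968415")
```

A tool could simply skip the P4082 claim when the device is a scanner rather than a camera, which is the case flagged above as a liability.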

Wikidata Property request in play: Flickr Photo ID

Hi all - yesterday (or was it Tuesday?!) we put in a request to include Flickr Photo ID as a Wikidata property: https://www.wikidata.org/wiki/Wikidata:Property_proposal/Creative_work#Flickr_Photo_ID

We're interested to hear what everyone thinks of that. One huge benefit we see is that we'll be able to backfill the 10 million+ Flickr images already in Wikidata with this data, and that will in turn help us do a stronger check for duplicates through Flickypedia. Please let us know what you reckon. Ukglo, flickr.org 🌸 (talk) 13:31, 19 October 2023 (UTC)[reply]

@Ukglo: I think that's great. It will simplify anyone's check for whether any image from Flickr is a duplicate. (Of course, there is still the matter of duplicates on Flickr, but something here can be multi-valued.)
Question: how would you plan to handle the case of images like File:Al Rochester, 1959 (52304119434).jpg where the initial upload was from Flickr, but then we uploaded a higher-resolution version from Seattle Municipal Archives? - Jmabel ! talk 18:06, 19 October 2023 (UTC)[reply]
Hi Jmabel - Flickypedia is only designed to support uploads from flickr.com, so, if you uploaded the second version via Flickypedia, we would do the check for any existing Flickr URLs in WM Commons. If these are two different Flickr images (which it sounds like they would be), we would not find the other version. (I think it's fair to say a higher resolution version is not a duplicate... would you agree?)
And if the upload of the higher res version from the Seattle Municipal Archives is uploaded outside of Flickypedia, there isn't anything Flickypedia can do about it. Unless I'm missing something obvious? Ukglo, flickr.org 🌸 (talk) 11:27, 27 October 2023 (UTC)[reply]
What I'm asking is, will duplicate-detection work if the original upload was from Flickr but was overwritten with a higher-resolution version (not from Flickr, it is directly from the online Archive in this case), and now someone tries to upload the lower-res version from Flickr again? Will Flickypedia detect that this is a duplicate of an earlier version of an extant file on Commons, and if it detects that how will it behave? - Jmabel ! talk 17:38, 27 October 2023 (UTC)[reply]
Gnarly question! Glad you're helping us think through some edge cases.
The upshot is, if that second upload "directly from the online Archive" obliterates Flickr-specific metadata, there's not much Flickypedia can do, because it's tuned to look for Flickr info (e.g. URL, etc). Ukglo, flickr.org 🌸 (talk) 10:41, 30 October 2023 (UTC)[reply]
When you say "metadata" do you mean in the image itself (which will of course be obliterated) or in the Wikitext (where it will not, because it remains the source for the license)? And I suppose whatever you are doing with SDC is relevant here; historically, Flickr2Commons has written to the Wikitext and a bot has come along later to copy that data to SDC.
This is not really an "edge case". Just from Seattle Municipal Archives there are probably several thousand such files, and I'm sure that is not the only archive where this is more or less the routine. - Jmabel ! talk 18:06, 30 October 2023 (UTC)[reply]
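
To make the overwritten-file case concrete, here is a minimal sketch of a duplicate check that relies only on the Flickr source URL still being present in the Wikitext. It is not how Flickypedia is implemented: the URL pattern, the function names, and the sample user path are assumptions for illustration.

```python
import re

# Hypothetical pattern for Flickr photo page URLs of the form
# https://www.flickr.com/photos/<user>/<photo_id>/
FLICKR_PHOTO_URL = re.compile(
    r"https?://(?:www\.)?flickr\.com/photos/[^/\s]+/(\d+)"
)


def flickr_photo_ids(wikitext: str) -> set[str]:
    """Return every Flickr photo ID referenced by a URL in the wikitext."""
    return set(FLICKR_PHOTO_URL.findall(wikitext))


def is_duplicate(candidate_photo_id: str, existing_wikitext: str) -> bool:
    """True if the candidate Flickr photo is already cited by this file page."""
    return candidate_photo_id in flickr_photo_ids(existing_wikitext)


# The file was later overwritten from another archive, but the source line
# (made-up user path, photo ID taken from the file name above) is still there.
page = "|source=https://www.flickr.com/photos/example_user/52304119434/"
already_on_commons = is_duplicate("52304119434", page)
unrelated = is_duplicate("11111111111", page)
```

Under that assumption the check does not depend on the image bytes at all, so a higher-resolution overwrite would not hide the earlier Flickr upload.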

Some feedback based on User:GeographBot

I run a couple of bots and one of them is User:GeographBot. Based on what I'm doing with that bot (see for example File:St Lawrence, Bovingdon - geograph.org.uk - 5746166.jpg):

Multichill (talk) 20:49, 16 November 2023 (UTC)[reply]

Thanks for the feedback! In reply:
URL: I can see an argument for getting rid of URL (P2699) in the long term, but I’m a bit wary because I imagine it might be used by other bots to go from WMC to Flickr? We have the new Flickr photo ID (P12120) property which I hope will be a nicer way to do that eventually (once it’s backfilled) but I didn’t want to remove a field that other bots or tools rely on.
Creator: what do you imagine the risks of outing/doxing are? It’s unclear to me what you mean. We're only creating links based on information which is already public and easy to find.
  • We get the Flickr user ID from the Flickr API
  • We find matching Wikidata entities with the Flickr user ID (P3267) property
I can see there might be a risk if we were creating new properties in Wikidata.
e.g. we have some secret knowledge that Flickr user 1234567@N01 is the same as Wikidata entity Jane Smith, even though she wants to keep those two accounts separate. We proactively add the property to Wikidata and link those two things, outing Jane as the owner of that Flickr account.
But we're not planning to do any of that – Wikidata is a strictly read-only source for us. (You can see more discussion of this in MB-One's original suggestion above.)
What risks am I missing?
Instance of and MIME type: Thanks for the suggestion. Alexwlchan (talk) 10:03, 23 November 2023 (UTC)[reply]
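
As an illustration of the read-only lookup described in the two bullets above, here is a minimal sketch that only builds a SPARQL query for the Wikidata Query Service. It is not taken from the Flickypedia code base: the function name is hypothetical and no request is actually sent.

```python
def build_flickr_user_query(flickr_user_id: str) -> str:
    """Build a SPARQL query for items whose Flickr user ID (P3267) matches."""
    # Escape backslashes and quotes so the ID is a safe string literal.
    escaped = flickr_user_id.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?item WHERE { "
        f'?item wdt:P3267 "{escaped}" . '
        "}"
    )


# Example with the placeholder user ID used in this thread.
query = build_flickr_user_query("1234567@N01")
```

The query only reads from Wikidata; nothing in this sketch writes or creates a P3267 statement.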
@Alexwlchan: I'm talking about URL (P2699), not described at URL (P973). Why would any bot use URL (P2699)? How can bots rely on something nobody else is adding at the moment?
The fact that you can easily combine information doesn't mean you should do so. Quite a few of the Wikidata items about Wikimedians have been created without any concern for privacy. Data has been combined without the consent of the person in question. If you use that, you're responsible for the exposure. This is a reason for me to oppose your bot request. Multichill (talk) 19:47, 18 December 2023 (UTC)[reply]
Hi @Multichill, thanks for explaining the issues with the Creator mapping. I was a little confused by privacy concerns, so I appreciate you taking the time to explain it further.
This was originally proposed by @MB-one further up on this same talk page. We were fairly new to modelling SDC at that point and it seemed like a good suggestion, so that’s why we implemented it. We’re happy to reconsider if the community consensus is that we should axe it, but I’d like to get MB-one’s perspective before I start making in-depth changes to the code.
(We’re planning to be away over Christmas, so we may not get back to this until January) Alexwlchan (talk) 11:02, 20 December 2023 (UTC)[reply]
Combining data from open sources like this is a (light) form of OSINT. When setting up the original bot that converted wikitext to structured data, I recall several users voicing privacy concerns. Multichill (talk) 11:14, 23 December 2023 (UTC)[reply]
@Multichill Thank you for your input. I have to admit that I didn't see this potential issue while crafting my proposal. Having considered this, I personally would still prefer to have an existing Wikidata item linked. I'd argue that the potential breach of privacy should be dealt with at the source (in this case in Wikidata). Linking the Wikidata item could even help to discover that unwanted personal information is stored in Wikidata. So I'd wager that the benefits outweigh the risks by a good margin. But I'm certainly open to opposing arguments. MB-one (talk) 08:46, 28 December 2023 (UTC)[reply]
I don’t really have a leg in this fight – I implemented it following a community request, and I thought it was something the community would be in agreement on. Apparently not.
For now, I'm going to remove the offending code for matching Flickr users to Wikidata entities. Here's the pull request: https://github.com/Flickr-Foundation/flickypedia/pull/377
@MB-one Thank you for the original suggestion! I like the idea and if you can get community consensus, it’d be easy to restore this mapping. And it was a useful way for me to learn some of the Wikidata APIs. :D
@Multichill Once this is merged and deployed, and we're doing a sans-Wikidata user matching, would you approve of FlickypediaBackfillrBot? Alexwlchan (talk) 13:42, 26 January 2024 (UTC)[reply]

Publisher??

The correct property to use is content deliverer (P3274), not publisher (P123).

Listing Flickr as the publisher for every photo on their site is blatantly wrong. Trade (talk) 12:11, 13 March 2024 (UTC)[reply]

Hi Trade! We don’t use publisher (P123) anywhere in Flickypedia – are you looking at published in (P1433), or something else? Alexwlchan (talk) 10:21, 22 April 2024 (UTC)[reply]