Commons:Village pump/Proposals/Archive/2018/12

Proposal for Bot

I have been playing with my bot (user page writes only! - see the last five files listed here), looking at new images that are sourced from Facebook, Instagram, Google, Twitter or Flickr and have not been reviewed. It nicely finds plenty of images every day which need "no permission". It then seems logical that this could be extended to any new image with an external source and no review banner - and then, rather than just making a list on a user page, add a category to the image to show it's in need of checking. Anyone think this might be good? Ronhjones  (Talk) 20:12, 5 December 2018 (UTC)
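
For illustration only - this is not RonBot's actual code, just a minimal Pywikibot sketch of the extended idea: scan recent uploads and, where the wikitext mentions an external source but carries no review template, add a tracking category. The category name is taken from the reply below; the source patterns and template names are assumptions.

import re
import pywikibot

SITE = pywikibot.Site('commons', 'commons')
# Illustrative patterns only; the real bot's source list and template names may differ.
SOURCE_RE = re.compile(r'(facebook|instagram|twitter|flickr|google)\.com', re.I)
REVIEW_RE = re.compile(r'\{\{\s*(licensereview|flickrreview)', re.I)
CATEGORY = '[[Category:Files with external source not yet reviewed]]'

for entry in SITE.logevents(logtype='upload', total=200):   # recent uploads
    page = entry.page()
    text = page.text
    if SOURCE_RE.search(text) and not REVIEW_RE.search(text) and CATEGORY not in text:
        page.text = text + '\n' + CATEGORY
        page.save(summary='Bot (sketch): flag externally sourced file for review')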

 Question So the bot would only categorize? Not tag for deletion? - Alexis Jazz ping plz 21:41, 5 December 2018 (UTC)
Tagging for deletion would need better AI than I could provide. I planned to categorise only - something like "Files with external source not yet reviewed". Ronhjones  (Talk) 17:50, 6 December 2018 (UTC)
I would lean towards supporting that, but I'm afraid of the slippery slope. - Alexis Jazz ping plz 19:38, 6 December 2018 (UTC)

Normalize file extensions for new uploads

Automatically normalize file extensions for new uploads (existing files may be dealt with by another proposal, but not this one). For example, ".JPG", ".Jpg", ".JPEG" and ".jpeg" would all become ".jpg". For TIFF, ".TIF", ".tiff" and ".TIFF" would all become ".tif" (of all the TIFF files on Commons, 83.5% have the ".tif" extension). The guidelines from the Library of Congress also support this (see Commons:Village pump#Normalize file extensions for new uploads). If this proposal is accepted, I will create a Phabricator task to realize it.
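
For illustration, a rough Python sketch of the intended mapping (not MediaWiki's actual upload code): every case variant of an extension collapses to one canonical lower-case form. The canonical table is an assumption based on the statistics below.

import os

# Assumed canonical forms; anything not listed is simply lower-cased.
CANONICAL = {'jpeg': 'jpg', 'jpg': 'jpg', 'tiff': 'tif', 'tif': 'tif',
             'png': 'png', 'svg': 'svg'}

def normalize_filename(name: str) -> str:
    base, ext = os.path.splitext(name)
    ext = ext.lstrip('.').lower()
    if not ext:
        return name
    return f"{base}.{CANONICAL.get(ext, ext)}"

# normalize_filename('Dedekind.JPEG') -> 'Dedekind.jpg'
# normalize_filename('Map.TIFF')      -> 'Map.tif'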

See also the related proposal directly below this one. Please vote on both.

Advantages:

  • Avoids confusion, like accidentally attributing File:Dedekind.jpg when one meant File:Dedekind.jpeg (this actually happened)
  • Makes life slightly easier for those who write bots or scripts, or use VFC for certain tasks.

A related proposal from enwiki can be found at w:Wikipedia:Village pump (proposals)/Archive 74#Several changes to file naming.

Some statistics on current use (stats from July; includes redirects; current percentages may vary slightly; data provided as is):

jpg (total 42453481): .jpg 35229847 (83.0%), .JPG 7222882 (17.0%), .Jpg 649, .jPG 50, .JPg 26, .jpG 20, .JpG 4, .jPg 3
jpeg (total 585411): .jpeg 576892 (98.5%), .JPEG 7605 (1.3%), .Jpeg 891 (0.2%), .JPeG 8, .jPeG 8, .JpEg 5, .jpEg 1, .jpeG 1
png (total 2528817): .png 2397286 (94.8%), .PNG 131488 (5.2%), .Png 37, .pnG 2, .pNG 1, .pNg 1, .PNg 1, .PnG 1
svg (total 1377496): .svg 1376872 (100.0%), .SVG 618, .Svg 4, .SVg 1, .SvG 1
tif (total 1030310): .tif 1026794 (99.7%), .TIF 3515 (0.3%), .Tif 1
tiff (total 199233): .tiff 199213 (100.0%), .TIFF 20

Votes

Please vote on both this proposal and the one below. - Alexis Jazz ping plz 23:25, 4 December 2018 (UTC)

  •  Support as proposer. - Alexis Jazz ping plz 23:25, 4 December 2018 (UTC)
  •  Support Obvious and useful. Pi.1415926535 (talk) 23:29, 4 December 2018 (UTC)
  •  Oppose changing JPEG to jpeg and vice versa (including JPG/jpg), but  Support normalizing JpG, jpG, Jpg, etc. Bidgee (talk) 00:16, 5 December 2018 (UTC)
  •  Support Great idea, hopefully can be implemented easily Abzeronow (talk) 03:51, 5 December 2018 (UTC)
  •  Question, would this prohibit files from being uploaded with "the wrong extension" or just automatically change them to lower case as part of the process? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:17, 5 December 2018 (UTC)
  •  Support Correcting all file extensions of new uploads to lowercase improves the readability of the file names, in my opinion, which is why I support this proposal. Most likely, the only reason Commons cannot normalize all previously uploaded files to lowercase extensions is that the MediaWiki software runs on Linux, an operating system which, unlike Windows, treats file paths as case-sensitive. Linux does not care whether a newly uploaded file has an upper- or lowercase extension, so normalizing during upload should not have any impact on the usability of the file. --oSeveno (talk) 11:15, 5 December 2018 (UTC)
  •  Support, with the caveat that "tif" be considered "normal" based on the statistics above, instead of the current bizarre "tiff".   — Jeff G. please ping or talk to me 12:23, 5 December 2018 (UTC)
  •  Oppose As no external standard has been agreed on. The community should not be agreeing on an off-the-cuff set of "best" extension types when external standards exist, and the underlying Wikimedia database relies on file types being easily mapped to an accepted standard MIME type dictionary. For example, the MIME types standard indicates the simplest mapping is to use .tiff and .jpeg, yet the majority of people would not be happy with these being forced. The proposal must include a full list of MIME types from the current database (not just "image") and give a firm rationale as to why we must discourage multiple mappings of extensions to a given MIME type. Harmonization by itself is not a clear rationale from a developer's standpoint when the status quo is already the most widely accepted practice elsewhere on the internet, and is defined as accepted in external standards. -- (talk) 12:50, 5 December 2018 (UTC)
  •  Support Good idea. --Yann (talk) 13:03, 5 December 2018 (UTC)
  •  Support A file should be unique by name and not by extension: example.jpg, example.jpeg, example.Jpg, example.JPg and so on. --Amada44  talk to me 15:09, 5 December 2018 (UTC)
  •  Comment I support what User:Krd said at COM:VP. Yes to jpg and JPG. No to jpG, jPg, Jpg, jPG, JpG, JPg. In other words, yes to the first three columns of the table, no to the rest of columns. I sometimes feel that JPG is a sign that indicates the image has been captured by a camera, and has not been transferred from the internet. However, I cannot back up my feeling with any evidence. 4nn1l2 (talk) 20:57, 5 December 2018 (UTC)
  •  Support. Good idea --AntonierCH (d) 19:58, 6 December 2018 (UTC)
  •  Support and while there may be no official standards, the statistics clearly speak for themselves as to what are the preferred extensions. --HyperGaruda (talk) 05:35, 7 December 2018 (UTC)
  •  Support for mixed case extensions, but  Oppose for extensions in only upper or lower case: It does not help with the millions of existing files. Also, it seems to me that in Unix and Linux upper case is the default. Users, at least on the German Wikipedia, are still uploading files with upper-case extensions, so when those are moved to Commons, new files with exactly what you want to avoid are frequently created here. So your basic aim is not actually reachable; you can only teach users about this fact. (I dislike the upper-case file extensions myself.) — Speravir – 22:12, 7 December 2018 (UTC)
  •  Support The file extensions are not important, just standardize them across the board and go with that if it helps simplify things. Thanks. Mike Peel (talk) 23:42, 7 December 2018 (UTC)
  •  Support. Vulphere 04:10, 10 December 2018 (UTC)

Discussion

"would this prohibit files from being uploaded with "the wrong extension" or just automatically change them to lower case as part of the process?"

@Donald Trung: They should be automatically changed. - Alexis Jazz ping plz 20:25, 5 December 2018 (UTC)

"I sometimes feel that JPG is a sign that indicates the image has been captured by a camera, and has not been transferred from the internet. However, I cannot back up my feeling with any evidence."

@4nn1l2: If this proposal is accepted, I will ask in the Phabricator request if it's possible to add a note containing the original extension to the upload/file comment. Like "User created page with UploadWizard, file extension automatically changed from jPg to jpg". I don't know if that will be technically complicated/possible, but I'll request it. - Alexis Jazz ping plz 23:43, 5 December 2018 (UTC)

"Support for mixed case extensions, but oppose for extensions in only upper or lower case: It does not help for the millions existing files. Also, it seems to me in Unix and Linux the upper case is the default. Users are at least in German Wikipedia still uploading files with upper case extensions, so on moving to Commons there are frequently new files created here with what you want to avoid. So, your basic aim is actually not at all reachable, you can only teach users that they know about this fact. (I dislike the upper case file extensions myself.)"

@Speravir: A few things:
  • Uppercase is not the default for Linux. In fact, Linux doesn't give a rat's ass about file extensions. (some desktop environments and applications do though)
  • This proposal doesn't deal with existing files, this is true. This is a first step: stop new files with odd extensions from being created. If accepted, we can look into the best solution for existing files. There are several possible approaches for that.
  • There are 7222882 .JPG files (which indeed qualifies as "millions"), but all the others combined (those from the table, there are a few more like pdf and webm) total 921067. 576892 (more than half) of those are .jpeg.
  • When moving a file from dewiki to Commons, the filename should also be automatically normalized. MediaWiki afaik doesn't differentiate between new uploads and files moved from other projects. - Alexis Jazz ping plz 00:09, 8 December 2018 (UTC)
Dewiki to Commons: If so, then this is quite new. I did not observe this myself until now. This did not happen with NowCommons, and it is not mentioned on the page for the beta feature FileImporter either. — Speravir – 00:19, 8 December 2018 (UTC)
@Speravir: I meant: if this proposal is accepted, the filename for files imported from other projects should also be automatically normalized. - Alexis Jazz ping plz 00:39, 8 December 2018 (UTC)

When moving files, move them to the normalized file extension

Related to the proposal above: when a file with a non-standard file extension is renamed (.JPEG, .pNG, etc) it should be renamed to the normalized extension. The only exceptions for this will be if the file is renamed for reason 4 (harmonize the names of a set of images) or if normalizing would de-harmonize the names of a set. In such cases, file movers should judge on a case-by-case basis if it's better to rename and normalize the whole set or keep the non-standard extension.

File movers won't have to normalize file extensions themselves; the rename gadget would be adjusted for this. - Alexis Jazz ping plz 23:25, 4 December 2018 (UTC)

  • Note: if you are concerned about the (unlikely) theoretical possibility of this proposal being accepted without the first proposal being accepted, you can support it on the condition the first proposal (normalize extensions for new uploads) is also accepted. That's how it was meant to be anyway. I'm sorry, I should have thought of this. - Alexis Jazz ping plz 05:33, 5 December 2018 (UTC)

Votes

Discussion

"How would the opt-out functionality be available - something like a pre-checked tickbox that the file mover can uncheck if needed?" @Pi.1415926535: That's a possibility. Perhaps the person requesting the move could also be warned (but not blocked) if they try to request a move to a non-standard file extension. The exact technical implementation should be looked at together with file movers. - Alexis Jazz ping plz 00:18, 5 December 2018 (UTC)

I created this as a separate proposal so as not to overcomplicate the first proposal. The first proposal (for new uploads) doesn't depend on this one, but this one (normalize when renaming) does, in a way, depend on the first proposal. After all, it wouldn't make much sense without it. I don't believe there would be anyone who would oppose the first proposal while supporting this one. - Alexis Jazz ping plz 05:33, 5 December 2018 (UTC)

 Info This proposal can be closed, because the normalisation on file moving is already in effect. BTW this creates tiff from tif/TIF. — Speravir – 22:15, 7 December 2018 (UTC)

@Speravir: you appear to be right. But as you can see on File:Hugh Riminton 2011 (cropped).jpeg, the yellow {{Rename}} box still shows the current extension, even though it will be renamed to jpg. Bug probably. - Alexis Jazz ping plz 23:31, 7 December 2018 (UTC)
@Alexis Jazz: Template {{Rename}} uses Module:FilePage. This is not the file renaming script. Nonetheless, this looks like a bug to me, too, given that you explicitly ask for .jpg. — Speravir – 23:48, 7 December 2018 (UTC)
So, I just moved the file according to your request, but it has been renamed to File:Hugh Riminton in 2011 (cropped).jpg. The renaming form suggested the new name, and even though I manually changed it to .jpeg, this was not granted (as I expected). — Speravir – 23:54, 7 December 2018 (UTC)

Any use for a "rare character" index?

Hello! There was recently a discussion at Extension:CirrusSearch about creating a new search index for "rare" characters that are currently not indexed by the on-wiki search engine. The three examples of difficult-to-find characters given were ☥ (Ankh), 〃 (ditto mark), and 〆 (ideographic closing mark). (Note that you can currently do an insource regex search like insource:/☥/, but on large wikis this is guaranteed to time out and not give complete results, and it is extremely inefficient on the search cluster.)

We can't index everything; indexing every instance of e or . would be very expensive and less useful than ☥, for example. So, in English, we would ignore A-Z, a-z, 0-9, space, and most regular punctuation (exact list TBD) and index pretty much everything else.

The most plausibly efficient way to implement such an index would only track individual characters at the document level, so you could search for documents containing both ☥ and 〆, but you could not specify a phrase like "☥ 〆" or "〆 ☥", or a single "word" like ☥☥ or 〆☥.
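
A toy Python illustration of that document-level design (not the CirrusSearch implementation): per character, record only which documents contain it, so a query for documents containing both ☥ and 〆 is a set intersection, while phrase or adjacency queries are simply not representable.

from collections import defaultdict

# Characters assumed "not rare" and therefore skipped (exact list TBD, per the proposal).
COMMON = set('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .,;:!?\'"()-')

index = defaultdict(set)            # character -> set of document ids

def add_document(doc_id, text):
    for ch in set(text):            # document level: each character recorded once
        if ch not in COMMON:
            index[ch].add(doc_id)

def search(*chars):
    sets = [index.get(ch, set()) for ch in chars]
    return set.intersection(*sets) if sets else set()

add_document('Ankh', 'The ankh symbol ☥ ...')
add_document('Marks', 'The ditto mark 〃 and the closing mark 〆 ...')
print(search('☥'))          # {'Ankh'}
print(search('〃', '〆'))    # {'Marks'}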

I've opened a Phabricator ticket T211824 to more carefully investigate such a rare character index, to get a sense of how big it would be and what resources it would take to support it. If you have any ideas about specific use cases and how this would or would not help with them, or any other thoughts, please reply here or on the Phab ticket. (Increased interest increases the likelihood of this moving forward, albeit slowly, over the next year.)

Thank you! TJones (WMF) (talk) 16:29, 14 December 2018 (UTC)

  • I would  Support this, although I am genuinely curious how and why these characters are being used; 〆 and ☥ seem to be supported by my device while Tangut characters aren't, so maybe they're not as rare, given that they receive mainstream support. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 13:34, 18 December 2018 (UTC)
    • "Rare" is definitely a relative term. We haven't nailed down the specifics—that's part of what would need to be investigated—but it might well be the case that on English-language wikis, Cyrillic and Greek characters might count as rare, or ñ or ß might be rare, though they are all well-supported by most devices, and are not at all rare in Russian, Greek, Spanish, or German, respectively.
      The specific examples of ☥, 〃, and 〆 came up in the original discussion as characters that are ignored by the current language processing, so they aren't being used in any particular way. Also, whether or not your device supports particular characters, you can usually still use them; you just can't see them. For example, English Wiktionary has an entry on Tangut 𗌜 (which I can't see as I copy it here), and you can find it by searching for it on English Wiktionary. However, it is discarded by the normal language analysis on Wiktionary, so the only result is the one exact title match. That's where the rare character index would come in. Some sort of search like char:𗌜 would find all documents with that character in it.
      Another good use case would be limiting a complex insource: or intitle: regex search by limiting the regex scan to only documents that contain a particular character (or possibly characters from a Unicode block), making the regex scan feasible.
      Commons is, unfortunately, an extra complicated case because the default language analysis for search here is English, but there is lots of text that is not in English, so how "rare" characters are best defined here might be harder to figure out. TJones (WMF) (talk) 17:26, 18 December 2018 (UTC)
      • @TJones (WMF): , that does sufficiently explain why such an index would be desirable. I do hope that it will be implemented as it won't hurt to have some extra features, even if it will have a rather limited (or niche) audience. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 19:04, 18 December 2018 (UTC)
        • I hope so, too, because I'm part of the niche audience that would really appreciate it! It's a matter of scoping and prioritization, and there is always more work than can get done, but I'm hoping that I can get to the investigation of how big such an index would be next year, and if so, that it turns out to be not too costly and relatively straightforward to implement. My biggest concern is that determining what is "not rare" for every language will be both necessary and time-consuming. TJones (WMF) (talk) 19:18, 18 December 2018 (UTC)

Use of Eurostat tag

Hi, I have been trying to fix the links to the {{Attribution-Eurostat}} licensing tag, and am finding that there are lots of files on Wikimedia Commons that are really sourced from Eurostat, but where editors have edited the Eurostat data so that they can attribute the material under their own license (however, they still list Eurostat in the name, description or even the source). I wonder if editors are aware that Eurostat material is not copyrighted (per the Eurostat tag), and that they can use it directly (with the tag)?

It would be better to have material that is sourced from Eurostat (higher quality) than material from Eurostat that has been edited in some way and has lost its Eurostat sourcing. Eurostat is a massive database. This issue occurs with other EU Commission material, most of which is not copyrighted (per the Eurostat tag). My proposal is that the Upload template should have an EU Commission section (as it does for the US Government and for Flickr), which would remind editors that most of the EU Commission's material is not copyrighted. thanks. Britishfinance (talk) 15:56, 30 December 2018 (UTC)

@Britishfinance: isn't it possible that a lot of those uploaders were simply unaware of the existence of the {{Attribution-Eurostat}} license tag? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 11:49, 31 December 2018 (UTC)
@Donald Trung: that is my proposal: I don't think people are aware of the {{Attribution-Eurostat}} license tag, and we should therefore incorporate it into the upload wizard (as we do for the U.S. Government and Flickr). Eurostat is a massive database (and this applies to almost all other EU Commission media). thanks. Britishfinance (talk) 11:52, 31 December 2018 (UTC)
In addition, I believe that more of the license tags from the list should be incorporated into the upload wizard interface. A lot of good work has been done on these tags, and I think that editors are uploading material as own work (by editing source data), whereas with knowledge of the right tags they could use the source data directly, and it would have more integrity/weight (i.e. is it not better to have Eurostat's original graphic of an item than a random editor's re-edit of the data?). Britishfinance (talk) 12:06, 31 December 2018 (UTC)
That does sound like a good idea. Flickr as a website has been on the decline, so if a new website were to fill its void I think we'd incorporate that into the MediaWiki UploadWizard too; it would therefore make sense for something a bit less "trend-dependent", such as a government (or intergovernmental) website, to also be on that list. You could also propose it at "Commons:Upload Wizard feedback", but I highly doubt that that page is still active. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:36, 31 December 2018 (UTC)

Split up Freedom of Panorama country list

Commons:Freedom of panorama contains a list of FoP rules for specific countries, now transcluded from sections in recently created pages giving the copyright rules for each country, such as COM:CRT/Algeria#Freedom of panorama. The page is large, with a post-expand include size of 1,290,255 bytes, and may load slowly. During clean-up of the country-specific pages I have been adding FoP information to country pages where it was missing, but have held off on adding those countries to the Commons:Freedom of panorama country list, which may be getting close to the breaking point.

This is to propose dropping the list from COM:FOP, leaving links to smaller lists for Africa, Americas, Asia, Europe and Oceania. The smaller lists will have entries for all countries, so will pick up new ==Freedom of panorama== sections in the country pages automatically. See Commons:Freedom of panorama/Africa for an example.

Inbound link issue

The main issue seems to be the many links, mainly from archived deletion discussions but also from files etc., to sections in the present FoP country list, often using shortcuts; e.g. there are 1,314 occurrences of links to "COM:FOP#France". One way to resolve this would be to replace the country list with a multi-column list of anchors and new shortcuts, with entries like

* {{Anchor|Australia}}[[COM:FOP Australia]]

which would display as a link to COM:FOP Australia in a list headed "Shortcuts to country-specific rules".

COM:FOP#Australia would then jump to Australia's entry in this list (click on COM:FOP#Japan to see the effect), and over time people would learn to use the new shortcut. It is not a particularly elegant solution, but the problem is creeping up on us, so it will need some sort of solution sooner or later. Aymatth2 (talk) 19:46, 17 December 2018 (UTC)

Comments

Done. Since there were no objections, I have implemented the change. So COM:FOP#France, for example, leads to the COM:FOP France entry in the shortcut list, which in turn leads to the rules for France. There are no broken links, and it should be easy enough to get used to the new shortcut convention. Aymatth2 (talk) 13:38, 4 January 2019 (UTC)

Better handling of file copyright releases with source linkrot

After this deletion request raised by a Commons bureaucrat, I have been nudged to reconsider that our norms need better formalization into a guideline, so that administrators and others interested in deletion requests can act consistently when sources are suffering from linkrot, and so that there is a practical presumption of good faith for files where there is no prior pattern of related challenges or demonstrated copyright violations.

It is a statement of fact that every website on the internet has a limited life, and the scope of this project should include the preservation of useful educational content that might otherwise vanish or become much harder to access. It has to be accepted that independent license reviews are not expected or enforced for all Commons uploads.

Proposal: For an in scope file where the uploading account has a reasonable history of good uploads, but the source is no longer available, or the release at source has changed to a non-free license:

  • A file which is part of an upload project of more than 1,000 files and where there is consistency of licensing, will be presumed to have been uploaded with a valid release.
  • A file which has been automatically license checked, and this can be verified in the upload code, or confirmed via a project page explanation, will be presumed to have been uploaded with a valid release.
  • Uploaded files that have been hosted for more than 2 years, where the uploader account has made significant contributions and there is no related pattern of deliberate copyright violation from the source, or by the uploader account, will be presumed to have been uploaded with a valid release. This will apply regardless of the uploading account having since retired.

These guidelines can be added to Commons:License review, unless there is a better place for them to be added.

Please vote on the principle of having this guideline. The numbers (1,000 files, 2 years) may be adapted depending on comments or evidence raised, or even later copyvio related case evidence. -- (talk) 12:54, 11 December 2018 (UTC)

  •  Support as proposer. -- (talk) 12:54, 11 December 2018 (UTC)
  • Could we not simply have a discussion rather than a vote? Currently the issue and criteria seem to be narrow to the point of involving one user: Fae. Are there other cases? It seems to be saying simply that your uploads should all be trusted, never speedy deleted, and kept by default at DR unless demonstrated otherwise. I'm sure you do take care, but you are only human, and the source organisations/people are also only human. We have had cases where apparently trusted organisations uploaded and tagged as free images they shouldn't have. I recall one notable example where NASA on Flickr uploaded an Annie Leibovitz photo to their PD stream, which was uploaded by an "account [that] has a reasonable history of good uploads". Wouldn't it be better to have ensured the images were reviewed four years ago? Do we have a means to bulk-review a batch of images from one source? If we don't have a means to review images as quickly as some users upload them, shouldn't we fix that, rather than just create exceptions at DR? This seems to be a tail-wagging-the-dog solution. -- Colin (talk) 15:20, 11 December 2018 (UTC)
  •  Comment I have suggested that these files should be license reviewed. Fae opposed that. So not surprised that this comes up here... Yann (talk) 15:31, 11 December 2018 (UTC)
    Please provide a link to that discussion. Thanks -- (talk) 16:09, 11 December 2018 (UTC)
  •  Question Why were these not marked for license review when they were uploaded? If you are that concerned about it they should have been checked by another person. This is an image uploaded from an external site by someone who was not the photographer. That pretty much screams either LR or OTRS. Requesting such a thing would have solved all the issues. To presume anything is to throw out COM:EVID and I'm not really all that comfortable with that. I've found copyvios that have been here for 10 years (using archive sites to verify that they were indeed a copyvio at the time of upload). Putting a year limit on it in the guise of COM:GOF is also not a good idea in my opinion. I'm not entirely familiar with "automatic license checks" as stated in your second bullet. So if you could explain how that works I'd appreciate it. --Majora (talk) 22:36, 11 December 2018 (UTC)
The example DR is such an automatically license checked batch upload: the metadata of the file was checked as having a specific free license ("ATTRIBUTION_SHARE_ALIKE"), and if it did not match, the file was skipped. You can imagine this is the same sort of check that we get with the automatic flickr reviews. There are many example projects that do exactly this type of specific automatic license checking at User:Fæ/Project_list (most recent example). In these cases, investing programming time writing a unique bot script customized for each batch upload, to do exactly the same check, adds zero value, and, even worse, marking literally millions of files for manual checks would be a bizarre waste of volunteer effort.
If it were policy, I could backdate my uploads with {{LicenseReview}} where they do not have one. This would probably mean more than two million files needing manual review as there are no automated bots to handle these, further I would not be allowed to write a bot to do the reviews, even where I have code available, as this would not be independent. This number is large enough to pretty much guarantee the backlog would never be resolved, yet I am hardly the only uploader on this project, and if we add LicenseReview to all our previously uploaded and not-independently-reviewed files, the backlog would be a magnitude or two larger than that. -- (talk) 23:02, 11 December 2018 (UTC)
Hmm...I'd be interested to learn more about this automated checking process. If something is going to come from this your second bullet is the best way forward in my opinion. Where in the metadata is this information stored? I did a more in depth search of the metadata using a third-party viewer and I'm not seeing it. I'm guessing by "file metadata" you really mean page source? That appears to be what User:Fæ/Project list/OpenBenches is working off of as well but please verify. Also, what does the checking? Is it part of the GWToolset? Is it coding that you do yourself when you create one of your projects? Obviously when we are talking about millions of images there has to be some give and take and I'm ok with that. But I just want to make sure all avenues are being explored here and that I fully understand what is going on during these uploads to make an informed comment. --Majora (talk) 23:20, 11 December 2018 (UTC)
The OpenBenches project has metadata available as JSON queries, see the project page. The way the site works is completely unique to that project and the uploads are via a customized Pywikibot script. Were someone to try to use GWT, it would mean extracting the source data from Github (where it happens to be published) and converting it to XML, which would be more complex (in my view) than writing a custom script. The license verification looks at the JSON data for the photograph and does exactly the license string check as explained on the Commons project page. The OpenBenches programmer made the image license data available this way, specifically because I discussed the Commons upload project with him. BTW, in terms of my projects, this is a rather small one, compare with User:Fæ/Project_list/PAS which is "similar" because the metadata is in JSON format, but is fifty times larger. However in the same way as OpenBenches, there is no Commons bot that can be easily tailored to include it to automate independent license reviews. -- (talk) 23:29, 11 December 2018 (UTC)
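
For the record, a hedged sketch of the kind of pre-upload check being described: skip any item whose metadata does not carry the expected free-licence string. The field name 'licence' and the shape of the JSON are illustrative assumptions, not the actual OpenBenches schema or Fæ's code.

import json
import urllib.request

EXPECTED = 'ATTRIBUTION_SHARE_ALIKE'   # licence string named in the example DR

def licence_ok(item: dict) -> bool:
    # Field name 'licence' is assumed for illustration only.
    return item.get('licence') == EXPECTED

def uploadable_items(json_url: str):
    with urllib.request.urlopen(json_url) as f:
        data = json.load(f)
    # Anything without the expected licence string is skipped, never uploaded.
    return [item for item in data if licence_ok(item)]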
(Edit conflict) Let me first just apologize for all these questions. Obviously what you are doing is above my level of knowledge so I appreciate you bearing with me while I wrap my head around this. If I'm understanding the above correctly, you write unique queries that pull information off of sites based on how that information is stored (JSON for OpenBenches, etc.). If this is true would it be possible to publish this query so that it can be part of the public record on these images? I see part of the code you used on the OpenBenches information page but not the whole thing. If the entire thing is available it allows others to check your work that way. I feel like that would be a massive step in the right direction towards some sort of verification process. --Majora (talk) 23:41, 11 December 2018 (UTC)
I am prepared to publish the license checking part of my code, but that is pretty much exactly what is on the example project page; it's a cut & paste of a couple of lines from 800 lines of code, and I could continue to do that with future projects, but not for retrospective projects, especially as in some cases I have probably lost the original code. I do not publish all of my code; that is not a criterion for contributing to this project, nor would it be genuinely useful (in 8 years, there are probably 2 people that have used my code already published on Github for anything, and they did not simply reuse my code; the Github link is even published at the top of my user talk page). However, keep in mind you are only addressing me; there are many other uploaders with customized batch upload projects, and there are no criteria on them for how to do it. -- (talk) 23:50, 11 December 2018 (UTC)
Obviously this is an exploratory proposal that would not just apply to you. Unless the suggestion above made by Colin that this is really a narrow proposal for one was fact. In any case, I really only care about the way licenses are checked. The rest of the code is rather useless in terms of this discussion. As for github, digging through there should not be a requirement. If you could publish the license checking part of your code so that it can be verified that would be preferable in my opinion. This goes for others as well. The way I see this is, essentially, these batch uploads are close enough to bots without actually being bots. Most bots have some sort of code published that can be reviewed to make sure it is doing what it says it is doing. I could see, and probably support, a policy that allows for blanket acknowledgement of acceptable licensing of batch uploaded images provided source code for checking said licenses is published and retained in an easily seen format (such as a project subpage like you have listed above). I'd support this going forward and as for older images we can talk about some sort of grandfathering. --Majora (talk) 00:00, 12 December 2018 (UTC)
  1. "A file which is part of an upload project of more than 1,000 files" is such a large number not many uploads will be able to apply for that. It also seems arbitrary. Why not 2,000? Or 500? And what does "where there is consistency of licensing" mean? If none of the files were reviewed and all the source links are dead, but there are more than 1,000 files we keep?
  2. "A file which has been automatically license checked, and this can be verified in the upload code, or confirmed via a project page explanation, will be presumed to have been uploaded with a valid release." should be better defined (like you're now explaining it a bit more above). I would suggest splitting this proposal and make it more clear what each bullet point stands for. - Alexis Jazz ping plz 23:37, 11 December 2018 (UTC)
The vote is for the principle, the criteria can be hammered out. With regard to "1,000", the point is really "the uploading account has a reasonable history of good uploads", so yes if the source vanished for all 1,000 uploads, the debate would still be why do we presume bad faith for the demonstrably good uploader. Deletion in those cases should require some real evidence of a copyright problem, rather than a theoretical concern because someone realized that the source website has been knocked offline, or as has already happened with the very large PAS project, the links no longer function as the database has had a redesign. -- (talk) 23:41, 11 December 2018 (UTC)
  • I'm not sure if we can have a clear-cut guideline on this. It would involve different factors - how mature the semi-automation system/procedure was at the time of the upload in question (earlier uploads may be more prone to mistakes and bugs), how standardized the markup across pages at the source web site was, etc., beyond the general trustworthiness of the uploader. It seems that the discussion here focuses on sources that already suffer from linkrot, but going forward, I'd encourage mass uploaders to archive evidence before linkrot happens (ideally, before every upload). Services like web.archive.org and archive.is can be used to store evidence semi-automatically. whym (talk) 14:10, 14 December 2018 (UTC)
I believe this is unrealistic, unless the IABot starts including Commons. If you can show/demonstrate how I can add, say, all of finds.org.uk images to IA, currently 440,000 images on Commons, and return those archive links so they can be added on Commons image pages, I would be grateful. Thanks -- (talk) 14:24, 14 December 2018 (UTC)
Here is some quick and dirty Ruby code to call web.archive.org:
require 'open-uri'
require 'uri'

def run_web_archive_org(uri)
  open("https://web.archive.org/save/#{URI.escape(uri)}") do |f|
    if f.meta['content-location'] then
      puts "<https://web.archive.org#{f.meta['content-location']}>" # location of the archived page which can be fed to an file-page editing script
    else
      puts "Error? #{f.meta.inspect}"
    end
  end
end
Does this help? (You will somehow have to throttle the HTTP requests if you have thousands of pages to be archived, though.) --whym (talk) 14:50, 14 December 2018 (UTC)
I could do something similar in Python, but extending IABot makes a lot more sense, as I can piggyback on whatever error traps exist there rather than reinventing the wheel; plus it can act retrospectively on my past millions of uploaded files, and that housekeeping can be run by anyone, not just me. I'll think about it; it's too close to Christmas to do anything right now. Anyway, this proposal was generic, not just me solving how to do my own Library of Congress or Finds uploads a bit better. -- (talk) 14:56, 14 December 2018 (UTC)
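
For what it's worth, a rough Python equivalent of the Ruby snippet above, with the throttling whym mentions; the web.archive.org/save/ endpoint is the same one, while the header handling and delay value are an illustrative sketch rather than the housekeeping code actually being run.

import time
import urllib.request

def save_to_wayback(url):
    req = urllib.request.Request('https://web.archive.org/save/' + url,
                                 headers={'User-Agent': 'commons-archiver-sketch'})
    with urllib.request.urlopen(req) as resp:
        loc = resp.headers.get('Content-Location')   # same header the Ruby code reads
        return 'https://web.archive.org' + loc if loc else resp.geturl()

def archive_all(urls, delay=15):
    for url in urls:
        try:
            print(save_to_wayback(url))              # archive link, ready for a Wayback template
        except Exception as exc:
            print('Error archiving', url, exc)
        time.sleep(delay)                            # throttle requests to the Wayback Machine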
Update A test run has started on trainpix uploads, adding Wayback templates and requesting that the Internet Archive archive the source page where it does not already have it (which seems to be the case for most of these). If this works well, and I do not get blocked for mass processing, this may be turned into a more general housekeeping task, and I may incorporate it as a step that happens just before a file is uploaded to Commons. Doing this, clearly, makes any question of license review redundant. -- (talk) 21:00, 4 January 2019 (UTC)
Fæ, I am confused why much of your activity on Commons does not fall under Commons:Bots and thus require specific community approval and a separate bot account. I cannot see why adding Wayback templates, for example, is not a bot activity. My guess is that often you are reprocessing your own uploads, so nobody cares, but that itself isn't something our policy seems to permit. If you don't respond to this reasonable question, I shall have to make an admin request to block your main account, so could you be reasonable about this and explain why our bot policy does not apply to you? -- Colin (talk) 12:16, 5 January 2019 (UTC)
There are quite a lot of images uploaded every day with no request for a license review. I've been running User:RonBot Task 1 every day, and then adding "no permission" tags to lots of images. Which is why I posted the suggestion above for a new bot Commons:Village_pump/Proposals#Proposal_for_Bot, to slowly expand on that Task, and putting the results in a sensible category for license review to be done. I think the idea of allowing a time limit on review will result in copyright images being accepted - I too have found bad images way more than two years old. Ronhjones  (Talk) 00:40, 17 December 2018 (UTC)
  • I would generally  Support this proposal, but to me this could leave room for mistakes to happen, and although this policy will only apply to otherwise highly trusted users (users such as Fæ), there should be a better system where external links are automatically archived. A couple of months ago I suggested that the InternetArchiveBot or a similar bot should run on Wikimedia Commons, as it preserves web pages on the Wikipedias, but the discussion led nowhere: the operator of the InternetArchiveBot showed that they weren't interested in this, and while another user was interested, eventually no action was taken. Originally I was also planning to propose the archiving of web links at the 2018 Community Wishlist, but as I wasn't unblocked on the Meta-Wiki that clearly also didn't happen, which is a shame, as the Community Tech Bot actually runs efficiently for these kinds of tasks. As Wikimedia Commons doesn't really have any direct support from the Wikimedia Foundation, it's clear that the ageing software on this platform rarely gets updated and very few new features (if ever) get implemented. Fæ's proposal isn't perfect, but in the current environment it's the best we have; unless "the Technical community" actually starts to provide tools for the maintenance of files on Wikimedia Commons, rather than just letting everyone take both Wikimedia Commons and its files for granted, nothing will change - in fact, the only time the Wikimedia Foundation seems to openly operate here is to delete things, not create them. I agree with the principle that these files should be preserved indefinitely and that we should actively try to discourage the deletion of files solely on the precautionary principle if the license from the historical website can't be found. To give a recent example, "Commons:Deletion requests/File:TrueSmart Beach (10).jpg" is a deletion request where a user (presumably) uploaded a free file from a website five (5) years ago but today no evidence of this license exists. Yes, this could also have been simply solved if the file's license had been reviewed when it was uploaded, but if an archive existed today of how the website looked half a decade ago, any license reviewer would have been able to debunk/confirm the license. It would probably be best to try to solve the underlying (technical) issues rather than play these policy gymnastics; it seems to be a common trend among Wikimedians to build policy based on the limitations of the system rather than try to improve the system itself. The cycle works something like this: "The Wikimedia Foundation implements a half-baked feature (there may or may not have been demand from 'the community' for it); a better, more complete feature is 'promised for the future' but seemingly never comes to fruition; the users of the website start using the half-baked feature and start creating (mostly unwritten) rules around it; these rules seem to be inconsistently applied by only the admins who believe in them and remain largely a secret to the out-group unless they seek out the pattern; votes on solidifying these rules rarely seem to work and the preference for these vague unwritten rules prevails" (this is probably because it gives admins more leeway, but it will inevitably confuse new users and make the system look unfriendly; there is always an unwritten rule somewhere waiting to bite you in your hindquarters).
It would also be wise if the state of these unwritten rules were brought up for scrutiny by the community rather than being dependent on the whim of an admin. Free, in-scope files get deleted every day because the uploader didn't know how to do something properly. I would also suggest adding more manual links to the Internet Archive, though the Internet Archive website won't load on my device anymore, so this too could negatively impact users if such a rule were implemented. The best solution would still be to automatically archive links, and this is something that should be done ASAP, but there doesn't seem to be any support for actually fighting the linkrot that's so rampant on this project. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 13:26, 18 December 2018 (UTC)
  •  Oppose the proposal. We need better ways to quickly carry out independent licence review on images that are being bulk uploaded. Several users, either with home-grown closed-source unauthorised pseudo-bots or using flickr2commons and variants, are vastly exceeding the capability of the site's volunteers to monitor the input. I think we need more emphasis on users ensuring they are helping to avoid simply creating a huge backlog, rather than making proposals that sweep the backlog under the carpet. -- Colin (talk) 11:17, 29 December 2018 (UTC)

Lift the delinker ban of: No replacement of images in other formats with SVGs

One of CommonsDelinker's rules is: No replacement of images in other formats with SVGs. To avoid World War III, CommonsDelinker will ignore a command to replace an image if the new image is in an SVG format and the original is not.

I don't think we would enter WW III anymore, and sometimes it's really inconvenient not being able to replace a file with an SVG. Amada44  talk to me 20:53, 28 December 2018 (UTC)

 Weak support, but should be used with care (assuming there was a sound reason for this restriction when it was put in place). - Alexis Jazz ping plz 10:52, 29 December 2018 (UTC)
The heavily used, optimized png icons? Would it be sufficient to state that in the rules? e.g. Do NOT replace optimized png images with svg images. Amada44  talk to me 12:15, 29 December 2018 (UTC)
@Amada44: make it something like "do not replace raster images with svg images in templates and do not replace optimized raster icons in general". - Alexis Jazz ping plz 12:57, 29 December 2018 (UTC)
Actually, add "without consensus" to that. With consensus you could do those things too. - Alexis Jazz ping plz 14:56, 31 December 2018 (UTC)

Amada44 -- the restriction was applied because there have been some people on Commons in the past who were extremely zealous and rigidly inflexible in replacing all usages of a non-SVG file and trying to get it deleted whenever a claimed SVG equivalent was first uploaded, regardless of whether the SVG file had problems or was not truly equivalent... AnonMoos (talk) 15:50, 14 January 2019 (UTC)

@Amada44 and AnonMoos: that's a valid point, I've changed my support to weak. This is actually not a problem with image replacement but a problem with deletionists. In fact, I know who this is about. I just noticed them at it again. (though there are probably multiple people who do this) - Alexis Jazz ping plz 19:14, 14 January 2019 (UTC)

Exhibitionist uploads

I regularly see uploads of low-quality amateur porn which doesn't serve any educational purpose. It is usually a man masturbating and uploading a video of it, with no contributions to other projects. It seems plausible that there is some off-site collusion, although I don't know where. This has been an ongoing issue for a while. See these accounts from just the past couple of days:

Don't get me wrong. I believe sexual content is necessary. But these uploads are disruptive; Commons is not an amateur porn site, and it drives users away when they see their uploads in the same place as useless amateur porn.

As such, I believe we should add a CSD category for amateur porn with no educational value from non-contributors. Just like COM:CSD#F10, the keywords are non-contributors and low to medium quality. Magog the Ogre (talk) (contribs) 22:06, 24 December 2018 (UTC)

Spam is a worse problem, especially as several pron warriors lurk on this project. Amateur glamour stuff should go through deletion like any other file. The problem with "no educational value" is that every week I see perfectly valid files, including in-scope nudity- and sexuality-related material, being wrongly given speedy tags with this claim.
PS Christmas is not the right time to raise policy proposals. -- (talk) 23:03, 24 December 2018 (UTC)
We have deletion requests for a reason, but what constitutes "spam"? My impression is that everyone seems to have their own opinion on it: someone who imports a lot from the same website is "a spammer", but then you would be considered one too, as you're bombarding Commons with external links, and Wikipedia doesn't care if that link is to a GLAM, so why would Commons? I think that "spam" is too vague a term to actually enforce here, because anything less than negative about something could be seen as "promotional". Anyhow, the deletion request system is the best option, because I can see people abusing {{Speedy}} for basically anything; in fact, regarding porn deletion requests there seems to be a user who just writes "{{Vd}} Worthless, poor quality, redundant and out of scope." at every DR that contains a penis, even if it is in use or of high quality - now imagine if people like this had a policy they could abuse. DRs are also more transparent than speedy deletions, and though I agree that most of the bad quality images of porn aren't fit for this project, allowing careless deletion could do more harm than good. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 08:01, 25 December 2018 (UTC)
  •  Comment Deleted. Such files can always be speedy deleted as copyvios and/or vandalism. I am not sure a new deletion reason is necessary. Regards, Yann (talk) 12:05, 25 December 2018 (UTC)
  • @Yann: , I agree that no new rules would have to be established for this, but it could be an expansion of the harassment rules, as according to Cybercivilrights.org "w:en:Revenge porn" does have special legal ramifications beyond being merely a copyright violation, and I think that people who post revenge porn (such as unauthorised dickpics) to Wikimedia Commons should be treated like people who post legal threats. A couple of years ago someone I know went through a divorce and his ex-wife posted his nudes online; let's just say that he now has full custody of their children and she has a restraining order against her, so "revenge porn" can have legal effects. I just hope that the poor man whose penis was uploaded wasn't harmed by those images. But Wikimedia Commons falls under the safe harbour provision of the DMCA anyhow, so I don't think that "revenge porn" constitutes a special threat to the project beyond a "regular copyright violation", but we should be wary of it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:18, 25 December 2018 (UTC)
  • IMO, "revenge porn" falls into the same attack images category like racist or other harmful intent, but in this case, it is a bit less obvious to recognize, compared to mere exhibitionism. Regards, Yann (talk) 12:24, 25 December 2018 (UTC)

Pointing your camera 45 degrees down, from your face to your crotch, is a nude selfie, so it should be speedy deleted as F10.--BevinKacon (talk) 16:49, 28 December 2018 (UTC)

No, that's not a nude selfie. That's just a dickpic (or a..vagpic?). - Alexis Jazz ping plz 10:49, 29 December 2018 (UTC)
  •  Oppose with an exception for confirmed socks (which you suggest is the problem here). Send it to DR the first time. If a video of the same masturbating man is uploaded the day after deletion, even if it is a new (but similar in terms of content and quality) video, use G4. - Alexis Jazz ping plz 10:49, 29 December 2018 (UTC)
  •  Oppose A little while ago I went and browsed sexual content on Commons, and I must say that we barely have any. It is therefore very surprising that such files can be deleted without a Deletion Request discussion lasting at least two or three weeks. The way I see it, sexuality and food would be a good analogy: both are aspects of life that are essential for the species, so we should have about the same amount of media for each, give or take 5%. Currently we do not approach anything close to that. ℺ Gone Postal ( ) 08:14, 21 January 2019 (UTC)