Commons talk:OpenRefine

OpenRefine's file upload functionality should support all file formats that are natively supported by Wikimedia Commons as well

Tracked on GitHub #4276

In a conversation on the Wikimedia Commons Telegram channel, and in various other places, I have regularly heard people ask 'but will OpenRefine support batch uploading (video files) / (pdf files) / (djvu files)...' This sounds like a no-brainer, but as I've noticed that other batch upload tools sometimes don't support certain file formats, it's good to put this on the radar and to investigate why certain file formats would prove challenging. Spinster (talk) 07:06, 7 November 2021 (UTC)

Integrate thumbnail previews during the batch file upload process

Tracked on GitHub #4277

Request received in a conversation in the Wikimedia Commons Telegram channel. During a batch upload process of media files, it is extremely helpful if one can (easily) see thumbnail previews of the media files that are being uploaded.

Some existing Wikimedia Commons (batch) upload tools do support this (the default UploadWizard, for instance); others don't (Pattypan only shows previews of files and their infoboxes during the checking phase of the upload process, after all data has already been prepared).

OpenRefine is essentially a data-centric tool, so this may be a stretch, but it's good to have this request on the radar, as it makes a lot of sense IMO. Spinster (talk) 07:26, 7 November 2021 (UTC)

Error log while uploading

Using OpenRefine 3.7 on Windows. I see errors in the OpenRefine console. Is there a written log where I can see the errors happening during the upload process? Raymond 08:35, 27 July 2022 (UTC)

Hi @Raymond, thank you for trying OpenRefine! I am not sure if I fully understand your question. Are you interested in seeing more specific information about errors happening during upload than the ones you are currently seeing in the console? What kind of errors would you be interested in seeing? We could certainly do a better job at making the error reporting more detailed and descriptive. Best, Spinster (talk) 15:53, 11 August 2022 (UTC)
@Spinster During the upload runs I have seen different errors in the console:
  1. http-bad-status: There was a problem during the HTTP request: 415 Unsupported Media Type
  2. verification error: File extension ".jpg" does not match the detected MIME type of the file (inode/x-empty)
  3. abusefilter-warning: abusefilter-warning-overwriting-artwork
Screenshots: photos.app.goo.gl/84rzjVweQUrUFC338 (sorry, unable to link the URL because of the spam filter)
I have not yet checked the reasons for 1. and 2.; probably corrupt files. The reason for 3. is known: my fault, because I forgot to exclude some already uploaded files.
What I am missing: a saved error log which includes the file names. Currently I have no idea which files are generating the errors from 1. and 2. Raymond 12:07, 13 August 2022 (UTC)

(edit): Comment added to https://github.com/OpenRefine/OpenRefine/issues/5166 now. Raymond 14:58, 13 August 2022 (UTC)
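
Error 2 in the list above reports a detected MIME type of inode/x-empty, which is what file-type detection typically returns for a zero-byte file. As a rough sketch, independent of OpenRefine, the following Python snippet lists zero-byte image files in a local folder so they can be excluded before an upload. The function name `find_empty_files` and the default extension list are assumptions made for illustration; neither comes from OpenRefine.

```python
from pathlib import Path


def find_empty_files(folder, extensions=(".jpg", ".jpeg", ".png", ".tif", ".tiff")):
    """Return the sorted paths of zero-byte files below `folder`.

    Hypothetical helper: only files whose suffix (case-insensitive) is in
    `extensions` are considered.
    """
    root = Path(folder)
    return sorted(
        path
        for path in root.rglob("*")          # walk the folder recursively
        if path.is_file()
        and path.suffix.lower() in extensions
        and path.stat().st_size == 0          # zero bytes on disk
    )
```

Calling `find_empty_files("path/to/upload/folder")` returns a list of `Path` objects that can be printed or compared against the file name column of the project. It does not detect files that are corrupt but non-empty.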

Flag for already finished uploads

Maybe I missed it somewhere in the documentation: is there a flag to filter for already finished uploads in case I have to stop OpenRefine in the middle of the upload process? Raymond 08:39, 27 July 2022 (UTC)

Hi Raymond, apologies for the late reply. If an upload failed in the middle, I would expect the file names of already uploaded files to appear in blue (reconciled) in your OpenRefine project. If that is not the case, you can go into your OpenRefine project, select the column of file names, clear the reconciliation data, and reconcile that column against Wikimedia Commons again. The files that have already been uploaded will appear blue (reconciled); the other ones will again need to be marked to be created as new. Next, you can filter down to each of the two sets by using the reconciliation judgment facet. I hope this makes sense! Spinster (talk) 15:52, 11 August 2022 (UTC)
@Spinster Thank you for the answer. That helps a lot and works now. The upload has been running since yesterday. One comment: "I would expect the file names of already uploaded files to appear in blue (reconciled) in your OpenRefine project." Sadly not. Bug or feature? I had to do the steps you suggested (clearing, reconciling). Do you see any chance of avoiding this step in a newer version? Raymond 11:53, 13 August 2022 (UTC)

A messy project

Hi, here is a copy of my report to the forum.

Hi, I am trying to upload (a lot of) files to Commons, so I tried OR 3.7beta2 and OR 3.8-20221220.184714 (Java included).

OR 3.8-20221220.184714 loads OK, but the “Next” button after selecting files is not accessible, whatever number of files is selected. I am stuck there.

OR 3.7beta2 with Java included doesn’t even load (Java not found).

Second issue: Firefox is used, although Chrome is my default browser. OR loads in Chrome when I copy and paste the URL http://127.0.0.1:3333/ into it.

General comment: Selecting files before a project exists is counterintuitive. The right order should be: first create a project, then select files to include. Yann (talk) 18:01, 26 December 2022 (UTC)

Pattypan comparison

I'm not sure what is meant by "You can't edit data inside Pattypan."? I think most Pattypan users rarely use the spreadsheet and only use the built-in edit features. Abbe98 (talk) 11:22, 10 January 2023 (UTC)

Userbox

I copied over the userbox from Wikidata, to make it possible to use it on Commons too. You can use {{User loves OpenRefine}} and it gives you the following:

This user loves OpenRefine.



Not sure if this should be advertised on Commons:OpenRefine? I could not find a fitting section for it.

Those userboxes are quite useful for finding a list of enthusiastic users (it will populate Category:OpenRefine user when it gets used). The userbox is also available on meta. − Pintoch (talk) 18:41, 6 September 2023 (UTC)

Permission denied for Upload

I'm getting a MediaWiki error while editing. It says "The action you have requested is limited to users in one of the groups: Users, Autoconfirmed users, Administrators, Confirmed users." I'm already an autoconfirmed user and don't know why I'm getting this error. Has anyone else encountered a similar error? ❙❚❚❙❙ GnOeee ❚❙❚❙❙ 05:13, 23 September 2023 (UTC)

FilePath on PAWS

I am trying to build an upload with OpenRefine in the PAWS environment, but unfortunately I am failing with the file path. I have uploaded the files to my PAWS directory, but both attempts fail: using the public URL https://public-paws.wmcloud.org/User:ZentralGut/CommonsUpload/Files_StAOW_Images/StAOW_257460.jpg fails because this domain is not allowlisted, and the Linux-style file path "/CommonsUpload/Files_StAOW_Images/StAOW_257460.jpg" is rejected as a file path before the upload starts. Does anyone have experience with PAWS and file uploads? Best, ZentralGut (talk) 13:36, 15 October 2023 (UTC)

Found the solution myself - add "/home/paws/" before your directory/file path in your PAWS account. ZentralGut (talk) 13:44, 15 October 2023 (UTC)
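
For anyone preparing a whole column of paths this way, here is a minimal Python sketch of the prefixing step. The "/home/paws" prefix is the one reported in the comment above; the helper name `to_paws_path` is hypothetical and not part of OpenRefine or PAWS.

```python
from pathlib import PurePosixPath

# Prefix reported in the comment above; everything else here is illustrative.
PAWS_HOME = "/home/paws"


def to_paws_path(path: str) -> str:
    """Return `path` prefixed with the PAWS home directory.

    Paths that already start with /home/paws are returned unchanged.
    """
    if path == PAWS_HOME or path.startswith(PAWS_HOME + "/"):
        return path
    # Drop any leading slash so the path is joined below the home directory.
    return str(PurePosixPath(PAWS_HOME) / path.lstrip("/"))
```

Applied to the rejected path from the question, `to_paws_path("/CommonsUpload/Files_StAOW_Images/StAOW_257460.jpg")` yields "/home/paws/CommonsUpload/Files_StAOW_Images/StAOW_257460.jpg".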

Commons extension is missing on PAWS

I don't seem to see this extension?

I tried to follow Commons:OpenRefine/Advanced tips and tricks#Adding the Wikimedia Commons reconciliation service to OpenRefine, but it says:

Error contacting recon service: timeout : timeout - https://commonsreconcile.toolforge.org/en/api

How do I boot OR up on PAWS to upload files? RZuo (talk) 18:37, 28 January 2024 (UTC)

@RZuo: Thanks for reporting this! I am not sure how the reconciliation service ended up in this state. I have restarted it and it seems to be accessible again. Can you try again on your side? − Pintoch (talk) 10:11, 29 January 2024 (UTC)
@Pintoch thx a lot! Your Toolforge page is back up and I was able to add it as a "standard service". RZuo (talk) 10:25, 29 January 2024 (UTC)

Feedback regarding the instructions (copied from Telegram chat)

Here are two issues we stumbled upon with the OpenRefine instructions:

1) It is not obvious from the instructions that uploaders may have to ingest data on Wikidata first. In the case of museum artefacts, the approach with the "Art photo" template looks very simple at first. However, the data ingest on Wikidata can be quite tricky and represents a major challenge. I think this should be mentioned in the instructions. Ideally, easier alternatives would be pointed to. - Maybe add a pros and cons for using OpenRefine vs. Pattypan.

2) The documentation of the "Art photo" template is incomplete in the sense that it is unclear which data fields are being picked up from Wikidata / SDC and which ones are not. The example in the OpenRefine instructions would suggest that everything is being picked up from Wikidata and Structured Data on Commons. The template documentation itself however lists further fields, such as "artwork license" and "photo license" (https://commons.wikimedia.org/wiki/Template:Art_Photo). It remains unclear what the best practice is with regard to the license information. There should be further examples in the instructions: Currently, the "Art photo" example seems to be a case of Freedom of Panorama (no permission indication for the artwork). There should be two other examples: One where the artwork clearly is in the Public Domain; and one where the artwork is under copyright.

My background: I am supporting heritage institutions with the uploading process in a volunteer capacity. That is, I don't do uploads myself, but I am advising people in institutions who are in charge of the upload process. In this context, I also recommend tools. With Pattypan, my experience has been that it is largely self-explanatory; i.e., I just tell people to use Pattypan, and then they will usually find their way (this has been a great advantage of Pattypan over the now deprecated GLAM-Wiki Toolset). The approach via OpenRefine / SDC / Wikidata hasn't reached that level of user-friendliness yet. - Beat Estermann (talk) 19:57, 9 March 2024 (UTC)

Adding files to a category

OR newbie here.

Suppose I have a spreadsheet, one column of which contains the full URL of Commons file pages.

I want to add all the listed files to a single, new, Commons category.

How can I do this in OR (PAWS version)?

Is there a better place where I should ask? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:15, 16 March 2024 (UTC)