User:Spinster/OR editing 2023
Step by step instructions on how to (batch) add structured data to (existing) Wikimedia Commons files with OpenRefine.
Software installation
Download and install OpenRefine (version 3.6 or later!)
⚠️ For editing Wikimedia Commons, you need OpenRefine 3.6 or newer. Wikimedia Commons is not supported in OpenRefine 3.5 or earlier versions.
Download OpenRefine for Windows, macOS or Linux from https://openrefine.org/download.html and install it on your computer.
There are detailed download instructions and installation instructions in OpenRefine's user manual.
Download and install the Wikimedia Commons extension for OpenRefine
Additionally, it is highly recommended to install OpenRefine's Wikimedia Commons extension, which is very helpful for batch editing on Wikimedia Commons. The extension offers:
- A start screen to load file names directly from Wikimedia Commons categories.
- Thumbnails of Wikimedia Commons files (not all file formats supported yet).
- Several dedicated GREL expressions to retrieve data from wikitext for further processing.
The extension can be downloaded from GitHub, where you can also follow installation instructions.
The explanation on this page assumes that you have installed this extension.
Alternative: run OpenRefine online, in the cloud (via Wikimedia PAWS)
If you are unable to install OpenRefine on your computer, or if it runs very slowly, then you can also use it in the cloud (on wmcloud.org through PAWS). Everyone with a Wikimedia account can access OpenRefine here. Visit https://hub-paws.wmcloud.org/, log in, and click on the OpenRefine (blue diamond) logo.
The Wikimedia Commons extension (mentioned above) is installed in OpenRefine on PAWS.
Please note: with OpenRefine on PAWS it is NOT possible to upload files to Wikimedia Commons from your local computer. But it is possible to edit existing files.
Start an OpenRefine project based on one or more Commons category / categories
These instructions assume that you are using OpenRefine's Wikimedia Commons extension. See its documentation for installation instructions. If you are not using this extension and want to start an OpenRefine project via another method, check the advanced tips and tricks page.
- Select the Wikimedia Commons option in OpenRefine's startup screen.
- Now you can type the name of one or more Wikimedia Commons categories. You can also specify the depth to which OpenRefine will traverse the Commons category tree.
- Click Next.
- The project preview will load. You will see a list of file names that are loaded from the category or categories you specified.
- At the bottom of the preview window, you can indicate whether you also want to load a column with the Commons categories of each file, and/or a column with the M-ids of the files. Commons categories can be very informative and useful for extracting data that can later be added as structured data. If you decide not to retrieve the files' categories now, you can still do this later.
- Give your project a meaningful name and click Create project. The project will now load, showing thumbnails of the files. The file names are blue and clickable, which means they are already reconciled with Wikimedia Commons.
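The idea of traversing a category tree to a given depth can be sketched in Python as follows. This is only an illustration, not how the extension itself is implemented: the `fetch` callable is a hypothetical stand-in for whatever looks up the files and subcategories of one category, the category and file names are made up, and treating depth 0 as "only the categories you typed" is an assumption.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

# Hypothetical lookup: given a category name, return (files, subcategories).
Fetcher = Callable[[str], Tuple[List[str], List[str]]]


def collect_files(categories: Iterable[str], depth: int, fetch: Fetcher) -> List[str]:
    """Walk the category tree breadth-first, at most `depth` levels below the start."""
    seen_categories = set()
    seen_files = set()
    files = []
    queue = deque((category, 0) for category in categories)
    while queue:
        category, level = queue.popleft()
        if category in seen_categories:
            continue  # category trees can contain loops
        seen_categories.add(category)
        category_files, subcategories = fetch(category)
        for name in category_files:
            if name not in seen_files:  # a file can sit in several categories
                seen_files.add(name)
                files.append(name)
        if level < depth:
            queue.extend((sub, level + 1) for sub in subcategories)
    return files


# Made-up in-memory category tree, used instead of a real lookup.
EXAMPLE_TREE = {
    "Category:Windmills": (["File:Mill A.jpg"], ["Category:Windmills in Utrecht"]),
    "Category:Windmills in Utrecht": (
        ["File:Mill B.jpg", "File:Mill A.jpg"],
        ["Category:Windmill interiors"],
    ),
    "Category:Windmill interiors": (["File:Mill C.jpg"], []),
}


def example_fetch(category: str) -> Tuple[List[str], List[str]]:
    return EXAMPLE_TREE.get(category, ([], []))


if __name__ == "__main__":
    print(collect_files(["Category:Windmills"], 1, example_fetch))
```

A larger depth quickly pulls in many more files, so it is usually sensible to start with a small value and check the preview.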
Extract Wikitext and structured data
This step is optional, but may be very useful. Existing files on Wikimedia Commons are always described with wikitext, which usually contains information about the file's creator, license, and one or more Wikimedia Commons categories. It will often make sense to parse this Wikitext in OpenRefine, retrieving valuable bits of data from it which can be converted to structured data in a later step. Good examples of such data may include:
- The file's description, which you can convert to a file caption
- The file's creator
- The file's source
- Things depicted in the file, and other valuable information, may be mentioned in the file's categories
In order to create one or more new columns with Wikitext (and structured data statements) from your column of reconciled file names, select Edit column → Add columns from reconciled values... in the file column's menu. A dialog window opens in which you can select one or more options:
- Wikitext: will create a column with the (full) Wikitext of each file
- Various structured data statements; the dialog window suggests several common ones, but you can use the search functionality to search for any property that you are interested in
- You can retrieve file captions by typing the capital letter C, followed by the two-letter language code (e.g. Cen for English file captions, Cja for Japanese file captions).
Screenshots: starting the process to extract additional data for your files; indicating (for instance) that you want to retrieve Wikitext.
See Add columns from reconciled values in OpenRefine's user manual for general information about this feature.
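To illustrate what "parsing the Wikitext" can mean, here is a minimal Python sketch that pulls the categories and a single-line template field out of a wikitext string. It is not the extension's own GREL functionality; the function names and the sample wikitext are invented for this example, and the field extraction deliberately handles only values that fit on one line.

```python
import re
from typing import List, Optional

# Matches [[Category:Name]] and [[Category:Name|sort key]], case-insensitively.
CATEGORY_RE = re.compile(
    r"\[\[\s*Category\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]", re.IGNORECASE
)


def extract_categories(wikitext: str) -> List[str]:
    """Return the category names mentioned in the wikitext, in order."""
    return [match.group(1).strip() for match in CATEGORY_RE.finditer(wikitext)]


def extract_field(wikitext: str, name: str) -> Optional[str]:
    """Return the value of a '|name=value' template line, or None if absent.

    Only single-line values are handled; nested or multi-line values are not.
    """
    pattern = re.compile(
        rf"^\s*\|\s*{re.escape(name)}\s*=\s*(.*?)\s*$",
        re.IGNORECASE | re.MULTILINE,
    )
    match = pattern.search(wikitext)
    return match.group(1) if match else None


# Invented sample wikitext, only for demonstration.
SAMPLE_WIKITEXT = """=={{int:filedesc}}==
{{Information
|description={{en|1=A windmill near Utrecht}}
|author=Jane Example
}}
[[Category:Windmills in Utrecht]]
[[category:Images by Jane Example|Windmill]]
"""

if __name__ == "__main__":
    print(extract_categories(SAMPLE_WIKITEXT))
    print(extract_field(SAMPLE_WIKITEXT, "author"))
```

Real wikitext is often messier than this sample, so always check the extracted values before turning them into structured data.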
Reconcile other columns with Wikidata
Structured data on Commons describes files on Commons by using (multilingual) items and properties from Wikidata.
Perhaps some of your columns correspond to Wikidata items. You will need to reconcile these, so that OpenRefine can link their values to the corresponding Wikidata items. Examples include:
- Creators (if they have a Wikidata item)
- Copyright statuses and licenses
- Depicted things, artworks, places, species, people…
You will reconcile these columns against the Wikidata reconciliation service, in English or another language that may be relevant (English usually works fine). The English Wikidata reconciliation service is installed by default in OpenRefine.
Reconciled columns have a header that is underlined with a dark green stripe; values in the column are blue hyperlinks which point to Wikidata items.
Screenshots: starting to reconcile a column to Wikidata; settings for the reconciliation; the reconciled column, in which items are blue and show a preview when you hover over them with your mouse.
You can find more instructions on how to reconcile data in OpenRefine's user manual and on Wikidata.
Create your editing schema
Finally, you will build a schema in OpenRefine, to model the Wikimedia Commons edits that OpenRefine will perform for each row in your project.
Click on the Schema tab in the blue bar above your dataset, or go to the Wikidata/Wikibase extension menu and select Edit Wikibase schema. You will get an empty schema window at first. Verify that the info text on top mentions Wikimedia Commons; if it mentions Wikidata, then you need to switch your Wikibase instance to Wikimedia Commons via the Select Wikibase instance... menu item in the Wikidata/Wikibase extension menu.
Click on the blue + add media link. Several fields will appear.
You can now type, and/or drag and drop all the info you want to include in the files' metadata.
- In the main field (which says "type entity or drag reconciled column here"), drag your reconciled column of file names (see the instructions above). Note: that column must have a green line (as a result of the reconciliation).
- Captions: if you have created columns with file captions, then you can drag them here. Make sure to add the corresponding language.
- Statements: click + add statement to add structured data statements, one by one. You can type values that are the same for all your files, or drag (reconciled) columns.
Screenshots: selecting Edit Wikibase schema; the empty schema after clicking + add media; language selection for the file captions; selecting a statement; a simple 'filled' schema.
See Schema alignment in OpenRefine's user manual for general information about schemas.
Make sure to follow Wikimedia Commons data modeling conventions
Don't invent your own method to describe files, but make sure to follow Wikimedia Commons best practices. In case of doubt, ask the Wikimedia Commons community for feedback on the general Structured Data talk page.
Data models for structured data about media files on Commons are explained and discussed at Commons:Structured_data/Modeling.
Basic structured data statements for all Wikimedia Commons files are:
Structured data to add | Brief instructions | In-depth information about the data model |
---|---|---|
File caption(s) (multilingual) | A (short) textual description of the file, in at least one language. Plain text; no Wiki markup or hyperlinks. | Data modeling guidelines: File captions |
Date | Usually the date when the file was created; using an inception (P571) statement. | Data modeling guidelines: Date |
Source of the file | Information about where the file was taken from. Is it the uploader's own work, was it uploaded from an external website,...? Typically using a source of file (P7482) statement. | Data modeling guidelines: Source of the file |
Creator | Who created the file? Typically described with a creator (P170) statement. | Data modeling guidelines: Creator of the file |
Copyright status and license | Is the file still under copyright, or is it public domain? If still under copyright, which license(s) applies/apply? Using copyright status (P6216) and copyright license (P275). | Data modeling guidelines: Copyright and licenses |
- In many cases it makes sense to add one or more depicts (P180) statements. See Data modeling guidelines: Depiction.
- If the file shows an artwork, the statements main subject (P921) and digital representation of (P6243) are also commonly used. See Data modeling guidelines: Visual artworks.
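The table above can also be read as a checklist. The following Python sketch shows one possible way to check which of the basic statements are still missing for a file. The property IDs come from the table; the function name, the shape of the `captions` and `statements` dictionaries and the placeholder values are assumptions made for this example, and the check is stricter than the guidelines in one respect: a public domain file may not need a copyright license (P275) statement at all.

```python
from typing import Dict, List

# Basic properties from the table above (file captions are checked separately).
BASIC_PROPERTIES = {
    "P571": "inception",
    "P7482": "source of file",
    "P170": "creator",
    "P6216": "copyright status",
    "P275": "copyright license",
}


def missing_basics(captions: Dict[str, str], statements: Dict[str, List[str]]) -> List[str]:
    """List the basic elements that a file description does not have yet.

    captions:   language code -> caption text (assumed shape)
    statements: property ID -> list of values (assumed shape)
    """
    missing = []
    if not any(text.strip() for text in captions.values()):
        missing.append("file caption")
    for property_id, label in BASIC_PROPERTIES.items():
        if not statements.get(property_id):
            missing.append(f"{label} ({property_id})")
    return missing


if __name__ == "__main__":
    # Placeholder values; real statements would point to Wikidata items or dates.
    example_statements = {
        "P571": ["1900"],
        "P170": ["some creator"],
        "P6216": ["some copyright status"],
    }
    print(missing_basics({"en": "A windmill near Utrecht"}, example_statements))
```

A check like this can be run on exported project data before building the schema, to see which columns still need to be prepared.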
Preview and upload your edits to Wikimedia Commons
You can preview your edits by clicking the Preview tab on top of your schema. The Issues tab will inform you about errors that may be present in your data or schema, so that you can fix them.
When you are ready to upload your edits, select Upload edits to Wikibase... in the Wikidata/Wikibase extension menu, and log in with your Wikimedia Commons credentials. OpenRefine will encourage you to use a bot password, but if you like, you can ignore this warning. Provide a descriptive edit summary. There is no need to change the maxlag value. Click Upload edits and your batch edit will start.
You will see your recently edited files in your own edit history on Wikimedia Commons.
Screenshots: previewing your edits via the Preview tab; starting the upload process; entering your Wikimedia Commons username and password; providing a meaningful edit summary.
See documentation about uploading in OpenRefine's user manual for general information about this feature.
Correcting mistakes with the EditGroups tool
When checking your user contributions, you will see your recent Wikimedia Commons edits done with OpenRefine. Each OpenRefine edit displays a (details) hyperlink after the edit summary, which links to the edit batch in the EditGroups tool.
In EditGroups, entire batches can be easily undone, in case some mistakes have been made.
All Wikimedia Commons batches with OpenRefine are listed at https://editgroups-commons.toolforge.org/?tool=OR.
Screenshots: OpenRefine batches listed in the EditGroups tool; one OpenRefine batch upload in the EditGroups tool, which can be reverted if grave mistakes have been made.
Advanced tasks
Obtain file names with the PetScan tool
If you want to get a list of file names from Wikimedia Commons by a method other than the "categories" approach of OpenRefine's Wikimedia Commons extension, you can also retrieve a selection of file names with the PetScan tool.
PetScan gives you many different options to retrieve lists of file names based on various criteria, e.g. usage of specific templates, or using search.
Detailed instructions on how to do this with PetScan are available at Commons:PetScan/Generate list of Commons files.
PetScan's full manual is available on meta.wikimedia.org.
Other ways to obtain lists of file names to work with
You can also retrieve / obtain this list in other ways, e.g. from the Wikimedia Commons or Wikidata query service, or via another method of your choosing.
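As an example of the query service route, the Python sketch below builds a SPARQL query for files that have a depicts (P180) statement pointing to a given Wikidata item. It only builds the query text and does not send it anywhere. The function name is made up, and the sketch assumes that the Wikimedia Commons query service predefines the `wdt:` and `wd:` prefixes; check the query service's own documentation before relying on it.

```python
import re


def depicts_query(qid: str, limit: int = 100) -> str:
    """Build a SPARQL query for files that depict the given Wikidata item.

    Assumes the query service predefines the wdt: and wd: prefixes.
    """
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return (
        "SELECT ?file WHERE {\n"
        f"  ?file wdt:P180 wd:{qid} .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


if __name__ == "__main__":
    # Q146 is the Wikidata item for the house cat.
    print(depicts_query("Q146", limit=50))
```

The resulting query text can be pasted into the query service's web interface, and the results exported as a file that OpenRefine can open.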
Other ways to start OpenRefine projects with lists of file names
You may have just a list of file names, or a larger spreadsheet or dataset with extra data about the files. Both are good starting points in OpenRefine.
Depending on the data format you have, you can enter this data into OpenRefine and start a project with it. You can use OpenRefine's Clipboard option to paste a list of file names (or a small dataset) from your computer's clipboard. Or you can have a list of files in a .csv file or spreadsheet, which you can open in OpenRefine in the usual way.
Screenshots: starting a project from clipboard, where you can (for instance) simply paste a list of file names; starting an OpenRefine project by giving it a file on your computer.
You can read more about starting projects (and the settings for various data formats) in OpenRefine's user manual: https://docs.openrefine.org/manual/starting
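If your list of file names comes from a script or another tool, a small helper like the following Python sketch can turn it into CSV text that OpenRefine can open. The column header `File` and the function name are arbitrary choices for this example, and adding the `File:` prefix to names that lack it is a tidying assumption, not a documented requirement.

```python
import csv
import io
from typing import Iterable


def file_names_to_csv(names: Iterable[str]) -> str:
    """Return CSV text with one column of tidied file names."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["File"])  # arbitrary column header
    for raw in names:
        name = raw.strip()
        if not name:
            continue  # skip blank lines
        if not name.startswith("File:"):
            name = "File:" + name  # assumption: use the full page title
        writer.writerow([name])
    return buffer.getvalue()


if __name__ == "__main__":
    csv_text = file_names_to_csv(["Mill A.jpg", " File:Mill B.jpg ", ""])
    # Save the result so that it can be opened as a file in OpenRefine.
    with open("file_names.csv", "w", encoding="utf-8", newline="") as handle:
        handle.write(csv_text)
```

The same text can also be pasted into OpenRefine via the Clipboard option instead of being saved to a file.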
Wikimedia Commons functionalities not present? Adding the Wikimedia Commons manifest to OpenRefine
If you don't see Wikimedia Commons as an option for reconciliation or in the schema (as described above), you still need to add the Wikimedia Commons manifest to OpenRefine.
This manifest is a kind of 'settings' file that provides OpenRefine with all the information it needs to be able to edit Wikimedia Commons. Do this as follows:
- In the Wikidata extension menu at the top right of your OpenRefine project, choose Select Wikibase instance..., then click Add Wikibase. You will be prompted to paste either a manifest URL (this is recommended), or paste the JSON directly. Wikimedia Commons' manifest URL is: https://raw.githubusercontent.com/OpenRefine/wikibase-manifests/master/wikimedia-commons-manifest.json
- After adding this URL, you should now see Wikimedia Commons in your list of Wikibase instances. Click Wikimedia Commons to activate it. You can now close this dialog window by clicking the Close button.
- Adding the Wikimedia Commons manifest in OpenRefine will also automatically add the Wikimedia Commons reconciliation service.
Screenshots: pasting the link to the Wikimedia Commons manifest; selecting (activating) the Wikimedia Commons manifest.
You can read more about Wikibase manifests and their application and usage in OpenRefine's user manual: https://docs.openrefine.org/manual/wikibase/configuration#for-wikibase-end-users. A list of Wikibase manifests (including the one of Wikimedia Commons) is maintained on GitHub at https://github.com/OpenRefine/wikibase-manifests.
Adding the Wikimedia Commons reconciliation service to OpenRefine
If you don't see Wikimedia Commons as an option for reconciliation (as described above), you still need to add the Wikimedia Commons reconciliation service to OpenRefine.
Select Reconcile → Start reconciling... In the resulting (reconciliation) dialog window, click the button Add standard service... and paste https://commonsreconcile.toolforge.org/en/api there. If you prefer working with properties and labels in a different language, you can replace the en string in that URL with the two-letter language code of your choice.
More info and documentation about the Commons reconciliation service is available at https://commonsreconcile.toolforge.org/.
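The language swap in the service URL can be sketched in Python as follows. The function name is made up for this example; the URL pattern and the two-letter language code are taken from the instructions above, and the sketch does not check whether the service actually supports a given language.

```python
import re

SERVICE_BASE = "https://commonsreconcile.toolforge.org"


def reconciliation_endpoint(language: str = "en") -> str:
    """Return the reconciliation service URL for a two-letter language code."""
    code = language.strip().lower()
    if not re.fullmatch(r"[a-z]{2}", code):
        raise ValueError(f"expected a two-letter language code, got {language!r}")
    return f"{SERVICE_BASE}/{code}/api"


if __name__ == "__main__":
    print(reconciliation_endpoint())      # English
    print(reconciliation_endpoint("nl"))  # Dutch
```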
Manually reconciling file names with Wikimedia Commons
If you start OpenRefine projects via OpenRefine's Wikimedia Commons extension, as described above, then file names will already be reconciled. They will be blue and clickable, and the file name column will be highlighted with a dark green line. If you start an OpenRefine project in another way, using a list of Wikimedia Commons files, you will still need to actively use the Wikimedia Commons Reconciliation Service as a starting point to begin batch editing these files. This step makes sure that OpenRefine recognizes these files, links them to their M-ids on Wikimedia Commons, and ensures that OpenRefine can edit them later.
You start the reconciliation process by selecting Reconcile → Start reconciling... in the file column's menu. Then select the Wikimedia Commons reconciliation service and click the Start reconciling... button. (See above on how to add the service if you don't see the Wikimedia Commons option yet.)
Video and screenshots: a short (3'26") demo video of Wikimedia Commons reconciliation in OpenRefine; the first step to reconcile a column of file names against Wikimedia Commons; a list of reconciled files, in which the file names are now blue hyperlinks.
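To make the link between file names and M-ids more concrete: a file's M-id is the letter M followed by the page ID of its file page. The Python sketch below builds a MediaWiki API request URL for looking up page IDs and converts a response into M-ids. It does not contact Wikimedia Commons itself; the function names are made up, and the sample response is invented, following the default JSON layout of the API's `action=query` module.

```python
from typing import Dict, Iterable
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def page_info_url(titles: Iterable[str]) -> str:
    """Build an API URL that asks for basic page info of the given file pages."""
    params = {
        "action": "query",
        "prop": "info",
        "titles": "|".join(titles),  # the API separates multiple titles with |
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)


def mids_from_response(response: dict) -> Dict[str, str]:
    """Map page titles to M-ids, skipping pages that do not exist."""
    pages = response.get("query", {}).get("pages", {})
    return {
        page["title"]: f"M{page['pageid']}"
        for page in pages.values()
        if "pageid" in page  # missing pages have no page ID
    }


# Invented sample response, only for demonstration.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {"pageid": 12345, "ns": 6, "title": "File:Mill A.jpg"},
            "-1": {"ns": 6, "title": "File:Does not exist.jpg", "missing": ""},
        }
    }
}

if __name__ == "__main__":
    print(page_info_url(["File:Mill A.jpg"]))
    print(mids_from_response(SAMPLE_RESPONSE))
```

In normal use you do not need any of this, because the reconciliation service does the matching for you; the sketch only shows what "linking a file name to its M-id" amounts to.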