Commons:WriteSDoCfromExcel

From Wikimedia Commons, the free media repository
Jump to navigation Jump to search

writeSDoCfromExcel on Github

writeSDoCfromExcel is a Python script that adds structured data to files on Wikimedia Commons from an Excel sheet. It writes Property-Qid pairs from an Excel to the structured data of files on Wikimedia Commons.

For instance it can add putto (Q284865) to the depicts (P180) (Depicts) property of the file File:Atlas_Schoemaker-UTRECHT-DEEL1-3120-Utrecht,_Utrecht.jpeg from the Excel file P180Inputfile.xlsx (row 2 in that Excel).

Althought mainly intended to add P180 values in bulk, this script is also able to add Wikidata Qids to other properties (than P180) in the structured data.

Configuration[edit]

The Python and the Excel files need to be in the same folder/directory.

As you can see from the example Excel P180Inputfile.xlsx, the script expects 3 inputs, corresponding to column names in the sheet:

A handy way to get all files from a category, in this case Category:Atlas_Schoemaker-Utrecht is via the API call https://commons.wikimedia.org/w/api.php?action=query&generator=categorymembers&gcmlimit=500&gcmtitle=Category:Atlas_Schoemaker-Utrecht&format=xml&gcmnamespace=6 and clean up that XML using a regexp.
  • CommonsMid: the media ID (M-number) of the file. It consists of 'M + Page ID', where the Page ID of the file is listed in the Page information, so in this case 41686589.
An easy way to find M-numbers (in bulk) from file titles is by using the Minefield tool. You can copy-paste the full CommonsFile column into the tool, run it and obtain a list of M-numbers ("mid" in the CSV)
  • QidDepicts: The Wikidata Q-numbers (Qids) of the things that are depicted in the files (in case P180 is used). Use one Qid per row. See the yellow, green, blue etc. rows (in the Excel) for examples of multiple values for the same file.

In the Excel, the column DepictsLabelForReconciliation is a helper column, as input for reconciliation via OpenRefine, to find the Qids that correspond to the labels in that column. This column is not used in the script.

Of course, you can modify the variable names in the script and the columns names in the Excel to your own needs/taste.

Additionally, in the Python script itself:

  • the target property to add Qids to must be set, the default is "P180"
  • Wikimedia credentials: You can specify your Wikimedia username and password in the USER and PASS variables. As these are not prefilled in the script, the .py file has a "_nopasswd.py" suffix. If the USER and PASS variables are left blank, or if incorrect (eg. you entered an incorrect password), edits will still be done, but will be shown as done from your IP address.

Known bugs & issues[edit]

  • This script is still in beta, it has not been not fully tested and/or is 100% reliable. You might encounter some errors when running it, especially for target properties other than P180 and messy Excel-inputs. Always test with a small number of Excel rows (for i in range(0,3):) to make sure everything works as expected before running the full sheet (for i in range(0,len(df2)):)
  • Let me know any bugs or problems you've encountered.

Features to add[edit]

  • Make it availabe on PAWS (Wikimedia Jypyter notebook cloud service)
  • Let me know your feature requests.

Credits[edit]

The script uses the addClaim function in this script by User:Multichill

Reusing this script[edit]

Feel free to reuse, adapt, license = CC0

Contact[edit]

This tool was written by Olaf Janssen, the Wikimedia coordinator of the KB, national library of the Netherlands. Please contact him if you've any questions etc.