Commons:Bots/Requests/SLiuBot


Operator: Stevenliuyi (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)

Bot's tasks for which permission is being sought: Update COVID-19 cases data from STC COVID-19 Dataset as Tabular Data

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Daily

Maximum edit rate (e.g. edits per minute): Pywikibot default

Bot flag requested: (Y/N): N

Programming language(s): Python (Pywikibot)

Stevenliuyi (talk) 13:29, 6 December 2020 (UTC)

Discussion

I am planning to import a COVID-19 cases dataset (https://github.com/stccenter/COVID-19-Data, under CC-BY-4.0) as tabular data on Commons. The dataset covers COVID-19 cases in subnational divisions across the world. The data are collected, validated and curated by the NSF Spatiotemporal Innovation Center, which is jointly operated by George Mason, Harvard and UCSB. For more information about the dataset, see the corresponding paper, which is freely available. (COI disclosure: I am collaborating with them on this project and am one of the authors of the paper, though I am not a member of the institution.)

I have done a test run, which produced per-region tables such as Data:Sandbox/Stevenliuyi/COVID-19/IT/Q1210.tab, Data:Sandbox/Stevenliuyi/COVID-19/US/Q104994.tab and Data:Sandbox/Stevenliuyi/COVID-19/CN/Q46862.tab. There are also summary tables that record the file names of the individual tables, such as Data:Sandbox/Stevenliuyi/COVID-19/Summary.tab, Data:Sandbox/Stevenliuyi/COVID-19/US/Summary CA.tab and Data:Sandbox/Stevenliuyi/COVID-19/CN/Summary.tab.

As a side note, I also plan to link all the tabular data from Wikidata through d:P:P8204 after the dataset is imported. --Stevenliuyi (talk) 13:29, 6 December 2020 (UTC)
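The naming pattern visible in the test-run pages above (a country code folder followed by a Wikidata item ID) can be sketched as follows. This is only an illustration inferred from the example titles, not the bot's actual code: the function names are hypothetical, and the two-column summary layout is an assumption, since the thread only says that summary tables record the file names of the individual tables.

```python
# Prefix taken from the test-run page titles quoted in the request.
SANDBOX_ROOT = "Data:Sandbox/Stevenliuyi/COVID-19"


def region_page_title(country_code: str, qid: str) -> str:
    """Build the title of a per-region table, e.g. .../IT/Q1210.tab.

    Hypothetical helper: the pattern is inferred from the example titles.
    """
    # A Wikidata item ID is the letter Q followed by digits.
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError(f"not a Wikidata item ID: {qid!r}")
    return f"{SANDBOX_ROOT}/{country_code.upper()}/{qid}.tab"


def summary_rows(country_code: str, qids: list[str]) -> list[list[str]]:
    """Return one [item ID, page title] row per region.

    The two-column layout is an assumption made for illustration only.
    """
    return [[qid, region_page_title(country_code, qid)] for qid in qids]


if __name__ == "__main__":
    for row in summary_rows("IT", ["Q1210"]):
        print(row)
```

Keying each table by its Wikidata item ID keeps titles stable when a region's name is spelled differently across sources, and it makes the later Wikidata linking step a direct lookup.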

Where is the license statement located? --EugeneZelenko (talk) 15:55, 6 December 2020 (UTC)
The license statement is at the end of the paper, in the “Data availability statement”. --Stevenliuyi (talk) 18:36, 6 December 2020 (UTC)
The version is not specified. Should 1.0 be assumed? A link to the article must be in the Permission field of the uploaded files. --EugeneZelenko (talk) 15:35, 7 December 2020 (UTC)
It actually is CC-BY 4.0. Since the version is not specified in the paper, I asked the database maintainer to add a license statement with the specific version to the GitHub repository (right now there is a license badge just below the title, and also a statement at the end of the page). It seems Tabular Data only has a license field and a sources field. I have filled in both: CC-BY 4.0 is specified in the license field, and links to both GitHub and the paper are included in the sources field. This information is displayed in all updated files. Is that okay? --Stevenliuyi (talk) 20:25, 7 December 2020 (UTC)
The GitHub repository link should be enough, since the data is imported from there. --EugeneZelenko (talk) 14:52, 8 December 2020 (UTC)
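For readers unfamiliar with the license and sources fields discussed above, here is a minimal sketch of what a Tabular Data page body could look like. It assumes the usual top-level keys of the Commons tabular format (license, description, sources, schema, data). The column names "date" and "confirmed", the function name and the sample row are hypothetical placeholders, not taken from the actual dataset or from the bot.

```python
import json

# Repository URL as given in the request; the license identifier is written
# in the form the tabular format is assumed to expect.
SOURCE_URL = "https://github.com/stccenter/COVID-19-Data"
LICENSE_ID = "CC-BY-4.0"

# Hypothetical columns, chosen only to make the example concrete.
FIELDS = [
    {"name": "date", "type": "string", "title": {"en": "Date"}},
    {"name": "confirmed", "type": "number", "title": {"en": "Confirmed cases"}},
]


def build_tabular_page(rows: list[list], description: str) -> str:
    """Serialize rows into the JSON body of a .tab page (illustrative only)."""
    # Every row must supply exactly one value per declared column.
    for row in rows:
        if len(row) != len(FIELDS):
            raise ValueError(f"row has {len(row)} values, expected {len(FIELDS)}")
    page = {
        "license": LICENSE_ID,
        "description": {"en": description},
        "sources": SOURCE_URL,
        "schema": {"fields": FIELDS},
        "data": rows,
    }
    return json.dumps(page, ensure_ascii=False, indent=2)


if __name__ == "__main__":
    # Placeholder row, not real case data.
    print(build_tabular_page([["2020-01-01", 0]], "Example region"))
```

Because the format carries license and sources as page-level fields, the attribution travels with every table rather than living on a separate file description page.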

Approved. --Krd 11:30, 23 December 2020 (UTC)