English subtitles for clip: File:Wikidata Editing with OpenRefine - Part 3.webm
1 00:00:00,550 --> 00:00:02,800 Welcome back to this tutorial
2 00:00:02,850 --> 00:00:04,225 on using OpenRefine
3 00:00:04,225 --> 00:00:06,050 to import data into Wikidata.
4 00:00:07,700 --> 00:00:09,775 In the previous videos,
5 00:00:09,775 --> 00:00:10,988 we have matched the films
6 00:00:10,988 --> 00:00:12,525 and locations in our table
7 00:00:12,525 --> 00:00:15,375 to items.
8 00:00:13,883 --> 00:00:16,883 We now want to transform our table into statements
9 00:00:16,882 --> 00:00:19,882 and upload them to Wikidata.
10 00:00:19,988 --> 00:00:22,169 Let's first look at how this information
11 00:00:22,169 --> 00:00:24,350 is typically modelled in Wikidata.
12 00:00:24,986 --> 00:00:26,743 Pick a well-known movie
13 00:00:26,743 --> 00:00:28,500 where we expect to find this information.
14 00:00:29,114 --> 00:00:32,113 We can see that there is a "filming location" property for that.
15 00:00:33,850 --> 00:00:35,575 We review the page of the property
16 00:00:35,575 --> 00:00:37,300 and make sure it fits our needs.
17 00:00:37,695 --> 00:00:40,695 In this case it looks like a perfect fit!
18 00:00:44,867 --> 00:00:46,367 Click the Wikidata button
19 00:00:46,367 --> 00:00:47,383 in the top right corner
20 00:00:47,383 --> 00:00:50,770 and choose "Edit Wikidata schema".
21 00:00:52,010 --> 00:00:54,600 A schema is a template of Wikidata edits
22 00:00:54,600 --> 00:00:56,650 that describes how your tabular data
23 00:00:56,650 --> 00:00:59,610 will be transformed into Wikidata edits.
24 00:01:00,980 --> 00:01:03,415 It works pretty much like the Wikidata interface,
25 00:01:03,415 --> 00:01:06,000 except that you can drag and drop column names
26 00:01:05,950 --> 00:01:07,650 in place of values.
27 00:01:08,947 --> 00:01:11,947 Click "Add item".
28 00:01:12,099 --> 00:01:14,750 The items we want to modify
29 00:01:14,650 --> 00:01:15,725 are the films
30 00:01:15,725 --> 00:01:16,963 which we have reconciled
31 00:01:16,963 --> 00:01:18,200 in the "Title" column.
32 00:01:18,690 --> 00:01:21,690 So drag and drop that column onto the item.
33 00:01:23,850 --> 00:01:25,450 You can see that this column
34 00:01:25,400 --> 00:01:27,625 is underlined in green:
35 00:01:27,625 --> 00:01:29,800 that is because we have reconciled it
36 00:01:29,000 --> 00:01:30,550 to Wikidata.
37 00:01:30,789 --> 00:01:33,019 You can only use reconciled columns
38 00:01:33,019 --> 00:01:35,550 in the inputs where an item is expected.
39 00:01:38,249 --> 00:01:40,249 On each of these items,
40 00:01:40,249 --> 00:01:42,250 we want to add the filming locations.
41 00:01:42,968 --> 00:01:45,009 Drag and drop the street column
42 00:01:45,009 --> 00:01:47,050 into the filming location.
43 00:01:48,559 --> 00:01:50,605 You can get a preview of the edits
44 00:01:50,605 --> 00:01:52,950 generated by the schema in the "Preview" tab.
45 00:01:54,367 --> 00:01:55,570 In the "Issues" tab,
46 00:01:55,570 --> 00:01:56,750 you get some feedback
47 00:01:56,750 --> 00:01:58,580 about the quality of your edits
48 00:01:58,580 --> 00:02:01,100 before they are made.
49 00:02:01,100 --> 00:02:02,940 OpenRefine complains about the fact
50 00:02:02,940 --> 00:02:06,099 that we haven't added any reference
51 00:02:04,200 --> 00:02:05,830 to our statements.
52 00:02:05,830 --> 00:02:08,050 So let's do that.
53 00:02:08,525 --> 00:02:10,288 I'm going to use the URL for the dataset
54 00:02:10,288 --> 00:02:12,050 with a retrieved date.
55 00:02:12,820 --> 00:02:15,540 You can also create an item for the dataset
56 00:02:15,540 --> 00:02:18,794 and use it in the reference if you prefer.
57 00:02:23,501 --> 00:02:25,701 While we are here, why not
58 00:02:25,701 --> 00:02:28,621 add a few qualifiers to the statements?
59 00:02:28,625 --> 00:02:32,625 We have the start and end dates of the shooting
60 00:02:32,641 --> 00:02:36,641 as well as the geographical coordinates.
61 00:03:07,144 --> 00:03:09,524 It is also useful to check whether our additions
62 00:03:09,529 --> 00:03:11,779 will conflict with any existing data
63 00:03:11,779 --> 00:03:12,819 on the items.
64 00:03:13,380 --> 00:03:16,410 We fetch the existing values for the filming locations.
65 00:03:35,208 --> 00:03:38,018 Once fetching has completed, we use a text facet
66 00:03:38,018 --> 00:03:41,041 to inspect what sort of values they are.
67 00:03:42,057 --> 00:03:44,317 Most of these films do not have any
68 00:03:44,317 --> 00:03:45,857 filming location yet,
69 00:03:45,865 --> 00:03:47,345 and when they have one,
70 00:03:47,355 --> 00:03:50,705 it is a much less precise location.
71 00:03:50,874 --> 00:03:53,974 It should not be hard to remove the less precise locations
72 00:03:53,975 --> 00:03:56,335 that are redundant with our additions
73 00:03:56,335 --> 00:03:58,844 once the dataset is uploaded.
74 00:03:58,844 --> 00:04:00,964 So, we're happy with our edits.
75 00:04:01,283 --> 00:04:03,763 Because this is a rather small dataset,
76 00:04:03,763 --> 00:04:06,021 we will upload it directly.
77 00:04:06,021 --> 00:04:07,601 For larger imports,
78 00:04:07,614 --> 00:04:08,954 complicated schemas,
79 00:04:08,965 --> 00:04:11,065 or large-scale creation of new items,
80 00:04:11,073 --> 00:04:13,023 it is good to request feedback
81 00:04:13,023 --> 00:04:15,113 about the import on Wikidata first.
82 00:04:15,250 --> 00:04:19,020 Click "Wikidata", then "Upload edits to Wikidata".
83 00:04:19,927 --> 00:04:21,787 You will need to log in with your
84 00:04:21,796 --> 00:04:22,936 Wikidata account;
85 00:04:22,936 --> 00:04:26,166 this account will be used to make the edits.
86 00:04:26,166 --> 00:04:28,056 Add a meaningful edit summary
87 00:04:28,056 --> 00:04:29,376 to describe your edits.
88 00:04:31,484 --> 00:04:34,073 This is important because it helps other editors
89 00:04:34,073 --> 00:04:35,973 understand what your edits do
90 00:04:35,976 --> 00:04:38,376 when they look at the history of an item.
91 00:04:39,677 --> 00:04:43,977 We can now upload the dataset to Wikidata.
92 00:04:43,998 --> 00:04:46,188 You can check how the upload is going
93 00:04:46,188 --> 00:04:50,428 by looking at your own contributions.
94 00:04:53,710 --> 00:04:55,834 If you notice any issue with an edit,
95 00:04:55,834 --> 00:04:58,604 you can cancel the upload in OpenRefine.
96 00:04:58,614 --> 00:05:01,154 This will stop any further edits
97 00:05:01,159 --> 00:05:05,156 but will not remove the edits already made.
98 00:05:05,156 --> 00:05:06,616 To remove these edits,
99 00:05:06,616 --> 00:05:08,276 click on the "details" link
100 00:05:08,276 --> 00:05:10,206 of any edit in the group.
101 00:05:10,214 --> 00:05:13,334 This will lead you to the edit groups tool
102 00:05:13,354 --> 00:05:17,354 where you can undo the entire edit group easily.
103 00:05:27,178 --> 00:05:29,038 This is the end of the tutorial.
104 00:05:29,040 --> 00:05:33,040 I hope you enjoyed it. Thanks for watching!
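
Supplementary note: the video describes a schema as a template that turns each table row into Wikidata edits, with qualifiers and a reference attached to every statement. The sketch below illustrates that idea in plain Python. It is not OpenRefine's implementation or file format. The `Statement` class, the `SCHEMA` dictionary layout, the `apply_schema` function, the column names "Start", "End" and "Coordinates", the placeholder item names, and the example URL and date are all assumptions made for illustration. Only the "Title" column, the street column and the "filming location" property come from the video.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement to add to an item (hypothetical structure)."""
    subject: str
    prop: str
    value: str
    qualifiers: dict = field(default_factory=dict)
    reference: dict = field(default_factory=dict)


# Hypothetical schema: which column fills which slot of the template.
SCHEMA = {
    "subject_column": "Title",       # reconciled films
    "property": "filming location",
    "value_column": "Street",        # reconciled locations
    "qualifier_columns": {           # assumed column names
        "start time": "Start",
        "end time": "End",
        "coordinate location": "Coordinates",
    },
    "reference": {                   # placeholder values
        "reference URL": "https://example.org/dataset",
        "retrieved": "2018-01-01",
    },
}


def apply_schema(schema, rows):
    """Instantiate the template once per row, skipping incomplete rows."""
    statements = []
    for row in rows:
        subject = row.get(schema["subject_column"])
        value = row.get(schema["value_column"])
        if not subject or not value:
            continue  # a row without a matched film or location yields no edit
        # Only attach qualifiers whose source cell is non-empty.
        qualifiers = {
            prop: row[col]
            for prop, col in schema["qualifier_columns"].items()
            if row.get(col)
        }
        statements.append(
            Statement(subject, schema["property"], value,
                      qualifiers, dict(schema["reference"]))
        )
    return statements


if __name__ == "__main__":
    rows = [
        {"Title": "film item A", "Street": "street item X",
         "Start": "2017-05-01", "End": "2017-05-03", "Coordinates": "48.85,2.35"},
        {"Title": "film item B", "Street": ""},  # no matched location
    ]
    for statement in apply_schema(SCHEMA, rows):
        print(statement)
```

The same reference is copied onto every statement, which mirrors how the dataset URL and retrieved date are set once in the schema and then applied to all rows.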