English subtitles for clip: File:OpenRefine Commons - editing - reconcile columns with Wikidata.webm
1
00:00:07,080 --> 00:00:12,440
This is an OpenRefine project based on some Wikimedia Commons files.

2
00:00:12,440 --> 00:00:15,400
I have already done a bit of work with this project.

3
00:00:15,400 --> 00:00:19,080
So, I already retrieved wikitext from the files.

4
00:00:19,080 --> 00:00:21,360
And I have a short description.

5
00:00:21,360 --> 00:00:23,520
And then I also did some work

6
00:00:23,520 --> 00:00:27,160
to create a column of what the files depict.

7
00:00:27,160 --> 00:00:28,520
Sometimes it's a person.

8
00:00:28,520 --> 00:00:31,840
But sometimes it's also a building like in this case.

9
00:00:31,840 --> 00:00:36,120
And I also have some photos of some notable people in here.

10
00:00:36,120 --> 00:00:39,720
So, some of the people are notable, some of them not.

11
00:00:39,720 --> 00:00:43,400
Later on, I want to add Depicts statements to these files.

12
00:00:43,400 --> 00:00:48,360
In the case where the files show a notable thing,

13
00:00:48,360 --> 00:00:51,360
like a notable person or a notable building.

14
00:00:51,360 --> 00:00:56,640
I want to let this Depicts statement point to the Wikidata item.

15
00:00:56,640 --> 00:00:57,920
And in order to do that,

16
00:00:57,920 --> 00:01:00,680
I need to reconcile this column with Wikidata.

17
00:01:00,680 --> 00:01:01,840
So I need to look up

18
00:01:01,840 --> 00:01:05,920
whether these things that are being depicted in the photos

19
00:01:05,920 --> 00:01:08,840
do have a Wikidata item or not.

20
00:01:08,840 --> 00:01:10,000
I want to figure out

21
00:01:10,000 --> 00:01:13,800
whether the things that are depicted in these photos,

22
00:01:13,800 --> 00:01:16,040
whether the people or the buildings,

23
00:01:16,040 --> 00:01:18,760
whether they have a Wikidata item or not.

24
00:01:18,760 --> 00:01:22,600
And for that, I need to reconcile this column with Wikidata.

25
00:01:22,600 --> 00:01:25,160
This is a familiar operation

26
00:01:25,160 --> 00:01:27,360
for people who have used OpenRefine before

27
00:01:27,360 --> 00:01:29,320
for Wikidata editing.

28
00:01:29,320 --> 00:01:31,360
I am taking this column,

29
00:01:31,360 --> 00:01:33,680
and I'm going to the column menu.

30
00:01:33,680 --> 00:01:37,160
And I say "Reconcile..." - "Start reconciling".

31
00:01:37,160 --> 00:01:40,200
And evidently, I need to choose Wikidata for this.

32
00:01:40,200 --> 00:01:43,080
So, I want to know whether these things

33
00:01:43,080 --> 00:01:44,280
that are being depicted,

34
00:01:44,280 --> 00:01:45,200
these strings,

35
00:01:45,200 --> 00:01:47,760
have corresponding Wikidata items.

36
00:01:47,760 --> 00:01:49,600
I am dealing with a Dutch dataset.

37
00:01:49,600 --> 00:01:52,680
And so I choose the Dutch reconciliation service.

38
00:01:52,680 --> 00:01:55,000
But I could equally choose the English one,

39
00:01:55,000 --> 00:01:56,600
or another language.

40
00:01:56,600 --> 00:02:00,720
I click, and in most cases it will discover

41
00:02:00,720 --> 00:02:02,000
that it is a human being

42
00:02:02,000 --> 00:02:06,120
so I go for, indeed, the option Q5, human being,

43
00:02:06,120 --> 00:02:07,720
although there are a few buildings there.

44
00:02:07,720 --> 00:02:08,560
But that's okay.

45
00:02:08,560 --> 00:02:10,560
I will be able to fix that.

46
00:02:10,560 --> 00:02:13,680
And then I say "Start reconciling...".

47
00:02:13,680 --> 00:02:17,000
Then OpenRefine will reconcile the column for me.

48
00:02:17,000 --> 00:02:17,680
As you can see,

49
00:02:17,680 --> 00:02:21,320
some of the people have not been recognized as such.

50
00:02:21,320 --> 00:02:22,920
So they have not been recognized

51
00:02:22,920 --> 00:02:24,840
as notable people with a Wikidata item.

52
00:02:24,840 --> 00:02:26,280
And that is correct,

53
00:02:26,280 --> 00:02:28,440
because some of these people are not notable.

54
00:02:28,440 --> 00:02:30,360
They are just random people

55
00:02:30,360 --> 00:02:33,640
from the history of the town of Leiden.

56
00:02:33,640 --> 00:02:36,000
But I also have some notable buildings.

57
00:02:36,000 --> 00:02:39,320
And I can then look these up.

58
00:02:39,320 --> 00:02:41,320
This is this city gate.

59
00:02:41,320 --> 00:02:42,280
I can look it up.

60
00:02:42,280 --> 00:02:44,560
And I can reconcile the item here.

61
00:02:44,560 --> 00:02:46,280
I have a different spelling

62
00:02:46,280 --> 00:02:48,120
but I will fix this.

63
00:02:48,120 --> 00:02:49,720
Here we go.

64
00:02:49,720 --> 00:02:50,800
And later on

65
00:02:50,800 --> 00:02:55,600
I also have a few professors from the University of Leiden.

66
00:02:55,600 --> 00:02:57,680
They should definitely be reconciled.

67
00:02:57,680 --> 00:03:00,800
And this specific building should also be reconciled.

68
00:03:00,800 --> 00:03:03,280
It is a notable building.

69
00:03:03,280 --> 00:03:06,040
I'm going back to it.

70
00:03:06,040 --> 00:03:06,880
Here we go.

71
00:03:06,880 --> 00:03:08,400
That's good.

72
00:03:08,400 --> 00:03:10,160
Then I will double check

73
00:03:10,160 --> 00:03:11,760
whether this person is correct.

74
00:03:11,760 --> 00:03:13,160
This is not a correct match

75
00:03:13,160 --> 00:03:16,080
so I will unmatch this.

76
00:03:16,080 --> 00:03:17,000
Same here.

77
00:03:17,000 --> 00:03:19,880
This is not a correct match.

78
00:03:19,880 --> 00:03:23,000
But right here we have a person

79
00:03:23,000 --> 00:03:26,640
who also should be correctly matched.

80
00:03:26,640 --> 00:03:28,480
It is the more recent person.

81
00:03:28,480 --> 00:03:29,680
It is this one.

82
00:03:33,480 --> 00:03:35,040
Well... this is the normal process

83
00:03:35,040 --> 00:03:38,280
to proceed with matching items.

84
00:03:39,080 --> 00:03:43,200
And matching a column in OpenRefine with Wikidata.

85
00:03:43,200 --> 00:03:47,680
And this is the way that I tell my OpenRefine project

86
00:03:47,680 --> 00:03:55,000
that certain of these photographs point to Wikidata items.
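
The reconciliation step shown in the video can be pictured, very roughly, as a batch of lookups sent to a reconciliation service. The sketch below is not what OpenRefine runs internally; it only illustrates the general shape of such a request, assuming the public Wikidata reconciliation endpoint and the query fields (`query`, `type`, `limit`) and the `match` flag of the Reconciliation Service API. The endpoint URL, the function names, and the sample cell values are assumptions introduced for illustration and do not come from the video.

```python
import json

# Assumption: Dutch-language Wikidata reconciliation endpoint (not named in the video).
ENDPOINT = "https://wikidata.reconci.link/nl/api"


def build_queries(names, type_id="Q5", limit=3):
    """Build one reconciliation query per cell value, keyed q0, q1, ...

    type_id restricts candidates to a Wikidata class; Q5 is "human",
    the type chosen in the video.
    """
    return {
        f"q{i}": {"query": name, "type": type_id, "limit": limit}
        for i, name in enumerate(names)
    }


def pick_match(candidates):
    """Return the first candidate the service flags as a confident match, else None."""
    for candidate in candidates:
        if candidate.get("match"):
            return candidate
    return None


if __name__ == "__main__":
    # Hypothetical cell values from a "depicts" column.
    queries = build_queries(["Example Person", "Example City Gate"])
    # The batch would be sent to ENDPOINT as the form field "queries",
    # serialized as JSON. No network call is made in this sketch.
    print(json.dumps(queries, indent=2))
```

A cell for which `pick_match` returns `None` stays unmatched, which corresponds to the non-notable people in the video; such cells, and wrong automatic matches, still need the manual review shown above.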