English subtitles for clip: File:OpenRefine Commons - editing - create schema.webm
1
00:00:06,080 --> 00:00:09,360
In order to prepare my upload to Wikimedia Commons

2
00:00:09,360 --> 00:00:11,360
of extra data about these files,

3
00:00:11,360 --> 00:00:12,920
structured data,

4
00:00:12,920 --> 00:00:15,840
I have to prepare a schema inside OpenRefine.

5
00:00:15,840 --> 00:00:17,960
And this is a function you will be familiar with

6
00:00:17,960 --> 00:00:22,240
when you've done edits to Wikidata with OpenRefine as well.

7
00:00:22,240 --> 00:00:25,600
You can access the schema through the Wikibase menu.

8
00:00:25,600 --> 00:00:27,520
Select "Edit Wikibase schema...".

9
00:00:27,520 --> 00:00:29,960
And then you will get a familiar empty screen.

10
00:00:30,520 --> 00:00:32,960
By default, it will be set to Wikidata.

11
00:00:32,960 --> 00:00:35,680
"Target Wikibase instance": Wikidata.

12
00:00:35,680 --> 00:00:38,880
But since we will be editing files on Wikimedia Commons,

13
00:00:38,880 --> 00:00:42,040
we need to set this to Wikimedia Commons.

14
00:00:42,040 --> 00:00:44,880
And it will also confirm here

15
00:00:44,880 --> 00:00:47,480
that you will transform your data

16
00:00:47,480 --> 00:00:50,080
into Wikimedia Commons edits.

17
00:00:50,080 --> 00:00:53,960
Now I have prepared a few columns that I want to work with.

18
00:00:53,960 --> 00:00:57,720
I just want to add a little bit of structured data to my files.

19
00:00:57,720 --> 00:00:59,440
Namely: a Depicts statement.

20
00:00:59,440 --> 00:01:03,360
And some captions: File Captions.

21
00:01:03,360 --> 00:01:07,920
And the source. I want to link to the source of these files.

22
00:01:07,920 --> 00:01:10,440
So I click on "Add media".

23
00:01:10,440 --> 00:01:13,920
And first I have to tell OpenRefine

24
00:01:13,920 --> 00:01:16,200
that it needs to edit these specific files.

25
00:01:16,200 --> 00:01:18,760
So I take my file column,

26
00:01:18,760 --> 00:01:21,840
which has been reconciled with Wikimedia Commons.

27
00:01:21,840 --> 00:01:23,040
And I go to the schema.

28
00:01:23,040 --> 00:01:24,400
And I drag this here

29
00:01:24,400 --> 00:01:28,200
into the thing that needs to be edited.

30
00:01:28,200 --> 00:01:31,320
Now, there's various other things that need to be dragged here.

31
00:01:31,320 --> 00:01:32,760
There is the file path.

32
00:01:32,760 --> 00:01:34,520
That is only used in the case

33
00:01:34,520 --> 00:01:37,200
when you upload new files to Wikimedia Commons.

34
00:01:37,200 --> 00:01:40,600
So in this case I am editing existing files.

35
00:01:40,600 --> 00:01:43,680
And I do not need to drag anything here.

36
00:01:43,680 --> 00:01:46,960
I also drag the file name.

37
00:01:46,960 --> 00:01:49,520
That is actually the file that we are using.

38
00:01:49,520 --> 00:01:52,840
And if I want to change the wikitext

39
00:01:52,840 --> 00:01:54,360
that is present for these files.

40
00:01:54,360 --> 00:01:55,840
So, the files have wikitext

41
00:01:55,840 --> 00:01:57,720
which is in the column here.

42
00:01:57,720 --> 00:01:59,240
If I want to change it,

43
00:01:59,240 --> 00:02:01,120
then I could take a column

44
00:02:01,120 --> 00:02:02,920
and drag that here.

45
00:02:02,920 --> 00:02:06,280
And I could actually also overwrite existing wikitext

46
00:02:06,280 --> 00:02:07,040
if I wanted to.

47
00:02:07,040 --> 00:02:09,919
But that's not something I'm interested in here.

48
00:02:09,919 --> 00:02:12,600
I do want to add file captions.

49
00:02:12,600 --> 00:02:18,080
So I have extracted some short descriptions or captions

50
00:02:18,080 --> 00:02:19,600
from the wikitext,

51
00:02:19,600 --> 00:02:20,600
looking like this.

52
00:02:20,600 --> 00:02:23,800
And I want to add these as file captions in Dutch.

53
00:02:23,800 --> 00:02:25,600
In my native language Dutch.

54
00:02:25,600 --> 00:02:28,120
So I choose Dutch as a language.

55
00:02:28,120 --> 00:02:31,520
And then I drag this column here.

56
00:02:31,520 --> 00:02:32,960
I can also choose

57
00:02:32,960 --> 00:02:35,680
whether or not I want to overwrite existing captions,

58
00:02:35,680 --> 00:02:36,560
if they are present.

59
00:02:36,560 --> 00:02:38,480
But I will only add captions

60
00:02:38,480 --> 00:02:40,200
if there are none yet.

61
00:02:40,200 --> 00:02:43,520
And then I also want to add a few statements.

62
00:02:43,520 --> 00:02:45,640
For instance Depicts.

63
00:02:45,640 --> 00:02:47,320
What the files depict.

64
00:02:47,320 --> 00:02:51,040
I'm searching for the Depicts property.

65
00:02:51,040 --> 00:02:53,480
And then I can drag the reconciled column

66
00:02:53,480 --> 00:02:55,720
of depicted things.

67
00:02:55,720 --> 00:02:57,560
You will see that I have some values

68
00:02:57,560 --> 00:02:59,920
that are being reconciled here.

69
00:02:59,920 --> 00:03:02,640
And so I drag these.

70
00:03:02,640 --> 00:03:05,880
And then I want to add some other statements as well.

71
00:03:05,880 --> 00:03:10,760
I am also interested in adding the collection.

72
00:03:10,760 --> 00:03:17,720
So the collection is "Erfgoed Leiden en omstreken".

73
00:03:17,720 --> 00:03:19,720
That's this one.

74
00:03:19,720 --> 00:03:22,960
And I also want to add the source of the file.

75
00:03:22,960 --> 00:03:26,120
Now I have to add the source of the file

76
00:03:26,120 --> 00:03:29,480
using the proper data modeling conventions.

77
00:03:29,480 --> 00:03:31,720
And I have looked these up on Wikimedia Commons.

78
00:03:31,720 --> 00:03:33,440
I know how to do this.

79
00:03:33,440 --> 00:03:36,840
So I have a URL here in my data set.

80
00:03:36,840 --> 00:03:39,960
That is basically the URL where I can find this file.

81
00:03:39,960 --> 00:03:40,920
And I want to make sure

82
00:03:40,920 --> 00:03:43,440
that that is present in the structured data.

83
00:03:43,440 --> 00:03:48,320
So I say "Source of file".

84
00:03:48,320 --> 00:03:54,400
And then I have to use "File on the internet"...

85
00:03:54,400 --> 00:03:56,880
"File available on the internet".

86
00:03:56,880 --> 00:04:00,320
And, by data modeling conventions,

87
00:04:00,320 --> 00:04:05,800
this has a few qualifiers.

88
00:04:05,800 --> 00:04:06,840
I will say:

89
00:04:06,840 --> 00:04:17,360
"URL" or "Described at URL".

90
00:04:17,360 --> 00:04:20,160
Then I drag the source URL here.

91
00:04:20,160 --> 00:04:21,079
But I also say:

92
00:04:21,079 --> 00:04:29,600
the operator is the website of Heritage Leiden.

93
00:04:32,440 --> 00:04:33,080
Here we go.

94
00:04:33,080 --> 00:04:35,320
So I have added a few extra statements

95
00:04:35,320 --> 00:04:38,360
that will become structured data for my file.

96
00:04:38,360 --> 00:04:39,720
Now I can check

97
00:04:39,720 --> 00:04:41,400
whether there are any issues.

98
00:04:41,400 --> 00:04:46,240
I see that there is some duplicate whitespace in some file names.

99
00:04:46,240 --> 00:04:50,280
But that is not something to worry about.

100
00:04:50,280 --> 00:04:55,520
And, yeah, this is also something to not worry about.

101
00:04:55,520 --> 00:04:57,040
And then I go to the preview.

102
00:04:57,040 --> 00:04:58,880
And I see that all of my files

103
00:04:58,880 --> 00:05:01,840
will get the Source statement.

104
00:05:01,840 --> 00:05:03,160
And the collection statement.

105
00:05:03,160 --> 00:05:04,200
And some of them,

106
00:05:04,200 --> 00:05:07,720
when the Depicts column has been reconciled,

107
00:05:07,720 --> 00:05:10,720
will also get a Depicts statement.

108
00:05:10,720 --> 00:05:16,600
And this looks good to me.
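
Note (not part of the spoken subtitles): the schema described in the clip can be sketched as plain data, which may help when checking the preview. This is only an illustrative sketch, not OpenRefine's own schema format. The column names (`file`, `caption_nl`, `depicts_qid`, `collection_qid`, `source_url`, `operator_qid`) and the function `build_edit` are hypothetical. The property and item IDs are what I believe Wikimedia Commons uses for these labels and should be verified before use.

```python
# Illustrative sketch only: models the edits described in the clip as plain dicts.
# All column names below are hypothetical; IDs should be checked on Wikimedia Commons.

DEPICTS = "P180"                  # "depicts"
COLLECTION = "P195"               # "collection"
SOURCE_OF_FILE = "P7482"          # "source of file"
DESCRIBED_AT_URL = "P973"         # "described at URL"
OPERATOR = "P137"                 # "operator"
FILE_ON_INTERNET = "Q74228490"    # assumed ID for "file available on the internet"


def build_edit(row):
    """Return a dict describing the structured data intended for one file row."""
    edit = {"file": row["file"], "captions": {}, "statements": []}

    # Dutch caption, only when the row actually has one.
    if row.get("caption_nl"):
        edit["captions"]["nl"] = row["caption_nl"]

    # Depicts statement, only when the depicts cell was reconciled.
    if row.get("depicts_qid"):
        edit["statements"].append({"property": DEPICTS, "value": row["depicts_qid"]})

    # Every file gets the collection statement.
    edit["statements"].append({"property": COLLECTION, "value": row["collection_qid"]})

    # Every file gets the source statement with its two qualifiers.
    edit["statements"].append({
        "property": SOURCE_OF_FILE,
        "value": FILE_ON_INTERNET,
        "qualifiers": [
            {"property": DESCRIBED_AT_URL, "value": row["source_url"]},
            {"property": OPERATOR, "value": row["operator_qid"]},
        ],
    })
    return edit
```

As in the preview shown in the clip, a row without a reconciled Depicts value still yields the collection and source statements. The "only add captions if there are none yet" option is not modeled here, since it depends on the existing state of each file on Wikimedia Commons.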