English subtitles for clip: File:OpenRefine Commons - editing - extract values from template parameters.webm
1
00:00:07,640 --> 00:00:12,120
I have an OpenRefine project with Wikimedia Commons files

2
00:00:12,120 --> 00:00:16,360
and I have previously retrieved wikitext from those files.

3
00:00:16,360 --> 00:00:19,040
So, I have a column with wikitext.

4
00:00:19,040 --> 00:00:23,080
In this wikitext, there is always a short description

5
00:00:23,080 --> 00:00:26,720
which I want to reuse as the caption of my file.

6
00:00:26,720 --> 00:00:30,040
So, I am interested in retrieving this description.

7
00:00:30,040 --> 00:00:34,360
Now, I could do this with normal operations in OpenRefine:

8
00:00:34,360 --> 00:00:38,880
splitting the column, for instance, or using generic GREL commands,

9
00:00:38,880 --> 00:00:42,400
if I am skilled enough to do that.

10
00:00:42,400 --> 00:00:47,480
But since I have the Wikimedia Commons extension installed in OpenRefine,

11
00:00:47,480 --> 00:00:50,320
I can also use a specialized GREL expression

12
00:00:50,320 --> 00:00:52,040
that makes this very easy.

13
00:00:52,040 --> 00:00:53,240
I do this as follows:

14
00:00:53,240 --> 00:00:56,080
I go to the wikitext column itself.

15
00:00:56,080 --> 00:00:59,240
I open the menu at the top of that column.

16
00:00:59,240 --> 00:01:04,480
I choose "Edit column..." - "Add column based on this column",

17
00:01:04,480 --> 00:01:09,120
and then I can use a specific piece of GREL.

18
00:01:09,120 --> 00:01:11,200
It looks like this:

19
00:01:11,200 --> 00:01:13,400
extractFromTemplate

20
00:01:13,400 --> 00:01:17,840
and then it has a certain syntax in which I need to specify:

21
00:01:17,840 --> 00:01:21,320
first, the name of the template

22
00:01:21,320 --> 00:01:24,400
from which I want to extract the data.

23
00:01:24,400 --> 00:01:29,360
In this case it is the "Photograph" template.

24
00:01:29,360 --> 00:01:31,960
And then, as a second parameter,

25
00:01:31,960 --> 00:01:38,000
I need to indicate the template parameter

26
00:01:38,000 --> 00:01:41,680
from which I want to extract the information,

27
00:01:41,680 --> 00:01:43,080
in this case "Description".

28
00:01:43,080 --> 00:01:45,240
So, this is already correct.

29
00:01:45,240 --> 00:01:47,280
I will preview this.

30
00:01:47,280 --> 00:01:50,640
As you can see, it indeed produces, for me,

31
00:01:50,640 --> 00:01:58,120
the value that is inside that specific parameter of the template.

32
00:01:58,120 --> 00:01:59,960
And I will give my column a name,

33
00:01:59,960 --> 00:02:05,200
"description". Here we go.

34
00:02:05,200 --> 00:02:07,960
I click "OK", and then very quickly

35
00:02:07,960 --> 00:02:11,680
OpenRefine gives me a column with the description.

36
00:02:11,680 --> 00:02:13,720
As you can see in this case,

37
00:02:13,720 --> 00:02:17,920
the description is still surrounded by language tags

38
00:02:17,920 --> 00:02:20,680
that are often used on Wikimedia Commons.

39
00:02:20,680 --> 00:02:22,960
But I can easily remove these

40
00:02:22,960 --> 00:02:26,120
with general OpenRefine functionality,

41
00:02:26,120 --> 00:02:29,120
like Find and Replace.
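
The two steps described in the subtitles (pulling one parameter out of a named template, then removing the surrounding language tag) can be illustrated outside OpenRefine. The following is a minimal Python sketch, not the Commons extension's implementation and not GREL. The function names `extract_from_template` and `strip_language_tag` and the sample wikitext are hypothetical, and it assumes that "language tags" means wrappers of the form `{{en|1=...}}`.

```python
import re


def extract_from_template(wikitext, template, param):
    """Return the value of `param` for each `{{template ...}}` call in wikitext.

    Hypothetical helper for illustration; matching is case-insensitive.
    """
    results = []
    start = re.compile(r"\{\{\s*" + re.escape(template) + r"\s*(?=[|}])", re.IGNORECASE)
    for match in start.finditer(wikitext):
        i = match.end()
        depth = 0  # nesting depth of {{...}} and [[...]] inside the template
        parts, current = [], []
        while i < len(wikitext):
            two = wikitext[i:i + 2]
            if two in ("{{", "[["):
                depth += 1
                current.append(two)
                i += 2
            elif two in ("}}", "]]") and depth > 0:
                depth -= 1
                current.append(two)
                i += 2
            elif two == "}}":
                break  # closing braces of the template itself
            elif wikitext[i] == "|" and depth == 0:
                # A top-level pipe separates two template parameters.
                parts.append("".join(current))
                current = []
                i += 1
            else:
                current.append(wikitext[i])
                i += 1
        parts.append("".join(current))
        for part in parts:
            name, sep, value = part.partition("=")
            if sep and name.strip().lower() == param.lower():
                results.append(value.strip())
    return results


def strip_language_tag(text):
    """Remove a wrapper such as {{en|1=...}} (assumed form of a language tag)."""
    text = text.strip()
    wrapper = re.fullmatch(
        r"\{\{\s*[a-z-]{2,12}\s*\|\s*(?:1\s*=\s*)?(.*?)\s*\}\}", text, re.DOTALL
    )
    return wrapper.group(1) if wrapper else text


# Made-up sample wikitext, for illustration only.
sample = """=={{int:filedesc}}==
{{Photograph
 |photographer = Unknown
 |description = {{en|1=A windmill near the river}}
 |date = 1920
}}"""

raw = extract_from_template(sample, "Photograph", "Description")
print(raw)                         # ['{{en|1=A windmill near the river}}']
print(strip_language_tag(raw[0]))  # A windmill near the river
```

The sketch tracks brace depth so that the pipe inside the nested `{{en|...}}` wrapper is not mistaken for a parameter separator. In OpenRefine itself, the second step is done with Find and Replace, as the subtitles say.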