English subtitles for clip: File:OpenRefine Commons - editing - extract values from template parameters.webm

1
00:00:07,640 --> 00:00:12,120
I have an OpenRefine project 
with Wikimedia Commons files

2
00:00:12,120 --> 00:00:16,360
and I have previously retrieved 
wikitext from those files.

3
00:00:16,360 --> 00:00:19,040
So, I have a column with wikitext.

4
00:00:19,040 --> 00:00:23,080
In this wikitext, there is 
always a short description

5
00:00:23,080 --> 00:00:26,720
which I want to reuse as the caption of my file.

6
00:00:26,720 --> 00:00:30,040
So I am interested in retrieving this description.

7
00:00:30,040 --> 00:00:34,360
Now, I could do this with 
normal operations in OpenRefine:

8
00:00:34,360 --> 00:00:38,880
splitting the column, for instance, 
or using generic GREL commands,

9
00:00:38,880 --> 00:00:42,400
if I am skilled enough to do that.

10
00:00:42,400 --> 00:00:47,480
But since I have the Wikimedia Commons 
extension installed in OpenRefine,

11
00:00:47,480 --> 00:00:50,320
I can also use a specialized GREL expression

12
00:00:50,320 --> 00:00:52,040
that makes this very easy.

13
00:00:52,040 --> 00:00:53,240
I do this as follows:

14
00:00:53,240 --> 00:00:56,080
I go to the wikitext column itself.

15
00:00:56,080 --> 00:00:59,240
I select the top menu of that column.

16
00:00:59,240 --> 00:01:04,480
I say "Edit column..." - "Add 
column based on this column",

17
00:01:04,480 --> 00:01:09,120
and then I can use a specific piece of GREL.

18
00:01:09,120 --> 00:01:11,200
It looks like this:

19
00:01:11,200 --> 00:01:13,400
extractFromTemplate

20
00:01:13,400 --> 00:01:17,840
and then it has a certain syntax 
in which I need to specify:

21
00:01:17,840 --> 00:01:21,320
first, the name of the template

22
00:01:21,320 --> 00:01:24,400
from which I want to extract the data.

23
00:01:24,400 --> 00:01:29,360
In this case it is the "Photograph" template.

24
00:01:29,360 --> 00:01:31,960
And then, as a second parameter,

25
00:01:31,960 --> 00:01:38,000
I need to indicate the parameter

26
00:01:38,000 --> 00:01:41,680
from which I want to extract 
the information in the template,

27
00:01:41,680 --> 00:01:43,080
in this case "Description".

28
00:01:43,080 --> 00:01:45,240
So, this is already correct.

29
00:01:45,240 --> 00:01:47,280
I will preview this.

30
00:01:47,280 --> 00:01:50,640
As you can see, it indeed produces, for me,

31
00:01:50,640 --> 00:01:58,120
the value that is inside that 
specific parameter of the template.

32
00:01:58,120 --> 00:01:59,960
And I will give my column a name,

33
00:01:59,960 --> 00:02:05,200
"description". Here we go.

34
00:02:05,200 --> 00:02:07,960
I click "OK", and then very quickly

35
00:02:07,960 --> 00:02:11,680
OpenRefine gives me a column with the description.

36
00:02:11,680 --> 00:02:13,720
As you can see in this case,

37
00:02:13,720 --> 00:02:17,920
the description is still 
surrounded by language tags

38
00:02:17,920 --> 00:02:20,680
that are often used on Wikimedia Commons.

39
00:02:20,680 --> 00:02:22,960
But I can easily remove these

40
00:02:22,960 --> 00:02:26,120
with general OpenRefine features,

41
00:02:26,120 --> 00:02:29,120
like Find and Replace.
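
To illustrate the two steps shown in the clip (reading one parameter out of a template, then removing the surrounding language tag), here is a minimal Python sketch. It is not the Commons extension's implementation and it is not GREL; it only mimics the idea under simple assumptions. The function names `split_template_body`, `extract_from_template` and `strip_language_tag` are hypothetical, and the sample wikitext is invented for illustration. Inside OpenRefine, the function demonstrated in the clip is named extractFromTemplate; its exact call syntax is not spelled out in these subtitles.

```python
import re


def split_template_body(wikitext, start):
    """Split a template body, beginning at `start`, on its top-level pipes."""
    parts, current, depth, i = [], [], 0, start
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1  # entering a nested template or link
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            if depth == 0:
                break  # closing braces of the template being read
            depth -= 1
            current.append(pair)
            i += 2
        elif wikitext[i] == "|" and depth == 0:
            parts.append("".join(current))  # a top-level pipe ends a parameter
            current = []
            i += 1
        else:
            current.append(wikitext[i])
            i += 1
    parts.append("".join(current))
    return parts


def extract_from_template(wikitext, template, parameter):
    """Return the value of `parameter` for each use of `template` (case-insensitive)."""
    opener = re.compile(r"\{\{\s*" + re.escape(template) + r"\s*\|", re.IGNORECASE)
    values = []
    for match in opener.finditer(wikitext):
        for part in split_template_body(wikitext, match.end()):
            name, sep, value = part.partition("=")
            if sep and name.strip().lower() == parameter.lower():
                values.append(value.strip())
    return values


# Assumption: the language tag wraps the whole value, e.g. {{en|1=...}} or {{en|...}}.
LANG_TAG = re.compile(r"^\{\{\s*[a-z-]{2,}\s*\|\s*(?:1\s*=\s*)?(.*)\}\}$", re.DOTALL)


def strip_language_tag(text):
    """Remove one surrounding language tag; return the text unchanged if there is none."""
    match = LANG_TAG.match(text.strip())
    return match.group(1).strip() if match else text


# Invented sample wikitext, for illustration only.
sample = """=={{int:filedesc}}==
{{Photograph
 |photographer = Unknown
 |description  = {{en|1=A bridge over the river}}
 |date         = 1920
}}"""

descriptions = extract_from_template(sample, "Photograph", "Description")
captions = [strip_language_tag(d) for d in descriptions]
print(captions)
```

The sketch only splits on pipes outside nested templates and links, so a value such as a language tag is kept whole; real wikitext has more edge cases (comments, nowiki sections, unnamed parameters) that it does not handle.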