English subtitles for clip: File:OpenRefine Commons - editing - retrieve structured data from Commons files.webm

1
00:00:06,560 --> 00:00:12,000
I have an OpenRefine project here, 
based on files on Wikimedia Commons.

2
00:00:12,000 --> 00:00:13,920
I am interested in finding out

3
00:00:13,920 --> 00:00:16,760
whether the files that I have selected here

4
00:00:16,760 --> 00:00:19,240
already have certain structured data statements.

5
00:00:19,240 --> 00:00:23,720
So that I'm sure that I'm not 
adding duplicates of them.

6
00:00:23,720 --> 00:00:27,080
And just to explore what is already there.

7
00:00:27,080 --> 00:00:30,240
In practice I'm interested in finding out:

8
00:00:30,240 --> 00:00:33,760
whether these files already 
have a Depicts statement,

9
00:00:33,760 --> 00:00:36,040
whether they already have a Creator,

10
00:00:36,040 --> 00:00:37,080
and a Collection,

11
00:00:37,080 --> 00:00:41,000
because I would be interested in 
adding this information to the files.

12
00:00:41,000 --> 00:00:45,960
How do I check in OpenRefine whether 
these files already have this information?

13
00:00:45,960 --> 00:00:49,760
I can create columns with that structured data.

14
00:00:49,760 --> 00:00:51,320
I do that as follows.

15
00:00:51,320 --> 00:00:56,680
I go to the file column menu.

16
00:00:56,680 --> 00:01:00,120
The files must already be 
reconciled with Wikimedia Commons.

17
00:01:00,120 --> 00:01:02,240
And you can see that that has happened

18
00:01:02,240 --> 00:01:05,519
if the column has a dark green line,

19
00:01:05,519 --> 00:01:07,080
if the file names are blue,

20
00:01:07,080 --> 00:01:12,400
and you can click on them 
and open them in a new tab,

21
00:01:12,400 --> 00:01:15,880
and, if you have the Wikimedia Commons 
extension installed in OpenRefine,

22
00:01:15,880 --> 00:01:19,360
you should also see thumbnails of the files.

23
00:01:19,360 --> 00:01:23,880
I am selecting the menu of the file column.

24
00:01:23,880 --> 00:01:30,640
I select the option "Edit column..." - 
"Add columns from reconciled values".

25
00:01:30,640 --> 00:01:36,440
Then OpenRefine will present me with some 
options of structured data that I can retrieve.

26
00:01:36,440 --> 00:01:41,280
As I said, I was interested in 
the Collection of the files.

27
00:01:41,280 --> 00:01:44,000
I was also interested in the Creator,

28
00:01:44,000 --> 00:01:47,680
and whether they have a Depicts statement.

29
00:01:47,680 --> 00:01:50,240
In the preview you can already see that.

30
00:01:50,240 --> 00:01:54,400
It shows me in advance some 
things that I'm interested in.

31
00:01:54,400 --> 00:01:58,160
Let's say... I'm also clicking on Inception.

32
00:01:58,160 --> 00:02:00,720
But then I decide I'm actually not interested

33
00:02:00,720 --> 00:02:03,920
in seeing whether these files 
have an Inception statement.

34
00:02:03,920 --> 00:02:07,120
Then I can remove this option again.

35
00:02:07,120 --> 00:02:09,199
I click "OK".

36
00:02:09,199 --> 00:02:13,880
And then OpenRefine will load columns 
with that structured data for me.

37
00:02:13,880 --> 00:02:19,040
And, as you can see, there is no 
information at all about the collection.

38
00:02:19,040 --> 00:02:22,640
There's no structured data about the Collection yet.

39
00:02:22,640 --> 00:02:26,960
But all the files already have Creator statements.

40
00:02:26,960 --> 00:02:29,680
But none of them has a Depicts statement.
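
The same check can be sketched outside OpenRefine. The Python below is a minimal illustration, not the method used in the video: it assumes each file's structured data is available as a dictionary with a `statements` key mapping property IDs to lists of statements (modeled on the JSON that the Commons `wbgetentities` API returns for MediaInfo entities). The function name `statement_presence` and the sample entity are hypothetical. The property IDs are P180 (Depicts), P170 (Creator) and P195 (Collection).

```python
# Property IDs for the three statements of interest.
PROPERTIES = {"P180": "Depicts", "P170": "Creator", "P195": "Collection"}


def statement_presence(entity, properties=PROPERTIES):
    """Return {label: True/False} saying whether each property has a statement."""
    statements = entity.get("statements") or {}
    # Assumption: a file without any statements may carry an empty list
    # instead of a dictionary, so anything that is not a dict counts as empty.
    if not isinstance(statements, dict):
        statements = {}
    return {label: bool(statements.get(pid)) for pid, label in properties.items()}


# Hypothetical sample: a file that has a Creator statement only,
# like the files in the project shown above.
sample = {
    "id": "M12345",
    "statements": {"P170": [{"mainsnak": {"snaktype": "somevalue"}}]},
}

print(statement_presence(sample))
# {'Depicts': False, 'Creator': True, 'Collection': False}
```

A result of `False` corresponds to an empty cell in the column that OpenRefine adds for that property.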