English subtitles for clip: File:OpenRefine Commons - editing - reconcile columns with Wikidata.webm

From Wikimedia Commons, the free media repository
1
00:00:07,080 --> 00:00:12,440
This is an OpenRefine project based 
on some Wikimedia Commons files.

2
00:00:12,440 --> 00:00:15,400
I have already done a bit 
of work with this project.

3
00:00:15,400 --> 00:00:19,080
So, I have already retrieved the wikitext of the files.

4
00:00:19,080 --> 00:00:21,360
And I have a short description.

5
00:00:21,360 --> 00:00:23,520
And then I also did some work

6
00:00:23,520 --> 00:00:27,160
to create a column of what the files depict.

7
00:00:27,160 --> 00:00:28,520
Sometimes it's a person.

8
00:00:28,520 --> 00:00:31,840
But sometimes it's also a 
building like in this case.

9
00:00:31,840 --> 00:00:36,120
And I also have some photos of 
some notable people in here.

10
00:00:36,120 --> 00:00:39,720
So, some of the people are 
notable, and some of them are not.

11
00:00:39,720 --> 00:00:43,400
Later on, I want to add Depicts 
statements to these files.

12
00:00:43,400 --> 00:00:48,360
In cases where the files show a notable thing,

13
00:00:48,360 --> 00:00:51,360
like a notable person or a notable building,

14
00:00:51,360 --> 00:00:56,640
I want this Depicts statement 
to point to the corresponding Wikidata item.

15
00:00:56,640 --> 00:00:57,920
And in order to do that,

16
00:00:57,920 --> 00:01:00,680
I need to reconcile this column with Wikidata.

17
00:01:00,680 --> 00:01:01,840
So I need to look up

18
00:01:01,840 --> 00:01:05,920
whether these things that are 
being depicted in the photos

19
00:01:05,920 --> 00:01:08,840
do have a Wikidata item or not.

20
00:01:08,840 --> 00:01:10,000
I want to figure out

21
00:01:10,000 --> 00:01:13,800
whether the things that are 
depicted in these photos,

22
00:01:13,800 --> 00:01:16,040
be they people or buildings,

23
00:01:16,040 --> 00:01:18,760
have a Wikidata item or not.

24
00:01:18,760 --> 00:01:22,600
And for that, I need to reconcile 
this column with Wikidata.

25
00:01:22,600 --> 00:01:25,160
This is a familiar operation

26
00:01:25,160 --> 00:01:27,360
for people who have used OpenRefine before

27
00:01:27,360 --> 00:01:29,320
for Wikidata editing.

28
00:01:29,320 --> 00:01:31,360
I am taking this column,

29
00:01:31,360 --> 00:01:33,680
and I'm going to the column menu.

30
00:01:33,680 --> 00:01:37,160
And I choose "Reconcile" > "Start reconciling...".

31
00:01:37,160 --> 00:01:40,200
Naturally, I need to choose Wikidata for this.

32
00:01:40,200 --> 00:01:43,080
So, I want to know whether these things

33
00:01:43,080 --> 00:01:44,280
that are being depicted,

34
00:01:44,280 --> 00:01:45,200
these strings,

35
00:01:45,200 --> 00:01:47,760
have corresponding Wikidata items.

36
00:01:47,760 --> 00:01:49,600
I am dealing with a Dutch dataset.

37
00:01:49,600 --> 00:01:52,680
And so I choose the Dutch reconciliation service.

38
00:01:52,680 --> 00:01:55,000
But I could equally choose the English one,

39
00:01:55,000 --> 00:01:56,600
or another language.

40
00:01:56,600 --> 00:02:00,720
I click it, and it detects that in most cases

41
00:02:00,720 --> 00:02:02,000
the value is a human being,

42
00:02:02,000 --> 00:02:06,120
so I indeed go for the option Q5, human,

43
00:02:06,120 --> 00:02:07,720
although there are a few buildings there.

44
00:02:07,720 --> 00:02:08,560
But that's okay.

45
00:02:08,560 --> 00:02:10,560
I will be able to fix that.

46
00:02:10,560 --> 00:02:13,680
And then I click "Start reconciling...".

47
00:02:13,680 --> 00:02:17,000
Then OpenRefine will reconcile the column for me.

48
00:02:17,000 --> 00:02:17,680
As you can see,

49
00:02:17,680 --> 00:02:21,320
some of the people have not 
been matched.

50
00:02:21,320 --> 00:02:22,920
So they have not been recognized

51
00:02:22,920 --> 00:02:24,840
as notable people with a Wikidata item.

52
00:02:24,840 --> 00:02:26,280
And that is correct,

53
00:02:26,280 --> 00:02:28,440
because some of these people are not notable.

54
00:02:28,440 --> 00:02:30,360
They are just ordinary people

55
00:02:30,360 --> 00:02:33,640
from the history of the town of Leiden.

56
00:02:33,640 --> 00:02:36,000
But I also have some notable buildings.

57
00:02:36,000 --> 00:02:39,320
And I can then look these up.

58
00:02:39,320 --> 00:02:41,320
This is the city gate.

59
00:02:41,320 --> 00:02:42,280
I can look it up.

60
00:02:42,280 --> 00:02:44,560
And I can reconcile the item here.

61
00:02:44,560 --> 00:02:46,280
The spelling in my data is different,

62
00:02:46,280 --> 00:02:48,120
but I will fix this.

63
00:02:48,120 --> 00:02:49,720
Here we go.

64
00:02:49,720 --> 00:02:50,800
And further down,

65
00:02:50,800 --> 00:02:55,600
I also have a few professors 
from Leiden University.

66
00:02:55,600 --> 00:02:57,680
They should definitely be reconciled.

67
00:02:57,680 --> 00:03:00,800
And this specific building 
should also be reconciled.

68
00:03:00,800 --> 00:03:03,280
It is a notable building.

69
00:03:03,280 --> 00:03:06,040
I'm going back to it.

70
00:03:06,040 --> 00:03:06,880
Here we go.

71
00:03:06,880 --> 00:03:08,400
That's good.

72
00:03:08,400 --> 00:03:10,160
Then I will double-check

73
00:03:10,160 --> 00:03:11,760
whether this person is a correct match.

74
00:03:11,760 --> 00:03:13,160
This is not a correct match,

75
00:03:13,160 --> 00:03:16,080
so I will unmatch it.

76
00:03:16,080 --> 00:03:17,000
Same here.

77
00:03:17,000 --> 00:03:19,880
This is not a correct match.

78
00:03:19,880 --> 00:03:23,000
But right here we have a person

79
00:03:23,000 --> 00:03:26,640
who should also be correctly matched.

80
00:03:26,640 --> 00:03:28,480
It is the more recent person.

81
00:03:28,480 --> 00:03:29,680
It is this one.

82
00:03:33,480 --> 00:03:35,040
Well... this is the normal process

83
00:03:35,040 --> 00:03:38,280
for matching items,

84
00:03:39,080 --> 00:03:43,200
that is, for matching a column in OpenRefine with Wikidata.

85
00:03:43,200 --> 00:03:47,680
And this is how I 
tell my OpenRefine project

86
00:03:47,680 --> 00:03:55,000
that some of these photographs 
point to Wikidata items.