English subtitles for clip: File:OpenRefine Commons - editing - create schema.webm

1
00:00:06,080 --> 00:00:09,360
In order to prepare my upload to Wikimedia Commons

2
00:00:09,360 --> 00:00:11,360
of extra data about these files,

3
00:00:11,360 --> 00:00:12,920
structured data,

4
00:00:12,920 --> 00:00:15,840
I have to prepare a schema inside OpenRefine.

5
00:00:15,840 --> 00:00:17,960
And this is a function you will be familiar with

6
00:00:17,960 --> 00:00:22,240
when you've done edits to 
Wikidata with OpenRefine as well.

7
00:00:22,240 --> 00:00:25,600
You can access the schema 
through the Wikibase menu.

8
00:00:25,600 --> 00:00:27,520
Select "Edit Wikibase schema...".

9
00:00:27,520 --> 00:00:29,960
And then you will get a familiar empty screen.

10
00:00:30,520 --> 00:00:32,960
By default, it will be set to Wikidata.

11
00:00:32,960 --> 00:00:35,680
"Target Wikibase instance": Wikidata.

12
00:00:35,680 --> 00:00:38,880
But since we will be editing 
files on Wikimedia Commons,

13
00:00:38,880 --> 00:00:42,040
we need to set this to Wikimedia Commons.

14
00:00:42,040 --> 00:00:44,880
And it will also confirm here

15
00:00:44,880 --> 00:00:47,480
that you will transform your data

16
00:00:47,480 --> 00:00:50,080
into Wikimedia Commons edits.

17
00:00:50,080 --> 00:00:53,960
Now I have prepared a few 
columns that I want to work with.

18
00:00:53,960 --> 00:00:57,720
I just want to add a little bit 
of structured data to my files.

19
00:00:57,720 --> 00:00:59,440
Namely: a Depicts statement.

20
00:00:59,440 --> 00:01:03,360
And some captions: File Captions.

21
00:01:03,360 --> 00:01:07,920
And the source. I want to link 
to the source of these files.

22
00:01:07,920 --> 00:01:10,440
So I click on "Add media".

23
00:01:10,440 --> 00:01:13,920
And first I have to tell OpenRefine

24
00:01:13,920 --> 00:01:16,200
that it needs to edit these specific files.

25
00:01:16,200 --> 00:01:18,760
So I take my file column,

26
00:01:18,760 --> 00:01:21,840
which has been reconciled with Wikimedia Commons.

27
00:01:21,840 --> 00:01:23,040
And I go to the schema.

28
00:01:23,040 --> 00:01:24,400
And I drag this here

29
00:01:24,400 --> 00:01:28,200
into the thing that needs to be edited.

30
00:01:28,200 --> 00:01:31,320
Now, there are various other things
that need to be dragged here.

31
00:01:31,320 --> 00:01:32,760
There is the file path.

32
00:01:32,760 --> 00:01:34,520
That is only used in the case

33
00:01:34,520 --> 00:01:37,200
when you upload new files to Wikimedia Commons.

34
00:01:37,200 --> 00:01:40,600
So in this case I am editing existing files.

35
00:01:40,600 --> 00:01:43,680
And I do not need to drag anything here.

36
00:01:43,680 --> 00:01:46,960
I also drag the file name.

37
00:01:46,960 --> 00:01:49,520
That is actually the file that we are using.

38
00:01:49,520 --> 00:01:52,840
And if I want to change the wikitext

39
00:01:52,840 --> 00:01:54,360
that is present for these files.

40
00:01:54,360 --> 00:01:55,840
So, the files have wikitext

41
00:01:55,840 --> 00:01:57,720
which is in the column here.

42
00:01:57,720 --> 00:01:59,240
If I want to change it,

43
00:01:59,240 --> 00:02:01,120
then I could take a column

44
00:02:01,120 --> 00:02:02,920
and drag that here.

45
00:02:02,920 --> 00:02:06,280
And I could actually also 
overwrite existing wikitext

46
00:02:06,280 --> 00:02:07,040
if I wanted to.

47
00:02:07,040 --> 00:02:09,919
But that's not something I'm interested in here.

48
00:02:09,919 --> 00:02:12,600
I do want to add file captions.

49
00:02:12,600 --> 00:02:18,080
So I have extracted some 
short descriptions or captions

50
00:02:18,080 --> 00:02:19,600
from the wikitext,

51
00:02:19,600 --> 00:02:20,600
looking like this.

52
00:02:20,600 --> 00:02:23,800
And I want to add these as file captions in Dutch.

53
00:02:23,800 --> 00:02:25,600
In my native language, Dutch.

54
00:02:25,600 --> 00:02:28,120
So I choose Dutch as a language.

55
00:02:28,120 --> 00:02:31,520
And then I drag this column here.

56
00:02:31,520 --> 00:02:32,960
I can also choose

57
00:02:32,960 --> 00:02:35,680
whether or not I want to 
overwrite existing captions,

58
00:02:35,680 --> 00:02:36,560
if they are present.

59
00:02:36,560 --> 00:02:38,480
But I will only add captions

60
00:02:38,480 --> 00:02:40,200
if there are none yet.

61
00:02:40,200 --> 00:02:43,520
And then I also want to add a few statements.

62
00:02:43,520 --> 00:02:45,640
For instance Depicts.

63
00:02:45,640 --> 00:02:47,320
What the files depict.

64
00:02:47,320 --> 00:02:51,040
I'm searching for the Depicts property.

65
00:02:51,040 --> 00:02:53,480
And then I can drag the reconciled column

66
00:02:53,480 --> 00:02:55,720
of depicted things.

67
00:02:55,720 --> 00:02:57,560
You will see that I have some values

68
00:02:57,560 --> 00:02:59,920
that are being reconciled here.

69
00:02:59,920 --> 00:03:02,640
And so I drag these.

70
00:03:02,640 --> 00:03:05,880
And then I want to add some 
other statements as well.

71
00:03:05,880 --> 00:03:10,760
I am also interested in adding the collection.

72
00:03:10,760 --> 00:03:17,720
So the collection is "Erfgoed 
Leiden en omstreken".

73
00:03:17,720 --> 00:03:19,720
That's this one.

74
00:03:19,720 --> 00:03:22,960
And I also want to add the source of the file.

75
00:03:22,960 --> 00:03:26,120
Now I have to add the source of the file

76
00:03:26,120 --> 00:03:29,480
using the proper data modeling conventions.

77
00:03:29,480 --> 00:03:31,720
And I have looked these up on Wikimedia Commons.

78
00:03:31,720 --> 00:03:33,440
I know how to do this.

79
00:03:33,440 --> 00:03:36,840
So I have a URL here in my data set.

80
00:03:36,840 --> 00:03:39,960
That is basically the URL 
where I can find this file.

81
00:03:39,960 --> 00:03:40,920
And I want to make sure

82
00:03:40,920 --> 00:03:43,440
that that is present in the structured data.

83
00:03:43,440 --> 00:03:48,320
So I say "Source of file":

84
00:03:48,320 --> 00:03:54,400
And then I have to use "File on the internet"...

85
00:03:54,400 --> 00:03:56,880
"File available on the internet".

86
00:03:56,880 --> 00:04:00,320
And, by data modeling conventions,

87
00:04:00,320 --> 00:04:05,800
this has a few qualifiers.

88
00:04:05,800 --> 00:04:06,840
I will say:

89
00:04:06,840 --> 00:04:17,360
"URL" or "Described at URL".

90
00:04:17,360 --> 00:04:20,160
Then I drag the source URL here.

91
00:04:20,160 --> 00:04:21,079
But I also say:

92
00:04:21,079 --> 00:04:29,600
the operator is the website of Heritage Leiden.

93
00:04:32,440 --> 00:04:33,080
Here we go.

94
00:04:33,080 --> 00:04:35,320
So I have added a few extra statements

95
00:04:35,320 --> 00:04:38,360
that will become structured data for my file.

96
00:04:38,360 --> 00:04:39,720
Now I can check

97
00:04:39,720 --> 00:04:41,400
whether there are any issues.

98
00:04:41,400 --> 00:04:46,240
I see that there is some duplicate 
whitespace in some file names.

99
00:04:46,240 --> 00:04:50,280
But that is not something to worry about.

100
00:04:50,280 --> 00:04:55,520
And, yeah, this is also 
something to not worry about.

101
00:04:55,520 --> 00:04:57,040
And then I go to the preview.

102
00:04:57,040 --> 00:04:58,880
And I see that all of my files

103
00:04:58,880 --> 00:05:01,840
will get the Source statement.

104
00:05:01,840 --> 00:05:03,160
And the collection statement.

105
00:05:03,160 --> 00:05:04,200
And some of them,

106
00:05:04,200 --> 00:05:07,720
when the Depicts column has been reconciled,

107
00:05:07,720 --> 00:05:10,720
will also get a Depicts statement.

108
00:05:10,720 --> 00:05:16,600
And this looks good to me.