English subtitles for clip: File:Wikidata Editing with OpenRefine - Part 3.webm

From Wikimedia Commons, the free media repository
1
00:00:00,550 --> 00:00:02,800
Welcome back to this tutorial

2
00:00:02,850 --> 00:00:04,225
on using OpenRefine

3
00:00:04,225 --> 00:00:06,050
to import data into Wikidata.

4
00:00:07,700 --> 00:00:09,775
In the previous videos,

5
00:00:09,775 --> 00:00:10,988
we have matched the films

6
00:00:10,988 --> 00:00:12,525
and locations in our table

7
00:00:12,525 --> 00:00:15,375
to items.

8
00:00:13,883 --> 00:00:16,883
We now want to transform our table into statements

9
00:00:16,882 --> 00:00:19,882
and upload them to Wikidata.

10
00:00:19,988 --> 00:00:22,169
Let's first look at how this information

11
00:00:22,169 --> 00:00:24,350
is typically modelled in Wikidata.

12
00:00:24,986 --> 00:00:26,743
Pick a well-known movie

13
00:00:26,743 --> 00:00:28,500
where we expect to find this information.

14
00:00:29,114 --> 00:00:32,113
We can see that there is a "filming location" property for that.

15
00:00:33,850 --> 00:00:35,575
We review the page of the property

16
00:00:35,575 --> 00:00:37,300
and make sure it fits our needs.

17
00:00:37,695 --> 00:00:40,695
In this case, it looks like a perfect fit!

18
00:00:44,867 --> 00:00:46,367
Click the Wikidata button

19
00:00:46,367 --> 00:00:47,383
in the top right corner

20
00:00:47,383 --> 00:00:50,770
and choose "Edit Wikidata schema".

21
00:00:52,010 --> 00:00:54,600
A schema is a template

22
00:00:54,600 --> 00:00:56,650
that describes how your tabular data

23
00:00:56,650 --> 00:00:59,610
will be transformed into Wikidata edits.

24
00:01:00,980 --> 00:01:03,415
It works pretty much like the Wikidata interface,

25
00:01:03,415 --> 00:01:06,000
except that you can drag and drop column names

26
00:01:05,950 --> 00:01:07,650
in place of values.

27
00:01:08,947 --> 00:01:11,947
Click "Add item".

28
00:01:12,099 --> 00:01:14,750
The items we want to modify

29
00:01:14,650 --> 00:01:15,725
are the films

30
00:01:15,725 --> 00:01:16,963
which we have reconciled

31
00:01:16,963 --> 00:01:18,200
in the "Title" column.

32
00:01:18,690 --> 00:01:21,690
So drag and drop that column onto the item.

33
00:01:23,850 --> 00:01:25,450
You can see that this column

34
00:01:25,400 --> 00:01:27,625
is underlined in green:

35
00:01:27,625 --> 00:01:29,800
that is because we have reconciled it

36
00:01:29,000 --> 00:01:30,550
to Wikidata.

37
00:01:30,789 --> 00:01:33,019
Only reconciled columns can be used

38
00:01:33,019 --> 00:01:35,550
in inputs where an item is expected.

39
00:01:38,249 --> 00:01:40,249
On each of these items,

40
00:01:40,249 --> 00:01:42,250
we want to add the filming locations.

41
00:01:42,968 --> 00:01:45,009
Drag and drop the street column

42
00:01:45,009 --> 00:01:47,050
into the "filming location" statement.

43
00:01:48,559 --> 00:01:50,605
You can get a preview of the edits

44
00:01:50,605 --> 00:01:52,950
generated by the schema in the "Preview" tab.

45
00:01:54,367 --> 00:01:55,570
In the "Issues" tab,

46
00:01:55,570 --> 00:01:56,750
you get some feedback

47
00:01:56,750 --> 00:01:58,580
about the quality of your edits

48
00:01:58,580 --> 00:02:01,100
before they are made.

49
00:02:01,100 --> 00:02:02,940
OpenRefine warns us

50
00:02:02,940 --> 00:02:06,099
that we haven't added any references

51
00:02:04,200 --> 00:02:05,830
to our statements.

52
00:02:05,830 --> 00:02:08,050
So let's do that.

53
00:02:08,525 --> 00:02:10,288
I'm going to use the URL of the dataset

54
00:02:10,288 --> 00:02:12,050
together with a "retrieved" date.

55
00:02:12,820 --> 00:02:15,540
You could also create an item for the dataset

56
00:02:15,540 --> 00:02:18,794
and use it in the reference, if you prefer.

57
00:02:23,501 --> 00:02:25,701
While we are here, why not

58
00:02:25,701 --> 00:02:28,621
add a few qualifiers to the statements?

59
00:02:28,625 --> 00:02:32,625
We have the start and end dates of filming

60
00:02:32,641 --> 00:02:36,641
as well as the geographical coordinates.

61
00:03:07,144 --> 00:03:09,524
It is also useful to check whether our additions

62
00:03:09,529 --> 00:03:11,779
will conflict with any existing data

63
00:03:11,779 --> 00:03:12,819
on the items.

64
00:03:13,380 --> 00:03:16,410
We fetch the existing values for the filming locations.

65
00:03:35,208 --> 00:03:38,018
Once fetching has completed, we use a text facet

66
00:03:38,018 --> 00:03:41,041
to inspect what sort of values they are.

67
00:03:42,057 --> 00:03:44,317
Most of these films do not have any

68
00:03:44,317 --> 00:03:45,857
filming location yet,

69
00:03:45,865 --> 00:03:47,345
and when they do have one,

70
00:03:47,355 --> 00:03:50,705
it is a much less precise location.

71
00:03:50,874 --> 00:03:53,974
It should not be hard to remove the less precise locations

72
00:03:53,975 --> 00:03:56,335
that are redundant with our additions

73
00:03:56,335 --> 00:03:58,844
once the dataset is uploaded.

74
00:03:58,844 --> 00:04:00,964
So, we're happy with our edits.

75
00:04:01,283 --> 00:04:03,763
Because this is a rather small dataset,

76
00:04:03,763 --> 00:04:06,021
we will upload it directly.

77
00:04:06,021 --> 00:04:07,601
For larger imports,

78
00:04:07,614 --> 00:04:08,954
complicated schemas,

79
00:04:08,965 --> 00:04:11,065
or large-scale creation of new items,

80
00:04:11,073 --> 00:04:13,023
it is good to request feedback

81
00:04:13,023 --> 00:04:15,113
about the import on Wikidata first.

82
00:04:15,250 --> 00:04:19,020
Click "Wikidata", then "Upload edits to Wikidata".

83
00:04:19,927 --> 00:04:21,787
You will need to log in with your

84
00:04:21,796 --> 00:04:22,936
Wikidata account.

85
00:04:22,936 --> 00:04:26,166
This account will be used to make the edits.

86
00:04:26,166 --> 00:04:28,056
Add a meaningful edit summary

87
00:04:28,056 --> 00:04:29,376
to describe your edits.

88
00:04:31,484 --> 00:04:34,073
This is important because it helps other editors

89
00:04:34,073 --> 00:04:35,973
understand what your edits do

90
00:04:35,976 --> 00:04:38,376
when they look at the history of an item.

91
00:04:39,677 --> 00:04:43,977
We can now upload the dataset to Wikidata.

92
00:04:43,998 --> 00:04:46,188
You can check how the upload is going

93
00:04:46,188 --> 00:04:50,428
by looking at your own contributions.

94
00:04:53,710 --> 00:04:55,834
If you notice an issue with an edit,

95
00:04:55,834 --> 00:04:58,604
you can cancel the upload in OpenRefine.

96
00:04:58,614 --> 00:05:01,154
This will stop any further edits

97
00:05:01,159 --> 00:05:05,156
but will not undo the edits already made.

98
00:05:05,156 --> 00:05:06,616
To remove these edits,

99
00:05:06,616 --> 00:05:08,276
click the "details" link

100
00:05:08,276 --> 00:05:10,206
of any edit in the group.

101
00:05:10,214 --> 00:05:13,334
This takes you to the EditGroups tool,

102
00:05:13,354 --> 00:05:17,354
where you can easily undo the entire edit group.

103
00:05:27,178 --> 00:05:29,038
This is the end of the tutorial.

104
00:05:29,040 --> 00:05:33,040
I hope you enjoyed it. Thanks for watching!