Bangla subtitles for clip: File:A Gentle Introduction to Wikidata for Absolute Beginners (including non-techies!).webm

1
00:00:00,000 --> 00:00:02,651


2
00:00:02,651 --> 00:00:03,900
<!-- ASAF BARTOV: Testing, testing. -->
আসফ বারতভ: টেস্টিং টেস্টিং

3
00:00:03,900 --> 00:00:10,036


4
00:00:10,036 --> 00:00:12,640
<!--Is this heard in the room?-->
আমার কথা কি পুরো ঘরে শোনা যাচ্ছে?

5
00:00:12,640 --> 00:00:15,190


6
00:00:15,190 --> 00:00:15,690
<!--Testing.-->
টেস্টিং

7
00:00:15,690 --> 00:00:22,620


8
00:00:22,620 --> 00:00:24,930
<!-- Hello, everyone.-->
সকলকে নমস্কার

9
00:00:24,930 --> 00:00:29,460
<!-- This is a gentle
introduction to Wikidata -->
আজকের স্পিচ উইকিডাটা সম্পর্কে সম্যক ধারণা দেবে

10
00:00:29,460 --> 00:00:31,922
<!-- for absolute beginners. -->
উইকিডাটাতে যারা নতুন তাদের জন্য

11
00:00:31,922 --> 00:00:34,130
<!--If you're an absolute
beginner, if you've never heard-->
যদি আপনি উইকিডাটা প্রোজেক্টে নতুন হন, কিম্বা উইকিডাটা প্রোজেক্ট সম্পর্কে আগে না শুনে থাকেন

12
00:00:34,130 --> 00:00:38,210
অথবা যদি আপনি উইকিডাটা সম্পর্কে কিছুটা জানেন, অথচ বুঝতে পারছেন না
<!--of Wikidata, or if you've heard
of Wikidata but don't quite get -->

13
00:00:38,210 --> 00:00:41,360
উইকিডাটা কীসের জন্য প্রয়োজন, এবং আপনি হয়তো এইটিকে কেবল
<!-- it, don't know what it's
good for, have only used it -->

14
00:00:41,360 --> 00:00:43,880
আন্তঃউইকিসংযোগ (inter-wiki links) স্থাপনের জন্য ব্যবহার করে এসেছেন
<!-- for inter-wiki links-- -->

15
00:00:43,880 --> 00:00:46,247
যদি আপনার অবস্থান বা প্রশ্ন এর মধ্যে যে কোনো একটি হয়ে থাকে
<!-- if you're anywhere
on this range, -->

16
00:00:46,247 --> 00:00:47,330
আপনি সঠিক জায়গায় এসেছেন
<!-- you're in the right place. -->

17
00:00:47,330 --> 00:00:50,990


18
00:00:50,990 --> 00:00:52,040
আমার নাম আসফ বারতভ
<!-- My name is Asaf Bartov. -->

19
00:00:52,040 --> 00:00:54,590
আমি উইকিমিডিয়া ফাউন্ডেশনের কর্মী
<!-- I work for the
Wikimedia Foundation, -->

20
00:00:54,590 --> 00:00:59,790
এবং উইকিডাটা সম্পর্কে অত্যন্ত উৎসাহী
<!-- and I am a Wikidata enthusiast. -->

21
00:00:59,790 --> 00:01:05,620
এখন সবথেকে প্রথমে আমি আপনাদের যে কথাটি বলতে চাই সেইটি হলো আপনারা ভাগ্যবান 
<!-- So the first thing I want to
say is that you are lucky.-->

22
00:01:05,620 --> 00:01:10,540
আপনারা ভাগ্যবান কারণ, উইকিডাটা ইতোমধ্যে 
<!-- You are lucky because
Wikidata is already -->

23
00:01:10,540 --> 00:01:15,415
and is quickly becoming even
more of an important research

24
00:01:15,415 --> 00:01:21,730
tool for anyone who's
trying to ask questions

25
00:01:21,730 --> 00:01:25,030
about large amounts
of information.

26
00:01:25,030 --> 00:01:29,770
It will become more and more
used across the humanities,

27
00:01:29,770 --> 00:01:33,460
in particular, because of the
things that it's able to do,

28
00:01:33,460 --> 00:01:37,090
some of which we will
demonstrate shortly.

29
00:01:37,090 --> 00:01:40,750
And you are lucky because you
get to find out about it now

30
00:01:40,750 --> 00:01:43,400
before most of the world.

31
00:01:43,400 --> 00:01:49,120
So by the end of this talk,
you will be a Wikidata hipster

32
00:01:49,120 --> 00:01:51,250
because you'll be
able to say, oh yeah.

33
00:01:51,250 --> 00:01:53,470
I knew about Wikidata
before it was cool.

34
00:01:53,470 --> 00:01:56,090


35
00:01:56,090 --> 00:02:00,370
So before we actually
visit Wikidata,

36
00:02:00,370 --> 00:02:08,620
I want to share two key problems
that Wikidata seeks to solve

37
00:02:08,620 --> 00:02:12,940
and which would help us
understand why it exists.

38
00:02:12,940 --> 00:02:17,640
The first problem is that
of dated data, that

39
00:02:17,640 --> 00:02:20,880
is data that is out of date.

40
00:02:20,880 --> 00:02:23,960
And this is apparent
on Wikipedia

41
00:02:23,960 --> 00:02:27,870
across our free
knowledge encyclopedias.

42
00:02:27,870 --> 00:02:32,160
Data on Wikipedia is
not always up to date.

43
00:02:32,160 --> 00:02:37,470
And the more obscure
it is, the more likely

44
00:02:37,470 --> 00:02:40,280
it is not to be up to date.

45
00:02:40,280 --> 00:02:49,360
So the Polish Wikipedia may have
an article about a small town

46
00:02:49,360 --> 00:02:55,480
in Argentina, and that article
will include information

47
00:02:55,480 --> 00:03:00,910
about that town like population
size, name of the mayor.

48
00:03:00,910 --> 00:03:04,580
And that information,
ideally, was

49
00:03:04,580 --> 00:03:08,540
correct at the time the article
was created on the Polish

50
00:03:08,540 --> 00:03:10,370
Wikipedia--

51
00:03:10,370 --> 00:03:13,760
maybe translated
from another wiki.

52
00:03:13,760 --> 00:03:17,900
But then how likely is
it to be kept up to date?

53
00:03:17,900 --> 00:03:20,960
How likely is it that the
Polish Wikipedia would give us

54
00:03:20,960 --> 00:03:25,880
the correct and latest numbers
or data about the population

55
00:03:25,880 --> 00:03:28,370
size of that town
or the mayor, right?

56
00:03:28,370 --> 00:03:31,720
So this is the kind of data
that does go out of date, right?

57
00:03:31,720 --> 00:03:34,250
Every few years--
five, 10 years--

58
00:03:34,250 --> 00:03:37,850
there is a census, and now there
are new population figures.

59
00:03:37,850 --> 00:03:42,440
Now the census in Argentina will
be made available in Argentina

60
00:03:42,440 --> 00:03:45,500
in Spanish, probably,
which brings us

61
00:03:45,500 --> 00:03:48,710
to another component of the
problem of dated data, which

62
00:03:48,710 --> 00:03:53,810
is there are no obvious
triggers for updating the data.

63
00:03:53,810 --> 00:03:58,520
So the Polish Wikipedian
is not sent an email

64
00:03:58,520 --> 00:04:00,680
by the Argentinean
government saying, hey,

65
00:04:00,680 --> 00:04:01,820
we have a new census.

66
00:04:01,820 --> 00:04:05,420
There are new population numbers
for you to update on Wikipedia.

67
00:04:05,420 --> 00:04:07,550
No such email is sent.

68
00:04:07,550 --> 00:04:10,146
So it's kind of
hard to notice when--

69
00:04:10,146 --> 00:04:12,770
And of course, multiply that by
all the different jurisdictions

70
00:04:12,770 --> 00:04:14,670
around the world.

71
00:04:14,670 --> 00:04:16,610
There's no easy
way to notice when

72
00:04:16,610 --> 00:04:17,790
your data goes out of date.

73
00:04:17,790 --> 00:04:20,620


74
00:04:20,620 --> 00:04:24,070
So that's difficult
to keep up to date.

75
00:04:24,070 --> 00:04:27,940
And even if we were to receive
some kind of indication--

76
00:04:27,940 --> 00:04:31,310
oh, there's a new
census in Argentina,

77
00:04:31,310 --> 00:04:33,100
so a whole bunch of
population figures

78
00:04:33,100 --> 00:04:34,960
have now gone out of date.

79
00:04:34,960 --> 00:04:37,240
Updating it on the
Polish Wikipedia

80
00:04:37,240 --> 00:04:40,090
and the French Wikipedia
and the Indonesian Wikipedia

81
00:04:40,090 --> 00:04:44,920
and the Arabic Wikipedia is a
whole bunch of repetitive work

82
00:04:44,920 --> 00:04:46,540
that a lot of
different volunteers

83
00:04:46,540 --> 00:04:49,900
will need to do just for
that one updated piece

84
00:04:49,900 --> 00:04:54,810
of information about Argentina.

85
00:04:54,810 --> 00:04:57,720
So I hope this is
clear and resonates

86
00:04:57,720 --> 00:05:01,920
with some of your experience
editing Wikipedia--

87
00:05:01,920 --> 00:05:04,170
data that is out of
date or that needs

88
00:05:04,170 --> 00:05:08,640
to be updated
manually, menially,

89
00:05:08,640 --> 00:05:16,190
on a fairly frequent schedule
across the different countries

90
00:05:16,190 --> 00:05:18,410
and data sources.

91
00:05:18,410 --> 00:05:22,340
The other-- and I think
maybe more interesting--

92
00:05:22,340 --> 00:05:26,210
shortcoming or problem
that I want to discuss

93
00:05:26,210 --> 00:05:30,260
is what I call the
inflexible ways

94
00:05:30,260 --> 00:05:36,020
of lateral queries, cross-cutting
queries of knowledge.

95
00:05:36,020 --> 00:05:43,980
So if I want an answer to
the question, what countries

96
00:05:43,980 --> 00:05:48,740
in the world export rubber--

97
00:05:48,740 --> 00:05:52,300


98
00:05:52,300 --> 00:05:54,790
that's a reasonable
question, right?

99
00:05:54,790 --> 00:05:57,460
সেই তথ্য উইকিপিডিয়াতে রয়েছে
<!-- That information
is on Wikipedia.-->

100
00:05:57,460 --> 00:05:58,630
আপনি মানছেন কি?
<!-- Do you agree? -->

101
00:05:58,630 --> 00:06:00,640
If you go to
Wikipedia and read up

102
00:06:00,640 --> 00:06:05,560
about Brazil, about Peru, about
Germany, somewhere in there--

103
00:06:05,560 --> 00:06:09,010
maybe a sub-article called
Economics of Brazil--

104
00:06:09,010 --> 00:06:13,600
you will find the main
exports of that country.

105
00:06:13,600 --> 00:06:15,400
And you can find
out whether or not

106
00:06:15,400 --> 00:06:16,930
that country exports rubber.

107
00:06:16,930 --> 00:06:19,994
But what if I don't want
to go country by country

108
00:06:19,994 --> 00:06:21,160
looking for the word rubber?

109
00:06:21,160 --> 00:06:22,090
I just want an answer.

110
00:06:22,090 --> 00:06:25,540
What are the countries
that export rubber?

111
00:06:25,540 --> 00:06:28,360
Even though that
information is in Wikipedia,

112
00:06:28,360 --> 00:06:29,680
it's hard to get at.

113
00:06:29,680 --> 00:06:31,680
It's hard to query.

114
00:06:31,680 --> 00:06:35,770
Now, you may say, well, that's
what we have categories for,

115
00:06:35,770 --> 00:06:36,270
right?

116
00:06:36,270 --> 00:06:39,820
Categories are a way to
cut across Wikipedia.

117
00:06:39,820 --> 00:06:45,110
So if someone made a
category called rubber

118
00:06:45,110 --> 00:06:48,380
exporting countries, then
you can go to that category

119
00:06:48,380 --> 00:06:51,560
and see a list of countries
that export rubber.

120
00:06:51,560 --> 00:06:53,390
And if nobody has
made it yet, well, you

121
00:06:53,390 --> 00:06:56,990
can create that category and,
with a kind of one-time effort,

122
00:06:56,990 --> 00:06:59,730
populate that category,
and you're done.

123
00:06:59,730 --> 00:07:01,970
Well, yes.

124
00:07:01,970 --> 00:07:04,250
That's still not
very convenient.

125
00:07:04,250 --> 00:07:06,980
But also, it's still
very, very limited,

126
00:07:06,980 --> 00:07:12,380
because what if I only want
countries that export rubber

127
00:07:12,380 --> 00:07:15,950
and have a democratic
system of government,

128
00:07:15,950 --> 00:07:18,770
or any other kind of
additional condition

129
00:07:18,770 --> 00:07:20,510
that I would like
to add to this?

130
00:07:20,510 --> 00:07:22,230
Or take a completely
different example.

131
00:07:22,230 --> 00:07:26,750
What if I want to know
which Flemish town had

132
00:07:26,750 --> 00:07:31,510
the most painters born in it?

133
00:07:31,510 --> 00:07:34,480
There's a ton of
Flemish painters.

134
00:07:34,480 --> 00:07:37,870
Most of them were
born somewhere.

135
00:07:37,870 --> 00:07:39,685
We could theoretically,
just you know,

136
00:07:39,685 --> 00:07:43,900
look up all the birthplaces
of all the Flemish painters

137
00:07:43,900 --> 00:07:46,900
and tally up the
numbers and figure out

138
00:07:46,900 --> 00:07:51,610
what is the place where the
most Flemish painters come from?

139
00:07:51,610 --> 00:07:53,050
I don't know the answer to that.

140
00:07:53,050 --> 00:07:55,420
It would be nice to be
able to get that answer.

141
00:07:55,420 --> 00:07:57,610
Again, the data is in Wikipedia.

142
00:07:57,610 --> 00:08:00,400
Those birthplaces are
listed in the articles

143
00:08:00,400 --> 00:08:01,636
about those painters.

144
00:08:01,636 --> 00:08:05,710
But there's no easy way
to get that information.

145
00:08:05,710 --> 00:08:13,420
What if I want to ask, who are
some painters whose father was

146
00:08:13,420 --> 00:08:14,245
also a painter?

147
00:08:14,245 --> 00:08:16,840


148
00:08:16,840 --> 00:08:18,500
That's a thing
that exists, right?

149
00:08:18,500 --> 00:08:22,630
Some painters are
sons of painters.

150
00:08:22,630 --> 00:08:26,560
You know, Bruegel comes to
mind as an obvious example.

151
00:08:26,560 --> 00:08:28,240
But there's a bunch
of others, right?

152
00:08:28,240 --> 00:08:29,380
So who are those people?

153
00:08:29,380 --> 00:08:30,930
What if I want to
ask that question?

154
00:08:30,930 --> 00:08:33,400
That's the kind of question
that not only Wikipedia

155
00:08:33,400 --> 00:08:34,600
doesn't answer today.

156
00:08:34,600 --> 00:08:41,500
If you walk to your friendly
university library reference

157
00:08:41,500 --> 00:08:45,010
desk and say,
hello, I would like

158
00:08:45,010 --> 00:08:49,290
a list of painters whose
father was also a painter,

159
00:08:49,290 --> 00:08:52,820
how would that
librarian help you?

160
00:08:52,820 --> 00:08:57,960
There's no easy way to get an
answer to a question like that.

161
00:08:57,960 --> 00:09:01,100
What if you only want
a list of painters

162
00:09:01,100 --> 00:09:05,870
who were immigrants, painters
who lived somewhere else

163
00:09:05,870 --> 00:09:08,240
than where they were born?

164
00:09:08,240 --> 00:09:09,770
There's no book.

165
00:09:09,770 --> 00:09:11,720
I guess maybe there
is, but you know,

166
00:09:11,720 --> 00:09:15,590
it's not obvious that there's a
ready resource that says, list

167
00:09:15,590 --> 00:09:17,840
of painters who are immigrants.

168
00:09:17,840 --> 00:09:19,910
And the librarian would
probably refer you

169
00:09:19,910 --> 00:09:22,760
to a book on the shelf
called, I don't know,

170
00:09:22,760 --> 00:09:24,200
The Complete
Dictionary of Flemish

171
00:09:24,200 --> 00:09:26,300
Painters and go,
look up the index,

172
00:09:26,300 --> 00:09:28,520
you know, and if you
see a similar surname,

173
00:09:28,520 --> 00:09:29,910
maybe they're father and son.

174
00:09:29,910 --> 00:09:35,000
And kind of cobble together
the answer on your own.

175
00:09:35,000 --> 00:09:37,100
The reason I'm comparing
this to a library

176
00:09:37,100 --> 00:09:42,170
is to show you that this is a
kind of question that is not

177
00:09:42,170 --> 00:09:46,760
readily satisfiable today.

178
00:09:46,760 --> 00:09:50,240
Now, these questions may
sound contrived to you.

179
00:09:50,240 --> 00:09:52,460
You may say to
yourself, well, you

180
00:09:52,460 --> 00:09:54,860
know, painters who are also
sons of painters, yeah.

181
00:09:54,860 --> 00:09:57,680
You know, that
never occurred to me

182
00:09:57,680 --> 00:09:59,610
as a question I
might care about.

183
00:09:59,610 --> 00:10:01,850
But I want to invite
you to consider

184
00:10:01,850 --> 00:10:06,380
that this kind of question,
questions like that question,

185
00:10:06,380 --> 00:10:09,260
may well be questions
you do care about.

186
00:10:09,260 --> 00:10:12,740
And I also want to suggest
that the fact it is so nearly

187
00:10:12,740 --> 00:10:16,250
impossible, the fact that
there's no obvious way

188
00:10:16,250 --> 00:10:19,250
to ask that kind
of question today,

189
00:10:19,250 --> 00:10:21,200
is partly responsible
for your not

190
00:10:21,200 --> 00:10:22,970
coming up with those
questions, right?

191
00:10:22,970 --> 00:10:25,850
We tend to be limited
by the possible.

192
00:10:25,850 --> 00:10:30,080
You know, until human
flight was made possible,

193
00:10:30,080 --> 00:10:32,840
it did not occur to anyone
to say, oh yeah, by this time

194
00:10:32,840 --> 00:10:34,430
next week I will
be in Australia,

195
00:10:34,430 --> 00:10:36,630
because that was
just impossible.

196
00:10:36,630 --> 00:10:38,587
But when flight is
possible, there's

197
00:10:38,587 --> 00:10:40,670
all kinds of things that
suddenly become possible,

198
00:10:40,670 --> 00:10:42,740
and there's all
kinds of needs that

199
00:10:42,740 --> 00:10:46,430
arise based on the
availability of resources

200
00:10:46,430 --> 00:10:48,600
to fulfill those needs.

201
00:10:48,600 --> 00:10:54,120
So many of these research
questions, compound lateral

202
00:10:54,120 --> 00:10:58,520
cross-cutting queries, are not
being asked because people have

203
00:10:58,520 --> 00:11:00,410
internalized the fact
that there is no way

204
00:11:00,410 --> 00:11:05,750
to get an answer
to questions like,

205
00:11:05,750 --> 00:11:13,270
what is the most popular first
name among British politicians?

206
00:11:13,270 --> 00:11:14,520
I just made that up, you know?

207
00:11:14,520 --> 00:11:15,340
Is it John?

208
00:11:15,340 --> 00:11:16,510
Maybe.

209
00:11:16,510 --> 00:11:19,030
Maybe it's William,
for whatever reason.

210
00:11:19,030 --> 00:11:22,030
You know, these are the kinds
of questions we don't routinely

211
00:11:22,030 --> 00:11:25,855
ask because we know that it's
like, who are you going to ask?

212
00:11:25,855 --> 00:11:28,330
How are you going to
get an answer to that?

213
00:11:28,330 --> 00:11:36,040
So this problem of not having
very flexible ways of querying

214
00:11:36,040 --> 00:11:38,220
the data that we already have--

215
00:11:38,220 --> 00:11:41,230
in Wikipedia, in
Wikisource, elsewhere--

216
00:11:41,230 --> 00:11:45,060
is a significant limitation.

217
00:11:45,060 --> 00:11:50,880
So these two key problems
have one solution.

218
00:11:50,880 --> 00:11:55,500
And that is an editable,
central storage

219
00:11:55,500 --> 00:12:00,510
for structured and
linked data on a wiki,

220
00:12:00,510 --> 00:12:05,160
under a free license, which
is a very long way of saying

221
00:12:05,160 --> 00:12:07,290
Wikidata.

222
00:12:07,290 --> 00:12:08,470
That is Wikidata.

223
00:12:08,470 --> 00:12:11,190
Wikidata is an editable,
central storage

224
00:12:11,190 --> 00:12:15,840
for structured and
linked data on a wiki,

225
00:12:15,840 --> 00:12:17,700
under a free license.

226
00:12:17,700 --> 00:12:22,590
So let's take this
apart and unpack it.

227
00:12:22,590 --> 00:12:24,820
First of all, it's
a central storage.

228
00:12:24,820 --> 00:12:27,660
This relates to the
first problem, right?

229
00:12:27,660 --> 00:12:34,370
If we had one place containing
data like population size,

230
00:12:34,370 --> 00:12:38,270
we would be able to update
that one place and then have

231
00:12:38,270 --> 00:12:42,260
all of the different Wikipedias
draw the data from that one

232
00:12:42,260 --> 00:12:45,320
place so that we wouldn't
have to manually,

233
00:12:45,320 --> 00:12:49,980
repetitively update it across
our hundreds of projects.

234
00:12:49,980 --> 00:12:53,690
So having central storage
makes, I hope, kind

235
00:12:53,690 --> 00:12:57,230
of immediate, intuitive sense.

236
00:12:57,230 --> 00:13:02,840
But what do I mean by
structured and linked data?

237
00:13:02,840 --> 00:13:10,120
So structured data means
that each datum, each piece--

238
00:13:10,120 --> 00:13:15,880
individual piece-- of data
is managed on its own,

239
00:13:15,880 --> 00:13:19,660
is identified and
defined on its own,

240
00:13:19,660 --> 00:13:21,040
as distinct from Wikipedia.

241
00:13:21,040 --> 00:13:22,990
Wikipedia has articles.

242
00:13:22,990 --> 00:13:27,190
The article about Brazil
includes a ton of data,

243
00:13:27,190 --> 00:13:31,570
all kinds of information,
and it's presented as text,

244
00:13:31,570 --> 00:13:34,270
as several paragraphs--
several pages--

245
00:13:34,270 --> 00:13:36,540
of text, right?

246
00:13:36,540 --> 00:13:41,460
Now, we do have an
approximation of structured data

247
00:13:41,460 --> 00:13:43,580
on Wikipedia.

248
00:13:43,580 --> 00:13:45,300
If you've browsed
Wikipedia a little,

249
00:13:45,300 --> 00:13:49,100
you've noticed that we often
have an infobox, what we

250
00:13:49,100 --> 00:13:50,750
call an infobox on Wikipedia.

251
00:13:50,750 --> 00:13:55,220
That's the table on the right
side if it's a left-to-right

252
00:13:55,220 --> 00:13:57,200
language, the table
on the right side

253
00:13:57,200 --> 00:14:02,270
that has information that
is easy to tabulate, right?

254
00:14:02,270 --> 00:14:08,210
So you know, birth date, birth
place, death date, death place,

255
00:14:08,210 --> 00:14:09,710
nationality--

256
00:14:09,710 --> 00:14:16,670
or if it's about a country,
area, population, anthem,

257
00:14:16,670 --> 00:14:20,090
type of government, whatever
you are likely to find.

258
00:14:20,090 --> 00:14:23,150
If it's a movie, then
you know, starring,

259
00:14:23,150 --> 00:14:27,350
genre, box office receipts,
whatever pieces of data

260
00:14:27,350 --> 00:14:29,900
are relevant to an
article about a movie.

261
00:14:29,900 --> 00:14:34,940
So we do already kind of
group pieces of information

262
00:14:34,940 --> 00:14:40,160
on Wikipedia into this
kind of structured format.

263
00:14:40,160 --> 00:14:43,630
Those of you who have
ever looked at the source,

264
00:14:43,630 --> 00:14:45,970
at what the wiki code
under that looks like,

265
00:14:45,970 --> 00:14:49,640
know that it's only
semi-structured.

266
00:14:49,640 --> 00:14:52,370
It looks neat and
organized in a table,

267
00:14:52,370 --> 00:14:55,660
but really, it's just a bunch
of text that is put there.

268
00:14:55,660 --> 00:14:57,140
It is not centralized.

269
00:14:57,140 --> 00:15:00,100
Every Wikipedia has its
own copy of that data.

270
00:15:00,100 --> 00:15:02,930
And if I go and update
the population size

271
00:15:02,930 --> 00:15:07,070
on Spanish Wikipedia of
that Argentinean town,

272
00:15:07,070 --> 00:15:10,190
it does not get
updated automagically

273
00:15:10,190 --> 00:15:13,520
on the English Wikipedia or
the Arabic Wikipedia, right?

274
00:15:13,520 --> 00:15:17,150
So the structured data that
we already have on Wikipedia

275
00:15:17,150 --> 00:15:20,939
is not managed centrally.

276
00:15:20,939 --> 00:15:22,480
The other thing
about structured data

277
00:15:22,480 --> 00:15:29,250
is, when you have a notion of an
individual piece of data, that

278
00:15:29,250 --> 00:15:33,390
is the cornerstone of
allowing the kinds of queries

279
00:15:33,390 --> 00:15:34,770
that I was talking about.

280
00:15:34,770 --> 00:15:40,440
That is what will allow
me to ask questions like,

281
00:15:40,440 --> 00:15:43,470
what is the Flemish town where
the most painters were born,

282
00:15:43,470 --> 00:15:46,650
or what are the world's
largest cities that

283
00:15:46,650 --> 00:15:49,730
have a female mayor?

284
00:15:49,730 --> 00:15:52,430
I could come up with other
examples all day long, right?

285
00:15:52,430 --> 00:15:55,280
These are all questions
that you can ask,

286
00:15:55,280 --> 00:15:59,390
once you break down your data
into individual pieces, each

287
00:15:59,390 --> 00:16:02,300
of which is--

288
00:16:02,300 --> 00:16:06,950
you're able to refer to each
of those programmatically.

289
00:16:06,950 --> 00:16:10,430
The computer can
identify, isolate,

290
00:16:10,430 --> 00:16:14,700
and calculate based on each
of those pieces of data.

291
00:16:14,700 --> 00:16:17,060
So that's why the
structure is important.

292
00:16:17,060 --> 00:16:22,520
Now, Wikidata is also a
linked data repository.

293
00:16:22,520 --> 00:16:24,890
What does it mean that
the data is linked?

294
00:16:24,890 --> 00:16:29,700
Well, it means that a single
piece of data can point at,

295
00:16:29,700 --> 00:16:34,770
can link to another
whole bag of data.

296
00:16:34,770 --> 00:16:43,360
So if we are describing,
for example, a person,

297
00:16:43,360 --> 00:16:46,960
and we record the
single piece of data

298
00:16:46,960 --> 00:16:54,820
that this person was born
in Salem, Massachusetts,

299
00:16:54,820 --> 00:17:02,300
that single piece of data
links to the item about Salem,

300
00:17:02,300 --> 00:17:04,060
Massachusetts
because, of course,

301
00:17:04,060 --> 00:17:07,010
we know a lot of things
about that place, Salem,

302
00:17:07,010 --> 00:17:07,869
Massachusetts.

303
00:17:07,869 --> 00:17:09,245
So it's not just the text--

304
00:17:09,245 --> 00:17:13,450
S-A-L-E-M. It's not just,
that's where they were born.

305
00:17:13,450 --> 00:17:17,170
But it's a link to all
the data that we have

306
00:17:17,170 --> 00:17:19,270
about Salem, Massachusetts.

307
00:17:19,270 --> 00:17:24,940
If we say someone's
nationality is French,

308
00:17:24,940 --> 00:17:26,589
that is a link to France.

309
00:17:26,589 --> 00:17:30,700
That is a link to everything we
know about the country France.

310
00:17:30,700 --> 00:17:34,150
The fact that the data
is linked and structured

311
00:17:34,150 --> 00:17:37,630
allows not only humans,
but also computers

312
00:17:37,630 --> 00:17:41,620
to traverse information
and to bring

313
00:17:41,620 --> 00:17:44,950
us different pieces of
relevant information

314
00:17:44,950 --> 00:17:49,000
programmatically, automatically,
based on those links.

315
00:17:49,000 --> 00:17:52,000
Because it's not just
text, it's an actual link

316
00:17:52,000 --> 00:17:56,700
to another chunk of data.

317
00:17:56,700 --> 00:17:58,880
If this sounds a
little abstract,

318
00:17:58,880 --> 00:18:01,190
it will become much
clearer in just a second

319
00:18:01,190 --> 00:18:03,230
when we see it in action.

320
00:18:03,230 --> 00:18:06,200
But the other components of
this little definition are,

321
00:18:06,200 --> 00:18:09,650
of course, this central storage
of structured and linked data

322
00:18:09,650 --> 00:18:12,620
needs to be editable,
of course, because we

323
00:18:12,620 --> 00:18:14,370
need to keep it up to date.

324
00:18:14,370 --> 00:18:16,460
We need to correct mistakes.

325
00:18:16,460 --> 00:18:21,300
And we want it on a wiki
under a free license.

326
00:18:21,300 --> 00:18:23,940
The free license is, of
course, essential to enable

327
00:18:23,940 --> 00:18:30,910
reuse of that data, to enable
all kinds of reuse of the data.

328
00:18:30,910 --> 00:18:34,060
And Wikidata, unlike
Wikipedia, is released

329
00:18:34,060 --> 00:18:36,160
under a different free license.

330
00:18:36,160 --> 00:18:41,590
Wikidata is released
under the CC0 waiver.

331
00:18:41,590 --> 00:18:44,920
That means unlike
Wikipedia, where

332
00:18:44,920 --> 00:18:51,160
you have to attribute Wikipedia
when you reuse information

333
00:18:51,160 --> 00:18:55,150
from Wikipedia, you do not
need to attribute Wikidata,

334
00:18:55,150 --> 00:18:57,040
and you do not need to
share alike your work.

335
00:18:57,040 --> 00:19:02,020
It's an unencumbered license to
reuse the data in any way you

336
00:19:02,020 --> 00:19:03,267
want, including commercially.

337
00:19:03,267 --> 00:19:05,350
You don't have to say that
it comes from Wikidata.

338
00:19:05,350 --> 00:19:07,390
I mean, it could be nice,
but you don't have to.

339
00:19:07,390 --> 00:19:09,280
You're under no
obligation to do it.

340
00:19:09,280 --> 00:19:14,080
And that is important to
allow certain kinds of reuse

341
00:19:14,080 --> 00:19:17,140
where, for example, if you're
building some kind of device,

342
00:19:17,140 --> 00:19:20,680
you may not have a practical
way to give attribution.

343
00:19:20,680 --> 00:19:23,920
And had we required
that to use Wikidata,

344
00:19:23,920 --> 00:19:27,250
we would have made
Wikidata less reusable.

345
00:19:27,250 --> 00:19:32,940
So Wikidata is unencumbered by
the requirement of attribution.

346
00:19:32,940 --> 00:19:35,730
And of course, because
it's on a wiki,

347
00:19:35,730 --> 00:19:40,421
we get all the benefits that we
are used to expecting from a wiki,

348
00:19:40,421 --> 00:19:40,920
right?

349
00:19:40,920 --> 00:19:42,810
So it's a wiki,
which means, yes.

350
00:19:42,810 --> 00:19:44,910
It has discussion pages.

351
00:19:44,910 --> 00:19:46,500
It has revision histories.

352
00:19:46,500 --> 00:19:47,620
It remembers everything.

353
00:19:47,620 --> 00:19:50,610
So if you screw it up, you
can always go a version back.

354
00:19:50,610 --> 00:19:52,380
Or if someone else
vandalized the content,

355
00:19:52,380 --> 00:19:54,610
we can always go back,
just like Wikipedia.

356
00:19:54,610 --> 00:19:56,880
So we get all the
benefits we're used to--

357
00:19:56,880 --> 00:20:01,260
user talk pages, group
discussion pages, watch lists,

358
00:20:01,260 --> 00:20:03,755
all the features that
we expect in a wiki.

359
00:20:03,755 --> 00:20:06,740


360
00:20:06,740 --> 00:20:11,170
In short, Wikidata is love.

361
00:20:11,170 --> 00:20:14,100
I hope you agree with me
by the end of this talk.

362
00:20:14,100 --> 00:20:18,580
So let's zoom in and see
what this structured data

363
00:20:18,580 --> 00:20:21,420
looks like.

364
00:20:21,420 --> 00:20:29,460
So structured data on Wikidata
is collected in statements.

365
00:20:29,460 --> 00:20:31,930
And statements have
the general form

366
00:20:31,930 --> 00:20:39,490
of this triple, this
tripartite ascription--

367
00:20:39,490 --> 00:20:43,550
items, properties, and values.

368
00:20:43,550 --> 00:20:46,930
Now an item is the
subject, is the topic

369
00:20:46,930 --> 00:20:48,820
that we are trying to describe.

370
00:20:48,820 --> 00:20:52,164
It can be any topic that
Wikipedia can cover,

371
00:20:52,164 --> 00:20:53,830
and many others that
Wikipedia wouldn't.

372
00:20:53,830 --> 00:20:57,490
So the topic, the
item can be Germany,

373
00:20:57,490 --> 00:21:00,520
or it can be Salem,
Massachusetts,

374
00:21:00,520 --> 00:21:03,340
or it can be the
concept of redemption.

375
00:21:03,340 --> 00:21:04,610
It can be anything at all.

376
00:21:04,610 --> 00:21:10,000
Anything you can imagine
describing in any way with data

377
00:21:10,000 --> 00:21:11,990
can be the item.

378
00:21:11,990 --> 00:21:15,430
So the item, consider
it like the title

379
00:21:15,430 --> 00:21:17,480
of the rest of the data.

380
00:21:17,480 --> 00:21:20,860
And then what do we say
about Salem, Massachusetts

381
00:21:20,860 --> 00:21:22,330
or about Germany?

382
00:21:22,330 --> 00:21:26,770
Well, that's a series of
properties and values,

383
00:21:26,770 --> 00:21:28,450
properties and values.

384
00:21:28,450 --> 00:21:32,680
The property is
the kind of datum,

385
00:21:32,680 --> 00:21:39,770
like birth date or language
spoken or manner of death.

386
00:21:39,770 --> 00:21:42,640
These are all real properties.

387
00:21:42,640 --> 00:21:46,030
Or national anthem, if I'm
trying to describe a country--

388
00:21:46,030 --> 00:21:47,830
these are properties.

389
00:21:47,830 --> 00:21:49,880
And then they have
values, right?

390
00:21:49,880 --> 00:21:55,740
So this person, this
imaginary person's place

391
00:21:55,740 --> 00:21:59,640
of birth, the value of the
property place of birth

392
00:21:59,640 --> 00:22:02,430
is Salem, Massachusetts.

393
00:22:02,430 --> 00:22:06,690
So you can think about it
as like a government form--

394
00:22:06,690 --> 00:22:09,540
or not government, just any
form that you're filling out--

395
00:22:09,540 --> 00:22:12,420
where there are field names,
and then empty spaces for you

396
00:22:12,420 --> 00:22:13,110
to fill out.

397
00:22:13,110 --> 00:22:14,460
That's the value, OK?

398
00:22:14,460 --> 00:22:18,150
So the field names
or the categories

399
00:22:18,150 --> 00:22:19,350
are the properties, right?

400
00:22:19,350 --> 00:22:22,960
So name, language,
occupation, date of birth--

401
00:22:22,960 --> 00:22:24,420
these are all properties.

402
00:22:24,420 --> 00:22:26,640
And the values are
the actual piece

403
00:22:26,640 --> 00:22:31,391
of data, the actual
information that we have.

404
00:22:31,391 --> 00:22:33,870
And of course,
different kinds of data

405
00:22:33,870 --> 00:22:40,170
are relevant for describing
different kinds of items.

406
00:22:40,170 --> 00:22:45,030
And the key thing about the value is that it
can be either a literal value--

407
00:22:45,030 --> 00:22:50,370
like if we're describing
the height of a mountain,

408
00:22:50,370 --> 00:22:55,826
we might say just
the number 8,848.

409
00:22:55,826 --> 00:22:57,325
That's the height
of which mountain?

410
00:22:57,325 --> 00:23:01,990


411
00:23:01,990 --> 00:23:04,070
Not everyone at once.

412
00:23:04,070 --> 00:23:07,430
Oh, because it's meters,
the metric system.

413
00:23:07,430 --> 00:23:08,270
Yeah, Mt.

414
00:23:08,270 --> 00:23:12,390
Everest is 8,848 meters.

415
00:23:12,390 --> 00:23:14,160
Yes.

416
00:23:14,160 --> 00:23:15,780
Get with it, America.

417
00:23:15,780 --> 00:23:17,630
The metric system.

418
00:23:17,630 --> 00:23:20,930
All right, so that
can be a literal value

419
00:23:20,930 --> 00:23:22,580
like an actual number.

420
00:23:22,580 --> 00:23:28,280
Or it can be a link to an
item, pointing at another item.

421
00:23:28,280 --> 00:23:30,890
But in this statement,
it is the value.

422
00:23:30,890 --> 00:23:35,150
So if I'm talking about
Germany, the item is Germany.

423
00:23:35,150 --> 00:23:39,680
And the property capital
city has the value Berlin.

424
00:23:39,680 --> 00:23:43,130
But the value is
not B-E-R-L-I-N.

425
00:23:43,130 --> 00:23:48,740
The value is a pointer to
the item Berlin, right?

426
00:23:48,740 --> 00:23:51,410
That's the link.

427
00:23:51,410 --> 00:23:56,671
So a single item is described
by a series of such statements,

428
00:23:56,671 --> 00:23:57,170
right?

429
00:23:57,170 --> 00:24:01,400
There are hundreds and hundreds of
things I can say about Germany.

430
00:24:01,400 --> 00:24:04,280
There are hundreds of things
I can say about a person.

431
00:24:04,280 --> 00:24:06,350
And these will
generally take the form

432
00:24:06,350 --> 00:24:08,330
of a property and a value.

433
00:24:08,330 --> 00:24:11,720
By the way, some properties
may have more than one value.

434
00:24:11,720 --> 00:24:15,920
Consider the property
languages spoken.

435
00:24:15,920 --> 00:24:18,050
People can speak more
than one language, right?

436
00:24:18,050 --> 00:24:20,330
So if I'm
describing myself,

437
00:24:20,330 --> 00:24:22,400
we can say languages spoken--

438
00:24:22,400 --> 00:24:26,000
English, Hebrew,
Latin, whatever.

439
00:24:26,000 --> 00:24:27,860
So a property can have
more than one value.

440
00:24:27,860 --> 00:24:30,970


441
00:24:30,970 --> 00:24:34,010
So if the item is
about a country,

442
00:24:34,010 --> 00:24:38,890
it would have statements about
properties like population,

443
00:24:38,890 --> 00:24:43,180
land area, official languages,
borders with, anthem,

444
00:24:43,180 --> 00:24:45,070
capital city.

445
00:24:45,070 --> 00:24:48,580
If I'm describing a person, I
have a whole, mostly different

446
00:24:48,580 --> 00:24:51,220
set of properties that
are relevant, right?

447
00:24:51,220 --> 00:24:54,160
Date of birth, place of birth,
citizenship, occupation,

448
00:24:54,160 --> 00:24:56,950
father, mother,
religion, notable works--

449
00:24:56,950 --> 00:24:59,780
now, are all of these
relevant for all people?

450
00:24:59,780 --> 00:25:00,970
No, of course not.

451
00:25:00,970 --> 00:25:02,140
It depends.

452
00:25:02,140 --> 00:25:05,220
And different items
about different people

453
00:25:05,220 --> 00:25:08,920
will either have or not
have these fields, right?

454
00:25:08,920 --> 00:25:12,640
So we wouldn't record religion
for absolutely every person.

455
00:25:12,640 --> 00:25:14,200
Some people manage
to do without.

456
00:25:14,200 --> 00:25:17,710
And also, it's not relevant
for a lot of people, like,

457
00:25:17,710 --> 00:25:20,320
what their religion
happens to be.

458
00:25:20,320 --> 00:25:22,840
Date of birth is generally
relevant for most people

459
00:25:22,840 --> 00:25:24,060
that we're documenting.

460
00:25:24,060 --> 00:25:29,390
So some properties kind of crop
up more commonly than others.

461
00:25:29,390 --> 00:25:33,220
A person's height, for
example, is not generally

462
00:25:33,220 --> 00:25:35,596
considered of
encyclopedic value, right?

463
00:25:35,596 --> 00:25:36,970
We don't, for
example, if we have

464
00:25:36,970 --> 00:25:40,840
an article about even a
really well-documented person

465
00:25:40,840 --> 00:25:45,610
like Winston Churchill, does
Wikipedia mention his height?

466
00:25:45,610 --> 00:25:47,620
I don't think it does.

467
00:25:47,620 --> 00:25:50,320
Even though I'm sure
we could probably

468
00:25:50,320 --> 00:25:52,810
find a source somewhere
that lists his height,

469
00:25:52,810 --> 00:25:55,570
it's just not a
very relevant piece

470
00:25:55,570 --> 00:25:57,506
of information about Churchill.

471
00:25:57,506 --> 00:25:59,380
With everything else
that's written about him

472
00:25:59,380 --> 00:26:00,796
and that we know
about him that we

473
00:26:00,796 --> 00:26:03,460
want to include in the
article, a person's height

474
00:26:03,460 --> 00:26:08,180
is not really something of
great value most of the time.

475
00:26:08,180 --> 00:26:14,420
But if we are describing
Michael Jordan, it is relevant.

476
00:26:14,420 --> 00:26:15,430
I'm dating myself.

477
00:26:15,430 --> 00:26:19,230
People still know
Michael Jordan, right?

478
00:26:19,230 --> 00:26:21,600
You know, a basketball
player, that's

479
00:26:21,600 --> 00:26:24,204
when height is very
relevant, right?

480
00:26:24,204 --> 00:26:25,620
That's one of the
first things you

481
00:26:25,620 --> 00:26:28,020
say when you're describing
a basketball player,

482
00:26:28,020 --> 00:26:31,380
is list their height.

483
00:26:31,380 --> 00:26:33,690
So even within the
class of person,

484
00:26:33,690 --> 00:26:36,480
some properties may be
more or less relevant,

485
00:26:36,480 --> 00:26:38,320
depending on the context.

486
00:26:38,320 --> 00:26:40,090
So let's look at some examples.

487
00:26:40,090 --> 00:26:42,870
These are examples
of statements.

488
00:26:42,870 --> 00:26:45,400
Each line is a statement.

489
00:26:45,400 --> 00:26:47,130
So here's the first one.

490
00:26:47,130 --> 00:26:53,270
I want to state, about the
item Earth, our planet.

491
00:26:53,270 --> 00:26:55,760
And what I want
to say about Earth

492
00:26:55,760 --> 00:27:00,980
is that the property
highest point on Earth

493
00:27:00,980 --> 00:27:03,310
has the value Mt.

494
00:27:03,310 --> 00:27:04,817
Everest.

495
00:27:04,817 --> 00:27:05,900
Would you agree with that?

496
00:27:05,900 --> 00:27:09,580
That is the highest
point on Earth.

497
00:27:09,580 --> 00:27:11,100
That's a statement.

498
00:27:11,100 --> 00:27:14,020
It says something
specific, one piece

499
00:27:14,020 --> 00:27:15,517
of information about Earth.

500
00:27:15,517 --> 00:27:17,350
Now of course, there's
a lot of other things

501
00:27:17,350 --> 00:27:18,820
we want to say about Earth--

502
00:27:18,820 --> 00:27:21,165
circumference,
average temperature,

503
00:27:21,165 --> 00:27:22,540
I don't know, all
kinds of things

504
00:27:22,540 --> 00:27:26,750
we can describe the planet
with: density, the galaxy

505
00:27:26,750 --> 00:27:28,250
it belongs to, all that.

506
00:27:28,250 --> 00:27:30,400
But here's one piece
of information,

507
00:27:30,400 --> 00:27:37,370
one very specific field in
the detailed form about Earth.

508
00:27:37,370 --> 00:27:38,990
The highest point is Mt.

509
00:27:38,990 --> 00:27:39,590
Everest.

510
00:27:39,590 --> 00:27:41,570
Now here's a second statement.

511
00:27:41,570 --> 00:27:42,920
This time Mt.

512
00:27:42,920 --> 00:27:46,690
Everest itself is the item
that I'm describing, right?

513
00:27:46,690 --> 00:27:48,590
The topic has changed.

514
00:27:48,590 --> 00:27:50,120
Now I'm saying
something about Mt.

515
00:27:50,120 --> 00:27:52,340
Everest, and what
I'm saying about Mt.

516
00:27:52,340 --> 00:27:56,860
Everest is elevation
above sea level.

517
00:27:56,860 --> 00:28:01,190
Sounds the same but it
isn't, because the highest

518
00:28:01,190 --> 00:28:04,670
point on Earth answers
the question where,

519
00:28:04,670 --> 00:28:08,090
like on the planet, what
is the highest point?

520
00:28:08,090 --> 00:28:08,720
It's Mt.

521
00:28:08,720 --> 00:28:09,630
Everest.

522
00:28:09,630 --> 00:28:12,911
But how high that highest
point is, is a different piece

523
00:28:12,911 --> 00:28:13,535
of information.

524
00:28:13,535 --> 00:28:14,710
Do you agree?

525
00:28:14,710 --> 00:28:16,790
It's the actual altitude.

526
00:28:16,790 --> 00:28:19,600
It's not where on
the planet it is.

527
00:28:19,600 --> 00:28:21,680
So it may sound similar,
but these are actually

528
00:28:21,680 --> 00:28:24,030
very different pieces
of information.

529
00:28:24,030 --> 00:28:27,800
So that highest
point, how high is it?

530
00:28:27,800 --> 00:28:31,790
Well, it's 8,848 meters high.

531
00:28:31,790 --> 00:28:36,550
Now the third statement gives
another piece of information

532
00:28:36,550 --> 00:28:37,960
about the first item.

533
00:28:37,960 --> 00:28:40,870
Same item-- I could have
grouped them together.

534
00:28:40,870 --> 00:28:42,400
Another thing I
know about the Earth

535
00:28:42,400 --> 00:28:46,480
is that the deepest
point on the planet

536
00:28:46,480 --> 00:28:53,050
is the Challenger Deep, part
of the so-called Mariana

537
00:28:53,050 --> 00:28:54,760
Trench in the ocean.

538
00:28:54,760 --> 00:28:56,530
So that is the deepest point.

539
00:28:56,530 --> 00:28:58,180
And how deep is it?

540
00:28:58,180 --> 00:29:01,384
I again use the elevation
above sea level.

541
00:29:01,384 --> 00:29:03,550
That's the name of the
property even though it's not

542
00:29:03,550 --> 00:29:04,750
above sea level.

543
00:29:04,750 --> 00:29:08,260
I have a negative value because
the elevation of the Challenger

544
00:29:08,260 --> 00:29:13,700
Deep is minus 11
kilometers, more or less.

545
00:29:13,700 --> 00:29:14,200
All right?

546
00:29:14,200 --> 00:29:15,620
So these are statements.

547
00:29:15,620 --> 00:29:18,820
These are four individual
pieces of data.

548
00:29:18,820 --> 00:29:21,160
And I could also
look at it this way.

549
00:29:21,160 --> 00:29:25,210
Maybe that's closer to the
government form example

550
00:29:25,210 --> 00:29:26,620
that I was giving, right?

551
00:29:26,620 --> 00:29:29,190
So I want to say
something about Earth.

552
00:29:29,190 --> 00:29:30,760
What do I want to say?

553
00:29:30,760 --> 00:29:33,580
Two things-- highest point.

554
00:29:33,580 --> 00:29:36,760
That's the field,
that's the property,

555
00:29:36,760 --> 00:29:37,780
and this is the value.

556
00:29:37,780 --> 00:29:39,190
The highest point is Mt.

557
00:29:39,190 --> 00:29:40,240
Everest.

558
00:29:40,240 --> 00:29:42,880
The deepest point
is Challenger Deep.

559
00:29:42,880 --> 00:29:46,450
And then I have things to
say about Challenger Deep--

560
00:29:46,450 --> 00:29:49,630
the property of elevation
above sea level, the value

561
00:29:49,630 --> 00:29:52,280
is minus 11 kilometers.

562
00:29:52,280 --> 00:29:55,900


563
00:29:55,900 --> 00:30:00,600
Now here's yet another
view of the same data

564
00:30:00,600 --> 00:30:04,530
once more, with numeric IDs.

565
00:30:04,530 --> 00:30:08,150
So this is the same information,
the same four statements.

566
00:30:08,150 --> 00:30:13,020
But this time, in
addition to using words,

567
00:30:13,020 --> 00:30:21,270
I'm also including weird
numbers following either Q or P.

568
00:30:21,270 --> 00:30:25,890
So P stands for property.

569
00:30:25,890 --> 00:30:30,330
So the highest point
property is P610.

570
00:30:30,330 --> 00:30:34,216
And the deepest point
property is P1589.

571
00:30:34,216 --> 00:30:35,340
What do these numbers mean?

572
00:30:35,340 --> 00:30:36,985
They don't mean anything at all.

573
00:30:36,985 --> 00:30:37,860
They're just numbers.

574
00:30:37,860 --> 00:30:39,760
They're just sequential numbers.

575
00:30:39,760 --> 00:30:42,600
And if I create a new
Wikidata item right now,

576
00:30:42,600 --> 00:30:46,020
it'll get just the
next available number.

577
00:30:46,020 --> 00:30:47,790
So they're just numbers.

578
00:30:47,790 --> 00:30:49,080
So P stands for property.

579
00:30:49,080 --> 00:30:51,480
What does Q stand for?

580
00:30:51,480 --> 00:30:53,460
Does anyone know?

581
00:30:53,460 --> 00:30:58,500
It's a trick question
because it's hard to guess.

582
00:30:58,500 --> 00:31:01,896
But the principal
architect of Wikidata,

583
00:31:01,896 --> 00:31:07,860
a Wikipedian and data scientist
named Denny Vrandečić,

584
00:31:07,860 --> 00:31:10,950
is married to a lovely
lady named Qamarniso,

585
00:31:10,950 --> 00:31:16,320
spelled with a Q. And
this is a loving tribute.

586
00:31:16,320 --> 00:31:21,780
And she's also a Wikipedian and
an admin of Uzbek Wikipedia.

587
00:31:21,780 --> 00:31:31,650
So Q2 is just the numeric
identifier of the item Earth.

588
00:31:31,650 --> 00:31:36,190
And Q513 is the
identifier of Mt.

589
00:31:36,190 --> 00:31:37,310
Everest.

590
00:31:37,310 --> 00:31:42,950
You'll notice that we use that ID
across the statements, right?

591
00:31:42,950 --> 00:31:48,520
So from Wikidata's
perspective, this

592
00:31:48,520 --> 00:31:53,290
is what the
database actually contains.

593
00:31:53,290 --> 00:31:55,030
What we were saying with words--

594
00:31:55,030 --> 00:31:57,650
the Earth, highest
point, whatever--

595
00:31:57,650 --> 00:31:58,540
never mind that.

596
00:31:58,540 --> 00:32:03,250
Q2 has P610 with a value Q513.

597
00:32:03,250 --> 00:32:06,190
That's what Wikidata
cares about, OK?

598
00:32:06,190 --> 00:32:09,770
Now that, you'll agree,
is a little inaccessible.

599
00:32:09,770 --> 00:32:13,120
Just these lists of numbers,
that's a little hard.

600
00:32:13,120 --> 00:32:16,240
So Wikidata
understands and allows

601
00:32:16,240 --> 00:32:19,690
us to continue using our words.

602
00:32:19,690 --> 00:32:23,650
But actually, it gets
translated into numeric IDs.

603
00:32:23,650 --> 00:32:25,050
Now why is this a good idea?

604
00:32:25,050 --> 00:32:30,070


605
00:32:30,070 --> 00:32:33,070
Why can't we just
say Earth or Mt.

606
00:32:33,070 --> 00:32:35,120
Everest?

607
00:32:35,120 --> 00:32:36,170
Any thoughts?

608
00:32:36,170 --> 00:32:39,530
This is an open question.

609
00:32:39,530 --> 00:32:41,540
Why is this a good
idea to use numbers

610
00:32:41,540 --> 00:32:43,260
instead of the names of things?

611
00:32:43,260 --> 00:32:47,000


612
00:32:47,000 --> 00:32:51,750
Yes, because more than one
thing can have the same name.

613
00:32:51,750 --> 00:32:52,590
What do you mean?

614
00:32:52,590 --> 00:32:53,460
There's only one Mt.

615
00:32:53,460 --> 00:32:54,480
Everest.

616
00:32:54,480 --> 00:32:55,510
Well, yeah.

617
00:32:55,510 --> 00:32:58,710
But there's also a
movie called-- and probably

618
00:32:58,710 --> 00:33:00,000
more than one-- called Mt.

619
00:33:00,000 --> 00:33:04,080
Everest, or a TV documentary
literally called Mt.

620
00:33:04,080 --> 00:33:06,590
Everest.

621
00:33:06,590 --> 00:33:09,960
And of course, if I'm
describing a person named

622
00:33:09,960 --> 00:33:14,930
Frank Johnson, not the only
Frank Johnson on the planet,

623
00:33:14,930 --> 00:33:16,180
right?

624
00:33:16,180 --> 00:33:17,760
But wait, you say.

625
00:33:17,760 --> 00:33:20,640
On Wikipedia we deal
with that problem, right?

626
00:33:20,640 --> 00:33:23,490
How do we deal with that
problem on Wikipedia?

627
00:33:23,490 --> 00:33:26,270
Does anyone in
the audience know?

628
00:33:26,270 --> 00:33:27,969
The standard way to
deal with the fact

629
00:33:27,969 --> 00:33:30,260
that there is more than one
Frank Johnson in the world,

630
00:33:30,260 --> 00:33:35,600
on Wikipedia, is to use
parentheses after the name.

631
00:33:35,600 --> 00:33:39,200
So there is Frank
Johnson (actor)

632
00:33:39,200 --> 00:33:42,620
and Frank Johnson
(politician), for example,

633
00:33:42,620 --> 00:33:44,700
if that's the distinction
we need to make.

634
00:33:44,700 --> 00:33:48,140
So you put in parentheses
kind of the minimal amount

635
00:33:48,140 --> 00:33:51,840
of information you need to tell
apart these Frank Johnsons.

636
00:33:51,840 --> 00:33:54,530
What if there's two
politician Frank Johnsons?

637
00:33:54,530 --> 00:33:58,880
Well, then you would say Frank
Johnson (Delaware politician)

638
00:33:58,880 --> 00:34:01,960
versus Frank Johnson
(California politician), right?

639
00:34:01,960 --> 00:34:05,210
You just put in that bit of
context to tell them apart.

640
00:34:05,210 --> 00:34:07,640
So that's the solution
that Wikipedians came up

641
00:34:07,640 --> 00:34:12,469
with years and years ago
because they did need

642
00:34:12,469 --> 00:34:15,560
a unique name for the article.

643
00:34:15,560 --> 00:34:18,170
You can't have two
articles literally called

644
00:34:18,170 --> 00:34:20,790
Frank Johnson on Wikipedia.

645
00:34:20,790 --> 00:34:23,570
So that's the
solution on Wikipedia.

646
00:34:23,570 --> 00:34:28,429
But Wikidata was designed
much later, more than a decade

647
00:34:28,429 --> 00:34:31,340
after Wikipedia, and was
able to kind of learn

648
00:34:31,340 --> 00:34:34,520
from the experience
of Wikipedia, which

649
00:34:34,520 --> 00:34:39,380
has tremendous experience
with multilingualism, much

650
00:34:39,380 --> 00:34:42,870
more than most sites and
projects, as we know.

651
00:34:42,870 --> 00:34:44,659
And so the Wikidata
team understood

652
00:34:44,659 --> 00:34:47,840
from the get-go that
this will be an issue,

653
00:34:47,840 --> 00:34:50,989
and it's better to use
numbers that are unequivocally

654
00:34:50,989 --> 00:34:54,800
different from each
other instead of labels,

655
00:34:54,800 --> 00:34:57,290
instead of the actual
name, the actual text,

656
00:34:57,290 --> 00:34:59,630
because names are not unique.

657
00:34:59,630 --> 00:35:03,260
Names can change, right?

658
00:35:03,260 --> 00:35:08,960
Just last year, there was a
big naming reform in Ukraine

659
00:35:08,960 --> 00:35:13,610
and a whole bunch of towns
and districts were renamed.

660
00:35:13,610 --> 00:35:17,330
Does that mean we should change
all the data that we have, like

661
00:35:17,330 --> 00:35:19,550
lose all the data that we
have about the old name?

662
00:35:19,550 --> 00:35:22,130
No, we ideally just
want to change the name

663
00:35:22,130 --> 00:35:24,020
without breaking links.

664
00:35:24,020 --> 00:35:28,550
So having the links actually
refer to the numbers

665
00:35:28,550 --> 00:35:32,090
is one way to ensure the
integrity of the data,

666
00:35:32,090 --> 00:35:35,360
of the links, when
renaming happens.

667
00:35:35,360 --> 00:35:39,230
Another reason is, well, even
if the name doesn't change,

668
00:35:39,230 --> 00:35:42,230
not all humans call
everything the same, right?

669
00:35:42,230 --> 00:35:46,180
So Earth is Earth
in English, but it's

670
00:35:46,180 --> 00:35:48,210
[SPEAKING ARABIC] in Arabic.

671
00:35:48,210 --> 00:35:49,585
It's [SPEAKING HEBREW]
in Hebrew.

672
00:35:49,585 --> 00:35:53,480


673
00:35:53,480 --> 00:35:56,570
So obviously, Earth--
even that is not

674
00:35:56,570 --> 00:36:01,920
as unambiguous or unequivocal
as you might think.

675
00:36:01,920 --> 00:36:03,500
And so that is the
reason Wikidata,

676
00:36:03,500 --> 00:36:07,640
which is built to be
multilingual from the start,

677
00:36:07,640 --> 00:36:11,230
talks about numbers
rather than labels.

678
00:36:11,230 --> 00:36:12,150
OK.

679
00:36:12,150 --> 00:36:15,370
Ha, I had a whole slide
about that and I forgot.

680
00:36:15,370 --> 00:36:17,830
Yes, so even London,
again, is not

681
00:36:17,830 --> 00:36:20,710
just London, England, which is
what you were thinking about.

682
00:36:20,710 --> 00:36:22,030
It's also a city in Canada.

683
00:36:22,030 --> 00:36:26,260
And it's also a family
name, like Jack London.

684
00:36:26,260 --> 00:36:27,430
It's also a movie company.

685
00:36:27,430 --> 00:36:32,230
There must be some hotel
named London somewhere.

686
00:36:32,230 --> 00:36:36,070
This is a good opportunity
to remind everyone

687
00:36:36,070 --> 00:36:41,110
that the vast
majority of humankind

688
00:36:41,110 --> 00:36:45,700
does not speak a
word of English.

689
00:36:45,700 --> 00:36:48,790
That's a statistic
worth remembering.

690
00:36:48,790 --> 00:36:55,240
The vast majority of the planet
does not speak English at all.

691
00:36:55,240 --> 00:36:57,070
That does not
contradict the datum

692
00:36:57,070 --> 00:37:00,070
that English is the most
widely spoken language.

693
00:37:00,070 --> 00:37:02,860
And yet, in aggregate,
a majority of people

694
00:37:02,860 --> 00:37:07,180
speak other languages,
and not English at all.

695
00:37:07,180 --> 00:37:13,150
So moving swiftly on, this
is a pause for questions

696
00:37:13,150 --> 00:37:15,610
about what I've covered so far.

697
00:37:15,610 --> 00:37:17,390
Any questions in the audience?

698
00:37:17,390 --> 00:37:19,450
If not, we'll move to IRC.

699
00:37:19,450 --> 00:37:21,042
If there are any questions--

700
00:37:21,042 --> 00:37:23,880


701
00:37:23,880 --> 00:37:26,891
Any questions?

702
00:37:26,891 --> 00:37:27,390
No?

703
00:37:27,390 --> 00:37:28,305
IRC?

704
00:37:28,305 --> 00:37:29,490
Any questions?

705
00:37:29,490 --> 00:37:33,580


706
00:37:33,580 --> 00:37:34,180
OK.

707
00:37:34,180 --> 00:37:38,170
We will have additional
pauses for questions later.

708
00:37:38,170 --> 00:37:41,470
But enough of my hand-waving.

709
00:37:41,470 --> 00:37:44,590
Let's go explore Wikidata.

710
00:37:44,590 --> 00:37:49,730
So Wikidata lives
at wikidata.org.

711
00:37:49,730 --> 00:37:59,570
And Wikidata already has
more than 25 million items.

712
00:37:59,570 --> 00:38:05,570
That is, it collects
statements about more than 25

713
00:38:05,570 --> 00:38:08,270
million topics.

714
00:38:08,270 --> 00:38:12,170
It has many, many more
than 25 million statements

715
00:38:12,170 --> 00:38:14,660
because many of these items
have dozens or hundreds

716
00:38:14,660 --> 00:38:16,370
of statements.

717
00:38:16,370 --> 00:38:20,720
So it documents 25
million things--

718
00:38:20,720 --> 00:38:23,153
people, books, rivers, whatever.

719
00:38:23,153 --> 00:38:26,010


720
00:38:26,010 --> 00:38:28,800
Just to give us a sense
of how big that number is,

721
00:38:28,800 --> 00:38:32,430
how many articles do we
have on English Wikipedia?

722
00:38:32,430 --> 00:38:35,610
More than-- yes, more
than 5 million articles.

723
00:38:35,610 --> 00:38:37,990
And that's the
largest Wikipedia.

724
00:38:37,990 --> 00:38:41,100
So Wikidata is
already describing

725
00:38:41,100 --> 00:38:45,450
more than five times, or
about five times as many items

726
00:38:45,450 --> 00:38:48,460
as even our largest Wikipedia.

727
00:38:48,460 --> 00:38:50,840
So obviously,
Wikidata contains data

728
00:38:50,840 --> 00:38:56,900
about things that have no
article on any Wikipedia.

729
00:38:56,900 --> 00:39:01,980
It is a much, much larger,
more comprehensive project.

730
00:39:01,980 --> 00:39:04,250
All right, the second
thing we might notice

731
00:39:04,250 --> 00:39:07,610
is, well, this looks kind
of like Wikipedia, right?

732
00:39:07,610 --> 00:39:11,210
If we've never visited, it
looks kind of like Wikipedia.

733
00:39:11,210 --> 00:39:13,490
It has this sidebar.

734
00:39:13,490 --> 00:39:15,290
It has these buttons at the top.

735
00:39:15,290 --> 00:39:17,810
It looks like it's
from the '90s.

736
00:39:17,810 --> 00:39:18,770
Yeah.

737
00:39:18,770 --> 00:39:20,900
So the reason it
looks like Wikipedia

738
00:39:20,900 --> 00:39:24,410
is that it is a wiki running
on MediaWiki software.

739
00:39:24,410 --> 00:39:28,430
It is running on software
very much like Wikipedia.

740
00:39:28,430 --> 00:39:32,180
But it is running on
a kind of modification

741
00:39:32,180 --> 00:39:34,010
of the standard wiki software.

742
00:39:34,010 --> 00:39:36,170
It has an additional,
very important component

743
00:39:36,170 --> 00:39:38,630
named Wikibase,
which gives it all

744
00:39:38,630 --> 00:39:42,700
of its structured and
linked data power.

745
00:39:42,700 --> 00:39:46,763
So let's start
exploring Wikidata.

746
00:39:46,763 --> 00:39:52,830


747
00:39:52,830 --> 00:39:55,770
Let's take something local--

748
00:39:55,770 --> 00:39:57,530
Harvey Milk.

749
00:39:57,530 --> 00:40:00,190
Harvey Milk.

750
00:40:00,190 --> 00:40:03,460
What does Wikidata
know about Harvey Milk?

751
00:40:03,460 --> 00:40:06,730
For those on YouTube
who may not be local,

752
00:40:06,730 --> 00:40:15,580
he's a San Francisco politician
and gay rights activist

753
00:40:15,580 --> 00:40:18,380
who was murdered in the '70s.

754
00:40:18,380 --> 00:40:21,280
His murder was very significant in
the history of those struggles

755
00:40:21,280 --> 00:40:22,710
in this country.

756
00:40:22,710 --> 00:40:27,220
So what does Wikidata
tell us about Harvey Milk?

757
00:40:27,220 --> 00:40:29,770
Well, the first
thing is it knows

758
00:40:29,770 --> 00:40:34,562
that Harvey Milk is Q17141.

759
00:40:34,562 --> 00:40:36,520
That's the most important
piece of information:

760
00:40:36,520 --> 00:40:38,770
first of all, that
is the identifier.

761
00:40:38,770 --> 00:40:42,490
That is the item
number for all the data

762
00:40:42,490 --> 00:40:46,150
that we will collect
about Harvey Milk.

763
00:40:46,150 --> 00:40:50,020
The second thing you see
right under the title

764
00:40:50,020 --> 00:40:54,730
is this line, this very,
very brief summary, right?

765
00:40:54,730 --> 00:40:59,620
"American politician who became
a martyr in the gay community."

766
00:40:59,620 --> 00:41:02,080
This line is the
description line.

767
00:41:02,080 --> 00:41:04,640
So the name of the item--

768
00:41:04,640 --> 00:41:05,980
this is the label.

769
00:41:05,980 --> 00:41:07,450
We call it label on Wikidata.

770
00:41:07,450 --> 00:41:08,740
That's the label.

771
00:41:08,740 --> 00:41:10,990
And this line is
the description.

772
00:41:10,990 --> 00:41:13,480
Now why is this
description important?

773
00:41:13,480 --> 00:41:16,990
This is the description that
helps us tell this Harvey

774
00:41:16,990 --> 00:41:23,230
Milk from any other Harvey
Milk that may exist, all right?

775
00:41:23,230 --> 00:41:26,530
So again, this would
be useful if I'm

776
00:41:26,530 --> 00:41:30,190
looking up someone with a
slightly more generic name.

777
00:41:30,190 --> 00:41:33,910
That line will help me tell
apart the item about Harvey

778
00:41:33,910 --> 00:41:38,860
Milk the gay activist rather
than Harvey Milk the film

779
00:41:38,860 --> 00:41:41,750
actor, OK?

780
00:41:41,750 --> 00:41:43,100
And where is it coming from?

781
00:41:43,100 --> 00:41:48,690
Well, Wikidata has
this whole table,

782
00:41:48,690 --> 00:41:52,790
as you can see, with
descriptions and labels

783
00:41:52,790 --> 00:41:54,750
in other languages.

784
00:41:54,750 --> 00:41:59,600
So Wikidata is able to refer
to Harvey Milk in Arabic which,

785
00:41:59,600 --> 00:42:04,010
don't panic, is written
from right to left.

786
00:42:04,010 --> 00:42:07,730
It also knows what to
call him in Bulgarian.

787
00:42:07,730 --> 00:42:11,030
I mean, it's the same name,
but it's in a different script.

788
00:42:11,030 --> 00:42:13,640
In French, in Hebrew,
and that's it?

789
00:42:13,640 --> 00:42:17,960
Does it not know a name
for Harvey Milk in Italian?

790
00:42:17,960 --> 00:42:19,760
Of course it does.

791
00:42:19,760 --> 00:42:22,250
It actually has
labels for this person

792
00:42:22,250 --> 00:42:24,435
in many, many, many languages.

793
00:42:24,435 --> 00:42:30,080
It doesn't have descriptions in
every language, as you can see.

794
00:42:30,080 --> 00:42:30,800
OK?

795
00:42:30,800 --> 00:42:36,240
So why was Wikidata showing me
these languages and not others?

796
00:42:36,240 --> 00:42:39,260
I mean, why this somewhat
arbitrary collection--

797
00:42:39,260 --> 00:42:42,860
English, Arabic, Bulgarian,
German, French, and Hebrew?

798
00:42:42,860 --> 00:42:45,300
Because I told it to.

799
00:42:45,300 --> 00:42:50,390
So if we briefly click
over to my user page--

800
00:42:50,390 --> 00:42:52,730
again, like every wiki,
you have user accounts.

801
00:42:52,730 --> 00:42:53,960
You have user pages.

802
00:42:53,960 --> 00:42:55,380
This is my user page.

803
00:42:55,380 --> 00:42:59,750
And as you can see,
there's this little user

804
00:42:59,750 --> 00:43:03,230
information box here called
a Babel box by Wikipedians,

805
00:43:03,230 --> 00:43:06,610
where I list the
languages that I speak.

806
00:43:06,610 --> 00:43:11,000
And Wikidata uses this box
just to kind of helpfully

807
00:43:11,000 --> 00:43:12,944
show me these languages.

808
00:43:12,944 --> 00:43:14,360
Of course, all the
other languages

809
00:43:14,360 --> 00:43:19,580
are still available, as you saw,
by clicking "more languages."

810
00:43:19,580 --> 00:43:22,940
But this is just a
useful little way

811
00:43:22,940 --> 00:43:27,590
of getting the languages I
care about up there first.

812
00:43:27,590 --> 00:43:29,060
By the way, this is a lie.

813
00:43:29,060 --> 00:43:31,170
I don't actually
speak Bulgarian.

814
00:43:31,170 --> 00:43:33,740
That stayed on my user page
because I was demonstrating

815
00:43:33,740 --> 00:43:37,010
this in Bulgaria and I wanted
that label to show up there

816
00:43:37,010 --> 00:43:38,420
during the talk--

817
00:43:38,420 --> 00:43:40,250
just in case you
were going to tell me

818
00:43:40,250 --> 00:43:43,840
a really good Bulgarian joke.

819
00:43:43,840 --> 00:43:48,470
OK, so for example, Hebrew
is my mother tongue.

820
00:43:48,470 --> 00:43:51,730
And we have a Hebrew
label for Harvey Milk.

821
00:43:51,730 --> 00:43:53,810
But we don't have a description.

822
00:43:53,810 --> 00:44:00,950
So let's fix that right now by
clicking the edit button right

823
00:44:00,950 --> 00:44:01,960
here.

824
00:44:01,960 --> 00:44:05,930
I click edit, and this
table became editable.

825
00:44:05,930 --> 00:44:09,661
And now I can very briefly
type a description.

826
00:44:09,661 --> 00:44:22,899


827
00:44:22,899 --> 00:44:24,440
AUDIENCE: Online in
about 20 seconds.

828
00:44:24,440 --> 00:44:25,400
But can we hold it?

829
00:44:25,400 --> 00:44:26,066
ASAF BARTOV: OK.

830
00:44:26,066 --> 00:44:28,454


831
00:44:28,454 --> 00:44:30,430
That was good timing
for the screen to crash.

832
00:44:30,430 --> 00:44:53,642


833
00:44:53,642 --> 00:44:54,142
OK?

834
00:44:54,142 --> 00:44:59,082


835
00:44:59,082 --> 00:45:01,800
Are we back?

836
00:45:01,800 --> 00:45:02,850
OK.

837
00:45:02,850 --> 00:45:03,690
Sorry about that.

838
00:45:03,690 --> 00:45:07,500
So this was all about what to
call him in different languages

839
00:45:07,500 --> 00:45:09,930
and scripts and how to
tell this person apart

840
00:45:09,930 --> 00:45:13,590
from other people with
potentially the same name.

841
00:45:13,590 --> 00:45:17,930
Let's scroll down and see
what else does Wikidata

842
00:45:17,930 --> 00:45:19,680
know about this person?

843
00:45:19,680 --> 00:45:24,060
So as you can see, this is
a list of statements, right?

844
00:45:24,060 --> 00:45:25,500
This is a list of statements.

845
00:45:25,500 --> 00:45:27,900
And the properties
are on the left,

846
00:45:27,900 --> 00:45:30,340
the values are on the right.

847
00:45:30,340 --> 00:45:33,870
So the first thing Wikidata
knows about Harvey Milk

848
00:45:33,870 --> 00:45:38,520
is a very important
property called instance of.

849
00:45:38,520 --> 00:45:39,910
Instance of.

850
00:45:39,910 --> 00:45:44,690
And the property instance of
answers the very basic question:

851
00:45:44,690 --> 00:45:49,460
what kind of thing is
this that I'm describing?

852
00:45:49,460 --> 00:45:50,870
Is it a book?

853
00:45:50,870 --> 00:45:51,980
Is it a poem?

854
00:45:51,980 --> 00:45:53,570
Is it a mountain?

855
00:45:53,570 --> 00:45:55,520
Is it a theological concept?

856
00:45:55,520 --> 00:45:57,800
No, it's a human.

857
00:45:57,800 --> 00:46:00,020
It's a person, OK?

858
00:46:00,020 --> 00:46:01,880
The item about Mt.

859
00:46:01,880 --> 00:46:07,070
Everest will say
instance of mountain, OK?

860
00:46:07,070 --> 00:46:10,790
This is a very
important property.

861
00:46:10,790 --> 00:46:12,500
Why is it important?

862
00:46:12,500 --> 00:46:14,630
Wouldn't anyone looking
at this know that this is

863
00:46:14,630 --> 00:46:15,550
a human being?

864
00:46:15,550 --> 00:46:16,310
Yes.

865
00:46:16,310 --> 00:46:18,720
Anyone looking at
this will know.

866
00:46:18,720 --> 00:46:23,780
But if I want a computer to
be able to pull information

867
00:46:23,780 --> 00:46:28,160
about people, I want to
be able to easily exclude

868
00:46:28,160 --> 00:46:30,680
all the mountains and
poems and other things that

869
00:46:30,680 --> 00:46:33,440
are not people from my query.

870
00:46:33,440 --> 00:46:37,400
So this single datum,
this single piece of data,

871
00:46:37,400 --> 00:46:41,720
is what tells computers and
algorithms very clearly,

872
00:46:41,720 --> 00:46:42,890
this is a human.

873
00:46:42,890 --> 00:46:47,340
Things that aren't instance
of human are other things.

874
00:46:47,340 --> 00:46:48,230
OK?

875
00:46:48,230 --> 00:46:50,145
So it may sound very
trivial, but it's not.

876
00:46:50,145 --> 00:46:51,770
It's very important
to have an instance

877
00:46:51,770 --> 00:46:54,077
of field for Wikidata items.

878
00:46:54,077 --> 00:46:55,410
All right, what else do we know?

879
00:46:55,410 --> 00:46:59,360
Well, Wikidata knows about
an image for Harvey Milk.

880
00:46:59,360 --> 00:47:02,982
Again, we can find a ton of
images-- or maybe not a ton,

881
00:47:02,982 --> 00:47:04,940
but we can find dozens
of images of Harvey Milk

882
00:47:04,940 --> 00:47:10,430
on Commons, on our Wikimedia
multimedia repository.

883
00:47:10,430 --> 00:47:13,430
So why should we have a
single image here on Wikidata?

884
00:47:13,430 --> 00:47:16,280
Again, this is
mostly for reusers.

885
00:47:16,280 --> 00:47:18,920
If I'm building some kind of
tool that pulls information

886
00:47:18,920 --> 00:47:21,680
from Wikidata, it's
nice if there's

887
00:47:21,680 --> 00:47:24,680
at least one representative
image to kind of use

888
00:47:24,680 --> 00:47:30,300
as the default or immediate
image for Harvey Milk

889
00:47:30,300 --> 00:47:33,120
in some other reused context.

890
00:47:33,120 --> 00:47:34,770
All right, sex or gender--

891
00:47:34,770 --> 00:47:35,670
male.

892
00:47:35,670 --> 00:47:38,790
Country of citizenship--
United States of America.

893
00:47:38,790 --> 00:47:39,910
Given name is Harvey.

894
00:47:39,910 --> 00:47:41,580
The date of birth is so and so.

895
00:47:41,580 --> 00:47:44,340
The place of birth is Woodmere.

896
00:47:44,340 --> 00:47:45,870
The place of death
is San Francisco.

897
00:47:45,870 --> 00:47:48,640
The manner of death is homicide.

898
00:47:48,640 --> 00:47:50,930
Wikidata knows that.

899
00:47:50,930 --> 00:47:55,700
Now again, every
little datum like that

900
00:47:55,700 --> 00:48:02,210
is the basis for later querying
and answering questions.

901
00:48:02,210 --> 00:48:07,390
So the fact that we record the
manner of death of people--

902
00:48:07,390 --> 00:48:09,230
or at least of some people--

903
00:48:09,230 --> 00:48:11,900
will allow us later
to go, you know,

904
00:48:11,900 --> 00:48:17,120
who are some people from
Belgium who died by homicide?

905
00:48:17,120 --> 00:48:24,650
That's a question Wikidata can
answer, thanks to this field.

906
00:48:24,650 --> 00:48:27,680
The other thing I mentioned
is that things are links.

907
00:48:27,680 --> 00:48:29,680
So the place of
birth is Woodmere.

908
00:48:29,680 --> 00:48:31,900
I don't know where
Woodmere is, but I

909
00:48:31,900 --> 00:48:34,390
can click that and find out.

910
00:48:34,390 --> 00:48:38,270
Here is the Wikidata item
about Woodmere, right?

911
00:48:38,270 --> 00:48:41,230
It was the value in the
statement about Harvey Milk,

912
00:48:41,230 --> 00:48:43,900
but now I'm looking at
the item about Woodmere.

913
00:48:43,900 --> 00:48:48,047
And it turns out it's in
Nassau County, New York, right?

914
00:48:48,047 --> 00:48:50,380
And of course, Wikidata has
a whole bunch of information

915
00:48:50,380 --> 00:48:55,450
for me about Woodmere--

916
00:48:55,450 --> 00:48:59,720
what country it's in and the
coordinates and the population

917
00:48:59,720 --> 00:49:06,230
and the area, all the things you
would expect about a place, OK?

918
00:49:06,230 --> 00:49:07,512
Let's get back to Harvey Milk.

919
00:49:07,512 --> 00:49:10,370


920
00:49:10,370 --> 00:49:13,260
So the manner of death,
the cause of death--

921
00:49:13,260 --> 00:49:16,880
now here, Wikidata gives
us excellent information.

922
00:49:16,880 --> 00:49:20,390
The actual cause of death
is ballistic trauma.

923
00:49:20,390 --> 00:49:22,160
That's a professional term.

924
00:49:22,160 --> 00:49:27,560
And this statement
has qualifiers.

925
00:49:27,560 --> 00:49:30,650
So until now, I was talking
about triples, right?

926
00:49:30,650 --> 00:49:33,260
The item has a property
with a certain value.

927
00:49:33,260 --> 00:49:35,270
Actually, each
statement can also

928
00:49:35,270 --> 00:49:38,030
have a number of
qualifiers which

929
00:49:38,030 --> 00:49:45,424
add aspects of information,
still about that one question

930
00:49:45,424 --> 00:49:46,590
that we're answering, right?

931
00:49:46,590 --> 00:49:49,904
So if this property
answers cause of death,

932
00:49:49,904 --> 00:49:51,320
it's not discussing
anything else.

933
00:49:51,320 --> 00:49:52,880
It's not discussing languages.

934
00:49:52,880 --> 00:49:54,920
It's not discussing
date of birth, right?

935
00:49:54,920 --> 00:49:56,930
It's talking about
the cause of death.

936
00:49:56,930 --> 00:49:59,300
But we're not just
saying ballistic trauma.

937
00:49:59,300 --> 00:50:04,550
We're saying ballistic trauma
with the quantity attribute

938
00:50:04,550 --> 00:50:05,660
being five.

939
00:50:05,660 --> 00:50:07,550
What does that mean?

940
00:50:07,550 --> 00:50:08,870
Five bullets, right?

941
00:50:08,870 --> 00:50:12,780
There are five
ballistic traumas.

942
00:50:12,780 --> 00:50:15,300
He was shot five times.

943
00:50:15,300 --> 00:50:18,210
And he was shot by this
person named Dan White.

944
00:50:18,210 --> 00:50:25,020
And this ballistic trauma,
like this actual shooting,

945
00:50:25,020 --> 00:50:28,420
is itself the subject
of this other thing.

946
00:50:28,420 --> 00:50:31,440
This is a link to a
whole other Wikidata

947
00:50:31,440 --> 00:50:35,510
item about the Moscone-Milk
assassinations.

948
00:50:35,510 --> 00:50:38,610
Moscone was the San
Francisco mayor at the time.

949
00:50:38,610 --> 00:50:43,540


950
00:50:43,540 --> 00:50:47,510
We'll see slightly better or
easier to understand examples

951
00:50:47,510 --> 00:50:49,460
of qualifiers in a bit.

952
00:50:49,460 --> 00:50:54,440
So if this was
confusing, hang on.

953
00:50:54,440 --> 00:50:55,970
So he was killed by Dan White.

954
00:50:55,970 --> 00:50:57,800
He spoke English.

955
00:50:57,800 --> 00:50:59,960
His occupation--
here's an example

956
00:50:59,960 --> 00:51:03,140
of a property with more
than one value, right?

957
00:51:03,140 --> 00:51:06,260
So Milk was a politician.

958
00:51:06,260 --> 00:51:09,710
But he was also a Navy
officer, at least for a while.

959
00:51:09,710 --> 00:51:12,980
That was another thing that
he did during his life.

960
00:51:12,980 --> 00:51:15,350
And he was a human
rights activist, right?

961
00:51:15,350 --> 00:51:20,600
So some people are
writers and translators.

962
00:51:20,600 --> 00:51:22,610
So people can have more
than one occupation.

963
00:51:22,610 --> 00:51:26,310
People can speak more
than one language.

964
00:51:26,310 --> 00:51:29,130
Here's a better
example of a qualifier.

965
00:51:29,130 --> 00:51:35,090
So the property award received
has the value Presidential

966
00:51:35,090 --> 00:51:37,560
Medal of Freedom.

967
00:51:37,560 --> 00:51:42,570
And that award has an
attribute called point in time,

968
00:51:42,570 --> 00:51:44,070
like when was this?

969
00:51:44,070 --> 00:51:46,580
This was in 2009.

970
00:51:46,580 --> 00:51:50,510
Do you see that
this piece of data--

971
00:51:50,510 --> 00:52:04,780
2009-- is a sub-statement
or is subordinate

972
00:52:04,780 --> 00:52:09,621
to the context of this award,
the Presidential Medal

973
00:52:09,621 --> 00:52:10,120
of Freedom?

974
00:52:10,120 --> 00:52:13,430
It can't just kind of
free float in the article.

975
00:52:13,430 --> 00:52:17,650
It's not that 2009 is itself
a meaningful thing, right?

976
00:52:17,650 --> 00:52:21,550
This medal was awarded in 2009.

977
00:52:21,550 --> 00:52:22,170
If--

978
00:52:22,170 --> 00:52:24,070
Wikidata doesn't
tell us, for example,

979
00:52:24,070 --> 00:52:27,130
when he was a Navy officer, OK?

980
00:52:27,130 --> 00:52:30,100
But if we were, for example,
to look that up right now

981
00:52:30,100 --> 00:52:33,820
and find out that Milk was
a Navy officer between 1962

982
00:52:33,820 --> 00:52:39,542
and 1964, we could go back
here to the Navy officer bit

983
00:52:39,542 --> 00:52:41,010
and click edit.

984
00:52:41,010 --> 00:52:44,190
This is how I edit this
particular little piece

985
00:52:44,190 --> 00:52:45,360
of information.

986
00:52:45,360 --> 00:52:49,350
And add a qualifier like this.

987
00:52:49,350 --> 00:52:51,300
I click Add Qualifier.

988
00:52:51,300 --> 00:52:57,660
And I could pick start
time and end time, right?

989
00:52:57,660 --> 00:53:04,990
And then I could
type 1962 to 1964,

990
00:53:04,990 --> 00:53:08,000
and that would be
teaching Wikidata.

991
00:53:08,000 --> 00:53:10,660
Oh, I'm sorry, I meant to
do that for Navy officer.

992
00:53:10,660 --> 00:53:11,230
OK.

993
00:53:11,230 --> 00:53:14,800
But, you know,
that is the exact--

994
00:53:14,800 --> 00:53:18,400
the accurate time span
of that statement.

995
00:53:18,400 --> 00:53:22,850
So it's true to say about a
person, he was a Navy officer,

996
00:53:22,850 --> 00:53:25,990
even if of course he wasn't a
Navy officer his entire life.

997
00:53:25,990 --> 00:53:28,120
But it's better, and
it's more accurate,

998
00:53:28,120 --> 00:53:32,260
to say he was a Navy officer
between 1962 and 1964.

999
00:53:32,260 --> 00:53:35,380
Don't worry, I'm
not saving this.

1000
00:53:35,380 --> 00:53:39,150
No vandalizing of
Wikidata in this session.

1001
00:53:39,150 --> 00:53:40,450
OK.

1002
00:53:40,450 --> 00:53:41,140
Moving on.

1003
00:53:41,140 --> 00:53:42,430
What else does Wikidata know?

1004
00:53:42,430 --> 00:53:43,960
He was educated at
this university.

1005
00:53:43,960 --> 00:53:46,970
He was a member of
this political party.

1006
00:53:46,970 --> 00:53:47,470
Right?

1007
00:53:47,470 --> 00:53:49,428
That's, of course, a
relevant property

1008
00:53:49,428 --> 00:53:52,270
for a politician.

1009
00:53:52,270 --> 00:53:56,500
Religion, military branch,
which category on Commons

1010
00:53:56,500 --> 00:53:58,720
discusses this
item-- these are things

1011
00:53:58,720 --> 00:54:00,790
that Wikidata can tell us.

1012
00:54:00,790 --> 00:54:02,200
And that's it.

1013
00:54:02,200 --> 00:54:04,570
Now, is that everything
that we could possibly

1014
00:54:04,570 --> 00:54:07,780
say in a structured
way about Harvey Milk?

1015
00:54:07,780 --> 00:54:08,680
No.

1016
00:54:08,680 --> 00:54:13,570
We could probably find at
least a few more things to say.

1017
00:54:13,570 --> 00:54:17,170
We will see how to contribute
new information to Wikidata

1018
00:54:17,170 --> 00:54:19,990
in just a minute with
a different example.

1019
00:54:19,990 --> 00:54:23,360
But this-- all this was
a set of statements.

1020
00:54:23,360 --> 00:54:23,860
Right?

1021
00:54:23,860 --> 00:54:25,927
This was the
Statements section here.

1022
00:54:25,927 --> 00:54:28,840


1023
00:54:28,840 --> 00:54:31,160
But at the bottom of the
list of statements is

1024
00:54:31,160 --> 00:54:34,300
another section
called identifiers.

1025
00:54:34,300 --> 00:54:36,960
And I want to spend a minute
talking about what that is.

1026
00:54:36,960 --> 00:54:43,630
So identifiers is a
collection of keys.

1027
00:54:43,630 --> 00:54:47,980
A collection of
IDs, or codes, that

1028
00:54:47,980 --> 00:54:52,890
are keys to other
information sources.

1029
00:54:52,890 --> 00:54:58,560
And a lot of Wikidata items
have a whole series of keys

1030
00:54:58,560 --> 00:55:03,030
to other databases, other
sites, other repositories,

1031
00:55:03,030 --> 00:55:08,340
that help you or a computer
be able to access not just

1032
00:55:08,340 --> 00:55:12,240
some database and look for
information about Harvey Milk,

1033
00:55:12,240 --> 00:55:16,950
but access the exact record
relevant to Harvey Milk.

1034
00:55:16,950 --> 00:55:20,280
And again, if you imagine
someone named John Smith,

1035
00:55:20,280 --> 00:55:21,690
that is really valuable, right?

1036
00:55:21,690 --> 00:55:23,250
If you're just
told, oh yeah,

1037
00:55:23,250 --> 00:55:24,875
you can look at the
Library of Congress

1038
00:55:24,875 --> 00:55:27,840
for John Smith,
good luck with that.

1039
00:55:27,840 --> 00:55:30,240
Or if I tell you, go to
the Library of Congress

1040
00:55:30,240 --> 00:55:35,810
to this record for this John
Smith, you see the difference.

1041
00:55:35,810 --> 00:55:42,080
So Wikidata tells us that on
VIAF, which is the Virtual

1042
00:55:42,080 --> 00:55:44,570
International Authority File--

1043
00:55:44,570 --> 00:55:50,140
it's an aggregated master
index built by bibliographers,

1044
00:55:50,140 --> 00:55:52,831
by librarians, of people.

1045
00:55:52,831 --> 00:55:53,330
Right?

1046
00:55:53,330 --> 00:55:56,720
It tries to kind of aggregate
information about people

1047
00:55:56,720 --> 00:55:59,270
across library
catalogs everywhere.

1048
00:55:59,270 --> 00:56:05,120
So the VIAF ID for Harvey
Milk is this number.

1049
00:56:05,120 --> 00:56:07,340
And conveniently,
if I click that,

1050
00:56:07,340 --> 00:56:10,160
I'm not taken to
some Wikidata item.

1051
00:56:10,160 --> 00:56:13,010
I'm actually taken
to the relevant site.

1052
00:56:13,010 --> 00:56:16,760
So this took me right
to viaf.org, the Virtual

1053
00:56:16,760 --> 00:56:21,770
International Authority File,
directly to their record

1054
00:56:21,770 --> 00:56:23,310
about Harvey Milk.

1055
00:56:23,310 --> 00:56:23,810
All right?

1056
00:56:23,810 --> 00:56:27,290
And that itself leads
me to national catalogs

1057
00:56:27,290 --> 00:56:29,630
of national libraries
all over the world.

1058
00:56:29,630 --> 00:56:32,360
We won't get into the
things you can do with VIAF.

1059
00:56:32,360 --> 00:56:37,220
The point is Wikidata
contained the piece of thread

1060
00:56:37,220 --> 00:56:40,820
that I could tug on
to arrive directly

1061
00:56:40,820 --> 00:56:44,840
at that information
in other databases.

1062
00:56:44,840 --> 00:56:45,680
Yes.

1063
00:56:45,680 --> 00:56:49,670
And it has that for many,
many kinds of databases.

1064
00:56:49,670 --> 00:56:53,150
The BNF, for example, that's
the National Library of France.

1065
00:56:53,150 --> 00:56:56,270
And that will take me
to that index card.

1066
00:56:56,270 --> 00:56:57,320
IMDB.

1067
00:56:57,320 --> 00:56:58,620
We all know IMDB, right?

1068
00:56:58,620 --> 00:57:03,320
So here I have the key
to Harvey Milk in IMDB.

1069
00:57:03,320 --> 00:57:05,810
And this is what IMDB says
about Harvey Milk, right?

1070
00:57:05,810 --> 00:57:08,480
They have their own piece
of information about him,

1071
00:57:08,480 --> 00:57:11,590
of course, with filmography
and everything else.

1072
00:57:11,590 --> 00:57:15,140
And see, I did not have
to search IMDB for it.

1073
00:57:15,140 --> 00:57:19,070
I just had the key right
there waiting for me.

1074
00:57:19,070 --> 00:57:21,080
Now, again, this is
very convenient for me

1075
00:57:21,080 --> 00:57:24,590
as I just showed you the
human use case for this.

1076
00:57:24,590 --> 00:57:27,530
But it's even more
powerful in aggregate

1077
00:57:27,530 --> 00:57:35,450
when we allow computers to
traverse this network of links

1078
00:57:35,450 --> 00:57:36,110
between--

1079
00:57:36,110 --> 00:57:41,690
not just within Wikidata, but
between data storage facilities

1080
00:57:41,690 --> 00:57:43,850
and repositories.

1081
00:57:43,850 --> 00:57:49,790
This is sometimes referred to
as the Linked Open Data cloud.

1082
00:57:49,790 --> 00:57:52,670
Cloud, because it's multiple
different repositories

1083
00:57:52,670 --> 00:57:54,740
that are interlinked.

1084
00:57:54,740 --> 00:58:02,210
And Wikidata is already, and
to a growing extent, the nexus,

1085
00:58:02,210 --> 00:58:04,460
the connection
point between a lot

1086
00:58:04,460 --> 00:58:06,780
of these different databases.

1087
00:58:06,780 --> 00:58:09,230
So IMDB, for example,
it's a good example

1088
00:58:09,230 --> 00:58:11,300
because it's a site
almost everyone knows,

1089
00:58:11,300 --> 00:58:14,000
IMDB has information
about Harvey Milk.

1090
00:58:14,000 --> 00:58:16,670
But that information
does not include a link

1091
00:58:16,670 --> 00:58:19,140
to the French National Library.

1092
00:58:19,140 --> 00:58:19,645
Right?

1093
00:58:19,645 --> 00:58:20,770
Do you see what I'm saying?

1094
00:58:20,770 --> 00:58:25,550
So IMDB is a data repository
with IDs and allows linking.

1095
00:58:25,550 --> 00:58:28,100
But it does not give you
what Wikidata gives you, which

1096
00:58:28,100 --> 00:58:32,850
is this kind of collection of--

1097
00:58:32,850 --> 00:58:36,330
it's like a junction of all
these different data sources.

1098
00:58:36,330 --> 00:58:37,910
So Wikidata is the
place where you

1099
00:58:37,910 --> 00:58:40,730
can document these
interrelationships

1100
00:58:40,730 --> 00:58:41,640
or equivalencies.

1101
00:58:41,640 --> 00:58:42,140
Right?

1102
00:58:42,140 --> 00:58:48,770
So ID, you know, 587548 on IMDB
is discussing the same topic

1103
00:58:48,770 --> 00:58:52,260
as French National
Library ID whatever.

1104
00:58:52,260 --> 00:58:55,210
Wikidata contains that
piece of information:

1105
00:58:55,210 --> 00:58:59,090
that this ID in this database
is about the same person

1106
00:58:59,090 --> 00:59:04,050
as that ID in that database.

1107
00:59:04,050 --> 00:59:05,290
OK.

1108
00:59:05,290 --> 00:59:07,420
So that's what
identifiers are about.

1109
00:59:07,420 --> 00:59:11,320
Still scrolling down the
Wikidata item about Harvey

1110
00:59:11,320 --> 00:59:15,500
Milk, we have the site links.

1111
00:59:15,500 --> 00:59:20,840
The site links are links
to Wikimedia projects

1112
00:59:20,840 --> 00:59:22,770
that are related to this item.

1113
00:59:22,770 --> 00:59:25,250
So of course there
are Wikipedia articles

1114
00:59:25,250 --> 00:59:28,880
about Harvey Milk in many,
many different Wikipedias.

1115
00:59:28,880 --> 00:59:31,700
Quite a few language versions.

1116
00:59:31,700 --> 00:59:34,960
And there are
pages on Wikiquote,

1117
00:59:34,960 --> 00:59:36,680
one of the sister projects.

1118
00:59:36,680 --> 00:59:38,630
There are pages on
Wikiquote with some quotes

1119
00:59:38,630 --> 00:59:40,130
from Harvey Milk.

1120
00:59:40,130 --> 00:59:45,060
And there is even a page for
Harvey Milk on Wikisource.

1121
00:59:45,060 --> 00:59:45,560
Right?

1122
00:59:45,560 --> 00:59:47,840
So this is a collection
of those links.

1123
00:59:47,840 --> 00:59:52,760
And those of you who have maybe
only dealt with Wikidata

1124
00:59:52,760 --> 00:59:57,290
for inter-wiki links, which
we used to do in the old days

1125
00:59:57,290 --> 00:59:59,600
manually within
the article text,

1126
00:59:59,600 --> 01:00:01,716
now we do it through
Wikidata, so maybe

1127
01:01:01,716 --> 01:01:03,590
the only thing you did
know about Wikidata

1128
01:00:03,590 --> 01:00:10,130
is how to update these
inter-wiki tables on Wikidata.

1129
01:00:10,130 --> 01:00:11,430
All right.

1130
01:00:11,430 --> 01:00:14,090
So that concludes
our little tour

1131
01:00:14,090 --> 01:00:18,560
of the anatomy of
a Wikidata page.

1132
01:00:18,560 --> 01:00:22,370
I will just remind you that
it's a wiki page, which

1133
01:00:22,370 --> 01:00:26,120
means it has a discussion
page, a talk page.

1134
01:00:26,120 --> 01:00:27,960
This one happens to be empty.

1135
01:00:27,960 --> 01:00:30,092
But, you know, if we have
concerns or arguments

1136
01:00:30,092 --> 01:00:31,550
about some of the
data here, that is

1137
01:00:31,550 --> 01:00:33,290
what we would use
to discuss this

1138
01:00:33,290 --> 01:00:36,830
and to arrive at consensus.

1139
01:00:36,830 --> 01:00:41,760
It also has a history view just
like every Wikipedia article.

1140
01:00:41,760 --> 01:00:47,402
So you can see here
a list of edits.

1141
01:00:47,402 --> 01:00:48,860
Maybe some of you
have never looked

1142
01:00:48,860 --> 01:00:51,710
at a history page on Wikipedia,
so this looks overwhelming.

1143
01:00:51,710 --> 01:00:55,040
But every line here,
every entry here,

1144
01:00:55,040 --> 01:00:58,240
is a single edit, a single
revision, a single change

1145
01:00:58,240 --> 01:01:00,440
to this Wikidata item.

1146
01:01:00,440 --> 01:01:01,670
Just Harvey Milk.

1147
01:01:01,670 --> 01:01:04,250
And you can see at the very
top this edit that I just

1148
01:01:04,250 --> 01:01:06,680
made-- this is my
volunteer account

1149
01:01:06,680 --> 01:01:09,650
and I just made this edit,
and in parentheses you

1150
01:01:09,650 --> 01:01:10,790
can see what I did.

1151
01:01:10,790 --> 01:01:14,640
I added an HE,
Hebrew, description.

1152
01:01:14,640 --> 01:01:16,930
And this is the text
that I added in Hebrew.

1153
01:01:16,930 --> 01:01:17,430
Right?

1154
01:01:17,430 --> 01:01:21,470
So we can see who added
what to the Wikidata item,

1155
01:01:21,470 --> 01:01:24,960
just like we can do
the same on Wikipedia.

1156
01:01:24,960 --> 01:01:26,390
So we have the revision history.

1157
01:01:26,390 --> 01:01:27,560
We can undo edits.

1158
01:01:27,560 --> 01:01:30,320
We can revert, just
like on Wikipedia.

1159
01:01:30,320 --> 01:01:34,420


1160
01:01:34,420 --> 01:01:36,940
And what else did I
want to show here?

1161
01:01:36,940 --> 01:01:40,930
I can add an item to my
watch list using the star,

1162
01:01:40,930 --> 01:01:42,020
just like on Wikipedia.

1163
01:01:42,020 --> 01:01:46,670
So we have all these
standard wiki features

1164
01:01:46,670 --> 01:01:47,878
that we would come to expect.

1165
01:01:47,878 --> 01:01:50,440


1166
01:01:50,440 --> 01:01:54,270
Let's pause for questions.

1167
01:01:54,270 --> 01:01:58,412
Any questions about what
we've covered so far?

1168
01:01:58,412 --> 01:02:02,573


1169
01:02:02,573 --> 01:02:03,073
Yes.

1170
01:02:03,073 --> 01:02:06,950


1171
01:02:06,950 --> 01:02:11,345
Are attributes of statements
preset for the specific value?

1172
01:02:11,345 --> 01:02:16,640


1173
01:02:16,640 --> 01:02:19,830
No, they're not preset.

1174
01:02:19,830 --> 01:02:29,760
And generally, Wikidata does
not enforce logic by default.

1175
01:02:29,760 --> 01:02:32,130
So, I mean, there's
nothing to prevent you

1176
01:02:32,130 --> 01:02:38,700
from editing the
item about Brazil,

1177
01:02:38,700 --> 01:02:42,990
and adding the property height.

1178
01:02:42,990 --> 01:02:46,690


1179
01:02:46,690 --> 01:02:50,430
Now height is not a relevant
property for a country.

1180
01:02:50,430 --> 01:02:50,970
Right?

1181
01:02:50,970 --> 01:02:53,880
I mean, maybe average
elevation, maybe.

1182
01:02:53,880 --> 01:02:56,400
But not just height,
which is used for humans

1183
01:02:56,400 --> 01:02:59,040
or for physical things.

1184
01:02:59,040 --> 01:03:02,400
So you could add that
property to Brazil and save it

1185
01:03:02,400 --> 01:03:04,650
and the wiki would not complain.

1186
01:03:04,650 --> 01:03:07,590
Now in the background
there are kind

1187
01:03:07,590 --> 01:03:13,020
of extra-wiki processes, outside the
wiki, for constraint

1188
01:03:13,020 --> 01:03:13,710
validation.

1189
01:03:13,710 --> 01:03:16,050
So there are bots and
other processes that

1190
01:03:16,050 --> 01:03:17,940
run, and occasionally,
for example,

1191
01:03:17,940 --> 01:03:26,570
identify non-living things
with a date of birth field.

1192
01:03:26,570 --> 01:03:27,720
That's nonsensical.

1193
01:03:27,720 --> 01:03:29,010
That should not exist.

1194
01:03:29,010 --> 01:03:31,710
If someone mistakenly added
that, there are processes

1195
01:03:31,710 --> 01:03:34,350
that would flag
that to be fixed.

1196
01:03:34,350 --> 01:03:36,690
But the wiki itself,
Wikidata, will not

1197
01:03:36,690 --> 01:03:38,550
prevent you from adding that.

1198
01:03:38,550 --> 01:03:41,940
And that is by design
to keep things flexible.

1199
01:03:41,940 --> 01:03:43,930
So that people don't
run into, oh wait,

1200
01:03:43,930 --> 01:03:46,560
but I can't add this
because nobody thought

1201
01:03:46,560 --> 01:03:49,830
that I would need this, maybe.

1202
01:03:49,830 --> 01:03:54,530
I hope that answers
your question.

1203
01:03:54,530 --> 01:03:57,290
You say helpful
answer, question mark.

1204
01:03:57,290 --> 01:03:59,510
So was it a helpful answer, or?

1205
01:03:59,510 --> 01:04:03,940


1206
01:04:03,940 --> 01:04:04,440
OK.

1207
01:04:04,440 --> 01:04:05,426
Yes, Eleanor.

1208
01:04:05,426 --> 01:04:10,707
AUDIENCE: [INAUDIBLE]

1209
01:04:10,707 --> 01:04:12,040
ASAF BARTOV: Excellent question.

1210
01:04:12,040 --> 01:04:13,030
I'll repeat it.

1211
01:04:13,030 --> 01:04:16,180
You asked how to find
the Wikidata item

1212
01:04:16,180 --> 01:04:18,370
number from Wikipedia.

1213
01:04:18,370 --> 01:04:21,580
If I'm reading about Harvey Milk
and I want to look at the data,

1214
01:04:21,580 --> 01:04:23,600
how do I do that?

1215
01:04:23,600 --> 01:04:27,400
That is an excellent question
and let's skip to Wikipedia.

1216
01:04:27,400 --> 01:04:32,030
Conveniently, I have the
link right here on English Wikipedia.

1217
01:04:32,030 --> 01:04:35,600
So this is the Wikipedia
article about Harvey Milk

1218
01:04:35,600 --> 01:04:42,740
and every article on Wikipedia
should have a Wikidata

1219
01:04:42,740 --> 01:04:47,660
item associated with it, but it
doesn't happen automatically.

1220
01:04:47,660 --> 01:04:51,470
So if I just created
a page on Wikipedia,

1221
01:04:51,470 --> 01:04:55,010
I also need to create a
Wikidata entity for it

1222
01:04:55,010 --> 01:04:57,170
if it doesn't already exist.

1223
01:04:57,170 --> 01:04:59,420
It could already exist
because it was already

1224
01:04:59,420 --> 01:05:01,970
covered in a different
language, for example.

1225
01:05:01,970 --> 01:05:05,390
So that was parenthetical.

1226
01:05:05,390 --> 01:05:09,020
But every article on Wikipedia
should have, here on the side,

1227
01:05:09,020 --> 01:05:14,270
under Tools,
a link called Wikidata item.

1228
01:05:14,270 --> 01:05:15,450
Right here.

1229
01:05:15,450 --> 01:05:16,160
OK.

1230
01:05:16,160 --> 01:05:18,110
That Wikidata
item is a link

1231
01:05:18,110 --> 01:05:21,710
that takes you to
Wikidata, to the entity,

1232
01:05:21,710 --> 01:05:23,510
and there you find the number.

1233
01:05:23,510 --> 01:05:25,370
You can-- you don't
even have to click it.

1234
01:05:25,370 --> 01:05:27,830
I mean, the URL itself
tells you the number.

1235
01:05:27,830 --> 01:05:34,620
The number, you see, it's
wikidata.org/wiki/Q17141.

1236
01:05:34,620 --> 01:05:35,444
OK.

1237
01:05:35,444 --> 01:05:36,860
So that was an
excellent question.

1238
01:05:36,860 --> 01:05:37,686
Other questions?

1239
01:05:37,686 --> 01:05:38,185
Yes.

1240
01:05:38,185 --> 01:05:41,470


1241
01:05:41,470 --> 01:05:44,430
Yeah, about the additional
attributes, the qualifiers.

1242
01:05:44,430 --> 01:05:46,920
So, yes, I answered
more generically.

1243
01:05:46,920 --> 01:05:49,370
But just like the
properties themselves

1244
01:05:49,370 --> 01:05:53,390
are not limited per item,
the qualifiers per statement

1245
01:05:53,390 --> 01:05:57,750
are also not
entirely preordained.

1246
01:05:57,750 --> 01:05:59,570
But there is some
structure to it.

1247
01:05:59,570 --> 01:06:03,140
I don't want to go into it
at great length right now.

1248
01:06:03,140 --> 01:06:06,320
If we have time in the end
we can get back to that.

1249
01:06:06,320 --> 01:06:09,590
But some qualifiers are again
relevant for some things,

1250
01:06:09,590 --> 01:06:13,180
like start time and end time,
and others won't be.

1251
01:06:13,180 --> 01:06:16,280
Wikidata does try to offer you--

1252
01:06:16,280 --> 01:06:18,710
you may remember when I
clicked add qualifier,

1253
01:06:18,710 --> 01:06:22,170
it gave me a drop-down list
of some relevant qualifiers.

1254
01:06:22,170 --> 01:06:24,475
So it does try to
help you in that way.

1255
01:06:24,475 --> 01:06:27,280


1256
01:06:27,280 --> 01:06:28,160
Other questions?

1257
01:06:28,160 --> 01:06:31,180
Are the values for
instance of already

1258
01:06:31,180 --> 01:06:33,310
mappable to external ontologies?

1259
01:06:33,310 --> 01:06:36,500


1260
01:06:36,500 --> 01:06:41,310
That is a complicated question.

1261
01:06:41,310 --> 01:06:43,490
I'll help people understand
the question first.

1262
01:06:43,490 --> 01:06:48,570
So an ontology is a
structure, some kind

1263
01:06:48,570 --> 01:06:52,350
of hierarchy or
cloud, of entities

1264
01:06:52,350 --> 01:06:54,510
and their interrelationships.

1265
01:06:54,510 --> 01:06:56,920
An ontology would
say, for example,

1266
01:06:56,920 --> 01:06:58,710
a person is a living thing.

1267
01:06:58,710 --> 01:06:59,670
So is a dog.

1268
01:06:59,670 --> 01:07:02,340
They're both living things,
but they're different things.

1269
01:07:02,340 --> 01:07:09,910
And then, you know, say
things about those entities

1270
01:07:09,910 --> 01:07:11,350
and their interrelationships.

1271
01:07:11,350 --> 01:07:13,300
Now there are many,
many competing,

1272
01:07:13,300 --> 01:07:17,230
or coexisting models
of ontologies.

1273
01:07:17,230 --> 01:07:19,840
Many of them were created
for specific needs.

1274
01:07:19,840 --> 01:07:25,170
Many of them want to be
a universal ontology.

1275
01:07:25,170 --> 01:07:27,790
But of course it's
impossible to quite

1276
01:07:27,790 --> 01:07:32,150
agree on one complete
and simple ontology.

1277
01:07:32,150 --> 01:07:34,240
And so there are
many ontologies.

1278
01:07:34,240 --> 01:07:38,520
Which brings up your question,
can we map across ontologies?

1279
01:07:38,520 --> 01:07:43,840
Can we say that when Wikidata
says instance of book that

1280
01:07:43,840 --> 01:07:47,260
is equivalent to some other
ontology saying instance

1281
01:07:47,260 --> 01:07:49,940
of bibliographic record?

1282
01:07:49,940 --> 01:07:50,860
And the answer is yes.

1283
01:07:50,860 --> 01:07:52,360
There are some such mappings.

1284
01:07:52,360 --> 01:07:54,420
They are incomplete.

1285
01:07:54,420 --> 01:07:58,240
And there's no kind of
automagic thing happening

1286
01:07:58,240 --> 01:08:01,180
in the wiki vis-a-vis
those other ontologies.

1287
01:08:01,180 --> 01:08:03,250
That's kind of
left as an exercise

1288
01:08:03,250 --> 01:08:06,280
for those dealing with those
other ontologies, and for tool

1289
01:08:06,280 --> 01:08:09,880
builders and other
platform improvements

1290
01:08:09,880 --> 01:08:13,050
beyond Wikidata itself.

1291
01:08:13,050 --> 01:08:13,750
OK.

1292
01:08:13,750 --> 01:08:15,190
Other questions?

1293
01:08:15,190 --> 01:08:17,430
Yeah, we have one from
the YouTube stream.

1294
01:08:17,430 --> 01:08:21,160
Someone asked, why can't I
link Howard Carter's occupation

1295
01:08:21,160 --> 01:08:26,439
to archaeologist when I use
an infobox that fetches info

1296
01:08:26,439 --> 01:08:28,960
from Wikidata?

1297
01:08:28,960 --> 01:08:33,160
Why can't I link it
from the infobox?

1298
01:08:33,160 --> 01:08:35,500
So, someone on the
stream answered

1299
01:08:35,500 --> 01:08:37,659
saying, because it's
an improper connection,

1300
01:08:37,659 --> 01:08:39,700
because the target is not
about the subject only.

1301
01:08:39,700 --> 01:08:43,020


1302
01:08:43,020 --> 01:08:46,710
The target is not
about the subject?

1303
01:08:46,710 --> 01:08:48,479
If I understand the
question correctly,

1304
01:08:48,479 --> 01:08:53,130
what you would want to be able
to do is from within Wikipedia

1305
01:08:53,130 --> 01:08:59,130
be able to say occupation
and link to a Wikidata entry

1306
01:08:59,130 --> 01:09:01,050
about archeology.

1307
01:09:01,050 --> 01:09:03,569
That doesn't quite
work that way.

1308
01:09:03,569 --> 01:09:05,430
We will get to a
little discussion

1309
01:09:05,430 --> 01:09:08,460
of that in an upcoming
section of this talk.

1310
01:09:08,460 --> 01:09:13,260
So I will defer the rest
of my answer to then.

1311
01:09:13,260 --> 01:09:15,319
OK.

1312
01:09:15,319 --> 01:09:19,160
So we're done with
questions for this phase,

1313
01:09:19,160 --> 01:09:22,850
and my browser got
tired of waiting for me.

1314
01:09:22,850 --> 01:09:26,551
So, yes.

1315
01:09:26,551 --> 01:09:27,050
All right.

1316
01:09:27,050 --> 01:09:36,850
So we took a look at Wikidata,
and we took questions.

1317
01:09:36,850 --> 01:09:41,020
So now, let's teach
Wikidata some new things.

1318
01:09:41,020 --> 01:09:44,020
Some things it
doesn't already know.

1319
01:09:44,020 --> 01:09:47,109
Let's look at this item here.

1320
01:09:47,109 --> 01:09:50,950
So this item is about one
of my favorite writers,

1321
01:09:50,950 --> 01:09:53,840
an American writer
named Helen DeWitt.

1322
01:09:53,840 --> 01:10:01,570
Wikidata, of course, fondly
refers to her as Q54674,

1323
01:10:01,570 --> 01:10:03,070
but we can call
her Helen DeWitt.

1324
01:10:03,070 --> 01:10:05,740
And what can we contribute here?

1325
01:10:05,740 --> 01:10:10,600
So Wikidata has far less
information about Helen DeWitt.

1326
01:10:10,600 --> 01:10:13,144
Most of you probably haven't
heard of her, that's OK.

1327
01:10:13,144 --> 01:10:14,560
What does Wikidata
know about her?

1328
01:10:14,560 --> 01:10:16,450
Well instance of human.

1329
01:10:16,450 --> 01:10:17,800
We have a photo of her.

1330
01:10:17,800 --> 01:10:18,780
She's female.

1331
01:10:18,780 --> 01:10:20,530
She's an American.

1332
01:10:20,530 --> 01:10:21,790
Her name is Helen.

1333
01:10:21,790 --> 01:10:22,630
Date of birth.

1334
01:10:22,630 --> 01:10:23,650
Place of birth.

1335
01:10:23,650 --> 01:10:25,970
She's an author, a
novelist, a writer.

1336
01:10:25,970 --> 01:10:28,840
She was educated at the
University of Oxford.

1337
01:10:28,840 --> 01:10:33,160
And Wikidata knows what
her official website is.

1338
01:10:33,160 --> 01:10:35,780
That's useful, but that's it.

1339
01:10:35,780 --> 01:10:37,780
Now we can contribute
information here.

1340
01:10:37,780 --> 01:10:43,120
For example, she's an American
author writing in English.

1341
01:10:43,120 --> 01:10:45,550
So we could add
that information.

1342
01:10:45,550 --> 01:10:48,430
We could click the
Add button here.

1343
01:10:48,430 --> 01:10:50,200
And this is a good
moment to acknowledge

1344
01:10:50,200 --> 01:10:54,830
that the user interface of
Wikidata is a work in progress.

1345
01:10:54,830 --> 01:10:56,740
It's not as intuitive
as it might be.

1346
01:10:56,740 --> 01:10:58,570
So you need to
understand that click--

1347
01:10:58,570 --> 01:11:01,630
to add a completely
new property,

1348
01:11:01,630 --> 01:11:04,060
you need to click
this Add button.

1349
01:11:04,060 --> 01:11:08,020
If you want to add an additional
value to the property official

1350
01:11:08,020 --> 01:11:11,530
website, you need to
click this Add button.

1351
01:11:11,530 --> 01:11:13,780
It makes a kind of
sense with a shaded box.

1352
01:11:13,780 --> 01:11:15,880
But, you know, you need
to kind of pay attention,

1353
01:11:15,880 --> 01:11:18,901
and it's not as
friendly as it might be.

1354
01:11:18,901 --> 01:11:20,650
[COUGHING] Excuse me.

1355
01:11:20,650 --> 01:11:23,380
So, let's add a property here.

1356
01:11:23,380 --> 01:11:25,690
Click the Add button.

1357
01:11:25,690 --> 01:11:29,740
Again, Wikidata tries to
be useful by suggesting

1358
01:11:29,740 --> 01:11:32,760
some relevant
properties for humans.

1359
01:11:32,760 --> 01:11:36,640
A bit more morbidly it suggests,
how about date of death?

1360
01:11:36,640 --> 01:11:38,700
That's not cool, Wikidata.

1361
01:11:38,700 --> 01:11:40,480
Helen DeWitt is still alive.

1362
01:11:40,480 --> 01:11:42,700
So I will not add
date of death, but I

1363
01:11:42,700 --> 01:11:46,140
can add languages spoken,
written, or signed.

1364
01:11:46,140 --> 01:11:48,370
OK, so I click that.

1365
01:11:48,370 --> 01:11:51,670
And she writes in English.

1366
01:11:51,670 --> 01:11:54,450
I just type English-- whoops.

1367
01:11:54,450 --> 01:11:56,750
Not in Hebrew.

1368
01:11:56,750 --> 01:11:58,380
Don't panic.

1369
01:11:58,380 --> 01:12:01,010
I type English here.

1370
01:12:01,010 --> 01:12:04,250
And, oh, and of course Wikidata
has auto-complete, right?

1371
01:12:04,250 --> 01:12:06,080
So it tries to help me along.

1372
01:12:06,080 --> 01:12:10,100
But you will notice that
it has all kinds of things

1373
01:12:10,100 --> 01:12:10,940
called English.

1374
01:12:10,940 --> 01:12:14,030
I mean, it turns out that
there is a place in Indiana

1375
01:12:14,030 --> 01:12:16,370
called English, Indiana.

1376
01:12:16,370 --> 01:12:17,150
Did I mean that?

1377
01:12:17,150 --> 01:12:20,210
No, of course I didn't mean
that she writes her books

1378
01:12:20,210 --> 01:12:21,961
in English, Indiana.

1379
01:12:21,961 --> 01:12:22,460
Right?

1380
01:12:22,460 --> 01:12:26,180
But, you know, Wikidata gives me
the option of linking to that.

1381
01:12:26,180 --> 01:12:30,530
I also don't mean the botanist
Carl Schwartz English.

1382
01:12:30,530 --> 01:12:32,870
No, no, I mean the
West Germanic language

1383
01:12:32,870 --> 01:12:34,029
originating in England.

1384
01:12:34,029 --> 01:12:34,820
That's what I mean.

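The autocomplete step above boils down to choosing one item among several that share the label "English" and storing that item's ID, not the word itself. A minimal sketch, assuming a tiny local candidate list: Q1860 is the Wikidata item for the English language and P1412 is "languages spoken, written or signed", but the other two IDs are hypothetical placeholders and all descriptions are paraphrased.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One autocomplete suggestion: an item ID plus its label and description."""
    qid: str
    label: str
    description: str


# Q1860 is the Wikidata item for the English language. The other two IDs are
# hypothetical placeholders, and all descriptions are paraphrased.
CANDIDATES = [
    Candidate("Q_PLACE", "English", "town in Indiana, United States"),
    Candidate("Q_BOTANIST", "English", "American botanist"),
    Candidate("Q1860", "English", "West Germanic language originating in England"),
]


def pick(candidates, label, hint):
    """Return the single item whose label matches and whose description contains hint."""
    matches = [c for c in candidates if c.label == label and hint in c.description]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} candidates match; refine the hint")
    return matches[0].qid


# The saved statement stores the item ID, not the string "English".
statement = ("P1412", pick(CANDIDATES, "English", "language"))
print(statement)  # ('P1412', 'Q1860')
```

The description is what lets a human (or this toy function) tell the language apart from the town in Indiana; once saved, only the ID matters.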
1385
01:12:34,820 --> 01:12:36,110
So I click that.

1386
01:12:36,110 --> 01:12:37,760
And I click Save.

1387
01:12:37,760 --> 01:12:38,450
And that's it.

1388
01:12:38,450 --> 01:12:41,780
Again I have just made
an edit to Wikidata.

1389
01:12:41,780 --> 01:12:47,750
I have just taught Wikidata
that this author speaks English.

1390
01:12:47,750 --> 01:12:50,370
Now, again, this
may be very obvious.

1391
01:12:50,370 --> 01:12:52,280
She's American.

1392
01:12:52,280 --> 01:12:54,560
Of course not all
Americans write in English.

1393
01:12:54,560 --> 01:12:56,930
It may be obvious if
you look at her books.

1394
01:12:56,930 --> 01:12:59,060
The important thing
is that now Wikidata

1395
01:12:59,060 --> 01:13:02,090
knows this as a piece of data.

1396
01:13:02,090 --> 01:13:04,610
And, again, think ahead
to queries, which we will

1397
01:13:04,610 --> 01:13:06,980
demonstrate in a little bit.

1398
01:13:06,980 --> 01:13:09,000
Without this piece
of information

1399
01:13:09,000 --> 01:13:14,060
that I just added, if I were to
ask Wikidata five minutes ago,

1400
01:13:14,060 --> 01:13:19,760
give me a list of novelists
writing in English, OK,

1401
01:13:19,760 --> 01:13:22,730
Wikidata would have returned
thousands of results.

1402
01:13:22,730 --> 01:13:27,600
But Helen DeWitt would
not have been among them.

1403
01:13:27,600 --> 01:13:32,000
Because up until two
minutes ago Wikidata

1404
01:13:32,000 --> 01:13:35,640
didn't know that Helen DeWitt
writes in English and not

1405
01:13:35,640 --> 01:13:37,520
in Spanish.

1406
01:13:37,520 --> 01:13:38,730
Do you see?

1407
01:13:38,730 --> 01:13:42,570
It is this explicit
statement that will now

1408
01:13:42,570 --> 01:13:46,560
get her included in any
future queries that ask,

1409
01:13:46,560 --> 01:13:48,700
who are novelists
writing in English?

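The argument above, that a query can only return what is stated explicitly, can be shown with a toy in-memory model. This is a sketch, not the Wikidata Query Service (which uses SPARQL): P106 (occupation), P1412 (languages spoken, written or signed) and Q1860 (English) are Wikidata IDs, the novelist ID is given as I recall it and should be checked, and the item keys are hypothetical placeholders.

```python
# Toy in-memory statements: item -> {property: set of values}.
# P106 = occupation, P1412 = languages spoken, written or signed,
# Q1860 = English. NOVELIST's ID is given as I recall it; the item keys
# are hypothetical placeholders.
NOVELIST = "Q6625963"
ENGLISH = "Q1860"


def novelists_writing_in(items, language):
    """Items that explicitly state occupation=novelist and the given language."""
    return sorted(
        item
        for item, claims in items.items()
        if NOVELIST in claims.get("P106", set())
        and language in claims.get("P1412", set())
    )


items = {
    "AUTHOR_X": {"P106": {NOVELIST}, "P1412": {ENGLISH}},
    "HELEN_DEWITT": {"P106": {NOVELIST}},  # no language statement yet
}

before = novelists_writing_in(items, ENGLISH)
items["HELEN_DEWITT"].setdefault("P1412", set()).add(ENGLISH)  # the new edit
after = novelists_writing_in(items, ENGLISH)

print(before)  # ['AUTHOR_X']
print(after)   # ['AUTHOR_X', 'HELEN_DEWITT']
```

Nothing is inferred from nationality or from her books: the only thing that changes the result is the added statement.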
1410
01:13:48,700 --> 01:13:53,250


1411
01:13:53,250 --> 01:13:54,500
OK.

1412
01:13:54,500 --> 01:13:58,560
By the way, she has
a PhD in Classics.

1413
01:13:58,560 --> 01:14:05,590
She speaks-- or at least reads
and writes Latin and Greek,

1414
01:14:05,590 --> 01:14:07,270
ancient Greek, and I could--

1415
01:14:07,270 --> 01:14:09,610
I can-- I mean, I
happen to know that.

1416
01:14:09,610 --> 01:14:12,420
But wait, wait, wait,
wait, wait, you say.

1417
01:14:12,420 --> 01:14:14,130
What about original research?

1418
01:14:14,130 --> 01:14:18,890
I mean, you can't just add
stuff like that to Wikidata.

1419
01:14:18,890 --> 01:14:19,920
Don't you need sources?

1420
01:14:19,920 --> 01:14:22,860
Citations?

1421
01:14:22,860 --> 01:14:23,890
Of course I do.

1422
01:14:23,890 --> 01:14:25,020
Yes.

1423
01:14:25,020 --> 01:14:27,720
Let's add some sources to this.

1424
01:14:27,720 --> 01:14:31,410
So on Wikidata,
just like Wikipedia,

1425
01:14:31,410 --> 01:14:34,980
things should generally
be supported by citations,

1426
01:14:34,980 --> 01:14:36,990
by references.

1427
01:14:36,990 --> 01:14:43,290
And just like Wikipedia,
they aren't always supported

1428
01:14:43,290 --> 01:14:44,650
in that way.

1429
01:14:44,650 --> 01:14:48,870
OK so, I mean, I can
just add it to Wikidata.

1430
01:14:48,870 --> 01:14:49,442
Watch me.

1431
01:14:49,442 --> 01:14:50,400
I just did that, right?

1432
01:14:50,400 --> 01:14:54,450
I just added English and
Latin without any citation,

1433
01:14:54,450 --> 01:14:56,850
and I will not be
arrested for it.

1434
01:14:56,850 --> 01:14:59,520
Just like I could edit
a Wikipedia article

1435
01:14:59,520 --> 01:15:02,610
and add some information
without a citation.

1436
01:15:02,610 --> 01:15:03,600
It may stick.

1437
01:15:03,600 --> 01:15:06,810
It may stay in the article,
or it may be reverted.

1438
01:15:06,810 --> 01:15:11,010
It depends on the kind of
information I'm adding.

1439
01:15:11,010 --> 01:15:13,740
It depends how many people
are paying attention

1440
01:15:13,740 --> 01:15:15,060
to the article on Wikipedia.

1441
01:15:15,060 --> 01:15:18,420
And it works the
same way on Wikidata.

1442
01:15:18,420 --> 01:15:21,780
OK, so, you can add some
things without references.

1443
01:15:21,780 --> 01:15:23,970
Ideally, when you
add information, you

1444
01:15:23,970 --> 01:15:25,570
should include references.

1445
01:15:25,570 --> 01:15:30,990
So let's be good Wikidata
citizens and add a source.

1446
01:15:30,990 --> 01:15:34,395
Here is an article that
I prepared in advance.

1447
01:15:34,395 --> 01:15:38,100


1448
01:15:38,100 --> 01:15:39,370
This is Helen DeWitt.

1449
01:15:39,370 --> 01:15:44,450
And in this article,
somewhere, it actually

1450
01:15:44,450 --> 01:15:51,770
says right at the
bottom here, see,

1451
01:15:51,770 --> 01:15:54,990
DeWitt knows, in descending
order of proficiency, Latin,

1452
01:15:54,990 --> 01:15:57,010
ancient Greek, French,
German, Spanish,

1453
01:15:57,010 --> 01:15:59,460
and Portuguese, Dutch, Danish,
Norwegian, Swedish, Arabic,

1454
01:15:59,460 --> 01:16:01,680
Hebrew and Japanese.

1455
01:16:01,680 --> 01:16:04,770
This may sound
excessive, but it's true.

1456
01:16:04,770 --> 01:16:06,330
I met this woman.

1457
01:16:06,330 --> 01:16:09,670
So anyway, we don't have
to include all of that.

1458
01:16:09,670 --> 01:16:13,050
The point is this article from
a reasonably reliable source,

1459
01:16:13,050 --> 01:16:15,840
this magazine,
this interview, can

1460
01:16:15,840 --> 01:16:19,270
count as a source for
the languages she speaks.

1461
01:16:19,270 --> 01:16:20,700
So I copy the URL.

1462
01:16:20,700 --> 01:16:23,130
I just copied it from my browser.

1463
01:16:23,130 --> 01:16:27,530
And, whoops-- that's not--

1464
01:16:27,530 --> 01:16:28,580
here we go.

1465
01:16:28,580 --> 01:16:31,610
And I can just add
a reference here

1466
01:16:31,610 --> 01:16:34,670
to the information that I
just added to Wikidata, right?

1467
01:16:34,670 --> 01:16:38,300
I can click Add Reference.

1468
01:16:38,300 --> 01:16:45,800
And then just say the reference
URL is, and I just paste.

1469
01:16:45,800 --> 01:16:48,840
I paste this URL.

1470
01:16:48,840 --> 01:16:50,160
Hit Enter.

1471
01:16:50,160 --> 01:16:51,060
And that's it.

1472
01:16:51,060 --> 01:16:55,380
And now the fact that she
speaks Latin has a reference.

1473
01:16:55,380 --> 01:16:58,320
If you look at the other
things here on Wikidata,

1474
01:16:58,320 --> 01:17:02,660
you can see that these IDs, for
example, have references, too.

1475
01:17:02,660 --> 01:17:03,420
Right?

1476
01:17:03,420 --> 01:17:06,570
In this case, the reference
just says, excuse me--

1477
01:17:06,570 --> 01:17:14,760


1478
01:17:14,760 --> 01:17:18,600
In this case it just says
imported from English Wikipedia.

1479
01:17:18,600 --> 01:17:24,970
But wait, you say, can
Wikipedia be a source?

1480
01:17:24,970 --> 01:17:26,620
Not properly, no.

1481
01:17:26,620 --> 01:17:30,100
I mean, just like Wikipedia
itself doesn't cite itself.

1482
01:17:30,100 --> 01:17:33,790
We don't say, this person
was born in this city

1483
01:17:33,790 --> 01:17:34,870
How do we know?

1484
01:17:34,870 --> 01:17:37,210
We read it on Wikipedia
in another language.

1485
01:17:37,210 --> 01:17:39,610
That's not a good citation.

1486
01:17:39,610 --> 01:17:41,400
It's not a good
citation for Wikidata

1487
01:17:41,400 --> 01:17:45,040
either, so why do we put it here?

1488
01:17:45,040 --> 01:17:49,240
Well you can see the qualifier
here is different, right?

1489
01:17:49,240 --> 01:17:53,535
It's not reference URL, which
is what I put in for Latin here.

1490
01:17:53,535 --> 01:18:17,020


1491
01:18:17,020 --> 01:18:20,320
It's not reference URL here,
it's a different qualifier.

1492
01:18:20,320 --> 01:18:23,020
It says, imported from.

1493
01:18:23,020 --> 01:18:25,960
So this is not an
actual reference that

1494
01:18:25,960 --> 01:18:27,610
supports this piece of data.

1495
01:18:27,610 --> 01:18:30,730
It just shows where
this data came from.

1496
01:18:30,730 --> 01:18:33,670
It's a slightly different
thing, because this data was

1497
01:18:33,670 --> 01:18:37,210
mass imported into Wikidata.

1498
01:18:37,210 --> 01:18:40,960
So it wasn't input by
hand by some volunteer.

1499
01:18:40,960 --> 01:18:44,770
It was imported into Wikidata
en masse by a script,

1500
01:18:44,770 --> 01:18:46,180
by a program.

1501
01:18:46,180 --> 01:18:49,820
And we want to know, where
did this number come from?

1502
01:18:49,820 --> 01:18:51,440
Well it came from
English Wikipedia.

1503
01:18:51,440 --> 01:18:54,130
So again, that's not
a proper reference

1504
01:18:54,130 --> 01:18:56,200
for the validity
of the information,

1505
01:18:56,200 --> 01:18:59,200
but it does at least tell us
it came from English Wikipedia.

1506
01:18:59,200 --> 01:19:03,460
We can click and look on
English Wikipedia and find out.

1507
01:19:03,460 --> 01:19:05,230
Maybe there's a
footnote there that

1508
01:19:05,230 --> 01:19:08,970
says where it did come from.

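The distinction drawn above, between a reference that supports a claim and one that only records where the data was imported from, can be sketched as a small classifier. Assumptions: each reference is modelled as a plain dict of reference properties; P854 (reference URL), P248 (stated in), P143 (imported from Wikimedia project) and Q328 (English Wikipedia) are Wikidata IDs; the URL is a placeholder.

```python
# Reference properties: P854 = reference URL, P248 = stated in (both point at
# an actual source); P143 = imported from Wikimedia project (provenance only).
# Q328 = English Wikipedia. The URL below is a placeholder.
SOURCE_PROPS = {"P854", "P248"}
PROVENANCE_PROPS = {"P143"}


def classify(references):
    """Classify a statement by its list of references (each a dict of properties)."""
    if any(not SOURCE_PROPS.isdisjoint(ref) for ref in references):
        return "sourced"
    if any(not PROVENANCE_PROPS.isdisjoint(ref) for ref in references):
        return "provenance only"
    return "unreferenced"


latin = [{"P854": "https://example.org/interview"}]
identifier = [{"P143": "Q328"}]

print(classify(latin))       # sourced
print(classify(identifier))  # provenance only
print(classify([]))          # unreferenced
```

A "provenance only" result is a prompt to do what the talk suggests: follow the trail back to English Wikipedia and look for the footnote that names the real source.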
1509
01:19:08,970 --> 01:19:11,000
OK.

1510
01:19:11,000 --> 01:19:15,320
So this was an example of
teaching Wikidata something

1511
01:19:15,320 --> 01:19:16,910
that it didn't know.

1512
01:19:16,910 --> 01:19:18,512
Something about the languages.

1513
01:19:18,512 --> 01:19:20,720
And of course I could add
this reference for English.

1514
01:19:20,720 --> 01:19:23,210
I could add all the other
languages that she speaks.

1515
01:19:23,210 --> 01:19:26,060
And I won't bore you with
that, but that is basically

1516
01:19:26,060 --> 01:19:27,050
how it's done.

1517
01:19:27,050 --> 01:19:29,720
So you click this Add to
add a completely new--

1518
01:19:29,720 --> 01:19:32,650


1519
01:19:32,650 --> 01:19:34,030
completely new statement.

1520
01:19:34,030 --> 01:19:36,250
Now, by the way, the fact
that these are the only two

1521
01:19:36,250 --> 01:19:39,220
suggestions that
Wikidata can think of,

1522
01:19:39,220 --> 01:19:42,100
doesn't mean these
are the only options.

1523
01:19:42,100 --> 01:19:46,750
OK, you can just type
anything that may be relevant.

1524
01:19:46,750 --> 01:19:50,950
We could add, for
example, award.

1525
01:19:50,950 --> 01:19:52,570
Just start typing award.

1526
01:19:52,570 --> 01:19:54,910
And here I have
a bunch of properties

1527
01:19:54,910 --> 01:19:56,510
that are relevant for awards.

1528
01:19:56,510 --> 01:20:00,100
Award received, together
with, conferred by, right?

1529
01:20:00,100 --> 01:20:05,790
There are all kinds of properties
that I could rely on.

1530
01:20:05,790 --> 01:20:09,600
And of course there is a list of
all the properties of Wikidata.

1531
01:20:09,600 --> 01:20:11,580
And that list is
also sorted by type.

1532
01:20:11,580 --> 01:20:15,480
So yes, there is a list of
properties relevant to people

1533
01:20:15,480 --> 01:20:17,130
so that you don't have to guess.

1534
01:20:17,130 --> 01:20:18,660
But a surprising
amount of the time

1535
01:20:18,660 --> 01:20:22,760
you can just start typing
and get the right properties

1536
01:20:22,760 --> 01:20:25,340
suggested to you.

1537
01:20:25,340 --> 01:20:27,230
OK.

1538
01:20:27,230 --> 01:20:33,050
So we taught Wikidata
something new,

1539
01:20:33,050 --> 01:20:38,980
and now let's teach Wikidata
something completely new.

1540
01:20:38,980 --> 01:20:39,480
Right?

1541
01:20:39,480 --> 01:20:42,480
So how do we create
a new Wikidata item?

1542
01:20:42,480 --> 01:20:46,880
So, like I said, if I
created a Wikipedia article

1543
01:20:46,880 --> 01:20:49,520
about something that was
not previously covered

1544
01:20:49,520 --> 01:20:53,540
on any other
Wikipedia, chances are

1545
01:20:53,540 --> 01:20:57,170
there would not be an already
existing Wikidata item.

1546
01:20:57,170 --> 01:21:03,190
Sometimes there might
be, because Wikidata

1547
01:21:03,190 --> 01:21:06,857
does have 25 million entities.

1548
01:21:06,857 --> 01:21:08,190
But sometimes there wouldn't be.

1549
01:21:08,190 --> 01:21:10,148
So, first of all, I could
search for it, right?

1550
01:21:10,148 --> 01:21:14,210
So I could go to Wikidata
to the search box

1551
01:21:14,210 --> 01:21:17,390
here and just start typing, and
search for what I want, right?

1552
01:21:17,390 --> 01:21:20,690
So if I'm searching for Helen
DeWitt, I just say Helen,

1553
01:21:20,690 --> 01:21:25,590
and I can see whether
or not it exists.

1554
01:21:25,590 --> 01:21:29,240
And there's a detailed search
results page, et cetera,

1555
01:21:29,240 --> 01:21:33,074
where I can find out
if the item does exist or not.

1556
01:21:33,074 --> 01:21:35,240
Excuse me, this reminds me
of a very important thing

1557
01:21:35,240 --> 01:21:36,620
I wanted to
demonstrate, and that

1558
01:21:36,620 --> 01:21:42,710
is the multilingualism
of Wikidata.

1559
01:21:42,710 --> 01:21:49,340
So remember all these
labels in other languages.

1560
01:21:49,340 --> 01:21:54,390
Wikidata knows what to call
Helen DeWitt in Hebrew.

1561
01:21:54,390 --> 01:22:00,800
And it will show it to Wikidata
users whose language is Hebrew.

1562
01:22:00,800 --> 01:22:04,220
Mine is set to
English, for your sake.

1563
01:22:04,220 --> 01:22:08,830
But if I change this I go to
Preferences here and change

1564
01:22:08,830 --> 01:22:09,740
my language.

1565
01:22:09,740 --> 01:22:15,475
[INAUDIBLE] All
right, and I hit Save.

1566
01:22:15,475 --> 01:22:20,350
Wikidata will start
talking to me in Hebrew.

1567
01:22:20,350 --> 01:22:23,090
Now brace yourselves.

1568
01:22:23,090 --> 01:22:24,620
Are you ready?

1569
01:22:24,620 --> 01:22:28,430
Don't panic, it's right to left.

1570
01:22:28,430 --> 01:22:32,630
Oh my god, everything
is topsy-turvy.

1571
01:22:32,630 --> 01:22:36,590
So this is the same
article in Hebrew.

1572
01:22:36,590 --> 01:22:39,290
So the sidebar has
switched direction,

1573
01:22:39,290 --> 01:22:41,300
and I know most of
you cannot read it.

1574
01:22:41,300 --> 01:22:42,480
Bear with me.

1575
01:22:42,480 --> 01:22:44,750
This is the label
that we previously

1576
01:22:44,750 --> 01:22:46,840
saw in the label box.

1577
01:22:46,840 --> 01:22:49,580
This is how you spell
Helen DeWitt in Hebrew.

1578
01:22:49,580 --> 01:22:52,550
And here is the
description in Hebrew.

1579
01:22:52,550 --> 01:22:54,980
It's not the description in
English, this description,

1580
01:22:54,980 --> 01:22:57,380
American writer, which
I was shown previously.

1581
01:22:57,380 --> 01:23:00,740
Now I'm shown the Hebrew
description, appropriately.

1582
01:23:00,740 --> 01:23:03,500
But more interestingly,
oh my god!

1583
01:23:03,500 --> 01:23:07,640
All these statements
are suddenly in Hebrew.

1584
01:23:07,640 --> 01:23:08,940
How did that happen?

1585
01:23:08,940 --> 01:23:11,570


1586
01:23:11,570 --> 01:23:15,560
Well this tiny word here
is the very concise way

1587
01:23:15,560 --> 01:23:22,450
to say in Hebrew, instance of,
and this word here means human.

1588
01:23:22,450 --> 01:23:25,960
So these are links to
the same things, right?

1589
01:23:25,960 --> 01:23:28,100
It still links to Q5.

1590
01:23:28,100 --> 01:23:31,780
Q5 is the Wikidata
entity for human.

1591
01:23:31,780 --> 01:23:33,370
These are still the same things.

1592
01:23:33,370 --> 01:23:37,600
But because Wikidata has
multiple labels for everything,

1593
01:23:37,600 --> 01:23:39,580
it has multiple
labels for items.

1594
01:23:39,580 --> 01:23:42,760
And it also has multiple
labels for property names.

1595
01:23:42,760 --> 01:23:46,450
So Wikidata knows how
to say, instance of,

1596
01:23:46,450 --> 01:23:50,140
and award received,
in other languages.

1597
01:23:50,140 --> 01:23:54,490
That is why it is able to show
me all this data in Hebrew

1598
01:23:54,490 --> 01:23:59,890
even if none of that data was
actually input into Wikidata

1599
01:23:59,890 --> 01:24:01,870
by a Hebrew speaker.

1600
01:24:01,870 --> 01:24:04,900
That data could have been
input by English speakers,

1601
01:24:04,900 --> 01:24:08,230
but thanks to the
fact that someone once

1602
01:24:08,230 --> 01:24:12,760
translated the word
photo into Hebrew,

1603
01:24:12,760 --> 01:24:14,830
I can see this field in Hebrew.

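The reason the Hebrew view works, as explained above, is that statements store only IDs and labels are looked up per language at display time. A minimal sketch of that idea with a simple fallback chain; it is a simplification, not MediaWiki's actual language-fallback logic. "human" and "instance of" are the English labels of Q5 and P31; the Hebrew label for P31 is deliberately left out here to show the fallback, and Q_NO_LABELS is a hypothetical ID.

```python
# Labels per entity and language. Only IDs are stored in statements; labels
# are looked up at display time. The Hebrew label for P31 is deliberately
# left out here to show the fallback.
LABELS = {
    "Q5": {"en": "human", "he": "אדם"},
    "P31": {"en": "instance of"},
}


def label(entity_id, languages):
    """First label found along the user's language chain, else the bare ID."""
    for lang in languages:
        text = LABELS.get(entity_id, {}).get(lang)
        if text is not None:
            return text
    return entity_id


def render(statement, languages):
    prop, value = statement
    return f"{label(prop, languages)}: {label(value, languages)}"


statement = ("P31", "Q5")  # entered once, by anyone, in any language
print(render(statement, ["en"]))        # instance of: human
print(render(statement, ["he", "en"]))  # instance of: אדם
print(label("Q_NO_LABELS", ["he"]))     # Q_NO_LABELS
```

Adding the missing Hebrew label to P31 once would fix its display on every item that uses it, which is why translating labels is such an effective contribution.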
1604
01:24:14,830 --> 01:24:17,750


1605
01:24:17,750 --> 01:24:21,230
So one of the things you
can do to help Wikidata,

1606
01:24:21,230 --> 01:24:23,600
right now, without
any special knowledge

1607
01:24:23,600 --> 01:24:26,210
is to help translate
those labels.

1608
01:24:26,210 --> 01:24:29,030
Every label only needs to
be translated once.

1609
01:24:29,030 --> 01:24:31,310
So you can see that all
of these properties, date

1610
01:24:31,310 --> 01:24:34,720
of birth, name et cetera,
they all have Hebrew labels.

1611
01:24:34,720 --> 01:24:36,760
Maybe one of these would not.

1612
01:24:36,760 --> 01:24:38,361
No, they all have Hebrew labels.

1613
01:24:38,361 --> 01:24:39,110
Doing pretty good.

1614
01:24:39,110 --> 01:24:42,960


1615
01:24:42,960 --> 01:24:45,810
And I'm able to search
in my own language.

1616
01:24:45,810 --> 01:24:48,210
I'm able to click Add.

1617
01:24:48,210 --> 01:24:49,890
This word is Add,
so I click this,

1618
01:24:49,890 --> 01:24:51,780
and now I have the Add screen.

1619
01:24:51,780 --> 01:24:55,860
It all speaks my language,
and it's awesome.

1620
01:24:55,860 --> 01:25:00,330
And now for your sake I
will switch back to English,

1621
01:25:00,330 --> 01:25:03,090
but it is important
to know you can

1622
01:25:03,090 --> 01:25:05,740
edit Wikidata in any language.

1623
01:25:05,740 --> 01:25:09,050
And it is far more multilingual
and multilingual-friendly

1624
01:25:09,050 --> 01:25:13,260
than, for example, Commons, which
is also a project we all share.

1625
01:25:13,260 --> 01:25:17,730
But Commons has some limitations
on how multilingual it is.

1626
01:25:17,730 --> 01:25:21,410
For example, the category
names, et cetera.

1627
01:25:21,410 --> 01:25:23,270
OK.

1628
01:25:23,270 --> 01:25:25,670
So we were beginning
to discuss creating

1629
01:25:25,670 --> 01:25:27,140
something completely new.

1630
01:25:27,140 --> 01:25:29,360
AUDIENCE: Quick
questions, if that's OK?

1631
01:25:29,360 --> 01:25:30,980
So there are two questions on IRC.

1632
01:25:30,980 --> 01:25:33,890
The first one is, can you
show search for something

1633
01:25:33,890 --> 01:25:35,420
like getting the list of things?

1634
01:25:35,420 --> 01:25:38,360
I want to learn how to search
for something properly like,

1635
01:25:38,360 --> 01:25:43,705
show me all the items with
this value of this property.

1636
01:25:43,705 --> 01:25:45,080
ASAF BARTOV: Yes.

1637
01:25:45,080 --> 01:25:47,540
That is part of
this talk, but I'll

1638
01:25:47,540 --> 01:25:49,250
get to that in a
little bit later.

1639
01:25:49,250 --> 01:25:52,010
There's a whole section where I
will demonstrate the very, very

1640
01:25:52,010 --> 01:25:55,190
powerful query
system of Wikidata

1641
01:25:55,190 --> 01:25:57,170
where I will cash
that check that I gave

1642
01:25:57,170 --> 01:25:59,090
at the beginning of
all these painters

1643
01:25:59,090 --> 01:26:01,029
who are sons of painters
queries et cetera

1644
01:26:01,029 --> 01:26:02,570
So I will demonstrate
how to do that.

1645
01:26:02,570 --> 01:26:04,190
AUDIENCE: Other question.

1646
01:26:04,190 --> 01:26:07,250
How does Wikidata deal
with link rot, and other issues

1647
01:26:07,250 --> 01:26:09,680
stemming from URL refs?

1648
01:26:09,680 --> 01:26:13,528


1649
01:26:13,528 --> 01:26:16,290
ASAF BARTOV: URLs break.

1650
01:26:16,290 --> 01:26:18,730
We call that link rot.

1651
01:26:18,730 --> 01:26:22,470
Wikidata doesn't have
any particular magic

1652
01:26:22,470 --> 01:26:24,730
around link rot,
just like Wikipedia.

1653
01:26:24,730 --> 01:26:29,100
So if you do use a bare
URL it may well rot.

1654
01:26:29,100 --> 01:26:34,230
But you can add qualifiers
with backup URLs elsewhere,

1655
01:26:34,230 --> 01:26:37,680
on the Internet Archive, or
another mirroring service.

1656
01:26:37,680 --> 01:26:42,780
And potentially that could be
a software feature for Wikidata

1657
01:26:42,780 --> 01:26:46,590
to automatically save
or ensure that something

1658
01:26:46,590 --> 01:26:48,660
is saved on Internet
Archive, but I don't

1659
01:26:48,660 --> 01:26:50,670
know that it is doing so now.

1660
01:26:50,670 --> 01:26:56,040
So, just like Wikipedia, if
it is a bare URL it may rot.

1661
01:26:56,040 --> 01:27:00,240
And it may need to be
replaced, possibly by a bot.

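The backup-URL idea from the answer above can be sketched as a reference that keeps the original link alongside an archived copy. Assumptions: P854 (reference URL), P1065 (archive URL) and P2960 (archive date) are Wikidata properties, and the address follows the Wayback Machine's `web.archive.org/web/<timestamp>/<url>` pattern. Building the address does not archive anything; the page must actually have been captured for the link to resolve.

```python
from datetime import date


def wayback_url(url: str, when: date) -> str:
    """Wayback Machine address for a capture of `url` on or near `when`.

    This only builds the address; it does not archive anything.
    """
    return f"https://web.archive.org/web/{when:%Y%m%d}/{url}"


def reference_with_archive(url: str, archived_on: date) -> dict:
    """A reference that keeps the original URL plus an archived copy.

    P854 = reference URL, P1065 = archive URL, P2960 = archive date.
    """
    return {
        "P854": url,
        "P1065": wayback_url(url, archived_on),
        "P2960": archived_on.isoformat(),
    }


ref = reference_with_archive("https://example.org/interview", date(2017, 6, 1))
print(ref["P1065"])
# https://web.archive.org/web/20170601/https://example.org/interview
```

If the original URL later rots, the archived address in the same reference still points readers to a copy.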
1662
01:27:00,240 --> 01:27:01,390
Other questions?

1663
01:27:01,390 --> 01:27:09,840


1664
01:27:09,840 --> 01:27:12,650
All right, so let's
talk about how you

1665
01:27:12,650 --> 01:27:15,090
create a completely new item.

1666
01:27:15,090 --> 01:27:16,300
It's very simple.

1667
01:27:16,300 --> 01:27:21,810
You go to Wikidata and you
click here on the side.

1668
01:27:21,810 --> 01:27:30,180
There's a link, create new item,
which gives you this screen.

1669
01:27:30,180 --> 01:27:35,030
And let's create an
item about a book

1670
01:27:35,030 --> 01:27:39,500
that I'm reading right now
by this Bulgarian writer.

1671
01:27:39,500 --> 01:27:43,950
So we have an article about this
writer guy named Deyan Enev.

1672
01:27:43,950 --> 01:27:48,530
But we don't have an
article or a Wikidata item

1673
01:27:48,530 --> 01:28:07,980
about one of his famous
books called Circus Bulgaria.

1674
01:28:07,980 --> 01:28:10,050
That's the book I'm reading,
his first collection

1675
01:28:10,050 --> 01:28:11,216
of short stories in English.

1676
01:28:11,216 --> 01:28:14,280
Circus Bulgaria came out
in 2010, Portobello Books,

1677
01:28:14,280 --> 01:28:17,099
translated by Kapka Kassabova.

1678
01:28:17,099 --> 01:28:18,390
So that's the book I'm reading.

1679
01:28:18,390 --> 01:28:20,520
As you can see it's not
a link on Wikipedia.

1680
01:28:20,520 --> 01:28:23,370
There's no article about
it, and there's not even

1681
01:28:23,370 --> 01:28:26,310
a Wikidata entity item about it.

1682
01:28:26,310 --> 01:28:32,220
But we can totally create
it, even without a Wikipedia

1683
01:28:32,220 --> 01:28:33,090
article.

1684
01:28:33,090 --> 01:28:34,980
So let's create this new item.

1685
01:28:34,980 --> 01:28:37,260
Let's create it in
English for the purposes

1686
01:28:37,260 --> 01:28:38,880
of our demonstration.

1687
01:28:38,880 --> 01:28:44,910
The name of the item
is Circus Bulgaria.

1688
01:28:44,910 --> 01:28:47,520
Circus Bulgaria,
that's the name.

1689
01:28:47,520 --> 01:28:50,670
Not Circus Bulgaria
parentheses book,

1690
01:28:50,670 --> 01:28:53,520
or anything you may be
used to from Wikipedia.

1691
01:28:53,520 --> 01:28:56,520
It's the actual
name of the book,

1692
01:28:56,520 --> 01:29:00,450
and the description,
again, remember,

1693
01:29:00,450 --> 01:29:03,270
the description field
is just to kind of help

1694
01:29:03,270 --> 01:29:08,681
tell apart this Circus Bulgaria
from any other potential Circus

1695
01:29:08,681 --> 01:29:09,180
Bulgaria.

1696
01:29:09,180 --> 01:29:11,280
Maybe there's a
film or something.

1697
01:29:11,280 --> 01:29:20,480
So it's enough to just say
something like short story

1698
01:29:20,480 --> 01:29:23,270
collection.

1699
01:29:23,270 --> 01:29:27,830
I might add "by Deyan Enev"
just in case, again,

1700
01:29:27,830 --> 01:29:31,910
some future other short story
collection by some other author

1701
01:29:31,910 --> 01:29:33,560
happens to have that same name.

1702
01:29:33,560 --> 01:29:36,391
That should be
disambiguating enough.

1703
01:29:36,391 --> 01:29:36,890
OK.

1704
01:29:36,890 --> 01:29:39,770
Short story collection
by Deyan Enev.

1705
01:29:39,770 --> 01:29:42,050
I could have aliases for this.

1706
01:29:42,050 --> 01:29:47,240
The aliases assist findability.

1707
01:29:47,240 --> 01:29:51,020
This particular book has just
this one name, so that's fine.

1708
01:29:51,020 --> 01:29:52,260
And I click Create.

1709
01:29:52,260 --> 01:29:52,760
That's it.

1710
01:29:52,760 --> 01:29:55,990
I just start with a
label, and a description.

1711
01:29:55,990 --> 01:29:58,740
I click Create.

1712
01:29:58,740 --> 01:30:03,890
I have a brand new Q number
for my new Wikidata item.

1713
01:30:03,890 --> 01:30:05,960
And Wikidata knows
what to call it.

1714
01:30:05,960 --> 01:30:09,320
And a description in
one language at least.

1715
01:30:09,320 --> 01:30:11,930
And that's it, and I
can start populating it.

1716
01:30:11,930 --> 01:30:15,050
As you can see, it
has no site links,

1717
01:30:15,050 --> 01:30:17,450
but it's ready to be taught.

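What the Create form submits above, a label and a description in one language, plus more labels added afterwards, can be sketched as the JSON that the Wikibase Action API module `wbeditentity` accepts with `new=item`. This is a sketch that only builds the request parameters and sends nothing; a real edit also needs a logged-in session and a CSRF `token` parameter. The Bulgarian title is my rendering of "Tsirk Bulgaria" and should be checked. As far as I know, Wikidata also rejects a second item with the same label and description in the same language, which is one more reason the description matters.

```python
import json


def new_item_payload(lang, label, description, aliases=()):
    """Build the JSON `data` for a new item: label, description, optional aliases."""
    data = {
        "labels": {lang: {"language": lang, "value": label}},
        "descriptions": {lang: {"language": lang, "value": description}},
    }
    if aliases:
        data["aliases"] = {lang: [{"language": lang, "value": a} for a in aliases]}
    return data


def add_label(data, lang, value):
    """Add or overwrite the label in one more language."""
    data["labels"][lang] = {"language": lang, "value": value}
    return data


payload = new_item_payload(
    "en", "Circus Bulgaria", "short story collection by Deyan Enev"
)
add_label(payload, "bg", "Цирк България")

# Parameters for the Action API module wbeditentity; a real edit also needs
# a logged-in session and a CSRF "token" parameter, omitted here.
params = {
    "action": "wbeditentity",
    "new": "item",
    "data": json.dumps(payload, ensure_ascii=False),
    "format": "json",
}
print(params["data"])
```

No aliases are passed here, matching the demo, where the book has just one name.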
1718
01:30:17,450 --> 01:30:20,450
So, for example, I
can start by teaching

1719
01:30:20,450 --> 01:30:24,610
it the name of the book
in another language

1720
01:30:24,610 --> 01:30:25,870
that I happen to speak.

1721
01:30:25,870 --> 01:30:29,050


1722
01:30:29,050 --> 01:30:31,720
Now it has two labels
in English and Hebrew.

1723
01:30:31,720 --> 01:30:36,880
I could also look
up the book Areon,

1724
01:30:36,880 --> 01:30:39,510
the original Bulgarian
label for this book.

1725
01:30:39,510 --> 01:30:41,550
Seems relevant.

1726
01:30:41,550 --> 01:30:43,320
Again, I do not speak Bulgarian.

1727
01:30:43,320 --> 01:30:49,860
But I can go to the Bulgarian
Wikipedia through the interwiki links.

1728
01:30:49,860 --> 01:30:51,510
This is this gentleman.

1729
01:30:51,510 --> 01:30:54,510
And I could find--

1730
01:30:54,510 --> 01:30:59,190
I can read Cyrillic so
I could easily find--

1731
01:30:59,190 --> 01:31:00,030
when I say easily--

1732
01:31:00,030 --> 01:31:02,940


1733
01:31:02,940 --> 01:31:05,710
when I say easily--

1734
01:31:05,710 --> 01:31:12,731
maybe not so easy, but
I can search for it.

1735
01:31:12,731 --> 01:31:21,070


1736
01:31:21,070 --> 01:31:22,180
Here we go.

1737
01:31:22,180 --> 01:31:25,190
Tsirk Bulgaria.

1738
01:31:25,190 --> 01:31:27,510
That is the name of the book.

1739
01:31:27,510 --> 01:31:28,910
Tsirk, as in circus.

1740
01:31:28,910 --> 01:31:30,440
No problem.

1741
01:31:30,440 --> 01:31:32,725
So I just copy this right here.

1742
01:31:32,725 --> 01:31:35,240


1743
01:31:35,240 --> 01:31:38,090
And I go back to my new item.

1744
01:31:38,090 --> 01:31:45,725
My new item, which is here,
and I edit the Bulgarian field.
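Adding a label in another language to an existing item can be sketched the same way, assuming the Wikibase API action `wbsetlabel`. The item ID "Q123" is a hypothetical placeholder, and the Cyrillic title is reconstructed from the transcribed "Tsirk Bulgaria", so treat it as an assumption.

```python
def set_label_params(item_id: str, lang: str, value: str) -> dict:
    """Parameters for the Wikibase API action wbsetlabel, which adds or
    replaces the label of an existing item in one language."""
    return {
        "action": "wbsetlabel",
        "id": item_id,       # the item's Q number
        "language": lang,    # language code, e.g. "bg" for Bulgarian
        "value": value,      # the label text in that language
        "format": "json",
    }

# "Q123" is a hypothetical placeholder for the new item's Q number.
params = set_label_params("Q123", "bg", "Цирк България")
```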

1745
01:31:45,725 --> 01:31:48,260


1746
01:31:48,260 --> 01:31:49,950
And here it is.

1747
01:31:49,950 --> 01:31:50,720
Awesome.

1748
01:31:50,720 --> 01:31:51,220
All right.

1749
01:31:51,220 --> 01:31:55,420
But I still haven't told
Wikidata anything about this.

1750
01:31:55,420 --> 01:31:56,920
I know I'm talking about a book.

1751
01:31:56,920 --> 01:31:59,110
Wikidata doesn't
know that yet.

1752
01:31:59,110 --> 01:32:02,630
So let's start by
adding some statements.

1753
01:32:02,630 --> 01:32:05,390
First of all, I click Add.

1754
01:32:05,390 --> 01:32:07,190
Wikidata sensibly
says, how about we

1755
01:32:07,190 --> 01:32:08,630
start with instance of.

1756
01:32:08,630 --> 01:32:11,090
Tell me what kind of animal--
no, not kind of animal.

1757
01:32:11,090 --> 01:32:13,940
What kind of thing are you
trying to describe here?

1758
01:32:13,940 --> 01:32:18,130
Well, it's an instance of a book.

1759
01:32:18,130 --> 01:32:20,930
Not in Hebrew, please.

1760
01:32:20,930 --> 01:32:22,180
So it's an instance of a book.

1761
01:32:22,180 --> 01:32:23,763
I could even be a
little more specific

1762
01:32:23,763 --> 01:32:31,920
and say it's an instance of
a short story collection.

1763
01:32:31,920 --> 01:32:34,620
There we go, short
story collection.

1764
01:32:34,620 --> 01:32:36,800
I hit Save.

1765
01:32:36,800 --> 01:32:37,430
Awesome.

1766
01:32:37,430 --> 01:32:39,680
So now we know what
kind of thing it is.

1767
01:32:39,680 --> 01:32:42,860
It's not a human, it's not a
mountain, it's not a concept.

1768
01:32:42,860 --> 01:32:44,760
It's a short story collection.
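Under the hood, a statement like "instance of: short story collection" is a small JSON structure. The sketch below builds one in the shape Wikibase uses for item-valued statements; P31 ("instance of") is a real property, while "Q123" is a hypothetical placeholder standing in for the Q number of the class "short story collection".

```python
def item_value_snak(prop: str, qid: str) -> dict:
    """A Wikibase 'value' snak whose value is another item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
        },
    }

def statement(prop: str, qid: str) -> dict:
    """Wrap a main snak into a statement, as it appears under an item's 'claims'."""
    return {"mainsnak": item_value_snak(prop, qid), "type": "statement", "rank": "normal"}

# P31 = instance of; "Q123" is a hypothetical placeholder class item.
claim = statement("P31", "Q123")
```

Because the value is a link to another item rather than a free-text string, Wikidata can reason about it, which is also why it starts suggesting relevant properties once the class is known.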

1769
01:32:44,760 --> 01:32:46,400
Now I can add some other things.

1770
01:32:46,400 --> 01:32:48,770
See, Wikidata is
already working for me.

1771
01:32:48,770 --> 01:32:51,020
Because it's a short
story collection

1772
01:32:51,020 --> 01:32:53,960
it's offering to let me populate
these properties, and not

1773
01:32:53,960 --> 01:32:54,890
other ones.

1774
01:32:54,890 --> 01:32:56,990
Publication date,
original language,

1775
01:32:56,990 --> 01:33:00,350
genre, country of origin,
these are all relevant, right?

1776
01:33:00,350 --> 01:33:04,220
So let's start with original
language of the work

1777
01:33:04,220 --> 01:33:07,410
is Bulgarian.

1778
01:33:07,410 --> 01:33:09,810
Not Bulgaria, Bulgarian.

1779
01:33:09,810 --> 01:33:12,040
This is the item I want to link.

1780
01:33:12,040 --> 01:33:21,570
Hit Save, and whatever.

1781
01:33:21,570 --> 01:33:22,890
Author.

1782
01:33:22,890 --> 01:33:26,540
Let's identify the author.

1783
01:33:26,540 --> 01:33:29,350
So the author, the main
creator of the work,

1784
01:33:29,350 --> 01:33:32,470
is that gentleman Deyan Enev.

1785
01:33:32,470 --> 01:33:34,750
And remember, he has
a Wikipedia article.

1786
01:33:34,750 --> 01:33:37,210
He also has a Wikidata entity.

1787
01:33:37,210 --> 01:33:39,640
So Wikidata does know about him.

1788
01:33:39,640 --> 01:33:48,930
So I hit Save, and I can add
something about the translator.

1789
01:33:48,930 --> 01:33:52,530


1790
01:33:52,530 --> 01:33:54,390
And what was that lady's name?

1791
01:33:54,390 --> 01:33:57,990


1792
01:33:57,990 --> 01:34:00,120
Kapka Kassabova.

1793
01:34:00,120 --> 01:34:05,430
Now it so happens that Wikidata
already knows about this lady.

1794
01:34:05,430 --> 01:34:08,330


1795
01:34:08,330 --> 01:34:08,840
See?

1796
01:34:08,840 --> 01:34:12,290
So I can just start typing
and then just link to it.

1797
01:34:12,290 --> 01:34:12,840
Awesome.

1798
01:34:12,840 --> 01:34:13,824
But what if it didn't?

1799
01:34:13,824 --> 01:34:15,740
What if it was translated
by someone who isn't

1800
01:34:15,740 --> 01:34:17,690
already covered on Wikidata?

1801
01:34:17,690 --> 01:34:22,190
Well, I could just type
the name as a string,

1802
01:34:22,190 --> 01:34:25,760
but ideally I could
create a Wikidata entity

1803
01:34:25,760 --> 01:34:28,940
about this translator so
that there is a possibility

1804
01:34:28,940 --> 01:34:30,350
to link to her.

1805
01:34:30,350 --> 01:34:33,560


1806
01:34:33,560 --> 01:34:36,920
Now I might actually
add a qualifier here

1807
01:34:36,920 --> 01:34:40,310
because she's not the
translator of the book, right?

1808
01:34:40,310 --> 01:34:43,620
She's the translator of
the book into English.

1809
01:34:43,620 --> 01:34:44,440
Right.

1810
01:34:44,440 --> 01:34:50,151
So the language that she
translated into is English.

1811
01:34:50,151 --> 01:34:50,650
Right?

1812
01:34:50,650 --> 01:34:53,620
This book-- remember
I'm describing the book.

1813
01:34:53,620 --> 01:34:55,376
The item is about the book.

1814
01:34:55,376 --> 01:34:57,250
So the book would have
a different translator

1815
01:34:57,250 --> 01:34:58,510
into Polish.

1816
01:34:58,510 --> 01:35:02,320
So this is an example of
a property or a statement

1817
01:35:02,320 --> 01:35:06,430
that doesn't make sense without
one of those qualifiers.

1818
01:35:06,430 --> 01:35:08,140
It's just not correct.

1819
01:35:08,140 --> 01:35:11,320
It doesn't make sense to
just say "the translator is."

1820
01:35:11,320 --> 01:35:14,950
The English translator, or
even this English translator.

1821
01:35:14,950 --> 01:35:17,770
In 50 years maybe there would
be an additional English

1822
01:35:17,770 --> 01:35:18,940
translation.

1823
01:35:18,940 --> 01:35:24,774
So that's an example of
needing that qualifier.
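The translator example shows why a qualifier belongs inside the statement it refines. A minimal sketch, assuming "translator" is P655 and that the target language is recorded with the qualifier "language of work or name" (P407) pointing at English (Q1860); "Q456" is a hypothetical placeholder for the translator's item.

```python
def item_snak(prop: str, qid: str) -> dict:
    """A Wikibase 'value' snak pointing at another item."""
    return {
        "snaktype": "value",
        "property": prop,
        "datavalue": {
            "type": "wikibase-entityid",
            "value": {"entity-type": "item", "numeric-id": int(qid[1:]), "id": qid},
        },
    }

def statement_with_qualifier(prop: str, qid: str, q_prop: str, q_qid: str) -> dict:
    """A statement whose meaning is narrowed by one item-valued qualifier."""
    return {
        "mainsnak": item_snak(prop, qid),
        "type": "statement",
        "rank": "normal",
        "qualifiers": {q_prop: [item_snak(q_prop, q_qid)]},
        "qualifiers-order": [q_prop],
    }

# translator (P655) = hypothetical "Q456", qualified by
# language of work or name (P407) = English (Q1860); both property choices are assumptions.
translator_claim = statement_with_qualifier("P655", "Q456", "P407", "Q1860")
```

A later English translation, or a Polish one, would simply be another statement with its own qualifier, so the item never claims that one person is "the" translator.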

1824
01:35:24,774 --> 01:35:27,190
And of course I could go on
and populate the other fields.

1825
01:35:27,190 --> 01:35:29,710
We don't have to
do that right now.

1826
01:35:29,710 --> 01:35:32,960
Publication date, country
of origin, et cetera.

1827
01:35:32,960 --> 01:35:35,440
So this is already beginning
to look like all those items

1828
01:35:35,440 --> 01:35:38,440
that we already saw, but just
a moment ago it didn't exist.

1829
01:35:38,440 --> 01:35:43,920
Just a moment ago Wikidata
had no concept of this work.

1830
01:35:43,920 --> 01:35:46,500
This happens to be one
of his notable works.

1831
01:35:46,500 --> 01:35:52,080
So I could actually go to the
item about Deyan Enev which

1832
01:35:52,080 --> 01:35:56,190
has all this information
already, occupation, languages,

1833
01:35:56,190 --> 01:35:59,170
and add a property.

1834
01:35:59,170 --> 01:36:01,050
Remember, I'm not
limited to these.

1835
01:36:01,050 --> 01:36:06,180
I can add a property
called notable works,

1836
01:36:06,180 --> 01:36:08,670
and mention my new item.

1837
01:36:08,670 --> 01:36:12,120
Circus Bulgaria.

1838
01:36:12,120 --> 01:36:12,750
See?

1839
01:36:12,750 --> 01:36:15,180
My new item is
showing up, and thanks

1840
01:36:15,180 --> 01:36:18,660
to this description that I
wrote, short story collection,

1841
01:36:18,660 --> 01:36:22,650
it's already appearing here in
the dropdown very conveniently.

1842
01:36:22,650 --> 01:36:24,270
So I linked to this.

1843
01:36:24,270 --> 01:36:25,154
I hit Save.

1844
01:36:25,154 --> 01:36:28,680


1845
01:36:28,680 --> 01:36:32,310
Ideally again I should find
some references showing

1846
01:36:32,310 --> 01:36:34,620
that this is a
notable work by him,

1847
01:36:34,620 --> 01:36:37,000
but we won't spend
time on that right now.

1848
01:36:37,000 --> 01:36:39,010
But the point is we
created a new item.

1849
01:36:39,010 --> 01:36:40,410
We populated it a little bit.

1850
01:36:40,410 --> 01:36:44,400
We linked to it so that it's
more discoverable by mentioning

1851
01:36:44,400 --> 01:36:47,760
it in the author item, and
of course the book item

1852
01:36:47,760 --> 01:36:50,710
itself mentions the author
and links to the author.

1853
01:36:50,710 --> 01:36:52,770
So that's all good.

1854
01:36:52,770 --> 01:36:57,780
One last thing we shall do is
give it a useful identifier,

1855
01:36:57,780 --> 01:37:02,880
so let's add, say, the
Library of Congress record

1856
01:37:02,880 --> 01:37:03,940
for this book.

1857
01:37:03,940 --> 01:37:04,440
OK.

1858
01:37:04,440 --> 01:37:07,710
So I have prepared
this in advance.

1859
01:37:07,710 --> 01:37:08,760
Ooh.

1860
01:37:08,760 --> 01:37:12,720
Just in time, with 80 seconds to
go before it gives up on me.

1861
01:37:12,720 --> 01:37:14,310
Oh it has already
given up on me.

1862
01:37:14,310 --> 01:37:15,490
That is very unfortunate.

1863
01:37:15,490 --> 01:37:23,300


1864
01:37:23,300 --> 01:37:29,110
So I go to the Library of
Congress and I find this book.

1865
01:37:29,110 --> 01:37:33,050
I find this entry, right?

1866
01:37:33,050 --> 01:37:37,320
In the Library of Congress
database about this book.

1867
01:37:37,320 --> 01:37:39,120
And it has a permalink.

1868
01:37:39,120 --> 01:37:42,570
It's a link that is kind of
guaranteed to be permanent.

1869
01:37:42,570 --> 01:37:47,950
I can just copy that link,
go back to my little book,

1870
01:37:47,950 --> 01:37:55,770
and say the Library of Congress.

1871
01:37:55,770 --> 01:38:01,070
Yeah, LCCN, that's what they
call their IDs, the control

1872
01:38:01,070 --> 01:38:02,120
number.

1873
01:38:02,120 --> 01:38:06,502
And I paste it here.

1874
01:38:06,502 --> 01:38:08,210
I actually don't need the URL.

1875
01:38:08,210 --> 01:38:09,136
I need just a number.
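Stripping the permalink down to the bare number, and rebuilding the link from it, can be sketched like this. It assumes permalinks of the form `https://lccn.loc.gov/<number>`; the number used below is made up for illustration.

```python
from urllib.parse import urlparse

def lccn_from_permalink(url: str) -> str:
    """Return the last path segment of an LCCN permalink,
    e.g. 'https://lccn.loc.gov/2010012345' -> '2010012345'."""
    path = urlparse(url).path.strip("/")
    return path.rsplit("/", 1)[-1]

def permalink_from_lccn(lccn: str) -> str:
    """The reverse step a re-user could take: turn the stored number back into a link."""
    return f"https://lccn.loc.gov/{lccn}"

# "2010012345" is a made-up example number, not the real record for this book.
number = lccn_from_permalink("https://lccn.loc.gov/2010012345")
```

Storing only the identifier, not the full URL, is what lets any tool rebuild the link (or a different link to the same record) later.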

1876
01:38:09,136 --> 01:38:12,440


1877
01:38:12,440 --> 01:38:13,520
And there we go.

1878
01:38:13,520 --> 01:38:16,550
I have added it,
and now Wikidata

1879
01:38:16,550 --> 01:38:20,630
knows how to find bibliographic
information about this book.

1880
01:38:20,630 --> 01:38:24,710
And any re-user of
Wikidata, some program,

1881
01:38:24,710 --> 01:38:28,950
some tool that connects
books to authors

1882
01:38:28,950 --> 01:38:32,870
or does statistical analysis or
whatever, some future yet to be

1883
01:38:32,870 --> 01:38:35,090
imagined tool
could automatically

1884
01:38:35,090 --> 01:38:39,170
find additional metadata on the
Library of Congress site thanks

1885
01:38:39,170 --> 01:38:41,840
to this connection
that I just made.

1886
01:38:41,840 --> 01:38:44,150
And of course I could
add many other IDs

1887
01:38:44,150 --> 01:38:46,460
to other catalogs
around the world,

1888
01:38:46,460 --> 01:38:48,150
and we won't do that right now.

1889
01:38:48,150 --> 01:38:51,840
You can see that it's now
showing up under identifiers.

1890
01:38:51,840 --> 01:38:56,330
So this is how we created
a brand new piece of data.

1891
01:38:56,330 --> 01:38:59,632
Questions about this,
about creating new items?

1892
01:38:59,632 --> 01:39:18,100


1893
01:39:18,100 --> 01:39:19,180
Yeah, all right.

1894
01:39:19,180 --> 01:39:25,510
So we've seen how to contribute
to Wikidata on our own,

1895
01:39:25,510 --> 01:39:26,350
kind of through--

1896
01:39:26,350 --> 01:39:27,840
directly through Wikidata.

1897
01:39:27,840 --> 01:39:30,680


1898
01:39:30,680 --> 01:39:35,220
Now you may be
thinking, but Asaf, this

1899
01:39:35,220 --> 01:39:39,880
sounds like a ton
of work recording

1900
01:39:39,880 --> 01:39:44,500
all of these little tiny bits of
information about every person

1901
01:39:44,500 --> 01:39:47,410
and every book and every town.

1902
01:39:47,410 --> 01:39:50,520
And if you think that,
you would be correct.

1903
01:39:50,520 --> 01:39:52,730
That is a ton of work.

1904
01:39:52,730 --> 01:39:54,600
It's a lot of work.

1905
01:39:54,600 --> 01:39:59,930
However, it is centralized, so
it is reusable on other wikis

1906
01:39:59,930 --> 01:40:03,860
and we will show in just a
moment how we pull information

1907
01:40:03,860 --> 01:40:07,296
from Wikidata into
Wikipedia or other projects.

1908
01:40:07,296 --> 01:40:10,860


1909
01:40:10,860 --> 01:40:13,780
We will show that
in just a moment.

1910
01:40:13,780 --> 01:40:18,660
But here's an
awesome little game

1911
01:40:18,660 --> 01:40:23,205
that Wikidata
volunteer Magnus Manske

1912
01:40:23,205 --> 01:40:30,900
has authored called the
Wikidata game, in which he

1913
01:40:30,900 --> 01:40:31,920
tricks people--

1914
01:40:31,920 --> 01:40:35,730
sorry, helps people
make contributions

1915
01:40:35,730 --> 01:40:41,500
to Wikidata in a very,
very easy and pleasant way.

1916
01:40:41,500 --> 01:40:44,410
Let's look at the Wikidata game.

1917
01:40:44,410 --> 01:40:47,840
So the first thing you need
to do in that Wikidata game

1918
01:40:47,840 --> 01:40:50,660
is to log in,
because the Wikidata

1919
01:40:50,660 --> 01:40:53,150
game makes edits in your name.

1920
01:40:53,150 --> 01:40:54,980
So we need to authorize it.

1921
01:40:54,980 --> 01:40:57,250
It's perfectly safe.

1922
01:40:57,250 --> 01:41:01,090
And after you do that you
can go to the Wikidata game.

1923
01:41:01,090 --> 01:41:02,020
So this is the game.

1924
01:41:02,020 --> 01:41:03,520
Now I'm logged in.

1925
01:41:03,520 --> 01:41:05,230
And the Wikidata game
actually includes

1926
01:41:05,230 --> 01:41:06,970
a number of different games.

1927
01:41:06,970 --> 01:41:09,310
Let's start with the person game.

1928
01:41:09,310 --> 01:41:14,170
So Wikidata shows you--

1929
01:41:14,170 --> 01:41:20,800
shows you an item, and asks
you a very simple question.

1930
01:41:20,800 --> 01:41:23,200
Person, or not a person?

1931
01:41:23,200 --> 01:41:26,410


1932
01:41:26,410 --> 01:41:30,550
So Wikidata goes through
Wikidata entities

1933
01:41:30,550 --> 01:41:35,540
that don't even have the
instance of property.

1934
01:41:35,540 --> 01:41:37,520
Which is why Wikidata
doesn't know,

1935
01:41:37,520 --> 01:41:41,120
literally doesn't know, if this
is a person, or a mountain,

1936
01:41:41,120 --> 01:41:44,390
or a city, or a country,
or anything else.

1937
01:41:44,390 --> 01:41:47,150
So it asks you, because this
is the kind of question that

1938
01:41:47,150 --> 01:41:50,300
Wikidata cannot
decide on its own,

1939
01:41:50,300 --> 01:41:54,800
but for us humans it's generally
trivial to be able to say

1940
01:41:54,800 --> 01:41:58,220
whether something that we're
looking at is a person or not.

1941
01:41:58,220 --> 01:42:03,590
It gets slightly trickier when
the information is in Javanese,

1942
01:42:03,590 --> 01:42:06,470
as it is here,
rather than English.

1943
01:42:06,470 --> 01:42:10,010
So this item happens to
be described in Javanese.

1944
01:42:10,010 --> 01:42:14,360
My Javanese, spoken in
Indonesia, is very weak.

1945
01:42:14,360 --> 01:42:19,620
However, I can tell that
this is not a person.

1946
01:42:19,620 --> 01:42:20,730
How can I tell?

1947
01:42:20,730 --> 01:42:23,220
Without understanding
a word of Javanese

1948
01:42:23,220 --> 01:42:25,950
I see that it mentions
1000 kilometers

1949
01:42:25,950 --> 01:42:28,860
and square kilometers, see?

1950
01:42:28,860 --> 01:42:32,520
So this is about a
place, or an area,

1951
01:42:32,520 --> 01:42:36,090
or a region, or whatever,
but not a person.

1952
01:42:36,090 --> 01:42:39,060
So this is an
example of how even

1953
01:42:39,060 --> 01:42:41,100
without understanding
language you can sometimes

1954
01:42:41,100 --> 01:42:42,400
make a determination.

1955
01:42:42,400 --> 01:42:45,030
However, of course,
you should be sure.

1956
01:42:45,030 --> 01:42:47,700
This is definitely not
what the Wikipedia article

1957
01:42:47,700 --> 01:42:49,150
about a person looks like.

1958
01:42:49,150 --> 01:42:50,430
So this is not a person.

1959
01:42:50,430 --> 01:42:52,780
I just click it and I'm
shown the next item.

1960
01:42:52,780 --> 01:42:56,600


1961
01:42:56,600 --> 01:42:59,660
This item is in another
language I do not speak,

1962
01:42:59,660 --> 01:43:00,950
and I just don't know.

1963
01:43:00,950 --> 01:43:03,740
I do not know if this is
about a person or not.

1964
01:43:03,740 --> 01:43:07,350
So I click Not Sure.

1965
01:43:07,350 --> 01:43:11,190
This is in Swedish, and
it's about Sulawesi, still

1966
01:43:11,190 --> 01:43:13,770
Indonesia.

1967
01:43:13,770 --> 01:43:16,530
And it is not about a person.

1968
01:43:16,530 --> 01:43:18,150
I have enough Swedish for that.

1969
01:43:18,150 --> 01:43:21,750
So I click not a person.

1970
01:43:21,750 --> 01:43:24,420
Now, you may say,
well, do I really

1971
01:43:24,420 --> 01:43:28,350
have to deal with all these
languages that I don't speak?

1972
01:43:28,350 --> 01:43:29,190
The answer is no.

1973
01:43:29,190 --> 01:43:30,630
You don't have to.

1974
01:43:30,630 --> 01:43:32,580
Here at the bottom
of the Wikidata game

1975
01:43:32,580 --> 01:43:33,840
there are settings.

1976
01:43:33,840 --> 01:43:38,270
You can click that
and tell Wikidata,

1977
01:43:38,270 --> 01:43:41,840
I cannot even read
Chinese or Japanese,

1978
01:43:41,840 --> 01:43:44,600
so please don't show me
items in those languages.

1979
01:43:44,600 --> 01:43:47,060
Because I wouldn't
even be able to guess.

1980
01:43:47,060 --> 01:43:50,000
I prefer these languages in
which I can relatively easily

1981
01:43:50,000 --> 01:43:51,380
make determinations.

1982
01:43:51,380 --> 01:43:54,601
And I can even tell Wikidata to
only show me these languages.

1983
01:43:54,601 --> 01:43:55,100
You see?

1984
01:43:55,100 --> 01:43:57,350
This was not selected,
which is why I

1985
01:43:57,350 --> 01:44:00,600
was shown some other languages.

1986
01:44:00,600 --> 01:44:04,240
I could say, only use
these languages, and save.

1987
01:44:04,240 --> 01:44:06,100
And now I can try
this game again.

1988
01:44:06,100 --> 01:44:07,980
However, that can
slow it down a little.

1989
01:44:07,980 --> 01:44:09,000
So here we go.

1990
01:44:09,000 --> 01:44:11,640
Here's a Spanish-- which
is one of the languages I

1991
01:44:11,640 --> 01:44:14,640
told Wikidata game it can use.

1992
01:44:14,640 --> 01:44:16,480
This is a Spanish item.

1993
01:44:16,480 --> 01:44:19,265
Now is it about a person or not?

1994
01:44:19,265 --> 01:44:22,120


1995
01:44:22,120 --> 01:44:23,230
It is not about a person.

1996
01:44:23,230 --> 01:44:25,906


1997
01:44:25,906 --> 01:44:26,780
Is it about a person?

1998
01:44:26,780 --> 01:44:29,155


1999
01:44:29,155 --> 01:44:29,655
No.

2000
01:44:29,655 --> 01:44:32,900


2001
01:44:32,900 --> 01:44:35,180
Yes, it is, right?

2002
01:44:35,180 --> 01:44:38,550
Cistercian monk, Pedro
de Oviedo Falconi.

2003
01:44:38,550 --> 01:44:40,890
That sounds like a person.

2004
01:44:40,890 --> 01:44:42,680
Fray Pedro nació...

2005
01:44:42,680 --> 01:44:44,960
Yeah, he was born
in Madrid in 1577.

2006
01:44:44,960 --> 01:44:46,280
This is a person.

2007
01:44:46,280 --> 01:44:47,060
OK.

2008
01:44:47,060 --> 01:44:49,730
So I click person.

2009
01:44:49,730 --> 01:44:52,100
Again, if you're not
sure, click not sure.

2010
01:44:52,100 --> 01:44:55,100
The point is, just by clicking
person and as you can see

2011
01:44:55,100 --> 01:44:57,780
this would work
very well on mobile,

2012
01:44:57,780 --> 01:45:01,430
which is why I said you can
contribute on your commute.

2013
01:45:01,430 --> 01:45:04,100
You can just hold your
phone or tablet or whatever,

2014
01:45:04,100 --> 01:45:05,840
and just tap.

2015
01:45:05,840 --> 01:45:07,040
Person, not a person.

2016
01:45:07,040 --> 01:45:08,900
Person, not a person.

2017
01:45:08,900 --> 01:45:12,500
The amazing thing is that just
tapping person has actually

2018
01:45:12,500 --> 01:45:15,830
made an edit to Wikidata
on my behalf, which

2019
01:45:15,830 --> 01:45:21,560
I can find out, as on every
wiki, by clicking Contributions.

2020
01:45:21,560 --> 01:45:24,200
And as you can see in addition
to the stuff about Circus

2021
01:45:24,200 --> 01:45:28,340
Bulgaria, my latest edit is in
fact about this Pedro de Oviedo

2022
01:45:28,340 --> 01:45:30,130
Falconi person.

2023
01:45:30,130 --> 01:45:32,000
And the edit was, you can--

2024
01:45:32,000 --> 01:45:38,030
I hope you can see this, created
the claim instance of human.

2025
01:45:38,030 --> 01:45:39,110
So I added--

2026
01:45:39,110 --> 01:45:43,100
I mean the Wikidata game
added for me the statement

2027
01:45:43,100 --> 01:45:44,180
instance of human.
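Roughly what one tap amounts to: the talk does not say which API call the game uses, so this is a sketch assuming an equivalent of the Wikibase action `wbcreateclaim` that adds "instance of (P31): human (Q5)". "Q789" is a hypothetical placeholder for the person's item.

```python
import json

def create_claim_params(entity: str, prop: str, target_qid: str) -> dict:
    """Parameters for the Wikibase API action wbcreateclaim that add one
    item-valued statement, e.g. instance of (P31) human (Q5)."""
    value = {"entity-type": "item", "numeric-id": int(target_qid.lstrip("Q"))}
    return {
        "action": "wbcreateclaim",
        "entity": entity,
        "property": prop,
        "snaktype": "value",
        "value": json.dumps(value),  # the value is passed as a JSON string
        "format": "json",
    }

# "Q789" is a hypothetical placeholder for the Pedro de Oviedo Falconi item.
params = create_claim_params("Q789", "P31", "Q5")
```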

2028
01:45:44,180 --> 01:45:47,780
Now, the awesome thing is
that it was super easy to do.

2029
01:45:47,780 --> 01:45:51,890
I didn't have to go into that
entity, click the Add button,

2030
01:45:51,890 --> 01:45:57,080
choose the instance of property,
choose human, hit Save.

2031
01:45:57,080 --> 01:45:59,210
Instead of all these
operations I just

2032
01:45:59,210 --> 01:46:04,250
tapped on my screen,
person, not a person.

2033
01:46:04,250 --> 01:46:10,280
And I can do hundreds of
edits during my daily commute.

2034
01:46:10,280 --> 01:46:12,410
There are other games,
like the gender game.

2035
01:46:12,410 --> 01:46:14,810
So this is about--

2036
01:46:14,810 --> 01:46:17,240
this is when Wikidata
already knows

2037
01:46:17,240 --> 01:46:19,760
that this item is a
person, but it doesn't

2038
01:46:19,760 --> 01:46:21,710
know the gender of this person.

2039
01:46:21,710 --> 01:46:25,340
Which is another one of
the more basic properties.

2040
01:46:25,340 --> 01:46:27,770
And this is taking a long
time because of the language

2041
01:46:27,770 --> 01:46:29,870
limitations that I set on it.

2042
01:46:29,870 --> 01:46:32,660
I guess the less exotic
languages have already

2043
01:46:32,660 --> 01:46:35,130
been exhausted in the game.

2044
01:46:35,130 --> 01:46:36,880
We don't have to
wait all this time.

2045
01:46:36,880 --> 01:46:40,280


2046
01:46:40,280 --> 01:46:44,970
We can try something else.

2047
01:46:44,970 --> 01:46:45,950
How about occupation?

2048
01:46:45,950 --> 01:46:46,850
The occupation game.

2049
01:46:46,850 --> 01:46:49,400
Here we go, this is in Russian.

2050
01:46:49,400 --> 01:46:55,540
And what is the occupation
of this gentleman?

2051
01:46:55,540 --> 01:46:58,630
Well he is an [INAUDIBLE].

2052
01:46:58,630 --> 01:47:00,700
He's a church person.

2053
01:47:00,700 --> 01:47:04,300
However, so the
occupation game is

2054
01:47:04,300 --> 01:47:06,490
where Wikidata game
will automatically

2055
01:47:06,490 --> 01:47:10,990
pull likely occupations
from the article text

2056
01:47:10,990 --> 01:47:13,810
and ask for confirmation.

2057
01:47:13,810 --> 01:47:16,840
So if he-- if this person
really is a deacon,

2058
01:47:16,840 --> 01:47:17,770
I should click that.

2059
01:47:17,770 --> 01:47:19,990
But I'm not sure.

2060
01:47:19,990 --> 01:47:24,950
I'm not clear on the Russian
church's distinctions between--

2061
01:47:24,950 --> 01:47:26,620
I mean [INAUDIBLE]
is pretty senior,

2062
01:47:26,620 --> 01:47:28,690
but I don't know if that
automatically also means

2063
01:47:28,690 --> 01:47:30,100
he's a deacon or not.

2064
01:47:30,100 --> 01:47:32,720
And [INAUDIBLE] is
not listed here.

2065
01:47:32,720 --> 01:47:36,380
So I will click not listed.

2066
01:47:36,380 --> 01:47:39,540
Also, these guesses
are not always correct.

2067
01:47:39,540 --> 01:47:42,680
So this guy, for
example, is in Russian.

2068
01:47:42,680 --> 01:47:43,430
I can read this.

2069
01:47:43,430 --> 01:47:44,470
He's a philologist.

2070
01:47:44,470 --> 01:47:45,380
He's a linguist.

2071
01:47:45,380 --> 01:47:48,510
So I can confirm it
and click linguist.

2072
01:47:48,510 --> 01:47:49,010
All right?

2073
01:47:49,010 --> 01:47:51,950
And again, if we look
at my contributions

2074
01:47:51,950 --> 01:47:55,700
we can see the Wikidata
game on my behalf

2075
01:47:55,700 --> 01:47:59,930
created occupation linguist.

2076
01:47:59,930 --> 01:48:02,450
OK.

2077
01:48:02,450 --> 01:48:04,370
Just by tapping linguist there.

2078
01:48:04,370 --> 01:48:07,040
Now if it's taken
from the article,

2079
01:48:07,040 --> 01:48:09,860
why would it ever be wrong?

2080
01:48:09,860 --> 01:48:15,970
Well, Jesus was the
son of a carpenter.

2081
01:48:15,970 --> 01:48:18,870
The word carpenter
appears in the text.

2082
01:48:18,870 --> 01:48:22,840
That doesn't mean it's correct
to say Jesus was a carpenter.

2083
01:48:22,840 --> 01:48:23,340
OK?

2084
01:48:23,340 --> 01:48:24,660
Just a trivial example, right?

2085
01:48:24,660 --> 01:48:30,250
So many, many articles will say,
you know, born to a physician.

2086
01:48:30,250 --> 01:48:32,850
And so the word physician
could be guessed,

2087
01:48:32,850 --> 01:48:36,030
but it wouldn't be correct
unless the son is also

2088
01:48:36,030 --> 01:48:38,090
a physician.
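Why text-based guesses need a human check: a naive guesser that just looks for occupation words produces exactly the false positives described above. This is a hypothetical illustration, not the game's actual method; the word list is made up.

```python
import re

# A tiny, hypothetical occupation vocabulary.
OCCUPATIONS = ["carpenter", "physician", "linguist", "deacon"]

def guess_occupations(text: str) -> list:
    """Naively return every known occupation word that appears in the text."""
    found = []
    for word in OCCUPATIONS:
        if re.search(rf"\b{re.escape(word)}\b", text, flags=re.IGNORECASE):
            found.append(word)
    return found

# Both guesses are false positives: the occupation belongs to a parent.
print(guess_occupations("Jesus was the son of a carpenter."))  # ['carpenter']
print(guess_occupations("He was born to a physician."))        # ['physician']
```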

2089
01:48:38,090 --> 01:48:43,540
So I hope it gives
you the gist of it.

2090
01:48:43,540 --> 01:48:47,500
There is also a
distributed Wikidata game,

2091
01:48:47,500 --> 01:48:48,774
which is pretty awesome.

2092
01:48:48,774 --> 01:48:51,450


2093
01:48:51,450 --> 01:48:54,320
Here we go, which
has additional games.

2094
01:48:54,320 --> 01:49:02,610
So, for example, the
Kian game gives you,

2095
01:49:02,610 --> 01:49:06,940
maybe it gives you,
some items to play with.

2096
01:49:06,940 --> 01:49:16,610


2097
01:49:16,610 --> 01:49:17,110
Yes?

2098
01:49:17,110 --> 01:49:17,610
No?

2099
01:49:17,610 --> 01:49:18,430
OK.

2100
01:49:18,430 --> 01:49:20,830
So it gives you
this little card,

2101
01:49:20,830 --> 01:49:27,940
and asks you to confirm is this
instance of human settlement?

2102
01:49:27,940 --> 01:49:30,480
That is, is it a village,
town, city, whatever.

2103
01:49:30,480 --> 01:49:33,310
Is it a kind of human
settlement or not?

2104
01:49:33,310 --> 01:49:34,340
Or maybe it's a book.

2105
01:49:34,340 --> 01:49:35,540
Maybe it's a poem.

2106
01:49:35,540 --> 01:49:38,980
Again, so, is it a
human settlement?

2107
01:49:38,980 --> 01:49:41,500
And you can click the languages
here to see the information.

2108
01:49:41,500 --> 01:49:43,270
So I can click English.

2109
01:49:43,270 --> 01:49:44,572
And indeed the article--

2110
01:49:44,572 --> 01:49:46,030
I mean the actual
Wikipedia article

2111
01:49:46,030 --> 01:49:49,360
says Camigji is a
town and territory

2112
01:49:49,360 --> 01:49:51,370
in this district in the Congo.

2113
01:49:51,370 --> 01:49:54,640
So yes, this is an instance
of human settlement.

2114
01:49:54,640 --> 01:49:57,580
So I clicked yes.

2115
01:49:57,580 --> 01:50:00,460
And just clicking yes
again went to that item,

2116
01:50:00,460 --> 01:50:02,740
and added the statement
instance of human settlement.

2117
01:50:02,740 --> 01:50:05,560
Now the point of
all these games is

2118
01:50:05,560 --> 01:50:08,140
these are tools,
written by programmers,

2119
01:50:08,140 --> 01:50:12,490
making kind of semi-educated
guesses about these fairly

2120
01:50:12,490 --> 01:50:14,120
basic properties.

2121
01:50:14,120 --> 01:50:17,770
And they are meant to
semi-automate, to assist,

2122
01:50:17,770 --> 01:50:23,730
in the accumulation of all
these important pieces of data.

2123
01:50:23,730 --> 01:50:26,640
Now every single
click here helps

2124
01:50:26,640 --> 01:50:31,000
Wikidata give better
results, richer results

2125
01:50:31,000 --> 01:50:32,380
in future queries.

2126
01:50:32,380 --> 01:50:38,130
Again, as of right now
Wikidata can include Camigji

2127
01:50:38,130 --> 01:50:42,690
if I ask it, you know, what
are some towns in Congo?

2128
01:50:42,690 --> 01:50:44,220
Until now it could not.

2129
01:50:44,220 --> 01:50:46,830
Because it literally
didn't know.
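A question like "what are some towns in Congo?" can be put to the Wikidata Query Service in SPARQL. The sketch below builds such a query and its URL. It assumes Q486972 is "human settlement", P279 is "subclass of", P17 is "country", and Q974 is the Democratic Republic of the Congo (the talk only says "Congo"); check these IDs before relying on them.

```python
from urllib.parse import urlencode

# Assumed IDs (verify on wikidata.org):
HUMAN_SETTLEMENT = "Q486972"
INSTANCE_OF, SUBCLASS_OF, COUNTRY = "P31", "P279", "P17"

def settlements_query(country_qid: str, limit: int = 100) -> str:
    """SPARQL for items that are an instance of human settlement
    (or of any subclass of it) and whose country is country_qid."""
    return (
        "SELECT ?place ?placeLabel WHERE {\n"
        f"  ?place wdt:{INSTANCE_OF}/wdt:{SUBCLASS_OF}* wd:{HUMAN_SETTLEMENT} ;\n"
        f"         wdt:{COUNTRY} wd:{country_qid} .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }\n'
        "}\n"
        f"LIMIT {limit}"
    )

def query_url(sparql: str) -> str:
    """URL for the public Wikidata Query Service endpoint, asking for JSON results."""
    return "https://query.wikidata.org/sparql?" + urlencode({"query": sparql, "format": "json"})

q = settlements_query("Q974")
```

An item with no "instance of" statement simply never matches the first triple pattern, which is why a single click in the game can make a town show up in results like these.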

2130
01:50:46,830 --> 01:50:51,950
So every time we click male,
female, person, not a person,

2131
01:50:51,950 --> 01:50:56,640
make these decisions,
we help improve Wikidata

2132
01:50:56,640 --> 01:51:01,560
and enrich the results
that we could receive.

2133
01:51:01,560 --> 01:51:04,590
Any questions about this, about
kind of micro-contributions

2134
01:51:04,590 --> 01:51:07,010
through the Wikidata game?

2135
01:51:07,010 --> 01:51:09,890
If that looks
appealing I encourage

2136
01:51:09,890 --> 01:51:12,860
you to go and visit
the Wikidata game

2137
01:51:12,860 --> 01:51:15,205
and start contributing
in that way.

2138
01:51:15,205 --> 01:51:19,580


2139
01:51:19,580 --> 01:51:21,650
There is a question here.

2140
01:51:21,650 --> 01:51:24,650
If I make an article about
Circus Bulgaria, how should

2141
01:51:24,650 --> 01:51:26,630
I correctly connect them?

2142
01:51:26,630 --> 01:51:28,740
That is an excellent question.

2143
01:51:28,740 --> 01:51:33,090
So once-- so now there is a
Wikidata item about that book,

2144
01:51:33,090 --> 01:51:37,650
but there is no Wikipedia
article anywhere.

2145
01:51:37,650 --> 01:51:41,460
Now suppose I write one,
in Bulgarian maybe.

2146
01:51:41,460 --> 01:51:42,870
You go to Wikidata.

2147
01:51:42,870 --> 01:51:45,180
You find the item by searching.

2148
01:51:45,180 --> 01:51:49,170
You find the item, and then
the empty site links section

2149
01:51:49,170 --> 01:51:50,850
right at the bottom there--

2150
01:51:50,850 --> 01:51:52,020
where are we?

2151
01:51:52,020 --> 01:51:53,100
We have this?

2152
01:51:53,100 --> 01:51:55,050
Circus Bulgaria.

2153
01:51:55,050 --> 01:51:56,010
Let's demonstrate this.

2154
01:51:56,010 --> 01:51:58,000
So here is the item
about the book.

2155
01:51:58,000 --> 01:52:01,030
Let's say that now
there is an article

2156
01:52:01,030 --> 01:52:03,670
because I just created it.

2157
01:52:03,670 --> 01:52:07,450
I can go here to the empty
Wikipedia link section,

2158
01:52:07,450 --> 01:52:11,760
click Edit, type the
name of the wiki,

2159
01:52:11,760 --> 01:52:16,430
let's say English, and then
type the name of the page

2160
01:52:16,430 --> 01:52:18,230
that I just created.

2161
01:52:18,230 --> 01:52:20,790
Circus-- right?

2162
01:52:20,790 --> 01:52:23,400
And again, it offers
me auto-complete

2163
01:52:23,400 --> 01:52:25,080
for my convenience.

2164
01:52:25,080 --> 01:52:28,260
Now we don't actually
have the article created,

2165
01:52:28,260 --> 01:52:30,480
but let's just
say this was the article.

2166
01:52:30,480 --> 01:52:33,330
I can just click this,
hit Save, and that

2167
01:52:33,330 --> 01:52:36,450
would associate the
new Wikipedia article

2168
01:52:36,450 --> 01:52:38,130
with this Wikidata item.
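The same site-link step can be expressed as an API request. A minimal sketch, assuming the Wikibase API action `wbsetsitelink`, where English Wikipedia is identified by the site ID `enwiki`; the Q number and page title are hypothetical placeholders, since the article does not exist yet in the talk.

```python
def set_sitelink_params(item_id: str, site: str, title: str) -> dict:
    """Parameters for the Wikibase API action wbsetsitelink, which attaches
    a wiki page (e.g. on English Wikipedia, site ID 'enwiki') to an item."""
    return {
        "action": "wbsetsitelink",
        "id": item_id,
        "linksite": site,
        "linktitle": title,
        "format": "json",
    }

# Both values are hypothetical placeholders.
params = set_sitelink_params("Q123", "enwiki", "Circus Bulgaria")
```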

2169
01:52:38,130 --> 01:52:41,940
That is the beginning of the
inter-wiki list for this item.

2170
01:52:41,940 --> 01:52:43,620
I will not click
Save now, because we

2171
01:52:43,620 --> 01:52:45,289
didn't have the article yet.

2172
01:52:45,289 --> 01:52:46,830
So I hope that
answers that question.

2173
01:52:46,830 --> 01:52:50,340
Was there another question
that I missed here?

2174
01:52:50,340 --> 01:52:51,450
No.

2175
01:52:51,450 --> 01:52:53,170
OK.

2176
01:52:53,170 --> 01:52:55,300
Any questions about
the Wikidata game?

2177
01:52:55,300 --> 01:53:00,740
About this idea of
micro-contributions?

2178
01:53:00,740 --> 01:53:05,330
If not then we can move
on to embedding data,

2179
01:53:05,330 --> 01:53:07,490
and after that we
can discuss queries,

2180
01:53:07,490 --> 01:53:12,000
how to get at all this
data from Wikidata.

2181
01:53:12,000 --> 01:53:16,500
So the short version of how
to embed data from Wikidata

2182
01:53:16,500 --> 01:53:19,920
is that there is this
little magic incantation.

2183
01:53:19,920 --> 01:53:25,410
Curly brace, curly brace,
hash mark, property.

2184
01:53:25,410 --> 01:53:29,820
It looks like a template, but
it isn't because of that hash.

2185
01:53:29,820 --> 01:53:31,320
And that is magic.

2186
01:53:31,320 --> 01:53:34,170
Take a look at this little
demo that I prepared.

2187
01:53:34,170 --> 01:53:37,950
This page is a subpage
of my user page on Meta,

2188
01:53:37,950 --> 01:53:40,110
but it could be on any wiki.

2189
01:53:40,110 --> 01:53:42,490
OK.

2190
01:53:42,490 --> 01:53:49,420
It says: since San Francisco
is item Q62 in Wikidata,

2191
01:53:49,420 --> 01:53:55,240
and since population is
property P1082, I can tell you

2192
01:53:55,240 --> 01:53:58,840
that according to Wikidata the
population of San Francisco

2193
01:53:58,840 --> 01:54:02,180
is this.

2194
01:54:02,180 --> 01:54:08,420
And this bolded number here was
produced with this incantation.

2195
01:54:08,420 --> 01:54:14,420
Curly brace, curly brace,
hash mark, property P1082,

2196
01:54:14,420 --> 01:54:18,751
that's population,
pipe, from which item?

2197
01:54:18,751 --> 01:54:19,250
Right?

2198
01:54:19,250 --> 01:54:21,650
Because I'm pulling
an arbitrary number.

2199
01:54:21,650 --> 01:54:23,570
I could put any
property in any item

2200
01:54:23,570 --> 01:54:27,020
here, and kind of include
it, embedded, into my text.

2201
01:54:27,020 --> 01:54:29,630
This isn't even about-- you
notice this is my user page.

2202
01:54:29,630 --> 01:54:32,480
This isn't even the article
about San Francisco.

2203
01:54:32,480 --> 01:54:35,210
I just want to pull that
number into this thing

2204
01:54:35,210 --> 01:54:36,410
that I'm writing.

2205
01:54:36,410 --> 01:54:38,820
So it's fairly simple.

2206
01:54:38,820 --> 01:54:40,970
I identify the property.

2207
01:54:40,970 --> 01:54:43,440
I identify the item
to take it from.

2208
01:54:43,440 --> 01:54:47,120
And Wikidata will,
I mean Wikipedia,

2209
01:54:47,120 --> 01:54:50,480
or the wiki I'm on, in this
case Meta, will go to Wikidata

2210
01:54:50,480 --> 01:54:52,820
and fetch it for me.
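Roughly speaking, `{{#property:P1082|from=Q62}}` asks the wiki to look up item Q62's statements for property P1082 and display them. The Python sketch below approximates that lookup against Wikidata's public JSON export (`Special:EntityData`). It is an illustration, not the Wikibase client's own code, and it assumes that only the "best" statements are shown: preferred-rank ones if any exist, otherwise normal-rank ones. The function names and the User-Agent string are hypothetical.

```python
import json
import urllib.request

def fetch_entity(qid: str) -> dict:
    """Download one item's JSON from Special:EntityData (needs network access)."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid}.json"
    # Hypothetical contact string; Wikimedia asks clients to identify themselves.
    req = urllib.request.Request(
        url, headers={"User-Agent": "wikidata-intro-example/0.1 (example@example.org)"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["entities"][qid]

def best_values(entity: dict, pid: str) -> list:
    """Return the values of the 'best' statements for property `pid`."""
    statements = entity.get("claims", {}).get(pid, [])
    preferred = [s for s in statements if s.get("rank") == "preferred"]
    # Fall back to normal rank; deprecated statements are never chosen.
    chosen = preferred or [s for s in statements if s.get("rank") == "normal"]
    # "Unknown value" / "no value" snaks carry no datavalue, so skip them.
    return [s["mainsnak"]["datavalue"]["value"]
            for s in chosen if "datavalue" in s["mainsnak"]]

# Usage (needs network): best_values(fetch_entity("Q62"), "P1082")
```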

2211
01:54:52,820 --> 01:54:56,480
Likewise, since Denny Vrandecic,
the designer of Wikidata

2212
01:54:56,480 --> 01:55:01,370
is item Q18618629, right?

2213
01:55:01,370 --> 01:55:04,790
I mean, he's a notable person,
so he has a Wikidata entity.

2214
01:55:04,790 --> 01:55:09,160
And since occupation is property
106, and date of birth is 569,

2215
01:55:09,160 --> 01:55:12,290
and place of birth
is 19, because

2216
01:55:12,290 --> 01:55:14,720
of all that I can tell you
that Vrandecic was born

2217
01:55:14,720 --> 01:55:19,130
in Stuttgart, on this date,
and is a researcher, programmer,

2218
01:55:19,130 --> 01:55:20,850
and computer scientist.

2219
01:55:20,850 --> 01:55:25,010
If you look at the source for
this page, click Edit Source,

2220
01:55:25,010 --> 01:55:28,700
you can see that the word
Stuttgart does not appear here,

2221
01:55:28,700 --> 01:55:30,530
because it came from Wikidata.

2222
01:55:30,530 --> 01:55:34,171
I did not write this into
my little demo page here.

2223
01:55:34,171 --> 01:55:34,670
See?

2224
01:55:34,670 --> 01:55:37,380
Place of birth is--

2225
01:55:37,380 --> 01:55:37,880
where is it?

2226
01:55:37,880 --> 01:55:38,380
Here.

2227
01:55:38,380 --> 01:55:43,790
Born in: property 19 from
Q number so-and-so.

2228
01:55:43,790 --> 01:55:46,970
That is how easy
it is to pull stuff

2229
01:55:46,970 --> 01:55:51,890
into a wiki from Wikidata.

2230
01:55:51,890 --> 01:55:55,280
OK now there's
some nuance to it.

2231
01:55:55,280 --> 01:55:57,470
And there are
some additional parameters

2232
01:55:57,470 --> 01:55:58,130
you can give.

2233
01:55:58,130 --> 01:56:00,230
And you can ask
Wikidata to give you

2234
01:56:00,230 --> 01:56:03,635
not just the text of the values,
but actually make them links.

2235
01:56:03,635 --> 01:56:06,750


2236
01:56:06,750 --> 01:56:14,825
So, for example, if I change
this from property to values--

2237
01:56:14,825 --> 01:56:25,950


2238
01:56:25,950 --> 01:56:29,142
No, that did not work at all.

2239
01:56:29,142 --> 01:56:29,850
Wasn't it values?

2240
01:56:29,850 --> 01:56:30,350
What was it?

2241
01:56:30,350 --> 01:56:33,370


2242
01:56:33,370 --> 01:56:34,614
Values and then--

2243
01:56:34,614 --> 01:57:19,265


2244
01:57:19,265 --> 01:57:19,890
Oh, statements.

2245
01:57:19,890 --> 01:57:20,710
My bad, sorry.

2246
01:57:20,710 --> 01:57:22,980
The magic word is statements.

2247
01:57:22,980 --> 01:57:24,010
Statements.

2248
01:57:24,010 --> 01:57:28,680
So going back here.

2249
01:57:28,680 --> 01:57:35,385
If I change the word property
to the word statements

2250
01:57:35,385 --> 01:57:40,890
here then this same value--

2251
01:57:40,890 --> 01:57:43,300
that did not work at all.

2252
01:57:43,300 --> 01:57:46,690
Oh, because I'm on Meta.

2253
01:57:46,690 --> 01:57:48,670
So because I'm on
Meta, Meta doesn't

2254
01:57:48,670 --> 01:57:52,230
have an article named
researcher, programmer,

2255
01:57:52,230 --> 01:57:53,500
or computer scientist.

2256
01:57:53,500 --> 01:57:55,120
But Wikipedia does.

2257
01:57:55,120 --> 01:58:00,210
If I included this same
syntax in Wikipedia,

2258
01:58:00,210 --> 01:58:02,950
like English Wikipedia,
for example--

2259
01:58:02,950 --> 01:58:04,855
So let's go there right now.

2260
01:58:04,855 --> 01:58:11,240


2261
01:58:11,240 --> 01:58:13,480
And go-- go to my--

2262
01:58:13,480 --> 01:58:18,550


2263
01:58:18,550 --> 01:58:19,345
Go to my sandbox.

2264
01:58:19,345 --> 01:58:23,090


2265
01:58:23,090 --> 01:58:27,982
If I just brutally paste
this on my sandbox here--

2266
01:58:27,982 --> 01:58:32,690


2267
01:58:32,690 --> 01:58:35,810
So, see, these became links.

2268
01:58:35,810 --> 01:58:39,740
Because Wikipedia has articles
called programmer and computer

2269
01:58:39,740 --> 01:58:40,910
scientist.
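The only change between the two demos is the magic word: `#property` gives plain text, while `#statements` gives formatted values that link to local articles where the wiki has them. The small Python helper below is hypothetical; it just makes the two forms explicit and assembles a sentence like the one on the demo page. Q18618629, P19, P569 and P106 are the IDs read out in the talk; the sentence wording is illustrative.

```python
from typing import Optional

def embed(pid: str, qid: Optional[str] = None, linked: bool = False) -> str:
    """Return a Wikidata embed call; `linked` picks #statements over #property.

    Without `qid`, the call reads from the item connected to the current page.
    """
    word = "statements" if linked else "property"
    source = "" if qid is None else "|from=" + qid
    return "{{#" + word + ":" + pid + source + "}}"

# Hypothetical sentence modelled on the talk's demo page.
DENNY = "Q18618629"
sentence = (
    "Vrandečić was born in " + embed("P19", DENNY, linked=True)
    + " on " + embed("P569", DENNY)
    + " and works as " + embed("P106", DENNY, linked=True) + "."
)
print(sentence)
```

Pasted into a Wikipedia page, the `#statements` parts would render as links where that wiki has matching articles, as in the sandbox demo.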

2270
01:58:40,910 --> 01:58:43,460
So, like I said, there's
some additional nuance

2271
01:58:43,460 --> 01:58:44,840
to the embedding.

2272
01:58:44,840 --> 01:58:47,030
The important thing
is that this is

2273
01:58:47,030 --> 01:58:51,470
the key to delivering on that
first problem that I mentioned.

2274
01:58:51,470 --> 01:58:55,970
How to get data from
a central location

2275
01:58:55,970 --> 01:58:58,850
onto your wiki in your language.

2276
01:58:58,850 --> 01:59:04,460
Basically using property and
statements magic incantations.

2277
01:59:04,460 --> 01:59:07,100
And of course,
usually, this would be

2278
01:59:07,100 --> 01:59:10,010
in the context of an info box.

2279
01:59:10,010 --> 01:59:14,180
Some wikis-- English Wikipedia
is not leading the way there.

2280
01:59:14,180 --> 01:59:16,490
Some smaller wikis
are more advanced

2281
01:59:16,490 --> 01:59:22,070
actually in integrating
Wikidata embeddings like this

2282
01:59:22,070 --> 01:59:24,620
into their info boxes.

2283
01:59:24,620 --> 01:59:26,300
So that instead of
the info box just

2284
01:59:26,300 --> 01:59:30,620
being a template on the wiki
with field equals value,

2285
01:59:30,620 --> 01:59:31,685
field equals value.

2286
01:59:31,685 --> 01:59:35,700
That template of the
info box on the wiki

2287
01:59:35,700 --> 01:59:40,160
pulls the values, the birthdate,
the languages, et cetera,

2288
01:59:40,160 --> 01:59:44,210
pulls them from Wikidata.

2289
01:59:44,210 --> 01:59:49,820
So basically just-- I just
demonstrated single calls

2290
01:59:49,820 --> 01:59:52,550
to this, but of course
an info box template

2291
01:59:52,550 --> 01:59:56,270
would include maybe
20 or 40 such embeds,

2292
01:59:56,270 --> 01:59:57,710
and that is not a problem.
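An infobox template is essentially many such calls arranged in a table. The Python sketch below generates that wikitext from a list of (label, property) pairs. The table layout, the `infobox` CSS class, and the field list are illustrative assumptions, not any wiki's actual Infobox template. Leaving out `from=` makes each call read from the item connected to the article that uses the template.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical field list: (row label, Wikidata property ID).
PERSON_FIELDS = [
    ("Date of birth", "P569"),
    ("Place of birth", "P19"),
    ("Occupation", "P106"),
]

def infobox_wikitext(fields: Sequence[Tuple[str, str]],
                     qid: Optional[str] = None) -> str:
    """Build a wikitable whose value cells are all pulled from Wikidata."""
    source = "" if qid is None else "|from=" + qid
    lines = ['{| class="infobox"']
    for label, pid in fields:
        lines.append("|-")            # start a new table row
        lines.append("! " + label)    # header cell with the field name
        lines.append("| {{#statements:" + pid + source + "}}")  # value cell
    lines.append("|}")
    return "\n".join(lines)

print(infobox_wikitext(PERSON_FIELDS))
```

Adding a field is one more tuple in the list, which is why 20 or 40 embeds in one template are not a problem.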

2293
01:59:57,710 --> 02:00:01,460
Of course, before you go and
edit the English Wikipedia's

2294
02:00:01,460 --> 02:00:06,050
Infobox person and replace
it all with Wikidata embeds,

2295
02:00:06,050 --> 02:00:09,050
you should discuss it with the
English Wikipedia community.

2296
02:00:09,050 --> 02:00:12,000
These discussions have
already been taking place.

2297
02:00:12,000 --> 02:00:13,640
There are some
concerns about how

2298
02:00:13,640 --> 02:00:17,150
to patrol this, how to keep
it newbie friendly, et cetera.

2299
02:00:17,150 --> 02:00:20,690
So there are legitimate concerns
with just moving everything

2300
02:00:20,690 --> 02:00:22,910
to be embedded from Wikidata.

2301
02:00:22,910 --> 02:00:26,450
But the communities are
gradually handling this.

2302
02:00:26,450 --> 02:00:29,390
I mean this ability to embed
from Wikidata is not very old.

2303
02:00:29,390 --> 02:00:31,550
It's been around
for about a year.

2304
02:00:31,550 --> 02:00:35,150
So communities are
still working on kind

2305
02:00:35,150 --> 02:00:37,560
of integrating that technology.

2306
02:00:37,560 --> 02:00:40,190
But that is kind
of just the basics of how

2307
02:00:40,190 --> 02:00:44,210
to pull data, individual bits
of data, that's not querying,

2308
02:00:44,210 --> 02:00:47,330
that's not asking those sweeping
questions that I was talking

2309
02:00:47,330 --> 02:00:48,850
about yet.

2310
02:00:48,850 --> 02:00:50,720
We'll get to that.
Right now, this is

2311
02:00:50,720 --> 02:00:55,310
how to pull a specific datum,
a specific piece of data,

2312
02:00:55,310 --> 02:00:57,395
from Wikidata.

2313
02:00:57,395 --> 02:01:01,530


2314
02:01:01,530 --> 02:01:02,530
OK.

2315
02:01:02,530 --> 02:01:07,080
So here's another quick
thing to demonstrate

2316
02:01:07,080 --> 02:01:09,880
before we go to
queries, and that

2317
02:01:09,880 --> 02:01:12,010
is the article placeholder.

2318
02:01:12,010 --> 02:01:15,010
The article placeholder
is a feature

2319
02:01:15,010 --> 02:01:19,660
that is being tested on the
Esperanto Wikipedia, and maybe

2320
02:01:19,660 --> 02:01:22,180
another wiki, I don't remember.

2321
02:01:22,180 --> 02:01:28,490
And it is using the
potential of Wikidata

2322
02:01:28,490 --> 02:01:32,690
to offer a placeholder
for an article.

2323
02:01:32,690 --> 02:01:37,940
An automatically generated,
Wikidata-powered

2324
02:01:37,940 --> 02:01:41,720
placeholder for articles
that don't yet

2325
02:01:41,720 --> 02:01:45,950
exist on the Esperanto Wikipedia.

2326
02:01:45,950 --> 02:01:50,440
So let's go to the
Esperanto Wikipedia.

2327
02:01:50,440 --> 02:01:52,440
I don't speak Esperanto.

2328
02:01:52,440 --> 02:01:56,760
But let's look for Helen
Dewitt, our friend,

2329
02:01:56,760 --> 02:01:58,170
in Esperanto Wikipedia.

2330
02:01:58,170 --> 02:02:00,270
Now Esperanto is not
one of the Wikipedias

2331
02:02:00,270 --> 02:02:03,060
that have an article
about Helen Dewitt.

2332
02:02:03,060 --> 02:02:04,890
And so it tells me that, right?

2333
02:02:04,890 --> 02:02:06,570
There is no Helen Dewitt.

2334
02:02:06,570 --> 02:02:08,670
Maybe you were looking
for Helena Dewitt.

2335
02:02:08,670 --> 02:02:10,200
No, I was not.

2336
02:02:10,200 --> 02:02:13,650
You can start an article
about Helen Dewitt.

2337
02:02:13,650 --> 02:02:15,390
You can search.

2338
02:02:15,390 --> 02:02:17,820
You know, there's
all this stuff.

2339
02:02:17,820 --> 02:02:24,180
But there is also this
little option here, hiding,

2340
02:02:24,180 --> 02:02:30,640
which tells me that the
Esperanto Wikipedia is--

2341
02:02:30,640 --> 02:02:31,580
what's happening here?

2342
02:02:31,580 --> 02:02:35,140


2343
02:02:35,140 --> 02:02:35,890
Yes.

2344
02:02:35,890 --> 02:02:40,520
The Esperanto Wikipedia is
ready to give me this page.

2345
02:02:40,520 --> 02:02:44,020
This page, as you can see, is
on the Esperanto Wikipedia,

2346
02:02:44,020 --> 02:02:46,090
but it's not an article.

2347
02:02:46,090 --> 02:02:47,480
See, it's a special page.

2348
02:02:47,480 --> 02:02:49,700
It's machine generated.

2349
02:02:49,700 --> 02:02:52,150
You can see the URL as well.

2350
02:02:52,150 --> 02:02:54,410
It's not, you know,
slash Helen Dewitt.

2351
02:02:54,410 --> 02:02:58,450
It's slash Specialaĵo:
AboutTopic,

2352
02:02:58,450 --> 02:03:01,570
and then the Wikidata
ID of Helen Dewitt.
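The placeholder's address follows a fixed pattern: the wiki's special namespace, the page name `AboutTopic`, and a Q number. A tiny Python sketch that builds such a URL follows. It uses the canonical namespace name `Special`, which MediaWiki accepts on every language edition as an alias for the localized name (Esperanto's `Specialaĵo`). The function name is hypothetical and Q42 is used only as an example ID.

```python
import re

def placeholder_url(lang: str, qid: str) -> str:
    """URL of the article placeholder page for item `qid` on a Wikipedia edition."""
    if not re.fullmatch(r"[a-z][a-z-]*", lang):
        raise ValueError(f"unexpected language code: {lang!r}")
    if not re.fullmatch(r"Q[1-9]\d*", qid):
        raise ValueError(f"not an item ID: {qid!r}")
    # Canonical "Special:" namespace; the wiki displays its localized name.
    return f"https://{lang}.wikipedia.org/wiki/Special:AboutTopic/{qid}"

print(placeholder_url("eo", "Q42"))
```

The page only exists on wikis where the article placeholder feature has been enabled.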

2353
02:03:01,570 --> 02:03:03,760
And what I get here--

2354
02:03:03,760 --> 02:03:05,860
I get an English
description, by the way,

2355
02:03:05,860 --> 02:03:08,300
because there is no
Esperanto description.

2356
02:03:08,300 --> 02:03:10,420
Wikidata can't make it up.

2357
02:03:10,420 --> 02:03:13,600
But what it can do is
offer me these pieces

2358
02:03:13,600 --> 02:03:16,960
of data in my language,
in this case Esperanto.

2359
02:03:16,960 --> 02:03:18,921
I'm on the Esperanto Wikipedia.

2360
02:03:18,921 --> 02:03:19,420
OK.

2361
02:03:19,420 --> 02:03:23,380
So it tells me that she's
American, for example,

2362
02:03:23,380 --> 02:03:26,090
and it tells me
that in Esperanto.

2363
02:03:26,090 --> 02:03:29,350
OK, and it tells me
that she speaks Latin.

2364
02:03:29,350 --> 02:03:32,410
Remember we taught
Wikidata that?

2365
02:03:32,410 --> 02:03:35,800
It tells me that she
was educated in Oxford,

2366
02:03:35,800 --> 02:03:38,050
you know, and gives me the
references to the extent

2367
02:03:38,050 --> 02:03:39,130
that they exist.

2368
02:03:39,130 --> 02:03:41,560
I mean this is not an article.

2369
02:03:41,560 --> 02:03:46,650
It's not, you know, paragraphs
of fluent Esperanto text.

2370
02:03:46,650 --> 02:03:50,190
But it is information
that I can understand

2371
02:03:50,190 --> 02:03:51,960
if I speak this language.

2372
02:03:51,960 --> 02:03:55,380
And it's better than nothing.

2373
02:03:55,380 --> 02:04:00,120
And remember Helen Dewitt was
not a very detailed article.

2374
02:04:00,120 --> 02:04:03,690
If I were to ask about, I
don't know, some politician,

2375
02:04:03,690 --> 02:04:08,340
or a popular singer who
has more data in Wikidata,

2376
02:04:08,340 --> 02:04:12,690
then this machine-generated
thing would have been richer.

2377
02:04:12,690 --> 02:04:16,320
So this feature is available
and is under beta testing

2378
02:04:16,320 --> 02:04:19,530
right now, but generally if
this sounds interesting for you

2379
02:04:19,530 --> 02:04:21,600
especially if you come
from a smaller wiki that

2380
02:04:21,600 --> 02:04:25,230
is missing a lot of articles
that people may want to learn

2381
02:04:25,230 --> 02:04:28,320
about, you can contact
the Wikimedia Foundation

2382
02:04:28,320 --> 02:04:33,486
and ask for the article placeholder
to be enabled on your wiki.

2383
02:04:33,486 --> 02:04:34,860
And again, this
is a placeholder.

2384
02:04:34,860 --> 02:04:37,890
Of course, it exists only
until someone actually

2385
02:04:37,890 --> 02:04:43,290
writes a proper Esperanto
article about Helen Dewitt.

2386
02:04:43,290 --> 02:04:45,060
So I hope this is clear.

2387
02:04:45,060 --> 02:04:50,810
This is all coming from
Wikidata on the fly.

2388
02:04:50,810 --> 02:04:51,470
In real time.

2389
02:04:51,470 --> 02:04:57,500
As you can see it includes my
latest edits to Helen Dewitt.

2390
02:04:57,500 --> 02:04:58,940
OK.

2391
02:04:58,940 --> 02:05:05,250
Questions about the-- questions
about the article placeholder?

2392
02:05:05,250 --> 02:05:09,580
If there are, try and
put them on the channel.

2393
02:05:09,580 --> 02:05:13,300
And this brings us to one of
the main courses of this talk,

2394
02:05:13,300 --> 02:05:15,270
which is querying Wikidata.

2395
02:05:15,270 --> 02:05:18,660
So I've explained
how Wikidata works.

2396
02:05:18,660 --> 02:05:19,680
We've walked through it.

2397
02:05:19,680 --> 02:05:20,850
We've added to it.

2398
02:05:20,850 --> 02:05:22,800
We've created a new item.

2399
02:05:22,800 --> 02:05:26,360
We learned how to contribute
during our commutes.

2400
02:05:26,360 --> 02:05:30,150
And all this time, you
kept promising us,

2401
02:05:30,150 --> 02:05:32,050
Asaf, that this would be--

2402
02:05:32,050 --> 02:05:34,690
this would enable
these amazing queries.

2403
02:05:34,690 --> 02:05:37,960
So time to make good on that.

2404
02:05:37,960 --> 02:05:42,880
The URL you need to remember
is query.wikidata.org.

2405
02:05:42,880 --> 02:05:49,390
And that will take you
to a query system that

2406
02:05:49,390 --> 02:05:52,510
uses a language called SPARQL.

2407
02:05:52,510 --> 02:05:58,150
SPARQL, spelt with
a Q. This language

2408
02:05:58,150 --> 02:06:01,690
is not a Wikimedia creation.

2409
02:06:01,690 --> 02:06:06,010
It's a standardized language
used for querying linked data

2410
02:06:06,010 --> 02:06:07,540
sources.

2411
02:06:07,540 --> 02:06:10,720
And because of that,
there

2412
02:06:10,720 --> 02:06:14,590
are certain usability prices
that we pay for using SPARQL,

2413
02:06:14,590 --> 02:06:16,010
for using a standard language.

2414
02:06:16,010 --> 02:06:19,570
It's not completely custom
made for querying Wikidata,

2415
02:06:19,570 --> 02:06:21,740
and we'll see that
in just a moment.

2416
02:06:21,740 --> 02:06:23,530
The principle to
remember about Wikidata

2417
02:06:23,530 --> 02:06:27,880
query is that Wikidata will
tell you everything it knows,

2418
02:06:27,880 --> 02:06:29,470
but no more.

2419
02:06:29,470 --> 02:06:32,440
I have anticipated this
several times already, right?

2420
02:06:32,440 --> 02:06:35,980
Until this moment, when
we taught Wikidata

2421
02:06:35,980 --> 02:06:38,590
that Helen Dewitt
speaks Latin, she

2422
02:06:38,590 --> 02:06:41,500
would not have appeared
in query results

2423
02:06:41,500 --> 02:06:45,974
asking who are American
writers who speak Latin?

2424
02:06:45,974 --> 02:06:47,140
She would not have appeared.

2425
02:06:47,140 --> 02:06:49,090
But as of this
afternoon, she will

2426
02:06:49,090 --> 02:06:52,950
appear because I've added
that piece of information.

2427
02:06:52,950 --> 02:07:01,380
So a result of that principle
is that you can never say,

2428
02:07:01,380 --> 02:07:05,950
well I ran a Wikidata
query and this

2429
02:07:05,950 --> 02:07:11,510
is the list of Flemish painters
who are sons of painters.

2430
02:07:11,510 --> 02:07:12,310
The list.

2431
02:07:12,310 --> 02:07:14,110
That these are all
the Flemish painters

2432
02:07:14,110 --> 02:07:15,220
who are sons of painters.

2433
02:07:15,220 --> 02:07:19,390
That is never something you can
say based on a Wikidata query,

2434
02:07:19,390 --> 02:07:22,390
because of course, maybe
not all the Flemish painters

2435
02:07:22,390 --> 02:07:26,020
who are sons of painters have
been expressed in Wikidata

2436
02:07:26,020 --> 02:07:26,760
yet.

2437
02:07:26,760 --> 02:07:28,840
Wikidata doesn't know
about some of them,

2438
02:07:28,840 --> 02:07:30,340
or maybe it knows
about all of them

2439
02:07:30,340 --> 02:07:32,500
but doesn't know
the important fact

2440
02:07:32,500 --> 02:07:35,200
that this person is
the son of that person,

2441
02:07:35,200 --> 02:07:38,740
because those properties
have not been added.

2442
02:07:38,740 --> 02:07:40,940
And so they cannot be
included in the results.

2443
02:07:40,940 --> 02:07:42,550
So the results of
a Wikidata query

2444
02:07:42,550 --> 02:07:46,870
are never the definitive sets.

2445
02:07:46,870 --> 02:07:49,600
What you can say about
a Wikidata query is here

2446
02:07:49,600 --> 02:07:52,840
are some Flemish painters
who are sons of painters.

2447
02:07:52,840 --> 02:07:56,260
Here are some cities
with female mayors.

2448
02:07:56,260 --> 02:07:58,270
Whatever it is
you're querying about

2449
02:07:58,270 --> 02:08:01,030
is never guaranteed
to be complete

2450
02:08:01,030 --> 02:08:03,580
because Wikidata,
like Wikipedia, is

2451
02:08:03,580 --> 02:08:05,530
a work in progress.

2452
02:08:05,530 --> 02:08:13,240
And of course, the more
we teach Wikidata the

2453
02:08:13,240 --> 02:08:16,240
more useful it becomes.

2454
02:08:16,240 --> 02:08:22,520
OK, so let's go and
see those queries.

2455
02:08:22,520 --> 02:08:25,990
So this is query.wikidata.org.

2456
02:08:25,990 --> 02:08:29,000
It's not the wiki.

2457
02:08:29,000 --> 02:08:29,500
All right?

2458
02:08:29,500 --> 02:08:32,530
So this isn't like some
page on the wiki itself.

2459
02:08:32,530 --> 02:08:35,099
This is kind of an
external system.

2460
02:08:35,099 --> 02:08:35,890
So it's not a wiki.

2461
02:08:35,890 --> 02:08:37,960
You can see I don't
have a user page here.

2462
02:08:37,960 --> 02:08:39,520
I don't have a history tab.

2463
02:08:39,520 --> 02:08:40,960
This isn't a wiki page.

2464
02:08:40,960 --> 02:08:44,560
This is a special kind
of tool or system.

2465
02:08:44,560 --> 02:08:51,330
And it invites me to
input a SPARQL query.

2466
02:08:51,330 --> 02:08:55,060
Now most of us do
not speak SPARQL.

2467
02:08:55,060 --> 02:08:59,800
It's a technical language.

2468
02:08:59,800 --> 02:09:01,720
It's a query language.

2469
02:09:01,720 --> 02:09:06,760
Some of you may be thinking
about SQL, the database query

2470
02:09:06,760 --> 02:09:08,500
language.

2471
02:09:08,500 --> 02:09:13,330
SPARQL is named with kind
of a wink, or a nod, to SQL.

2472
02:09:13,330 --> 02:09:17,440
But, I warn you, if
you are comfortable in

2473
02:09:17,440 --> 02:09:22,750
SQL don't expect to carry
over your knowledge of SQL

2474
02:09:22,750 --> 02:09:23,550
into SPARQL.

2475
02:09:23,550 --> 02:09:26,140
They're not the same.

2476
02:09:26,140 --> 02:09:27,940
They are superficially similar.

2477
02:09:27,940 --> 02:09:28,440
Right?

2478
02:09:28,440 --> 02:09:31,530
So they both use
the keyword select,

2479
02:09:31,530 --> 02:09:35,010
and they use the word where,
and they use things like limit,

2480
02:09:35,010 --> 02:09:35,770
and order.

2481
02:09:35,770 --> 02:09:38,190
So again, if you know
this already from SQL

2482
02:09:38,190 --> 02:09:40,500
those mean roughly
the same things,

2483
02:09:40,500 --> 02:09:44,550
but don't expect it to
behave just like SQL.

2484
02:09:44,550 --> 02:09:49,800
You do need to spend some time
understanding how SPARQL works.

2485
02:09:49,800 --> 02:09:52,560
So, by all means, I
invite you to go and read

2486
02:09:52,560 --> 02:09:55,680
one of the many fine
SPARQL tutorials that

2487
02:09:55,680 --> 02:09:59,590
are out there on the web, or
to click the Help button here,

2488
02:09:59,590 --> 02:10:03,930
which also includes
help about SPARQL.

2489
02:10:03,930 --> 02:10:08,440
But I also know
that most of us when

2490
02:10:08,440 --> 02:10:12,580
we want to do some advanced
formatting on wiki,

2491
02:10:12,580 --> 02:10:16,090
for example, we don't go
and read the help page

2492
02:10:16,090 --> 02:10:18,220
on templates, right?

2493
02:10:18,220 --> 02:10:21,460
We go to a page that already
does what we want to do,

2494
02:10:21,460 --> 02:10:27,430
and adopt and adapt the code
from that other page, right?

2495
02:10:27,430 --> 02:10:30,610
So we just take something that
does roughly what we want,

2496
02:10:30,610 --> 02:10:33,280
and just copy it over and
change what we need to change.

2497
02:10:33,280 --> 02:10:35,620
That is a very pragmatic
and reasonable way

2498
02:10:35,620 --> 02:10:37,420
to do things which is why--

2499
02:10:37,420 --> 02:10:39,850
and the Wikidata
engineers know this,

2500
02:10:39,850 --> 02:10:43,300
which is why they prepared
this very handy button for us

2501
02:10:43,300 --> 02:10:45,580
called examples.

2502
02:10:45,580 --> 02:10:47,710
We click the examples button.

2503
02:10:47,710 --> 02:10:52,390
And, oh my god, there is a ton
of-- well there's 312 example

2504
02:10:52,390 --> 02:10:55,582
queries for us to choose from.

2505
02:10:55,582 --> 02:10:57,040
And we can just
pick something that

2506
02:10:57,040 --> 02:11:00,310
is roughly like what
we're trying to find out,

2507
02:11:00,310 --> 02:11:02,740
and then just change
what needs changing.

2508
02:11:02,740 --> 02:11:05,410
So let's take a very simple one.

2509
02:11:05,410 --> 02:11:07,020
The cats query.

2510
02:11:07,020 --> 02:11:10,270
Maybe one of the simplest
you could possibly have.

2511
02:11:10,270 --> 02:11:13,510
And let's run it first
and then I'll kind of

2512
02:11:13,510 --> 02:11:16,420
walk you through it.

2513
02:11:16,420 --> 02:11:18,460
The goal here is not
to teach you SPARQL,

2514
02:11:18,460 --> 02:11:20,860
but to get you to be kind
of literate in SPARQL.

2515
02:11:20,860 --> 02:11:23,980
To kind of understand why
this does what it does.

2516
02:11:23,980 --> 02:11:25,730
So let's run this query first.

2517
02:11:25,730 --> 02:11:31,390
We click Run and here I
have results at the bottom.

2518
02:11:31,390 --> 02:11:34,060
The item, which is
just a Wikidata item,

2519
02:11:34,060 --> 02:11:35,290
which of course is a number.

2520
02:11:35,290 --> 02:11:38,860
Remember, Wikidata thinks
of items as Q numbers.

2521
02:11:38,860 --> 02:11:40,900
And the label,
because we're humans

2522
02:11:40,900 --> 02:11:43,190
and we prefer words to numbers.

2523
02:11:43,190 --> 02:11:49,870
So these 114 results
are all the cats

2524
02:11:49,870 --> 02:11:53,310
that Wikidata knows about.

2525
02:11:53,310 --> 02:11:55,380
Is this all the
cats in the world?

2526
02:11:55,380 --> 02:11:57,320
No of course not, remember?

2527
02:11:57,320 --> 02:11:59,730
It's all the cats Wikidata
knows about, which

2528
02:11:59,730 --> 02:12:01,410
means they're somehow notable.

2529
02:12:01,410 --> 02:12:05,130
I mean someone bothered to
describe them on Wikidata.

2530
02:12:05,130 --> 02:12:12,570
And Wikidata was told this
item is an instance of cat.

2531
02:12:12,570 --> 02:12:13,620
Right?

2532
02:12:13,620 --> 02:12:17,040
So these are those cats.
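Behind the results table, the query service returns each row as a set of bindings: the item as a full entity URI and the label as a plain string. A hedged Python sketch that reduces that JSON (the standard SPARQL 1.1 results format, requested with `Accept: application/sparql-results+json`) to (Q number, label) pairs; the function name and the sample row are made up for illustration.

```python
ENTITY_PREFIX = "http://www.wikidata.org/entity/"

def item_rows(results: dict) -> list:
    """Turn SPARQL JSON results into (Q number, label) pairs."""
    pairs = []
    for binding in results["results"]["bindings"]:
        qid = binding["item"]["value"].removeprefix(ENTITY_PREFIX)
        # Fall back to the Q number if the row has no label binding.
        label = binding.get("itemLabel", {}).get("value", qid)
        pairs.append((qid, label))
    return pairs

# Made-up row in the SPARQL JSON results shape (not real query output).
sample = {
    "head": {"vars": ["item", "itemLabel"]},
    "results": {"bindings": [{
        "item": {"type": "uri", "value": ENTITY_PREFIX + "Q123"},
        "itemLabel": {"type": "literal", "xml:lang": "en", "value": "Example"},
    }]},
}
print(item_rows(sample))  # [('Q123', 'Example')]
```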

2533
02:12:17,040 --> 02:12:18,540
And we can click any of them.

2534
02:12:18,540 --> 02:12:20,190
I don't know,
Pixel, for example.

2535
02:12:20,190 --> 02:12:21,780
Click the Wikidata item.

2536
02:12:21,780 --> 02:12:24,090
And here is the Wikidata
item about Pixel

2537
02:12:24,090 --> 02:12:25,860
with the Q number.

2538
02:12:25,860 --> 02:12:28,980
And he is a tortoiseshell cat.

2539
02:12:28,980 --> 02:12:32,640
And as you can see
instance of cat.

2540
02:12:32,640 --> 02:12:33,610
OK.

2541
02:12:33,610 --> 02:12:37,220
And he is five inches high.

2542
02:12:37,220 --> 02:12:41,780
And he is apparently documented
in Indonesian, in Bahasa.

2543
02:12:41,780 --> 02:12:45,080
Right here this is Pixel.

2544
02:12:45,080 --> 02:12:50,060
And he is apparently somehow
related to the Guinness World

2545
02:12:50,060 --> 02:12:52,160
Records book.

2546
02:12:52,160 --> 02:12:54,650
I don't speak Bahasa, so
I don't know exactly why

2547
02:12:54,650 --> 02:12:56,120
this cat is so notable.

2548
02:12:56,120 --> 02:12:58,889
But, of course, cats
can become notable

2549
02:12:58,889 --> 02:12:59,930
for all kinds of reasons.

2550
02:12:59,930 --> 02:13:02,204
Maybe they're a
YouTube sensation,

2551
02:13:02,204 --> 02:13:03,620
you know, maybe
they were involved

2552
02:13:03,620 --> 02:13:05,330
in some historical event.

2553
02:13:05,330 --> 02:13:09,410
I like this cat named Gladstone.

2554
02:13:09,410 --> 02:13:16,590
This cat named Gladstone is--

2555
02:13:16,590 --> 02:13:19,950
he has position
held Chief Mouser

2556
02:13:19,950 --> 02:13:22,320
to Her Majesty's Treasury.

2557
02:13:22,320 --> 02:13:25,230
This is an official
cat with a job.

2558
02:13:25,230 --> 02:13:29,190
And he has been holding this
job, mind you, since the 28th

2559
02:13:29,190 --> 02:13:31,570
of June this past year.

2560
02:13:31,570 --> 02:13:32,970
That's the start time.

2561
02:13:32,970 --> 02:13:35,760
And there is no end time
which means he currently

2562
02:13:35,760 --> 02:13:38,850
holds the position
of Chief Mouser

2563
02:13:38,850 --> 02:13:40,470
to Her Majesty's Treasury.

2564
02:13:40,470 --> 02:13:42,750
His employer is Her
Majesty's Treasury.

2565
02:13:42,750 --> 02:13:44,290
He's a male creature.

2566
02:13:44,290 --> 02:13:46,650
And Wikidata knows
that this cat is

2567
02:13:46,650 --> 02:13:53,127
named after William Gladstone,
the Victorian prime minister.

2568
02:13:53,127 --> 02:13:54,960
Of course if I don't
know who this person is

2569
02:13:54,960 --> 02:13:57,540
I can click through
and learn that he

2570
02:13:57,540 --> 02:14:01,860
was a liberal politician
and prime minister, right?

2571
02:14:01,860 --> 02:14:03,390
He even has a Twitter account.

2572
02:14:03,390 --> 02:14:05,910
And Wikidata sends
me right to it.

2573
02:14:05,910 --> 02:14:08,040
The treasury cat
Twitter account.

2574
02:14:08,040 --> 02:14:11,010
And he has articles in
German, and English,

2575
02:14:11,010 --> 02:14:15,520
and of course Japanese,
because he's a cat.

2576
02:14:15,520 --> 02:14:16,020
All right.

2577
02:14:16,020 --> 02:14:19,500
So this was a very simple query.

2578
02:14:19,500 --> 02:14:21,400
Let's find out why it works.

2579
02:14:21,400 --> 02:14:21,900
OK.

2580
02:14:21,900 --> 02:14:25,800
So what did we actually
tell Wikidata to do for us?

2581
02:14:25,800 --> 02:14:31,650
We said, please select
some items for us

2582
02:14:31,650 --> 02:14:33,580
along with their labels.

2583
02:14:33,580 --> 02:14:34,080
OK?

2584
02:14:34,080 --> 02:14:36,180
Along with their
human readable labels

2585
02:14:36,180 --> 02:14:42,010
because if I remove this
label what I get is, see,

2586
02:14:42,010 --> 02:14:44,200
just a list of item numbers.

2587
02:14:44,200 --> 02:14:45,280
That's not as fun.

2588
02:14:45,280 --> 02:14:46,930
So that's what this
little bit did.

2589
02:14:46,930 --> 02:14:49,630
I just said, give me the
items, but also their

2590
02:14:49,630 --> 02:14:52,330
human-readable labels.

2591
02:14:52,330 --> 02:14:54,620
And I want you to
select a bunch of items,

2592
02:14:54,620 --> 02:14:56,770
but not just any
random bunch of items,

2593
02:14:56,770 --> 02:15:01,210
I want to select items where
a certain condition holds.

2594
02:15:01,210 --> 02:15:02,790
What is the condition?

2595
02:15:02,790 --> 02:15:06,430
The condition is that the
item that I want you to select

2596
02:15:06,430 --> 02:15:14,360
needs to have property
31 with a value of Q146.

2597
02:15:14,360 --> 02:15:15,670
Well, that's helpful.

2598
02:15:15,670 --> 02:15:18,070
If I hover over these numbers--

2599
02:15:18,070 --> 02:15:19,750
Again, I get the human
readable version.

2600
02:15:19,750 --> 02:15:23,530
So I'm looking for
items that have property

2601
02:15:23,530 --> 02:15:28,841
instance of with the value cat.

2602
02:15:28,841 --> 02:15:29,340
Right?

2603
02:15:29,340 --> 02:15:31,173
Because that's literally
what I want, right?

2604
02:15:31,173 --> 02:15:33,960
I want all the items that have
a property, a statement, that

2605
02:15:33,960 --> 02:15:36,840
says instance of cat.

2606
02:15:36,840 --> 02:15:37,950
That's the condition.

2607
02:15:37,950 --> 02:15:41,640
I'm not interested in items
that are instance of book,

2608
02:15:41,640 --> 02:15:43,200
or instance of human.

2609
02:15:43,200 --> 02:15:46,290
I'm interested in
instance of cat.

2610
02:15:46,290 --> 02:15:51,090
That is the only condition
here in this query.
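That one condition, property 31 (instance of) with value Q146 (cat), is a single triple pattern in the WHERE clause. The Python sketch below rebuilds a query in the shape of the cats example so the condition can be swapped for another property and value. `items_with` is a hypothetical helper, and the label-service line is written in the common form used by the query service examples, which may differ slightly from the one on screen.

```python
import re

# wd:, wdt:, wikibase: and bd: are prefixes the Wikidata Query Service
# predefines, so no PREFIX declarations are needed there.
TEMPLATE = """SELECT ?item ?itemLabel
WHERE
{{
  ?item wdt:{pid} wd:{qid}.
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{lang}". }}
}}"""

def items_with(pid: str, qid: str, lang: str = "en") -> str:
    """Query for every item that has a statement `pid` = `qid`."""
    if not re.fullmatch(r"P\d+", pid) or not re.fullmatch(r"Q\d+", qid):
        raise ValueError("expected IDs like P31 and Q146")
    return TEMPLATE.format(pid=pid, qid=qid, lang=lang)

cats = items_with("P31", "Q146")  # instance of (P31) = cat (Q146)
print(cats)
```

Swapping Q146 for another class, for example Q5 (human), gives the same query for a different condition.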

2611
02:15:51,090 --> 02:15:55,800
This complicated line I ask
you to basically ignore.

2612
02:15:55,800 --> 02:15:57,510
This is one of those
sacrifices that we

2613
02:15:57,510 --> 02:16:00,720
make for using a standard
language like SPARQL.

2614
02:16:00,720 --> 02:16:02,820
But the role of this
complicated line

2615
02:16:02,820 --> 02:16:04,920
is to basically
ensure that we get

2616
02:16:04,920 --> 02:16:07,860
the English label for that cat.

2617
02:16:07,860 --> 02:16:08,817
OK?

2618
02:16:08,817 --> 02:16:09,900
So don't worry about that.

2619
02:16:09,900 --> 02:16:11,550
Just leave it there.

2620
02:16:11,550 --> 02:16:13,320
And we run the query
and we get the list

2621
02:16:13,320 --> 02:16:17,330
of cats with their English
labels, and that is awesome.

2622
02:16:17,330 --> 02:16:21,510
By the way, if I change EN,
without really understanding

2623
02:16:21,510 --> 02:16:27,260
this line, if I change
EN to HE, for Hebrew,

2624
02:16:27,260 --> 02:16:30,160
I get the same results
with a Hebrew label.

2625
02:16:30,160 --> 02:16:33,670
Of course, these cats,
nobody bothered to give them

2626
02:16:33,670 --> 02:16:35,709
Hebrew labels unfortunately.

2627
02:16:35,709 --> 02:16:37,570
So I get the Q number.

2628
02:16:37,570 --> 02:16:42,874
But if I changed
it to Japanese, JA,

2629
02:16:42,874 --> 02:16:45,290
I would still get a bunch of
Q numbers where there

2630
02:16:45,290 --> 02:16:47,389
isn't a Japanese label,
but I would get the labels

2631
02:16:47,389 --> 02:16:48,781
in Japanese.

2632
02:16:48,781 --> 02:16:49,280
OK?

2633
02:16:49,280 --> 02:16:51,260
So this is an example
of how you don't even

2634
02:16:51,260 --> 02:16:54,620
need to understand all
the syntax of this query

2635
02:16:54,620 --> 02:16:56,100
to adapt it to your needs.

2636
02:16:56,100 --> 02:16:58,070
If you want this
query as is, but you

2637
02:16:58,070 --> 02:17:00,320
want the labels in
Japanese, you can just

2638
02:17:00,320 --> 02:17:03,190
change the language code here.
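As a rough illustration of the point above, here is a minimal Python sketch that builds the cat query as a string and swaps only the language code. The exact text of the on-screen query is an assumption; P31 (instance of) and Q146 (cat) come from the talk, and the label-service line is the standard Wikidata Query Service form.

```python
def label_service(lang: str) -> str:
    # The "complicated line": asks the Wikidata Query Service for labels in `lang`.
    return 'SERVICE wikibase:label { bd:serviceParam wikibase:language "' + lang + '". }'


def instances_query(class_qid: str, lang: str = "en") -> str:
    # One condition: the item must have P31 (instance of) with the given value.
    lines = [
        "SELECT ?item ?itemLabel WHERE {",
        f"  ?item wdt:P31 wd:{class_qid} .",
        "  " + label_service(lang),
        "}",
    ]
    return "\n".join(lines)


# Same cat query (Q146); only the language code changes.
for code in ("en", "he", "ja"):
    print(instances_query("Q146", code))
    print()
```

Items with no label in the requested language still come back, just shown by their Q number.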

2639
02:17:03,190 --> 02:17:06,559
OK so that is all
this query does.

2640
02:17:06,559 --> 02:17:08,870
Again, just give
me the items that

2641
02:17:08,870 --> 02:17:17,590
have property 31, instance of,
with a value 146, which is cat.

2642
02:17:17,590 --> 02:17:20,379
Let's take a question just
about this very simple query

2643
02:17:20,379 --> 02:17:25,809
before we advance to
more complicated queries.

2644
02:17:25,809 --> 02:17:29,200
Any questions just about this?

2645
02:17:29,200 --> 02:17:32,850
Like, did anyone kind of
really lose me talking

2646
02:17:32,850 --> 02:17:35,010
about this simple query?

2647
02:17:35,010 --> 02:17:39,389
Again, this query just tells
Wikidata, get me all the items

2648
02:17:39,389 --> 02:17:41,280
that somewhere among
their statements

2649
02:17:41,280 --> 02:17:44,219
have instance of cat.

2650
02:17:44,219 --> 02:17:46,670
That's the only condition.

2651
02:17:46,670 --> 02:17:47,740
No questions.

2652
02:17:47,740 --> 02:17:49,959
OK, feel free to ask if
you come up with one.

2653
02:17:49,959 --> 02:17:54,709
So let's complicate
things a little.

2654
02:17:54,709 --> 02:17:59,365
Let's ask only for male cats.

2655
02:17:59,365 --> 02:18:02,080


2656
02:18:02,080 --> 02:18:03,070
OK.

2657
02:18:03,070 --> 02:18:07,330
Remember this cat
Gladstone is male,

2658
02:18:07,330 --> 02:18:09,850
and we know this because
he has a property called

2659
02:18:09,850 --> 02:18:14,320
sex or gender, and the value
is male creature, right?

2660
02:18:14,320 --> 02:18:17,950
So let's add another
condition right here

2661
02:18:17,950 --> 02:18:19,860
under the first condition.

2662
02:18:19,860 --> 02:18:20,870
OK?

2663
02:18:20,870 --> 02:18:22,750
This is a new line.

2664
02:18:22,750 --> 02:18:24,940
And I'm adding a new
condition to the query.

2665
02:18:24,940 --> 02:18:30,520
I'm saying, not only do I
want this item that you return

2666
02:18:30,520 --> 02:18:35,469
to be instance of cat, I
also want this same item

2667
02:18:35,469 --> 02:18:39,280
to have another property,
the property sex or gender.

2668
02:18:39,280 --> 02:18:40,299
Right?

2669
02:18:40,299 --> 02:18:43,480
And I need to refer to
the property by number.

2670
02:18:43,480 --> 02:18:45,760
But don't worry,
Wikidata will help you.

2671
02:18:45,760 --> 02:18:49,500
So you start with this
prefix, WDT.

2672
02:18:49,500 --> 02:18:52,520


2673
02:18:52,520 --> 02:18:54,980
Again, just ignore
that prefix; it's

2674
02:18:54,980 --> 02:18:58,940
one of the features of SPARQL
that we need to respect.

2675
02:18:58,940 --> 02:19:02,715
WDT colon, and then I can
just type control space

2676
02:19:02,715 --> 02:19:04,340
to do a search, to
do an auto complete.

2677
02:19:04,340 --> 02:19:08,090
So I can just type sex
and Wikidata helpfully

2678
02:19:08,090 --> 02:19:11,760
offers me a drop down
with relevant properties.

2679
02:19:11,760 --> 02:19:15,200
So I click property 21, which
is the sex or gender property.

2680
02:19:15,200 --> 02:19:17,629
And then I say, so I want
the sex or gender property

2681
02:19:17,629 --> 02:19:19,670
to have the Wikidata value.

2682
02:19:19,670 --> 02:19:21,799
Again, control space.

2683
02:19:21,799 --> 02:19:25,340
And I can just
say male creature.

2684
02:19:25,340 --> 02:19:25,850
See?

2685
02:19:25,850 --> 02:19:30,950
There's a different item
for male, as in human,

2686
02:19:30,950 --> 02:19:33,799
and a different one for
male creature, for reasons

2687
02:19:33,799 --> 02:19:34,910
that we won't go into.

2688
02:19:34,910 --> 02:19:36,535
Let's pick male
creature, because we're

2689
02:19:36,535 --> 02:19:38,040
talking about cats here.

2690
02:19:38,040 --> 02:19:38,540
All right.

2691
02:19:38,540 --> 02:19:42,080
And add a period here at
the end and click Run.

2692
02:19:42,080 --> 02:19:48,330
And instead of 114 cats,
this time we get 43 results.

2693
02:19:48,330 --> 02:19:53,360
Including our friend Gladstone
who is a male creature cat.

2694
02:19:53,360 --> 02:19:58,530
So that means all the
rest are female, right?

2695
02:19:58,530 --> 02:20:00,410
Wrong.

2696
02:20:00,410 --> 02:20:00,980
Wrong.

2697
02:20:00,980 --> 02:20:02,840
That does not mean that at all.

2698
02:20:02,840 --> 02:20:06,530
What it means is of
the 114 items that

2699
02:20:06,530 --> 02:20:11,960
have instance of cat,
only 43 have explicitly

2700
02:20:11,960 --> 02:20:14,690
sex male creature.

2701
02:20:14,690 --> 02:20:17,570
The rest of them do not.

2702
02:20:17,570 --> 02:20:21,800
Maybe because they have
sex female creature,

2703
02:20:21,800 --> 02:20:25,930
but maybe because they don't
have that property at all.

2704
02:20:25,930 --> 02:20:28,290
I'm emphasizing
this to kind of help

2705
02:20:28,290 --> 02:20:31,770
you train yourself to
correctly interpret

2706
02:20:31,770 --> 02:20:34,140
the results of
queries from Wikidata.

2707
02:20:34,140 --> 02:20:36,870
Don't jump to this kind
of simplistic conclusion:

2708
02:20:36,870 --> 02:20:41,820
"OK, there are 114 total, 43 male,
therefore the rest are female."

2709
02:20:41,820 --> 02:20:43,520
That is not correct.
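The point about the 114 versus 43 cats can be made concrete. The sketch below assumes a query shape using the standard SPARQL `FILTER NOT EXISTS` to list cats with no P21 (sex or gender) statement at all, plus a toy set calculation with made-up item names (not real results) showing why "not returned as male" is not the same as "female".

```python
# Cats (Q146) with no P21 (sex or gender) statement at all.
NO_GENDER_CATS = """SELECT ?item ?itemLabel WHERE {
  ?item wdt:P31 wd:Q146 .
  FILTER NOT EXISTS { ?item wdt:P21 ?anyValue . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}"""


def describe_remainder(all_items: set[str], male: set[str], female: set[str]) -> dict[str, set[str]]:
    """Split the items the 'male' query did NOT return into what they actually are."""
    remainder = all_items - male
    return {
        "explicitly female": remainder & female,
        "no statement or another value": remainder - female,
    }


# Made-up toy data, not real Wikidata results.
cats = {"cat1", "cat2", "cat3", "cat4", "cat5"}
split = describe_remainder(cats, male={"cat1", "cat2"}, female={"cat3"})
print(split)
```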

2710
02:20:43,520 --> 02:20:45,030
OK?

2711
02:20:45,030 --> 02:20:49,740
But 43 of those explicitly
had another statement, sex

2712
02:20:49,740 --> 02:20:52,530
or gender, male creature.

2713
02:20:52,530 --> 02:20:55,020
So I just added
another condition,

2714
02:20:55,020 --> 02:20:58,290
and now my query is
asking two separate things

2715
02:20:58,290 --> 02:21:00,150
about the results.

2716
02:21:00,150 --> 02:21:04,472
They need to be a cat
and a male creature.
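A minimal sketch of the "add another condition" idea: each (property, value) pair becomes one more triple pattern on the same `?item`, and all of them must match. P31, Q146 and P21 come from the talk; using Q44148 for "male creature" (male organism) is an assumption to verify on Wikidata before relying on it.

```python
def label_service(lang: str) -> str:
    # Standard Wikidata Query Service label line (same as in the cat query).
    return 'SERVICE wikibase:label { bd:serviceParam wikibase:language "' + lang + '". }'


def conditions_query(conditions: list[tuple[str, str]], lang: str = "en") -> str:
    """Every (property, value) pair becomes one triple pattern on the same ?item.

    Triple patterns listed one after another inside WHERE must ALL match,
    so each extra line narrows the result set (a logical AND).
    """
    body = [f"  ?item wdt:{prop} wd:{value} ." for prop, value in conditions]
    return "\n".join(["SELECT ?item ?itemLabel WHERE {", *body, "  " + label_service(lang), "}"])


# P31 = instance of, Q146 = cat, P21 = sex or gender (from the talk).
# Q44148 for "male creature" is an assumption; check it on Wikidata.
male_cats = conditions_query([("P31", "Q146"), ("P21", "Q44148")])
print(male_cats)
```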

2717
02:21:04,472 --> 02:21:06,270
AUDIENCE: Maybe we
should see how many

2718
02:21:06,270 --> 02:21:08,100
cats have Twitter accounts.

2719
02:21:08,100 --> 02:21:11,440
But there is a
question from YouTube,

2720
02:21:11,440 --> 02:21:14,220
which is will you talk about
the export possibilities

2721
02:21:14,220 --> 02:21:17,280
of the result of the query?

2722
02:21:17,280 --> 02:21:18,420
ASAF BARTOV: Absolutely.

2723
02:21:18,420 --> 02:21:21,000
Absolutely I will in
just a little bit.

2724
02:21:21,000 --> 02:21:23,010
I mean there is, in
addition to just getting

2725
02:21:23,010 --> 02:21:28,350
this kind of table, I can get
these results in other formats.

2726
02:21:28,350 --> 02:21:30,360
And I can also
download these results.

2727
02:21:30,360 --> 02:21:32,820
I can click the Download
button and get them

2728
02:21:32,820 --> 02:21:35,070
as a comma-separated
file, tab-separated

2729
02:21:35,070 --> 02:21:38,910
file, a JSON file, which is
useful for programmatic uses.
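For the "programmatic uses" mentioned above, a rough Python sketch of fetching results as JSON directly from the query endpoint. It assumes the public endpoint `https://query.wikidata.org/sparql` accepts a `format=json` parameter; the User-Agent string is a placeholder you should replace with your own contact details. The parser follows the standard SPARQL JSON results layout.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://query.wikidata.org/sparql"


def build_url(query: str) -> str:
    # format=json asks for SPARQL JSON results instead of the HTML table.
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})


def fetch(query: str) -> dict:
    # Placeholder User-Agent; Wikimedia asks clients to identify themselves.
    req = urllib.request.Request(
        build_url(query),
        headers={"User-Agent": "example-script/0.1 (replace with your contact)"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)


def labels(results: dict, var: str = "itemLabel") -> list[str]:
    # SPARQL JSON results: results -> bindings -> one dict per row -> var -> value.
    return [row[var]["value"] for row in results["results"]["bindings"] if var in row]
```

Usage would be something like `labels(fetch(query))`; the download button in the query interface gives you the same data without writing any code.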

2730
02:21:38,910 --> 02:21:40,590
I can also get a link.

2731
02:21:40,590 --> 02:21:42,330
So I can get a
link to this query.

2732
02:21:42,330 --> 02:21:45,990
I mean, I spent all this time
designing this beautiful query.

2733
02:21:45,990 --> 02:21:50,280
I can get a short URL that was
generated especially for me

2734
02:21:50,280 --> 02:21:52,170
right now with a tiny URL.

2735
02:21:52,170 --> 02:21:54,690
I can just paste this
into Twitter and go,

2736
02:21:54,690 --> 02:21:59,280
hey people look at all the male
cats that Wikidata knows about.

2737
02:21:59,280 --> 02:22:01,170
OK, this is not a
very exciting query.

2738
02:22:01,170 --> 02:22:03,900
But once I get to a really
complicated exciting query

2739
02:22:03,900 --> 02:22:07,650
I can totally share that
very easily through this.

2740
02:22:07,650 --> 02:22:09,750
And we will get to more
interesting queries

2741
02:22:09,750 --> 02:22:11,740
in just a second.

2742
02:22:11,740 --> 02:22:16,400
Any questions on this kind
of basic querying so far?

2743
02:22:16,400 --> 02:22:17,940
OK.

2744
02:22:17,940 --> 02:22:25,340
So that was a very
simple example.

2745
02:22:25,340 --> 02:22:30,250
Let's spend a moment exploring.

2746
02:22:30,250 --> 02:22:38,920
So this cat Gladstone was
named after this dude, William

2747
02:22:38,920 --> 02:22:43,550
Gladstone, who was an
important British politician.

2748
02:22:43,550 --> 02:22:45,760
I'm sure he's not the
only thing out there

2749
02:22:45,760 --> 02:22:48,970
in the universe that's named
after Gladstone, right?

2750
02:22:48,970 --> 02:22:52,120
I mean there has got
to be, I don't know,

2751
02:22:52,120 --> 02:22:54,790
park benches,
planets, asteroids,

2752
02:22:54,790 --> 02:22:59,590
something other than the
cat, named after this guy.

2753
02:22:59,590 --> 02:23:04,030
So we can ask Wikidata
to tell us all the things

2754
02:23:04,030 --> 02:23:06,850
that, you know, without
saying instance of something.

2755
02:23:06,850 --> 02:23:10,960
Like, I don't know, anything
named after William Gladstone.

2756
02:23:10,960 --> 02:23:12,760
So how do I do that?

2757
02:23:12,760 --> 02:23:15,310
Same principle.

2758
02:23:15,310 --> 02:23:19,850
Instead of asking about the
property instance of, property

2759
02:23:19,850 --> 02:23:25,360
31, instead of that, I
will ask about the property

2760
02:23:25,360 --> 02:23:26,860
named after--

2761
02:23:26,860 --> 02:23:29,120
sorry, named after--

2762
02:23:29,120 --> 02:23:30,830
I don't need to
remember the number.

2763
02:23:30,830 --> 02:23:32,240
I have auto-complete.

2764
02:23:32,240 --> 02:23:35,360
Named after is property 138.

2765
02:23:35,360 --> 02:23:37,430
And I want anything
at all that is

2766
02:23:37,430 --> 02:23:42,080
named after this person,
William Gladstone.

2767
02:23:42,080 --> 02:23:43,850
Here we go.

2768
02:23:43,850 --> 02:23:45,860
Which is 160852.

2769
02:23:45,860 --> 02:23:46,820
Whatever.

2770
02:23:46,820 --> 02:23:48,230
OK.

2771
02:23:48,230 --> 02:23:50,510
You notice I removed
instance of cat.

2772
02:23:50,510 --> 02:23:52,040
I removed the male creature condition.

2773
02:23:52,040 --> 02:23:55,130
I'm only asking,
get me all the items

2774
02:23:55,130 --> 02:23:58,940
that are somehow named after
that particular politician.
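A small sketch of the "named after" query, assuming the same query shape as before: P138 (named after) and Q160852 (William Gladstone) come from the talk. Passing `None` turns the value into a variable, which is one way to ask "has any named-after value at all", the property-exists question raised a little later.

```python
from typing import Optional


def named_after_query(namesake_qid: Optional[str], lang: str = "en") -> str:
    # P138 = "named after". No P31 line, so items of any type can match.
    # With namesake_qid=None the value becomes a variable: "has ANY named-after value".
    target = f"wd:{namesake_qid}" if namesake_qid else "?namesake"
    return "\n".join([
        "SELECT ?item ?itemLabel WHERE {",
        f"  ?item wdt:P138 {target} .",
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "' + lang + '". }',
        "}",
    ])


print(named_after_query("Q160852"))  # William Gladstone, per the talk
print(named_after_query(None))
```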

2775
02:23:58,940 --> 02:24:00,920
And I run the query,
and it turns out

2776
02:24:00,920 --> 02:24:05,007
Wikidata knows
about three such things.

2777
02:24:05,007 --> 02:24:06,590
Does that mean that's
the only-- these

2778
02:24:06,590 --> 02:24:08,881
are the only three things
named after him in the world?

2779
02:24:08,881 --> 02:24:09,939
Of course not.

2780
02:24:09,939 --> 02:24:12,230
But these are the only three
items that are in Wikidata

2781
02:24:12,230 --> 02:24:17,720
and explicitly have the
property named after Gladstone.

2782
02:24:17,720 --> 02:24:20,150
For all I know, there
may be a village

2783
02:24:20,150 --> 02:24:23,600
in England called Gladstone
named after this person.

2784
02:24:23,600 --> 02:24:27,410
But if nobody added the
property, named after, linking

2785
02:24:27,410 --> 02:24:30,950
to the person, it wouldn't show
up in the results to my query.

2786
02:24:30,950 --> 02:24:33,750
So Wikidata knows about
three such things.

2787
02:24:33,750 --> 02:24:36,110
One of them is something
called the Gladstone Professor

2788
02:24:36,110 --> 02:24:37,360
of Government.

2789
02:24:37,360 --> 02:24:40,370
I can click through and see
that it's a chair at Oxford

2790
02:24:40,370 --> 02:24:41,180
University, right?

2791
02:24:41,180 --> 02:24:43,470
So it's a position.

2792
02:24:43,470 --> 02:24:49,520
And another is the William
Gladstone school number 18.

2793
02:24:49,520 --> 02:24:51,470
William Gladstone
school number 18.

2794
02:24:51,470 --> 02:24:52,900
Where is that?

2795
02:24:52,900 --> 02:24:55,380
That is in Sofia, Bulgaria.

2796
02:24:55,380 --> 02:24:56,470
Again.

2797
02:24:56,470 --> 02:24:59,000
All right, so that's a
particular school in Bulgaria

2798
02:24:59,000 --> 02:25:02,720
named after William Gladstone.

2799
02:25:02,720 --> 02:25:07,220
And finally, the third
result is, of course, our pal

2800
02:25:07,220 --> 02:25:09,800
Gladstone the Chief Mouser.

2801
02:25:09,800 --> 02:25:12,674
If I click through,
that's the cat.

2802
02:25:12,674 --> 02:25:14,090
All right, so that
was an example.

2803
02:25:14,090 --> 02:25:15,700
I mean, you saw how easy it was.

2804
02:25:15,700 --> 02:25:18,980
I just named the property and
the value that I care about,

2805
02:25:18,980 --> 02:25:21,420
and I get the results.

2806
02:25:21,420 --> 02:25:23,289
Again, I mean, it's
kind of a silly example,

2807
02:25:23,289 --> 02:25:24,080
but think about it.

2808
02:25:24,080 --> 02:25:27,570
This is-- how else can
you answer that question?

2809
02:25:27,570 --> 02:25:30,470
There's no reference desk,
even at the great University

2810
02:25:30,470 --> 02:25:34,250
of Oxford, where you can
walk in and say, give me

2811
02:25:34,250 --> 02:25:37,470
a list of things
named after Gladstone.

2812
02:25:37,470 --> 02:25:40,590
There's no easy way to
answer that unless you happen

2813
02:25:40,590 --> 02:25:44,520
to have a very large
structured and linked

2814
02:25:44,520 --> 02:25:48,130
data store, like Wikidata.

2815
02:25:48,130 --> 02:25:50,560
All right, so that
was a silly example.

2816
02:25:50,560 --> 02:25:51,280
Let's take some--

2817
02:25:51,280 --> 02:25:53,113
AUDIENCE: There's a
bunch of stuff on there.

2818
02:25:53,113 --> 02:25:54,446
ASAF: Oh, OK.

2819
02:25:54,446 --> 02:25:57,430
AUDIENCE: Can you show
EasyQuery on the video?

2820
02:25:57,430 --> 02:26:02,260
And somebody wants to know
how to just query for a property

2821
02:26:02,260 --> 02:26:05,750
that exists, without giving
a specific value.

2822
02:26:05,750 --> 02:26:11,030
And then once you show EasyQuery,
you reload the page and--

2823
02:26:11,030 --> 02:26:13,240
ASAF: I don't know EasyQuery.

2824
02:26:13,240 --> 02:26:15,670
So is that a gadget?

2825
02:26:15,670 --> 02:26:17,110
I don't know what EasyQuery is.

2826
02:26:17,110 --> 02:26:19,870
I don't use it.

2827
02:26:19,870 --> 02:26:24,760
So someone can maybe
send a link or something?

2828
02:26:24,760 --> 02:26:26,100
Oh it is a gadget.

2829
02:26:26,100 --> 02:26:27,100
I don't have it enabled.

2830
02:26:27,100 --> 02:26:31,610


2831
02:26:31,610 --> 02:26:32,480
That is nice.

2832
02:26:32,480 --> 02:26:42,080
So now, what I just did by hand,
by formulating the query named

2833
02:26:42,080 --> 02:26:45,200
after Gladstone--

2834
02:26:45,200 --> 02:26:48,390
I guess this is the--

2835
02:26:48,390 --> 02:26:48,960
Is it?

2836
02:26:48,960 --> 02:26:53,000


2837
02:26:53,000 --> 02:26:53,720
Yeah.

2838
02:26:53,720 --> 02:26:56,050
So this-- I just
clicked the three--

2839
02:26:56,050 --> 02:26:57,470
the ellipsis here.

2840
02:26:57,470 --> 02:26:58,460
Right after the name.

2841
02:26:58,460 --> 02:26:59,630
You see this?

2842
02:26:59,630 --> 02:27:03,050
This was just added by
enabling EasyQuery,

2843
02:27:03,050 --> 02:27:04,640
which I just learned about.

2844
02:27:04,640 --> 02:27:07,640
So you just click this
and it auto-magically

2845
02:27:07,640 --> 02:27:09,620
made this kind of trivial query.

2846
02:27:09,620 --> 02:27:12,380
Of course, if I want a more
complicated query like,

2847
02:27:12,380 --> 02:27:14,510
I don't know, give me
all the things that

2848
02:27:14,510 --> 02:27:18,110
are named after Lincoln
but are a school,

2849
02:27:18,110 --> 02:27:21,650
I will still need to kind
of edit a custom query.

2850
02:27:21,650 --> 02:27:23,450
But this is a super
easy and very nice

2851
02:27:23,450 --> 02:27:28,620
way of just doing a very super
quick query for exactly this.

2852
02:27:28,620 --> 02:27:29,120
Right?

2853
02:27:29,120 --> 02:27:33,410
Like, what other items have
exactly this property and value

2854
02:27:33,410 --> 02:27:35,720
named after William Gladstone?

2855
02:27:35,720 --> 02:27:38,750
So, thank you to whoever
made this suggestion

2856
02:27:38,750 --> 02:27:42,140
to demonstrate that, and
I'm glad I learned something

2857
02:27:42,140 --> 02:27:45,230
too today.

2858
02:27:45,230 --> 02:27:48,590
Let's move to
another sample query.

2859
02:27:48,590 --> 02:27:50,360
Here's a fun example.

2860
02:27:50,360 --> 02:27:56,910
Popular surnames among
fictional characters.

2861
02:27:56,910 --> 02:27:58,650
Think about that for a second.

2862
02:27:58,650 --> 02:28:03,030
Popular surnames among
fictional characters.

2863
02:28:03,030 --> 02:28:06,510
So we're asking Wikidata
to go through all

2864
02:28:06,510 --> 02:28:10,120
the fictional
characters you know,

2865
02:28:10,120 --> 02:28:13,510
and of those look through
their surnames, group

2866
02:28:13,510 --> 02:28:15,910
them so that you can count
them, the repetitions

2867
02:28:15,910 --> 02:28:18,460
of the surnames,
and give me the most

2868
02:28:18,460 --> 02:28:21,550
popular surnames among them.

2869
02:28:21,550 --> 02:28:26,280
Additionally, I want you to
awesomely present the results

2870
02:28:26,280 --> 02:28:28,020
as a bubble chart.

2871
02:28:28,020 --> 02:28:29,220
Oh, yeah.

2872
02:28:29,220 --> 02:28:31,050
Wikidata can do that.

2873
02:28:31,050 --> 02:28:34,420
And I run the query.

2874
02:28:34,420 --> 02:28:36,750
And check it out.

2875
02:28:36,750 --> 02:28:41,130
The most popular names
among fictional characters

2876
02:28:41,130 --> 02:28:45,780
that we can say Wikidata knows about are
Jones, Smith, Taylor, et cetera.

2877
02:28:45,780 --> 02:28:48,450
I mean for all we know,
the most popular name

2878
02:28:48,450 --> 02:28:50,770
among fictional characters
actually in the world

2879
02:28:50,770 --> 02:28:52,350
may be Wu.

2880
02:28:52,350 --> 02:28:54,790
Or something in Chinese
for all we know.

2881
02:28:54,790 --> 02:28:57,930
But if that has not been
modeled in Wikidata,

2882
02:28:57,930 --> 02:29:01,020
we're not going to get that.

2883
02:29:01,020 --> 02:29:03,540
So Taylor, Smith,
Jones, Williams,

2884
02:29:03,540 --> 02:29:06,870
seem to be the
most popular names.

2885
02:29:06,870 --> 02:29:08,400
And again, I could limit this.

2886
02:29:08,400 --> 02:29:11,520
I could make the
same query but add,

2887
02:29:11,520 --> 02:29:14,250
only among works whose
original language

2888
02:29:14,250 --> 02:29:19,020
was Italian, for example, to get
more interesting results if I

2889
02:29:19,020 --> 02:29:21,480
only care about
Italian literature.

2890
02:29:21,480 --> 02:29:24,720
But this is an example of
how I got awesome bubble

2891
02:29:24,720 --> 02:29:28,170
charts for free, and
I can just plug this

2892
02:29:28,170 --> 02:29:30,900
into an awesome
presentation that I make.

2893
02:29:30,900 --> 02:29:34,500
Of course I can still
look at the raw table.

2894
02:29:34,500 --> 02:29:37,940
So the query still resulted
in a bunch of data, right?

2895
02:29:37,940 --> 02:29:42,480
So Smith repeats 41 times,
Jones 38 times, Taylor 34 times,

2896
02:29:42,480 --> 02:29:43,750
et cetera, et cetera.

2897
02:29:43,750 --> 02:29:48,960
And down that list.

2898
02:29:48,960 --> 02:29:52,320
And I could, again, I could
export this into a file

2899
02:29:52,320 --> 02:29:56,100
and load it up in a spreadsheet,
and do additional processing

2900
02:29:56,100 --> 02:29:56,670
on it.

2901
02:29:56,670 --> 02:29:58,560
I can link to it.

2902
02:29:58,560 --> 02:30:02,530
I can do all kinds of
awesome things with it.

2903
02:30:02,530 --> 02:30:05,250
So that's another awesome query.

2904
02:30:05,250 --> 02:30:08,460
We don't have to go into
every line by line analysis

2905
02:30:08,460 --> 02:30:11,670
here of why this
works the way it does.

2906
02:30:11,670 --> 02:30:15,840
I want to show you some
other queries first.

2907
02:30:15,840 --> 02:30:22,470
Let's look at-- this is just
fun, overall causes of death.

2908
02:30:22,470 --> 02:30:24,870
Again a bubble
chart just looking

2909
02:30:24,870 --> 02:30:28,260
at people who died
of things, and have

2910
02:30:28,260 --> 02:30:30,760
a cause of death listed.

2911
02:30:30,760 --> 02:30:34,380
And we learn that the most
commonly listed cause of death

2912
02:30:34,380 --> 02:30:40,350
is myocardial infarction,
pneumonitis, cerebral vascular,

2913
02:30:40,350 --> 02:30:42,620
lung cancer, et
cetera, et cetera.

2914
02:30:42,620 --> 02:30:44,850
And again, in a bubble chart.

2915
02:30:44,850 --> 02:30:49,670
And so how does that work?

2916
02:30:49,670 --> 02:30:53,050
So just very briefly, the
important parts of this query

2917
02:30:53,050 --> 02:30:59,150
are I'm looking for something,
for some person, who

2918
02:30:59,150 --> 02:31:04,240
has property 31, instance of,
with value Q5, which is human.

2919
02:31:04,240 --> 02:31:05,390
So a human.

2920
02:31:05,390 --> 02:31:07,130
Again, just to kind
of limit the query.

2921
02:31:07,130 --> 02:31:11,330
I'm not interested in
books or mountains.

2922
02:31:11,330 --> 02:31:14,420
I'm looking for humans
who have that same person,

2923
02:31:14,420 --> 02:31:21,150
that same variable PID,
should have a 509, meaning--

2924
02:31:21,150 --> 02:31:22,412
Hello.

2925
02:31:22,412 --> 02:31:24,620
Why don't I have the--

2926
02:31:24,620 --> 02:31:25,120
Yeah.

2927
02:31:25,120 --> 02:31:28,480
A 509, which is cause of death.

2928
02:31:28,480 --> 02:31:31,540
And that cause of death
is another variable,

2929
02:31:31,540 --> 02:31:32,930
that I'm calling CID.

2930
02:31:32,930 --> 02:31:35,410
Now, previously
we were saying you

2931
02:31:35,410 --> 02:31:36,850
know I want things
that are named

2932
02:31:36,850 --> 02:31:39,550
after Gladstone specifically.

2933
02:31:39,550 --> 02:31:42,000
Only things that have
that particular value.

2934
02:31:42,000 --> 02:31:44,320
Here I'm saying I'm
looking for things

2935
02:31:44,320 --> 02:31:47,110
that have some cause of death.

2936
02:31:47,110 --> 02:31:48,760
Not a specific one.

2937
02:31:48,760 --> 02:31:50,260
I just wanted to
get everything that

2938
02:31:50,260 --> 02:31:54,880
has a statement with some
value about property 509

2939
02:31:54,880 --> 02:31:56,530
cause of death.

2940
02:31:56,530 --> 02:31:57,940
OK?

2941
02:31:57,940 --> 02:32:04,410
And then this other bit of
magic here, the GROUP BY,

2942
02:32:04,410 --> 02:32:07,870
tells Wikidata I'm not
actually interested

2943
02:32:07,870 --> 02:32:09,100
in every individual thing.

2944
02:32:09,100 --> 02:32:12,310
I want you to group those
causes, and then count them

2945
02:32:12,310 --> 02:32:14,230
and give me the top ones.

2946
02:32:14,230 --> 02:32:15,523
So that's how this query works.
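To make the GROUP BY and COUNT step concrete, here is a hedged reconstruction of the kind of query described (P31 = Q5 human, P509 cause of death, from the talk), with the Query Service's `#defaultView:BubbleChart` comment for the bubble chart. The exact on-screen text may differ. The local `Counter` function does on made-up rows what GROUP BY plus COUNT does on the server.

```python
from collections import Counter

# Reconstruction, not the exact on-screen query.
CAUSES_OF_DEATH = """#defaultView:BubbleChart
SELECT ?cid ?cidLabel (COUNT(?pid) AS ?count) WHERE {
  ?pid wdt:P31 wd:Q5 .
  ?pid wdt:P509 ?cid .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
GROUP BY ?cid ?cidLabel
ORDER BY DESC(?count)"""


def top_causes(rows: list[tuple[str, str]], n: int = 3) -> list[tuple[str, int]]:
    """Local equivalent of GROUP BY ?cid + COUNT: rows are (person, cause) pairs."""
    return Counter(cause for _person, cause in rows).most_common(n)


# Made-up rows, only to show the mechanics.
rows = [("p1", "A"), ("p2", "B"), ("p3", "A"), ("p4", "C"), ("p5", "A"), ("p6", "B")]
print(top_causes(rows))  # → [('A', 3), ('B', 2), ('C', 1)]
```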

2947
02:32:15,523 --> 02:32:20,550


2948
02:32:20,550 --> 02:32:22,320
Here's that query I promised.

2949
02:32:22,320 --> 02:32:26,460
Painters whose fathers
were also painters.

2950
02:32:26,460 --> 02:32:28,630
I can only think of a couple.

2951
02:32:28,630 --> 02:32:31,890
I mean, Monet and Bruegel.

2952
02:32:31,890 --> 02:32:34,800
But I'm sure Wikidata
knows many more.

2953
02:32:34,800 --> 02:32:38,620
So let's run this query.

2954
02:32:38,620 --> 02:32:40,270
And I have 100 results.

2955
02:32:40,270 --> 02:32:43,120
By the way, I have limited
it to 100 results just

2956
02:32:43,120 --> 02:32:44,650
to keep it kind of snappy.

2957
02:32:44,650 --> 02:32:47,530
But actually, we could
maybe try removing the limit

2958
02:32:47,530 --> 02:32:50,170
and see if Wikidata
could tell us

2959
02:32:50,170 --> 02:32:53,890
the total number in Wikidata.

2960
02:32:53,890 --> 02:32:55,120
Yeah, that wasn't too bad.

2961
02:32:55,120 --> 02:32:58,400
So 1,270 results.

2962
02:32:58,400 --> 02:32:59,140
OK.

2963
02:32:59,140 --> 02:33:04,150
Wikidata, already at this
early date in its progress,

2964
02:33:04,150 --> 02:33:07,540
already knows about
more than 1,200 painters

2965
02:33:07,540 --> 02:33:10,980
who are sons of painters.

2966
02:33:10,980 --> 02:33:16,140
Sons of male painters, like
their father is a painter.

2967
02:33:16,140 --> 02:33:18,120
There may be
additional painters who

2968
02:33:18,120 --> 02:33:21,390
are sons of female painters
not included in this query.

2969
02:33:21,390 --> 02:33:24,990
Again, always remember what
exactly you are asking.

2970
02:33:24,990 --> 02:33:27,840
In this query I was
asking about the father.

2971
02:33:27,840 --> 02:33:30,330
I'm leaving out any
possible painters who

2972
02:33:30,330 --> 02:33:32,720
are sons of painter mothers.

2973
02:33:32,720 --> 02:33:33,390
OK?

2974
02:33:33,390 --> 02:33:35,250
So how does this work?

2975
02:33:35,250 --> 02:33:39,630
I'm asking for the painter
along with the human-readable label,

2976
02:33:39,630 --> 02:33:42,630
and the father along
with the human-readable label.

2977
02:33:42,630 --> 02:33:47,610
So Michel Monet is the
son of Claude Monet.

2978
02:33:47,610 --> 02:33:54,180
And Domenico Tintoretto is the
son of the famous Tintoretto

2979
02:33:54,180 --> 02:33:57,210
whose label, you know, is just
Tintoretto like Michelangelo.

2980
02:33:57,210 --> 02:33:59,960
You know, you don't always
have to have the full name

2981
02:33:59,960 --> 02:34:02,420
in the common label.

2982
02:34:02,420 --> 02:34:07,010
Paloma Picasso is the
daughter of Pablo Picasso.

2983
02:34:07,010 --> 02:34:07,510
OK.

2984
02:34:07,510 --> 02:34:11,040
So Wikidata knows about
all these results.

2985
02:34:11,040 --> 02:34:14,610
Of course, Holbein the Younger,
son of Holbein the Elder.

2986
02:34:14,610 --> 02:34:15,760
And how did we get there?

2987
02:34:15,760 --> 02:34:20,860
Well we asked Wikidata
to look for something,

2988
02:34:20,860 --> 02:34:26,820
let's call it painter, which
has 106, which is occupation,

2989
02:34:26,820 --> 02:34:31,100
with a value painter.

2990
02:34:31,100 --> 02:34:31,600
Right?

2991
02:34:31,600 --> 02:34:35,310
This unwieldy number
1028181, that's painter.

2992
02:34:35,310 --> 02:34:40,250
So I'm asking for any item
that has occupation painter.

2993
02:34:40,250 --> 02:34:43,300
And let's call
that item painter.

2994
02:34:43,300 --> 02:34:49,770
I also want that painter to have
a property 22, which is father.

2995
02:34:49,770 --> 02:34:50,850
OK.

2996
02:34:50,850 --> 02:34:52,350
Father.

2997
02:34:52,350 --> 02:34:55,140
And I want it to
have some value.

2998
02:34:55,140 --> 02:34:58,770
OK, I'm putting it into
another variable called father.

2999
02:34:58,770 --> 02:35:01,320
I could have called
it, you know, frog.

3000
02:35:01,320 --> 02:35:04,230
That doesn't change
anything, just to be clear.

3001
02:35:04,230 --> 02:35:06,630
What matters is that this
is the property father.

3002
02:35:06,630 --> 02:35:10,320
I could have called
it anything I want.

3003
02:35:10,320 --> 02:35:13,590
So, and then, I have
a third condition.

3004
02:35:13,590 --> 02:35:18,010
That the father, like whatever
it says here in property 22,

3005
02:35:18,010 --> 02:35:22,590
I want that father to have
himself a property 106

3006
02:35:22,590 --> 02:35:27,750
occupation with a value painter.

3007
02:35:27,750 --> 02:35:28,730
OK?

3008
02:35:28,730 --> 02:35:30,800
These conditions
combine to give me

3009
02:35:30,800 --> 02:35:36,080
a list of people who have
a father and that father

3010
02:35:36,080 --> 02:35:37,850
has occupation painter as well.

3011
02:35:37,850 --> 02:35:40,550
Of course, if I suddenly,
or if you suddenly,

3012
02:35:40,550 --> 02:35:44,480
are consumed by
curiosity to know

3013
02:35:44,480 --> 02:35:51,344
who are some politicians
who are sons of carpenters?

3014
02:35:51,344 --> 02:35:52,760
You could just
change that, right?

3015
02:35:52,760 --> 02:35:56,700
Change the first value
from painter to politician.

3016
02:35:56,700 --> 02:36:02,624
Change the third line's value
from painter to carpenter.

3017
02:36:02,624 --> 02:36:04,040
Maybe that list
will be very short

3018
02:36:04,040 --> 02:36:06,680
because carpenters don't
tend to be notable,

3019
02:36:06,680 --> 02:36:08,910
so they wouldn't be
represented on Wikidata.

3020
02:36:08,910 --> 02:36:11,990
That's why this works relatively
well with painters, right?

3021
02:36:11,990 --> 02:36:14,420
Because most of
them are notable.

3022
02:36:14,420 --> 02:36:16,370
But generally you
could do that, right?

3023
02:36:16,370 --> 02:36:18,500
That's an example of
how you can take a query

3024
02:36:18,500 --> 02:36:22,340
and just replace one of those
values, or even the language.
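The three-condition query and the "just replace one of those values" idea can be sketched as a parameterized builder. P106 (occupation), Q1028181 (painter) and P22 (father) come from the talk; P25 (mother) is the standard Wikidata property and is included as an assumption to show how to ask the question the talk says this query leaves out. QIDs for other occupations such as politician or carpenter would need to be looked up. Variable names are arbitrary, as the talk points out.

```python
from typing import Optional


def person_parent_query(person_occupation: str, parent_occupation: str,
                        parent_property: str = "P22", lang: str = "en",
                        limit: Optional[int] = 100) -> str:
    """Three conditions: the person has occupation X, the person has a parent
    (P22 father by default), and that parent has occupation Y."""
    lines = [
        "SELECT ?person ?personLabel ?parent ?parentLabel WHERE {",
        f"  ?person wdt:P106 wd:{person_occupation} .",   # P106 = occupation
        f"  ?person wdt:{parent_property} ?parent .",      # any value, kept in ?parent
        f"  ?parent wdt:P106 wd:{parent_occupation} .",    # the parent's occupation
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "' + lang + '". }',
        "}",
    ]
    if limit is not None:
        lines.append(f"LIMIT {limit}")  # drop the limit to get the full count
    return "\n".join(lines)


painters = person_parent_query("Q1028181", "Q1028181")                  # painter, father painter
painters_ar = person_parent_query("Q1028181", "Q1028181", lang="ar")     # Arabic labels
mothers = person_parent_query("Q1028181", "Q1028181", parent_property="P25")
print(painters)
```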

3025
02:36:22,340 --> 02:36:26,840
So again, I could ask
for these same painters.

3026
02:36:26,840 --> 02:36:27,650
It's limited again.

3027
02:36:27,650 --> 02:36:31,190
These same painters,
but with Arabic labels.

3028
02:36:31,190 --> 02:36:34,880
Same query, but I have Arabic
labels for these painters.

3029
02:36:34,880 --> 02:36:37,250
And of course where
there is no Arabic label

3030
02:36:37,250 --> 02:36:40,360
I get the Q number.

3031
02:36:40,360 --> 02:36:40,860
OK?

3032
02:36:40,860 --> 02:36:43,650
So that's that query
that I promised you,

3033
02:36:43,650 --> 02:36:47,670
painters who are sons of painters
can be done by Wikidata

3034
02:36:47,670 --> 02:36:49,830
in under one second.

3035
02:36:49,830 --> 02:36:51,480
How awesome is that?

3036
02:36:51,480 --> 02:36:52,950
We can also get some statistics.

3037
02:36:52,950 --> 02:36:55,920
So how about counting
total articles

3038
02:36:55,920 --> 02:36:59,740
in a given wiki by gender?

3039
02:36:59,740 --> 02:37:02,070
This is what we call
the content gender

3040
02:37:02,070 --> 02:37:06,900
gap, as distinct from the
participation gender gap.

3041
02:37:06,900 --> 02:37:10,276
This is the gender gap in
what we cover on Wikipedia.

3042
02:37:10,276 --> 02:37:11,400
So let's take one of these.

3043
02:37:11,400 --> 02:37:16,380


3044
02:37:16,380 --> 02:37:17,630
So this is a query.

3045
02:37:17,630 --> 02:37:23,130
Articles about women in
some given Wikipedia.

3046
02:37:23,130 --> 02:37:23,660
All right.

3047
02:37:23,660 --> 02:37:25,799
So let's take--

3048
02:37:25,799 --> 02:37:26,340
I don't know.

3049
02:37:26,340 --> 02:37:30,240
Let's take the Tamil Wikipedia.

3050
02:37:30,240 --> 02:37:32,460
That's language code TA.

3051
02:37:32,460 --> 02:37:34,950
So I just put TA here.

3052
02:37:34,950 --> 02:37:38,850
And I click Run, and
I get this count.

3053
02:37:38,850 --> 02:37:39,960
That's all I wanted.

3054
02:37:39,960 --> 02:37:41,720
I'm not actually
interested in the items,

3055
02:37:41,720 --> 02:37:44,962
like in the list of women
on the Tamil Wikipedia.

3056
02:37:44,962 --> 02:37:45,920
I just want the number.

3057
02:37:45,920 --> 02:37:48,510
So I selected the count here.

3058
02:37:48,510 --> 02:37:52,610
And this number
turns out to be 2,159.

3059
02:37:52,610 --> 02:37:57,300
So there are 2000
articles about women

3060
02:37:57,300 --> 02:38:02,350
on the Tamil Wikipedia that
Wikidata knows to be female.

3061
02:38:02,350 --> 02:38:02,850
Right?

3062
02:38:02,850 --> 02:38:05,730
I'm asking about the gender
field, property 21 again.

3063
02:38:05,730 --> 02:38:08,900
Remember, if there's some
article about a woman in Tamil

3064
02:38:08,900 --> 02:38:12,090
Wikipedia, but
Wikidata doesn't have

3065
02:38:12,090 --> 02:38:14,460
a statement about the
gender, that person

3066
02:38:14,460 --> 02:38:15,640
will not be counted here.

3067
02:38:15,640 --> 02:38:18,240
So again, be careful
about stating

3068
02:38:18,240 --> 02:38:22,800
that this is exactly the number
of articles about women on Tamil

3069
02:38:22,800 --> 02:38:23,340
Wikipedia.

3070
02:38:23,340 --> 02:38:24,600
That's probably not true.

3071
02:38:24,600 --> 02:38:27,560
I'm sure some of those
articles are missing

3072
02:38:27,560 --> 02:38:30,740
a sex or gender property.

3073
02:38:30,740 --> 02:38:33,150
But for raw statistics,
that's probably good enough,

3074
02:38:33,150 --> 02:38:35,700
because some men are also
missing the sex or gender

3075
02:38:35,700 --> 02:38:37,620
property.

3076
02:38:37,620 --> 02:38:41,820
So we could take the
same query for men.

3077
02:38:41,820 --> 02:38:43,170
It's essentially the same query.

3078
02:38:43,170 --> 02:38:48,840
It just has this unwieldy
item number for male, Q6581097.

3079
02:38:48,840 --> 02:38:52,710
I can change this language
code again to TA for Tamil.

3080
02:38:52,710 --> 02:38:58,880
And how many men are covered
on Tamil Wikipedia? 14,649.

3081
02:38:58,880 --> 02:38:59,610
OK.

3082
02:38:59,610 --> 02:39:06,880
So women, about 2,100; men,
about seven times as many.

3083
02:39:06,880 --> 02:39:07,380
Right?

3084
02:39:07,380 --> 02:39:12,300
So that's the approximate
size of the content gender

3085
02:39:12,300 --> 02:39:14,610
gap on Tamil Wikipedia.

3086
02:39:14,610 --> 02:39:18,850
And again, I can complicate
this query as much as I want.

3087
02:39:18,850 --> 02:39:21,390
For example, I can
try and find out

3088
02:39:21,390 --> 02:39:30,390
if this gender gap is wider
or narrower among musicians,

3089
02:39:30,390 --> 02:39:31,350
just as an example.

3090
02:39:31,350 --> 02:39:35,850
I could just add a line here
that says occupation musician,

3091
02:39:35,850 --> 02:39:37,890
and then I'm only
counting articles

3092
02:39:37,890 --> 02:39:41,190
on Tamil Wikipedia about
musicians who are female

3093
02:39:41,190 --> 02:39:43,190
versus articles
on Tamil Wikipedia

3094
02:39:43,190 --> 02:39:45,030
about musicians who are male.

3095
02:39:45,030 --> 02:39:47,890
And I can kind of
compare the gender--

3096
02:39:47,890 --> 02:39:53,820
the content gender gap across
occupations on Tamil Wikipedia.

3097
02:39:53,820 --> 02:39:56,030
Do you see the
important point here?

3098
02:39:56,030 --> 02:39:58,490
It's that this is not just
a one-purpose query.

3099
02:39:58,490 --> 02:40:01,250
With just a single
additional condition, I can suddenly

3100
02:40:01,250 --> 02:40:04,370
make it a much more interesting
query, because I break it down

3101
02:40:04,370 --> 02:40:05,540
by occupation.

3102
02:40:05,540 --> 02:40:07,810
Or I break it down by century.

3103
02:40:07,810 --> 02:40:12,530
Do we have a larger coverage
gap for 19th-century people

3104
02:40:12,530 --> 02:40:13,940
than for 21st-century people?

3105
02:40:13,940 --> 02:40:15,560
I mean, I sure hope so, right?

3106
02:40:15,560 --> 02:40:18,480
The patriarchy is
weakening somewhat.

3107
02:40:18,480 --> 02:40:21,830
So I wouldn't be surprised if
there are many more notable men

3108
02:40:21,830 --> 02:40:23,430
covered from the 19th century.

3109
02:40:23,430 --> 02:40:25,784
But if we are also covering--

3110
02:40:25,784 --> 02:40:27,200
I mean, if the
gender gap is just

3111
02:40:27,200 --> 02:40:29,540
as wide for 21st-century
people, that would

3112
02:40:29,540 --> 02:40:30,800
be a little disappointing.

3113
02:40:30,800 --> 02:40:35,870
Again, that's something I
can fairly easily find out

3114
02:40:35,870 --> 02:40:38,980
using the Wikidata Query Service.

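The gender-gap counts described above, along with the occupation and century refinements, can be sketched as one query template. This is an illustration, not the speaker's saved query. It assumes Q6581072 (female), Q6581097 (male), P21 (sex or gender), P106 (occupation), Q639669 (musician), and P569 (date of birth). As a simplifying assumption, it treats "19th-century people" as people born from 1800 to 1899. The helper name is hypothetical, and the function only builds SPARQL text.

```python
# Sketch: one query template for the content gender gap, with optional refinements.
# Assumed Wikidata IDs: P21 sex or gender, P106 occupation, P569 date of birth,
# Q6581072 female, Q6581097 male, Q639669 musician.
from typing import Optional, Tuple

FEMALE = "Q6581072"
MALE = "Q6581097"
MUSICIAN = "Q639669"


def gender_count_query(lang: str, gender: str,
                       occupation: Optional[str] = None,
                       birth_years: Optional[Tuple[int, int]] = None) -> str:
    """Count items with the given gender that have an article on <lang>.wikipedia.org.

    birth_years is a half-open range (start, end): start <= birth year < end.
    """
    lines = [
        "SELECT (COUNT(DISTINCT ?person) AS ?count) WHERE {",
        f"  ?person wdt:P21 wd:{gender} .",
        "  ?article schema:about ?person ;",
        f"           schema:isPartOf <https://{lang}.wikipedia.org/> .",
    ]
    if occupation:  # one extra condition breaks the gap down by occupation
        lines.append(f"  ?person wdt:P106 wd:{occupation} .")
    if birth_years:  # or by period, using date of birth as a rough proxy
        start, end = birth_years
        lines.append("  ?person wdt:P569 ?born .")
        lines.append(f"  FILTER(YEAR(?born) >= {start} && YEAR(?born) < {end})")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(gender_count_query("ta", FEMALE))
    print(gender_count_query("ta", MALE, occupation=MUSICIAN))
```

For example, `gender_count_query("ta", FEMALE)` and `gender_count_query("ta", MALE)` give the two Tamil Wikipedia counts. Passing `occupation=MUSICIAN` or `birth_years=(1800, 1900)` adds exactly one condition each. As with the original query, people with no sex or gender statement are not counted.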
3115
02:40:38,980 --> 02:40:41,500
Any questions so far, or
are you just sharing links?

3116
02:40:41,500 --> 02:40:43,160
AUDIENCE: Yep, there is one.

3117
02:40:43,160 --> 02:40:47,480
So somebody is wondering if you
can demonstrate, or at least

3118
02:40:47,480 --> 02:40:50,420
give a short answer to the
latter part of this question.

3119
02:40:50,420 --> 02:40:52,530
Is it possible, using
Wikidata SPARQL,

3120
02:40:52,530 --> 02:40:55,520
to find specific
Wikipedia articles, e.g.

3121
02:40:55,520 --> 02:40:59,060
featured articles, of a
certain language which do not

3122
02:40:59,060 --> 02:41:01,160
exist in another language.

3123
02:41:01,160 --> 02:41:03,770
I know it is possible
to find category based

3124
02:41:03,770 --> 02:41:05,820
results using the PetScan tool.

3125
02:41:05,820 --> 02:41:09,110
But can we specify
that by selecting e.g.

3126
02:41:09,110 --> 02:41:10,055
featured articles?

3127
02:41:10,055 --> 02:41:11,390
ASAF BARTOV: Yes.

3128
02:41:11,390 --> 02:41:12,600
Excellent question.

3129
02:41:12,600 --> 02:41:14,120
It is possible, indeed.

3130
02:41:14,120 --> 02:41:17,570
And I will demonstrate
one such query.

3131
02:41:17,570 --> 02:41:19,190
Another query that
I already mentioned:

3132
02:41:19,190 --> 02:41:24,840
largest cities in the
world with a female mayor.

3133
02:41:24,840 --> 02:41:29,190
This query-- let's
close some of these tabs

3134
02:41:29,190 --> 02:41:30,315
before my browser chokes.

3135
02:41:30,315 --> 02:41:33,600


3136
02:41:33,600 --> 02:41:36,840
So this query lists
the major world cities

3137
02:41:36,840 --> 02:41:39,120
run by women currently.

3138
02:41:39,120 --> 02:41:45,650
And the answer is Mumbai, Mexico
City, Tokyo, a bunch of others.

3139
02:41:45,650 --> 02:41:49,470


3140
02:41:49,470 --> 02:41:52,371
And wait-- that's not it at all.

3141
02:41:52,371 --> 02:41:53,370
I clicked the wrong one.

3142
02:41:53,370 --> 02:41:55,050
That's the map of paintings.

3143
02:41:55,050 --> 02:41:55,800
OK.

3144
02:41:55,800 --> 02:41:57,370
Let's demonstrate
that for a second.

3145
02:41:57,370 --> 02:41:59,520
So this is the map
of all paintings

3146
02:41:59,520 --> 02:42:03,870
for which we know a location
with the count per location.

3147
02:42:03,870 --> 02:42:07,770
And the results are
awesomely presented on a map.

3148
02:42:07,770 --> 02:42:08,830
OK.

3149
02:42:08,830 --> 02:42:12,420
Again, under the hood this is,
of course, a table of results.

3150
02:42:12,420 --> 02:42:15,660
But, awesomely, I can
browse it as a map.

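Under the hood, a map like this is just a result table whose rows include coordinates. Here is a hedged sketch of such a query. It assumes Q3305213 (painting), P276 (location), and P625 (coordinate location). `#defaultView:Map` is the Query Service comment that asks for the map view instead of the table. The query shown in the talk may be built differently, and the helper name is hypothetical.

```python
# Sketch: a "paintings per location" map query in the spirit of the one shown.
# Assumed Wikidata IDs: Q3305213 painting, P276 location, P625 coordinate location.
PAINTING = "Q3305213"


def paintings_per_location_query() -> str:
    """Return SPARQL that counts paintings per location and asks for a map view."""
    return f"""#defaultView:Map
SELECT ?location ?locationLabel ?coord (COUNT(?painting) AS ?paintings) WHERE {{
  ?painting wdt:P31 wd:{PAINTING} ;  # items that are paintings
            wdt:P276 ?location .     # ... with a known location
  ?location wdt:P625 ?coord .        # that location has coordinates
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?location ?locationLabel ?coord"""


if __name__ == "__main__":
    print(paintings_per_location_query())
```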
3151
02:42:15,660 --> 02:42:20,320
So here is a map of the
world with all the paintings

3152
02:42:20,320 --> 02:42:22,060
that Wikidata knows about.

3153
02:42:22,060 --> 02:42:23,920
Not just knows
about the paintings,

3154
02:42:23,920 --> 02:42:28,180
but knows about their
location in a museum.

3155
02:42:28,180 --> 02:42:30,670
Not surprisingly,
Europe is much better

3156
02:42:30,670 --> 02:42:35,540
covered than Russia or Africa.

3157
02:42:35,540 --> 02:42:40,150
There is a huge gap in
contribution to Wikidata

3158
02:42:40,150 --> 02:42:41,740
from these countries.

3159
02:42:41,740 --> 02:42:43,780
And some of it can be fixed.

3160
02:42:43,780 --> 02:42:47,740
And of course there is much more
documentation, and much more

3161
02:42:47,740 --> 02:42:50,260
art in Europe.

3162
02:42:50,260 --> 02:42:54,280
But if we zoom in, I
don't know, Rome probably

3163
02:42:54,280 --> 02:42:55,900
has a few paintings.

3164
02:42:55,900 --> 02:42:56,400
Right?

3165
02:42:56,400 --> 02:43:00,080


3166
02:43:00,080 --> 02:43:02,288
Hello.

3167
02:43:02,288 --> 02:43:04,200
Sorry.

3168
02:43:04,200 --> 02:43:09,780
It's-- Yes.

3169
02:43:09,780 --> 02:43:13,290
Vatican City sounds
like a good bet, right?

3170
02:43:13,290 --> 02:43:14,290
I can zoom in here.

3171
02:43:14,290 --> 02:43:16,290
And I can just click
one of these dots

3172
02:43:16,290 --> 02:43:21,400
and see that at this point
there are two paintings.

3173
02:43:21,400 --> 02:43:25,270
And in this one there is one,
and it's the Archbasilica

3174
02:43:25,270 --> 02:43:27,460
of St. John Lateran.

3175
02:43:27,460 --> 02:43:31,060
Let's see, this is the
actual St. Peter's, right?

3176
02:43:31,060 --> 02:43:33,650
Sistine Chapel has 23 paintings.

3177
02:43:33,650 --> 02:43:34,330
What?

3178
02:43:34,330 --> 02:43:36,670
The Sistine Chapel has way
more than 23 paintings.

3179
02:43:36,670 --> 02:43:40,330
Correct, but 23 of them
are documented on Wikidata.

3180
02:43:40,330 --> 02:43:43,330
They have their own item
for the painting, not

3181
02:43:43,330 --> 02:43:45,280
the Sistine Chapel,
the painting has

3182
02:43:45,280 --> 02:43:49,540
an item that lists its
being in the Sistine Chapel.

3183
02:43:49,540 --> 02:43:50,950
There are 23 of those.

3184
02:43:50,950 --> 02:43:52,270
OK.

3185
02:43:52,270 --> 02:43:54,310
There is definitely
room to document

3186
02:43:54,310 --> 02:43:57,040
the rest of the artworks
in the Sistine Chapel.

3187
02:43:57,040 --> 02:43:59,740
So, again, this is just
not the kind of query

3188
02:43:59,740 --> 02:44:03,330
you were able to
make before Wikidata,

3189
02:44:03,330 --> 02:44:07,750
and it's a fairly simple
query, as you can see.

3190
02:44:07,750 --> 02:44:13,020
There are examples using
maps, like airports within 100

3191
02:44:13,020 --> 02:44:15,040
kilometers of Berlin.

3192
02:44:15,040 --> 02:44:18,310
Again using the coordinates
as a useful data point.

3193
02:44:18,310 --> 02:44:21,880
And here is a map showing me
only airports within a 100

3194
02:44:21,880 --> 02:44:25,990
kilometer radius from Berlin.

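The Query Service handles radius searches itself, for example through its `wikibase:around` service with the `wikibase:center` and `wikibase:radius` parameters. The Python below only illustrates the underlying distance test. It uses the haversine great-circle formula with a mean Earth radius of 6371 km. This is not a claim about how the service computes distances internally. The sample coordinates are made-up points, not real airport data, and the helper names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center: tuple, places: dict, radius_km: float = 100.0) -> list:
    """Names of places whose coordinates lie within radius_km of center."""
    lat0, lon0 = center
    return [name for name, (lat, lon) in places.items()
            if haversine_km(lat0, lon0, lat, lon) <= radius_km]


# Made-up sample points for illustration: one about 9 km from central Berlin,
# one several hundred km away.
SAMPLE = {"near": (52.60, 13.40), "far": (48.14, 11.58)}
print(within_radius((52.52, 13.405), SAMPLE))  # ['near']
```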
3195
02:44:25,990 --> 02:44:29,140
But I wanted to show
you the mayors query.

3196
02:44:29,140 --> 02:44:34,510
Let's click the-- oh, I just
have the wrong link here.

3197
02:44:34,510 --> 02:44:41,040
But I can still find it
here by typing mayor.

3198
02:44:41,040 --> 02:44:44,590
Here we go, largest
cities with a female mayor.

3199
02:44:44,590 --> 02:44:47,230
So this is a slightly
more complicated query.

3200
02:44:47,230 --> 02:44:53,010
But if I run it, I get the top
10, because I set limit to 10.

3201
02:44:53,010 --> 02:44:54,820
I get the top 10
cities in the world,

3202
02:44:54,820 --> 02:44:59,710
by population size, that
are currently run by women.

3203
02:44:59,710 --> 02:45:03,490
Tokyo, Mumbai, Yokohama,
Caracas, et cetera.

3204
02:45:03,490 --> 02:45:08,080
And one interesting thing that
you may want to notice here

3205
02:45:08,080 --> 02:45:10,690
is that I'm asking for cities.

3206
02:45:10,690 --> 02:45:13,660
I mean items that
are instances of city.

3207
02:45:13,660 --> 02:45:16,420
And that have a
head of government,

3208
02:45:16,420 --> 02:45:18,640
that have some
statement about who

3209
02:45:18,640 --> 02:45:28,440
is in charge, and the person in that statement
has a sex or gender, listed up here

3210
02:45:28,440 --> 02:45:29,886
as female.

3211
02:45:29,886 --> 02:45:31,510
Don't worry about
the syntax right now.

3212
02:45:31,510 --> 02:45:34,590
I just want to show you
some specific angle here.

3213
02:45:34,590 --> 02:45:37,920
And I'm further
filtering these results.

3214
02:45:37,920 --> 02:45:45,400
I only want those items where
the statement does not have

3215
02:45:45,400 --> 02:45:48,630
the qualifier "end time."

3216
02:45:48,630 --> 02:45:50,390
Why is that important?

3217
02:45:50,390 --> 02:45:56,530
Because if a city once
had a female mayor,

3218
02:45:56,530 --> 02:45:59,890
but that mayor is not the mayor
anymore, because mayors change,

3219
02:45:59,890 --> 02:46:01,600
I don't want them in this query.

3220
02:46:01,600 --> 02:46:04,990
I want a query of
cities currently having

3221
02:46:04,990 --> 02:46:05,680
a female mayor.

3222
02:46:05,680 --> 02:46:07,990
And of course Wikidata
may have historical data

3223
02:46:07,990 --> 02:46:09,880
with start and
end time, as we've

3224
02:46:09,880 --> 02:46:14,530
seen, that documents this
person was the mayor of Tokyo

3225
02:46:14,530 --> 02:46:17,170
or San Francisco
between these years.

3226
02:46:17,170 --> 02:46:18,820
But if there is no
end time, that means

3227
02:46:18,820 --> 02:46:21,520
they are currently the mayor.

3228
02:46:21,520 --> 02:46:24,490
So that's an example of
asking about a qualifier

3229
02:46:24,490 --> 02:46:28,180
of a statement, again, to get
the results we actually want.

3230
02:46:28,180 --> 02:46:31,630
If we want current mayors, it's
important to put this filter.

3231
02:46:31,630 --> 02:46:35,365
If we don't, we will get
historical female mayors

3232
02:46:35,365 --> 02:46:35,865
as well.

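The end-time filter described above can be sketched as follows. This is a reconstruction under assumptions, not the exact example query. It assumes Q515 (city), P6 (head of government), P21 (sex or gender), Q6581072 (female), P582 (end time), and P1082 (population). The key point is the prefixes. `wdt:P6` returns only the value, while `p:P6` returns the whole statement, so `ps:` can read its value and `pq:` can check its qualifiers. The function name is hypothetical.

```python
# Sketch modelled on the "largest cities with a female mayor" idea described above.
# Assumed Wikidata IDs: Q515 city, P6 head of government, P21 sex or gender,
# Q6581072 female, P582 end time (qualifier), P1082 population.
CITY = "Q515"
FEMALE = "Q6581072"


def female_mayor_query(limit: int = 10) -> str:
    """Return SPARQL for the most populous cities whose current head of government is female."""
    return f"""SELECT DISTINCT ?city ?cityLabel ?mayor ?mayorLabel ?population WHERE {{
  ?city wdt:P31/wdt:P279* wd:{CITY} ;  # instance of city (or a subclass of city)
        wdt:P1082 ?population ;        # population, used for ranking
        p:P6 ?statement .              # the full head-of-government statement
  ?statement ps:P6 ?mayor .            # ... whose value is the mayor
  ?mayor wdt:P21 wd:{FEMALE} .         # the mayor's sex or gender is female
  FILTER NOT EXISTS {{ ?statement pq:P582 ?end . }}  # no end time: still in office
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
ORDER BY DESC(?population)
LIMIT {limit}"""


if __name__ == "__main__":
    print(female_mayor_query())
```

Removing the `FILTER NOT EXISTS` line would also return former female mayors, as the speaker notes.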
3233
02:46:35,865 --> 02:46:39,920


3234
02:46:39,920 --> 02:46:40,490
All right.

3235
02:46:40,490 --> 02:46:45,380
So these are some
example queries.

3236
02:46:45,380 --> 02:46:49,085
Questions about that?

3237
02:46:49,085 --> 02:46:51,620


3238
02:46:51,620 --> 02:46:53,030
Oh, the featured
article example.

3239
02:46:53,030 --> 02:46:58,280


3240
02:46:58,280 --> 02:47:01,700
So let's look at that.

3241
02:47:01,700 --> 02:47:07,050


3242
02:47:07,050 --> 02:47:12,660
So I have prepared
such a query recently.

3243
02:47:12,660 --> 02:47:15,300
Here we go.

3244
02:47:15,300 --> 02:47:18,570
So this is a query.

3245
02:47:18,570 --> 02:47:20,472
I just saved it here
on my user page.

3246
02:47:20,472 --> 02:47:21,930
I mean, this is
not the Wikidata Query Service.

3247
02:47:21,930 --> 02:47:25,390
This is just a Meta page
that conveniently contains the query.

3248
02:47:25,390 --> 02:47:28,260


3249
02:47:28,260 --> 02:47:33,800
And let's run this.

3250
02:47:33,800 --> 02:47:38,030
So this query, it's actually
not very complicated.

3251
02:47:38,030 --> 02:47:40,030
It just has a long
list of countries,

3252
02:47:40,030 --> 02:47:42,170
because I'm asking
about African countries.

3253
02:47:42,170 --> 02:47:42,670
OK.

3254
02:47:42,670 --> 02:47:45,010
I'm looking for human
females from one

3255
02:47:45,010 --> 02:47:51,060
of these countries that
have an article in English.

3256
02:47:51,060 --> 02:47:53,010
That's what this line means.

3257
02:47:53,010 --> 02:47:55,620
But not in French.

3258
02:47:55,620 --> 02:47:57,570
That's what this part means.

3259
02:47:57,570 --> 02:47:59,170
OK.

3260
02:47:59,170 --> 02:48:01,720
This part, these
two lines together.

3261
02:48:01,720 --> 02:48:03,190
But not in French.

3262
02:48:03,190 --> 02:48:05,920
And this is what's
called a badge.

3263
02:48:05,920 --> 02:48:09,430
That's Wikidata's concept of
good and featured articles.

3264
02:48:09,430 --> 02:48:10,600
It's called a badge.

3265
02:48:10,600 --> 02:48:16,500
So I want them to have some
badge on English Wikipedia.

3266
02:48:16,500 --> 02:48:17,000
OK?

3267
02:48:17,000 --> 02:48:22,250
So again, this query is
asking for the top 100 women

3268
02:48:22,250 --> 02:48:26,150
from Africa who are documented
on English Wikipedia,

3269
02:48:26,150 --> 02:48:28,730
with featured or
good article status.

3270
02:48:28,730 --> 02:48:30,660
But not on French Wikipedia.

3271
02:48:30,660 --> 02:48:33,270
So this is a query that's
a to-do query, right?

3272
02:48:33,270 --> 02:48:35,630
That's a query
for French editors

3273
02:48:35,630 --> 02:48:40,100
to consider what they might
usefully translate or create

3274
02:48:40,100 --> 02:48:41,180
in French.

3275
02:48:41,180 --> 02:48:48,860
And if we run this, we see
that we have three results.

3276
02:48:48,860 --> 02:48:50,720
I mean, we have many
women from Africa

3277
02:48:50,720 --> 02:48:52,460
covered on English Wikipedia.

3278
02:48:52,460 --> 02:48:57,500
But only three articles
have featured or good status

3279
02:48:57,500 --> 02:49:03,460
among those that do not have
French Wikipedia coverage.

3280
02:49:03,460 --> 02:49:04,900
Let me rephrase that.

3281
02:49:04,900 --> 02:49:07,990
Among the English Wikipedia
articles about African women

3282
02:49:07,990 --> 02:49:11,170
that don't have a
French counterpart,

3283
02:49:11,170 --> 02:49:14,520
only three are featured or good.

3284
02:49:14,520 --> 02:49:16,960
OK?

3285
02:49:16,960 --> 02:49:17,640
Do you see this?

3286
02:49:17,640 --> 02:49:19,720
The badge is good article.

3287
02:49:19,720 --> 02:49:23,550
This little incantation
here is what allows

3288
02:49:23,550 --> 02:49:25,950
you to ask about the badge.

3289
02:49:25,950 --> 02:49:28,730
This here.

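Here is a sketch of this kind of "to-do" query, under stated assumptions. The talk uses a long list of African countries. This sketch instead uses country of citizenship (P27) plus continent (P30) = Africa (Q15), which may include or miss some people compared with an explicit list. It assumes the badge items Q17437796 (featured article) and Q17437798 (good article), and it reads the badge through `wikibase:badge` on the sitelink. The function name and defaults are hypothetical.

```python
# Sketch of a "to-do" query: African women with a featured or good article on
# one Wikipedia but no article at all on another.
# Assumed Wikidata IDs: Q5 human, P21 sex or gender, Q6581072 female,
# P27 country of citizenship, P30 continent, Q15 Africa,
# Q17437796 featured-article badge, Q17437798 good-article badge.
FEATURED_BADGE = "Q17437796"
GOOD_BADGE = "Q17437798"
AFRICA = "Q15"


def badge_gap_query(have: str = "en", lack: str = "fr", limit: int = 100) -> str:
    """Women from African countries badged on <have>.wikipedia.org, absent from <lack>."""
    return f"""SELECT DISTINCT ?woman ?womanLabel ?badge WHERE {{
  VALUES ?badge {{ wd:{FEATURED_BADGE} wd:{GOOD_BADGE} }}
  ?woman wdt:P31 wd:Q5 ;              # a human
         wdt:P21 wd:Q6581072 ;        # ... who is female
         wdt:P27 ?country .           # ... with a country of citizenship
  ?country wdt:P30 wd:{AFRICA} .      # ... on the continent of Africa
  ?article schema:about ?woman ;      # an article about her
           schema:isPartOf <https://{have}.wikipedia.org/> ;
           wikibase:badge ?badge .    # ... carrying one of the badges
  FILTER NOT EXISTS {{                # and no article on the other wiki
    ?other schema:about ?woman ;
           schema:isPartOf <https://{lack}.wikipedia.org/> .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "{have}". }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(badge_gap_query())
```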
3290
02:49:28,730 --> 02:49:33,420
And, by the way, the slides
will be uploaded to Commons.

3291
02:49:33,420 --> 02:49:38,708
And we will-- how shall we make
it available on the YouTube

3292
02:49:38,708 --> 02:49:39,710
thing as well?

3293
02:49:39,710 --> 02:49:42,730


3294
02:49:42,730 --> 02:49:43,230
No, no.

3295
02:49:43,230 --> 02:49:45,870
But, I mean, for people who
will later watch this video.

3296
02:49:45,870 --> 02:49:52,119


3297
02:49:52,119 --> 02:49:54,160
Oh yeah, we can add it to
the YouTube description

3298
02:49:54,160 --> 02:49:55,368
and the Commons description.

3299
02:49:55,368 --> 02:49:58,090
So in the-- if you're
watching this video later,

3300
02:49:58,090 --> 02:50:00,820
in the description, we will
add a link to this query

3301
02:50:00,820 --> 02:50:01,480
specifically.

3302
02:50:01,480 --> 02:50:03,340
Because it's not in
the slides right now.

3303
02:50:03,340 --> 02:50:03,910
It will be.

3304
02:50:03,910 --> 02:50:06,622


3305
02:50:06,622 --> 02:50:07,980
OK.

3306
02:50:07,980 --> 02:50:10,260
So.

3307
02:50:10,260 --> 02:50:13,590
Questions so far?

3308
02:50:13,590 --> 02:50:14,700
We're almost done.

3309
02:50:14,700 --> 02:50:16,260
We have a few minutes left.

3310
02:50:16,260 --> 02:50:18,090
So questions about queries?

3311
02:50:18,090 --> 02:50:20,130
I mean, I'm sure
there are tons of things

3312
02:50:20,130 --> 02:50:21,510
you don't know how to do yet.

3313
02:50:21,510 --> 02:50:24,720
And maybe you didn't really
get a feel for SPARQL.

3314
02:50:24,720 --> 02:50:27,120
It's something you need
to really do on your own

3315
02:50:27,120 --> 02:50:28,290
on your computer.

3316
02:50:28,290 --> 02:50:29,465
See how it works.

3317
02:50:29,465 --> 02:50:30,090
Fiddle with it.

3318
02:50:30,090 --> 02:50:30,900
Change something.

3319
02:50:30,900 --> 02:50:33,270
See that it breaks
and complains.

3320
02:50:33,270 --> 02:50:37,470
But, very importantly-- oh, I
had this in the other questions

3321
02:50:37,470 --> 02:50:38,340
slide.

3322
02:50:38,340 --> 02:50:42,480
Remember the Wikidata Project chat.

3323
02:50:42,480 --> 02:50:45,810
That's kind of the Wikidata
equivalent of the village pump.

3324
02:50:45,810 --> 02:50:47,790
It's the page on Wikidata
where you can just

3325
02:50:47,790 --> 02:50:49,830
show up and ask a question.

3326
02:50:49,830 --> 02:50:52,290
In my experience, the
Wikidata community

3327
02:50:52,290 --> 02:50:55,410
is very nice, very
welcoming, and very eager

3328
02:50:55,410 --> 02:51:00,100
to help newer people integrate
and learn how to do things.

3329
02:51:00,100 --> 02:51:01,800
There's also an IRC channel.

3330
02:51:01,800 --> 02:51:04,260
If you know what IRC is and
how to use it, by all means,

3331
02:51:04,260 --> 02:51:07,890
go to the #wikidata IRC channel.

3332
02:51:07,890 --> 02:51:09,330
There are people
there all the time,

3333
02:51:09,330 --> 02:51:11,040
and you can just ask a question.

3334
02:51:11,040 --> 02:51:13,245
If you're trying to do a
query, and you don't quite

3335
02:51:13,245 --> 02:51:15,870
understand the syntax, or you're
not sure how to get the result

3336
02:51:15,870 --> 02:51:16,680
you want.

3337
02:51:16,680 --> 02:51:20,050
There are people there who
will gladly help you do that.

3338
02:51:20,050 --> 02:51:22,560
There is also a
Wikidata newsletter

3339
02:51:22,560 --> 02:51:25,680
published by the Wikidata team,
which is centered in Germany

3340
02:51:25,680 --> 02:51:27,330
at Wikimedia Germany.

3341
02:51:27,330 --> 02:51:31,890
And they send out a newsletter
in English with Wikidata news.

3342
02:51:31,890 --> 02:51:33,570
You know, new
properties, new items,

3343
02:51:33,570 --> 02:51:34,920
new things in the project.

3344
02:51:34,920 --> 02:51:36,840
But also sample queries.

3345
02:51:36,840 --> 02:51:39,300
So once a week there is
kind of an awesome query

3346
02:51:39,300 --> 02:51:43,440
to learn from, if you want
to learn that way instead

3347
02:51:43,440 --> 02:51:46,230
of reading a
whole manual on SPARQL.

3348
02:51:46,230 --> 02:51:48,300
So I'm just encouraging
you to get help

3349
02:51:48,300 --> 02:51:49,470
in one of those channels.

3350
02:51:49,470 --> 02:51:51,000
Of course you can write to me.

3351
02:51:51,000 --> 02:51:55,920
Just reach out to me and
ask me questions as well.

3352
02:51:55,920 --> 02:51:58,860
I hope by now you agree
that Wikidata is love,

3353
02:51:58,860 --> 02:52:03,150
and Wikidata is awesome.

3354
02:52:03,150 --> 02:52:06,480
If there are no questions,
we do have a tiny bit of time

3355
02:52:06,480 --> 02:52:11,510
to demonstrate one
more tool but that's--

3356
02:52:11,510 --> 02:52:12,010
no?

3357
02:52:12,010 --> 02:52:13,170
No questions.

3358
02:52:13,170 --> 02:52:17,600
OK so let's talk about--

3359
02:52:17,600 --> 02:52:19,100
well, the Reasonator
is kind of nice,

3360
02:52:19,100 --> 02:52:22,890
but it's a little like
the article placeholder.

3361
02:52:22,890 --> 02:52:25,530
So this is not Wikidata;
this is, again, a tool

3362
02:52:25,530 --> 02:52:26,805
built by Magnus Manske--

3363
02:52:26,805 --> 02:52:29,310
AUDIENCE: There's also one
final question to you in case--

3364
02:52:29,310 --> 02:52:29,820
ASAF BARTOV: Oh,
there is a question.

3365
02:52:29,820 --> 02:52:30,390
AUDIENCE: Yeah.

3366
02:52:30,390 --> 02:52:32,348
ASAF BARTOV: What are the
advantages and disadvantages

3367
02:52:32,348 --> 02:52:35,370
of creating an item
before an article is

3368
02:52:35,370 --> 02:52:37,920
done on English Wikipedia?

3369
02:52:37,920 --> 02:52:42,340
Well, I mean, this example
that I just made, right?

3370
02:52:42,340 --> 02:52:46,960
I'm reading this book
by a notable author.

3371
02:52:46,960 --> 02:52:47,810
OK.

3372
02:52:47,810 --> 02:52:51,400
I want this to
exist on Wikidata,

3373
02:52:51,400 --> 02:52:53,320
and to be mentioned
on Wikidata, so

3374
02:52:53,320 --> 02:52:56,950
that when people look up
that author in Wikidata

3375
02:52:56,950 --> 02:52:59,170
they will know about one
of his notable works.

3376
02:52:59,170 --> 02:53:02,470
But I'm not prepared to
put in the time investment

3377
02:53:02,470 --> 02:53:05,670
to build a whole article
on English Wikipedia.

3378
02:53:05,670 --> 02:53:07,420
Either because I don't
have the time, or I

3379
02:53:07,420 --> 02:53:09,040
don't have good sources.

3380
02:53:09,040 --> 02:53:11,560
Or maybe my English
is not good enough,

3381
02:53:11,560 --> 02:53:14,980
but it is good enough to just
record these very basic facts

3382
02:53:14,980 --> 02:53:17,850
and point to the Library of
Congress records, et cetera.

3383
02:53:17,850 --> 02:53:20,170
So that it's better
than nothing.

3384
02:53:20,170 --> 02:53:23,170
So that's one reason
to maybe do it.

3385
02:53:23,170 --> 02:53:26,690
Another reason is to
be able to link to it.

3386
02:53:26,690 --> 02:53:30,190
So remember that
translator lady already

3387
02:53:30,190 --> 02:53:33,280
had an item on Wikidata, but if
she hadn't we could have just

3388
02:53:33,280 --> 02:53:38,560
created a very, very basic
rudimentary item about her just

3389
02:53:38,560 --> 02:53:41,740
saying, you know,
this name is human.

3390
02:53:41,740 --> 02:53:43,060
Country, Bulgaria.

3391
02:53:43,060 --> 02:53:45,220
Occupation, translator.

3392
02:53:45,220 --> 02:53:48,580
Even just that
would have been something,

3393
02:53:48,580 --> 02:53:51,610
and would have enabled me
to link to this person.

3394
02:53:51,610 --> 02:53:56,860
So these are legitimate reasons
to create Wikidata entities

3395
02:53:56,860 --> 02:54:01,510
without, or at least before,
creating a Wikipedia article.

3396
02:54:01,510 --> 02:54:02,709
If you are going to create--

3397
02:54:02,709 --> 02:54:04,750
I mean, if you're at an
edit-a-thon or something,

3398
02:54:04,750 --> 02:54:07,690
and you have come to
create Wikipedia articles,

3399
02:54:07,690 --> 02:54:10,660
by all means, first create
the Wikipedia article,

3400
02:54:10,660 --> 02:54:13,982
then create the Wikidata
item and link to it.

3401
02:54:13,982 --> 02:54:17,580


3402
02:54:17,580 --> 02:54:20,500
I hope that answers
the question.

3403
02:54:20,500 --> 02:54:24,940
So the Reasonator
is simply a kind

3404
02:54:24,940 --> 02:54:31,330
of prettier view of
items in Wikidata.

3405
02:54:31,330 --> 02:54:35,980
So you can just type the name
of an item or the number.

3406
02:54:35,980 --> 02:54:39,010
Let's pick just a
random number, 42.

3407
02:54:39,010 --> 02:54:39,595
Say 42.

3408
02:54:39,595 --> 02:54:42,770


3409
02:54:42,770 --> 02:54:45,950
Which happens to
be, maybe you've

3410
02:54:45,950 --> 02:54:51,310
heard of this guy,
Douglas Adams.

3411
02:54:51,310 --> 02:54:55,490
He happened to have received
the Q number 42.

3412
02:54:55,490 --> 02:54:58,760
I'm sure it's a
cosmic coincidence

3413
02:54:58,760 --> 02:55:01,460
of infinite improbability.

3414
02:55:01,460 --> 02:55:03,470
And this is a view--

3415
02:55:03,470 --> 02:55:05,690
this is a tool that
is not Wikidata.

3416
02:55:05,690 --> 02:55:09,690
It's a tool built on top of
Wikidata called Reasonator.

3417
02:55:09,690 --> 02:55:14,750
And it gives us the information
from Q42, that is from the--

3418
02:55:14,750 --> 02:55:18,800
this item in Wikidata, which
looks like an item in Wikidata.

3419
02:55:18,800 --> 02:55:21,320
But it gives it to us in a
slightly more rational kind

3420
02:55:21,320 --> 02:55:22,430
of layout.

3421
02:55:22,430 --> 02:55:24,200
It even kind of
generates a little bit

3422
02:55:24,200 --> 02:55:27,620
of pseudo article text for us.

3423
02:55:27,620 --> 02:55:30,429
You know, Douglas Adams was
a British writer, playwright,

3424
02:55:30,429 --> 02:55:31,970
screenwriter,
bla-bla-bla, an author.

3425
02:55:31,970 --> 02:55:35,630
He was born on this date, in
this place, to these people.

3426
02:55:35,630 --> 02:55:39,080
He studied at this place
between these years.

3427
02:55:39,080 --> 02:55:40,670
That's all machine generated.

3428
02:55:40,670 --> 02:55:42,230
Nobody wrote this text.

3429
02:55:42,230 --> 02:55:46,330
That's all taken from those
statements in Wikidata,

3430
02:55:46,330 --> 02:55:51,080
and it generates this reasonably
readable summary paragraph.

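Reasonator's own code is not shown here. As a hypothetical sketch of the idea, this Python turns a few simplified statement values into sentences. The input dictionary is a made-up structure, not Wikidata's JSON format. The sample facts use the description from the talk plus Adams's birth date and place.

```python
# Hypothetical sketch, not Reasonator's code: build a short summary paragraph
# from a simplified, made-up dictionary of statement values.


def join_list(items: list) -> str:
    """Join ['a'] -> 'a', ['a', 'b'] -> 'a and b', ['a', 'b', 'c'] -> 'a, b and c'."""
    if len(items) <= 1:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


def summary(facts: dict) -> str:
    """One sentence for nationality and occupations, one for birth if both parts are known."""
    sentences = [
        f"{facts['label']} was a {facts['nationality']} "
        f"{join_list(facts['occupations'])}."
    ]
    if "birth_date" in facts and "birth_place" in facts:
        sentences.append(
            f"{facts['label']} was born on {facts['birth_date']} "
            f"in {facts['birth_place']}."
        )
    return " ".join(sentences)


DOUGLAS_ADAMS = {
    "label": "Douglas Adams",
    "nationality": "British",
    "occupations": ["writer", "playwright", "screenwriter"],
    "birth_date": "11 March 1952",
    "birth_place": "Cambridge",
}

print(summary(DOUGLAS_ADAMS))
# Douglas Adams was a British writer, playwright and screenwriter. Douglas Adams was born on 11 March 1952 in Cambridge.
```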
3431
02:55:51,080 --> 02:55:54,140
And then it gives us this
little table of relatives.

3432
02:55:54,140 --> 02:55:55,610
It's all taken from Wikidata.

3433
02:55:55,610 --> 02:55:57,740
But as you can see,
this is already

3434
02:55:57,740 --> 02:56:02,120
a little more accessible than
the essentially arbitrary

3435
02:56:02,120 --> 02:56:05,120
ordering of statements
on Wikidata.

3436
02:56:05,120 --> 02:56:06,200
And that's OK.

3437
02:56:06,200 --> 02:56:08,060
I mean, that's
kind of by design.

3438
02:56:08,060 --> 02:56:10,100
Wikidata is the platform.

3439
02:56:10,100 --> 02:56:11,960
There is going to
be-- there are going

3440
02:56:11,960 --> 02:56:15,680
to be many new applications,
and platforms, and tools,

3441
02:56:15,680 --> 02:56:19,010
and visual interfaces
on top of Wikidata

3442
02:56:19,010 --> 02:56:23,000
to browse Wikidata in a more
friendly or more customized

3443
02:56:23,000 --> 02:56:24,480
ways.

3444
02:56:24,480 --> 02:56:27,080
For example, one of the
things that Reasonator

3445
02:56:27,080 --> 02:56:31,610
does for us is give us pictures
and maps and a timeline.

3446
02:56:31,610 --> 02:56:32,960
Check this out.

3447
02:56:32,960 --> 02:56:38,990
Timeline, machine generated,
just from dates and points

3448
02:56:38,990 --> 02:56:44,090
in time, mentioned in the
relatively rich Wikidata

3449
02:56:44,090 --> 02:56:47,200
item about Douglas Adams.

3450
02:56:47,200 --> 02:56:47,700
Right?

3451
02:56:47,700 --> 02:56:50,030
So this timeline, for example,
again, completely machine

3452
02:56:50,030 --> 02:56:51,140
generated.

3453
02:56:51,140 --> 02:56:53,270
But he was educated
between these years,

3454
02:56:53,270 --> 02:56:54,920
so I can put it on the timeline.

3455
02:56:54,920 --> 02:56:57,260
And this is the year he was
nominated for a Hugo Award,

3456
02:56:57,260 --> 02:56:59,570
so I can put that in a timeline.

3457
02:56:59,570 --> 02:57:00,600
Et cetera.

3458
02:57:00,600 --> 02:57:03,050
So that's just a super
quick demonstration

3459
02:57:03,050 --> 02:57:06,620
of that tool, the Reasonator.

3460
02:57:06,620 --> 02:57:10,310
Links are all here
in the slides.

3461
02:57:10,310 --> 02:57:13,390
And the final tool I wanted
to mention very quickly

3462
02:57:13,390 --> 02:57:16,220
is the Mix'n'match tool.

3463
02:57:16,220 --> 02:57:21,980
You remember my explanation
about Wikidata as a nexus,

3464
02:57:21,980 --> 02:57:27,320
as a connection point between many
databases, many data sources.

3465
02:57:27,320 --> 02:57:31,080
Those depend on
these equivalencies.

3466
02:57:31,080 --> 02:57:35,300
On Wikidata being taught
that this item is like that

3467
02:57:35,300 --> 02:57:37,940
ID in this other database.

3468
02:57:37,940 --> 02:57:41,810
And Mix'n'match is a tool,
again, by Magnus Manske.

3469
02:57:41,810 --> 02:57:44,690
Maybe you're detecting
a pattern here.

3470
02:57:44,690 --> 02:57:47,390
It's a tool by Magnus
that is designed

3471
02:57:47,390 --> 02:57:50,270
to enable us to kind
of take a foreign,

3472
02:57:50,270 --> 02:57:54,950
an external data set, put
it alongside Wikidata,

3473
02:57:54,950 --> 02:57:56,690
and kind of try and align them.

3474
02:57:56,690 --> 02:57:59,410
So this item in this
external dataset,

3475
02:57:59,410 --> 02:58:01,230
is that already
covered in Wikidata?

3476
02:58:01,230 --> 02:58:02,890
If so, by what Q number?

3477
02:58:02,890 --> 02:58:03,890
By what item?

3478
02:58:03,890 --> 02:58:06,170
If not, maybe we need
to create a Wikidata

3479
02:58:06,170 --> 02:58:07,610
item to represent it.

3480
02:58:07,610 --> 02:58:10,010
Or maybe it's a
duplicate, or something.

3481
02:58:10,010 --> 02:58:15,980
So the Mix'n'match tool has
a list of external data sets,

3482
02:58:15,980 --> 02:58:18,140
as you can see.

3483
02:58:18,140 --> 02:58:21,260
The Art and Architecture
Thesaurus by the Getty Research

3484
02:58:21,260 --> 02:58:22,220
Institute.

3485
02:58:22,220 --> 02:58:26,690
Or the Australian
Dictionary of Biography.

3486
02:58:26,690 --> 02:58:28,880
All kinds of external
data sets here.

3487
02:58:28,880 --> 02:58:32,470


3488
02:58:32,470 --> 02:58:40,060
Somewhere here I had a specific
link to the Royal Society.

3489
02:58:40,060 --> 02:58:41,710
It can also give
me some statistics.

3490
02:58:41,710 --> 02:58:47,410
So there is an external data set
of all the Fellows of the Royal

3491
02:58:47,410 --> 02:58:48,001
Society.

3492
02:58:48,001 --> 02:58:48,500
Right?

3493
02:58:48,500 --> 02:58:54,970
The oldest academic
learned society in England.

3494
02:58:54,970 --> 02:58:57,415
And the internet is tired.

3495
02:58:57,415 --> 02:59:03,240


3496
02:59:03,240 --> 02:59:04,640
Here we go.

3497
02:59:04,640 --> 02:59:07,115
Nope.

3498
02:59:07,115 --> 02:59:08,105
Did that work?

3499
02:59:08,105 --> 02:59:12,560


3500
02:59:12,560 --> 02:59:15,390
Fellows of the Royal
Society, here we go.

3501
02:59:15,390 --> 02:59:17,970
So this one is complete.

3502
02:59:17,970 --> 02:59:21,330
I mean, people have manually
gone over every single item

3503
02:59:21,330 --> 02:59:24,330
there and either
matched it to Wikidata

3504
02:59:24,330 --> 02:59:27,390
or declared that it was not
in scope, or a duplicate

3505
02:59:27,390 --> 02:59:28,520
or whatever.

3506
02:59:28,520 --> 02:59:31,230
But let's look at site stats.

3507
02:59:31,230 --> 02:59:35,210
This is a fun kind of
aspect of this tool.

3508
02:59:35,210 --> 02:59:38,530
But that is not working.

3509
02:59:38,530 --> 02:59:40,820
Or it's taking too long.

3510
02:59:40,820 --> 02:59:43,940
So let's just demonstrate
how this works.

3511
02:59:43,940 --> 02:59:45,590
Maybe Britannica?

3512
02:59:45,590 --> 02:59:46,780
Is that done already?

3513
02:59:46,780 --> 02:59:52,570


3514
02:59:52,570 --> 02:59:53,990
Here we go.

3515
02:59:53,990 --> 02:59:55,330
Encyclopedia Britannica.

3516
02:59:55,330 --> 02:59:55,960
Yeah.

3517
02:59:55,960 --> 03:00:02,040
So for the Encyclopedia
Britannica set,

3518
03:00:02,040 --> 03:00:05,940
40% of the items there
are not yet processed.

3519
03:00:05,940 --> 03:00:07,830
So let's process one of them.

3520
03:00:07,830 --> 03:00:16,180
For example, there is an item
in the Encyclopedia Britannica

3521
03:00:16,180 --> 03:00:19,960
called Boston, England.

3522
03:00:19,960 --> 03:00:23,050
As you know,
all American place names

3523
03:00:23,050 --> 03:00:26,050
are totally stolen
from elsewhere.

3524
03:00:26,050 --> 03:00:29,440
So there is a Boston
in England, though it's

3525
03:00:29,440 --> 03:00:30,700
no longer the famous one.

3526
03:00:30,700 --> 03:00:36,340
And the Mix'n'match
tool has automatically

3527
03:00:36,340 --> 03:00:39,610
matched it based on
the label to Q100,

3528
03:00:39,610 --> 03:00:43,900
which is Boston, the big
city in the United States.

3529
03:00:43,900 --> 03:00:45,500
And that is incorrect, right?

3530
03:00:45,500 --> 03:00:48,910
That's kind of a naive computer
going, well, this is Boston,

3531
03:00:48,910 --> 03:00:50,820
and this other thing
is also Boston.

3532
03:00:50,820 --> 03:00:56,260
And it is asking me to
confirm this match or not.

3533
03:00:56,260 --> 03:00:57,400
You see?

3534
03:00:57,400 --> 03:01:01,120
So this is Boston,
England, from Britannica.

3535
03:01:01,120 --> 03:01:04,720
And the tool is asking
me, is this the same as

3536
03:01:04,720 --> 03:01:06,910
Boston, Q100, in America?

3537
03:01:06,910 --> 03:01:07,990
The answer is no.

3538
03:01:07,990 --> 03:01:10,110
I remove this.

3539
03:01:10,110 --> 03:01:11,860
I remove this match.

3540
03:01:11,860 --> 03:01:15,430
And now this Boston,
England, is unmatched.

3541
03:01:15,430 --> 03:01:23,230
And I can match it to the
correct one in England.

3542
03:01:23,230 --> 03:01:27,370
I can do this by searching
English Wikipedia,

3543
03:01:27,370 --> 03:01:28,780
or searching Wikidata.

3544
03:01:28,780 --> 03:01:32,000
I mean, it has
these handy links.

3545
03:01:32,000 --> 03:01:36,910
So the English town
is in Lincolnshire.

3546
03:01:36,910 --> 03:01:38,230
Boston, Lincolnshire.

3547
03:01:38,230 --> 03:01:46,030
So I can go there and then
get the Wikidata item number.

3548
03:01:46,030 --> 03:01:49,810
See, this is not Q100,
Boston in the States,

3549
03:01:49,810 --> 03:01:53,440
this is Q311975,
a town in Lincolnshire.

3550
03:01:53,440 --> 03:01:57,310
I can get this Q
number, go back to the

3551
03:01:57,310 --> 03:01:58,160
Mix'n'match tool--

3552
03:01:58,160 --> 03:01:59,110
Where was that?

3553
03:01:59,110 --> 03:02:00,180
Here we are.

3554
03:02:00,180 --> 03:02:01,510
And set Q.

3555
03:02:01,510 --> 03:02:08,650
I can tell the tool that this is
the right Boston, and click OK.

3556
03:02:08,650 --> 03:02:14,550
And now this town
in Lincolnshire,

3557
03:02:14,550 --> 03:02:17,100
you can see this here,
this item, Q311975,

3558
03:02:17,100 --> 03:02:21,190
is linked to Britannica.

3559
03:02:21,190 --> 03:02:22,660
What does this mean?

3560
03:02:22,660 --> 03:02:23,820
Well, if we go there.

3561
03:02:23,820 --> 03:02:25,380
If we actually go
to the Wikidata

3562
03:02:25,380 --> 03:02:28,890
entity, you will see
that in addition

3563
03:02:28,890 --> 03:02:34,140
to the few statements that
it already had, it now has,

3564
03:02:34,140 --> 03:02:38,610
thanks to my clicking, it now
has another identifier here.

3565
03:02:38,610 --> 03:02:39,270
See?

3566
03:02:39,270 --> 03:02:43,950
Encyclopedia Britannica
Online ID, with this link.

3567
03:02:43,950 --> 03:02:49,440
And if we click it, we
will indeed reach this page

3568
03:02:49,440 --> 03:02:51,510
in the Britannica
Online, which is indeed

3569
03:02:51,510 --> 03:02:53,700
about this town in Lincolnshire.

3570
03:02:53,700 --> 03:02:54,510
You see?

3571
03:02:54,510 --> 03:02:58,650
So I've contributed one
of those mappings, one

3572
03:02:58,650 --> 03:03:01,950
of those identifiers,
into Wikidata.

3573
03:03:01,950 --> 03:03:04,860
And I didn't have
to do it manually.

3574
03:03:04,860 --> 03:03:07,980
This tool kind of prompted
me to confirm the match.

3575
03:03:07,980 --> 03:03:09,480
If it was correct,
I could have just

3576
03:03:09,480 --> 03:03:12,150
clicked Confirm. Since
it wasn't correct,

3577
03:03:12,150 --> 03:03:16,920
I corrected it manually, but
it made this edit on my behalf.

3578
03:03:16,920 --> 03:03:21,180
So that's another tool that
encourages us to systematically

3579
03:03:21,180 --> 03:03:24,360
teach Wikidata more things.

3580
03:03:24,360 --> 03:03:25,860
And we're out of time.

3581
03:03:25,860 --> 03:03:29,430
Go edit Wikidata. Now
that you have the power,

3582
03:03:29,430 --> 03:03:30,510
you know the deal.

3583
03:03:30,510 --> 03:03:32,430
Use it for good,
and not for evil.

3584
03:03:32,430 --> 03:03:35,640
If you have questions,
this is my email address.

3585
03:03:35,640 --> 03:03:38,640
If you're watching this video
not live, the description

3586
03:03:38,640 --> 03:03:41,610
will have links to the
slides, and to a bunch

3587
03:03:41,610 --> 03:03:44,610
of other useful
pieces of information.

3588
03:03:44,610 --> 03:03:49,510
Any last questions on IRC?

3589
03:03:49,510 --> 03:03:53,210
If not, thank you
for your attention.

3590
03:03:53,210 --> 03:03:56,470
And if you like this, and if you
feel that you now get Wikidata,

3591
03:03:56,470 --> 03:03:58,330
and you get what it's
good for, and you're

3592
03:03:58,330 --> 03:04:01,660
inspired to contribute, I have
only one request of you.

3593
03:04:01,660 --> 03:04:04,960
I mean, in addition to using
it for good, not for evil,

3594
03:04:04,960 --> 03:04:07,630
I ask that you spread the word.

3595
03:04:07,630 --> 03:04:09,550
Show this video--
share this video

3596
03:04:09,550 --> 03:04:13,180
with other people in your
community, or around you.

3597
03:04:13,180 --> 03:04:16,000
Teach this yourself
once you're comfortable

3598
03:04:16,000 --> 03:04:17,650
with these concepts.

3599
03:04:17,650 --> 03:04:21,330
Feel free to use my slides.

3600
03:04:21,330 --> 03:04:23,580
Yeah, and edit Wikidata.

3601
03:04:23,580 --> 03:04:27,010
Thank you very
much, and goodbye.

3602
03:04:27,010 --> 03:04:32,456