Commons:Media Data Verification Tool/Community Consultation

From Wikimedia Commons, the free media repository


Community needs finding[edit]

Screenshot for the "Media Data Verification Tool (MDVT) Community Suggestions Questionnaire"

We decided to consult the community before developing the tool. The main open question was which method of user verification (ensuring that the data processed through this tool is correct) would work best. We also wanted to gather general feedback on the tool, such as any special features the community would like to see. For this reason, we published a questionnaire. Invitations to complete the survey were posted on the Commons village pump and technical village pump, and on the talk pages of users involved in the ISA tool (mainly those who reported ISA bugs on Phabricator) or in structured data (those following the structured data project on Phabricator).

Results[edit]

We received 16 submissions. The results of each question, with analysis, are listed below.

Complexity of the current method[edit]

Question:

How complicated is it for a volunteer to manually go over images on Commons and verify their data (caption / depicts / etc.)?

Results:

Complexity           Number of responses
Very simple          2
Fairly simple        2
50 / 50              6
Fairly complicated   4
Very complicated     2

Results analysis:

2.125 / 4

On average, respondents placed the current method roughly in the middle between simple and complicated, tilting slightly toward the complicated end.
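The 2.125 / 4 figure can be reproduced from the response counts above, assuming (a plausible but unstated scoring) that the five options are scored 0 ("very simple") through 4 ("very complicated"):

```python
# Mean complexity score, assuming options are scored 0-4 in order.
responses = {
    "Very simple": 2,
    "Fairly simple": 2,
    "50 / 50": 6,
    "Fairly complicated": 4,
    "Very complicated": 2,
}

total = sum(responses.values())  # 16 submissions
mean = sum(score * n for score, n in enumerate(responses.values())) / total
print(mean)  # 2.125
```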

MDVT data type coverage[edit]

Question:

How important is the verification of each type of data using the tool?

Results:

Data type Score
(Multilingual) captions 0.714
Depicts 0.964
Other data 0.727
How the above scores were calculated: the question offered four possible answers for each data type: "Not important at all" (A), "50 / 50" (B), "Extremely important" (C) and "I don't know" (D). The score is calculated as [ A count × 0 + B count × 1 + C count × 2 ] / [ (A count + B count + C count) × 2 ], so "I don't know" answers are excluded and a score of 1 means every remaining respondent answered "Extremely important".
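The scoring formula can be sketched as a small function. Note the counts in the usage line are hypothetical, since the questionnaire report publishes only the resulting scores, not the raw answer counts:

```python
def importance_score(a, b, c, d=0):
    """Importance score from the consultation's formula:
    (A*0 + B*1 + C*2) / ((A + B + C) * 2).
    "I don't know" answers (d) are excluded entirely."""
    return (b * 1 + c * 2) / ((a + b + c) * 2)

# Hypothetical example: 1 "not important", 5 "50/50", 8 "extremely important"
print(importance_score(a=1, b=5, c=8))  # 0.75
```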

Other data types to be covered[edit]

Question:

Is there any "other data" that is important to include in this tool?

Results:

  • 1 / 2 other types of data, so that people don't mix bad data (like location) in with depicts
  • Location data
  • To me captions and depicts are like labels and depicts statements on Wikidata: somewhat useful, but an extremely tiny fraction of the information stored
  • Licence, Date, Location, Limited amount of EXIF data
  • The importance of data varies wildly with the data and people's perception of such over time
  • Categories
  • Some image metadata probably (typically licence)

User verification method[edit]

Question:

Which of the following methods do you think is most appropriate to ensure data entered through this tool is correct?

Results:

  • Show a question to multiple users and save the majority choice: 8
  • Show users test questions (maybe 1 test in every 10 real questions) to see whether they choose the right answer: 4
  • Only allow "trusted" users to use the tool (maybe requiring a certain number of edits and/or a certain account age): 2
  • Have users verify/approve others' entries: 1

Additional suggestion: captcha

Results analysis:

Most respondents (8, 50%) chose showing each question to multiple users and saving the majority choice. While that option is clearly the most popular, we still have to consider the other options and strike a balance between what users want and what works best.

Other comments[edit]

Question:

Anything else? Any special features you'd like to see in the tool? Anything you think we should be aware of while developing the tool? Any comments on the current user interface?

Results:

  • We suggest that people be encouraged to confirm depictions of things by mapping the image region to the depicts statement, as in https://wd-image-positions.toolforge.org/file/Effigie_du_chevalier_Philippe_Pot_(d%C3%A9tails)..jpg
  • We should also recommend the removal of depicts statements that are in the same instance of/subclass of tree -- i.e. if both insect and a specific species of bug are on the item, the tool should suggest removing the redundant depicts statement.
  • The difficulty of verification (the first question here) can vary widely: most users can identify an image depicting "stairs", but depictions of a specific taxon of a life form may prove challenging unless it is especially notable. To that end, why not add some sort of artificial aid for verifying users, like Google image search results? Then if verifiers see an image of Selma Elloumi Rekik, they can compare it with other similarly tagged items elsewhere and make a useful verification instead of skipping things they know nothing about.
  • Maybe a better way would be to show ~20 images with some depict label at a time in a mosaic and ask to select the images where the label is wrong
  • Is this tool connected to the ISA tool? It looks like it has a similar purpose. / The design doesn't look very wiki for now.