User:AbdealiJK/file-metadata/GSoC2016


This page is the final report for the GSoC 2016 project "Port catimages.py to pywikibot-core".

Aim

The aim of the project was to port the functionality of the catimages.py script from pywikibot-compat to pywikibot-core. catimages.py used various methods to identify the categories an image belongs to, including reading metadata such as EXIF and using computer vision to detect faces.

While doing this, some of the key aims were:

  • To make the code more stable, as the earlier catimages was considered more of a proof of concept
  • To use the latest frameworks and libraries where possible, rather than the older ones used in the earlier script
  • To ensure that all dependencies were well covered by unit tests and continuous integration (CI) and had an active maintainer
  • To clean up the catimages code and add CI and unit tests so that other contributors can extend it easily
  • To make catimages user-friendly, with an easy installation procedure to get it up and running

Work done during project

As part of the project, we created a new GitHub repository that contains all the code of the project. This repository is going to be the official location of the catimages script.

Final Results

The script was run on each day of one week, and the categories it found, along with related statistics, are documented here. These statistics were obtained by running the log-bot binary provided by file-metadata on ToolLabs.

The detailed logs can be seen at:

Notes:

  • For uploaded files, the script ran on every 5th file only, because the total number of uploads is too large (about 10,000 per day)
  • To understand the quality of the categorization, the categories found were grouped into 3 buckets:
    • Type: Categories that describe what type of image it is, along with other metadata about the image (for example, JPEG file, Graphic, Taken with <camera model>)
    • Content: Categories that describe what is seen in the image (for example, Faces)
    • Location: Categories related to the location where the image was taken (using GPS)
  • The tables rendered below may not be optimal for small displays
  • The uncategorized images were analyzed on 19 Aug 2016. Some of them may have been categorized since then.
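
The sampling and bucketing described in the notes above can be sketched in Python. This is a minimal illustration and not the file-metadata implementation: the function names, the keyword rules that assign a category to a bucket (including the made-up "Location:" prefix and the example category names), and the reading of each reported value as the percentage of analyzed files with at least one category in a bucket are all assumptions.

```python
from collections import Counter

BUCKETS = ("Type", "Content", "Location")


def sample_every_nth(files, n=5):
    """Keep the 1st, (n+1)-th, (2n+1)-th, ... file to bound the daily workload."""
    return files[::n]


def bucket_of(category):
    """Assign a category name to a bucket using hypothetical keyword rules."""
    if category.startswith(("JPEG", "Graphic", "Taken with")):
        return "Type"
    if category.startswith("Location:"):  # made-up prefix, for illustration only
        return "Location"
    return "Content"


def bucket_percentages(categories_by_file):
    """Percentage of files that have at least one category in each bucket."""
    total = len(categories_by_file)
    if total == 0:
        return {bucket: 0.0 for bucket in BUCKETS}
    counts = Counter()
    for categories in categories_by_file.values():
        # A set ensures each file counts at most once per bucket.
        for bucket in {bucket_of(c) for c in categories}:
            counts[bucket] += 1
    return {bucket: round(100 * counts[bucket] / total, 2) for bucket in BUCKETS}


# Hypothetical input: 20 uploads, of which every 5th is analyzed.
uploads = ["File %d.jpg" % i for i in range(1, 21)]
sampled = sample_every_nth(uploads)  # File 1, File 6, File 11, File 16

results = {
    "File 1.jpg": ["JPEG files", "Faces"],
    "File 6.jpg": ["JPEG files"],
    "File 11.jpg": ["JPEG files", "Location: Example City"],
    "File 16.jpg": ["Taken with ExampleCam"],
}
print(bucket_percentages(results))
# {'Type': 100.0, 'Content': 25.0, 'Location': 25.0}
```

Under this reading, a Type value of 100 simply means that every analyzed file received at least one type or metadata category.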

Buckets captured over days (values in percent)

[Charts: "All uploaded images" and "Uncategorized images", showing the Type, Content, and Location buckets per day from 20 Jul to 26 Jul]

All uploaded images

Bucket    20 Jul 2016  21 Jul 2016  22 Jul 2016  23 Jul 2016  24 Jul 2016  25 Jul 2016  26 Jul 2016
Type      100          100          100          100          100          100          100
Content   21.24        18.49        19.12        22.26        13.50        23.16        18.03
Location  9.75         6.97         10.27        8.41         11.02        11.38        11.34

Uncategorized images

Bucket    20 Jul 2016  21 Jul 2016  22 Jul 2016  23 Jul 2016  24 Jul 2016  25 Jul 2016  26 Jul 2016
Type      100          100          100          100          100          100          100
Content   18.76        30.01        26.71        11.57        38.62        20.94        31.85
Location  6.69         12.50        5.02         6.22         3.11         2.64         4.55

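
To compare the two sets at a glance, the daily Content and Location values above can be averaged over the seven days. The short Python sketch below does only that arithmetic. The variable and function names are illustrative, it assumes the first table belongs to "All uploaded images" and the second to "Uncategorized images", and the unweighted mean across days is a choice made here, not a figure reported by the project.

```python
from statistics import mean

# Daily values copied from the tables above (20 Jul to 26 Jul 2016).
all_uploads = {
    "Content": [21.24, 18.49, 19.12, 22.26, 13.50, 23.16, 18.03],
    "Location": [9.75, 6.97, 10.27, 8.41, 11.02, 11.38, 11.34],
}
uncategorized = {
    "Content": [18.76, 30.01, 26.71, 11.57, 38.62, 20.94, 31.85],
    "Location": [6.69, 12.50, 5.02, 6.22, 3.11, 2.64, 4.55],
}


def weekly_means(table):
    """Unweighted mean of each bucket's daily values, rounded to 2 decimals."""
    return {bucket: round(mean(values), 2) for bucket, values in table.items()}


print(weekly_means(all_uploads))    # {'Content': 19.4, 'Location': 9.88}
print(weekly_means(uncategorized))  # {'Content': 25.49, 'Location': 5.82}
```

On these numbers, Content categories were found somewhat more often for the uncategorized set, while Location categories were found more often for the set of all uploads.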
Files analyzed and distinct categories used per day

[Charts: "All uploaded images" and "Uncategorized images", showing files analyzed and distinct categories used per day from 20 Jul to 26 Jul]

All uploaded images

Metric                    20 Jul 2016  21 Jul 2016  22 Jul 2016  23 Jul 2016  24 Jul 2016  25 Jul 2016  26 Jul 2016
Files analyzed            1558         1449         1537         2044         1496         1977         1525
Distinct categories used  352          306          349          360          383          401          358

Uncategorized images

Metric                    20 Jul 2016  21 Jul 2016  22 Jul 2016  23 Jul 2016  24 Jul 2016  25 Jul 2016  26 Jul 2016
Files analyzed            986          1016         1134         916          963          1098         901
Distinct categories used  251          208          233          216          195          185          245
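
As a rough consistency check, the file counts above can be related to the every-5th-file sampling mentioned in the notes. The sketch below assumes that the first row of file counts belongs to the "All uploaded images" set and that exactly one in five uploads was analyzed. Both are assumptions, so the result is only an estimate.

```python
# Files analyzed per day for all uploaded images (20 Jul to 26 Jul 2016).
files_analyzed = [1558, 1449, 1537, 2044, 1496, 1977, 1525]
SAMPLING_STEP = 5  # assumption: exactly every 5th uploaded file was analyzed

total_analyzed = sum(files_analyzed)
estimated_uploads_per_day = total_analyzed * SAMPLING_STEP / len(files_analyzed)

print(total_analyzed)                    # 11586
print(round(estimated_uploads_per_day))  # 8276
```

The estimate of roughly 8,300 uploads per day is of the same order as the figure of about 10,000 per day quoted in the notes.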

Examples

Here are a few examples of the results obtained by the script:

Faces

Football kits
