Commons:Structured data/Computer-aided tagging

From Wikimedia Commons, the free media repository
Notice: Computer-aided tagging is a new technology, and the tags it suggests may be wrong or inappropriate. This is expected behavior.

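The Computer-aided tagging tool is a feature in development by the Structured Data on Commons team to help community members identify and label depicts statements for Commons files. Commons has millions of carefully curated files, but structured data tools are new. This feature allows existing files to be described by their content easily, quickly, and (if used carefully) accurately. Editors do not need to know how Wikidata works or speak a particular language to contribute. The new feature uses a computer vision model to suggest "tags" to the user and prompts for human review. Commons users can visit a special page on Commons to see suggested depicts tags and choose which ones to confirm and which to ignore. No tags are added automatically without human involvement.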

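Computer-aided tagging helps add structured data. Tagged files can then be found through Special:MediaSearch using ordinary search terms, in ways that were not possible before. This makes it easier to find media that the old search function, which relied heavily on file information and categories, could not easily surface. Because specific information was missing, many media files on Commons were extremely hard to find with the regular search. For example, Peter_iredale_sunset_edited1.jpg appears in a Special:MediaSearch search for "beach" because a "beach" depicts statement was added through computer-aided tagging. It does not appear at all when searching for "beach" with the regular search.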
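The difference between the two searches can be illustrated with a minimal Python sketch. This is a toy model, not how Special:MediaSearch or the regular search is implemented: the `MediaFile` class, both search functions, and the sample file are hypothetical and exist only to show why a confirmed depicts statement makes a file findable.

```python
from dataclasses import dataclass, field


@dataclass
class MediaFile:
    """Hypothetical, simplified stand-in for a Commons file page."""
    title: str
    description: str
    categories: list = field(default_factory=list)
    depicts: set = field(default_factory=set)  # labels of confirmed depicts statements


def text_search(files, term):
    """Toy 'regular' search: matches only title, description and categories."""
    term = term.lower()
    return [
        f.title for f in files
        if term in " ".join([f.title, f.description, *f.categories]).lower()
    ]


def structured_search(files, term):
    """Toy structured search: also matches confirmed depicts statements."""
    term = term.lower()
    hits = set(text_search(files, term))
    hits.update(f.title for f in files if term in {d.lower() for d in f.depicts})
    return sorted(hits)


files = [
    MediaFile(
        title="Example_shipwreck_sunset.jpg",  # made-up file, not a real upload
        description="Wreck of a ship at sunset",
        categories=["Shipwrecks"],
        depicts={"beach", "shipwreck"},
    ),
]

print(text_search(files, "beach"))        # no match in title, description or categories
print(structured_search(files, "beach"))  # found through the depicts statement
```

In this sketch the word "beach" occurs nowhere in the file's text, so only the search that consults depicts statements returns it.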

Computer-aided tagging is a stand-alone MediaWiki extension, not a core part of Commons itself, and is connected to Commons through Special:SuggestedTags. On the back end, the tool will use Google Cloud Vision's depiction suggestions. Wikimedia already uses the Google Cloud Vision service for Wikisource OCR. The tool is offered on an opt-in basis to registered, auto-confirmed users. It is not on by default for any user group, and it is not offered to new or unregistered users.

So far (as of 14 February 2020):

  1. 5,809 total users have made edits via the Computer-Aided Tagging tool
    • 962 of these users did so via mobile web
  2. 341,957 total files have had edits made via Computer-Aided Tagging
    • 41,563 of those files have Computer-Aided Tagging edit on mobile web
  3. 72% of files with CAT edits had those edits done by the same user who uploaded the file
  4. Approximately 10,000 files edited by CAT so far were purely manual edits
  5. We’re averaging about 20 new users a week currently

Charts for this data are updated every Monday on the CAT usage report analytics page.

CAT specificity

We’re working on possible techniques for improving the tool’s ability to accurately identify specific elements of photos, but it’s important to keep in mind that the Google Vision algorithm already does fairly well in many topic spaces.

Upcoming tweaks to the queue for general images

Although most usage of the Computer Aided Tagging system comes from users editing their own uploads (72%), there is a separate queue for “popular” images. Based on recent feedback from the Commons community, we’re exploring ways to prioritize this queue differently. Particularly, we’re considering a system that would focus more on files that do not have curated categories yet.

Google Cloud Vision

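All information that passes through Google Cloud Vision will also be made public. The data will be fully anonymized and provided as dumps that list Commons files, the tags suggested for them, and the tags that were accepted. Google Cloud Vision is completely isolated from Wikimedia Commons, and its functionality is separate from the core Commons experience.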

Although there are open source computer vision platforms available to start from, any such package would require resources or specialized expertise to provide an industry-standard computer vision experience, and the Wikimedia Foundation is unable to provide these itself at this time. The team recognizes that Google Cloud Vision is not open source software. There will not be any non-free or proprietary code written by the Foundation for this project; all contributions will remain open source.[clarification needed] Google will not have access to any private, non-public, or personal information, and there will be no direct communication between users and Google's service.

Architecture and workflow

Design of information flow in computer-assisted image tagging. The "machine vision" provider on the far right requests and sends potential tags for images; there is no personal information exchanged and the provider is isolated from the rest of the system and Commons.
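The isolation described in the caption can be sketched in Python as an allow-list applied before anything is sent to the provider. This is an illustration under stated assumptions, not the actual middleware: `TaggingRequest`, `ALLOWED_FIELDS`, and `build_provider_payload` are hypothetical names, and the assumption that only an image reference is forwarded is ours.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggingRequest:
    """Hypothetical internal request, as seen on the Wikimedia side."""
    file_title: str
    image_url: str
    username: str    # personal data: must stay on Wikimedia servers
    ip_address: str  # personal data: must stay on Wikimedia servers


# Assumption: only the image reference may cross the boundary to the provider.
ALLOWED_FIELDS = ("image_url",)


def build_provider_payload(request: TaggingRequest) -> dict:
    """Copy only allow-listed fields into the outbound payload."""
    return {name: getattr(request, name) for name in ALLOWED_FIELDS}


request = TaggingRequest(
    file_title="Example.jpg",
    image_url="https://example.org/Example.jpg",  # placeholder URL
    username="ExampleUser",
    ip_address="192.0.2.1",                       # documentation-only address
)
print(build_provider_payload(request))
```

An allow-list is used rather than a deny-list so that any field added to the internal request later is withheld by default.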

Registered, auto-confirmed users will be able to opt in through their preferences or while uploading files. After some time has passed, the user will be informed through their notifications that their uploads are ready for tagging at Special:SuggestedTags. Users who have opted in can visit Special:SuggestedTags at any time to view files ready for tag processing. Anonymous users, new users, and users who have not opted in will not be able to access Special:SuggestedTags.
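The access rule above amounts to three conditions that must all hold. A minimal Python sketch follows; the `User` class and `can_access_suggested_tags` function are hypothetical and do not reflect how MediaWiki actually stores or checks user rights.

```python
from dataclasses import dataclass


@dataclass
class User:
    """Hypothetical user record holding only the flags relevant here."""
    is_registered: bool
    is_autoconfirmed: bool
    has_opted_in: bool


def can_access_suggested_tags(user: User) -> bool:
    """Access requires a registered, auto-confirmed account that has opted in."""
    return user.is_registered and user.is_autoconfirmed and user.has_opted_in


anonymous = User(is_registered=False, is_autoconfirmed=False, has_opted_in=False)
new_user = User(is_registered=True, is_autoconfirmed=False, has_opted_in=True)
not_opted_in = User(is_registered=True, is_autoconfirmed=True, has_opted_in=False)
opted_in = User(is_registered=True, is_autoconfirmed=True, has_opted_in=True)

for user in (anonymous, new_user, not_opted_in, opted_in):
    print(user, can_access_suggested_tags(user))
```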

The concepts that are available for tagging are ones that translate from Google Knowledge Graph IDs to Wikidata IDs. At 2.1 million triplets, the list is too long to catalog here, but is available for download as freebase-wikidata mappings.
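The restriction described above can be sketched as a lookup that drops any provider label without a Wikidata counterpart. The Python below is a simplified illustration: the identifiers are made up, the in-memory dictionary stands in for the downloadable mapping file (whose format is not parsed here), and the per-label score is an assumption.

```python
# Made-up identifiers for illustration only; they are not real
# Knowledge Graph or Wikidata IDs.
MID_TO_WIKIDATA = {
    "/m/example_beach": "Q_EXAMPLE_BEACH",
    "/m/example_ship": "Q_EXAMPLE_SHIP",
}


def map_labels_to_wikidata(labels, mapping):
    """Keep only provider labels whose ID has a Wikidata counterpart.

    `labels` is a list of (knowledge_graph_id, score) pairs; the score
    is an assumed confidence value attached by the provider.
    """
    suggestions = []
    for mid, score in labels:
        qid = mapping.get(mid)
        if qid is not None:
            suggestions.append((qid, score))
    return suggestions


provider_labels = [
    ("/m/example_beach", 0.97),
    ("/m/example_unmapped", 0.88),  # no Wikidata counterpart, so it is dropped
    ("/m/example_ship", 0.61),
]
print(map_labels_to_wikidata(provider_labels, MID_TO_WIKIDATA))
```

Labels that cannot be mapped are discarded rather than shown, since a depicts statement needs a Wikidata item to point to.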

Development status

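All features originally planned for the tool have been developed and are available. The development team will continue to make adjustments, and new features may be added in the future.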

Implementation and usage notes

  • No personal information is sent to the computer vision platform provider. At launch, this new feature will only use the Google Cloud Vision system, which will be accessed via a middleware layer that hides all user data. Commons images are sent to Google servers from Wikimedia Foundation servers. There will be no direct communication between the user and external services. No personal information (IP, username, etc.) is sent to Google servers. The middleware that contacts Google servers is a Wikimedia project and is open source. No part of Google's service or code will be part of Wikimedia infrastructure.
  • Suggestions from the computer vision will not be added to an image file’s structured data until a user has verified them: This service is provided as a means to augment human activity, not replace it. All suggestions from the computer vision service are stored in a separate, specialized database. Suggestions are not saved as structured data on the Commons file until a human user confirms them.
  • Users can opt in to receive notifications alerting them that their recent uploads have suggested tags. In the last step of the UploadWizard upload process, users have an option to enable notifications that will inform them when recently uploaded files have passed the waiting period and have tags available for confirmation. This option can also be found in User Preferences under Notifications.
  • User contributions that confirm suggested depicts tags are licensed as CC0. This data is equivalent to adding Wikidata to an image, and as such must be contributed under the same CC0 license that Wikidata uses. Clear license notices will inform users that all contributions made via the computer vision tool will be licensed under CC0.
  • Analysis of images on Commons: The feature will analyze only images, and provide suggested “depicts” tags based on the content of those images.
  • Certain types of images will be excluded: Some types of imagery on Commons are not well-suited for this type of system. Small images (less than 100px wide), artworks (identified via the Artwork template), book page scans, and other files will not be included.
  • Newly uploaded files will be analyzed, but not during upload: Commons users continuously monitor new files for vandalism, copyright violations, and relevance to the project. Files that don’t meet the criteria are marked for deletion. The new computer vision feature will only analyze new files after a waiting period has passed, and will not analyze files marked for deletion.
  • All tag confirmations show up as regular structured data edits with an edit summary tag that identifies their origin from the computer vision tool: This enables all the usual curation and moderation workflows so changes can be improved, edited, or reverted. It also helps us measure the revert rate and ensure that edits made using CAT are not more frequently reverted than the average edit.
  • Problematic tags can be blocked from being suggested: There is a blocklist of tags that will not be suggested by the tagging tool. The official blocklist currently exists within the configuration file for Commons and cannot be edited directly by the community, but suggestions can be made on the blocklist talk page.
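Several of the rules in this list (file exclusions, the waiting period, the blocklist, and confirmation before saving) can be sketched together in Python. This is a simplified model under stated assumptions, not the extension's code: all names are hypothetical, the seven-day waiting period is an assumed value because the length is not stated here, and only some of the exclusion rules are modelled.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

MIN_WIDTH_PX = 100                  # from the list: images under 100px wide are excluded
WAITING_PERIOD = timedelta(days=7)  # assumption: the actual length is not stated


@dataclass
class UploadedFile:
    """Hypothetical, simplified model of a Commons file."""
    title: str
    width_px: int
    uploaded_at: datetime
    is_artwork: bool = False
    marked_for_deletion: bool = False
    suggested: set = field(default_factory=set)  # separate suggestion store
    depicts: set = field(default_factory=set)    # structured data on the file


def is_eligible(f: UploadedFile, now: datetime) -> bool:
    """Apply the exclusion rules and the waiting period."""
    if f.width_px < MIN_WIDTH_PX or f.is_artwork or f.marked_for_deletion:
        return False
    return now - f.uploaded_at >= WAITING_PERIOD


def store_suggestions(f: UploadedFile, tags, blocklist, now: datetime) -> None:
    """Store non-blocklisted suggestions; nothing is written to f.depicts."""
    if is_eligible(f, now):
        f.suggested = {t for t in tags if t not in blocklist}


def confirm(f: UploadedFile, tag: str) -> None:
    """A human confirms one suggestion, which only then becomes structured data."""
    if tag not in f.suggested:
        raise ValueError(f"{tag!r} was not suggested for {f.title}")
    f.suggested.discard(tag)
    f.depicts.add(tag)


now = datetime(2020, 3, 1)
photo = UploadedFile("Example.jpg", width_px=1200, uploaded_at=datetime(2020, 2, 1))
store_suggestions(photo, ["beach", "sky", "blocked-tag"], blocklist={"blocked-tag"}, now=now)
confirm(photo, "beach")
print(photo.suggested, photo.depicts)
```

Keeping `suggested` and `depicts` as separate fields mirrors the point that suggestions live in their own store until a user confirms them.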

Userbox

You can use this userbox on your user page.

{{User Computer-aided tagging}}

This user uses Computer-aided tagging tool for tagging images.

This was a failed project

As early as 13 February 2020, experienced Commons users were complaining that the bulk of tags added using this tool were, as one put it, "way too vague, irrelevant or even detrimental". After numerous such complaints over the next several years, on 16 June 2023 the Sr. Director in the WMF Product department acknowledged that "We understand that the accuracy and utility of the tags generated by this tool have been called into question." After some study, on 14 September 2023 they announced, "we will be deactivating the tool on September 20, 2023, after completing the necessary code changes."