Commons:Bots/Requests/Emijrpbot (Task 5)

From Wikimedia Commons, the free media repository

Operator: emijrp (talk)

Bot's tasks for which permission is being sought: Put language template in descriptions ({{En}}, {{Fr}}, {{Es}}...).

Automatic or manually assisted: Automatic

Edit type (e.g. Continuous, daily, one time run): Continuous

Maximum edit rate (eg edits per minute): 10

Bot flag requested: (Y/N): No

Programming language(s): Python (pywikipediabot)

emijrp (talk) 18:26, 23 December 2009 (UTC)

Discussion

Hi. I want to put the language templates in {{Information}} template descriptions, like this (direct example). I use the Google API to detect the language.

The bot checks:

  • If the image uses the {{Information}} template
  • If the description parameter is filled
  • If the description is a single line (no \n inside)
  • If the description is larger than 50 bytes and smaller than 2000
  • Then, it sends the description to Google to detect the language
  • If the response is "reliable" with confidence > 0.8, it adds the template

These options can be customized. Regards. emijrp (talk) 18:32, 23 December 2009 (UTC)
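A minimal sketch of the checks listed above, not the bot's actual code. The function names, the simplified regular expression, and the `detect` callback (a stand-in for the language-detection call, whose real interface is not shown in this discussion) are assumptions for illustration, as is the `1=` form of the inserted template.

```python
import re

# Thresholds from the list above; all of them are meant to be adjustable.
MIN_BYTES = 50
MAX_BYTES = 2000
MIN_CONFIDENCE = 0.8

# Hypothetical, simplified pattern for the description parameter.
DESCRIPTION_RE = re.compile(
    r"\|\s*[Dd]escription\s*=(?P<desc>.*?)(?=\n\s*\||\n?\}\})", re.DOTALL
)


def extract_description(wikitext):
    """Return the description if the page passes every check, else None."""
    if not re.search(r"\{\{\s*[Ii]nformation\b", wikitext):
        return None  # page does not use {{Information}}
    match = DESCRIPTION_RE.search(wikitext)
    if match is None:
        return None  # no description parameter
    desc = match.group("desc").strip()
    if not desc:
        return None  # description parameter is empty
    if "\n" in desc:
        return None  # description spans more than one line
    size = len(desc.encode("utf-8"))
    if not MIN_BYTES < size < MAX_BYTES:
        return None  # too short or too long
    return desc


def tag_description(desc, detect):
    """Wrap desc in a language template when detection is trustworthy.

    detect is a caller-supplied function (an assumption of this sketch)
    returning a tuple (language_code, is_reliable, confidence).
    """
    code, is_reliable, confidence = detect(desc)
    if is_reliable and confidence > MIN_CONFIDENCE:
        return "{{%s|1=%s}}" % (code, desc)
    return None
```

Keeping the detector behind a callback means the thresholds can be tuned and tested without sending any request to the remote service.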

The code is here. emijrp (talk) 19:41, 26 February 2010 (UTC)

Support I think it is a good idea and a useful service. I have some questions: Does the bot skip to the next file if the description already has a language template? I understand the lower limit on description length, which ensures there is enough text for Google to make its decision, but why do we have an upper limit? Or should we just truncate (from the middle?) strings longer than 2000 characters? --Jarekt (talk) 19:47, 23 December 2009 (UTC)
I read somewhere that the Google API doesn't accept more than 5000 bytes, so I think that is the limit. Really, I chose 2000 bytes for testing purposes. emijrp (talk) 11:27, 24 December 2009 (UTC)
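The truncation idea raised above could be sketched as follows. This is only one reading of "truncate from the middle" (keep a slice from the middle of the text); the function name and the default budget of 2000 bytes are assumptions, and the 5000-byte API limit mentioned above is unverified.

```python
def middle_slice(text, max_bytes=2000):
    """Return a middle portion of text that fits in max_bytes of UTF-8."""
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text  # already within the budget
    start = (len(data) - max_bytes) // 2
    chunk = data[start:start + max_bytes]
    # Ignoring undecodable bytes drops any character cut at either edge,
    # so the result may be a few bytes shorter than max_bytes.
    return chunk.decode("utf-8", errors="ignore")
```

Slicing the encoded bytes rather than the characters keeps the result within a byte limit even for non-ASCII descriptions.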
Your other question: if a description field contains any template (including language templates), the bot skips that image. emijrp (talk) 12:53, 20 February 2010 (UTC)
I think it would be a good idea to run a test manually and check the validity of Google's guesses. It would also be a good idea to test the bot on descriptions of photos taken in Europe, where English words may be mixed with native place names, and on the mix of Latin and English in plant and animal photos. --EugeneZelenko (talk) 15:47, 24 December 2009 (UTC)
Good idea. Where can I get a bunch of those images? emijrp (talk) 17:46, 29 December 2009 (UTC)
I think the big-city categories of France, Poland, the Czech Republic, etc. are a good place to look. --EugeneZelenko (talk) 16:13, 30 December 2009 (UTC)

I will work on this request in the coming weeks; sorry about the delay. emijrp (talk) 00:35, 6 February 2010 (UTC)

Hi, I'm running the bot in manual mode while I make tests and fix bugs. New code. emijrp (talk) 15:57, 20 February 2010 (UTC)

I'm going to ask the Google AJAX Language API community whether I can make that many requests to the API, because this bot would make millions of them. emijrp (talk) 17:02, 20 February 2010 (UTC)

OK, I can run all that bunch of queries to the API. I'm going to do some more tests in the next days. emijrp (talk) 19:40, 26 February 2010 (UTC)

Call the question. This looks OK to approve. Any objections? I propose to do so within a week if none are raised. I think no bot flag for now, to allow examination of the proposed languages, but we can revisit that if the author desires after there have been more runs. ++Lar: t/c 16:21, 27 March 2010 (UTC)

No bot flag? So, how many edits per minute? emijrp (talk) 10:26, 4 April 2010 (UTC)
Somewhere around 5-10 feels right to me. I meant to approve this sooner since no one else raised any issues but I guess I want to hear from you... do you have a problem with a no-flag approval? ++Lar: t/c 12:39, 13 April 2010 (UTC)
This is a very new task, and it is not unlikely that the bot will fail to select the right language (foreign words, citations). Removing the bot flag for this task should be considered, at least at the beginning. --Schlurcher (talk) 15:22, 13 April 2010 (UTC)
Maybe it should be running on a separate account. -- User:Docu at 15:25, 13 April 2010 (UTC)
Yep, good idea. --Schlurcher (talk) 15:27, 13 April 2010 (UTC)
Some users indicate the language by bolding the language name (like "German: Beschreibung"). Chances are that when the user has written multiple translations of the description this way, Google will have to pick one. I suggest adding a routine to recognize bolded language names, with the colon either inside or outside the bolding (removing it either way), and, when one is found, to use that language instead of the one Google suggests. For example: Description = '''German:''' Beschreibung.<br />'''Français''': description |Source > into > Description = {{de|1= Beschreibung}}.<br />{{fr|1= description}} |Source
If possible, even remove the <br />, but I think it's too hard to do that without making mistakes; either way is good.
Also, note that I have the {1} parameter set explicitly (1=) rather than left empty. This may be good for the bot as well, since descriptions could contain external links and thus equals signs (the upload script also sets the {1} parameter) –Krinkletalk 16:10, 23 April 2010 (UTC)
PS: A list of native language words is available in MediaWiki's Magic words. I'm sure it's possible to get that list somehow. That would be a good start for the detector. Template:Language might have some handy things too. –Krinkletalk 16:15, 23 April 2010 (UTC)
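The bold-label suggestion above could be sketched roughly like this. The `LANGUAGE_CODES` mapping is a hypothetical, deliberately tiny stand-in for the full list of language names, and the function name is likewise an assumption. Unlike the example above, this sketch keeps the trailing period inside the template and leaves the <br /> in place.

```python
import re

# Hypothetical mapping; a real list of language names would be far longer.
LANGUAGE_CODES = {
    "english": "en",
    "german": "de",
    "deutsch": "de",
    "french": "fr",
    "français": "fr",
}

# Matches '''Name:''' (colon inside the bold) or '''Name''': (colon outside).
LABEL_RE = re.compile(r"^\s*'''([^':]+?)\s*(?::'''|''':)\s*(.*)$", re.DOTALL)

# The capturing group makes split() keep the line breaks in its result.
BR_RE = re.compile(r"(<br\s*/?>)", re.IGNORECASE)


def convert_labels(description):
    """Replace bold language labels with language templates.

    Returns None unless every segment starts with a recognised label,
    so an uncertain description is left for other handling.
    """
    out = []
    for part in BR_RE.split(description):
        if BR_RE.fullmatch(part):
            out.append(part)  # keep the <br /> separator unchanged
            continue
        match = LABEL_RE.match(part)
        if match is None:
            return None
        code = LANGUAGE_CODES.get(match.group(1).strip().lower())
        text = match.group(2).strip()
        if code is None or not text:
            return None
        # The explicit 1= keeps equals signs in the text from breaking
        # the template call.
        out.append("{{%s|1=%s}}" % (code, text))
    return "".join(out)
```

Returning None on any unrecognised segment is a conservative choice: the bot would then fall back to its other checks rather than guess.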

Assuming there is still interest in this task, I think it's safe to ✓ approve without a bot flag for the time being. –Juliancolton | Talk 14:52, 29 June 2010 (UTC)

I will second that. --Jarekt (talk) 15:49, 29 June 2010 (UTC)

Hi again. I'm doing more tests. You can see the bot running in recent changes. A warning has been placed on its user page. emijrp (talk) 12:56, 31 July 2010 (UTC)
I have done >500 edits with the flag disabled, and no errors were found. I stopped because of this. So, for the next runs I need the flag enabled. emijrp (talk) 08:02, 1 August 2010 (UTC)
I checked all {{de}} edits and all were right. If someone could check {{fr}} and the result is the same, I have no objection to it running in bot mode. --Schlurcher (talk) 16:40, 2 August 2010 (UTC)

Can I do another trial of ~500 edits? emijrp (talk) 19:45, 27 November 2010 (UTC)