Commons talk:Batch uploading/Archive 1

From Wikimedia Commons, the free media repository

Things to add to this page

  • General flow of batch uploads:
    1. Find a source of free images
    2. Get these images and process the metadata to produce wikicode for Commons
    3. Upload the images
      1. By someone with an upload bot
      2. By one of the shells using import
  • Sample letters to send to potential image donors

(... please feel free to add things, it's just a list so I won't forget things) Multichill (talk) 15:10, 7 June 2009 (UTC)
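Step 2 of the flow above (turning source metadata into wikicode) can be sketched roughly as follows. This is only an illustration: the record keys (`description`, `date`, `source`, `author`), the function name, and the example license and category are assumptions, not something taken from an actual upload bot.

```python
def build_description_page(record, license_template, categories):
    """Render one metadata record as wikitext for a Commons file page.

    `record` is a plain dict; its keys are hypothetical and would have to be
    mapped from whatever fields the source database really provides.
    """
    info = "\n".join([
        "{{Information",
        "|description=" + record.get("description", ""),
        "|date=" + record.get("date", ""),
        "|source=" + record.get("source", ""),
        "|author=" + record.get("author", ""),
        "}}",
    ])
    # One category link per line at the end of the page.
    cats = "\n".join("[[Category:" + name + "]]" for name in categories)
    return (
        "== {{int:filedesc}} ==\n" + info + "\n\n"
        "== {{int:license-header}} ==\n"
        "{{" + license_template + "}}\n\n" + cats
    )


if __name__ == "__main__":
    example = {
        "description": "Example image",
        "date": "2009-06-07",
        "source": "http://example.org/image/1",
        "author": "Example photographer",
    }
    print(build_description_page(example, "PD-USGov", ["Example category"]))
```

Values that contain `|` or `=` would need escaping before being placed in the template; the sketch does not handle that.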

Hello?

Hello? Does anyone participating in this project watch this page? I would like to point out that a particular batch involving a University of Washington image bank requested a particular urgency...and I am surprised to see the image bank still exists, since there is a note on the site stating it was supposed to close permanently several months ago. Bob the Wikipedian (talk) 05:40, 19 September 2009 (UTC)

Well, a lot of things have been going on in the past few days. A batch upload of Wiki Loves Art Netherlands was started, images from the Tropenmuseum were imported, and a tool for batch uploading from Flickr was created. User:Multichill is currently the only one working on batch uploads; User:Dcoetzee was working on many uploads, but since the NPG vs Dcoetzee case he has reduced his batch uploads. If you could contact the university and get the information database for these files and the list of file links, it would ease the upload and it would be dealt with right away.--Diaa abdelmoneim (talk) 09:08, 19 September 2009 (UTC)

US federal government sites

I already added a couple of requests. air force and coast guard should be added too. Probably more nicely structured sites out there which can be copied to Commons. Multichill (talk) 22:47, 14 October 2009 (UTC)

Single air force images can be pulled from http://www.af.mil/photos/media_view.asp?id=314289, the id suggests the site contains a lot of images. Multichill (talk) 16:24, 23 October 2009 (UTC)
If possible, USFWS and NOAA would be nice to have. -- User:Docu at 18:14, 23 October 2009 (UTC)
How does the upload bot check if the image already exists in Commons? Does it just check the VIRIN number or also other file characteristics like URL, file description or even more? --Zaccarias (talk) 21:01, 27 January 2010 (UTC)
It checks if a file with the same SHA1 hash already exists. Multichill (talk) 21:04, 27 January 2010 (UTC)

So an identical file would be found and the bot would not upload the image? The reason I am concerned is that there are a large number of US government sites, and some have the same pictures, but sometimes at a different resolution.

I have worked on the images from the US Navy about Afghanistan and Pakistan and I have found a number of duplicates, not many but some. I think that we would end up with thousands of duplicates. In some cases I think it's just not possible to find duplicates that already exist, but in many cases they could be avoided.

Examples:

The problem is that there are so many different websites. Because of this, duplicates are quite likely to occur. I just think that everything which could be easily avoided should be taken care of. Maybe a bot could grab the picture IDs somehow and insert the {{ID-USMil}} templates. --Zaccarias (talk) 21:53, 27 January 2010 (UTC)

Swedish National Heritage Board asking about upload methods

I've been in touch with the Swedish National Heritage Board about the possibilities of donations. They have previously made a minor upload on Flickr Commons[1] and have announced it on their Swedish site here. Sophie Jonasson, the person in charge of the project, is interested, but needs to convince her superiors and wants to know how much work is needed for uploads. I said that we had various scripting possibilities, and that I'd ask about the details. Can someone expand on the methods for making group uploads, large or small? An especially important aspect is how to include meta-info.

Peter Isotalo 13:12, 29 January 2010 (UTC)

The Nordiska Museet upload would probably be the best example of something similar. Although it might be best to wait until those images are actually uploaded before using that as an example. Assuming they've already sorted the possible legalities (i.e. make sure they can actually release the rights to the images) I'm guessing the amount of work needed would probably depend largely on how well (or like Commons) organised their metadata is. At least that seems to be where the most work is spent on the Nordiska Museet images. /Lokal_Profil 15:05, 7 February 2011 (UTC)

Batch from Picasa or Panoramio ?

Hello,

A fellow French user is currently working on a project which would imply many pictures (hopefully :-), a staging area, then mass-upload to Commons.

At this moment, he is assessing which platform would be the best for the purpose of the staging area, and he is considering Commons itself, Flickr, Picasa and Panoramio. He asked on the French Village Pump about bot-upload from those websites.

Here is his question : is it possible/easy to perform a batch-upload from Picasa or Panoramio (given correct licensing of course) ? The technical feasibility and easiness would be a major criterion in the choice of the platform.

Cheers, Jean-Fred (talk) 15:45, 23 March 2010 (UTC)

For what purpose is he considering Flickr, Picasa and Panoramio? Those sites seem easy to use when individual images are uploaded for a project like Commons:Wiki Loves Art Netherlands. But with a large upload, I would upload directly to Commons. You or other users could also ask, for example, User:Multichill for help or comments, as he does such things very often. Greetings - Romaine (talk) 19:29, 23 March 2010 (UTC)

Maps

Hey, do you know about this maps page http://english.freemap.jp/ ? emijrp (talk) 20:29, 13 June 2010 (UTC)

Hi. We will be producing hundreds of maps (hopefully thousands) pretty soon to upload to commons, based on that partnership. I would like to request some assistance and advice from this project. How would the uploading work? Cheers, GoEThe (talk) 11:05, 18 June 2010 (UTC)

Depends a bit on where the files are located (somewhere in the web, on your local disk, ... ?), and on the size of the files. If you e.g. have them on your disk, and they are 1 to 4 GB in total, you could burn a DVD from them and send it to me, and I can then do the upload. If they are on the web, they would have to be downloaded first... --Reinhard Kraasch (talk) 14:24, 23 June 2010 (UTC)
Probably best to create a new subpage to discuss this. Multichill (talk) 14:55, 25 June 2010 (UTC)
I've done that at Commons:Batch uploading/IUCN red list. GoEThe (talk) 14:18, 6 October 2010 (UTC)

88gb of Public Health imagery

Hi all. I crawled the public health image library from the CDC a few months back: [2]. It's about 88 GB of high-resolution, print-quality TIFFs. I'm setting up a mirror of the content and I would also like to upload the content to wikicommons. I have descriptions and some interesting MeSH metadata, but few titles, in a local db. Would someone like to drop me a line and help me write an upload script? I think that the MeSH categorization data is particularly useful and I would hate to see it separated from the images.

Sethwoodworth (talk) 03:17, 12 July 2010 (UTC)

Navy manuals

There is a series of manuals of the US Navy available on the internet, e.g. at

www.hnsa.org/doc/

These include drawings and diagrams of various parts of ships. After an initial batch, images would need to be extracted from the publications. If the PD status of these is ok, they could easily fit into the current, somewhat too uniform, collection of US Navy media. Even if they are somewhat dated, I think they would be a good addition.  Docu  at 05:44, 6 October 2010 (UTC)

Advertising

Past batch uploads section would be useful in convincing others to donate images to us, but it is missing a key component: links to media discussing them. Telling others that if they donate to Commons they will get free advertising in media through news articles is helpful, this would be a good place to show them some proof. --Piotr Konieczny aka Prokonsul Piotrus Talk 13:01, 22 December 2010 (UTC)

This is more about the technical side. You probably want to take a look at Commons:Partnerships. Multichill (talk) 16:35, 22 December 2010 (UTC)

Book-pages

I intend to upload pages from a book. It's plain text, no illustrations, no pictures.

The first pages are to be found in Category:Gamla Testamentet (Myrberg). There are 314 pages in total. These files will be used on Wikisource.

I have designed some software in C# based on the DotNetWikiBot Framework. The code can be found here.

Do you have any objection to the use of such software for uploading files? (I use similar code to upload text from Finereader to Wikisource.) -- Lavallen (talk) 17:10, 10 February 2011 (UTC)

Updating this page

The Commons:Batch uploading page has numerous old requests and in-progress requests, many that are GLAM-related. The status of many is unknown. I would like to work on getting this updated and improve the process for requests. Right now, as a new/aspiring batch uploader, it's difficult to know where to begin. If anyone wants to help, that would be awesome, or otherwise I'll try poking people. -Aude (talk | contribs) 20:18, 9 March 2011 (UTC)

The Prado in Google Earth

FYI, I haven't listed it on this page but I'm currently uploading a small set of very high-resolution works from the Prado in Google Earth project, to be placed in Category:Prado in Google Earth. There were a couple there already, including one featured picture, but they aren't nearly as high res as they could be. Dcoetzee (talk) 07:06, 13 May 2011 (UTC)

This is done. Dcoetzee (talk) 19:28, 23 May 2011 (UTC)

C2RMF

I'm currently in the process of uploading a set of 22 high-resolution (about 100-300 megapixel) works from C2RMF, from French museums. These include the famous Mona Lisa. These will go in Category:High-resolution images from C2RMF. Dcoetzee (talk) 21:16, 5 June 2011 (UTC)

This is also done. Dcoetzee (talk) 21:28, 26 September 2012 (UTC)

New batch upload in progress

I'm doing a new batch upload at the moment, in the downloading stage right now. I'm avoiding discussing it in public but please e-mail me if you want more information. Thanks! Dcoetzee (talk) 21:30, 26 September 2012 (UTC)

Now announced at Commons:Village pump#New_Google_Art_Project_uploads_have_begun. Dcoetzee (talk) 06:11, 30 September 2012 (UTC)

Listing criteria (e.g. minimum number of files)

I've just created a new request, but it's for around only 20 files. I couldn't find any reference here to a minimum number of files - perhaps I should've looked somewhere else. Guidance appreciated. Thanks. -- Trevj (talk) 12:17, 9 January 2013 (UTC)

Purpose of this page

Hi all. Just wanted a clarification. Is this page/project intended primarily for people looking for bot operators to help them or is it also intended for discussions about formatting, problems, recommendations related to batch uploads which already have a bot operator. In other words if I'm intending to do a batch upload myself should I start a post here or should the whole thing be contained to Commons:Bots/Requests? Cheers /André Costa (WMSE) (talk) 15:52, 5 February 2013 (UTC)

Both. If you plan to do a batch upload, please open a page here so we can discuss it :). Jean-Fred (talk) 16:43, 5 February 2013 (UTC)
That's what we did last time but reading the intro now I suddenly became unsure. Will write an entry later today or tomorrow. /André Costa (WMSE) (talk) 22:12, 5 February 2013 (UTC)
Done. By the way I've put a question about Chunked uploads on the subpage as well, any help is welcome. /André Costa (WMSE) (talk) 09:33, 7 February 2013 (UTC)

RFC: Minimum requirements for categorization

I have created Commons:Requests for comment/Batch categorization requirements to gain a community consensus on guidance for batch uploaders as to use of backlog categories. This is a frequent complaint about batch uploads, probably as folks don't appreciate how working with cooperative teams means that resolving a batch upload backlog may take many months. If the community is against this approach, then Commons may fail to preserve large batch uploads where categorization is not obvious from the source metadata.

Opinions from experienced batch uploaders to the RFC would be highly welcome. Thanks -- (talk) 10:01, 29 September 2013 (UTC)

12,000 Dutch colonial maps now online from Indonesia, Dutch Antilles and Surinam

This may be of interest: https://twitter.com/bl_eap/status/454263083012087808/photo/1. See also http://www.library.leiden.edu/special-collections/maps/introduction-maps.html. — SMUconlaw (talk) 16:02, 10 April 2014 (UTC)

Cleaning up

This page has become pretty unwieldy. Maybe time to clean up, and move stuff to subpages? Husky (talk to me) 07:56, 13 May 2014 (UTC)

Opinions about a mass content donation offering

The Open Culture Data network in the Netherlands has offered to do an extremely large content donation. I'd like some reactions to this proposal.

Ter-burg (talk) 08:28, 20 May 2014 (UTC)

From Panoramio to Commons?

Hi, I want to donate all my photos about railroads in Panoramio to Commons. I saw earlier a bot and a special request page for Flickr, but nothing for Panoramio. How can I request a mass upload in this case? I do not want to upload nearly 1200 photos manually. --ProgramadorCCCP (talk) 23:00, 12 October 2015 (UTC)

Looking for an experienced importer to work with a developer to create a Commons importer module for ePrints

Hi all

I'm currently working at UNESCO with a developer who is working on a module for ePrints to automate importing content into Wikimedia Commons and retrieving data (e.g. translations) via the API. This will allow UNESCO and anyone else using the ePrints software (generally large organisations) to import content to Commons easily. I'm looking for someone/some people who are experienced with mass importing content to Commons who could assist them with understanding and working with the API interface. We would like to do this this month.

Many thanks

John Cummings (talk) 16:06, 7 April 2016 (UTC)

Collection of documentation about batch uploading

I am not sure what to call the act of "uploading a set of media files with metadata to Wikimedia Commons". Maybe that is "batch uploading". I was trying to find the main page for discussing this concept. There are these -

I was thinking to collect all the documentation in one place somewhere, if this has not been done already. I am not sure if a main page already exists. Hmm... Blue Rasberry (talk) 15:48, 30 September 2016 (UTC)

Archiving

Hi, there are stalled requests from 4 years ago. Shouldn't we archive them? Regards, Yann (talk) 15:44, 16 September 2017 (UTC)

My 5 cents: Depends on the reason for the stalling - if the request is just undecided or nobody willing to execute it, it should stay, if the request has been declined (or at least: there is a strong opinion to decline it, as e.g. in "Works of Maurice Ravel") it should be archived. --Reinhard Kraasch (talk) 18:48, 17 September 2017 (UTC)
OK, thanks. Actually works of Maurice Ravel are OK. There have been articles in several newspapers or magazines about them coming into the public domain, i.e. [3]. Regards, Yann (talk) 18:56, 17 September 2017 (UTC)
So "Works of Maurice Ravel" is maybe a bad example. The main argument to archive this might be another one I did not mention above: Less than 100 images can also be handled without a batch upload. And I think the originator of the request should be informed that it has been archived for this or that reason. --Reinhard Kraasch (talk) 23:06, 17 September 2017 (UTC)

Reviewing batch upload tests

Hello, I have run a few tests for a batch upload operation. Can someone review the tests so far before I run more imports?

Cheers, African Hope (talk) 18:16, 4 December 2017 (UTC)

Missing book page from IA

I cannot find any of the two images on this page. Could it have escaped for some reason? Should I upload manually? (@: ping!) -- Tuválkin 16:55, 9 April 2018 (UTC)

It may have been missed; it depends on which batch process was uploading related pages. In some cases the URL page reference to IA is off by one in the uploaded image. Keep in mind some of my projects are several years old, so it takes some work to rediscover how they did what they did. -- (talk) 17:06, 9 April 2018 (UTC)
  • ✓ Uploaded manually:
(omg, filenames with no spaces!) -- Tuválkin 15:40, 30 June 2018 (UTC)

Commons:Batch uploading/PauloGuedes

I created Commons:Batch uploading/PauloGuedes; do I need to add anything else? -- Tuválkin 16:20, 4 July 2018 (UTC)

Huge batch upload

Hi there. I need help with a huge upload. It is 23,415 works (graphics, paintings, photography, plans) digitised from Slovak galleries, all in the public domain. User Fjaker from the Slovak wiki works for the Slovak National Gallery and has access to www.webumenia.sk. But we don't have experience with this. Can someone help us?

P.S. Some pictures were already uploaded. @Multichill:

--Vegetator (talk) 09:36, 8 March 2020 (UTC)

@Vegetator: Yes, I have access to Webumenia and have uploaded most of the paintings that are in the public domain. I actually visited the developers for Webumenia at the Slovak National Gallery. I'm sure they will be happy to have more of their content here. At the moment the API still has an issue that needs to be solved. When that is solved I plan to sync up the paintings.
I could probably also upload other things if people are willing to sort it out here. Multichill (talk) 11:29, 8 March 2020 (UTC)
Nice. If you could upload other things it would be great. How can I (we) help with sorting? --Vegetator (talk) 13:22, 8 March 2020 (UTC)

Reactivate and reorganize?

Hi! I stumbled upon this page a few days ago, and noticed that it says that it might be inactive. We had some ideas from Wikimedia Sverige, and the work with the Content Partnerships Hub initiative, on how it could be used in the future. So I would like to ask whether it would be ok or even appreciated if we spent some time on refreshing the page, perhaps reorganizing it a bit to make it easier to overview? I think that the page could be of high value, not the least for the work that we will be doing. But I don't want to do any of it unless approved by those active on this page. Eric Luth (WMSE) (talk) 13:55, 5 October 2021 (UTC)

Hi Eric, I would find it useful if this page was refreshed. I find the documentation on batch uploading very helpful, but a way of raising issues with similar users would still be helpful. Wjbfarrell (talk) 15:22, 6 January 2022 (UTC)

Plans to batch upload 17th century portraits

Hi, I am planning to batch upload media from the Fairclough Portrait Collection (portrait engravings of people in 17th Century Britain) in the next few days. This is a digitised collection from the University of Leicester Library where I work. I'd appreciate any feedback anyone has on the test I plan to do soon. --Wjbfarrell (talk) 15:37, 6 January 2022 (UTC)

Lazy categorizing images

Commons:Batch uploading

Commons:Batch uploading is all well and fine but it is crowding out location categories such as Category:Norwich, which now has 2,676 images in total. I have been working to subcategorize them, but as fast as I work this bot is adding more. I have also come across many images which do not belong there. This is a very disruptive way of adding content with a lazy and incompetent way of categorizing images. Please stop! Kolforn (talk) 12:45, 13 January 2022 (UTC)