Commons:Bots/Requests/Cxaux-cxaux-bot
Hi everyone, I'm super nabla (joined Wikipedia in 2020, more active in wiki projects since 2022, autopatroller rights on Italian Wikipedia since 2023, occasionally running the bot it:Utente:Cxaux-cxaux-bot on itwiki).
This year I had the pleasure of participating in the Satellite Hackathon Event in Palermo. I'm currently dedicating my spare time to an open-source object storage system that aims to reduce the time, space, energy and financial costs of large-scale scraping and crawling operations (a problem currently faced by Wikimedia and discussed, e.g., here: [1]).
For this purpose, I've developed an object storage software prototype that shows encouraging initial results. However, to truly evaluate its effectiveness, I would greatly benefit from the opportunity to download approximately 200 GB of SVG images so that I can run tests on real-world, large-scale data.
It would be highly valuable to me if I could receive a bot flag to facilitate this data download. This would allow me to finalize my experiments and prepare the code for release. I genuinely believe this prototype has the potential to make a real impact by improving energy and time efficiency, and hopefully contribute meaningfully to the betterment of Wikimedia software.
Thank you all in advance for considering my request and for any assistance you might be able to offer. I really appreciate your help.
Operator: Super nabla (talk · contributions · Statistics · Recent activity · block log · User rights log · uploads · Global account information)
Bot's tasks for which permission is being sought:
- Tasks. To download approximately 200 GB of SVG images for research and testing of an open-source project focused on mitigating the costs of large-scale scraping/crawling. No editing will be performed.
Automatic or manually assisted:
- Supervised data download only. No edits will be made by the bot.
Edit type (e.g. Continuous, daily, one time run):
- N/A (No edits will be made; this bot will only download SVG files)
Maximum edit rate (e.g. edits per minute):
- N/A (No edits will be made; this bot will only download SVG files)
Bot flag requested: (Y/N):
- Y (I've never run a bot on Commons)
Programming language(s):
- Python3 (using the "requests" library and RESTful APIs)
—super nabla¶ 14:37, 15 April 2025 (UTC)
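For illustration only, a read-only listing of SVG files through the MediaWiki Action API might look roughly like the sketch below. It is not the operator's actual code. The `filemime:` search keyword, the batch size, the one-second pause, and the User-Agent string are all assumptions chosen for the example, and `session` is expected to behave like a `requests.Session`.

```python
import time

API_URL = "https://commons.wikimedia.org/w/api.php"
# Hypothetical User-Agent: Wikimedia asks clients to identify themselves
# with a descriptive string that includes contact details.
HEADERS = {"User-Agent": "SvgResearchFetcher/0.1 (hypothetical; operator@example.org)"}


def build_params(cont=None, limit=50):
    """Query parameters for one batch of SVG files with their URLs and sizes."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrsearch": "filemime:image/svg+xml",  # assumed search filter
        "gsrnamespace": 6,                      # File: namespace
        "gsrlimit": limit,
        "prop": "imageinfo",
        "iiprop": "url|size|mime",
    }
    params.update(cont or {})  # carry over the API's continuation values
    return params


def iter_svg_files(session, max_batches=1, delay=1.0):
    """Yield (title, url, size_bytes), one API request at a time (no edits)."""
    cont = None
    for batch in range(max_batches):
        if batch > 0:
            time.sleep(delay)  # pause between requests to stay gentle on the servers
        resp = session.get(API_URL, params=build_params(cont),
                           headers=HEADERS, timeout=30)
        resp.raise_for_status()
        data = resp.json()
        for page in data.get("query", {}).get("pages", {}).values():
            for info in page.get("imageinfo", []):
                if info.get("mime") == "image/svg+xml":
                    yield page["title"], info["url"], info["size"]
        cont = data.get("continue")
        if not cont:
            break  # no further batches available
```

Calling `iter_svg_files(requests.Session(), max_batches=2)` would then list up to two batches of files; the files themselves would still have to be fetched from the returned URLs, serially and with the same User-Agent.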
- Discussion
I think this isn't approvable. Wikimedia bots are for improving Wikimedia projects, not for external or personal use. You can access Commons data with a normal account. --Krd 15:30, 15 April 2025 (UTC)
- @Krd Thanks for the feedback. I understand the concern about bot usage. My project aims to improve Wikimedia's efficiency in handling large-scale data retrieval, as highlighted by Wikimedia itself. The 200GB download is crucial for testing this optimization and potentially benefiting the project's infrastructure. If specific guidelines restrict this during development, please point me to them. Otherwise, given Commons' open nature and the potential benefits, I hope for community support in this testing phase. Thanks for your consideration.—super nabla¶ 16:14, 15 April 2025 (UTC)
- How does a bot flag help you with retrieving the data? If the Wikimedia Foundation supports your project, please obtain a global flag. Krd 16:46, 15 April 2025 (UTC)
- Are the rate limits different if you have a bot flag? I think you can scrape the contents normally, maybe using multiple machines and up to a few days of time. Prototyperspective (talk) 20:52, 15 April 2025 (UTC)
Declined per above. --Krd 07:42, 21 April 2025 (UTC)