Commons talk:Library back up project


How to upload 1927-1949 books?

I have obtained a large database of Chinese books. These files are no longer available online, so there is an urgent need to upload them. Some have entered the public domain (PD) and others have not. I have already uploaded the books published more than 95 years ago (1209-1899, 1900-1910, 1911-1920, 1921-1926), because they have entered the public domain in the US and their authors very likely died more than 50 years ago. How should I upload the books published in 1927-1949? There are too many for me to verify one by one. I see two options:

  1. Publish the book list and ask users to identify the PD books and copy them to a separate list. I will upload only the books identified as PD. Pro: good copyright protection. Con: author information is hard to find for old authors, so only a small proportion is expected to be uploaded.
  2. Publish the book list and ask users to remove non-PD books from it. I will then upload every book remaining on the list. If a non-PD book is not removed and is found to be non-PD after the upload, any user can report it for deletion. Pro: a large proportion will be uploaded, which is good for the preservation of old books. Con: it will increase the workload of admins, who must delete files after the upload.

Which option should I use? Please comment.

--Upload for Freedom (talk) 13:03, 5 November 2022 (UTC)

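You could first upload the title, the author's name, and perhaps a single image of a scanned page, and deal with the full text once it is confirmed to be PD. For Republican-era Chinese books, complying with the 50-year term in Chinese law is actually enough; for example, some of Lao She's pre-1949 works are already being uploaded. Thering29 (talk) 02:17, 8 March 2023 (UTC)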

Scheme 1

Scheme 2

Comments

Just upload everything; nobody cares about Chinese copyright anyway. [Written in deliberately obfuscated Chinese characters.] --RZuo (talk) 07:42, 8 November 2022 (UTC)

You mean "just upload everything, since nobody cares about Chinese copyright anyway"? That would be great. However, I might get blocked or lose my bot status if I insist on uploading a large number of non-PD books. I want to let other users remove non-PD books both before and after my upload; I think that would strike a good balance between copyright protection and the preservation of old books. I would like support for uploading in this way, so please support Scheme 2. Upload for Freedom (talk) 11:46, 8 November 2022 (UTC)
Now I have an idea: if you have the authors' names, you could match them against Wikidata and then query their date of death (P570). In any case, with a list this large, this should be an automated job rather than manual checking. RZuo (talk) 14:48, 8 November 2022 (UTC)
Most authors won't be identified this way. Upload for Freedom (talk) 04:44, 9 November 2022 (UTC)
Write a script that checks Wikidata, worldcat.org and loc.gov for the author's death date, given the title or the author's name. --Happyseeu (talk) 16:41, 15 November 2022 (UTC)
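A minimal sketch of the Wikidata part of such a script, offered as an illustration rather than anyone's actual tool. It only builds the SPARQL text and applies a life-plus-50 rule; the function names are hypothetical, the exact-label match is an assumption that will miss many authors (as noted above), and sending the query to the Wikidata Query Service is left out.

```python
from typing import Optional


def build_death_date_query(author_name: str, lang: str = "zh") -> str:
    """Return SPARQL text that finds humans with this exact label and a date of death (P570)."""
    # Escape backslashes and double quotes so the name is a valid SPARQL string literal.
    escaped = author_name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "SELECT ?person ?death WHERE {\n"
        f'  ?person rdfs:label "{escaped}"@{lang} ;\n'
        "          wdt:P31 wd:Q5 ;\n"      # instance of: human
        "          wdt:P570 ?death .\n"    # date of death
        "}"
    )


def copyright_term_expired(
    death_year: Optional[int], current_year: int, term_years: int = 50
) -> Optional[bool]:
    """Apply a life-plus-term rule; return None when the death year is unknown."""
    if death_year is None:
        return None
    # Protection is assumed to run to the end of the year death_year + term_years.
    return current_year > death_year + term_years


if __name__ == "__main__":
    print(build_death_date_query("老舍"))
    print(copyright_term_expired(1966, 2022))
```

A `None` result means "needs manual review", which keeps unidentified authors out of the automatic upload list.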

Proposal: Tolerate uploads of pre-1949 Chinese books

Wikipedia is the world's most visited non-profit website. Wikimedia Commons is its companion website for hosting free media files. One aim is to keep the files free. But from a historical point of view, another important aim is to preserve the world's civilization, including old books.

One category of Chinese books is rare and needs preservation: those published under Kuomintang rule (1911-1949). The new China regards them as ideologically undesirable, so they are seldom reprinted and are often rare. They need preservation, especially given the prospect of a war between the Mainland and Taiwan. However, some of these books have not yet entered the public domain, and identifying them among such a vast number of books is too difficult. No one would be able to identify them and upload them. As a result, these books could disappear one day. What a loss! On the other hand, copyright laws must be obeyed. Is there a way to pursue both?

I propose to tolerate uploads of pre-1949 Chinese books under the following conditions:

  1. After the upload, the uploader will actively identify non-PD books using author information from Wikidata and put them on a list for deletion. These deletions should be done by an admin bot. The files will be tagged with their year of restoration and batch-restored by admins when they enter the public domain.
  2. The uploader must publish a list of the uploaded books, including file names and author names. Other users are welcome to identify non-PD books and nominate them for deletion.

This is NOT intended to change any Commons policy. It only means that uploaders won't be punished for uploading these books.
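The "year of restoration" tagging in condition 1 could be computed along these lines. This is only a sketch under stated assumptions: it applies a plain life-plus-50 rule, ignores US copyright status (which Commons also requires), and the function names and file names are hypothetical. The "Undelete in YYYY" category naming follows the existing Commons categories mentioned in the discussion below.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def restoration_year(death_year: int, term_years: int = 50) -> int:
    """First calendar year in which a life-plus-term copyright has expired."""
    return death_year + term_years + 1


def group_by_undelete_category(
    books: Iterable[Tuple[str, int]], term_years: int = 50
) -> Dict[str, List[str]]:
    """Group (file_name, author_death_year) pairs by their undeletion category."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for file_name, death_year in books:
        year = restoration_year(death_year, term_years)
        groups[f"Category:Undelete in {year}"].append(file_name)
    return dict(groups)


if __name__ == "__main__":
    sample = [("a.pdf", 1980), ("b.pdf", 1980), ("c.pdf", 1990)]  # made-up data
    for category, files in group_by_undelete_category(sample).items():
        print(category, files)
```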



--Upload for Freedom (talk) 12:37, 12 November 2022 (UTC)

Support

Oppose

  • Yes, the point is to upload books on Commons, as there is a much better chance of long-term preservation here than on other projects. As long as the books still under copyright are deleted (and placed in the relevant Undelete category), there is no policy on Commons that forbids this. Yann (talk) 22:21, 3 February 2023 (UTC)

Comments

  • Why should Commons be the vehicle for this? It is going to take a lot of work, and not just by the uploader(s). This would seem like a more natural project for the Internet Archive.
  • If some subset of these works can be identified as now being in the public domain, I'm completely in favor of that being uploaded. The further in the future the material will come into the public domain, the less I see Commons as the appropriate vehicle. - Jmabel ! talk 17:46, 12 November 2022 (UTC)
    There is no better project than Wikimedia Commons. It is linked to Wikisource, where people can transcribe the books into text, and to Wikipedia, where people can insert them as illustrations. So when a dead old book is uploaded here, it comes alive. Freedom gives the book life. --Upload for Freedom (talk) 12:09, 15 November 2022 (UTC)
This is an excellent point. Wikisource is a valuable repository of content, as text can easily be cut and pasted for quotation without fear of copyright infringement. I'm in favor of a solution that would help expand the content of Wikisource. --Happyseeu (talk) 16:20, 15 November 2022 (UTC)
  • I proposed this entirely in the public interest. I won't gain anything personally from it. --Upload for Freedom (talk) 12:16, 15 November 2022 (UTC)
  • I doubt whether there is any guarantee that deleted files can be restored in the long term, or that they are kept at the same storage and backup level as published files, considering that the proposal looks far into the future. And if the proposal is accepted and such actions are allowed, the files could be listed in Category:Undeletion_requests. --虹易 (talk) 12:52, 15 November 2022 (UTC)
  • I think there are some problems:
    • First, Commons is not free-of-charge storage space. I don't know how the Foundation would react to deletion being used to hide stored files, but it was very unhappy when some people uploaded pirated movies to its servers and then used the Wikipedia Zero project to stream them without data charges;
    • Second, you'll need to enlist the assistance of an administrator to delete and then restore the files. Based on your record of public communications, that doesn't appear to have been successful.

--Cwek (talk) 00:57, 16 November 2022 (UTC)

Thank you for your comment. I am not proposing to upload any movies, just books. They are tiny in file size compared with the vast number of pictures and videos uploaded to the site every day. I have asked some admins to speedily delete some recently published books that were uploaded by mistake. It only takes adding a template to the deleted file's page, so that the file can no longer be seen. Upload for Freedom (talk) 06:14, 16 November 2022 (UTC)
"You asked", but did they promise? The plan seems complex and relies on a trick. All uploaders need to know this plan, upload files according to it and mark them; all administrators need to know this plan and, after deleting files, add a mark to the page of a file that has been deleted and no longer exists; and these files have to be restored periodically. The plan looks nice, but I am not optimistic about it. At least some admins and power users have advised against uploading files to Commons like this, or said that there should be a better place to save them. However, I think that if the Foundation considered this problem, it could set up a main data center outside the United States to store these documents, or other documents that do not meet US copyright requirements, and so avoid the issue. These issues are better raised with the Foundation; they are unlikely to lead to actual action here. --Cwek (talk) 03:33, 17 November 2022 (UTC)
  • Comment: As of writing, "Category:Undelete in 2023" contains 239 pages and a sub-category, and we have "Undelete in 2XXX" categories stretching more than a century ahead. Honestly, I think that if user "Upload for Freedom" had just contacted user "Yann" directly and said "Hey, I want to upload these books, can you speedy delete them for me and tag them for undeletion later?", this entire project would have succeeded without so many nay-sayers. I assume that, because we're all volunteers here, some volunteers feel they would be wasting their free time deleting and then undeleting files, even though there are users more than willing to do that. In fact, admins generally do what they want because policies are vague, for example deleting a page of a "non-contributor" with the justification "deleted page File:Profile pic for self.png (Personal photo by non-contributors (F10))" even though this "non-contributor" had 20,405 global edits as of writing. No wonder lots of users disengage from the system without appealing. Actual abusers won't stop their abuse, but good-faith users who bear the brunt and have a lot to offer end up leaving. Regarding the comment "there are some problems: First, Commons is not a bill-free storage space. Using deletion to hide save files, I don't know how the foundation will comment, but they are very dissatisfied with some people uploading pirated movies to the foundation's server, and then using the Wikipedia Zero project to achieve free Internet fee playback": this was actually done by a ring of users seeking to use the Wikimedia Foundation's servers to stream films illegally. But every deleted file still stays on the Wikimedia Foundation's servers, which is why the "Undelete" option is possible at all. Neither server space nor potential server abuse is a problem here.
Uploading a file to be undeleted later isn't a bad thing, because Wikimedia Commons isn't a short-term project designed exclusively for people who need it right now; the future-undeletion system recognises the educational value of storing in-copyright files indefinitely but "invisibly" for future generations. The idea that "These files are better hosted at the Internet Archive" is flawed; in fact user "" started an entire import campaign to import books from the Internet Archive because its library was in jeopardy over legal issues. I simply don't see why this project is controversial, as we host thousands of files that are currently scheduled to be undeleted later and probably tens of thousands of files that will be undeleted at some point in the future. If volunteers are willing to invest their time in making this project work, then I don't see why people who don't want to "waste their time" on this are wasting their time arguing against it. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 12:39, 17 November 2022 (UTC)
It appears that Commons has always had this practice: files awaiting entry into the public domain are deleted and tagged with a category until the time comes to reinstate them. The practice is not complicated. The uploader just needs to request deletion immediately after uploading and explain that the file should be restored when the copyright expires; administrators should know the practice and handle the request accordingly. If this practice remains feasible, I don't think it is necessary to find another way; just follow the existing practice. --Cwek (talk) 00:41, 18 November 2022 (UTC)

Books uploaded! Do you want more?

@Yann, 源義信, 虹易, 2600:6C40:59F0:85F0:390F:3E3A:E100:669C, Donald Trung, Zoozaz1, and Yinyue200: Thank you for your support! I have uploaded 0.2 million files with SSIDs. I selected the files to upload using three criteria:

  • Books published before 1950: all of them.
  • Books without a date in the metadata: those I judged likely to be old books.
  • Books published in 1950 or later: those I judged likely to be reprints of old books, plus old news from Xinhua News Agency.

Many more books that did not match the above criteria are also in the public domain. I have uploaded the entire book list here. The table includes the upload status (已上传, "uploaded", or 未上传, "not uploaded").

https://easyupload.io/cvyppu

If you want to see more books uploaded, please put the SSID of each book (the first column in the table) here: User:Upload for Freedom/SSID. I will upload the books. NOTE: THE BOOKS COULD DISAPPEAR AT ANY TIME. I WILL ALSO REMOVE MY UPLOADING ENVIRONMENT SOON. I CAN ONLY RESPOND TO REQUESTS FOR 20 DAYS FROM NOW.--Upload for Freedom (talk) 02:25, 10 January 2023 (UTC)

After that, I will use Wikidata to identify authors who died within the last 50 years, so that their books can be deleted, and I will post a list of the books on Wikimedia Commons.--Upload for Freedom (talk) 02:19, 10 January 2023 (UTC)

@Upload for Freedom: Hi, Thanks for the update. IMO the date should be in the description, e.g. for File:CADAL02079034 明史(一).djvu. Do you plan to upload all books listed in Commons:Library back up project/file list/NLC/民國圖書/01 (and other pages)? And in File:SSID-10000424 使藏紀程.pdf, we need either the original publication date or the date(s) for the creator (best is to use the Creator templates). Regards, Yann (talk) 10:01, 10 January 2023 (UTC)
Hi, Yann. The NLC books were uploaded by the user 虹易. I hope he/she will upload those books as well. Metadata for the year of creation is unavailable for some books. Some of them are ancient and some are new. I manually identified those likely to be old (in chunks) and uploaded them. It was dull! I will make a list containing author information from Wikidata.--Upload for Freedom (talk) 13:12, 10 January 2023 (UTC)
@虹易 and Upload for Freedom: For File:CADAL02079034 明史(一).djvu, the date is in Category:明史, but it should be in the description as well. Yann (talk) 15:10, 10 January 2023 (UTC)

words

(著|编|撰|補|註|译|編|辑|譯|纂|修|輯|校|述|注|等|作|编辑|编译|编著|编纂|編輯|編譯|編著|編纂|标点|處編|德著|等编|等译|等著|等撰|合译|合著|校订|校阅|选注|译述|譯述|原著|主编|主編|選註|標點|集解|奉敕撰)


--Upload for Freedom (talk) 13:20, 4 February 2023 (UTC)

[一二三四五六七八首九十百千万]+[卷回類種種]$--維基小霸王 (talk) 11:59, 27 June 2023 (UTC)
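The two patterns above are posted without explanation. One plausible use, shown here purely as an assumption, is to strip role words (著, 编, 译 and so on) from the end of an author field and to detect a volume-count suffix at the end of a title. The helper names are hypothetical, and wrapping the first pattern in `(?: ... )+$` so that stacked suffixes such as 等编著 are removed together is an addition not found in the original.

```python
import re

# Role words copied from the first pattern above.
ROLE_WORDS = (
    "著|编|撰|補|註|译|編|辑|譯|纂|修|輯|校|述|注|等|作|编辑|编译|编著|编纂|編輯|編譯|編著|編纂|"
    "标点|處編|德著|等编|等译|等著|等撰|合译|合著|校订|校阅|选注|译述|譯述|原著|主编|主編|"
    "選註|標點|集解|奉敕撰"
)
# Assumption: role words are only meaningful as a run at the very end of the field.
ROLE_SUFFIX = re.compile(rf"(?:{ROLE_WORDS})+$")
# Second pattern above, unchanged: a Chinese numeral followed by a counter such as 卷 or 回.
VOLUME_SUFFIX = re.compile(r"[一二三四五六七八首九十百千万]+[卷回類種種]$")


def strip_role(author_field: str) -> str:
    """Remove trailing role words, leaving what is hopefully the bare author name."""
    return ROLE_SUFFIX.sub("", author_field).strip()


def has_volume_suffix(title: str) -> bool:
    """Report whether the title ends with a volume or chapter count."""
    return VOLUME_SUFFIX.search(title) is not None


if __name__ == "__main__":
    print(strip_role("老舍著"))
    print(has_volume_suffix("史記一百三十卷"))
```

A name that itself ends in one of the role characters (修, 作, 注 and the like) would be truncated, so results of this kind of stripping still need spot checks.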

Failed uploads

Hi, I saw this list, and I wonder if I could help. What was the process and the error? Regards, Yann (talk) 11:41, 7 May 2023 (UTC)

Upload request

--Kcx36 (talk) 08:42, 15 May 2023 (UTC)

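  • low-resolution thumbnail[2], tile[3], and an existing PDF directory with possible files[4]. I failed to infer the possible filenames in that PDF directory, though. 虹易 (talk) 12:53, 4 September 2023 (UTC)
    @虹易: Some of the Yunnan Provincial Library's resources are provided to the National Library of China (NLC) website for publication. You previously uploaded 726 volumes, and the NLC has now added another 127 titles, about 527 volumes. Would you have time to continue uploading? I have scraped the metadata of the newly added books from the Yunnan Ancient Books Digital Library[5], and I hope it can be merged with the NLC metadata before uploading. Many thanks!
    The NLC website has now published one third ([726+527]/3781) of the books in the Yunnan Ancient Books Digital Library. The NLC scans are lower in resolution than those of the Yunnan Ancient Books Digital Library, but the advantage is that the watermark is small.--Kcx36 (talk) 17:00, 8 February 2024 (UTC)
  • @Kcx36: Thanks for the reminder! The NLC's books and categories are numerous and varied, and their metadata structures are inconsistent. The books already uploaded span dozens of categories, each of which needed case-by-case adjustment, and that took a long time. For now I have no good way to follow updates continuously and automatically. I will upload the newly added books first.--虹易 (talk) 01:21, 24 March 2024 (UTC)
  • Dunhuang Manuscripts[6]. 2000-year-old Dunhuang manuscripts were stolen and destroyed for money, and the surviving ones are still behind a paywall. What a shame. They are all held by libraries around the world. We should find the manuscripts at their sources and upload them here to make them freely available. --維基小霸王 (talk) 14:06, 17 May 2023 (UTC)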

https://ndl.iitkgp.ac.in/result?q={%22t%22:%22sourceOrganization%22,%22k%22:%22%22,%22s%22:[],%22b%22:{%22browse%22:%22sourceOrganization%22,%22filters%22:[%22sourceOrganization=\%22Digital%20Library%20of%20India%20(DLI)\%22%22]}}

The open items are marked "This content has been donated by DLI and to the best of our knowledge the content is copyright free. However, if anybody has any copyright violation concern it may be brought to the notice of NDLI with relevant evidence at content-admin@ndl.gov.in and NDLI will do the needful.", so I think they can be uploaded here. They are available as PDF without the need to log in.

The difficulty is obtaining the item list. A token is posted when loading more items, and I can't find the code for decoding the token.--維基小霸王 (talk) 13:05, 18 February 2024 (UTC)

The Getty Museum has released 88,000 art images under CC0: https://www.getty.edu/art/collection/search?open_content=true --維基小霸王 (talk) 11:46, 16 March 2024 (UTC)

https://zjdy.zjdafw.gov.cn/col/col11/index.html Zhejiang local gazetteers--維基小霸王 (talk) 12:39, 2 May 2024 (UTC)

https://lbezone.hkust.edu.hk/rse/ Rare books and other content from the Hong Kong University of Science and Technology--維基小霸王 (talk) 02:35, 3 May 2024 (UTC)

The English Reports

See s:Portal:The English Reports for a list of files. Yann (talk) 12:17, 21 May 2023 (UTC)

National Diet Library of Japan

NDL book id

I am trying to obtain the book IDs of Rare Books and Old Materials from the Japanese NDL website. There are 82,948 "Available without login" items[7]. However, only 10k are available in the list. After trying different options many times, I could find only 65,333 IDs: Commons:Library back up project/ndl/rare. Can anyone help find the other IDs? @Midleading and Jlhwung: 維基小霸王 (talk) 08:55, 30 May 2023 (UTC)

@維基小霸王: I have identified 82,893 items so far by enumerating search conditions. They are now included in the list. I am still trying to retrieve the few missing ones.虹易 (talk) 10:18, 30 June 2023 (UTC)
THANKS A LOT! 維基小霸王 (talk) 10:20, 30 June 2023 (UTC)
I'm happy to help. Updated again. Several are still missing. I have stopped the effort, since finding more within an acceptable time is infeasible. You could try checking the related books (a section on book pages) when fetching metadata. 虹易 (talk) 14:21, 30 June 2023 (UTC)
@虹易: My upload has begun: Category:Books_from_NDL_digital_collection, Category:Images_from_NDL_digital_collection.
Could you please get the IDs from these two collections by enumerating search conditions? https://dl.ndl.go.jp/collections/A00001?pageNum=0 https://dl.ndl.go.jp/collections/A00016?pageNum=0 . You may put them here: Commons:Library back up project/ndl/Allied Occupation Commons:Library back up project/ndl/Books. I may upload them when I finish the current batch. --維基小霸王 (talk) 02:13, 10 July 2023 (UTC)
Thanks for your contributions. I am too busy these days. I will try when I am free.--虹易 (talk) 17:52, 10 July 2023 (UTC)
Take your time. 維基小霸王 (talk) 03:53, 11 July 2023 (UTC)
I am working on this. --虹易 (talk) 01:33, 2 September 2023 (UTC)
@維基小霸王: IDs updated. I scraped all items from 1 to 14000000. The several missing ones in A00003 have also been found. A00001 is split into Commons:Library_back_up_project/ndl/Books/01 and Commons:Library_back_up_project/ndl/Books/02. I have no clue why thousands are still missing from A00001. I will check the log later. --虹易 (talk) 03:37, 17 September 2023 (UTC)
Thanks a lot! 維基小霸王 (talk) 12:50, 20 September 2023 (UTC)
@虹易: Since you scraped all items from 1 to 14000000, can you list the other downloadable items as well?--維基小霸王 (talk) 11:08, 19 October 2023 (UTC)
@維基小霸王: I uploaded them here (compressed with zstd). --虹易 (talk) 06:26, 22 October 2023 (UTC)
That's really helpful! 維基小霸王 (talk) 02:23, 23 October 2023 (UTC)
@虹易: Some pages seem to be missing: Commons:Library_back_up_project/file_list/NDL/title_could_not_update. For example, zstdgrep 10301792 internet.jsonl.zst gives no result, although the page exists in NDL. Could you please look again and provide the remaining JSON files? I have uploaded 350k files so far. I want a complete backup of old materials. --維基小霸王 (talk) 13:54, 6 December 2023 (UTC)
The missing one is very useful information. I will check my log for any clues.--虹易 (talk) 05:43, 13 December 2023 (UTC)
I have identified the missing ones. I will upload a diff later.--虹易 (talk) 10:46, 4 January 2024 (UTC)
@維基小霸王: The missing ones are here.--虹易 (talk) 02:59, 7 January 2024 (UTC)
@虹易 Pages like 753435 still do not exist. Could you check again, please? 維基小霸王 (talk) 13:31, 8 January 2024 (UTC)
@維基小霸王: Sorry about that. The ID ranges were split and scraped separately. The previously missing range was 10000001~10985999, and I have just realized there was another missing range, 1~802112 (here). They were missed for two unrelated reasons. I think the data is complete now.--虹易 (talk) 08:10, 9 January 2024 (UTC)
Great! These historical files will have one more backup. 維基小霸王 (talk) 09:08, 9 January 2024 (UTC)
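A rough sketch of how unscraped ranges like 10000001~10985999 or 1~802112 could be spotted automatically. It assumes the scraped IDs are available as plain integers; the function name and the `min_length` threshold are hypothetical. Because not every number between 1 and 14000000 is a real NDL item, short gaps are expected, and only long contiguous gaps hint at a range that was never scraped.

```python
from typing import Iterable, List, Tuple


def missing_ranges(
    found_ids: Iterable[int], start: int, end: int, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return inclusive (lo, hi) gaps within [start, end] that contain no found ID."""
    gaps: List[Tuple[int, int]] = []
    expected = start
    for current in sorted({i for i in found_ids if start <= i <= end}):
        if current > expected:
            gaps.append((expected, current - 1))
        expected = current + 1
    if expected <= end:
        gaps.append((expected, end))
    # Keep only gaps long enough to be suspicious.
    return [(lo, hi) for lo, hi in gaps if hi - lo + 1 >= min_length]


if __name__ == "__main__":
    scraped = [1, 2, 5, 6, 10]  # made-up IDs
    print(missing_ranges(scraped, 1, 12))
    print(missing_ranges(scraped, 1, 12, min_length=3))
```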

Upload books licensed by the Commissioner of the Agency for Cultural Affairs, Japan

I am uploading books from the National Diet Library of Japan. So far I have uploaded only very old rare books. Some books on the website (for example, a 1908 work) are made available under a ruling by the Commissioner of the Agency for Cultural Affairs (インターネット公開(裁定)). This relates to Article 67 of the Japanese Copyright Act:

(著作権者不明等の場合における著作物の利用)

第六十七条

公表された著作物又は相当期間にわたり公衆に提供され、若しくは提示されている事実が明らかである著作物は、著作権者の不明その他の理由により相当な努力を払つてもその著作権者と連絡することができないときは、文化庁長官の裁定を受け、かつ、通常の使用料の額に相当するものとして文化庁長官が定める額の補償金を著作権者のために供託して、その裁定に係る利用方法により利用することができる。

2 前項の規定により作成した著作物の複製物には、同項の裁定に係る複製物である旨及びその裁定のあつた年月日を表示しなければならない。

English translation (adapted from ChatGPT output):

(Exploitation of works where the copyright owner is unknown, etc.)

Article 67

Where a work has been made public, or it is clear that a work has been offered or presented to the public for a considerable period, and the copyright owner cannot be contacted despite considerable effort, because the owner is unknown or for other reasons, the work may be exploited in the manner covered by a ruling of the Commissioner of the Agency for Cultural Affairs, after obtaining that ruling and depositing, for the benefit of the copyright owner, compensation in the amount fixed by the Commissioner as equivalent to an ordinary royalty.

2. Copies of a work made under the preceding paragraph must indicate that they are copies made under the ruling referred to in that paragraph, together with the date of the ruling.

Since these works are old and deserve preservation, I hope to upload them here as well. If they occasionally include copyrighted works, anyone can nominate those for deletion. Is that OK? 維基小霸王 (talk) 01:15, 26 October 2023 (UTC)

Upload Complete

869,738 files (77.96 TB) from the National Diet Library of Japan have been uploaded. All old books (audio files not included) have been uploaded. I want to thank @虹易: for obtaining the metadata files, @Midleading: for providing the upload information, and @Jlhwung: for asking. And most importantly, my deepest gratitude to the National Diet Library of Japan for their dedication to scanning these historical documents and sharing them with the world. --維基小霸王 (talk) 14:08, 16 March 2024 (UTC)

Thank you for uploading. They are very helpful for encoding historical characters. Jlhwung (talk) 16:00, 16 March 2024 (UTC)

Digital representation of (P6243) statements are being added to the files, so that information about each book from Wikidata can be displayed on Wikimedia Commons. Midleading (talk) 13:37, 23 October 2023 (UTC)

Thanks for your effort! Could you please add author information, by Wikidata ID, to the NDL books? Some books carry "Creator: NDLNAId" data, and you can obtain the Wikidata ID by querying NDL Authority ID (P349). Example: File:NDL1111655_自由の考察.pdf has the creator ID "00001132", which is linked to Akegarasu Haya (Q4700582); that item was added as the author (P50) of the file. You could consider doing this after the entire batch finishes, which will take months.--維基小霸王 (talk) 07:02, 17 November 2023 (UTC)
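The P349 lookup described above could be batched with a SPARQL `VALUES` clause, roughly as follows. This is a sketch only: the function name is hypothetical, the alphanumeric check on IDs is an assumption made to keep the query text safe, and submitting the query to the Wikidata Query Service is not shown.

```python
import re
from typing import Iterable


def build_ndl_authority_query(ndl_ids: Iterable[str]) -> str:
    """Return SPARQL text mapping NDL Authority IDs (P349) to Wikidata items."""
    checked = []
    for ndl_id in ndl_ids:
        # Assumption: IDs are plain alphanumeric strings such as "00001132".
        if not re.fullmatch(r"[0-9A-Za-z]+", ndl_id):
            raise ValueError(f"unexpected NDL Authority ID: {ndl_id!r}")
        checked.append(f'"{ndl_id}"')
    values = " ".join(checked)
    return (
        "SELECT ?ndlId ?item WHERE {\n"
        f"  VALUES ?ndlId {{ {values} }}\n"
        "  ?item wdt:P349 ?ndlId .\n"
        "}"
    )


if __name__ == "__main__":
    print(build_ndl_authority_query(["00001132"]))
```

Leading zeros matter here, so the IDs are kept as strings rather than converted to integers.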
Why not create an item for the book and add the author to the book instead? Then the author is added only once. Midleading (talk) 09:19, 17 December 2023 (UTC)
That is a good idea. I will send you the information spreadsheet after the upload. 維基小霸王 (talk) 06:06, 25 December 2023 (UTC)
If there are too many individual books to create on Wikidata, adding the information here could be a better idea. Midleading (talk) 10:05, 4 January 2024 (UTC)
Additionally, "Subject: NDLSHId" can be added by looking up NDL Authority ID (P349). Example: File:NDL1908621_満洲事変満五年.pdf.--維基小霸王 (talk) 04:31, 10 January 2024 (UTC)
Templates now wrongly interpret this file as being the subject itself rather than a book about the subject. I think a Wikidata item of type version, edition or translation (Q3331189) is needed for every NDL book, as for File:紅樓夢(一).djvu. Midleading (talk) 08:03, 12 January 2024 (UTC)
I am confused. Is there something wrong with the template or with the property used? Feel free to change it to what it should be. 維基小霸王 (talk) 04:11, 16 January 2024 (UTC)
The upload is not complete. There are still some files left. I will continue in 2 weeks. 維基小霸王 (talk) 05:15, 3 February 2024 (UTC)
Thanks Midleading (talk) 07:19, 3 February 2024 (UTC)
@Midleading: Also, you can add JPNO (P2687) from the "Source Identifier: JPNO" field and NDL Bib ID (P1054) from "Source Identifier: NDLBibID". You can also add language of work or name (P407) (usually Japanese (Q5287), but NDL treats some Classical Chinese books as Japanese, so you may use kana in the title as a heuristic), place of publication (P291) (usually Japan (Q17)) and publication date (P577) (a date string like "大正6" can be added as the qualifier object named as (P1932)). In addition, you may add a Japanese caption (which may be the file name without the ID).--GZWDer (talk) 02:40, 5 February 2024 (UTC)
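To fill publication date (P577) from an era-style string such as "大正6", the year has to be converted to the Gregorian calendar. A minimal sketch follows; it assumes Arabic (or full-width) digits or 元 after the era name, does not handle kanji numerals such as 大正六年, and ignores the fact that early Meiji dates before 1873 followed the old lunisolar calendar. The function name is hypothetical.

```python
import re
from typing import Optional

# First Gregorian year of each modern era.
ERA_START = {"明治": 1868, "大正": 1912, "昭和": 1926, "平成": 1989, "令和": 2019}
_ERA_DATE = re.compile(r"(明治|大正|昭和|平成|令和)(元|\d+)")


def era_to_gregorian_year(text: str) -> Optional[int]:
    """Convert the first era-year found in text to a Gregorian year, or return None."""
    match = _ERA_DATE.search(text)
    if match is None:
        return None
    era, number = match.groups()
    year_in_era = 1 if number == "元" else int(number)  # 元年 is year 1 of an era
    return ERA_START[era] + year_in_era - 1


if __name__ == "__main__":
    print(era_to_gregorian_year("大正6"))
```

The original string can still be kept verbatim in the qualifier, so nothing is lost if the conversion is later found to be off for an edge case.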
I have filed a request for permission on Wikidata to import the data there. Midleading (talk) 13:11, 5 February 2024 (UTC)


Book classifications

I plan to add book classifications to the book categories using NDC, following the NDL classifications. Example: Category:NDC10 912.7. Because the classifications were made using three editions (NDC8, NDC9, NDC10), each will be added separately. In the future, the Chinese classification used by the National Library of China can be added as well. 維基小霸王 (talk) 11:04, 16 January 2024 (UTC)

@虹易 Could you give me a spreadsheet with two columns, category name and 中圖分類 (Chinese Library Classification), from NLC? 維基小霸王 (talk) 02:32, 17 January 2024 (UTC)
@維基小霸王: Sure.[9] Let me know if you need other info. 虹易 (talk) 09:08, 17 January 2024 (UTC)
@虹易 I mean the book category names. For example, for File:NLC416-01jh000001-9467 易理鑰.pdf, the category name and 中圖分類 should be 易理鑰 and B221.5. 維基小霸王 (talk) 10:47, 17 January 2024 (UTC)
I can get them from &action=raw, so there is no need anymore.--維基小霸王 (talk) 02:03, 24 January 2024 (UTC)
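@Midleading If you are interested, you could link Category:Books classified by sibu Classification to Wikidata. The spreadsheet has already been sent to you. 維基小霸王 (talk) 09:13, 30 January 2024 (UTC)
The language of these category names is neither Chinese nor English; I am afraid that does not comply with the category naming rules. Midleading (talk) 10:22, 30 January 2024 (UTC)
I have replied to the email. Midleading (talk) 11:56, 30 January 2024 (UTC)
The template handles three parts: classification type, ID, and name. The Sibu (four-part) classification has no IDs, so the Chinese term is treated as the ID in the template and the English term as the name. Wikimedia Commons is a multilingual site whose rules call for titles in English as the common language. The Sibu classification is an ancient Chinese scheme, and English sometimes cannot convey the full Chinese meaning. The current approach accommodates both English and Chinese. Please share your views on how to proceed from here. 維基小霸王 (talk) 14:04, 31 January 2024 (UTC)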

Language in titles

I will create categories for library classifications. An example can be seen at Category:NDC10_912_Drama, where NDC10 is the classification, 912 is the ID of the class, and Drama is the name. The template can automatically recognize the category and display the relevant classifications. See the documentation: {{Library classification navigation}}.
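The three-part naming described above can be split mechanically. The sketch below is not the template's actual logic, only an illustration in Python: the regular expression, the `LibraryCategory` type and the function name are all assumptions, and it covers only schemes with numeric class IDs (so the Sibu categories, which have no numeric ID, would not match).

```python
import re
from typing import NamedTuple, Optional


class LibraryCategory(NamedTuple):
    scheme: str            # e.g. "NDC10"
    class_id: str          # e.g. "912" or "912.7"
    name: Optional[str]    # e.g. "Drama"; None when the title has no name part


_TITLE = re.compile(
    r"^(?:Category:)?(?P<scheme>[A-Za-z]+[0-9]*)[ _]"
    r"(?P<class_id>[0-9][0-9.]*)(?:[ _](?P<name>.+))?$"
)


def parse_library_category(title: str) -> Optional[LibraryCategory]:
    """Split a category title into scheme, class ID and name; None if it does not fit."""
    match = _TITLE.match(title)
    if match is None:
        return None
    name = match.group("name")
    if name is not None:
        name = name.replace("_", " ")  # underscores and spaces are equivalent in titles
    return LibraryCategory(match.group("scheme"), match.group("class_id"), name)


if __name__ == "__main__":
    print(parse_library_category("Category:NDC10_912_Drama"))
    print(parse_library_category("Category:NDC10 912.7"))
```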

@Midleading and 虹易: What language should we use in titles, the original language or English? Commons:Naming categories says "Category names should generally be in English"; however, these names are rarely used outside the original language, and there is also the rule "For subjects of only local relevance, proper names in the original language are used generally." I will create the library categories used in China and in Japan respectively. Should I use the original language or English? Please help me decide. It will be difficult to move the categories after they are created, so let's make everything clear now. --維基小霸王 (talk) 13:54, 31 January 2024 (UTC)

Since there has been no reply, I will use the English translations.--維基小霸王 (talk) 04:45, 2 February 2024 (UTC)

Add category using spreadsheet

I uploaded a spreadsheet for categorizing books. The spreadsheet contains further instructions. It has sheets for NDL books and for zh and jp book categories. Please help by adding categories.

Download the spreadsheet: You'll find the spreadsheet link here: https://drive.google.com/file/d/1soRx78FG8JcxXn9lN72tFIfvIl5A2oG4/view?usp=sharing

Add categories: There are designated columns named "CAT_1", "CAT_2", and "CAT_3" where you can add new categories for the books. Be sure to add only new categories that don't duplicate existing ones.

Share it back: Once you've added your categories, share the updated spreadsheet back with me. I will add them by bot.

Important Note: I will assume you've double-checked your additions, and I won't be reviewing them for errors.

If you decide to add categories, please first leave a message here saying what you will be adding, to avoid duplicate work. 維基小霸王 (talk) 12:31, 19 March 2024 (UTC)

How to transclude split PDF files in the main namespace?

Large PDF files have been split, so how should they be transcluded? It is not very efficient, but should I merge them into a new, larger file and upload that? CES1596 (talk) 05:06, 24 March 2024 (UTC)

@CES1596: Where do you want to transclude them? On Wikisource for transcription, or on Wikipedia as inserted images? See Commons:Library back up project/use in other projects. --維基小霸王 (talk) 01:53, 25 March 2024 (UTC)
@維基小霸王: On Wikisource for transcription, with the <pages/> tag. I am afraid this tag does not support merging split files. CES1596 (talk) 04:34, 25 March 2024 (UTC)
You can use multiple <pages/> tags, like <pages file=file1 pages=1-100 /> <pages file=file2 pages=101-200 />.
I am afraid MediaWiki renders very large PDF files inefficiently. That's why I split very large files before uploading. 維基小霸王 (talk) 12:15, 25 March 2024 (UTC)
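One way to produce such a run of tags for many split files is sketched below. It assumes the ProofreadPage attributes `index`, `from` and `to` rather than the `file=`/`pages=` shorthand used above, and it assumes page numbering restarts at 1 in each split file. The function name, file names and page counts are made up for illustration, and the sketch does nothing about the gap between parts discussed next.

```python
from typing import Iterable, Tuple


def pages_tags(parts: Iterable[Tuple[str, int]]) -> str:
    """Build one <pages/> tag per split file from (index_title, last_page) pairs."""
    lines = []
    for index_title, last_page in parts:
        # Each split file is transcluded in full, from its page 1 to its last page.
        lines.append(f'<pages index="{index_title}" from=1 to={last_page} />')
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical split files of one book.
    print(pages_tags([("Example book (1).pdf", 100), ("Example book (2).pdf", 100)]))
```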
@維基小霸王: I have tried that method, but it seems to create a gap between the text before and after. CES1596 (talk) 12:24, 25 March 2024 (UTC)
OK. There should be a software update. I have proposed one at https://phabricator.wikimedia.org/T360890 .--維基小霸王 (talk) 13:04, 25 March 2024 (UTC)
Thank you. I hope it works out well.
Another point: you need to decide which of the multiple files should be treated as the main page source.CES1596 (talk) 13:28, 25 March 2024 (UTC)
You can add a comment on Phabricator. 維基小霸王 (talk) 14:30, 25 March 2024 (UTC)
@維基小霸王: An example in the main namespace has been added to Phabricator. In the main namespace, the gap appears to be larger. CES1596 (talk) 20:11, 25 March 2024 (UTC)
@維基小霸王: Regarding this matter, could you tell me what specific inefficiencies are caused by the increase in file size? If it is a problem that will be solved in the future, might it be better to consolidate the parts into one file? In that case you would have to re-enter the information, but shouldn't that be considered as an option? CES1596 (talk) 08:55, 14 April 2024 (UTC)
I think smaller files are easier to maintain. A temporary workaround would be to put the whole of any paragraph that is cut between files on the last page of the first file. How about that? 維基小霸王 (talk) 10:57, 14 April 2024 (UTC)
Do you mean joining the split files together and merging them into one file? If that does not cause any problems, it would be very helpful. CES1596 (talk) 11:11, 15 April 2024 (UTC)

@CES1596: I can't stop you from merging files, but I would not recommend it. Bigger files are harder to maintain. If you spot a flipped page, it is harder to fix in a large file. Merging would sometimes produce a very large file: for example, NDL938109 小学校教授細目 has 1535 pages, which would make it harder to locate a specific page.

For transcription of split files, I think you can just move the text that continues in the next file into the previous file. For example, if part 1 of a paragraph is on the last page of file A and part 2 is on the first page of file B, put both parts on the last page of file A instead of leaving them split between the two files. That would not cause display problems on Wikisource, and it avoids the extra effort and resources of creating a large merged file.--維基小霸王 (talk) 06:58, 17 April 2024 (UTC)

@維基小霸王: Thank you for your reply. However, I can't quite picture what you mean. I would be grateful if you could provide a concrete example. Do you mean that there will be some spacing at the boundary, as in the examples on Phabricator?CES1596 (talk) 08:35, 17 April 2024 (UTC)
The problem is that even the less voluminous 200-300 page books, which make up the bulk of the library, are split into multiple files. These are usually transcluded using a single file of a few tens of megabytes. CES1596 (talk) 09:02, 17 April 2024 (UTC)
I have tested this in s:ja:利用者:維基小霸王/sandbox and it looks OK. Just move the paragraph fragment from the first page of the 2nd file to the last page of the 1st file, and start the 2nd file with the new paragraph. 維基小霸王 (talk) 09:45, 17 April 2024 (UTC)
Thank you for the sample. However, this seems to be a very tight constraint. For now we may need to use different uploaded files for transclusion. CES1596 (talk) 10:08, 17 April 2024 (UTC)
I have already stated the benefits of smaller files. If you have to merge them, I can't stop you. 維基小霸王 (talk) 11:31, 17 April 2024 (UTC)
We will probably continue to use the smaller single files provided by the NDL, as we have done in the past. However, the backup of high-resolution images is significant in itself. CES1596 (talk) 11:57, 17 April 2024 (UTC)