User:CalRis25

Useful Links

User:CalRis25/sandbox
User:CalRis25/Test-Abaculus-Information
User:CalRis25/Test-Abaculus-CSV
Commons:Structured data
- Commons:Structured_data/Modeling
- Commons:Structured data/Modeling/Illustrations: This includes an example of the necessary property/value pairs for illustrations.
- Commons:Structured data/Modeling/Copyright
- Commons:Statements
Commons:Guide_to_batch_uploading
Commons:OpenRefine
- Commons:OpenRefine/Uploading_files_with_OpenRefine
Wikidata:
- d:Wikidata:WikiProject Books
OpenRefine:
Commons:Pattypan: Alternative to *OpenRefine*.
- Commons:Pattypan/Simple manual

The Illustrated Companion

This document explains the requirements of user CalRis25 for the images from Anthony Rich's ILLUSTRATED COMPANION TO THE LATIN DICTIONARY, AND GREEK LEXICON (1849; a digital version can be found at Archive.org) and documents the upload process, including problems and solutions.

Anthony Rich published the book Illustrated Companion to the Latin Dictionary, and Greek Lexicon (RICH for short) in 1849. It is an encyclopedia that explains, in English, Latin headwords representing (mainly) visible objects connected with the arts, manufactures, and every-day life of the Greeks and Romans (with a heavy emphasis on the Romans).

RICH goes WIKI

I want to create a Wiki-version of the book. To that end I have already transcribed (from scratch and without using OCR) and proofread the entire text, adding internal links along the way so that one article can link to another. I am using custom tags, which allow me to create output files in various formats automatically. Implemented right now: plain text, HTML. Implementing Wikitext should not be a problem.

I have not yet decided which of the Wikimedia projects will be the future home of RICH. Wikiversity seems to be the best option. I will decide where to put RICH once the images have been uploaded. However, I still need the images. I made scans (1200 dpi) from an original copy of the 1849 book in order to have images of sufficiently high quality for this project. Some numbers:

Number of image files: ca. 1900
Total file size: ca. 983.5 MB

Problems to be solved

Concerning these images a few problems need to be solved and questions answered:

How do I go about uploading so many images? This concerns especially the meta data, which I can provide, but entering these by hand is somewhat daunting due to the large number of images concerned. Possible solutions (I am using Linux, by the way):
- Use of a desktop application (listed at Commons:Upload_tools#Standalone_desktop_applications).
- Use of a script (see the page Commons:Command-line upload): This is probably the best option, because the interaction with a program can be kept to a minimum.
  - Can anyone recommend a script for Linux, which allows uploading in bulk including supplying meta data (from a text file, I guess)?
  - I am going to use OpenRefine, which seems to be the recommended way to do batch uploading (and batch editing). See Commons:OpenRefine.
- See also the Wikimedia Commons FAQ.
What about the meta information needed (see below for specific questions)?
What naming scheme should I use for the files (see below for specific questions)?
And which categories should I use (see below for specific questions)?
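For the idea of supplying the meta data from a text file, a minimal Python sketch could look like the following. It writes one CSV row per image, which could then be imported into a tool such as OpenRefine. The column names and the `scans/` directory are assumptions made for illustration, not something OpenRefine or Commons prescribes.

```python
import csv
import io

# Hypothetical column names; they can be renamed freely before import.
FIELDS = ["local_path", "lemma", "page", "number_on_page", "date"]


def write_metadata_csv(rows, out):
    """Write one CSV line per image to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Example row; "scans/" is an assumed local directory.
rows = [
    {
        "local_path": "scans/RICH-015-1-AES_THERMARUM.jpg",
        "lemma": "Aes thermarum",
        "page": 15,
        "number_on_page": 1,
        "date": "1849",
    }
]

buffer = io.StringIO()
write_metadata_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Writing to an in-memory buffer keeps the sketch self-contained; for a real run the buffer would be replaced by a file opened with `newline=""`.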

Meta Data of the Flickr-images

Images which are uploaded to Wikimedia Commons require meta information. One of the above mentioned Flickr-images shows some of these data:

Description: with the following included data about...
- Identifier
- Title
- Year
- Authors
- Subjects
- Publisher
- Contributing Library
- Digitizing Sponsor
- View Book Page: This includes a link to the respective page of the Archive.org-book.
- About this book: Catalog entry
- View all images: Links to Flickr (not relevant)
- Text Appearing Before Image
- Text Appearing After Image
- Note About Images
Date
Source
Author
Permission

Meta Data for the new images

Meta data can be supplied using templates. Of these, the {{Information}}-template (see Template:Information) is the most pertinent, because the template {{Book}} is used for book pages, not for images appearing on one of these pages.

The parameters of the {{Information}}-template with some open questions:

description: My description will probably contain information similar to that of the Flickr-images (see the example above). However, instead of the Text Appearing Before/After Image I will probably use the complete text of the article (with the reference to the image specially marked, e.g. in bold). Note: The longest article is 14.3 KB in size (in untagged plain text format). According to Template:Information short descriptions are preferred, but I believe that for my purpose the whole article is better, because it provides the necessary context for the image. The file name (see below), after all, mostly refers to only one aspect of the image.
- 14.3 KB is awfully big for this. I think you should seriously consider approaches other than sticking things that long in descriptions. - Jmabel ! talk 16:47, 24 August 2024 (UTC)
  - You're probably right. I will stick to something shorter, probably: Image 1 on page 1 of Anthony Rich's Illustrated Companion to the Latin Dictionary... (see Source) as used in the article Abaculus. plus perhaps a short extract from the article.
date: Is this the date when I extracted the image from the page I scanned? Or is it the publication date of the book in which the image appears?
- Date of publication, no question about it. - Jmabel ! talk 16:47, 24 August 2024 (UTC)
  - In that case, I will use "1849". Thank you, Jmabel. CalRis25 (talk) 10:52, 25 August 2024 (UTC)
source: Judging from Template:Information, this should be {{Self-scanned}}, since I scanned the pages from the book and then extracted the image, plus the tag {{Cite book}}. Am I right? See below for the cite book-parameters for this book.
- I shall try the following: {{Own scan}} plus a {{cite book}}-statement with all the relevant information (including a link to an Archive.org-scan of the book).
author: The actual creator of the image is unknown (probably not the author), should I use the author of the book instead? Or is it me, because I scanned the page and extracted the image? Or more than one tag, and using which template?
- {{author}}. - Jmabel ! talk 16:47, 24 August 2024 (UTC)
  - Thank you for the recommendation. I shall use {{Author|original|{{Creator|Wikidata=Q18649008}}}}. That creates a nice box with the relevant information. CalRis25 (talk) 10:52, 25 August 2024 (UTC)
permission: Which copyright tag should I use? The page Commons:Copyright tags/General public domain suggests the tag {{PD-US-expired}} to indicate that the work is in the public domain in the USA. Since the book was published outside of the USA, we need another copyright tag for the country of publication, here the United Kingdom. The page Commons:Copyright tags/Country-specific tags suggests {{PD-UK-unknown}} (according to Commons:Copyright_rules_by_territory/United_Kingdom the UK's standard copyright term is Life + 70 years. Anthony Rich, the author of the book (not of the images), died in 1891). So, should the permission field consist of these two copyright tags: {{PD-US-expired}} and {{PD-UK-unknown}}? And where do I put these copyright tags? The {{Information}}-template does not seem to have a corresponding field.
- For 1849, it is reasonable to assume that the unknown author has been dead for over 70 years, which is what UK law requires. I believe {{PD-Art|PD-UK}} is probably the best you can do, but you might ask at COM:Village pump/Copyright. - Jmabel ! talk 16:47, 24 August 2024 (UTC)
  - From what I read, {{PD-US-expired}}{{PD-scan|PD-old-100|deathyear=1891}} seems to be best and the Sandbox-output looks fine. Thank you, CalRis25 (talk) 10:52, 25 August 2024 (UTC)
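The choices settled on in the discussion above can be combined into one {{Information}} block per image. The following Python sketch generates such a block; the function name and the exact wording of the description are assumptions, and the {{cite book}}-statement is left out of the source line for brevity. Putting the licence tags into the permission parameter is only one option; a separate licensing section below the template is also common on Commons.

```python
def information_wikitext(image_no, page, article):
    """Return an {{Information}} block for one image (illustrative sketch)."""
    # Description wording follows the short form proposed above (assumed wording).
    description = (
        f"Image {image_no} on page {page} of Anthony Rich's "
        "Illustrated Companion to the Latin Dictionary, and Greek Lexicon "
        f"(see Source) as used in the article {article}."
    )
    lines = [
        "{{Information",
        "|description = {{en|1=" + description + "}}",
        "|date        = 1849",
        "|source      = {{Own scan}}",
        "|author      = {{Author|original|{{Creator|Wikidata=Q18649008}}}}",
        "|permission  = {{PD-US-expired}}{{PD-scan|PD-old-100|deathyear=1891}}",
        "}}",
    ]
    return "\n".join(lines)


print(information_wikitext(1, 1, "Abaculus"))
```

The output can be pasted into the sandbox page to check how the template renders before it is used for a whole batch.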

{{Cite book}}-tag for RICH

{{cite book
|last         = Rich
|first        = Anthony
|authorlink   = :en:Anthony Rich
|title        = The illustrated companion to the Latin dictionary, and Greek lexicon : forming a glossary of all the words representing visible objects connected with the arts, manufactures, and everyday life of the Greeks and Romans 
|url          = https://archive.org/details/illustratedcompa00richuoft
|accessmonth  = 8
|accessyear   = 2024
|format       = PDF
|oclc         = 848290283
|location     = London
|publisher    = Longman, Brown, Green, and Longmans
|year         = 1849
|lang         = en
}}

Categorization

Some considerations concerning the categorization of these images:

Due to the large number of images it will not be possible to put these images into the most specific categories right from the start as suggested by Commons:Categories (sub-section Over-categorization).
There should be a category, which includes all images from the book (possibly even page scans). Such a category actually exists: Category:The illustrated companion to the Latin dictionary, and Greek lexicon. It contains some of the Flickr-versions of the images, strangely enough merely 49 images.
- This would be what we usually call a "source category": a hidden maintenance category, in addition to the topical categories. See Category:Source categories (flat list) for examples. - Jmabel ! talk 16:50, 24 August 2024 (UTC)
  - Thank you for the tip. CalRis25 (talk)

The Category:The illustrated companion to the Latin dictionary, and Greek lexicon used by the Flickr-images belongs to the following categories:

Therefore, these five categories should be enough. Question: Is it enough to use the Category:The illustrated companion to the Latin dictionary, and Greek lexicon in the meta data of the images, or are all five categories necessary?

Use only Category:The illustrated companion to the Latin dictionary, and Greek lexicon. The other four should be parents of that. - Jmabel ! talk 16:52, 24 August 2024 (UTC)
- Thank you, I will do so. CalRis25 (talk) 11:11, 25 August 2024 (UTC)

File naming

For my local files I used the following naming scheme: RICH-PAGE_NUMBER-NUMBER_ON_PAGE-LEMMA.jpg (example: RICH-015-1-AES_THERMARUM.jpg), with…

RICH = Identification of the book. For the real file name one might use something like this: Anthony_Rich-Illustrated_Companion_to_the_Latin_Dictionary_and_Greek_Lexicon. But that would amount to 76 characters for this part alone.
PAGE_NUMBER = zero-padded number of the page in the book (not of the page in the PDF-file)
NUMBER_ON_PAGE = single-digit number of the image in order of appearance on the page.
LEMMA = Latin headword of the article the image belongs to. Right now, the lemma is in all caps because that is the way the headwords are given in the book. The preferred form seems to be that only the first character of the first word is capitalized, e.g. Aes thermarum instead of AES THERMARUM. Switching to these lemmas is no problem.

This naming scheme makes it possible to reliably indicate, which image is exactly where in the book, and allows for sorting according to the place in the book (not by the topic which is indicated by the lemma). Of course, the "RICH"-bit needs to be expanded to more clearly indicate the book. However, this naming scheme tells little about the image itself, because the lemma is a) in Latin, and b) refers only to a part of the image.
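As an illustration of how the local naming scheme can be taken apart again, here is a small Python sketch. The function name and the regular expression are assumptions; the pattern only covers names of the form RICH-PAGE_NUMBER-NUMBER_ON_PAGE-LEMMA.jpg as described above.

```python
import re

# RICH-<zero-padded page>-<single-digit number on page>-<LEMMA>.jpg
LOCAL_NAME = re.compile(r"^RICH-(\d+)-(\d)-(.+)\.jpg$")


def parse_local_name(filename):
    """Split a local file name into (page, number_on_page, lemma)."""
    match = LOCAL_NAME.match(filename)
    if match is None:
        raise ValueError(f"unexpected file name: {filename}")
    page, number, lemma = match.groups()
    # Underscores become spaces; only the first character stays capitalized.
    lemma = lemma.replace("_", " ").capitalize()
    return int(page), int(number), lemma


print(parse_local_name("RICH-015-1-AES_THERMARUM.jpg"))
```

Because the page and the number on the page come back as integers, the resulting tuples sort in the order in which the images appear in the book.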

See Commons:File naming for more about this topic. Importantly, only the first 20 characters are displayed on the category pages (out of up to 240 characters, provided that the file name sticks to ASCII characters). The guide gives some suggestions:

Files that form parts of a whole (such as scans from the same book or large images that are divided into smaller portions due to Commons’ upload size restriction) should follow the same naming convention so that they appear together, in order, in categories and lists.
It gives the following suggestions for batch uploads: {title} ({source}), {title} - {source} {id}, and {brief_description}, {year}. However, it is unclear to me, whether title, source, etc. are actual templates I should use, or whether they are to be replaced by me with the correct information (whatever that is). Adopting the second of these suggestions, a file name might look like this: AES THERMARUM – Rich, Illustrated companion to the Latin dictionary and Greek lexicon, 15, 1.jpg.

Question: Does anyone have a suggestion for the naming of such images, perhaps based on a large batch upload with similar requirements?

I will try out the following naming scheme for files:
- [Lemma] [Number of meaning within article].[Number of image within the meaning] - Anthony Rich, Illustrated Companion to the Latin Dictionary, p. [Page number in original book].jpg.
- Example: Persona 5.1 - Anthony Rich, Illustrated Companion to the Latin Dictionary, p. 494.jpg.
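The naming scheme above is easy to generate by script. The following Python sketch builds a file name from its parts; the function name is an assumption, and the length check treats the limit of 240 mentioned above as a byte limit, which is likewise an assumption.

```python
NAME_TEMPLATE = (
    "{lemma} {meaning}.{image} - Anthony Rich, "
    "Illustrated Companion to the Latin Dictionary, p. {page}.jpg"
)


def commons_file_name(lemma, meaning, image, page):
    """Build a file name following the scheme proposed above."""
    name = NAME_TEMPLATE.format(lemma=lemma, meaning=meaning, image=image, page=page)
    # Assumed limit: 240 bytes in UTF-8 (equal to 240 characters for ASCII names).
    if len(name.encode("utf-8")) > 240:
        raise ValueError(f"file name too long: {name}")
    return name


print(commons_file_name("Persona", 5, 1, 494))
```

Since the lemma comes first, files of the same article sort together on category pages, while the page number at the end still identifies the place in the book.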

Data needed for OpenRefine's "Schema"

Determining the data needed for *OpenRefine*'s "Schema" for the illustrations from 1849 is definitely not easy. After searching Commons for a while, however, I found that the relevant information can mostly be found on the following pages:

d:Help:Copyrights: Here the section 1.1 (*Public Domain Works*) is relevant. The page Structured data/Modeling/Illustrations includes an example with property/value pairs for copyright status both in the US and countries with pma rules ("pma" = *post mortem auctoris*).
Structured data/Modeling/Illustrations: The examples-section of this page gives property/value-pairs for most of the relevant information.
d:Wikidata:WikiProject_Books/en#Work_item_properties might be relevant, especially the section *Work item properties*.

For the book itself, I created a data item at Wikidata: Q130084517.
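To make the idea of property/value pairs more concrete, here is a hedged Python sketch of two statements that could be prepared per image. It is not taken from the modeling pages: which properties are actually appropriate for these illustrations is an assumption and should be checked against Structured data/Modeling/Illustrations. The data layout (a list of dictionaries) is purely illustrative and is not an OpenRefine format.

```python
BOOK_ITEM = "Q130084517"  # Wikidata item for the book, as stated above


def statements_for(page):
    """Return illustrative property/value pairs for one image (assumed selection)."""
    return [
        {
            "property": "P1433",  # published in
            "value": BOOK_ITEM,
            "qualifiers": {"P304": str(page)},  # page(s)
        },
        {
            "property": "P6216",  # copyright status
            "value": "Q19652",  # public domain
            "qualifiers": {"P1001": "Q30"},  # applies to jurisdiction: United States
            # A determination method (P459) would normally be added as well;
            # its value is left out here because it has to be looked up.
        },
    ]


for statement in statements_for(494):
    print(statement)
```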
