Template talk:Authority control/2012

Another parameter: French National Library

Any objection to adding the authority reference of the en:Bibliothèque nationale de France (French National Library)? It can also often be found in the VIAF. It uses an en:Archival Resource Key, for example ark:/12148/cb11907966z for Victor Hugo: http://catalogue.bnf.fr/ark:/12148/cb11907966z

(As you may know, we have a cooperation project going on with the Library, and authority data is a field they wanted to explore with us. This would be a nice first step).

Jean-Fred (talk) 17:00, 16 September 2011 (UTC)

If there is a way to look it up and there is a plan to actually use it, then I think it is fine. Do you think there would be a way to add it to a large number of categories? If someone can create a list of our categories and corresponding ARK numbers, I can add them.--Jarekt (talk) 17:40, 16 September 2011 (UTC)
That is the challenge, but the aim would definitely be to add ARK numbers en masse by bot (or semi-automatically, as the Germans did back then with the PND). Thanks for the offer, I'll keep it in mind. ;-) Jean-Fred (talk) 17:47, 16 September 2011 (UTC)
They decided against using BNF numbers at de.wiki because they thought it would be too much work, while the BNF authority can be accessed using VIAF. I am not really convinced by that, since the same holds for the Library of Congress, the National Library of Sweden and the PND. So to me, if we really want to make it simple, we should also remove at least SELIBR and PND (outside de.wp). Besides, BNF entries on the BNF website are a bit different from those on the VIAF website. They have no translation and there is no format choice, but they look better and there are some additional fields (source, occupation, address (!)). Compare [1] and [2].
BNF (and other libraries') authority numbers are provided by VIAF (bottom field of each entry). I don't know if a bot can retrieve them or whether we should try to collaborate with VIAF. --Zolo (talk) 20:52, 16 September 2011 (UTC)
Someone could go through all the numbers in the http://catalogue.bnf.fr/servlet/autorite?ID=xxxxxxx website, scrape the names and dates and use them to match with names and dates in category:people by name. Or alternatively ask BNF for a dump from their database with such information. In order to copy IDs from German Wikipedia I had to do a lot of matches like that between our category names and their article names - I used the matching criteria: 1) name matches and dates of birth/death match or are off by one year 2) name matches and one date of birth/death match or is off by one year and the other date is unknown. --Jarekt (talk) 21:21, 16 September 2011 (UTC)
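The two matching criteria described above can be sketched roughly as follows. This is only an illustration, not the AWB/Excel workflow that was actually used: the record layout (dicts with `name`, `birth`, `death`) and the function names are assumptions, and "unknown" is taken to mean that the year is missing on either side.

```python
from typing import Optional


def year_close(a: Optional[int], b: Optional[int]) -> bool:
    """True when both years are known and differ by at most one."""
    return a is not None and b is not None and abs(a - b) <= 1


def is_match(cat: dict, rec: dict) -> bool:
    """Apply the two criteria from the discussion to a pair of records.

    `cat` is a (hypothetical) Commons category record and `rec` an
    external authority record; both carry 'name', 'birth' and 'death'.
    """
    if cat["name"] != rec["name"]:
        return False
    birth_ok = year_close(cat["birth"], rec["birth"])
    death_ok = year_close(cat["death"], rec["death"])
    # Criterion 1: both dates match or are off by one year.
    if birth_ok and death_ok:
        return True
    # Criterion 2: one date matches (or is off by one year) and the
    # other date is unknown on at least one side.
    birth_unknown = cat["birth"] is None or rec["birth"] is None
    death_unknown = cat["death"] is None or rec["death"] is None
    return (birth_ok and death_unknown) or (death_ok and birth_unknown)
```

Exact string equality on names is the simplest possible choice here; a real run would probably need to normalise name order and diacritics first.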
Interesting, the German move, I did not know about that. Thanks for the information Zolo.
Asking the BnF for a dump is definitely within the realm of the possible: they already gave us the right to do whatever we want with those authority data, and I know they offered some XML dumps. It should not be too hard to go forward with that. Matching names and birth dates was what I had in mind too; good to know it actually works. I’d be interested in having a look at your matching code, Jarek. Is it on the Toolserver SVN?
Jean-Fred (talk) 22:10, 16 September 2011 (UTC)
The "code" is very low tech and rather multistage:
  1. I used this code with AWB to scrape information from all Category:People_by_name categories. I saved results in an Excel spreadsheet.
  2. I used a similar script to scrape people data from EN Wiki: dates, and interwiki links to DE wiki. I went through all articles that Commons categories link to that did not have a German link and used names on Commons to guess the article name on EN wiki. I saved the results in the same Excel spreadsheet.
  3. I found a lot of matches and added them as interwiki links to Commons categories. I also got many more interwiki links to DE Wiki.
  4. I went through DE wiki and scraped authority data for articles I have interwiki links to. I added them to Commons categories, galleries and Creator templates using AWB.
  5. Now I am going through all the rest of the German articles (by guessing that the article name is the same as the category on Commons) and scraping authority control data and dates, and I will match them with Commons categories.
As a result I have some spreadsheets with people data from Commons, EN and DE wikis, which I can share. --Jarekt (talk) 02:11, 17 September 2011 (UTC)

✓ Done I added BNF, ARC and ULAN parameters. --Jarekt (talk) 03:33, 21 September 2011 (UTC)

Thanks. I put an example at Category:Francis Duranthon. I’d like your input though. ARKs are usually used as the whole URI, like ark:/12148/cb12786851v (which may be resolved by any resolver), where 12148 is the code for the BnF. But that makes quite big labels in the template. Shall we use only cb12786851v (in this example), even if this may be a bit less correct? Jean-Fred (talk) 08:55, 21 September 2011 (UTC)
The BNF displays things like Notice n° : FRBNF11896954 with only "11896954" showing in the URL when accessed through a search on the BNF website. Wouldn't these numbers be sufficient ? It is more readable and more convenient for copy paste. --Zolo (talk) 09:39, 21 September 2011 (UTC)
I agree with JeanFred that we should drop the ark:/12148/ part, as it will always be the same for the BnF and so can be built into the template. But I think we should keep on using the ARK suffix to identify the authority. The ID shown in the URL may be easiest to access, but if the BnF changes the software running its catalog this identifier could become unavailable, while the ARK should always identify the same authority. And this number can be seen in the bottom left corner of the frame, inside the permalink to be copied (http://catalogue.bnf.fr/ark:/12148/cb12449597q/PUBLIC) Symac (talk) 10:57, 21 September 2011 (UTC)
✓ Done, only the changing part is used by the template. I updated the documentation.
@Zolo: Yeah, as Symac said, the ARK is perennial. I think the readability is less of an issue with dropping the ark:/12148/, what do you think? Jean-Fred (talk) 12:17, 21 September 2011 (UTC)
It is already nicer without the ark:/12148/ but the BNF calls the notice '(FRBNF + 8 figures)', so I would imagine that other letters coming before or after this are not necessary to identify the person. That said I notice that one of these letters is not constant and I have no idea what it stands for.--Zolo (talk) 15:26, 21 September 2011 (UTC)
When using ARK, everything that comes after ark:/12148/ has to be set by the library, but they can define it as they want. At the BnF I think they chose to use a prefix (cb for authorities, from what I see) followed by an ID they already had, the FRBNF, which is the local identifier for authorities, and in their algorithm they have added a checksum, which is the last character. As we don't know how their algorithm calculates this last character (at least I didn't find it), we need to use the ARK, in which it is already calculated. I think we could learn more about this by getting in touch with them, but for now I don't see any other way than staying like that (maybe we should remove the cb, but I'm not sure it would really be easier). I prefer to say in the documentation: "keep everything between ark:/12148/ and PUBLIC". Symac (talk) 17:41, 21 September 2011 (UTC)
Thanks for the explanation. So let's keep it that way unless we can get a better suggestion from the BNF. I hope most of them can be bot-filled so that it does not matter that much.--Zolo (talk) 04:37, 22 September 2011 (UTC)
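For anyone filling these in by script, the rule "keep everything between ark:/12148/ and PUBLIC" could look roughly like the sketch below. The pattern (cb, eight digits, one trailing check character) is an assumption drawn from the examples in this thread, not a documented BnF format, and the function names are made up for illustration. The check character is only copied, never recomputed, since its algorithm is not known here.

```python
import re
from typing import Optional

# Assumed shape of a BnF authority ARK name: "cb" + 8 digits + 1 check character.
ARK_RE = re.compile(r"ark:/12148/(cb[0-9]{8}[0-9a-z])")


def bnf_id_from_url(url: str) -> Optional[str]:
    """Return the variable part of a BnF ARK (e.g. 'cb12449597q'), or None."""
    m = ARK_RE.search(url)
    return m.group(1) if m else None


def split_bnf_id(bnf_id: str) -> dict:
    """Split the ID into prefix, FRBNF number and check character."""
    return {"prefix": bnf_id[:2], "frbnf": bnf_id[2:10], "check": bnf_id[10:]}
```

With the permalink quoted above, `bnf_id_from_url("http://catalogue.bnf.fr/ark:/12148/cb12449597q/PUBLIC")` gives `cb12449597q`, which is the value the template expects.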

The BNF has a new open data website, using the same identifiers as the previous website and adding various info (old - new). Should I change the URL so that the template links to the new website?--Zolo (talk) 11:03, 9 May 2012 (UTC)

Uh no actually it does not work for all authors yet, but it is worth keeping an eye on it. --Zolo (talk) 12:15, 9 May 2012 (UTC)

Dutch Biography Portal

Hi, the Dutch Biography portal has the ambition to grow its digital reach to all notable persons of the Netherlands who are no longer living. They are up to over 77000 biographies and growing. Many are not (yet) useful (i.e. only one biography is referenced, and often in Wikipedia articles, this is already the biography referenced), but many are extremely useful - 1) because they supply a picture (with attribution) of the person, and 2) they show all name spellings. See for example their entry on en:Margaret of Parma here. Would it be possible to include their "BPN" number in this Authority control template? See also en:template:BPN. Thanks, Jane023 (talk) 09:01, 26 February 2012 (UTC)

I am fine with adding it, but I notice that en:template:BPN is not included in en:template:Authority control. Should such specialized databases be included in {{Authority control}} or be added to some other place (like the "reference" field of {{Creator}})? The en:Union List of Artist Names, which is already in {{Authority control}}, is specialized too, but it is part of the VIAF project, so it may be a bit different. Any other thoughts? --Zolo (talk) 19:20, 27 February 2012 (UTC)
  •  Support adding link to BPN. Ideally someone would also try to run a bot to populate those. --Jarekt (talk) 20:58, 27 February 2012 (UTC)
  •  Support On the contrary, I think it is more interesting to add very specialised databases: while there is not much point in duplicating VIAF on Commons (though it is useful), it is even more awesome to have those. Jean-Fred (talk) 23:32, 27 February 2012 (UTC)
✓ Done. Agree with Jean-Fred actually but wanted to have other opinions. See also User:Lupo/resources for potential other links. --Zolo (talk) 07:57, 28 February 2012 (UTC)
Hi thanks! Sorry I am so slow to react, because I only just read this last week. I have started populating some creator templates and I do find this very useful. I put a question on the talk page of the en wiki version of this template, because I guess "my" BPN template should be merged in there (though I see most of my work has been done for painters with no Authority control template yet - sigh). I am confused by the TSURL parameter - this seems to only go to the German wikipedia - why? Thanks in advance (putting page on my watchlist now) Jane023 (talk) 10:01, 6 November 2012 (UTC)

Another parameter : SUDOC

SUDOC is the database of the French libraries, and holds authority records as well (e.g. Victor Hugo). Those are distinct from the BnF (French National Library) ones, and are also indexed by VIAF. What about adding it? Jean-Fred (talk) 18:28, 10 March 2012 (UTC)

I would say no. More precisely, I would suggest that we should either include all VIAF participants or stop adding them. There are about 20 of them and it seems a bit too much. --Zolo (talk) 09:01, 13 March 2012 (UTC)
I would agree. Lately when adding Authority control I mostly focus on VIAF and databases not connected to it. --Jarekt (talk) 16:55, 13 March 2012 (UTC)
I certainly agree (as I stated above) with adding more non-VIAF databases.
And, as for me, I am also fine with adding all VIAF participants. Firstly, because I see no real problem with that. And secondly, because as great as VIAF may be, it has its errors − for an example I just encountered see 228722568 vs. 122141667 − and I hate replicating those.
Jean-Fred (talk) 15:48, 17 March 2012 (UTC)
VIAF often has multiple records for a single person. I think they are working on fixing those, but they are a LONG way from being done. The only issue I have with adding too many is that it looks too crowded when added to Creator templates; otherwise, the more the merrier. --Jarekt (talk) 19:36, 17 March 2012 (UTC)
I noticed there are lots of duplicates with painters, and for BPN numbers this is also the case. This is where the TSURL should be able to help those organizations clean up their act if they are able to implement a linkback to Wikipedia. Jane023 (talk) 10:03, 6 November 2012 (UTC)

"Alternative" records − like IMDb?

Hi all,

I meant to ask this for a long time.

This template is about “library catalogs and other authority files”, and I believe it was meant to hold, er, “serious” records from established institutions and everything.

Now, what about adding some more informal databases? I can think of:

  • IMDb? For example linking Kevin Bacon to imdb:nm0000102. In fact, I noticed that e.g. the BnF often lists IMDb in its "sources" section (see bnf:cb139817766)
  • MusicBrainz? This is the free database for music, including artists. Their IDs are used by the BBC, Amazon uses their data too − they are now a key actor of the whole semantic web / linked data thing.

Thoughts?

Jean-Fred (talk) 16:22, 17 March 2012 (UTC)

I am all for it. I might even write some tools for it, although probably not anytime soon.--Jarekt (talk) 14:20, 18 March 2012 (UTC)

As for IMDb: [3] and [4] contain a mapping from PND to IMDb "numbers", obtained by exploiting de:Template:IMDb_Name and en:Template:IMDb_Name (the overlap is quite huge). Unfortunately this covers only about 40% of the existing articles with IMDb_Name templates in de: alone. For MusicBrainz the situation is comparable on a much smaller scale: de and en each have about 600 links, of which I could associate about one fourth with PND numbers. However, [5] links to complete mappings between MusicBrainz and de:wp and en:wp (and PND, GKD), probably created in late 2011 by matching complete dumps of all involved databases. -- Gymel (talk) 23:37, 20 March 2012 (UTC)

Interesting, thanks.
Something I am not sure I get: what is the point of using the PND as an intermediary? Can’t we just loop on en:Template:IMDb_Name (and de:), follow the CommonsCat link and add the IMDb ID to this category? This way we do not lose all WP articles linking to IMDb but not to the PND − true, not all articles bear a CommonsCat (or have a corresponding CommonsCat for that matter), but I tend to think this loses fewer articles than going through the PND (though I could totally be wrong).
Jean-Fred (talk) 23:57, 20 March 2012 (UTC)
It just happened that these PND<->IMDb mappings (methodically restricted to Wikipedia-mentioned-persons) do already exist as a flat file (the en: IMDb templates were selected and matched against Normdaten templates in de: using interwiki links). A direct mapping without the especially lossy PND intermediate is preferable, of course. Since de: and en: both have about 40k transclusions of IMDB_Name I thought lists of any kind might be more suitable than a catscan/AWB approach. -- Gymel (talk) 00:50, 21 March 2012 (UTC)
A different thought: Assume Identifiers x (e.g. PND ones) are already (to a certain extent) incorporated into the template on commons. Using a concordance x <-> y (eg. IMDb) one can simply "pull in" these additional identifiers without constructing again adhoc mappings between articles and relevant commons categories (Simply identifying without further reasoning or matching is what authority data is about, right?). This "pulling in" might be simpler than the direct approach, albeit one would have to use additional mappings (e.g. LCCN?) to get a better coverage. However I have no idea whether a two-step approach (pull in 50% or 80% utilizing already established authority data, work harder for the remainder) has any advantage over doing everything uniformly the (slightly) harder way. -- Gymel (talk) 01:14, 21 March 2012 (UTC)
Maybe the best way would be to write a bot to:
  1. go through all subcategories of Category:People by name
  2. if interwiki links are present, then go to the EN and DE Wikis
  3. Look for authority control templates, PND templates IMDb templates, etc. and harvest the data.
  4. Add the harvested data to Commons authority control template
--Jarekt (talk) 01:48, 21 March 2012 (UTC)
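Steps 3 and 4 of the bot outlined above might be sketched like this. It only covers the text-processing part (no page fetching or saving), uses a deliberately naive regular expression instead of a real wikitext parser, and the function names are hypothetical. The identifier value in the usage note is a placeholder, not a real PND.

```python
import re

# Naive pattern: matches a flat template call without nested templates.
TEMPLATE_RE = re.compile(
    r"\{\{\s*(Authority control|Normdaten)\s*\|([^{}]*)\}\}", re.IGNORECASE
)


def harvest(wikitext: str) -> dict:
    """Collect non-empty name=value pairs from the first authority template."""
    m = TEMPLATE_RE.search(wikitext)
    if not m:
        return {}
    fields = {}
    for part in m.group(2).split("|"):
        if "=" in part:
            key, value = part.split("=", 1)
            if value.strip():
                fields[key.strip().upper()] = value.strip()
    return fields


def render(fields: dict) -> str:
    """Build a Commons {{Authority control}} call from harvested fields."""
    body = "|".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return "{{Authority control|" + body + "}}"
```

For example, `render(harvest("{{Normdaten|PND=000000000|LCCN=n/79/22889|VIAF=}}"))` yields `{{Authority control|LCCN=n/79/22889|PND=000000000}}`; empty parameters such as `VIAF=` are dropped.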
Sounds to me like the right thing to do, and suitable for execution on a regular schedule. The bot should add any data not yet present in the authority template on Commons and either update existing information ( Support: new biographies in de: initially often contain a copy of some other article's Normdaten template, which usually will be fixed within some days. Also  Support: as the authority files improve, there sometimes is reason to correct authority numbers in the templates, and Commons would benefit from simply updating from de: or en:.  Oppose: the bot would have to cope with conflicting assignments, especially in cases where the contents on Commons were changed manually) or leave it alone (precautions / latency with respect to very new articles must be implemented) but note duplicates and conflicts encountered on a service page for manual postprocessing. This second variant would allow better fine-tuning on Commons; it is not clear to me whether this is needed at all, and it certainly is much more demanding in terms of human postprocessing ("authority work").
As e.g. MusicBrainz support in de: and en: is comparatively thin (of the 47k individual identifications with en: mentioned above only several hundred are explicit in the articles), a one-time, separate import could be worthwhile (I will count the potential number for Commons in the next couple of days). This is because Commons would allow for MusicBrainz ID strings as "authority" data where de: and en: don't, and thus set an editorial barrier against noting any existing identification. In the long run one could hope that DBpedia provides such identifications.
The "Personensuche" on de: was mentioned here already (cf. Johannes Brahms) and one can see ("Auf anderen Webseiten" and "Literatur") how authority numbers (only PND at the moment) from the authority template are employed to provide links to sources often not even mentioned in the article (I deliberately chose this example since you almost never have MusicBrainz links in articles about classical composers). Here I see an analogy to the ISBN search facilities built into the software: there is no need for articles, templates, or categories to deal directly with GBS or LibraryThing identifiers, since there is a tool providing the transition based on a smaller set of broadly applicable identifiers (here: ISBN only). (Also note: the ISBN search implements the transition by proposing searches to the user, whereas the toolserver Personensuche already knows the outcome and only displays those links that will give a result.) -- Gymel (talk) 08:40, 21 March 2012 (UTC)
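The second variant discussed above (add what is missing, leave existing values alone, and note conflicts for a human to review) could be sketched as follows. The function name and the shape of the conflict list are assumptions for illustration; where the conflicts end up (a service page, a log) is left open.

```python
def merge_fields(existing: dict, harvested: dict):
    """Merge harvested identifiers into the existing Commons template fields.

    Missing or empty fields are filled in; differing values are never
    overwritten but reported as (field, commons_value, wikipedia_value).
    """
    merged = dict(existing)
    conflicts = []
    for key, value in harvested.items():
        if key not in merged or not merged[key]:
            merged[key] = value
        elif merged[key] != value:
            conflicts.append((key, merged[key], value))
    return merged, conflicts
```

Keeping the function free of side effects makes it easy to run on a schedule: the same input always produces the same merged fields and the same conflict report.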

Anybody knows why LCCN is split into 3 parts?

Something I never understood is why the LCCN is split into 3 parts. It seems to me that we take a perfectly fine string like "n79022889", then manually convert it into "n/79/22889", only so we can later recreate "n79022889" to call the website. I guess we need to keep the current format so we are compatible with other Wikipedias using this template (and can exchange codes), but does anybody know what the purpose of this is? Also, maybe we should accept alternative LCCN codes like "n79022889". --Jarekt (talk) 14:10, 21 March 2012 (UTC)

Cf. the description of the structure of the LC Control Number (for authority data, [6] gives even more background): the printed card form was n79-51955, keyed in MARC21 as n  79051955 (note the double spaces!), but they switched to four-digit years in 2001(!), like n2001-50268 on printed cards, keyed as n 2001050268 (note the shrunken padding with a single space after "n"). (Mainly because of the included spaces?) There is no official standard for how to transform these call numbers into something friendly to link: compare < http://www.worldcat.org/identities/lccn-n95-96250 > with < http://lccn.loc.gov/n95096250 >, < http://id.loc.gov/authorities/names/n95096250.html > and < http://viaf.org/viaf/sourceID/LC%7Cn+95096250 >. To support maximum linking to different targets one has to know the organisation of the LCCN into three distinct components, and one has to know about the two different flavors and how to deduce them from the number given. There are scripts which can perform the necessary analysis, but it probably is beyond the capabilities of the template language. Therefore (and since "n95096250" is also by no means an "official" form) users are requested to make the structure explicit by inserting slashes at the appropriate positions. -- Gymel (talk) 15:56, 21 March 2012 (UTC)
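As an illustration of the analysis such a script has to do, here is a minimal sketch that converts between the compact form and the slash form used by the template. It assumes the two flavors described above (two-digit year or four-digit year, each followed by a six-digit serial), does not handle the hyphenated printed-card form, and the function names are made up.

```python
import re

# prefix letters, then a 2- or 4-digit year, then a 6-digit serial number
LCCN_RE = re.compile(r"^([a-z]{1,3})(\d{2}|\d{4})(\d{6})$")


def split_lccn(lccn: str):
    """Split 'n79022889' or 'n 2001050268' into (prefix, year, serial)."""
    compact = "".join(lccn.split()).lower()  # drop the padding spaces
    m = LCCN_RE.match(compact)
    if not m:
        raise ValueError(f"unrecognised LCCN: {lccn!r}")
    prefix, year, serial = m.groups()
    return prefix, year, str(int(serial))  # strip the serial's leading zeros


def to_slash_form(lccn: str) -> str:
    """'n79022889' -> 'n/79/22889' (the form the template asks for)."""
    return "/".join(split_lccn(lccn))


def from_slash_form(slashed: str) -> str:
    """'n/79/22889' -> 'n79022889' (the form used in lccn.loc.gov URLs)."""
    prefix, year, serial = slashed.split("/")
    return f"{prefix}{year}{int(serial):06d}"
```

The flavor is deduced purely from the digit count: eight digits mean a two-digit year, ten digits a four-digit year. Anything else is rejected rather than guessed.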

icon at orlabs.oclc.org

Hi! Please compare

  1. http://orlabs.oclc.org/identities/lccn-n50-30508
  2. http://worldcat.org/identities/lccn-n50-30508 (with url variants as "www" and skin(? etc.?) variants as "wcidentities")
    1. http://worldcat.org/wcidentities/lccn-n50-30508 (used in VIAF pages)

for META TITLE (for bookmarks etc.) "Ėrenburg, Ilʹi︠a︡ 1891-1967 [WorldCat Identities]".
I like the "orlabs.oclc.org" icon. Today one can distinguish such pages in FF tabs from "viaf.org" tabs.
Question: Should we change the URL? This would invalidate many users' bookmarks. Regards
‫·‏לערי ריינהארט‏·‏T‏·‏m‏:‏Th‏·‏T‏·‏email me‏·‏‬ 00:17, 26 April 2012 (UTC)

VIAF dataset = open data

VIAF dataset is available under the Open Data Commons Attribution License. Jean-Fred (talk) 06:45, 11 May 2012 (UTC)

Nice to hear that but I am not quite clear about what it means ? Does it just mean we can automatically extract the links provided in each VIAF entry (which is already more or less what we do :|)? ULAN homepage contains the following statement "Copyright © 2010 The J. Paul Getty Trust. All rights reserved. The ULAN and the other Getty vocabularies are made available via the Web browsers to support limited research and cataloging efforts. ". --Zolo (talk) 07:46, 11 May 2012 (UTC)

See Help:Gadget-VIAFDataImporter for a new gadget simplifying the addition of Authority control templates to pages. Many of us have been testing this tool for the last several months. --Jarekt (talk) 14:32, 8 June 2012 (UTC)

Biology

I wonder if biology databases should be included here. They are probably not called "authority", but the idea is the same. --Zolo (talk) 16:53, 11 June 2012 (UTC)

Same with books. I created {{Book authority control}} at some point. Maybe {{Biology authority control}}? --Jarekt (talk) 17:08, 11 June 2012 (UTC)
Actually, after some discussion with user:Liné1, it sounds like it would be a bit complicated to do, because additional information (like the name of the species in the database) is added by this template. That would be quite a lot of work and sounds like a rather low priority. --Zolo (talk) 08:35, 21 June 2012 (UTC)

Museofile

Unless there is opposition to it, I will add the Museofile database for French museums (for small institutions, it is often the best online reference - and it is crosslinked with Joconde). I am a bit concerned about what seems to be a lack of stability : [7] and the ID is only 4 digits long. There are currently 1315 entries, so I hope it will remain stable for a while. --Zolo (talk) 08:55, 13 June 2012 (UTC)

Deprecated parameters / deprecated authority data

The deprecated parameters "GKD-V1" and "GKD-V2" are no longer used. I have struck them out on the doc page. They can also be removed from the (protected) source text. --Kolja21 (talk) 00:32, 13 October 2012 (UTC)

I will look into this. --Jarekt (talk) 03:06, 13 October 2012 (UTC)

Thanx! --Kolja21 (talk) 13:10, 15 October 2012 (UTC)

PND

The Universal Authority File became operational in April 2012 and integrates the content of the following authority files which are discontinued since:

At the time of its introduction (“GND-Grundbestand” from 5 April 2012), the GND holds 9.493.860 files, including 2.650.000 personalized names.

  • on de.WP Template:PND is out of use or gets corrected to GND. (see de:Vorlage:PND)
  • on Commons users are still adding PNDs. If you are working with Help:Gadget-VIAFDataImporter or something similar, you are still adding PNDs, even if there are GNDs. (example) But PNDs are not supported anymore and GNDs are to be preferred over PNDs. The same goes for GKD, SWD and DMA-EST.
  • There is no use in adding PND, GKD, SWD and DMA-EST anymore. So could someone fix {{Authority control}} and/or the Gadget(s) to prevent unnecessary work/edits? --PigeonIP (talk) 16:44, 11 December 2012 (UTC)
OK, I will study the German template and update our PND/GND/GKD/SWD etc. Then we can ask the author of Help:Gadget-VIAFDataImporter to update the code. --Jarekt (talk) 18:20, 11 December 2012 (UTC)
By German template you are referring to de:Vorlage:Normdaten? --PigeonIP (talk) 19:44, 11 December 2012 (UTC)
Yes, a lot has changed there since I last looked at it. --Jarekt (talk) 03:20, 12 December 2012 (UTC)
The documentation in German displays {{Authority control|LCCN=|GND=|TYP=|PND=|SELIBR=|VIAF=}}. Should PND be changed to BNF? --PigeonIP (talk) 18:17, 13 December 2012 (UTC)
Do you really mean BNF? BNF is the French and PND the German library index. BTW, I removed the unused and retired GKD, SWD and EST fields. --Jarekt (talk) 19:00, 13 December 2012 (UTC)
Thank you for that.
Yes, I do really mean the French library index. Since GND is the "up to date" German library index, the copy-template should inherit GND only (not the similar PND as second option). In the description of the template, you are referring to the BNF at 3rd position, but in the copy-template it is missing. (I am referring to the section of documentation between 1. Verwendung and 1.1. Parameter) --PigeonIP (talk) 19:22, 13 December 2012 (UTC)
  1. {{Authority control|LCCN=|GND=|TYP=|PND=|SELIBR=|VIAF=}} (with deprecated field PND)
  2. {{Authority control|LCCN=|GND=|TYP=|PND=|SELIBR=|VIAF=}}
  3. {{Authority control|LCCN=|GND=|TYP=|SELIBR=|VIAF=}} or
  4. {{Authority control|LCCN=|GND=|TYP=|BNF=|SELIBR=|VIAF=}} (with additional field BNF)
as introduced in:

The following authority files are supported:

  1. Gemeinsame Normdatei (GND)
  2. Library of Congress Control Number (LCCN)
  3. Notice d'autorité personne by the Bibliothèque nationale de France (BNF)
  4. SELIBR by the National Library of Sweden
  5. Virtual International Authority File (VIAF)
  6. many more...

mass upload of Authority control templates

For those who did not notice, I am running a bot which copies Authority control templates from any Wikipedia that has them. The bot follows the interwiki links, and if one of the articles has an Authority control template then it is copied either to the Creator template or to the category. A week ago we had ~35k templates and now we are at 51k, with still more to come. At some later stage we should add code to copy individual fields to already existing templates. --Jarekt (talk) 18:29, 11 December 2012 (UTC)

Jarekt you are awesome! I did notice that this template recently expanded, but I think it's the next best thing since hotcat. Jane023 (talk) 12:39, 13 December 2012 (UTC)
✓ Done We are now at ~73k categories with {{Authority control}} templates and interwiki links. --Jarekt (talk) 14:06, 18 December 2012 (UTC)