User:LennardHofmann/GSoC 2022/Report 3

In the last two weeks, I cleaned up the Lua code of {{Wikidata Infobox}} and added some features requested on its talk page. As promised in the last report, I released the new infobox for community testing; see here for the announcement and changelog. I also tested the infobox on all pages that link to the sandbox module, and it seems to perform well: most category pages take roughly 2 seconds to load.

Wikidata performance

You might wonder why the new infobox performs the "expensive" call mw.wikibase.getEntity('Q42') instead of calling the "non-expensive" function mw.wikibase.getBestStatements whenever needed.

The short answer is that we have to call getEntity in order to put all labels and descriptions into an invisible HTML element so that searching for the page becomes easier. But getBestStatements (and getAllStatements) are actually also pretty slow on their first run. Check this out:

local starttime = os.clock()
mw.wikibase.getBestStatements('Q42', 'P31') -- usually takes 25–45 ms
print(os.clock() - starttime)

So why isn't getBestStatements marked as expensive? Because it's pretty fast when called on a Wikibase entity that has already been loaded:

local item = mw.wikibase.getEntity('Q42')   -- usually takes 50–90 ms
mw.wikibase.getBestStatements('Q42', 'P31') -- takes 0.7 ms

Or alternatively:

mw.wikibase.getBestStatements('Q42', 'P18') -- usually takes 25–45 ms
mw.wikibase.getBestStatements('Q42', 'P31') -- takes 0.7 ms
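
To repeat measurements like the ones above, a small timing wrapper can help. This is only a sketch: timeCall is a hypothetical helper name, not part of the infobox or of Scribunto, and the numbers it prints will vary with server load and caching.

```lua
-- Hypothetical helper (not part of the infobox): time a single call of fn.
-- os.clock() returns CPU time in seconds, so multiply by 1000 to get ms.
-- Only the first return value of fn is passed through.
local function timeCall(fn, ...)
    local start = os.clock()
    local result = fn(...)
    return (os.clock() - start) * 1000, result
end

-- `mw` only exists inside Scribunto (e.g. the debug console);
-- anywhere else this part is skipped.
if mw and mw.wikibase then
    local first = timeCall(mw.wikibase.getBestStatements, 'Q42', 'P18')
    local second = timeCall(mw.wikibase.getBestStatements, 'Q42', 'P31')
    print(string.format('first: %.1f ms, second: %.1f ms', first, second))
end
```

Timing the two calls in one run matters here, since the second call is only fast because the first one already loaded the entity.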

As you can see, getEntity still carries a significant performance cost because it has to convert the whole entity into a Lua table. But if you're calling WikidataIB._getValue over 300 times, getEntity might save time overall, because a check like if item.claims[pid] lets you skip the calls for properties the entity doesn't have.
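
The item.claims[pid] check can be sketched roughly as follows. presentProperties is a hypothetical helper name, and the item table is a hand-written stand-in for what mw.wikibase.getEntity returns; only the shape of its claims field (property ID mapped to a list of statements) is assumed.

```lua
-- Hypothetical helper: return only the property IDs that the loaded entity
-- actually has statements for, so later per-property work can be skipped.
local function presentProperties(item, pids)
    local present = {}
    -- Tolerate a missing entity or an entity without any claims.
    local claims = item and item.claims or {}
    for _, pid in ipairs(pids) do
        if claims[pid] then
            present[#present + 1] = pid
        end
    end
    return present
end

-- Stand-in for the table returned by mw.wikibase.getEntity('Q42').
local item = { claims = { P31 = { {} }, P18 = { {} } } }
local present = presentProperties(item, { 'P31', 'P569', 'P18' })
print(table.concat(present, ', '))  -- P31, P18
```

With the entity already loaded, the check itself is a plain table lookup, so filtering a few hundred property IDs this way costs very little compared with a statement fetch.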

Luckily, fetching labels and sitelinks from unloaded entities is much faster than fetching statements, especially if the entity is large. However, if you want to generate a wikilink to a Commons category based on a QID, you often need to fetch the entity's topic's main category (P910), category related to list (P1754), and Commons category (P373) statements (see d:User:Mike Peel/Commons linking for details). This is why generating Commons links from large entities is slow.
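
As a rough illustration of the statement lookups involved, the sketch below reads the three properties for one entity. It is deliberately simplified and does not reproduce the resolution rules on the linked page. linkStatements and firstValue are hypothetical names, and the lookup function is passed in as a parameter (in a module it would be mw.wikibase.getBestStatements). The snak layout used (mainsnak.snaktype, mainsnak.datavalue.value) follows the usual Wikibase statement structure.

```lua
-- Property IDs named in the text above.
local LINK_PROPERTIES = { 'P910', 'P1754', 'P373' }

-- Hypothetical helper: read the value of the first statement, if any.
local function firstValue(statements)
    local snak = statements and statements[1] and statements[1].mainsnak
    if not snak or snak.snaktype ~= 'value' then
        return nil
    end
    local value = snak.datavalue.value
    -- Item-valued properties (P910, P1754) carry a table with an `id`;
    -- P373 carries a plain string.
    if type(value) == 'table' then
        return value.id
    end
    return value
end

-- Hypothetical helper: fetch all three properties for one entity.
-- getBestStatements is injected so the sketch also runs outside Scribunto.
local function linkStatements(qid, getBestStatements)
    local found = {}
    for _, pid in ipairs(LINK_PROPERTIES) do
        found[pid] = firstValue(getBestStatements(qid, pid))
    end
    return found
end
```

Inside a module this would be called as linkStatements('Q42', mw.wikibase.getBestStatements). Going by the timings earlier in this report, the first of the three lookups pays the loading cost for an unloaded entity and the other two are cheap, but that cost recurs for every new entity that needs a link.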

TL;DR: getEntity isn't much slower than the first getBestStatements call on an unloaded entity. Avoid fetching statements from unloaded Wikibase entities when possible, but fetching labels and sitelinks is fine.

Previous post: Report 2 | Next post: Report 4