Commons:File description page regular expressions

From Wikimedia Commons, the free media repository
(Redirected from User:Rocket000/Regexes)

Shortcut: COM:REGEX

This is a list of regular expressions that bots can use for localisation and general fixes on file description pages. Some are fairly trivial and should be combined with other tasks; regexes marked [Minor] should not be run on their own. If you use other regexes and would like to share them, please add them below.

All expressions are case-insensitive unless marked otherwise and should be run in order, from top to bottom. They are reasonably well tested, but there are no guarantees; if any of them causes problems, please report it on the talk page.
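
As a rough illustration of these conventions, the sketch below applies a short list of rules in order, case-insensitively unless a rule is flagged as case sensitive. Python is an assumption here (this page does not prescribe a tool), and `RULES`, `to_python_repl` and `apply_rules` are hypothetical names. The three rules are copied from the Junk cleanup, Categories and Formatting tables.

```python
import re

# (find, replace, case_sensitive) triples, listed in the order they should run.
RULES = [
    (r"__ *NOTOC *__", "", True),                                     # unnecessary __NOTOC__
    (r"\[\[category:(\[\[category:[^]]*\]\])[ ]*\]\]", "$1", False),  # doubled category link
    (r"\n{3,}", r"\n\n", False),                                      # surplus blank lines
]


def to_python_repl(repl):
    """Convert $1 / ${1} group references to Python's \\g<1> form."""
    return re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", repl)


def apply_rules(text, rules=RULES):
    """Apply every rule once, top to bottom."""
    for find, repl, case_sensitive in rules:
        flags = 0 if case_sensitive else re.IGNORECASE
        text = re.sub(find, to_python_repl(repl), text, flags=flags)
    return text
```

The replacement strings on this page use `$1` and `${2}` for captured groups, which Python's `re.sub` does not understand, hence the small conversion helper.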

Localization/Internationalization


Headings

Task: "Summary" heading
Action: Add "== {{int:filedesc}} ==" to file pages where it is missing
Notes: [Minor]; ideally done after all regex changes

Task: "Summary" heading
Find: (?:Краткое[ _]+)?описание|Beschreibung\,[ _]+Quelle|Quelle|Beschreibung|वर्णन|sumario|descri(ption|pción|ção do arquivo)|achoimriú)( */ *(?:summary|(?:Краткое[ _]+)?описание|Beschreibung\,[ _]+Quelle|Quelle|Beschreibung|वर्णन|sumario|descri(ption|pción|ção do arquivo)|achoimriú))? *\:? *\1
Replace: $1 {{int:filedesc}} $1
Notes: [MultiLine]

Task: "Licensing" heading
Find: )?(za(?: +d\'uso)?|Лицензирование|li[zcs]en[zcs](e|ing|ia)?(?:\s+information)?( */ *(za(?: +d\'uso)?|Лицензирование|li[zcs]en[zcs](e|ing|ia)?(?:\s+information)?))?|\{\{\s*int:license\s*\}\})(\]\])? *\:? *\1
Replace: $1 {{int:license-header}} $1
Notes: [MultiLine]

Task: "Original upload log" headings
Find: history)|file ?history|ursprüngliche bild-versionen) *\:? *\1
Replace: $1 {{original upload log}} $1
Notes: [MultiLine]

Task: Remove duplicate headings
Find: ^ *(\=+) *(.*?) *\=+ *[\r\n]+\=+ *\2 *\1 *$
Replace: $1 $2 $1
Notes: [MultiLine]; run multiple times
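
The "Remove duplicate headings" rule is meant to be run several times, because each pass only merges one pair of identical adjacent headings. A minimal Python sketch of that loop follows; Python and the helper name `remove_duplicate_headings` are assumptions, and only the pattern and replacement come from the table.

```python
import re

# Pattern from the "Remove duplicate headings" row: \1 is the run of "=" signs
# and \2 the heading text, so only an identical heading directly below matches.
DUPLICATE_HEADING = re.compile(
    r"^ *(\=+) *(.*?) *\=+ *[\r\n]+\=+ *\2 *\1 *$",
    re.IGNORECASE | re.MULTILINE,
)


def remove_duplicate_headings(text):
    """Re-run the substitution until the text stops changing."""
    while True:
        new_text = DUPLICATE_HEADING.sub(r"\1 \2 \1", text)
        if new_text == text:
            return text
        text = new_text
```

With three identical headings in a row, the first pass leaves two and the second pass leaves one, which is why a single run is not enough.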

Multilingual tags

Task Find Replace Notes
{{Unknown}} \s*(?:author|artist)\s*=\s*)(?:unknown?|\{\{\s*unknown\s*\}\}|\?+|unkown|unidentified|αγνωστος|sconosciuto|ignoto|desconocido|inconnu|inconnue|not given|not known|desconhecido|unbekannt|неизвестно|Не известен|neznana|nieznany|непознат|okänd|sconossùo|未知|ukjent|onbekend|nich kennt|ലഭ്യമല്ല|непознат|نه‌ناسرا|descoñecido|不明|ignoto|óþekktur|tak diketahui|ismeretlen|nepoznat|לא ידוע|ûnbekend|tuntematon|نامعلوم|teadmata|nekonata|άγνωστος|ukendt|neznámý|desconegut|Неизвестен|ned bekannt|غير معروف|невідомий)\s*?\;?\.?\s*?(\ \r|\n)</source> $1{{unknown|author}}$2
{{Own}} (part 1) \s*source\s*=\s*)(?:own work)?\s*(?:-|;|</?br *[/\\]?>)?\s*(?:own(?: work(?: by uploader)?)?|(?:œuvre |travail )?personnel(?:le)?|self[- ]made|création perso|selbst fotografiert|obra pr[òo]pia|trabajo propr?io)\s*?(?:\(own work\))?\.? *(\ \r|\n)</source> $1{{own}}$2
{{Own}} (part 2) (\|\s*source\s*=\s*)(?:\{\{\s*[a-z]{2,3} *\|)? *(?:own(?: work(?: by uploader)?)?|travail personnel|self[- ]made|création perso|selbst fotografiert|obra pr[òo]pia|trabajo propr?io) *(?:\}\})? *(?:\{\{\s*[a-z]{2,3} *\|)? *(?:\(?(?:own *work)\)?)? *(?:\}\})?(\||\}\}|\r|\n) (broken! Example: "{{Information | source = selbst fotografiert }}\newline" $1{{own}}$2
{{Own}} (part 3) \s*source\s*=\s*)(?:own[^a-z]*work|opera[^a-z]*propria|trabajo[^a-z]*propio|travail[^a-z]*personnel|eigenes[^a-z]*werk|eigen[^a-z]*werk|собственная[^a-z]*работа|投稿者自身による作品|自己的作品|praca[^a-z]*pw[łl]asna|Obra(?:[^a-z]*do)?[^a-z]*pr[oó]prio|Treball[^a-z]*propi|Собствена[^a-z]*творба|Vlastní[^a-z]*dílo|Eget[^a-z]*arbejde|Propra[^a-z]*verko|Norberak[^a-z]*egina|عمل[^a-z]*شخصي|اثر[^a-z]*شخصی|자작|अपना[^a-z]*काम|נוצר[^a-z]*על[^a-z]*ידי[^a-z]*מעלה[^a-z]*היצירה|Karya[^a-z]*sendiri|Vlastito[^a-z]*djelo[^a-z]*postavljača|Mano[^a-z]*darbas|A[^a-z]*feltöltő[^a-z]*saját[^a-z]*munkája|Karya[^a-z]*sendiri|Eget[^a-z]*verk|Oper[aă][^a-z]*proprie|Vlastné[^a-z]*dielo|Lastno[^a-z]*delo|Сопствено[^a-z]*дело|Oma[^a-z]*teos|Eget[^a-z]*arbete|Yükleyenin[^a-z]*kendi[^a-z]*çalışması|Власна[^a-z]*робота|Sariling[^a-z]*gawa|eie[^a-z]*werk|сопствено[^a-z]*дело|Eige[^a-z]*arbeid|პირადი[^a-z]*ნამუშევარი)\;?\.? *(\ \r|\n)</source> $1{{own}}$2
{{Own}} (part 4) \s*source\s*=\s*)(((?:\'\'+)?)([\"\']?)(?:selbst\W*erstellte?s?|selbst\W*gezeichnete?s?|self\W*made|eigene?s?)\W*?(?:arbeit|aufnahme|(?:ph|f)oto(?:gra(?:ph|f)ie)?)?\.?\4\3) *(\ \r|\n)</source> $1{{own}}$5
{{Self-photographed}} \s*source\s*=\s*)(?:self[^a-z]*photographed|selbst[^a-z]*(?:aufgenommen|(?:f|ph)otogra(?:f|ph)iert?)|投稿者撮影|投稿者の撮影)\s*?\.? *(\ \r|\n)</source> $1{{self-photographed}}$2
{{Anonymous}} \s*author\s*=\s*)(?:anonym(?:e|ous)?|anonyymi|anoniem|an[oòóô]n[yi]mo?|ismeretlen|不明(匿名)|미상|ανώνυμος|аноним(?:ен|ный художник)|neznámy|nieznany|مجهول|Ананім|Anonymní|Ezezaguna|Anonüümne|אלמוני|អនាមិក|Anonimas|അജ്ഞാതം|Анонимный автор|佚名)\s*?\.?\;?\s*?(\ \r|\n)</source> $1{{anonymous}}$2
{{Unknown photographer}} \s*author\s*=\s*)(?:unknown\s*photographer|photographer\s*unknown)\s*?\;?\.?\s*?(\ \r|\n)</source> $1{{unknown photographer}}$2
{{Private collection}} \s*gallery\s*=\s*)private(?: collection)? *(\ \r|\n)</source> $1{{private collection}}$2
{{See below}} \s*permission\s*=\s*)(?:see\s*below|див\.?\s*нижче|дивись\s*нижче)\s*?\;?\.?\s*?(\ \r|\n)</source> $1{{see below}}$2
Task Find Replace Notes
{{Original description page}} I is|was) \[(?:https?:)?\/\/(?:www\.)?((?:[a-z\-]+\.)?wik[a-z]+(?:\-old)?)\.org\/w((?:\/shared)?)\/index\.php\?title\=(?:[a-z]+)(?:\:|%3A)([^\[\]\|}{]+?) +here(?:\]\.?|\.?\])(\s+All following user names refer to (?:\1(?:\.org)?\2|(?:wts|shared)\.oldwikivoyage)\.?)?</source> {{original description page|$1$2|$3}}
{{Original description page}} II %3A)([\w\%\-\.\~\:\/\?\#\[\]\@\!\$\&\'\(\)\*\+\,\;\=]+?)(?:| [^\]\n]*)\](?:\s*\,?\s*before it was transferr?ed to commons)?\.?</source> {{original description page|$1|$2}}
{{Original description page}} III \s*([a-z\-]+\.w[a-z]+)\s*\|\s*[^}\|\[{]+\}\})\s*using\s*\[\[\:en\:WP\:FTCG\|FtCG\]\]\.?</source> $1{{transferred from|$3||[[:en:WP:FTCG|FtCG]]}} $2

Technique translations

These mainly apply to paintings and other artistic works.

Task: Oil on canvas
Find: \s*technique\s*=\s*)(?:\{\{\s*(?:en|de) *\|)? *(?:oil[ -]on[ -]canvas|öl[ -]auf[ -]leinwand) *(?:\}\})?(\ \r|\n)
Replace: $1{{technique|oil|canvas}}$2

Task: Oil on wood
Find: \s*technique\s*=\s*)\{\{\s*de *\|\s*öl[ -]auf[ -]holz\s*\}\}(\ \r|\n)
Replace: $1{{technique|oil|wood}}$2

Task: Oil on oak
Find: \s*technique\s*=\s*)\{\{\s*de *\|\s*öl[ -]auf[ -]eichenholz\s*\}\}(\ \r|\n)
Replace: $1{{technique|oil|panel|wood=oak}}$2

Task: Oil on panel
Find: \s*technique\s*=\s*)(?:\{\{\s*en *\|)? *oil[ -]on[ -]panel *(?:\}\})?(\ \r|\n)
Replace: $1{{technique|oil|panel}}$2

Task: Watercolor
Find: \s*technique\s*=\s*)\{\{\s*de *\|\s*aquarell\s*\}\}(\ \r|\n)
Replace: $1{{technique|watercolor}}$2

Task: Fresco
Find: \s*technique\s*=\s*)\{\{\s*de *\|\s*fresko\s*\}\}(\ \r|\n)
Replace: $1{{technique|fresco}}$2

{{Information}} fields

Task Find Replace Notes
"Description" cleanup \s*description\s*=)\s*(?:\{\{\s*description missing\s*\}\}|\s*description missing\s*?|(?:\{\{\s*en *\|) *(?:)?no original description(?:)? *(?:\}\})|(?:)?no original description(?:)? *) *(\ \r|\n)</source> $1$2
"Permission" cleanup 1 \s*permission\s*=)\s*((?:\'\')?)(?:-|—|下記を参照|see(?: licens(?:e|ing|e +section))?(?: bell?ow)?|yes|oui)\s*?\,?\.?;?\s*?\2\s*?(\ \r|\n)</source> $1$3
"Permission" cleanup 2 \s*permission\s*=)\s*\{\{(?:en\|)?\s*?see\sbell?ow\s*?\}\}\s*?(\ \r|\n)</source> $1$2
"Other versions" cleanup \s*other[_ ]versions\s*=)\s*(?:)?(?:-|—|no|none?(?: known)?|nein|yes|keine|\-+)\.?(?:)? *(\ \r|\n)</source> $1$2
"Source" cleanup \s*source\s*\=\s*[^*]+?)\n?\*\s*uploaded\s+by\s+\[\[user\:[^\]]+]](\ \r|\n)</source> $1$2 File Upload Bot (Magnus Manske) was adding these but they can already be found in the filehistory of each uploaded file.

Dates

Most plausible years

Most digital photos date from 2000 or later, so the most plausible years are those matching (200[0-9]|201[0-9]). For example, 19082006 is converted to 2006-08-19.

Task: Conversion (yyyy[ -/.]mm[ -/.]dd)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(200[0-9]|201[0-9])(-| |/|\.|)(0[1-9]|1[0-2])\3(1[3-9]|2[0-9]|3[01])(\ \r|\n)
Replace: $1$2-$4-$5$6

Task: Conversion (yyyy[ -/.]dd[ -/.]mm)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(200[0-9]|201[0-9])(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(0[1-9]|1[0-2])(\ \r|\n)
Replace: $1$2-$5-$4$6

Task: Conversion (mm[ -/.]dd[ -/.]yyyy)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(0[1-9]|1[0-2])(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(200[0-9]|201[0-9])(\ \r|\n)
Replace: $1$5-$2-$4$6

Task: Conversion (dd[ -/.]mm[ -/.]yyyy)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(1[3-9]|2[0-9]|3[01])(-| |/|\.|)(0[1-9]|1[0-2])\3(200[0-9]|201[0-9])(\ \r|\n)
Replace: $1$5-$4-$2$6
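
The four conversions above share the same building blocks: a 2000-2019 year, a month, a day restricted to 13-31 (so it cannot be mistaken for a month), and a separator that must be the same in both positions. The Python sketch below rebuilds them from those visible fragments. As an assumption, it matches the bare field value with `fullmatch` instead of the `date =` prefix and line-end terminator used in the table; `to_iso` and the named groups are hypothetical.

```python
import re

YEAR = r"(?P<y>200[0-9]|201[0-9])"
MONTH = r"(?P<m>0[1-9]|1[0-2])"
DAY = r"(?P<d>1[3-9]|2[0-9]|3[01])"      # days 01-12 are skipped: they could be months
SEP = r"(?P<sep>-| |/|\.|)"              # the separator may also be empty
PREFIX = r"(?:created|made|taken)? *"

# Field orders, in the same order as the four table rows.
ORDERS = [
    (YEAR, MONTH, DAY),   # yyyy mm dd
    (YEAR, DAY, MONTH),   # yyyy dd mm
    (MONTH, DAY, YEAR),   # mm dd yyyy
    (DAY, MONTH, YEAR),   # dd mm yyyy
]
PATTERNS = [
    re.compile(PREFIX + a + SEP + b + r"(?P=sep)" + c, re.IGNORECASE)
    for a, b, c in ORDERS
]


def to_iso(value):
    """Return yyyy-mm-dd for an unambiguous date value, else None."""
    for pattern in PATTERNS:
        match = pattern.fullmatch(value.strip())
        if match:
            return "{y}-{m}-{d}".format(**match.groupdict())
    return None
```

A value such as 05/06/2006 is left alone (`None`), since nothing in the text says whether it means 5 June or 6 May.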

Other plausible years

Run these only after the conversions above. For example, 19781706 is converted to 1978-06-17.

Task: Conversion (yyyy[ -/.]mm[ -/.]dd)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(1[89][0-9]{2})(-| |/|\.|)(0[1-9]|1[0-2])\3(1[3-9]|2[0-9]|3[01])(\ \r|\n)
Replace: $1$2-$4-$5$6

Task: Conversion (yyyy[ -/.]dd[ -/.]mm)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(1[89][0-9]{2})(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(0[1-9]|1[0-2])(\ \r|\n)
Replace: $1$2-$5-$4$6

Task: Conversion (mm[ -/.]dd[ -/.]yyyy)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(0[1-9]|1[0-2])(-| |/|\.|)(1[3-9]|2[0-9]|3[01])\3(1[89][0-9]{2})(\ \r|\n)
Replace: $1$5-$2-$4$6

Task: Conversion (dd[ -/.]mm[ -/.]yyyy)
Find: \s*date\s*=\s*)(?:created|made|taken)? *(1[3-9]|2[0-9]|3[01])(-| |/|\.|)(0[1-9]|1[0-2])\3(1[89][0-9]{2})(\ \r|\n)
Replace: $1$5-$4-$2$6
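
One likely reason for running the 1800-1999 conversions last is that an unseparated recent date can also be read as an older one. The Python sketch below tries one pattern from each table in either order; it is an illustration only, matching the bare value with `fullmatch` (an assumption), and `convert` and its `older_first` flag are hypothetical.

```python
import re

SEP = r"(-| |/|\.|)"
MONTH = r"(0[1-9]|1[0-2])"
DAY = r"(1[3-9]|2[0-9]|3[01])"

# dd mm yyyy with a 2000-2019 year (from the "Most plausible years" table).
RECENT_DMY = re.compile(DAY + SEP + MONTH + r"\2" + r"(200[0-9]|201[0-9])")
# yyyy dd mm with an 1800-1999 year (from the "Other plausible years" table).
OLDER_YDM = re.compile(r"(1[89][0-9]{2})" + SEP + DAY + r"\2" + MONTH)


def convert(value, older_first=False):
    """Try both patterns in the given order and return the first conversion."""
    steps = [
        (RECENT_DMY, r"\4-\3-\1"),   # groups: day, sep, month, year
        (OLDER_YDM, r"\1-\4-\3"),    # groups: year, sep, day, month
    ]
    if older_first:
        steps.reverse()
    for pattern, repl in steps:
        match = pattern.fullmatch(value)
        if match:
            return match.expand(repl)
    return value
```

In the recommended order, 19082006 becomes 2006-08-19. With the order reversed, the same value is read as year 1908, day 20, month 06 and becomes 1908-06-20.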
Task Find Replace Notes
Conversion ({{date|yyyy|mm|dd}}) \s*date\s*=\s*)(?:created|made|taken)? *\{\{\s*date\|([0-9]{4})\|(0[1-9]|1[012])\|(0?[1-9]|1[0-9]|2[0-9]|3[01])\}\}(\ \r|\n)</source> $1$2-$3-$4$5 {{Date}} function is built-in
Unknown date \s*(?:date|year)\s*=\s*)(?:unknown?(?:\s*date)?|\?|unbekannte?s?(\s*datum)?)</source> $1{{unknown|date}}
{{other date|century}} \s*(?:date|year)\s*=\s*)(\d\d?)(?:st|nd|rd|th) *century *(\ \r|\n)</source> $1{{other date|century|$2}}$3
{{other date|~}} \s*(?:date|year)\s*=\s*)(?:cir)?ca?\.? *\s?(1\d{2})[\-\?] *(\ \r|\n)</source> $1{{other date|~|${2}0|${2}9}}$3
{{other date|~}} \s*(?:date|year)\s*=\s*)(?:cir)?ca?\.? *(\d{4}) *(\ \r|\n)</source> $1{{other date|~|$2}}$3
{{other date|?}} \s*(?:date|year)\s*=\s*)(?:unknown|\?+)\.? *(\ \r|\n)</source> $1{{other date|?}}$2
{{Original upload date}} (original upload date) \d{4}\-\d{2}\-\d{2}\}\})\s*(?:\(original\s*upload\s*date\)|\(\s*first\s*version\s*\);?\s*\{\{\s*original upload date\|\d{4}\-\d{2}\-\d{2}\}\}\s*\(\s*last\s*version\s*\))</source> $1
{{Original upload date}} & {{According to EXIF data}} \s*date\s*=\s*)(?:\{\{\s*date\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\}\}|(\d{4})\-(\d{2})\-(\d{2}))\s*\(\s*(original upload date|according to EXIF data)\s*\)\s*?(\ \r|\n)</source> $1{{$8|$2$5-$3$6-$4$7}}$9
{{Original upload date}} I \s*date\s*=\s*)\{\{\s*date\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\}\}\s*\(\s*first\s*version\s*\)\;?\s*\{\{\s*date\s*\|\s*\d+\s*\|\s*\d+\s*\|\s*\d+\s*\}\}\s*\(\s*last\s*version\s*\)</source> $1{{original upload date|$2-$3-$4}}
{{Original upload date}} II \s*date\s*=\s*)(\d{4})\-(\d{2})\-(\d{2})\s*\(\s*first\s*version\s*\)\;?\s*(\d{4})\-(\d{2})\-(\d{2})\s*\(\s*last\s*version\s*\)</source> $1{{original upload date|$2-$3-$4}}
{{Original upload date}} III \s*date\s*=\s*\(?\s*)(?:Uploaded\s*on\s*Commons\s*at\s*[\d\-]*\s*[\d:]*\s*\(?UTC\)?\s*\/?\s*)?Original(?:ly)?\s*uploaded\s*at\s*([\d\-]*)\s*[\d:]*</source> $1{{original upload date|$2}}
{{other date|s}} \s*date\s*=\s*)(\d{1,3}0)\s*s</source> $1{{other date|s|$2}}
{{other date|after}} \s*date\s*=\s*)(?:after|post|بعد|desprès|po|nach|efter|μετά από|después de|pärast|پس از|après|despois do|לאחר|nakon|dopo il|по|na|após|după|после)\s*(\d{4})</source> $1{{other date|after|$2}}
{{other date|before}} \s*date\s*=\s*)(?:before|vor|pre|до|vör|voor|prior to|ante|antes de|قبل|Преди|abans|před|før|πριν από|enne|پیش از|ennen|avant|antes do|לפני|prije|prima del|пред|przed|înainte de|ранее|pred|före)[\s\-]*(\d{4})</source> $1{{other date|before|$2}}
{{other date|or}} \s*date\s*=\s*)(\d{4})\s*(?:or|أو|o|nebo|eller|oder|ή|ó|või|یا|tai|ou|או|vagy|または|или|അഥവാ|of|lub|ou|sau|или|ali|หรือ|和)\s*?(\d{4})</source> $1{{other date|or|$2|$3}}
{{other date|between}} \s*date\s*=\s*)(?:sometime\s*)?(?:between)\s*(\d{4})\s*(?:and|\-)?\s*?(\d{4})</source> $1{{other date|between|$2|$3}}
{{other date|spring}} \s*date\s*=\s*)(?:primavera(?:\s*de)?|jaro|forår|frühling|spring|printempo|Kevät|printemps|пролет|Vörjohr|früh[ \-]?jahr|voorjaar|wiosna|primăvara(?:\s*lui)?|весна|pomlad|våren|spring)\s*(\d{4})</source> $1{{other date|spring|$2}}
{{other date|summer}} \s*date\s*=\s*)(?:estiu|léto|somero|verano|Kesä|été|verán|estate|лето|zomer|lato|verão(?:\s*de)?|vara(?:\s*lui)?|poletje|sommaren|sommer|summer)\s*(\d{4})</source> $1{{other date|summer|$2}}
{{other date|fall}} \s*date\s*=\s*)(?:fall|autumn|tardor|podzim|Efterår|Herbst|aŭtuno|otoño|Syksy|outono(?:\s*de)?automne|outono|autunno|есен|Harvst|herfst|jesień|toamna(?:\s*lui)?|осень|jesen|hösten)\s*(\d{4})</source> $1{{other date|fall|$2}}
{{other date|winter}} \s*date\s*=\s*)(?:winter|hivern|zima|Vinter|vintro|invierno|Talvi|hiver|inverno(?:\s*de)?|зима|iarna(?:\s*lui)?|зима|zima|vintern)\s*(\d{4})</source> $1{{other date|winter|$2}}
{{other date|circa}} \s*date\s*=\s*)(?:[zc]ir[kc]a|ungefähr|about|around|vers|حوالي|cca|etwa|περ\.?|cerca\s*de|حدود|noin|cara a|oko|około|около|c[\:\. ]?a?[\:\. ]?)\s*(\d{3,4})(?:\s*\-\s*(?:[zc]ir[kc]a|ungefähr|about|around|vers|حوالي|cca|etwa|περ\.?|cerca\s*de|حدود|noin|cara a|oko|około|около|c[\:\. ]?a?[\:\. ]?)?\s*(\d{3,4}))?</source> $1{{other date|circa|$2|$3}}
empty argument fix circa\|\d+)\|\}\}</source> $1}}
{{other date|circa}} \s*date\s*=\s*)(?:[zc]ir[kc]a|ungefähr|about|around|vers|حوالي|cca|etwa|περ\.?|cerca\s*de|حدود|noin|cara a|oko|około|около|c[\:\. ]?a?[\:\. ]?)\s*(\d{3,4})</source> $1{{other date|circa|$2}}
(from metadata) \s*date\s*=\s*)\{\{\s*ISOdate\s*\|\s*([\d\-]+)\s*\}\}\s*\(\s*from\s*metadata\s*\)</source> $1{{according to EXIF|$2}}
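
To show how the {{other date}} rows behave, here is a small Python sketch using the value part of three of them (century, decade, between). The field prefix that forms group 1 in the table is left out, so `$2` and `$3` become `\1` and `\2`. Matching the whole value with `fullmatch` is a stricter assumption than the table makes, and `templatize` is a hypothetical name.

```python
import re

# (value pattern, template) pairs taken from the century, "s" and between rows.
RULES = [
    (r"(\d\d?)(?:st|nd|rd|th) *century", r"{{other date|century|\1}}"),
    (r"(\d{1,3}0)\s*s", r"{{other date|s|\1}}"),
    (r"(?:sometime\s*)?(?:between)\s*(\d{4})\s*(?:and|\-)?\s*?(\d{4})",
     r"{{other date|between|\1|\2}}"),
]


def templatize(value):
    """Return the first matching template, or the value unchanged."""
    value = value.strip()
    for find, template in RULES:
        match = re.fullmatch(find, value, flags=re.IGNORECASE)
        if match:
            return match.expand(template)
    return value
```

Values that match none of the patterns, such as a month name followed by a year, pass through untouched.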

Junk cleanup

Task Find Replace Notes
{{ImageUpload}} removal <syntaxhighlight lang="text" enclose="none">\s*\n?</source> [Minor]
Uncategorized comment <syntaxhighlight lang="text" enclose="none"> * *</source> [Minor]; Usually left behind after categorizing
"Categories" comment <syntaxhighlight lang="text" enclose="none"> * *\n?</source> [Minor]
"move approved by" \n)*?)(?:This image was moved from *\[\[:?(?:File|image):?[^\]\[{}]*\]\]\.?)?</source> $1
Useless templates (if they take no parameters) Art\.|bots|football[ _]+kit|template[ _]+other|s|tl|tlxs|template|template[ _]+link|temp|tls|tlx|tl1|tlp|tlsx|tlsp|mbox|tmbox(?:\/core)?|lan|jULIANDAY|file[ _]+title|nowrap|plural|time[ _]+ago|time[ _]+ago\/core|toolbar|red|green|sp|other date|max|max\/2|str[ _]+left|str[ _]+right|music|date|cite[ _]+book|citation\/core|citation\/make[ _]+link|citation\/identifier|citation|cite|cite[ _]+book|citation\/authors|citation\/make[ _]+link|cite[ _]+journal|cite[ _]+patent|cite[ _]+web|hide in print|only in print|parmPart|error|crediti|fontcolor|transclude|trim|navbox|navbar|section[ _]+link|yesno|center|unused|•|infobox\/row)\s*\}\}</source>
Useless full URL \s*(?:https?:)?\/\/ticket\.wikimedia\.org\/otrs\/index\.pl\?Action\s*\=\s*AgentTicketZoom&(?:amp;)?TicketNumber\=(\d+)\s*\}\}</source> {{PermissionOTRS|id=$1}}
Unnecessary __NOTOC__ <syntaxhighlight lang="text" enclose="none">__ *NOTOC *__</source> [Case sensitive] [Minor]; Common.css prevents file pages from showing TOCs
Remove empty lang templates ab|ace|af|ak|als|am|an|ang|ar|arc|arz|as|ast|av|ay|az|ba|bar|bcl|be|bg|bh|bi|bjn|bm|bn|bo|bpy|br|bs|bug|bxr|ca|cbk-zam|cdo|ce|ceb|ch|cho|chr|chy|ckb|co|cr|crh|cs|csb|cu|cv|cy|da|de|diq|dsb|dv|dz|ee|el|eml|en|eo|es|et|eu|ext|fa|ff|fi|fiu-vro|fj|fo|fr|frp|frr|fur|fy|ga|gag|gan|gd|gl|glk|gn|got|gu|gv|ha|hak|haw|he|hi|hif|ho|hr|hsb|ht|hu|hy|hz|ia|id|ie|ig|ii|map-bms|ik|ilo|io|is|it|iu|ja|jbo|jv|ka|kaa|kab|kbd|kg|ki|kj|kk|kl|km|kn|ko|kr|krc|ks|ksh|ku|kv|kw|ky|la|lad|lb|lbe|lez|lg|li|lij|roa-rup|lmo|ln|lo|lt|ltg|lv|mdf|mg|mh|mhr|mi|mk|ml|mn|mo|mr|mrj|ms|mt|mus|mwl|my|myv|mzn|na|nah|nap|nds|nds-nl|ne|new|ng|nl|nn|no|nov|nrm|nso|nv|ny|oc|om|or|os|pa|pag|pam|pap|pcd|pdc|pfl|pi|pih|pl|pms|pnb|pnt|ps|pt|qu|rm|rmy|rn|ro|roa-tara|ru|rue|rw|sa|sah|sc|scn|sco|sd|se|sg|sh|si|sk|sl|sm|sn|so|sq|sr|srn|ss|st|stq|su|sv|sw|szl|ta|te|tet|tg|th|ti|tk|tn|to|zh-hans|tpi|tr|ts|tt|tum|tw|ty|tyv|udm|ug|uk|ur|uz|ve|vec|vep|vi|vls|vo|wa|war|wo|wuu|xal|xh|xmf|yi|yo|za|zea|zh|zh-hant|zh-hk|zh-min-nan|zh-sg|zu)\s*(?:|\ \s*1=)?\s*\}\} *(\ \r|\n)</source> $1 Ignores those followed by text (incorrect usage but still indicates the language)
Remove void parameter (wrong syntax) (\s*\ \}\})</source> $1$2
Task Find Replace Notes
External to interwiki (part 1) (wikt)ionary|wiki(n)ews|wiki(b)ooks|wiki(q)uote|wiki(s)ource|wiki(v)ersity|wiki(voy)age)\.(?:com|net|org)/wiki/([^\]\[{|}\s"]*) +([^\n\]]+)\]</source> [[$2$3$4$5$6$7$8:$1:$9|$10]] Make sure not to touch credit lines which require a link to the file page. (Effectively a self-link which results in bold text after this regex)
External to interwiki (part 2) (incubator)|(quality))\.wikimedia\.(?:com|net|org)/wiki/([^\]\[{|}\s"]*) +([^\n\]]+)\]</source> [[$1$2$3:$4|$5]] See above
External to wikilink (local) net|org)/wiki/([^\]\[{|}\s"]*) +([^\n\]]+)\]</source> [[:$1|$2]] See above
Interlanguage sv|nl|de|fr|ru|it|es|ceb|vi|war|pl|ja|pt|zh|uk|ca|no|fa|fi|id|ar|cs|ko|ms|hu|ro|zh-yue|sr|tr|min|sh|kk|eo|eu|sk|da|lt|bg|he|hr|sl|hy|uz|et|vo|nn|gl|bat-smg|simple|hi|la|el|az|th|oc|ka|mk|be|new|tt|pms|tl|ta|te|cy|lv|ce|be-x-old|ht|ur|bs|sq|br|jv|mg|lb|mr|is|ml|pnb|ba|af|my|bn|ga|lmo|yo|fy|an|cv|tg|ky|nds-nl|sw|ne|io|gu|sco|bpy|scn|nds|ku|ast|qu|su|als|gd|kn|am|ckb|ia|nap|bug|wa|mn|pa|arz|mzn|si|zh-min-nan|yi|fo|sah|vec|sa|bar|nah|os|or|pam|hsb|se|li|mrj|mi|ilo|co|hif|bcl|gan|frr|bo|rue|mhr|glk|fiu-vro|ps|tk|pag|vls|gv|xmf|diq|km|kv|zea|csb|crh|hak|vep|sc|ay|dv|map-bms|so|nrm|rm|udm|koi|kw|ug|stq|bh|lad|wuu|lij|eml|fur|mt|szl|gn|pi|as|pcd|gag|cbk-zam|ksh|nov|ang|ie|nv|ace|ext|frp|mwl|ln|lez|sn|dsb|pfl|krc|haw|pdc|kab|xal|rw|myv|to|arc|kl|roa-tara|bjn|kbd|lo|ha|pap|av|tpi|mdf|lbe|jbo|na|wo|bxr|ty|srn|kaa|ig|nso|tet|kg|ab|ltg|roa-rup|zu|za|cdo|tyv|chy|tw|rmy|om|cu|tn|chr|bi|got|pih|sm|rn|bm|ss|mo|iu|sd|pnt|ki|xh|ts|zh-classical|ee|ak|ti|fj|lg|ks|ff|sg|ny|ve|cr|st|dz|ik|tum|ch|ng|ii|cho|mh|aa|kj|ho|mus|kr|hz):([^\]\[\|\}\{]+)\]\]</source> [[:$1:$2]] Interlanguage links in the File namespace do not make sense, categories should be used instead. Thus, convert to normal link and leave for manual cleanup.

Categories

These are mainly to improve machine-readability when performing other category work.

Task: Normalize categories
Find: [^]]*)?\]\] *
Replace: [[Category:$1$2]]
Notes: Run this before the other category fixes

Task: Remove empty [[Category:]]
Find: \[\[category: *\]\](?:\n( *\[\[category:))?
Replace: $1

Task: Remove double [[Category:[[Category:...]]]]
Find: \[\[category:(\[\[category:[^]]*\]\])[ ]*\]\]
Replace: $1

Task: One category per line
Find: \[\[category:([^]]+)\]\] *\[\[category:([^]]+)\]\]
Replace: [[Category:$1]]\n[[Category:$2]]
Notes: Run multiple times

Task: Remove duplicates
Find: (\[\[[Cc]ategory:)([^]]+\]\])(.*?)\1\2\n?
Replace: $1$2$3
Notes: Run multiple times; case sensitive

Task: Remove blank lines between categories
Find: (\[\[category:[^]]+\]\]\n)\n+(\[\[category:)
Replace: $1$2
Notes: [Minor]
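
The following Python sketch chains three of the category fixes (remove empty, one per line, remove blank lines) and repeats each until the text no longer changes, which is one way to honour the "run multiple times" notes. Python and the names `until_stable` and `tidy_categories` are assumptions; the patterns and replacements are the ones listed above, with `$1` written as `\1`.

```python
import re

EMPTY = (
    re.compile(r"\[\[category: *\]\](?:\n( *\[\[category:))?", re.IGNORECASE),
    r"\1",
)
ONE_PER_LINE = (
    re.compile(r"\[\[category:([^]]+)\]\] *\[\[category:([^]]+)\]\]", re.IGNORECASE),
    r"[[Category:\1]]\n[[Category:\2]]",
)
NO_BLANK_LINES = (
    re.compile(r"(\[\[category:[^]]+\]\]\n)\n+(\[\[category:)", re.IGNORECASE),
    r"\1\2",
)


def until_stable(rule, text):
    """Apply one (pattern, replacement) rule until it changes nothing."""
    pattern, repl = rule
    while True:
        new_text = pattern.sub(repl, text)
        if new_text == text:
            return text
        text = new_text


def tidy_categories(text):
    """Run the three fixes in table order."""
    text = until_stable(EMPTY, text)
    text = until_stable(ONE_PER_LINE, text)
    return until_stable(NO_BLANK_LINES, text)
```

"One category per line" needs the repetition because a single pass only splits non-overlapping pairs, so three categories on one line take two passes. The empty-group reference in the first rule relies on Python 3.5 or later, where an unmatched group expands to an empty string.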

Formatting

Task: Delete surplus lines
Find: \n{3,}
Replace: \n\n
Notes: [Minor]

Task: Fix incorrect line break syntax
Find: </?br( )?(/)?\\?>
Replace: <br$1$2>
Notes: This fixes only incorrect syntax (so <br>, <br/>, and <br /> are preserved)

Task: Remove {{}}, [[]], <gallery></gallery>, etc.
Find: \[\[\]\]|<gallery>\s*</gallery>|\[\[:?File *: *\]\])
Replace: (empty)
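
The first two formatting fixes are easy to try out. A minimal Python sketch, assuming Python 3.5 or later (where an unmatched group in a replacement expands to an empty string); `fix_formatting` is a hypothetical name.

```python
import re


def fix_formatting(text):
    # Collapse three or more newlines into exactly one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Normalise malformed line breaks; well-formed ones are rewritten unchanged.
    return re.sub(r"</?br( )?(/)?\\?>", r"<br\1\2>", text, flags=re.IGNORECASE)
```

For example, `</br>` and `<br\>` both become `<br>`, while `<br/>` and `<br />` come out exactly as they went in.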

See also
