File:Frequency of digrams in Ukrainian words.png

From Wikimedia Commons, the free media repository

Original file (1,252 × 1,252 pixels, file size: 498 KB, MIME type: image/png)


Summary

Description
English: A dot (.) represents the beginning and the end of a word. The number under the bigram AB is the probability, in percent, that the letter following A is B.

The diagram should be read row by row. The probabilities in each row sum to 100%. The first row gives the probability of each character starting a word. The second row gives the probability of each character appearing after "а".

The first column gives the probability that the word ends right after the character of that row. It is not the probability that a word ends with that character, so the probabilities in this column do not sum to 100%.
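To illustrate how the rows and the first column should be read, here is a minimal sketch in plain Python on a made-up three-word list. The words and the helper name `bigram_probs` are illustrative assumptions and are not part of the code used for the image. It checks that every row sums to 1 while the word-end column does not.

```python
from collections import Counter, defaultdict

def bigram_probs(words):
    """Return P[a][b]: probability that b follows a ('.' marks word boundaries)."""
    counts = defaultdict(Counter)
    for w in words:
        chars = ['.'] + list(w.lower()) + ['.']
        for a, b in zip(chars, chars[1:]):
            counts[a][b] += 1
    # Normalize each row by its own total, as in the diagram.
    return {a: {b: n / sum(row.values()) for b, n in row.items()}
            for a, row in counts.items()}

# Toy word list, made up for illustration only.
P = bigram_probs(["мама", "тато", "там"])

# Each row is a probability distribution over the next character.
row_sums = {a: sum(row.values()) for a, row in P.items()}

# First column: for each row character, probability that the word ends after it.
end_column = {a: row.get('.', 0.0) for a, row in P.items()}

print(row_sums)
print(end_column, sum(end_column.values()))
```

In this toy list every word containing "о" ends right after it, so the entry for "о" in the word-end column is 1 on its own, and the column as a whole cannot sum to 1.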


Code to obtain this image:

import torch
import matplotlib.pyplot as plt

# Word list; needs package wukrainian to be installed
words = open('/usr/share/dict/ukrainian').read().splitlines()

# Alphabet; '.' marks the beginning and the end of a word
itos = ".абвгґдеєжзиіїйклмнопрстуфхцчшщьюя'-"
stoi = {s: i for i, s in enumerate(itos)}
nchars = len(itos)

# N[i, j] counts how often character j follows character i
N = torch.zeros((nchars, nchars), dtype=torch.int32)
for w in words:
    chrs = ['.'] + list(w.lower()) + ['.']
    for c1, c2 in zip(chrs, chrs[1:]):
        N[stoi[c1], stoi[c2]] += 1

# Normalize each row so that it sums to 1
P = N.float()
P = P / P.sum(1, keepdim=True)

# In a Jupyter notebook, run "%matplotlib inline" before plotting
fig = plt.figure(figsize=(16, 16))
plt.imshow(P, cmap='Blues')
for i in range(nchars):
    for j in range(nchars):
        chstr = itos[i] + itos[j]
        # Bigram on top, its probability in percent underneath
        plt.text(j, i, chstr, ha="center", va="bottom", color='gray')
        plt.text(j, i, '%.1f' % (P[i, j].item() * 100.0), ha="center", va="top", color='gray')
plt.axis('off')
fig.savefig('uk_digrams.png', bbox_inches='tight')
The code follows the lesson "The spelled-out intro to language modeling: building makemore" by Andrej Karpathy.
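One caveat: the counting loop looks up every character in `stoi`, so a word containing a character outside `itos` (a Latin letter, a digit, a typographic apostrophe) would raise `KeyError`. Whether the dictionary file contains such words is not stated here. The sketch below assumes it might and simply sets them aside; the helper name `keep_known` and the sample words are hypothetical.

```python
itos = ".абвгґдеєжзиіїйклмнопрстуфхцчшщьюя'-"
alphabet = set(itos) - {'.'}  # '.' is reserved as the word-boundary marker

def keep_known(words):
    """Split words into those made only of known characters and the rest."""
    kept, skipped = [], []
    for w in words:
        lw = w.lower()
        # Empty lines and words with unknown characters are set aside.
        (kept if lw and set(lw) <= alphabet else skipped).append(w)
    return kept, skipped

# Sample input, made up for illustration only.
kept, skipped = keep_known(["Київ", "об'єднання", "e-mail", "пів-на-пів", ""])
print(kept)
print(skipped)
```

Passing only `kept` to the counting loop would avoid the lookup error, at the cost of slightly changing the statistics if many words are skipped, so it is worth inspecting `skipped` first.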
Date
Source Own work
Author Bunyk

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

File history


Date/Time | Dimensions | User | Comment
18:33, 10 August 2023 (current) | 1,252 × 1,252 (498 KB) | Bunyk | Uploaded own work with UploadWizard

There are no pages that use this file.
