File:Graphique Zipf pour Ulysses.png
From Wikimedia Commons, the free media repository
Graphique_Zipf_pour_Ulysses.png (640 × 480 pixels, file size: 4 KB, MIME type: image/png)
Summary
Description | log/log graph of word frequency against rank in "Ulysses" by James Joyce (Zipf's law) |
Date | |
Source | Own work |
Author | User: Xofc |
Method
Using "Ulysses" by James Joyce, found at http://www.gutenberg.org/etext/4300
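Perl code:
#!/usr/bin/perl
use strict;
use warnings;

my %word_freq;
while (my $line = <STDIN>)
{
    # fold to lower case, then split on runs of non-word characters
    my @words_in_line = split /\W+/, lc $line;
    for my $word (@words_in_line)
    {
        next unless length $word;   # split can yield an empty leading field
        $word_freq{$word}++;
    }
}
# print "rank frequency" pairs, most frequent word first; ranks start at 1
# so that every point can be drawn on a logarithmic axis
my $rank = 1;
foreach my $freq (sort { $b <=> $a } values %word_freq)
{
    printf("%-5d %d\n", $rank++, $freq);
}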
Or 'bash' code:
cat 4300-8.txt | tr 'A-Z' 'a-z' | sed 's/[^a-z]/\n/g' | awk '/[a-z]/{print $1;}' | sort | uniq -c | awk '{print $1;}' | sort -rn | pr -n -t
# | tr 'A-Z' 'a-z'              # convert to lower case
# | sed 's/[^a-z]/\n/g'         # one word per line: replace every non-letter with a newline (\n here needs GNU sed)
# | awk '/[a-z]/{print $1;}'    # drop empty lines
# | sort | uniq -c              # sort and count identical words
# | awk '{print $1;}'           # keep only the count (drop the word)
# | sort -rn                    # sort numerically, in descending order
# | pr -n -t                    # prefix each line with its number (= rank)
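Both listings print one "rank frequency" pair per line. Zipf's law says frequency falls off roughly as a power of rank, so log(frequency) against log(rank) should be close to a straight line with a slope near -1. A minimal sketch of how that slope could be estimated from the output, assuming whitespace-separated pairs on standard input; the function name `zipf_slope` is hypothetical and is not part of the original method:

```shell
# Hypothetical helper (not from the original page): reads "rank frequency"
# pairs on stdin and prints the least-squares slope of log(frequency)
# against log(rank), rounded to two decimals.
zipf_slope() {
  awk '$1 > 0 && $2 > 0 {
         # rows with a non-positive rank or frequency are skipped,
         # because the logarithm is undefined there
         x = log($1); y = log($2)
         n++; sx += x; sy += y; sxx += x * x; sxy += x * y
       }
       END {
         if (n < 2) { print "need at least two points" > "/dev/stderr"; exit 1 }
         printf "%.2f\n", (n * sxy - sx * sy) / (n * sxx - sx * sx)
       }'
}
```

For example, appending `| zipf_slope` to the pipeline above would print the fitted slope for the whole text. A plain least-squares fit weights the long tail of rare words heavily, so the value is only a rough indication.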
This plot was created with Gnuplot.
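The Gnuplot commands themselves are not recorded here. As a sketch only, a plot of this kind could be produced along the following lines, assuming the rank/frequency pairs were saved to a file. The function name `zipf_plot_script`, the file names, and the axis labels are all assumptions; the 640 × 480 size matches the image dimensions stated above.

```shell
# Hypothetical helper (not the author's script): prints a Gnuplot script
# that draws the "rank frequency" pairs in file $1 on log/log axes and
# writes the result to the PNG file $2.
zipf_plot_script() {
  cat <<EOF
set terminal png size 640,480
set output "$2"
set logscale xy
set xlabel "rank"
set ylabel "frequency"
plot "$1" using 1:2 with lines notitle
EOF
}

# Usage (requires gnuplot to be installed):
#   zipf_plot_script ranks.txt zipf.png | gnuplot
```

Emitting the script on standard output keeps the helper free of side effects; the commands can be inspected or saved before being piped to Gnuplot.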
Licensing
I, the copyright holder of this work, hereby publish it under the following licenses:
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License. http://www.gnu.org/copyleft/fdl.html
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International, 3.0 Unported, 2.5 Generic, 2.0 Generic and 1.0 Generic license.
- You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
- Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.
You may select the license of your choice.
File history
Date/Time | Dimensions | User | Comment
---|---|---|---
17:11, 4 October 2009 (current) | 640 × 480 (4 KB) | Xofc | {{Information \|Description={{en\|1=log/log graph of rank/frequency of words in "Ulysses" by James Joyce (Zipf Law)}} {{fr\|1=graphique log/log de la fréquence des mots par leur rang dans "Ulysses" de James Joyce (Loi de Zipf)}} \|Source=Own work by uploader
File usage on Commons
There are no pages that use this file.
File usage on other wikis
The following other wikis use this file:
- Usage on fr.wikipedia.org
- Usage on ga.wikipedia.org
- Usage on gl.wikipedia.org
- Usage on pt.wikipedia.org
- Usage on tr.wikipedia.org