File:Heaps' Law on "War and Peace".svg

From Wikimedia Commons, the free media repository

Original file (SVG file, nominally 662 × 491 pixels, file size: 156 KB)

Captions


Summary

Description
English: Verification of Heaps' law on War and Peace.
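Heaps' law states that the number of distinct words V in a text grows with the total number of words n roughly as V(n) ≈ K·n^β, with 0 < β < 1. The counting step at the heart of the script below can be sketched on a toy token list (a hypothetical example, not text from the novel):

```python
# Heaps' law sketch: track vocabulary size V(n) as tokens are read.
# The token list is a hypothetical toy example, not taken from the novel.
tokens = ["war", "and", "peace", "and", "war", "is", "over", "and"]

vocab = set()
growth = []  # growth[i] = number of distinct words after i + 1 tokens
for token in tokens:
    vocab.add(token)
    growth.append(len(vocab))

print(growth)  # → [1, 2, 3, 3, 3, 4, 5, 5]
```

On a real corpus, fitting a line to (log n, log V) yields estimates of β (the slope) and log K (the intercept), which is what the full script does with `np.polyfit`.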

```python
import random
import urllib.request

import matplotlib.pyplot as plt
import nltk
import numpy as np

# Download the corpus
url = "http://www.gutenberg.org/files/2600/2600-0.txt"
response = urllib.request.urlopen(url)
long_txt = response.read().decode('utf8')

# Tokenize the text, skipping the Project Gutenberg header
tokenizer = nltk.tokenize.RegexpTokenizer(r'\w+')
tokens = tokenizer.tokenize(long_txt.lower())
tokens = tokens[940:]

# Prepare arrays to hold the counts of total words and unique words
total_words = np.arange(1, len(tokens) + 1)
unique_words = np.zeros(len(tokens))

# Count unique words while progressing through the text
word_set = set()
for i, token in enumerate(tokens):
    word_set.add(token)
    unique_words[i] = len(word_set)

# Fit Heaps' law: unique_words = K * total_words ** beta,
# via a linear fit in log-log space
log_total_words = np.log(total_words)
log_unique_words = np.log(unique_words)
beta, logK = np.polyfit(log_total_words, log_unique_words, 1)
K = np.exp(logK)

# Print the estimated parameters
print('K:', K)
print('beta:', beta)

# Plot total words vs. unique words
plt.figure(figsize=(8, 6))
plt.plot(total_words, unique_words, label='Empirical Data')
plt.plot(total_words, K * total_words ** beta, '--',
         label=f"Heaps' Law Fit: K={K:.2f}, beta={beta:.2f}")

# Repeat with the token order shuffled
random.shuffle(tokens)

unique_words = np.zeros(len(tokens))
word_set = set()
for i, token in enumerate(tokens):
    word_set.add(token)
    unique_words[i] = len(word_set)

# Fit Heaps' law to the shuffled data
log_unique_words = np.log(unique_words)
beta, logK = np.polyfit(log_total_words, log_unique_words, 1)
K = np.exp(logK)

print('K:', K)
print('beta:', beta)

plt.plot(total_words, unique_words, label='Shuffled Empirical Data')
plt.plot(total_words, K * total_words ** beta, '--',
         label=f"Heaps' Law Fit for shuffled data: K={K:.2f}, beta={beta:.2f}")

plt.xlabel('Total Words')
plt.ylabel('Unique Words')
plt.legend()
plt.grid(True)
plt.title('Verification of Heaps\' Law on "War and Peace"')
plt.savefig("war and peace.svg", bbox_inches='tight', format='svg')
```
Date
Source Own work
Author Cosmia Nebula

Licensing

I, the copyright holder of this work, hereby publish it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.

File history

Click on a date/time to view the file as it appeared at that time.

Date/Time: 22:54, 18 July 2023 (current version)
Dimensions: 662 × 491 (156 KB)
User: Cosmia Nebula (talk | contribs)
Comment: Uploaded while editing "Heaps' law" on en.wikipedia.org

There are no pages that use this file.
