File:Evolutionary Data Measures Understanding the Difficulty of Text Classification Tasks.pdf
From Wikimedia Commons, the free media repository
Original file (1,239 × 1,752 pixels, file size: 562 KB, MIME type: application/pdf, 27 pages)
Summary
Description |
English: Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation, but the underlying properties of datasets are discovered on an ad-hoc basis as errors occur. However, understanding the properties of the data is crucial in perfecting models. In this paper we analyse exactly which characteristics of a dataset best determine how difficult that dataset is for the task of text classification. We then propose an intuitive measure of difficulty for text classification datasets which is simple and fast to calculate. We show that this measure generalises to unseen data by comparing it to state-of-the-art datasets and results. This measure can be used to analyse the precise source of errors in a dataset and allows fast estimation of how difficult a dataset is to learn. We searched for this measure by training 12 classical and neural-network-based models on 78 real-world datasets, then using a genetic algorithm to discover the best measure of difficulty. |
Date | |
Source | Content can be found at arXiv.org (Dedicated link) (archive.org link) |
Author | Edward Collins, Nikolai Rozanov, Bingbing Zhang |
Licensing
This file is licensed under the Creative Commons Attribution 4.0 International license.
- You are free:
- to share – to copy, distribute and transmit the work
- to remix – to adapt the work
- Under the following conditions:
- attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
File history
Version | Date/Time | Dimensions | User | Comment |
---|---|---|---|---|
current | 16:45, 8 November 2018 | 1,239 × 1,752, 27 pages (562 KB) | Acagastya | User created page with UploadWizard |
File usage on Commons
There are no pages that use this file.
Metadata
This file contains additional information such as Exif metadata which may have been added by the digital camera, scanner, or software program used to create or digitize it. If the file has been modified from its original state, some details such as the timestamp may not fully reflect those of the original file. The timestamp is only as accurate as the clock in the camera, and it may be completely wrong.
Short title | |
---|---|
Image title | |
Author | |
Software used | LaTeX with hyperref package |
Conversion program | pdfTeX-1.40.17 |
Encrypted | no |
Page size | 595.276 x 841.89 pts (A4) |
Version of PDF format | 1.5 |