Commons:Lossy and lossless
A lossless file is a file encoded in a format that can be modified, or re-encoded to another lossless format, without loss of quality. A lossy file, by contrast, deteriorates each time it is re-encoded. Lossy files are usually compressed or otherwise altered to reduce file size or to demand fewer resources when displayed. Lossless files may also be compressed, but in a way that loses no data. Lossy encoding removes detail and data, primarily data deemed indistinguishable to humans. When a lossy file is later used as the basis for an edit, the missing data may become very evident. There is considerable debate over what should be deemed indistinguishable, but that is not the subject of this guide. Lossless files are sometimes called "archival quality".
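As a minimal sketch of the difference, the Python example below uses the standard-library `zlib` module (DEFLATE, the compression method PNG uses internally) as a stand-in for lossless compression, and a hypothetical `quantize` function as a toy stand-in for lossy encoding. Real image and audio codecs are far more sophisticated; the sketch only shows that a lossless round trip returns every byte unchanged while a lossy one does not.

```python
import zlib

def lossless_roundtrip(data: bytes) -> bytes:
    """Compress with zlib (DEFLATE) and decompress again; no information is lost."""
    return zlib.decompress(zlib.compress(data, 9))

def quantize(samples: list[int], step: int) -> list[int]:
    """Toy lossy encoder (hypothetical): round non-negative samples to multiples of `step`.

    Any detail finer than `step` is discarded and cannot be recovered.
    """
    return [step * ((s + step // 2) // step) for s in samples]

original = [3, 7, 12, 18, 21]
raw = bytes(original)

# Lossless: the decoded bytes are identical to the input.
print(lossless_roundtrip(raw) == raw)   # True

# Lossy: the decoded samples differ from the input.
print(quantize(original, 5))            # [5, 5, 10, 20, 20]
```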
Many types of media occur in both lossy and lossless formats, for example sound files, images and videos. While using a lossy format is often perfectly acceptable or even advisable, there are very clear exceptions. Wikimedia Commons accepts a variety of file formats of both types. This page explains when to use one file type over the other.
NOTE: A lossy file can never be restored to a lossless version, and should not be "transcoded" (converted into a lossless format), as this only increases the file size without any benefit.
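Using a toy model (a hypothetical `quantize` function standing in for a lossy codec, and the standard-library `zlib` module standing in for a lossless format), the sketch below illustrates why transcoding a lossy file to a lossless format gains nothing: decoding the lossless copy returns exactly the lossy data, and the detail removed by the lossy step stays lost.

```python
import zlib

def quantize(samples: list[int], step: int) -> list[int]:
    """Toy lossy encoder (hypothetical): round non-negative samples to multiples of `step`."""
    return [step * ((s + step // 2) // step) for s in samples]

def total_error(a: list[int], b: list[int]) -> int:
    """Sum of absolute differences between two sample lists."""
    return sum(abs(x - y) for x, y in zip(a, b))

original = [3, 7, 12, 18, 21]
lossy = quantize(original, 5)            # suppose this lossy copy is all that survives

# "Transcode" the lossy copy into a lossless container, then decode it again.
archived = zlib.compress(bytes(lossy), 9)
decoded = list(zlib.decompress(archived))

print(decoded == lossy)                  # True: exactly the lossy data comes back
print(total_error(original, lossy))      # 9
print(total_error(original, decoded))    # 9: the lost detail is not restored
```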
Why is lossless important?
A lossless file can be changed, cropped, edited and resaved multiple times without losing quality. Wikimedia Commons exists not only to make files available for educational use now, but also in the future. It is therefore important that Commons also carries lossless files that can be edited and adapted in the future for use in the different projects.
When to use lossless
Lossless formats are the right choice for archival or editing purposes. This applies equally to sound, video and images. Lossless files can be created when taking photos (commonly known as RAW, which may require color adjustment), when scanning, in studio recordings etc.
An illustrative example of why lossless files are used for editing is the following pair of close-ups of the portrait of Félix Fénéon by Paul Signac, which highlight the pointillist technique Signac used:
- The original file was a lossless TIFF at 6,229 × 4,973 pixels
- 1 - Lossless encode: TIFF file (lossless) converted to PNG (lossless) for archival, later cropped and converted to JPEG (lossy) to show detail for use in an article.
- 2 - Lossy transcode: TIFF file (lossless) converted to JPEG (lossy) for use in an article, later cropped and resaved as JPEG (lossy) to show detail for use in an article.
Note: TIFF is usually an uncompressed lossless format, PNG is a compressed lossless format. While TIFF files are larger, PNG files require decompression and on older computers might take a bit more computational power to display. JPEGs are lossy and compressed.
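The effect behind the two workflows in the list above can be sketched with a toy model in Python. Here a hypothetical `quantize` function stands in for a JPEG save, and two different step sizes stand in for two saves with different settings. Real JPEG quantizes DCT coefficients in 8×8 blocks, and cropping can shift that block grid, so this is only an analogy: one lossy save from a lossless master versus a lossy save of an already lossy copy.

```python
def quantize(samples: list[int], step: int) -> list[int]:
    """Toy lossy save (hypothetical): round non-negative samples to multiples of `step`."""
    return [step * ((s + step // 2) // step) for s in samples]

def total_error(a: list[int], b: list[int]) -> int:
    """Sum of absolute differences from the original."""
    return sum(abs(x - y) for x, y in zip(a, b))

original = [2, 14, 26]

# Workflow 1: keep a lossless master and apply a single lossy save for the article.
path1 = quantize(original, 6)

# Workflow 2: save lossy first, then re-save the lossy copy with different settings.
path2 = quantize(quantize(original, 4), 6)

print(path1, total_error(original, path1))   # [0, 12, 24] 6
print(path2, total_error(original, path2))   # [6, 18, 30] 12
```

In this toy case the second lossy generation doubles the error, even though both workflows end with the same final "save" setting.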
Lossy transcodes
Lossy transcodes should not be used for archival purposes, as they defeat the purpose of the lossless format. Nor can such a file be encoded into a lossy format again without losing additional data. A lossy transcode should only be used if it is the sole copy of the file.
When not to use lossless
Lossless files should generally not be used in articles, such as those on Wikipedia. If a file exists only in a lossless version, a new lossy version should be uploaded and used instead. This decreases the load on the end user, especially when the end user has limited bandwidth. It also decreases the load on the Wikimedia servers, in both bandwidth and the processing power used for rendering.
For photographs, a thumbnail is larger if the image is stored as a PNG rather than a JPEG. At thumbnail sizes, the format does not significantly affect the visible quality of the image.
While not expressly forbidden, using lossless files in articles is poor practice because of the extra bandwidth cost it imposes on readers. The difference between downloading a 100 KB thumbnail and a 1 MB thumbnail may seem unremarkable or even trivial, but it is not. Wikipedia Zero was a Wikimedia Foundation project that aimed to provide free access to Wikipedia in developing countries, because data costs there are often prohibitive for readers.
Not every reader could be reached by Wikipedia Zero, and lowering the file size of thumbnails is one good way to work towards increased access. This seldom affects the end user in any way other than saving bandwidth.
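To make the bandwidth argument concrete, the short Python calculation below multiplies the 100 KB and 1 MB thumbnail sizes mentioned above by a hypothetical number of page views. The view count is an assumption chosen for round numbers, decimal units are used (1 KB = 1,000 bytes), and caching is ignored, since each reader still has to download the thumbnail at least once.

```python
def total_transfer_bytes(thumbnail_bytes: int, page_views: int) -> int:
    """Total data served for one thumbnail across a number of page views (caching ignored)."""
    return thumbnail_bytes * page_views

KB = 1_000
MB = 1_000_000
views = 10_000  # hypothetical number of page views

small_total = total_transfer_bytes(100 * KB, views)
large_total = total_transfer_bytes(1 * MB, views)

print(small_total / 1e9, "GB")                  # 1.0 GB
print(large_total / 1e9, "GB")                  # 10.0 GB
print((large_total - small_total) / 1e9, "GB")  # 9.0 GB extra for the larger thumbnail
```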
- TIFFs are often large files, but they are automatically thumbnailed as JPEGs by the MediaWiki software, so there is no immediate reason to recreate the file as a JPEG for this purpose.
- PNGs are thumbnailed as PNGs and incur additional bandwidth cost for the end user. Each file should be judged individually to decide whether it should be used as a PNG or recreated as a JPEG. JPEG introduces artifacts, which are especially visible in images with text or simple graphics.
- JPEG is the preferred format for most photo thumbnails and most non-technical graphics. Of these three formats, it incurs the least bandwidth cost for the user.
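The thumbnailing rules of thumb in the list above can be summarized as a small lookup, sketched below in Python. The helper names and the `has_text_or_simple_graphics` flag are hypothetical and not part of MediaWiki; the sketch only restates the guidance given here, and each file should still be judged individually.

```python
# Hypothetical summary of the rules of thumb above; not a MediaWiki API.
THUMBNAIL_FORMAT = {"tiff": "jpeg", "png": "png", "jpeg": "jpeg"}

def thumbnail_format(source_format: str) -> str:
    """Format the thumbnails are served in, per the list above."""
    return THUMBNAIL_FORMAT[source_format.lower()]

def consider_jpeg_copy(source_format: str, has_text_or_simple_graphics: bool) -> bool:
    """Whether uploading a separate JPEG version for article use may be worth considering."""
    if thumbnail_format(source_format) != "png":
        # TIFFs already get JPEG thumbnails, and JPEGs are JPEG already.
        return False
    # PNG thumbnails stay PNG (more bandwidth), but JPEG artifacts are most
    # visible in text and simple graphics, so keep PNG for those.
    return not has_text_or_simple_graphics

print(consider_jpeg_copy("PNG", has_text_or_simple_graphics=False))   # True
print(consider_jpeg_copy("PNG", has_text_or_simple_graphics=True))    # False
print(consider_jpeg_copy("TIFF", has_text_or_simple_graphics=False))  # False
```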
In-depth coverage
Images
Sound
The same phenomenon holds true for sound files. Wikimedia Commons accepts both lossless and lossy sound files, and lossy audio is affected in the same way by compression, range reduction or removal of "inaudible" features. As with images, the loss is often not immediately apparent. Lossy formats such as .ogg, .mp3 and .aac come in a multitude of encoding standards; in some the loss of data is audible and in others it is not. What they all have in common is that repeated runs through the encoding process will degrade the quality significantly.
Preferred formats
Lossless audio formats such as WAV (uncompressed), FLAC and ALAC do not lose quality upon re-encoding. WAV is not preferred because its complete lack of compression results in immense file sizes.
- FLAC is a lossless and open source/freely licensed format and codec that is supported by the MediaWiki software and Wikimedia Commons. It is the preferred choice for archival purposes.
- Ogg Opus is a lossy format that is similarly free. It is the best choice for use in articles or other Wikimedia projects.
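A common workflow is to produce both copies from the same lossless master: FLAC for archival and Opus for article use. The Python sketch below only builds the command lines, assuming the FFmpeg command-line tool with its `flac` and `libopus` encoders (`-c:a` selects the audio encoder, `-b:a` the audio bitrate). The file names and the 96k bitrate are hypothetical choices, not Commons recommendations; FFmpeg infers the output container from the file extension.

```python
import shlex

def flac_archive_cmd(src: str, dst: str) -> list[str]:
    """FFmpeg arguments for a lossless FLAC copy, intended for archival."""
    return ["ffmpeg", "-i", src, "-c:a", "flac", dst]

def opus_article_cmd(src: str, dst: str, bitrate: str = "96k") -> list[str]:
    """FFmpeg arguments for a lossy Opus copy, intended for article use.

    The 96k default bitrate is an arbitrary assumption.
    """
    return ["ffmpeg", "-i", src, "-c:a", "libopus", "-b:a", bitrate, dst]

# Both copies come from the same lossless master (hypothetical file names),
# never one from the other.
print(shlex.join(flac_archive_cmd("master.wav", "archive.flac")))
# ffmpeg -i master.wav -c:a flac archive.flac
print(shlex.join(opus_article_cmd("master.wav", "article.opus")))
# ffmpeg -i master.wav -c:a libopus -b:a 96k article.opus
```

To actually run a command, the list can be passed to `subprocess.run(cmd, check=True)`. Encoding the Opus copy from the master rather than from another lossy file avoids an extra lossy generation.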
Other formats
- WAV or wave files are supported, but due to their lack of compression they are immense and should not be used on Commons for any purpose. They can be converted to FLAC without any loss of data.
- Ogg Vorbis may be used if an Opus or FLAC version is not available.
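The size of uncompressed PCM audio follows directly from its parameters: sample rate × channels × bytes per sample × duration. The Python sketch below computes this for CD-quality stereo (44.1 kHz, 2 channels, 16 bits), chosen here only as a familiar example, and cross-checks one second against a WAV file written in memory with the standard-library `wave` module, which adds a 44-byte header.

```python
import io
import wave

def pcm_bytes(sample_rate: int, channels: int, bits_per_sample: int, seconds: int) -> int:
    """Size of the raw PCM data in an uncompressed WAV file (header excluded)."""
    return sample_rate * channels * (bits_per_sample // 8) * seconds

per_minute = pcm_bytes(44_100, 2, 16, 60)
print(per_minute)   # 10584000, about 10.6 MB per minute

# Cross-check: write one second of silence as a real WAV file in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(2)
    w.setsampwidth(2)          # 2 bytes = 16 bits per sample
    w.setframerate(44_100)
    w.writeframes(bytes(pcm_bytes(44_100, 2, 16, 1)))
print(len(buf.getvalue()))     # 176444 (176400 bytes of audio + 44-byte header)
```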
Spectral analysis
A lossy file can never be returned to its lossless state, and it may not be immediately obvious whether a lossless encoding of a sound file was created from a lossy source. In such cases, spectral analysis can help. Lossy encodings often remove all sound above about 20 kHz (or a similar level, depending on bitrate etc.). A lossless file that lacks any sound information above roughly 17.5 to 25 kHz is likely to be a lossy transcode.
Similarly, some encodings apply a shelf at a certain level, reducing most sound above a certain frequency. A lossless file with a visible shelf is also likely to be a lossy transcode.
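The cutoff check described above can be sketched in Python on a synthetic signal. This is a toy under stated assumptions: it uses a naive DFT on a single short frame with no windowing, a hypothetical −60 dB threshold relative to the strongest bin, and the 17.5 kHz lower bound from the text as the decision point. Real analysis (for example in Audacity) uses windowed FFTs over many frames, and a low cutoff can also have innocent causes, such as a recording whose sample rate cannot represent higher frequencies (a 32 kHz recording holds nothing above 16 kHz) or source material without high-frequency content.

```python
import cmath
import math

def magnitude_spectrum(samples: list[float]) -> list[float]:
    """Magnitudes of a naive O(n^2) DFT for bins 0 to n/2 (toy; use an FFT for real audio)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

def highest_frequency(samples: list[float], sample_rate: int, threshold_db: float = -60.0) -> float:
    """Frequency (Hz) of the highest bin that is above `threshold_db` relative to the peak bin."""
    mags = magnitude_spectrum(samples)
    limit = max(mags) * 10 ** (threshold_db / 20)
    top_bin = max(k for k, m in enumerate(mags) if m > limit)
    return top_bin * sample_rate / len(samples)

def likely_lossy_transcode(cutoff_hz: float, threshold_hz: float = 17_500.0) -> bool:
    """Heuristic from the text: no content above about 17.5 kHz suggests a lossy source."""
    return cutoff_hz < threshold_hz

# Synthetic signal: 1 kHz and 15 kHz tones, nothing above 15 kHz.
# With 441 samples at 44.1 kHz, each DFT bin is exactly 100 Hz wide.
RATE = 44_100
signal = [math.sin(2 * math.pi * 1_000 * t / RATE) + 0.5 * math.sin(2 * math.pi * 15_000 * t / RATE)
          for t in range(441)]

cutoff = highest_frequency(signal, RATE)
print(cutoff)                          # 15000.0
print(likely_lossy_transcode(cutoff))  # True
```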
The open-source tool Audacity can create spectral images for analysis and is available for Windows, Mac and Linux at web.audacityteam.org.
spectrocompare.mp3.cr also has a number of images for analysis, and offers a downloadable Python script that can create spectral images quickly and easily.
Video
Lossless video files can be quite large, and quickly approach the file size limit. As a result, it is rare for people to upload lossless video.
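A quick calculation shows why. The Python sketch below computes the raw size of uncompressed video from resolution, bytes per pixel, frame rate and duration, assuming 1080p, 8-bit RGB (3 bytes per pixel) at 30 frames per second as an illustrative case. Lossless video codecs shrink this figure, though typically by far less than lossy codecs do.

```python
def raw_video_bytes(width: int, height: int, bytes_per_pixel: int, fps: int, seconds: int) -> int:
    """Size of uncompressed video frames (no audio, no container overhead)."""
    return width * height * bytes_per_pixel * fps * seconds

one_minute = raw_video_bytes(1920, 1080, 3, 30, 60)
print(one_minute)   # 11197440000, about 11.2 GB for a single minute
```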
Preferred formats
- WebM with VP9/Opus is the preferred lossy format for video. Note that some software produces newer WebM files using AV1, which are not yet supported on Commons.
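Selecting the codecs explicitly avoids ending up with an AV1 WebM by accident. The Python sketch below builds an FFmpeg command line for VP9 video and Opus audio in WebM, assuming FFmpeg with the `libvpx-vp9` and `libopus` encoders; combining `-crf` with `-b:v 0` selects libvpx-vp9's constant-quality mode. The CRF value of 31 and the file names are hypothetical choices, not Commons recommendations.

```python
import shlex

def webm_vp9_cmd(src: str, dst: str, crf: int = 31) -> list[str]:
    """FFmpeg arguments for a WebM file with VP9 video and Opus audio.

    `-b:v 0` together with `-crf` requests constant-quality VP9 encoding;
    the default CRF of 31 is an arbitrary assumption.
    """
    return ["ffmpeg", "-i", src,
            "-c:v", "libvpx-vp9", "-crf", str(crf), "-b:v", "0",
            "-c:a", "libopus", dst]

print(shlex.join(webm_vp9_cmd("master.mov", "article.webm")))
# ffmpeg -i master.mov -c:v libvpx-vp9 -crf 31 -b:v 0 -c:a libopus article.webm
```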
Other formats
- Ogg Theora is an older lossy video format that does not offer as good quality as WebM. All new videos should preferably use WebM.
- Ogg can contain other, less common video formats, such as Dirac. These formats are generally less well supported, and the online video player will not recognize them. However, it is possible (although probably not recommended) to upload lossless video in this manner.