Commons:Lossy and lossless compression

From Wikimedia Commons, the free media repository

Some formats accepted by Commons, such as the JPEG image format, the Ogg Vorbis and Ogg Speex audio formats, the Ogg Theora video codec, and the DjVu scanned document format, perform lossy compression on the data they describe. The purpose of lossy compression is to reduce file size dramatically without significantly compromising perceived media quality.

However, Commons is not just a media repository but also a media wiki, where users are expected to edit images and upload updated versions. In this context, generation loss is an important issue. Additionally, the specific behavior of the MediaWiki thumbnail engine needs to be considered when deciding how to encode images. This page covers issues of working with lossy and lossless encoding on Commons.

Working with lossy compression

Formats like JPEG that use lossy compression involve a conscious compromise between image quality and file size. Every time a JPEG is decoded and re-encoded, some image quality is lost. Consecutive rounds of opening, editing, and saving a JPEG cause the degradation to accumulate (this is called "generation loss"). JPEG should therefore normally be used only for the final versions of pictures, with lossless formats such as TIFF used during editing. However, this is not always possible, and for some specific types of editing JPEGs can be edited with little or no loss of quality if the right tools are used.

Keep in mind that you never know how many times an image will need to be edited, by you or by others after you. That does not mean JPEG is to be avoided: its lossy compression has disadvantages, but it is also what keeps photos of people and scenes manageable in file size.

Use high-resolution images

Generally, Commons advises users to upload the highest-resolution image available. This is important for many reasons, but one often overlooked reason is that it makes it easier to cope with lossy compression. Lossless editing tools work at the level of 8×8 pixel blocks in JPEG images, and when each block covers a smaller portion of the image, these tools are more flexible. Additionally, a high-resolution original makes it easy to produce a web-scale image with no visible artifacts by simply downscaling it and saving the result in a lossless format.

Use lossless editing tools

A number of tools can perform lossless editing on JPEG images. For example, jpegtran, JPEGCrop (from JPEGclub.org), and toolforge:croptool can losslessly rotate JPEG images by a multiple of 90 degrees or losslessly crop them at block boundaries (block boundaries occur every 8 pixels); this is the best way to crop JPEGs, as it reduces the image size and loses no quality in the remaining pixels.
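
As a rough illustration of cropping at block boundaries, the sketch below snaps a requested crop rectangle so that its top-left corner lands on a block boundary, then builds a jpegtran command line for it. The helper names `snap_crop` and `jpegtran_crop_args` and the file names are hypothetical. The default block size of 8 is an assumption taken from the text above; images stored with chroma subsampling may need 16 in one or both directions. The `-copy`, `-crop`, and `-outfile` switches are standard jpegtran options.

```python
def snap_crop(x, y, width, height, block=8):
    """Move the top-left corner up/left to a block boundary.

    The bottom-right corner stays where it was requested, so the
    rectangle grows by however far the corner moved.
    """
    snapped_x = (x // block) * block
    snapped_y = (y // block) * block
    return (snapped_x, snapped_y,
            width + (x - snapped_x), height + (y - snapped_y))


def jpegtran_crop_args(src, dst, x, y, width, height, block=8):
    """Build an argument list for a lossless crop (hypothetical helper)."""
    x, y, width, height = snap_crop(x, y, width, height, block)
    return ["jpegtran", "-copy", "all",
            "-crop", f"{width}x{height}+{x}+{y}",
            "-outfile", dst, src]


# A crop requested at (21, 10) is moved to (16, 8) and widened to match.
print(jpegtran_crop_args("in.jpg", "out.jpg", 21, 10, 300, 200))
# ['jpegtran', '-copy', 'all', '-crop', '305x202+16+8', '-outfile', 'out.jpg', 'in.jpg']
```

If jpegtran is installed, the returned list can be passed to `subprocess.run`. The snapped rectangle is never smaller than the requested one, so nothing the user asked to keep is cut off.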

There are also tools (such as BetterJPEG) that let you edit a portion of an image and re-save only the affected blocks, causing no generation loss in the rest of the image. These are useful for erasing watermarks or captions that cover only a small part of the image. However, if you are making substantial changes to the image, it is better to save at high quality (see next section), because with these tools the modified regions are saved at the same quality as the original JPEG. BetterJPEG also offers a lossless "Convert to Black & White" feature that discards the color information with no generation loss to the brightness data.
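
To give a sense of how little of an image a small edit touches, the following sketch computes which blocks overlap an edited rectangle and what fraction of all blocks that represents. It is not how BetterJPEG or any particular tool is implemented; the function names are hypothetical, and the 8-pixel block size is an assumption (subsampled images may use larger units).

```python
def affected_blocks(x, y, width, height, block=8):
    """Return (first_col, first_row, last_col, last_row), all inclusive,
    of the blocks that overlap the edited rectangle."""
    first_col = x // block
    first_row = y // block
    last_col = (x + width - 1) // block
    last_row = (y + height - 1) // block
    return first_col, first_row, last_col, last_row


def reencoded_fraction(image_w, image_h, x, y, width, height, block=8):
    """Fraction of the image's blocks that an edit would re-encode."""
    c0, r0, c1, r1 = affected_blocks(x, y, width, height, block)
    touched = (c1 - c0 + 1) * (r1 - r0 + 1)
    # Partial blocks at the right and bottom edges still count as blocks.
    cols = (image_w + block - 1) // block
    rows = (image_h + block - 1) // block
    return touched / (cols * rows)


# A 20x5 pixel caption at (10, 10) in an 800x600 image touches 3 blocks
# out of 7500.
print(affected_blocks(10, 10, 20, 5))                # (1, 1, 3, 1)
print(reencoded_fraction(800, 600, 10, 10, 20, 5))
```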

Save at high quality

If lossless editing tools are not an option, the next best defense against generation loss is to save your image at high quality. Some say the maximum quality level is best; others prefer to reduce it to something "close" to maximum, such as 11 in Photoshop or 95 in GIMP. Such images still experience generation loss, but at a rate so slow as to be negligible. This may greatly increase the file size of the image, but that is mostly irrelevant, because the file size of the original only matters to users who download it; most users only view the thumbnails that MediaWiki produces on the fly, which have a fixed quality setting and so will not differ significantly in size.
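
One way to see why a high quality setting slows generation loss is to look at the quantization step it produces, since each save rounds a coefficient to a multiple of that step (an error of at most half a step). The sketch below follows the quality scaling convention used by libjpeg, which applies to encoders built on that library's 0 to 100 scale; Photoshop's 0 to 12 scale is mapped differently and is not modeled here. The function names are hypothetical, and the base step of 16 is the first entry of the example luminance table in the JPEG standard.

```python
def libjpeg_scale_factor(quality):
    """Percentage scaling applied to the base quantization table,
    following the libjpeg convention."""
    quality = max(1, min(100, quality))
    if quality < 50:
        return 5000 // quality
    return 200 - 2 * quality


def quant_step(base_step, quality):
    """Scaled quantization step for one table entry."""
    step = (base_step * libjpeg_scale_factor(quality) + 50) // 100
    # Steps are clamped to 1..255, the range allowed in baseline JPEG.
    return max(1, min(255, step))


for q in (50, 75, 95, 100):
    print(q, quant_step(16, q))
# 50 16
# 75 8
# 95 2
# 100 1
```

Under these assumptions, going from quality 75 to 95 shrinks this particular step from 8 to 2, so the rounding error introduced by each save is correspondingly smaller.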

Converting to a lossless format

Converting a lossy format to a lossless format, such as PNG, prevents unintended future generation loss, but requires changing the filename and therefore replacing all existing uses of the file. Moreover, the MediaWiki thumbnail renderer has issues creating thumbnails of PNGs: it sharpens JPEG thumbnails but not PNG thumbnails, which therefore appear blurrier; it also creates PNG thumbnails instead of JPEG thumbnails, which for some media such as photos can be much larger in file size. For these reasons, merely re-saving at maximum quality is preferable. If you do wish to convert to a lossless format, you can work around the thumbnail renderer by keeping both a lossy and a lossless version and linking between them; the tags {{PNG with JPEG version}} and {{JPEG version of PNG}} facilitate this.

See also