What Is Compression?

Date First Published: 27th June 2023

Topic: Computer Systems

Computer Software

Article Type: Computer Terms & Definitions

Difficulty: Medium

Difficulty Level: 5/10

CONTENTS

How Does Compression Work?
Types Of Compression
How Many Times Can A File Be Compressed?
Challenges Of Compression
Difference Between Compression and Data Deduplication

Learn about what compression is in this article.

Compression is the process of reducing the number of bits needed to represent data. A smaller file takes up less storage space and transfers faster. Compression is particularly useful when sending data over the internet because the same data can be sent in fewer bits, which reduces the bandwidth required and shortens load times.

How Does Compression Work?

Compression works by running the data through an algorithm that finds a more compact way to represent it. For example, an algorithm might replace frequently occurring strings of bits with shorter strings of 0s and 1s, and record how to convert each short string back to the original.
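As an illustration of giving common data shorter bit strings, here is a minimal sketch of Huffman coding, one well-known technique of this kind. The article does not name a specific algorithm, so the choice of Huffman coding and the function name `huffman_codes` are assumptions made for this example.

```python
import heapq
from collections import Counter


def huffman_codes(text: str) -> dict[str, str]:
    """Build a prefix-free bit code in which frequent characters get shorter codes."""
    # Each heap entry is (frequency, tie_breaker, {character: code_so_far}).
    heap = [(freq, i, {ch: ""}) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    if len(heap) == 1:
        # A single distinct character still needs a one-bit code.
        return {ch: "0" for ch in heap[0][2]}
    tie_breaker = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        freq_a, _, group_a = heapq.heappop(heap)
        freq_b, _, group_b = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in group_a.items()}
        merged.update({ch: "1" + code for ch, code in group_b.items()})
        heapq.heappush(heap, (freq_a + freq_b, tie_breaker, merged))
        tie_breaker += 1
    return heap[0][2]


text = "aaaabbc"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(encoded), "bits instead of", len(text) * 8)
```

The most frequent character, `a`, receives a one-bit code, whilst the rarer `b` and `c` receive two-bit codes. A real compressor would also need to store the code table alongside the encoded bits so that the data can be decoded.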

Text compression works by removing redundancy. It may replace a run of repeated characters with a single copy of the character and a repeat count, or replace a frequently occurring bit string with a shorter one. Text compression can often reduce a text file to around 50% of its original size, although the result depends on the content. For data transfers, compression can be applied to the data content alone or to the whole transmission unit.
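The idea of replacing a run of repeated characters with one character and a count is known as run-length encoding. The following is a minimal sketch of it. The function names `rle_encode` and `rle_decode` are made up for this example, and real text compressors use more sophisticated schemes.

```python
from itertools import groupby


def rle_encode(text: str) -> list[tuple[str, int]]:
    """Collapse each run of identical characters into a (character, count) pair."""
    return [(char, len(list(run))) for char, run in groupby(text)]


def rle_decode(pairs: list[tuple[str, int]]) -> str:
    """Expand (character, count) pairs back into the original text."""
    return "".join(char * count for char, count in pairs)


original = "A" * 6 + "B" * 3 + "C" * 8 + "D"
encoded = rle_encode(original)
print(encoded)  # [('A', 6), ('B', 3), ('C', 8), ('D', 1)]
assert rle_decode(encoded) == original
```

Run-length encoding only helps when the data contains long runs. On ordinary prose, where runs are rare, storing a count for every character would make the output larger than the input.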

Types Of Compression

There are two main types of compression: lossy and lossless.

Lossy compression

Lossy compression permanently deletes certain bits of data to reduce the file size. This process is irreversible, meaning that the discarded data cannot be recovered and the file cannot be restored to its original form. This is why it is always recommended to keep a copy of the original, uncompressed file so that it can be recovered if the compression goes wrong. Lossy compression is most commonly used for video, audio, and image compression, where removing some unneeded data bits has little to no perceptible effect on the representation and quality of the content.

Sometimes, the data lost during compression can have a noticeable impact on the image or video. For example, a JPEG image that has been highly compressed may have blurry details, noise, blocky artefacts, and distorted colours. The higher the compression level, the more visible these defects become. This can become a problem when sending the file for professional printing or for assessment, as it may not pass the quality test. In that case, lossless compression may be preferred. However, most lossy compression algorithms have an option to adjust how heavily the file is compressed so that a balance between quality and file size can be achieved.
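The trade-off between quality and size can be illustrated with a toy example. The sketch below uses quantisation, which rounds sample values to a coarser scale, as a stand-in for what a lossy codec does. It is not how JPEG or any real format works internally, and the names `quantise` and `max_error` are hypothetical.

```python
def quantise(samples: list[int], step: int) -> list[int]:
    """Round each sample down to a multiple of `step`, discarding fine detail."""
    return [(s // step) * step for s in samples]


def max_error(original: list[int], lossy: list[int]) -> int:
    """Largest difference between an original sample and its lossy version."""
    return max(abs(a - b) for a, b in zip(original, lossy))


# Hypothetical 8-bit samples, for example pixel brightness values.
samples = [12, 13, 15, 200, 201, 203, 90, 91]

mild = quantise(samples, 4)     # light compression: small step
strong = quantise(samples, 32)  # heavy compression: large step

print(mild, max_error(samples, mild))      # [12, 12, 12, 200, 200, 200, 88, 88] 3
print(strong, max_error(samples, strong))  # [0, 0, 0, 192, 192, 192, 64, 64] 27
```

A larger step leaves fewer distinct values, which are easier to store compactly, but the error grows and the original samples cannot be recovered from the quantised ones.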

Lossless compression

Lossless compression allows a file to be restored to its original state without losing a single bit of data when the file is decompressed. This means that the representation and quality of the files do not need to be compromised to distribute them online. Lossless compression is commonly used for executable files and documents, like text files and spreadsheets, where the loss of words and numbers would change the information. It is most commonly done using file archiver software, like 7-Zip or WinRAR. However, lossless compression does not usually achieve the same file size reduction as lossy compression.

Lossless compression reduces the file size by rewriting the data more efficiently instead of discarding specific parts of the file to create a smaller version of the original file. This allows the original data to be perfectly decompressed with no loss of information.
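A lossless round trip can be sketched with the `zlib` module from Python's standard library. The sample text is made up for this example, and the exact compressed size will depend on the data and the zlib version.

```python
import zlib

# Hypothetical sample data: repetitive text compresses well.
original = b"The quick brown fox jumps over the lazy dog. " * 200

compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), "bytes before,", len(compressed), "bytes after")
assert restored == original  # every byte is recovered exactly
```

The assertion is the defining property of lossless compression: decompressing gives back data that is identical to the input.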

How Many Times Can A File Be Compressed?

There is no limit on the number of times a file can be compressed, but repeated compression is rarely worthwhile. Once a lossless compressor has removed the redundancy from a file, there is little left for a second pass to remove, and the overhead of the format's headers and metadata can make the output slightly larger than the input. Repeated lossy compression can keep shrinking a file, but each pass discards more data and degrades the quality further.
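The effect of compressing data that has little or no redundancy left can be tried with the standard-library `zlib` module. In this sketch the sample text is made up, and pseudo-random bytes stand in for data that is already as compact as it can be, which is an assumption made for illustration.

```python
import random
import zlib

# Compress the same text three times and record the size after each pass.
text = b"It was the best of times, it was the worst of times. " * 500
size_log = [len(text)]
current = text
for _ in range(3):
    current = zlib.compress(current)
    size_log.append(len(current))
print("sizes after each pass:", size_log)

# Undo the three passes to confirm nothing was lost.
restored = current
for _ in range(3):
    restored = zlib.decompress(restored)
assert restored == text

# Pseudo-random bytes have no redundancy for the compressor to remove.
rng = random.Random(0)
noise = bytes(rng.getrandbits(8) for _ in range(10_000))
packed_noise = zlib.compress(noise)
print(len(noise), "bytes of noise became", len(packed_noise), "bytes")
```

The first pass removes most of the redundancy in the text, so the printed sizes show how little later passes have to work with. The noise is expected to come out slightly larger than it went in, because the compressor still has to add its own header and bookkeeping bytes.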

Challenges Of Compression

Both compression and decompression use computing resources, including the CPU and memory. This can slow computers down and cause problems on older machines with limited resources. To minimise the impact, compression vendors optimise their software for speed and resource efficiency.
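Many compressors let the user choose how much effort to spend. As a rough sketch, the standard-library `zlib` module accepts a level from 1 (fastest) to 9 (most thorough). The sample data is made up, and the timings will differ from machine to machine.

```python
import time
import zlib

# Hypothetical sample data, roughly 1 MB of repetitive text.
data = b"Compression trades processor time for smaller files. " * 20_000

results = {}
for level in (1, 6, 9):
    start = time.perf_counter()
    packed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == data  # still lossless at every level
    results[level] = (len(packed), elapsed)
    print(f"level {level}: {len(packed)} bytes in {elapsed * 1000:.2f} ms")
```

Higher levels generally spend more CPU time searching for redundancy in exchange for a smaller output, so a lower level may be the better choice on a machine with limited resources.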

In addition, antivirus software can incorrectly detect compressed files as malware and block them from being opened, preventing recipients from accessing them. Lossless compression is most commonly done using a file archiver program, like 7-Zip or WinRAR, and some archive formats only work with specific archive software. This can cause compatibility issues that prevent the recipient from opening the file.

Difference Between Compression and Data Deduplication

Compression is not the same thing as data deduplication. Data deduplication finds identical blocks of data, often across many files or backups, keeps a single copy of each block, and replaces the duplicates with references to that copy. Compression uses an algorithm to reduce the number of bits required to represent data, usually within a single file. Data deduplication is therefore most useful where large amounts of data are repeated, whilst compression is useful for reducing the size of an individual file whilst keeping its data the same.
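The block-level idea behind data deduplication can be sketched as follows. This is a simplified illustration rather than how any particular product works. The fixed 4096-byte block size and the names `deduplicate` and `rebuild` are assumptions made for this example.

```python
import hashlib


def deduplicate(data: bytes, block_size: int = 4096) -> tuple[dict[str, bytes], list[str]]:
    """Split data into fixed-size blocks and store each distinct block only once."""
    store: dict[str, bytes] = {}  # digest -> block contents
    recipe: list[str] = []        # ordered digests needed to rebuild the data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # keep only the first copy of a block
        recipe.append(digest)
    return store, recipe


def rebuild(store: dict[str, bytes], recipe: list[str]) -> bytes:
    """Reassemble the original data by following the recipe of block digests."""
    return b"".join(store[digest] for digest in recipe)


# Three identical blocks followed by one different block.
data = b"A" * 4096 * 3 + b"B" * 4096
store, recipe = deduplicate(data)

print(len(recipe), "blocks referenced,", len(store), "blocks stored")  # 4 blocks referenced, 2 blocks stored
assert rebuild(store, recipe) == data
```

Unlike compression, the contents of each stored block are left untouched. The saving comes entirely from not storing the same block twice.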