A zip bomb, sometimes called by the more generic name decompression bomb, is a compressed file archive that takes advantage of compression algorithms to store a massive amount of data in a relatively small file, so that, when the file is decompressed, there is an "explosion" of data. Zip bombs are generally used for malicious purposes, but they can also be used as a teaching tool to demonstrate novel concepts behind lossless compression and optimization.
I don't remember exactly when I learned about zip bombs, but it was sometime in my late 30s. I had toyed with writing hard drive filling programs in my teens, but I found the concept of tricking virus scanners into doing the dirty work for you quite clever.
There are two common forms of zip bombs: those that contain a countable number of files, and those that contain recursive files.
Zip bombs that have a countable number of files work by taking advantage of the fact that lossless compression stores repeating patterns compactly. For example, if you wanted to store a file which contained the letter "A" twenty times, it would look like this:
AAAAAAAAAAAAAAAAAAAA
However, because the data uses a repeating pattern, you could also represent the file in a compressed form, like this:
A20
As long as the program reading the file is instructed to repeat the letter the number of times indicated by the number, you can store the same amount of data in a much smaller space. Consider then a file which contains the letter "a" one billion times. Uncompressed, it would require one billion bytes, but it could be compressed to something as small as this:
a1000000000
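The idea can be sketched as a toy run-length encoder in Python. This is only an illustration: the letter-then-count format and the function names rle_encode and rle_decode are my own assumptions, and real zip archives use the far more elaborate DEFLATE algorithm rather than anything this simple.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Encode each run of a repeated character as <char><count>."""
    return "".join(f"{char}{len(list(run))}" for char, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Reverse rle_encode; assumes the original text contained no digits."""
    pieces = []
    i = 0
    while i < len(encoded):
        char = encoded[i]
        i += 1
        start = i
        # Collect every digit that follows the character.
        while i < len(encoded) and encoded[i].isdigit():
            i += 1
        pieces.append(char * int(encoded[start:i]))
    return "".join(pieces)


print(rle_encode("A" * 20))          # A20
print(len(rle_decode("a1000000")))   # 1000000
```

Note that the encoded form grows by only one character each time the run gets ten times longer, which is why the compressed size stays tiny no matter how large the original is.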
The zip archive specification also allows patterns to be combined across multiple files, so the archive can repeat the same huge file over and over again while needing only a few additional bytes for each repetition.
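To get a feel for the sizes involved, here is a small, harmless sketch using Python's standard zipfile module. It writes five entries of one million identical bytes each into an in-memory archive and compares the archive size to the total uncompressed size. The entry count, entry size, and file names are arbitrary assumptions, and this sketch compresses each entry independently, so it does not reproduce any trick for sharing data between entries; it only shows how little space repetitive entries take.

```python
import io
import zipfile

ENTRY_COUNT = 5          # assumed, kept small on purpose
ENTRY_SIZE = 1_000_000   # one million bytes per entry

buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    for index in range(ENTRY_COUNT):
        # Every entry is the same highly repetitive data.
        archive.writestr(f"copy_{index}.txt", b"a" * ENTRY_SIZE)

archive_bytes = buffer.getvalue()
archive_size = len(archive_bytes)

# Read the sizes back from the archive's own directory.
with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
    total_uncompressed = sum(info.file_size for info in archive.infolist())

print(f"archive: {archive_size:,} bytes")
print(f"uncompressed: {total_uncompressed:,} bytes")
```

Even in this naive form, the archive should come out hundreds of times smaller than the data it describes.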
A popular example of a zip bomb of this type is 42.zip, so named because it has a file size of only 42,374 bytes. It decompresses to a whopping 4,503,599,626,321,920 bytes by repeating the same 4.3 GB file over a million times!
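The arithmetic behind that total can be checked in a few lines. The two inputs below are assumptions that the text above does not spell out: that the 4.3 GB file is exactly 4 GiB minus one byte, and that "over a million" means 16 to the fifth power copies.

```python
# Assumed figures (not stated in the text above): the repeated file is
# 4 GiB minus one byte, and the archive holds 16**5 copies of it.
FILE_SIZE = 2**32 - 1
COPIES = 16**5

total_bytes = FILE_SIZE * COPIES

print(f"{COPIES:,} copies")            # 1,048,576 copies
print(f"{total_bytes:,} bytes")        # 4,503,599,626,321,920 bytes
print(f"{total_bytes / 1e15:.1f} PB")  # 4.5 PB
```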
Recursive zip bombs use an interesting trick where the file that is extracted from the archive is identical to the archive itself. Their compressed archive looks something like:
The file inside this archive contains the data: "The file inside this archive contains the data"
Trying to decompress such an archive entirely results in an unending loop of decompression.
An example of a recursive zip bomb is r.zip, a 440-byte archive in which each decompression produces a file called r.zip that is the exact same 440 bytes.
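One way a program could avoid that unending loop is to refuse to descend past a fixed number of layers. The following is a minimal sketch of that idea, not how any particular tool does it. The names scan_nested, wrap, and TooDeepError are hypothetical, and the demo builds its own harmless nested archive rather than touching a real zip bomb.

```python
import io
import zipfile


class TooDeepError(Exception):
    """Raised when archives are nested deeper than the caller allows."""


def scan_nested(data: bytes, max_depth: int = 5, depth: int = 1) -> int:
    """Return the deepest archive layer reached, refusing to pass max_depth."""
    if depth > max_depth:
        raise TooDeepError(f"nesting exceeds {max_depth} layers")
    deepest = depth
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.filename.lower().endswith(".zip"):
                # A real scanner would also cap how many bytes it reads here.
                inner = archive.read(info)
                deepest = max(deepest, scan_nested(inner, max_depth, depth + 1))
    return deepest


def wrap(payload: bytes, name: str) -> bytes:
    """Demo helper: store payload as `name` inside a new in-memory zip."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


# Build a harmless archive nested four layers deep.
nested = wrap(b"hello", "hello.txt")
for _ in range(3):
    nested = wrap(nested, "inner.zip")

print(scan_nested(nested, max_depth=10))  # 4
```

With a limit like this in place, an archive that keeps producing another archive would be expected to raise TooDeepError after a few layers instead of being followed forever.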
Like most forms of information, zip bombs can be used for both education and misconduct. Zip bombs are educational because, in order to create them, a person must learn and understand how lossless compression works. In fact, since their decompressed contents require more hard drive space and CPU processing power than most computers have, they can't be created by the traditional means of compressing an existing file; they can only be created by those who understand the process of compression. However, they are typically used for malicious reasons.
Since they decompress to greater amounts of data than most hard drives can store, zip bombs can be used to very rapidly fill up a hard drive. Programs that fill up a hard drive as a prank have been around since the early days of computing and can be created with only a couple of lines of code. The trouble is, the program must be run by the victim, and even a novice is wary of running an unknown program. Archive files, however, are typically viewed as harmless because they're not actually executed, only viewed, and it is precisely for this reason that they can be such a nuisance.
Zip bombs are primarily used to occupy virus scanners. A proper virus scanner will scan the contents of compressed archives in order to make sure they don't contain malicious software, but, due to the nature of zip bombs, the virus scanner may take days to scan a zip bomb, or it may hog all the system memory or crash if it's scanning a recursive zip bomb. While the virus scanner is hung up on the zip bomb, other malicious software can go unnoticed and infect the computer, or the user might get annoyed with the virus scanner hogging all their resources, and uninstall it.
Modern virus scanners are effectively immune to zip bombs because they include algorithms to detect recursive zip files and zip files with nothing but repeating content, so zip bombs are no longer the threat they once were. However, they can still wreak havoc on indexing services, file searches, and any other automated process that doesn't take them into account.
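An automated process that wants to take zip bombs into account could start with a check along these lines before extracting anything. This is a rough sketch under my own assumptions, not the algorithm any actual virus scanner uses: the function name looks_like_bomb and both thresholds are made up for illustration. It also trusts the sizes recorded in the archive's directory, which can be forged, so a careful implementation would additionally count bytes while decompressing.

```python
import io
import zipfile


def looks_like_bomb(
    data: bytes,
    max_total: int = 100 * 1024 * 1024,  # assumed cap: 100 MiB uncompressed
    max_ratio: float = 100.0,            # assumed cap: 100:1 compression
) -> bool:
    """Flag archives whose declared contents are huge or absurdly compressible."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        infos = archive.infolist()
    declared = sum(info.file_size for info in infos)
    compressed = sum(info.compress_size for info in infos)
    if declared > max_total:
        return True
    return compressed > 0 and declared / compressed > max_ratio


def make_zip(payload: bytes) -> bytes:
    """Demo helper: build a one-entry in-memory zip from payload."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("data.bin", payload)
    return buffer.getvalue()


suspicious = make_zip(b"\0" * 5_000_000)  # five million identical bytes
ordinary = make_zip(b"hello world")

print(looks_like_bomb(suspicious))  # True
print(looks_like_bomb(ordinary))    # False
```

Because the check reads only the archive's directory, it never decompresses the contents, so running it on a suspicious file costs almost nothing.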
This download contains two popular zip bombs, "42" and "r". The first extracts to 4.5 petabytes of data, and the second is a recursive archive which decompresses to itself. Each file is password protected with the zip bomb's name in order to prevent accidental decompression. These files are safe to look at, but don't try to decompress them fully.