Compress and decompress files

Use gzip, gunzip, and tar to compress, decompress, and archive files.

Why compress files?

Raw data files and files created during analyses can be large (up to hundreds of GB).

Compressing files is an efficient way to save disk space.

Compress using gzip

The gzip command can be used to compress files.

gzip file1.txt

Compressing a file using the ‘gzip’ command.

By default, gzip deletes the original file and writes the compressed data to a new file named after the original, with the extension .gz appended (here, file1.txt.gz).
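A minimal sketch of this default behaviour, assuming a POSIX shell with gzip installed. It runs in a throwaway directory, and the sample file content is made up for illustration:

```shell
# Work in a fresh temporary directory so no real files are touched.
cd "$(mktemp -d)" || exit 1

# Create a small sample file (hypothetical content).
printf 'line one\nline two\n' > file1.txt

# Compress it: gzip replaces file1.txt with file1.txt.gz.
gzip file1.txt

ls    # lists only: file1.txt.gz
```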

Decompress using gunzip

The gunzip command can be used to decompress files that were compressed using the gzip command.

gunzip file1.txt.gz

Decompressing a file using the ‘gunzip’ command.

By default, gunzip deletes the compressed file and writes the decompressed data to a new file whose name is the original name without the .gz extension (here, file1.txt).
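The round trip can be sketched as follows, again assuming a POSIX shell with gzip installed and a made-up sample file in a temporary directory:

```shell
# Work in a fresh temporary directory.
cd "$(mktemp -d)" || exit 1
printf 'line one\nline two\n' > file1.txt

gzip file1.txt          # produces file1.txt.gz and removes file1.txt
gunzip file1.txt.gz     # restores file1.txt and removes file1.txt.gz

ls                      # lists only: file1.txt
cat file1.txt           # prints the original two lines
```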

Redirect to standard output

Both gzip and gunzip accept the option -c, which has two useful effects:

  • The input file is kept unchanged (i.e., not deleted).
  • The compressed or decompressed data are written to standard output, so the > operator can redirect them to a file of any name (overriding the default naming behaviour of both commands).

For instance:

gzip -c file1.txt > compressed.txt.gz

Using the ‘-c’ option of the ‘gzip’ command.
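The same option works in both directions. A minimal sketch, assuming a POSIX shell with gzip and cmp available; the output names compressed.txt.gz and restored.txt are arbitrary choices for illustration:

```shell
cd "$(mktemp -d)" || exit 1
printf 'line one\nline two\n' > file1.txt

# Compress to a chosen name; file1.txt is left in place.
gzip -c file1.txt > compressed.txt.gz

# Decompress to a chosen name; compressed.txt.gz is left in place.
gunzip -c compressed.txt.gz > restored.txt

# The round trip reproduces the original bytes exactly.
cmp file1.txt restored.txt && echo identical
```

All three files remain in the directory afterwards, which is the main practical difference from the default behaviour.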

Archive using tar

The name tar stands for tape archive.

tar is also an archive file format that combines multiple files and directories into a single file, called a tar archive.

Optionally, a tar archive can be compressed as it is created, for instance with gzip.

The tar command can be used to create, modify, and extract files that are archived in the .tar format.

For instance, a directory and a file can be archived together as follows:

tar -czvf archive.tar.gz file1.txt dir1

Archiving files and directories using the ‘tar’ command.

In particular:

  • The option -c creates a new archive.
  • The option -z compresses the archive using gzip.
  • The option -v verbosely lists the files processed as they are being archived.
  • The option -f declares the name (and location) of the archive file to create; because the file name must immediately follow -f, this option is usually placed last when options are combined.
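A minimal sketch of the example above, assuming GNU or BSD tar. The names file1.txt and dir1 match the example, but their contents are made up, and the exact wording of the -v listing differs between tar implementations:

```shell
cd "$(mktemp -d)" || exit 1

# Hypothetical inputs: one file, and one directory containing a file.
printf 'hello\n' > file1.txt
mkdir dir1
printf 'nested\n' > dir1/file2.txt

# c = create, z = gzip, v = verbose, f = archive name (placed last).
tar -czvf archive.tar.gz file1.txt dir1

# Unlike gzip, tar leaves the original file and directory in place.
ls
```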

Extract from a tar archive

As mentioned above, the tar command can also extract files and directories from a tar archive. The option -x (extract) takes the place of -c, while -v, -z, and -f keep the meanings listed above.

For instance:

tar -xvzf archive.tar.gz

Extracting files and directories from a tar archive.
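tar extracts into the current working directory. A minimal sketch, assuming GNU or BSD tar and hypothetical sample files, that extracts into a separate empty directory so the restored copies do not overwrite the originals:

```shell
cd "$(mktemp -d)" || exit 1

# Build a small archive from hypothetical sample files.
printf 'hello\n' > file1.txt
mkdir dir1
printf 'nested\n' > dir1/file2.txt
tar -czf archive.tar.gz file1.txt dir1

# Extract into an empty directory (x = extract).
mkdir extracted
cd extracted || exit 1
tar -xvzf ../archive.tar.gz

cat dir1/file2.txt    # prints: nested
```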

Stream compressed files

The zcat command writes the decompressed contents of gzip-compressed files to standard output, from where they can be displayed or piped to the standard input of a downstream command.

This avoids having to create a temporary decompressed copy of the file on disk.

For instance:

zcat file1.txt.gz | head

Stream the contents of a compressed file.
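A minimal sketch, assuming GNU gzip (where zcat behaves like gunzip -c) and a made-up file of 100 numbered lines. Only the compressed file ever exists on disk:

```shell
cd "$(mktemp -d)" || exit 1

# Build a compressed sample file with 100 numbered lines.
seq 1 100 > file1.txt
gzip file1.txt

# Show the first 10 lines without writing a decompressed copy.
zcat file1.txt.gz | head

# Count the lines the same way (the count is 100).
zcat file1.txt.gz | wc -l
```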

Interactively scroll through compressed files

The zless command works like the less command, but on compressed files.

For instance:

zless file1.txt.gz

Interactively scroll the contents of a compressed file.

Final words

Many programs support gzip-compressed input files. For those programs, there is no need to decompress the files before use.