File types and extensions

Understand file types and extensions.

Plain text and binary files

File contents can be categorised into two major types:

  • Plain text
  • Binary

Plain text files contain only human-readable characters, which can be displayed by most text editors. However, plain text files cannot contain non-textual content such as images or sounds.

Binary files are computer files that are not plain text. Many different binary file formats exist, and together they are used to store virtually any type of content (e.g., compiled computer programs, images, sounds, formatted documents).
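Because binary files need not consist of valid characters, a program can make a rough guess about which kind of file it is reading by inspecting the first few bytes. The sketch below assumes two common rules of thumb: a NUL byte, or a byte sequence that is not valid UTF-8, suggests a binary file. The function name `is_probably_binary` and the 8 KB sample size are illustrative choices, not a standard, and plain text saved in other encodings (e.g., UTF-16) may be misclassified.

```python
import codecs


def is_probably_binary(path, sample_size=8192):
    """Guess whether the file at `path` is binary rather than plain text.

    Heuristic only (an assumption, not a definitive test): a NUL byte, or
    bytes that are not valid UTF-8, within the first `sample_size` bytes are
    taken as signs of a binary file.
    """
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    if b"\x00" in sample:
        return True
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the end of the sample, but still rejects invalid byte sequences.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

An empty file is reported as plain text, since it contains no evidence either way.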

In bioinformatics, binary file formats are commonly used to store compressed equivalents of plain text files (e.g., the binary BAM format stores the same alignment information as the plain text SAM format, in compressed form).
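The SAM/BAM pair can be told apart by their first bytes. According to the SAM/BAM format specification, a BAM file is compressed with BGZF, a gzip-compatible block format, and its decompressed contents begin with the magic string `BAM` followed by the byte 1. The hypothetical helper below checks only these opening bytes; it is a sketch for illustration, not a validator of the full format.

```python
import gzip


def looks_like_bam(path):
    """Return True if the file at `path` starts the way a BAM file does.

    Assumes the layout from the SAM/BAM specification: gzip-compatible
    (BGZF) compression, with decompressed data starting with b"BAM\x01".
    Only the header bytes are inspected.
    """
    with open(path, "rb") as handle:
        if handle.read(2) != b"\x1f\x8b":  # gzip magic number
            return False
    # BGZF is gzip-compatible, so the standard gzip module can decompress it
    with gzip.open(path, "rb") as handle:
        return handle.read(4) == b"BAM\x01"
```

A plain text SAM file fails the first check immediately, because it begins with readable characters (typically an `@` header line) rather than the gzip magic number.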

File extensions

File extensions are suffixes appended to the end of filenames to indicate the format of the file's contents.
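In Python, the standard `pathlib` module can extract a filename's extension. A filename may carry more than one suffix, as is common for compressed files; the filenames below are hypothetical examples, and `describe_extensions` is an illustrative helper rather than part of any library.

```python
from pathlib import Path


def describe_extensions(filename):
    """Return the last extension and the full list of extensions of a filename."""
    path = Path(filename)
    # .suffix holds only the final extension; .suffixes holds every extension,
    # which matters for compressed files such as reads.fastq.gz
    return path.suffix, path.suffixes


for name in ["genome.fasta", "reads.fastq.gz", "README"]:
    print(name, *describe_extensions(name))
```

For `reads.fastq.gz`, `.suffix` is `'.gz'` while `.suffixes` is `['.fastq', '.gz']`, so a program that looks only at the last extension sees "compressed" but not "FASTQ".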

Many programs rely on the file extensions of their input and output file names:

  • To determine the format of input files, and therefore how to parse their contents.
  • To indicate the format in which the contents of their output files are written.
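The input side of this behaviour can be sketched as a lookup table from extension to parser. Everything here is hypothetical and simplified: the `PARSERS` table and the record-counting functions are illustrative, and the FASTQ counter assumes each record spans exactly four lines.

```python
from pathlib import Path


def count_fasta_records(path):
    # FASTA: each record starts with a header line beginning with ">"
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def count_fastq_records(path):
    # FASTQ: assumes every record spans exactly four lines
    with open(path) as handle:
        return sum(1 for _ in handle) // 4


# Hypothetical mapping from file extension to the parser for that format
PARSERS = {
    ".fasta": count_fasta_records,
    ".fastq": count_fastq_records,
}


def count_records(path):
    """Choose how to read `path` based on its extension."""
    suffix = Path(path).suffix.lower()
    try:
        parser = PARSERS[suffix]
    except KeyError:
        raise ValueError(f"unrecognised file extension: {suffix!r}") from None
    return parser(path)
```

The extension is trusted without looking at the contents, which mirrors how many programs behave: a FASTQ file misnamed with a `.fasta` extension would be parsed incorrectly.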

Many file extensions have been – and continue to be – created to describe plain text file formats that structure their contents differently.

Examples of bioinformatics file formats include, among many others:

  • .fasta – biological sequence information.
  • .fastq – sequence information with quality scores.
  • .sam – alignments of sequences to a reference genome.
  • .gtf, .bed – genomic coordinates of sequence features (e.g., exons, peaks).
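To illustrate how a plain text format structures its contents, here is a minimal sketch of a FASTA reader. It assumes the common FASTA layout: a header line beginning with `>` whose first word identifies the record, followed by one or more lines of sequence. `read_fasta` is a hypothetical helper; dedicated libraries (e.g., Biopython) provide more robust parsers.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict mapping identifier -> sequence.

    Assumes a header line starting with ">" whose first word is the record
    identifier, followed by sequence lines. A later record with a duplicate
    identifier overwrites the earlier one.
    """
    records = {}
    identifier = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Store the previous record before starting a new one
            if identifier is not None:
                records[identifier] = "".join(chunks)
            fields = line[1:].split()
            identifier = fields[0] if fields else ""
            chunks = []
        elif identifier is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if identifier is not None:
        records[identifier] = "".join(chunks)
    return records


example = """>seq1 first example
ACGTACGT
ACGT
>seq2
GGCC
"""
print(read_fasta(example.splitlines()))
# {'seq1': 'ACGTACGTACGT', 'seq2': 'GGCC'}
```

Because the function accepts any iterable of lines, an open file handle (e.g., from `open("genome.fasta")`) can be passed in directly.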