About Message Digests

Without going deep into the sometimes complex mathematics behind the algorithms, this section gives an overview of message digest technology. Message digests (also known as checksums) are the results of cryptographic hash functions, which always return the same binary value for the same input data (determinism). If two digest calculations return different results, the processed data sets must be different. The hash result is usually represented as a hexadecimal string of fixed length and can be stored in a file, e.g. in the GNU or BSD file format.

MD5 example (hexadecimal result):

            23176e28d47e61777f6d246e459b795d

An MD5 checksum, the most commonly used kind, always consists of 16 bytes, so the generated hexadecimal string is 32 characters long. The input data can have arbitrary length, while the result always has a fixed length. This is especially useful when digests are stored in a database or cache, because the space needed can be determined exactly. Digests can be calculated with different techniques, called algorithms, which can be classified by the characteristic length of their results.
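The fixed result length can be checked with a few lines of Python. This is only a minimal sketch using the standard hashlib module, not part of the Digester software; the helper name md5_hex is chosen here for illustration.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Inputs of very different sizes ...
short_input = b"abc"
long_input = b"x" * 1_000_000

# ... always produce 16 bytes, i.e. 32 hexadecimal characters.
for data in (short_input, long_input):
    print(len(data), "bytes in ->", len(md5_hex(data)), "hex characters out")

# Determinism: the same input always yields the same digest.
print(md5_hex(short_input) == md5_hex(b"abc"))
```

Because the length depends only on the algorithm, a database column for MD5 digests can be sized at exactly 16 bytes (or 32 characters for the hexadecimal form).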

Hash functions are used for many purposes. Besides ensuring data integrity on the file level, they are used in particular to store passwords for operating systems and databases. To avoid storing plain-text passwords on insecure hard disks, the hash results are stored instead. The original passwords cannot practically be reconstructed from the message digests if an algorithm is used that is considered secure, such as RIPEMD-160.

The following table shows the algorithms supported by the current Digester version. It also lists the length of each calculation result in bytes and the typical time (in seconds) needed to process a reference file of approximately 4 gigabytes. The following sections provide guidance on which algorithm to use and how the Digester software can be deployed effectively to ensure data integrity on the file level.

 

Algorithm   Bytes  Time (s)  Comments
MD2            16      1423  Designed for 8-bit computers
MD4            16       219  Used for Windows password hashes
MD5            16        63  The most common algorithm besides SHA-1
RIPEMD-128     16        74  Based on principles of MD4
RIPEMD-160     20       100  Used within OpenPGP
RIPEMD-256     32        74  Variant of RIPEMD-128
RIPEMD-320     40        94  Variant of RIPEMD-160
SHA-1          20       107  Very often used, developed by the NSA
SHA-224        28       136  Developed by the NSA
SHA-256        32       139  Developed by the NSA
SHA-384        48       311  Developed by the NSA
SHA-512        64       310  Developed by the NSA
Tiger-192      24       291  Designed for 64-bit computers; used e.g. in Gnutella file sharing
Whirlpool      64      2082  Based on principles of AES


Regarding the typical times shown, please note that the Digester software can calculate several algorithms simultaneously by using a multiplexing technique: the input data is read only once, which allows multiple algorithms to be calculated very efficiently. Because modern computers are usually limited by hard disk speed rather than by CPU performance, digests can be calculated and checked very efficiently with the Digester software. This makes the software especially well suited to processing large amounts of data, and the number of algorithms used simultaneously has little impact on overall performance.
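The idea of reading the input once and feeding it to several algorithms can be sketched in Python as follows. This is an illustration under assumptions, not the actual Digester implementation: the function name multi_digest and the chunk size are hypothetical, and only algorithms offered by the standard hashlib module are used.

```python
import hashlib
import io


def multi_digest(stream, algorithms, chunk_size=1 << 20):
    """Read stream once and return {algorithm: hex digest} for all algorithms."""
    # One hash object per requested algorithm.
    hashers = {name: hashlib.new(name) for name in algorithms}
    while True:
        chunk = stream.read(chunk_size)  # each chunk is read only once
        if not chunk:
            break
        # The same chunk is passed to every algorithm before reading on.
        for hasher in hashers.values():
            hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Usage with an in-memory stream; a file opened in binary mode works the same way.
results = multi_digest(io.BytesIO(b"abc"), ["md5", "sha1", "sha256"])
for name, value in results.items():
    print(name, value)
```

Since the data passes through memory only once, adding another algorithm costs additional CPU time but no additional disk reads.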

In the following sections, the terms original files and digest files are used. Original files are the files for which the digests and signatures have been generated. Such files can be found on FTP servers or are created by backup software, for example. Digest files contain the digests that have been calculated for the original files. Comparing the digests from a digest file with freshly calculated digests, e.g. after a download, reveals deviations that indicate transmission errors or manipulated data.
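The comparison step can be sketched in Python for a single line in GNU format, i.e. the hexadecimal digest followed by a space, a space or asterisk, and the file name. This is a simplified illustration, not the Digester implementation; the names parse_gnu_line and digest_matches as well as the file name abc.txt are hypothetical, and the data is passed in directly instead of being read from disk.

```python
import hashlib
import re

# GNU format: <hex digest>, a space, then " " (text) or "*" (binary), then the file name.
GNU_LINE = re.compile(r"^([0-9a-fA-F]+) [ *](.+)$")


def parse_gnu_line(line):
    """Split one GNU-format digest line into (hex digest, file name)."""
    match = GNU_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("not a GNU-format digest line: %r" % line)
    return match.group(1).lower(), match.group(2)


def digest_matches(expected_hex, data, algorithm="md5"):
    """Recalculate the digest of data and compare it with the stored value."""
    return hashlib.new(algorithm, data).hexdigest() == expected_hex.lower()


# One line of a digest file and the content of the corresponding original file.
expected, name = parse_gnu_line("900150983cd24fb0d6963f7d28e17f72  abc.txt")
print(name, "OK" if digest_matches(expected, b"abc") else "FAILED")
print(name, "OK" if digest_matches(expected, b"abd") else "FAILED")
```

A single changed byte in the original file is enough to make the comparison fail, which is how transmission errors and manipulations are detected.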

Additional Links