Without diving too deep into the sometimes complex mathematical background of the algorithms, the following section gives an overview of message digest technology. Message digests (also known as checksums) are the results of cryptographic hash functions, which always return the same binary value for a given set of data (determinism). If two digest calculations return different results, the processed data sets must be different. The hash result is usually represented as a hexadecimal string of fixed length and can be stored in a digest file, e.g. in the GNU or BSD file format.
MD5 example (hexadecimal result):
23176e28d47e61777f6d246e459b795d
MD5 checksums, the most commonly used kind, always consist of 16 bytes, so the generated hexadecimal strings are 32 characters long. The input data can have arbitrary length while the result always has a fixed length. This is especially useful when digests are stored in a database or cache, because the space needed can be determined exactly in advance. Digests can be calculated with different techniques called algorithms, which can be classified by the characteristic length of their results.
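The determinism and fixed-length properties described above can be checked with a few lines of Python. This is only an illustrative sketch that uses the `hashlib` module from the Python standard library, not the Digester software itself; the helper name `md5_hex` is made up for this example.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hexadecimal string."""
    return hashlib.md5(data).hexdigest()


short_input = b"abc"
long_input = b"abc" * 1_000_000

# Determinism: the same input always yields the same digest.
assert md5_hex(short_input) == md5_hex(short_input)

# Fixed length: 16 bytes, i.e. 32 hexadecimal characters, for any input size.
for data in (b"", short_input, long_input):
    digest = hashlib.md5(data)
    print(len(data), digest.digest_size, len(digest.hexdigest()))

# Different inputs lead to different digests here.
assert md5_hex(short_input) != md5_hex(long_input)
```

Each printed line shows the input length followed by the digest size in bytes (16) and the length of the hexadecimal string (32).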
Hash functions are used for many purposes. Besides ensuring data integrity on the file level, they are used in particular to store passwords for operating systems and databases. To avoid storing plain-text passwords on insecure hard disks, the hash results are stored instead. The original passwords cannot be reconstructed from the message digests if algorithms that are considered secure, such as RIPEMD-160, are used.
The following table shows the algorithms supported by the current Digester version. It also lists the length of each calculation result in bytes and the typical time (in seconds) needed to process a reference file of approximately 4 gigabytes. The following sections provide guidance on which algorithm to use and how the Digester software can be deployed effectively to ensure data integrity on the file level.
Algorithm | Bytes | Time (s) | Comments |
---|---|---|---|
MD2 | 16 | 1423 | Designed for 8-bit computers |
MD4 | 16 | 219 | Used for password hashes for Windows |
MD5 | 16 | 63 | The most common algorithm besides SHA-1 |
RIPEMD-128 | 16 | 74 | Based on principles of MD4 |
RIPEMD-160 | 20 | 100 | Used within OpenPGP |
RIPEMD-256 | 32 | 74 | Variant of RIPEMD-128 |
RIPEMD-320 | 40 | 94 | Variant of RIPEMD-160 |
SHA-1 | 20 | 107 | Very often used, developed by the NSA |
SHA-224 | 28 | 136 | Developed by the NSA |
SHA-256 | 32 | 139 | Developed by the NSA |
SHA-384 | 48 | 311 | Developed by the NSA |
SHA-512 | 64 | 310 | Developed by the NSA |
Tiger-192 | 24 | 291 | Designed for 64-bit computers, used e.g. in Gnutella file sharing |
Whirlpool | 64 | 2082 | Based on principles of AES |
Regarding the typical time consumption values shown above, please note that the Digester software can calculate any number of algorithms simultaneously by using a specific multiplexing technique. The input data is read only once, which makes it possible to calculate multiple algorithms very efficiently. Since hard disk throughput is often the limiting factor on modern computers, while the calculations themselves do not really challenge modern CPUs, the number of simultaneously used algorithms has only little impact on the overall performance. This makes the software especially well suited for processing large amounts of data.
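The idea of reading the input once and feeding it to several algorithms can be sketched as follows. This is a minimal illustration in Python using the standard `hashlib` module; it is not how the Digester software is implemented, and the function name `multi_digest`, the default algorithm list, and the chunk size are assumptions chosen for this example.

```python
import hashlib


def multi_digest(path, algorithms=("md5", "sha1", "sha256"), chunk_size=1 << 20):
    """Compute several digests of one file while reading it only once.

    Hypothetical helper for illustration; algorithm names must be known
    to hashlib on the local system.
    """
    hashers = {name: hashlib.new(name) for name in algorithms}
    with open(path, "rb") as handle:
        # Each chunk is read from disk once and fed to every hash object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling `multi_digest("backup.tar")` (a hypothetical file name) returns a dictionary that maps each algorithm name to its hexadecimal digest. Adding an algorithm adds CPU work per chunk but no additional disk reads.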
In the following sections the terms original files and digest files are used. Original files are the files for which the digests and signatures have been generated. These files can be found on FTP servers or are created by backup software, for example. Message digest files contain the digests that have been calculated for the original files. When the digests from the digest files are compared with freshly calculated digests, e.g. after a download, deviations can be identified that indicate transmission errors or manipulated data.
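The comparison of a stored digest with a freshly calculated one can be sketched as follows. The example assumes a digest file line in the layout written by the GNU `md5sum` tool (hexadecimal digest, a space, a mode character that is either a space or `*`, then the file name); the exact layout produced by the Digester software may differ. The helper names `parse_gnu_line` and `check_file` are made up for this illustration.

```python
import hashlib


def parse_gnu_line(line):
    """Split a GNU-style line '<hex digest> <mode><file name>' into its parts."""
    hex_digest, _, rest = line.rstrip("\n").partition(" ")
    # The first character of rest is the mode: ' ' (text) or '*' (binary).
    return hex_digest.lower(), rest[1:]


def check_file(path, expected_hex, algorithm="md5", chunk_size=1 << 20):
    """Return True if the digest of the file at path equals expected_hex."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_hex.lower()
```

A return value of `False` from `check_file` indicates that the original file differs from the one the digest was calculated for, for instance because of a transmission error or manipulation.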