A cryptographic hash function is an algorithm that takes data and computes a string of bits from it called a “digest” (or commonly a “hash”). The algorithm has two characteristics: 1) it is extremely unlikely that two actual data samples (even nearly identical samples) will give the same digest and 2) it is computationally infeasible to construct input data that gives a specified digest. In practice, the digest value is relatively short, for example 16 bytes in one popular method.
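To make characteristic 1 concrete, here is a minimal sketch using Python's standard `hashlib` module. The two sample sentences are my own illustration, not taken from any file discussed here; they differ only by a trailing period.

```python
import hashlib

# Two nearly identical inputs (illustrative samples only).
sample_a = b"The quick brown fox jumps over the lazy dog"
sample_b = b"The quick brown fox jumps over the lazy dog."

digest_a = hashlib.md5(sample_a).hexdigest()
digest_b = hashlib.md5(sample_b).hexdigest()

# Size of the raw MD5 digest in bytes (the hex form is twice as long).
digest_size = len(hashlib.md5(sample_a).digest())

print(digest_a)
print(digest_b)
print(digest_a == digest_b)  # False: one extra character changes the digest
print(digest_size)           # 16
```

The two hexadecimal strings printed should look unrelated to each other, even though the inputs differ by a single character.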
One application of a cryptographic hash is to determine if a document has been altered. If the digest value of a document is known, then any copy of the document can be validated by computing its digest value and comparing it to the known one. Another is to store passwords in a system: instead of the password itself, the system stores its digest. When a user enters a password, the system hashes it and compares the result to the stored digest, but the password can’t be recovered from the digest except by brute force (guessing all possible passwords).
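The document check described above might look something like the following sketch. The function names `md5_of_file` and `matches_known_digest` are hypothetical names I made up for illustration, and the file is read in chunks so large files do not need to fit in memory.

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the MD5 digest of a file's contents as a hex string."""
    h = hashlib.md5()
    with open(path, "rb") as f:  # binary mode: hash the bytes exactly as stored
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches_known_digest(path, known_hex):
    """Compare a file's digest to a previously published hex digest."""
    return md5_of_file(path) == known_hex.strip().lower()
```

A call such as `matches_known_digest("copy.pdf", published_digest)` would return `True` only if the copy is byte-for-byte the same as the file the published digest was computed from (both names here are placeholders).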
One often-used cryptographic hashing algorithm is called MD5. While MD5 isn’t used for high-security applications any more (SHA-2 is preferred), it’s still useful for consumer purposes. Here’s an example of its use. Take the White House PDF image of the President’s long-form birth certificate. When one computes the MD5 digest of the file, the result is:
(For a hoot, search for that string with a search engine.)
One can look at comments at ObamaReleaseYourRecords about the PDF file and note that someone there computed the MD5 hash and that the value is the same as the one preceding. We’re both looking at the same file. Because of characteristic 2 of the algorithm, you can rule out someone tinkering with the file in such a way that the digest doesn’t change.
The hash applies only to data in the file, not data about the file, such as the file name or your computer’s “modified date.” In theory, you can copy, rename or move the file and the hash won’t change. There is one area of caution. When transferring text files between a Unix-like system and a Windows-like system, transport software will sometimes convert the Unix Line Feed end-of-line character to the Windows standard of Carriage Return – Line Feed. If this happens, the file will LOOK the same when edited on both systems, but the hash will be different.
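Both points can be checked with a short sketch. The sample text, the temporary file name and the timestamp below are arbitrary values chosen for illustration.

```python
import hashlib
import os
import tempfile

def md5_hex(data):
    return hashlib.md5(data).hexdigest()

# Same visible text, different end-of-line bytes.
unix_text = b"line one\nline two\n"
windows_text = unix_text.replace(b"\n", b"\r\n")
line_endings_change_hash = md5_hex(unix_text) != md5_hex(windows_text)

# Same bytes, different "modified date".
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(unix_text)
    with open(path, "rb") as f:
        before = md5_hex(f.read())
    # Set access and modification times to an arbitrary moment in 2001.
    os.utime(path, (1_000_000_000, 1_000_000_000))
    with open(path, "rb") as f:
        after = md5_hex(f.read())
timestamp_changes_hash = before != after

print(line_endings_change_hash)  # True
print(timestamp_changes_hash)    # False
```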
I bring this topic up because the birthers have discovered MD5, and one can see it being added to the confusion of objections about the long form PDF. A claim is made that one PDF file on the White House web site was replaced by another, proved by a different MD5 hash. A different hash shows only that two files differ; it says nothing about where either file came from. There’s no way for me to know if the purported earlier file is actually from the White House or not because it is very simple to use a specialized editor to change any data inside a file.
This post at The Free Republic is interesting:
Filename : birth-certificate-long-form.pdf
MD5 : 34a7aeb10b7077520e5a976a02de877b
SHA1 : 94c685734363002c26c8c077c74f233f3f44aca9
CRC32 : a800cf57
Proving that this is still the same file, as the one downloaded at 8 AM Wed, 27 April.
Hang onto that – it might be useful one day.
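For anyone who wants to produce the same three values for their own copy of a file, here is a rough sketch using Python's standard `hashlib` and `zlib` modules. The function name `file_checksums` is hypothetical, and I don't know what tool the poster actually used. Note that CRC32 is an error-detecting checksum, not a cryptographic hash, so it does not have characteristic 2.

```python
import hashlib
import zlib

def file_checksums(path, chunk_size=65536):
    """Return MD5, SHA1 and CRC32 of a file's contents as hex strings."""
    md5 = hashlib.md5()
    sha1 = hashlib.sha1()
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
            sha1.update(chunk)
            crc = zlib.crc32(chunk, crc)  # running CRC over all chunks
    return {
        "MD5": md5.hexdigest(),
        "SHA1": sha1.hexdigest(),
        "CRC32": format(crc, "08x"),
    }

def print_checksums(path):
    """Print the values in roughly the layout of the post quoted above."""
    print("Filename :", path)
    for name, value in file_checksums(path).items():
        print(name, ":", value)
```

If the values printed for a downloaded copy match the ones in the post, the copy has the same bytes as the poster's file.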