Cryptographic hashing

A cryptographic hash function is an algorithm that takes data and computes from it a fixed-length string of bits called a “digest” (or commonly a “hash”). The algorithm has two characteristics: 1) it is extremely unlikely that two actual data samples (even nearly identical samples) will give the same digest and 2) it is computationally infeasible to construct input data that gives a specified digest. In practice, the digest value is relatively short, for example 16 bytes (128 bits) in one popular method, MD5.
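To make the first characteristic concrete, here is a minimal Python sketch using the standard hashlib module. The helper name md5_hex and the two sample sentences are my own illustrations, not anything from a real document. It hashes two inputs that differ by a single trailing period and shows that the digests are unrelated, and that the digest is 16 bytes long.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a 32-character hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Two nearly identical samples: the second only adds a trailing period.
sample_a = b"The quick brown fox jumps over the lazy dog"
sample_b = b"The quick brown fox jumps over the lazy dog."

digest_a = md5_hex(sample_a)
digest_b = md5_hex(sample_b)

print(digest_a)
print(digest_b)
print("same digest?", digest_a == digest_b)

# The raw digest is 16 bytes, which prints as 32 hexadecimal characters.
print("digest size in bytes:", hashlib.md5().digest_size)
```

Running it prints two 32-character strings that look nothing alike, even though the inputs differ by one character.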

One application of a cryptographic hash is to determine if a document has been altered. If the digest value of a document is known, then any copy of the document can be validated by computing its digest value and comparing it to the known one. Another is to store passwords in a system: the system stores only the digest, which can be compared with the digest of a user-entered password, but from which the password can’t be recovered except by brute force (guessing all possible passwords).
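The password idea can be sketched in a few lines of Python. Note that this is a hardened variant rather than the plain digest described above: it uses the standard library's hashlib.pbkdf2_hmac with a random salt, which is the usual way of slowing down brute-force guessing. The function names, the sample passwords, and the iteration count are assumptions chosen for illustration.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def make_record(password: str) -> tuple:
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def check_password(candidate: str, salt: bytes, digest: bytes) -> bool:
    """Hash the candidate the same way and compare it to the stored digest."""
    candidate_digest = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), salt, ITERATIONS
    )
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(candidate_digest, digest)


salt, stored = make_record("correct horse")
print(check_password("correct horse", salt, stored))
print(check_password("wrong guess", salt, stored))
```

The system never needs to reverse the digest. It only repeats the computation on whatever the user types and checks for a match.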

One often-used cryptographic hashing algorithm is called MD5. While MD5 isn’t used for high-security applications anymore (SHA-2 is preferred), it’s still useful for consumer purposes. Here’s an example of its use. Take the White House PDF image of the President’s long-form birth certificate. When one computes the MD5 digest of the file, the result is:

34a7aeb10b7077520e5a976a02de877b

(For a hoot, search for that string with a search engine.)

One can look at comments at ObamaReleaseYourRecords about the PDF file and note that someone there computed the MD5 hash and that the value is the same as the one preceding. We’re both looking at the same file. Because of characteristic 2 of the algorithm, you can rule out someone tinkering with the file in such a way that the digest doesn’t change.
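For anyone who wants to repeat this kind of check, here is a rough Python sketch, assuming you have a copy of the file on disk. The function names file_md5 and matches_known_digest are my own. The file is read in chunks so that large files don’t have to fit in memory.

```python
import hashlib


def file_md5(path: str, chunk_size: int = 65536) -> str:
    """Return the MD5 digest of the file at path as a hexadecimal string."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def matches_known_digest(path: str, known_hex: str) -> bool:
    """Compare a file's MD5 digest with a published hexadecimal value."""
    return file_md5(path) == known_hex.strip().lower()
```

With a downloaded copy in the current folder, calling matches_known_digest("birth-certificate-long-form.pdf", "34a7aeb10b7077520e5a976a02de877b") would report whether your copy is the same file discussed here.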

The hash applies only to the data in the file, not data about the file, such as the “modified date” your computer records. You can copy, rename, or move the file and the hash won’t change. There is one area of caution. When text files are transferred between a Unix-like system and a Windows-like system, transport software will sometimes convert the Unix Line Feed end-of-line character to the Windows standard of Carriage Return plus Line Feed. If this happens, the file will LOOK the same when edited on both systems, but the hash will be different.
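Both points can be demonstrated with a small Python sketch. The sample text, the file name sample.txt, and the timestamp are made up for the demonstration. It hashes the same two lines with Unix and Windows line endings, then changes a temporary file’s modified time and hashes it again.

```python
import hashlib
import os
import tempfile

# The same two lines of text with Unix (LF) and Windows (CRLF) line endings.
unix_text = b"first line\nsecond line\n"
windows_text = unix_text.replace(b"\n", b"\r\n")

unix_digest = hashlib.md5(unix_text).hexdigest()
windows_digest = hashlib.md5(windows_text).hexdigest()
print("line endings change the hash:", unix_digest != windows_digest)

# Changing the file's timestamps leaves the contents, and so the hash, alone.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.txt")
    with open(path, "wb") as f:
        f.write(unix_text)
    with open(path, "rb") as f:
        before = hashlib.md5(f.read()).hexdigest()
    # Set the access and modified times to an arbitrary moment in 2001.
    os.utime(path, (1_000_000_000, 1_000_000_000))
    with open(path, "rb") as f:
        after = hashlib.md5(f.read()).hexdigest()

print("timestamp change keeps the hash:", before == after)
```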

I bring this topic up because the birthers have discovered MD5, and one can see it being added to the confusion of objections about the long form PDF. A claim is made that one PDF file on the White House web site was replaced by another, proved by a different MD5 hash. There’s no way for me to know if the purported earlier file is actually from the White House or not because it is very simple to use a specialized editor to change any data inside a file.

This post at The Free Republic is interesting:

Filename : birth-certificate-long-form.pdf

MD5 : 34a7aeb10b7077520e5a976a02de877b

SHA1 : 94c685734363002c26c8c077c74f233f3f44aca9

CRC32 : a800cf57

Proving that this is still the same file, as the one downloaded at 8 AM Wed, 27 April.

Hang onto that – it might be useful one day.
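A listing like the one in that post can be produced with Python’s standard library. This is only a sketch; the function name fingerprints is my own. Note that CRC32 is a checksum for catching accidental errors, not a cryptographic hash, so it offers no protection against deliberate tampering.

```python
import hashlib
import zlib


def fingerprints(data: bytes) -> dict:
    """Return MD5, SHA1 and CRC32 values for data as hexadecimal strings."""
    return {
        "MD5": hashlib.md5(data).hexdigest(),
        "SHA1": hashlib.sha1(data).hexdigest(),
        # Mask to 32 bits and pad to 8 hex digits for a consistent format.
        "CRC32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
    }


for name, value in fingerprints(b"123456789").items():
    print(name, ":", value)
```

To fingerprint a file, read it in binary mode and pass the bytes to fingerprints.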

8 Responses to Cryptographic hashing

  1. Obsolete August 7, 2011 at 11:56 pm

    Is there any subject the learned birthers aren’t experts on?

  2. Dr. Conspiracy August 8, 2011 at 11:18 am

    I am pleased to announce the first product from Dr. Conspiracy.com: Dr. Conspiracy’s Hash Tool. This neat utility for Windows supports MD2, MD4, MD5, SHA-1, CRC-16 and CRC-32 hashing as well as computing the file size. The product includes a Windows installer.

    A download link has been added to the article above. Right-click the link and select Save As, then run the EXE installer.

    Freeware. Delphi XE source is available upon request.

  3. J.Potter August 8, 2011 at 11:23 am

    I can’t seem to find the related PayPal button ….. where’s the obligatory PayPal button? How the heck am I supposed to pay for this thing?!?! 😉

    Dr. Conspiracy:
    I am pleased to announce the first product from Dr. Conspiracy.com: Dr. Conspiracy’s Hash Tool. This neat utility for Windows supports MD2, MD4, MD5, SHA-1, CRC-16 and CRC-32 hashing as well as computing the file size. The product includes a Windows installer.

    A download link has been added to the article above. Right-click the link and select Save As, then run the EXE installer.

    Freeware. Delphi XE source is available upon request.

  4. Daniel August 8, 2011 at 11:30 am

    Dr. Conspiracy:
    I am pleased to announce the first product from Dr. Conspiracy.com: Dr. Conspiracy’s Hash Tool.

    Sorry Doc but as an IT professional it’s obvious that this is not a valid hash utility.

    In order to be eligible it needs two C+ Parsings

  5. Dr. Kenneth Noisewater (Bob Ross) August 8, 2011 at 11:38 am

    Hash tool? I’m disappointed. I was expecting it to dish out a free prescription.

  6. PaulG August 8, 2011 at 12:37 pm

    If there’s anything birthers are good at, it’s making a hash of things.

  7. El Diablo Negro August 8, 2011 at 9:27 pm

    MD5 is old school encryption, it can be easily hacked. When I work with encryption, I use AES (128,192 or 256).

    AES is theoretically unhackable since no one has been able to crack it.

    “….That means in 48 years we can shave another 32 bits off the encryption armor which means 5 trillion future computers might get lucky in 5 years to find the key for RC5 128-bit encryption. But with 256-bit AES encryption, that moves the date out another 192 years before computers are predicted to be fast enough to even attempt a massively distributed attack. To give you an idea how big 256 bits is, it’s roughly equal to the number of atoms in the universe!”

    http://www.zdnet.com/blog/ou/is-encryption-really-crackable/204

    I say, good luck with that.

  8. dunstvangeet August 8, 2011 at 10:24 pm

    El Diablo Negro, MD5 isn’t encryption, it’s a hash function. It gives a fixed-length output no matter the length of the input, so multiple files can, in principle, have the same MD5 value. However, it’s virtually impossible to predict how the MD5 value will change when you make a specific change to the file. With MD5, you’ll get an output that is exactly 128 bits long (32 hexadecimal characters). It’s a quick way to see if something has changed, because a small change to the file will result in a substantially different MD5 value. Computer forensic techs actually use this to prove that there have been no changes to a file.

    By the way, MD5 is a one-way algorithm. That means that once you put data through it, you can’t get the data back. Because you’re reducing the data to 128 bits, there must be multiple files that can theoretically produce the same MD5 hash value, so there’s no way to reconstruct the file from its MD5 hash.

    AES is actual encryption. What this means is that you put a file through the algorithm with a key and get a bunch of gobbledygook that’s roughly the same size as your original file, and that the key can turn back into the original. They have two completely different jobs.