Saturday, May 05, 2007

The md5 hash of the same string differs between character sets

I maintain a website which hosts a forum, using popular forum software. The forum stores an md5 hash of each user's password in a database. The website also has an admin section which is protected by a password, and the passwords from the forum database are used to grant access to it. To do this, the md5 hash of the submitted password is compared with the md5 hash stored in the database, exactly as the forum software does it.

This week one of the users of the website reported that he was unable to access the admin section, but was still able to log in to the forum. The mechanism to check the password is identical for the admin section and the forum (as described in the previous paragraph), so at first I didn't understand why he couldn't access the admin section. After some debugging I found out that the md5() function produced two different hashes for the same password. On the forum it produced the correct hash, identical to the one stored in the database, but on the admin section a different hash came up.

I then remembered that the webserver had been upgraded from Apache 1.3 to Apache 2.0 a week earlier. The new webserver uses a different default character set (UTF-8). After the upgrade the website worked perfectly, except that some special characters on the forum were replaced with question marks. I solved that problem by changing the character set in a .htaccess file in the directory of the forum, as the problem only occurred there. In .htaccess I added:

AddDefaultCharset ISO-8859-1

From then on only the forum used the old character set, while the rest of the website, including the admin section, used the new default character set of the webserver. Everything seemed to work fine.

That lasted until this week, when that one user couldn't access the admin section while other users still could. After some investigation I found out that the user who couldn't log in had special characters in his password. Then I started to realise that the md5() function must produce a different hash for the same string when it is encoded in a different character set. A browser normally submits form data in the character set of the page, so the same password arrived as different bytes on the forum (ISO-8859-1) and on the admin section (UTF-8).

This makes perfect sense. In a lot of character sets the normal alphanumeric characters (a-z, A-Z, 0-9) are in the same place, but special characters like é or ü can have a different place, or take up a different number of bytes, in another character set. When a string containing such characters is encoded in different character sets, it has a different value on a binary/hexadecimal level. md5() works on those bytes, not on the characters, so different hashes are produced.
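The effect is easy to reproduce. The following is a minimal sketch in Python rather than in the language of the forum software; the two passwords and the helper md5_hex are made up for illustration. It hashes the same text once encoded as ISO-8859-1 and once as UTF-8:

```python
import hashlib


def md5_hex(text: str, charset: str) -> str:
    """Return the md5 hex digest of text after encoding it in charset."""
    return hashlib.md5(text.encode(charset)).hexdigest()


password_plain = "secret123"    # hypothetical password, ASCII characters only
password_special = "caf\u00e9"  # hypothetical password ending in é

# é is the single byte E9 in ISO-8859-1, but the two bytes C3 A9 in UTF-8.
latin1_bytes = password_special.encode("iso-8859-1")
utf8_bytes = password_special.encode("utf-8")

# Compare the hashes of each password under both character sets.
plain_matches = (
    md5_hex(password_plain, "iso-8859-1") == md5_hex(password_plain, "utf-8")
)
special_matches = (
    md5_hex(password_special, "iso-8859-1") == md5_hex(password_special, "utf-8")
)

print(latin1_bytes.hex(), utf8_bytes.hex())
print("ASCII-only password hashes match:", plain_matches)
print("Password with é hashes match:", special_matches)
```

The ASCII-only password hashes identically under both character sets, which would explain why most users could still log in to the admin section, while the password containing é does not.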

Now that I understood what was happening, I solved the problem by applying the same character set to the entire website. The user who reported the problem was again able to log in to the admin section.
