Charset encoding failures are among the most annoying bugs to fix. If you’ve ever had to build an application that supports non-ASCII characters (and these days any serious application should), you probably know what I’m talking about. This is the kind of problem you don’t learn to fix until you’ve run into it a few times yourself, and even then it can be hard to track down, because any part of the application could be the culprit, including parts you did not write.
It’s important to realize that most of these problems can be avoided by making sure everything is stored correctly in the database. Doing any kind of character conversion when you retrieve the data is a bad idea. (What if you also read the data in some other part of the application and forget to do the conversion there? What if the encoding changes yet again?) However, the problem cannot be solved just by making sure your columns are encoded in UTF-8: you must make sure the connection itself is in UTF-8, too. This is where set names comes in: it tells the server which encoding the client sends and which encoding it expects back.
mysql> create database `a`;
Query OK, 1 row affected (0.07 sec)
mysql> use `a`;
Database changed
mysql> create table `test` ( `str` varchar(255) ) default charset=utf8;
Query OK, 0 rows affected (0.09 sec)
mysql> set names 'utf8';
Query OK, 0 rows affected (0.04 sec)
mysql> insert into `test` ( `str` ) values ( 'ä' );
Query OK, 1 row affected (0.03 sec)
mysql> set names 'latin1';
Query OK, 0 rows affected (0.00 sec)
mysql> insert into `test` ( `str` ) values ( 'ä' );
Query OK, 1 row affected (0.00 sec)
mysql> set names 'utf8';
Query OK, 0 rows affected (0.00 sec)
mysql> select `str` from `test`;
+------+
| str  |
+------+
| ä    |
| Ã¤   |
+------+
2 rows in set (0.00 sec)
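As you can see, even though both inserts sent the same UTF-8 string "ä" to the UTF-8 column str, the row inserted over the latin1 connection ended up malformed. The server took the client at its word, treated the two bytes of the UTF-8 "ä" as two separate latin1 characters, and converted each of them to UTF-8 before storing it.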
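To make the byte-level mechanics concrete, here is a minimal Python sketch of what the latin1 connection does to the string. It does not talk to MySQL at all; the function name misread_as_latin1 is made up for illustration, and Python's latin-1 codec is only a stand-in for MySQL's latin1 (which is closer to cp1252, although the two agree for the bytes involved here).

```python
def misread_as_latin1(text: str) -> str:
    """Simulate sending UTF-8 bytes over a connection declared as latin1.

    Hypothetical helper for illustration only; not part of any MySQL API.
    """
    utf8_bytes = text.encode("utf-8")    # what the client actually sends
    return utf8_bytes.decode("latin-1")  # what the server believes it received


stored = misread_as_latin1("ä")

# One character went in, two characters are stored.
print(len("ä"), len(stored))

# The server then saves those two characters as UTF-8 in the column,
# so the original two bytes (c3 a4) have become four.
print("ä".encode("utf-8").hex(), stored.encode("utf-8").hex())
```

Plain ASCII text passes through this unchanged, which is one reason the problem tends to stay hidden until the first accented character shows up.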
Of course, this is only one possible cause of encoding failures, but it is one I’ve run into quite often. Keep in mind that you may not see the problem until you deploy, because the production server’s default connection encoding may differ from your local server’s.
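One way to catch a differing default before it corrupts data is to ask the server what the current connection is using. The sketch below is only an illustration under a few assumptions: the cursor is a Python DB-API cursor that returns rows as tuples, the function name non_utf8_session_vars is invented here, and the list of accepted charset names is my own guess at what you would want to allow. The FakeCursor class exists purely so the example runs without a database.

```python
# The three session variables that `set names` assigns in one go.
SESSION_VARS = (
    "character_set_client",
    "character_set_connection",
    "character_set_results",
)


def non_utf8_session_vars(cursor, accepted=("utf8", "utf8mb3", "utf8mb4")):
    """Return {variable: value} for each connection charset that is not UTF-8.

    Assumes `cursor` is a DB-API cursor whose rows are (name, value) tuples.
    """
    names = ", ".join("'%s'" % name for name in SESSION_VARS)
    cursor.execute("SHOW SESSION VARIABLES WHERE Variable_name IN (%s)" % names)
    found = dict(cursor.fetchall())
    return {n: found.get(n) for n in SESSION_VARS if found.get(n) not in accepted}


class FakeCursor:
    """Stand-in for a real cursor so the sketch runs without a database."""

    def __init__(self, rows):
        self.rows = rows

    def execute(self, query):
        self.query = query  # a real cursor would send this to the server

    def fetchall(self):
        return self.rows


cursor = FakeCursor([
    ("character_set_client", "utf8"),
    ("character_set_connection", "latin1"),
    ("character_set_results", "latin1"),
])
print(non_utf8_session_vars(cursor))
# {'character_set_connection': 'latin1', 'character_set_results': 'latin1'}
```

An empty result means the connection is already in UTF-8. Anything else is a hint that set names (or your driver's equivalent connection option) has not been applied on this server.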