Strange characters in your website

Strange characters in your website

Have you ever had the issue when you’ve discovered strange characters in your website? After a recent migration from MySQL to MariaDB (an offshoot of MySQL) we discovered that some of our websites started showing the “” character next to spaces on some pages. Curiously this only effected sites built using the Silverstripe platform. For some reason WordPress sites were unaffected at this time however I have heard of this happening in WordPress as well.

After a little bit of research I discovered that this was due to an incorrect character encoding. When examining a database dump it became evident that not all tables had the same encoding, they were in a mix of Latin1 and utf8 and normally you want all the tables to be in utf8. Now that we knew what caused the issue, the next step was resolving it in a simple an preferably elegant way. Thankfully google came to the rescue.

After doing some further research and trying several different ways of resolving the issue I came across this article. After going through the article several times it came down to a few commands. Of course the most important command to perform whenever you doing anything to your database is a backup. In our case we created a complete copy of the site on our development systems so we could experiment with the commands. The following is the sequence that worked for us.

First create a backup or dump of the database specifying the character set. In this case we are forcing the incorrect character set. This is so we can convert the data. 

mysqldump --opt --default-character-set=latin1 --skip-extended-insert --user username --password password -r database.dump --log-error=log-database-error.log

Now that you have a dump of the database you can use the replace command on a Linux/Unix style operating system. Alternatively on a Windows based system just use something like notepad and do a search/replace.

replace "CHARSET=latin1" "CHARSET=utf8" "SET NAMES latin1" "SET NAMES utf8" < database.dump > database.dump.2

You can use the diff command to ensure that above command worked.

diff database.dump database.dump.2

Now since I’m the paranoid type when it comes to backups I made sure a backup of the database without any modifications was performed. In case something goes wrong. I cannot stress strongly enough about backups, we have all probably made a big oops like mistake only to find out we have no way of reversing it, so make sure you not only backup but ensure that the backup worked.

mysqldump -u username -p database > database.old.dump

Now that we’ve got that out of the way you can import the modified database back into the system. If all went to plan the strange characters should be gone. If it didn’t work just restore from the backup you just took. If you’ve got a large site and don’t want this knocking it out while running you can always create a second database and point the website to this when done. For the websites we had this issue with we had no downtime on running this against the live site.

mysql --default-character-set=utf8 -u username -p database < database.dump.2

And this it, I hope you got something out of the article. If you would like to provide feedback please feel free to leave a comment, even if you disagree with the content I’d love to hear from you.

Leave a Reply