Avoiding character set issues with PHP & MySQL (checklist)

Creating MySQL-powered websites, applications or content management systems often involves dealing with unpleasant character encoding issues that are hard (and therefore not fun) to diagnose. In the worst case, the behavior may even differ between your development and production environments.

I’ve never dug deep enough into character encoding (since it’s not fun), but I plan to read this promising blog post, recommended somewhere on Stack Overflow: Getting out of MySQL Character Set Hell

Until I get around to reading it (it’s quite long and detailed), here is a quick list of rules I’ve come up with after a lot of trial and error. I might update it after I read the article.

The checklist

  • The database’s collation must be set to utf8_general_ci (or to a more language-specific collation of the form utf8_language_ci)
  • Every column in every table must have the same collation
  • Every PHP page that connects to the database should run this MySQL query right after connecting: SET NAMES 'utf8' COLLATE 'utf8_general_ci'
  • PHP must send an HTTP header that specifies the encoding before any output: header('Content-type: text/html; charset=utf-8');
  • The HTML head must specify the encoding: <meta charset="UTF-8">
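
To make the last three rules concrete, here is a minimal sketch of how they might fit together in one page. The function names open_utf8_connection() and render_page(), as well as the credentials in the usage comment, are placeholders I made up for illustration. The mysqli extension is assumed, and this is just one possible arrangement, not the only correct one.

```php
<?php
// Sketch only: helper names and credentials are hypothetical placeholders.

// Opens a mysqli connection and switches it to UTF-8 right away (rule 3).
function open_utf8_connection(string $host, string $user, string $password, string $database): mysqli
{
    $db = new mysqli($host, $user, $password, $database);
    if ($db->connect_errno) {
        throw new RuntimeException('Connection failed: ' . $db->connect_error);
    }
    // The query from the checklist, run immediately after connecting.
    if (!$db->query("SET NAMES 'utf8' COLLATE 'utf8_general_ci'")) {
        throw new RuntimeException('SET NAMES failed: ' . $db->error);
    }
    return $db;
}

// Builds a page whose head declares the encoding (rule 5).
// $body is expected to be ready-made HTML; $title is escaped here.
function render_page(string $title, string $body): string
{
    return "<!DOCTYPE html>\n"
        . "<html>\n<head>\n"
        . "<meta charset=\"UTF-8\">\n"
        . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
        . "</head>\n<body>\n" . $body . "\n</body>\n</html>\n";
}

// Typical use at the top of a page (rule 4: header before any output):
//
//   header('Content-type: text/html; charset=utf-8');
//   $db = open_utf8_connection('localhost', 'example_user', 'example_pass', 'example_db');
//   echo render_page('Hello', '<p>Příliš žluťoučký kůň</p>');
```

mysqli also offers $db->set_charset('utf8'), which sets the connection character set without a hand-written query. I kept SET NAMES here because it is the query from the checklist.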

There is one special case that needs to be kept in mind:

  • Whenever you use PHP’s htmlentities() function, you must specify UTF-8 as the encoding
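
As a small illustration of this rule, the sketch below wraps htmlentities() in a helper that always passes the encoding explicitly. The name escape_utf8() is my own invention. As far as I know, PHP versions before 5.4 defaulted to ISO-8859-1 here, which is why leaving the third argument out can mangle UTF-8 text on older servers.

```php
<?php
// Hypothetical helper: always calls htmlentities() with an explicit encoding.
function escape_utf8(string $text): string
{
    // ENT_QUOTES converts both double and single quotes;
    // 'UTF-8' tells PHP how to interpret the input bytes.
    return htmlentities($text, ENT_QUOTES, 'UTF-8');
}

// "\u{E9}" is the letter é written as a Unicode escape, so the example
// does not depend on how this file itself is saved.
echo escape_utf8("caf\u{E9} <b>"), "\n"; // prints: caf&eacute; &lt;b&gt;
```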

The source

Just in case anyone is wondering, the Stack Overflow question that brought me to the blog post is here, and the user adrienne, author of the best answer, lists these rules:

  • The DB connection is using UTF-8
  • The DB tables are using UTF-8
  • The individual columns in the DB tables are using UTF-8
  • The data is actually stored properly in the UTF-8 encoding inside the database (often not the case if you’ve imported from bad sources, or changed table or column collations)
  • The web page is requesting UTF-8
  • Apache is serving UTF-8
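
The fourth rule in that list (data actually stored as UTF-8) is the hardest to check by eye. Here is a rough sketch of a check you could run over strings fetched from the database. The function names are made up for this example, and the check only catches byte sequences that are not valid UTF-8. Text that was double-encoded during an import is still valid UTF-8, so it will slip through.

```php
<?php
// Hypothetical helper: true if $value is a well-formed UTF-8 string.
function is_valid_utf8(string $value): bool
{
    // With the /u modifier, PCRE rejects subjects that are not valid UTF-8,
    // so matching an empty pattern doubles as a validity check.
    return preg_match('//u', $value) === 1;
}

// Returns the keys of all values that are not valid UTF-8,
// e.g. for one row fetched as an associative array.
function find_invalid_utf8(array $values): array
{
    $bad = [];
    foreach ($values as $key => $value) {
        if (!is_valid_utf8((string) $value)) {
            $bad[] = $key;
        }
    }
    return $bad;
}

// "caf\xE9" is "café" in ISO-8859-1, which is not valid UTF-8.
$row = ['title' => "caf\u{E9}", 'legacy' => "caf\xE9"];
print_r(find_invalid_utf8($row)); // lists only the 'legacy' key
```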
