Since WordPress 2.2, WordPress supports feature that allows the user to define both the WordPress database character set and collation with DB_CHARSET and DB_COLLATE values in wp-config.php file. With these values defined, WordPress will use the designated database character-set (charset) and database collation (i.e. sort order of the letters, numbers, and symbols of a character set) when connecting a database tables.

However, in existing WordPress installations that are upgraded from earlier version of WordPress or not explicitly set a Unicode UTF-8 charset collation, the default database character set is normally set up as Latin1 (default on almost all MySQL installation) with latin1_swedish_ci collation. If you run a bilingual or multilingual blog with WordPress, you may face problem on character encoding when your blog posts are written in other foreign languages, or when you export and backup the database and later attempt to re-import the database dump in the event of database failure or server migration and moving. The symptom is obvious, your WordPress posts or pages will contains garbled, weird and funny characters, sometime just lots of ????? (question marks), rendering the WordPress database with your hard work useless and output unreadable. (May cause also by wrong charset collation)

The best solution to the character encoding problem in WordPress is to convert the charset or database and collation to UTF-8 or Unicode. However, you CANNOT simply connect to MySQL via shell or phpMyAdmin and hoping all your scripts will convert nicely. As explain by WordPress database conversion guide, convert character sets requires using the the MySQL ALTER TABLE command. When converting the character sets, all TEXT (and similar) fields are converted to UTF-8, but that conversion will BREAK existing TEXT because the conversion expects the data to be in latin1, but WordPress may have stored Unicode characters in a latin1 database, and as a result, data could end up as garbage after a conversion!

The guide provides a very rough and vague guide as a solution on how to actually convert WordPress MySQL database tables from one character set to another, usually UTF-8. However, the guide actually works, although the process can be lengthy. To convert, the steps involved generally are to alter each and every TEXT and related fields inside every WP tables to BLOB, then alter the character set of database and finally change the BLOB fields back to TEXT. Looks easy, but how long it would take to convert so many fields on so many tables? Furthermore, you will also need to remember the original type and length or values of all fields.

andersapt has posted a conversion script named convert_to_utf8_sql_generator.txt (no longer available) which automatically generates a list of SQL statements and commands need to fully convert your WordPress database to UTF8 based on the guide. However, there seems to be a minor bug with the script, although the author claimed it worked, where in my case, it simply won’t generate the list of SQL commands to run due to the error “PHP Fatal error: Call to a member function get_results() on a non-object in convert.php on line 37”. Once fixed, with this script in hand we can easily and quickly convert the database, tables and fields to use utf8_general_ci collation.

Note: I have tried out UTF-8 Database Converter plugin (no longer available), but it’s a failure. It seemed like the author change to character set directly.

Guide to Convert WordPress Database Character Set to UTF8 (Unicode)

  1. Take the WordPress blog offline by placing a out-of-service or maintenance notice.
  2. Backup database – this is very important, nothing is guaranteed to work. If you’re using cPanel or other control panel, it’s best to perform a database backup from the control panel itself, where you can restore the database in one piece instead of by SQL statements, in the case of normal dump.
  3. Download the fixed convert_to_utf8_sql_generator script and save it with a PHP extension.
  4. Modify the script to input the database name your WordPress blog is using. Locate the following text:

    Tables_in_DATABASENAME

    The DATABASENAME in red is the only thing that you need to change to match your WordPress database name. It should looks like this after change, for instance,

    Tables_in_wp_techjourney
  5. Upload the convert_to_utf8_sql_generator.php (or you can rename to a shorter name such as convert.php) to the base root WordPress installation directory, where wp-config.php is also located.
  6. Now, call and browse the script from any web browser. To do this, simply add convert_to_utf8_sql_generator.php (or any name you give to the script) to the end of your blog URL (i.e http://www.mywebsite.com/convert_to_utf8_sql_generator.php) and press Enter. A long list of SQL statements will be generated on the web page.
  7. Ensure that your post_content and post title fields on wp_posts table DOES NOT belongs to any indexes or FULLTEXT indexes. Else the type of the fields may not be converted to BLOB with one of the errors list below. Some plugins, such as related posts tend to add indexes to these fields. In this case, temporarily drop the indexes.
    ERROR 1170 (42000): BLOB/TEXT column 'post_content' used in key specification without a key length
    ERROR 1283 (HY000): Column 'post_content' cannot be part of FULLTEXT index
  8. Login to your server shell by telnet or SSH. You can skip this part of using Unix shell if you intend to use phpMyAdmin to do the dirty work, but I have not tried it. So if you do, do feedback on whether it can be done.
  9. Connect to MySQL server from the shell.
  10. Issue the following command first in MySQL prompt:
    use DATABASENAME;

    Again, replace DATABASENAME in red to the actual WordPress database name.

  11. Then copy and paste the whole list of SQL statements auto generated by the conversion script, and paste them into the MySQL prompt. Each and every SQL command should now be processed and executed by MySQL one by one. You may need to press Enter key to finish off the last one.
  12. During the processing, the similar error messages related to key length as mentioned may appears. In my case, the conversion to BLOB failed with such message in the following fields:

    wp_categories.category_nicename
    wp_comments.comment_approved
    wp_links.link_visible
    wp_options.option_name
    wp_postmeta.meta_key
    wp_posts.post_status
    wp_posts.post_name
    wp_posts.post_type
    wp_usermeta.meta_key
    wp_users.user_login

    All these fields are unlikely to contains non-ASCII characters. And fields such as category_nicename (category slug) and post_name (post slug) have been URL encoded (where your URL with unsafe non-alphanumeric characters will be replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs). Initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII, so direct conversion of these fields to UTF8 should not bring too much problem.

  13. Edit the wp-config.php file to add in DB_CHARSET and DB_COLLATE definitions. Add the following two lines, preferably under the section of MySQL Settings:
    define('DB_CHARSET', 'utf8');
    define('DB_COLLATE', '');

    As explained in WordPress Codex, DB_COLLATE is left blank (null) so that the database collation will be automatically assigned by MySQL based on the database character set.

  14. Recreate the indexes and/or FULLTEXT indexes been dropped, if any.
  15. Activate the blog back into production mode.
  16. Check your blog to see if everything and every characters is okay.
  17. Delete the PHP script.