How to Convert Character Set and Collation of WordPress Database

Since WordPress 2.2, WordPress supports feature that allows the user to define both the WordPress database character set and collation with DB_CHARSET and DB_COLLATE values in wp-config.php file. With these values defined, WordPress will use the designated database characterset (charset) and database collation (i.e. sort order of the letters, numbers, and symbols of a character set) when connecting a database tables.

However, in existing WordPress installations that are upgraded from earlier version of WordPress or not explicitly set a Unicode UTF-8 charset collation, the default database character set is normally set up as Latin1 (default on almost all MySQL installation) with latin1_swedish_ci collation. If you run a bilingual or multilingual blog with WordPress, you may face problem on character encoding when your blog posts are written in other foreign languages, or when you export and backup the database and later attempt to re-import the database dump in the event of database failure or server migration and moving. The symptom is obvious, your WordPress posts or pages will contains garbled, weird and funny characters, sometime just lots of ????? (question marks), rendering the WordPress database with your hard work useless and output unreadable. (May cause also by wrong charset collation)

The best solution to the character encoding problem in WordPress is to convert the charset or database and collation to UTF-8 or Unicode. However, you CANNOT simply connect to MySQL via shell or phpMyAdmin and hoping all your scripts will convert nicely. As explain by WordPress database conversion guide, convert character sets requires using the the MySQL ALTER TABLE command. When converting the character sets, all TEXT (and similar) fields are converted to UTF-8, but that conversion will BREAK existing TEXT because the conversion expects the data to be in latin1, but WordPress may have stored unicode characters in a latin1 database, and as a result, data could end up as garbage after a conversion!

The guide provides a very rough and vague guide as a solution on how to actually convert WordPress MySQL database tables from one character set to another, usually UTF-8. However, the guide actually works, although the process can be lengthy. To convert, the steps involved generally are to alter each and every TEXT and related fields inside every WP tables to BLOB, then alter the character set of database and finally change the BLOB fields back to TEXT. Looks easy, but how long it would take to convert so many fields on so many tables? Furthermore, you will also need to remember the original type and length or values of all fields.

andersapt has posted a conversion script named convert_to_utf8_sql_generator.txt which automatically generates a list of SQL statements and commands need to fully convert your WordPress database to UTF8 based on the guide. However, there seems to be a minor bug with the script, although the author claimed it worked, where in my case, it simply won’t generate the list of SQL commands to run due to the error “PHP Fatal error: Call to a member function get_results() on a non-object in convert.php on line 37″. Once fixed, with this script in hand we can easily and quickly convert the database, tables and fields to use utf8_general_ci collation.

Note: I have tried out UTF-8 Database Converter plugin, but it’s a failure. It seemed like the author change to character set directly.

Guide to Convert WordPress Database Character Set to UTF8 (Unicode)

  1. Take the WordPress blog offline by placing a out-of-service or maintenance notice.
  2. Backup database – this is very important, nothing is guaranteed to work. If you’re using cPanel or other control panel, it’s best to perform a database backup from the control panel itself, where you can restore the database in one piece instead of by SQL statements, in the case of normal dump.
  3. Download the fixed convert_to_utf8_sql_generator.txt script and save it with a PHP extension.
  4. Modify the script to input the database name your WordPress blog is using. Locate the following text:

    Tables_in_DATABASENAME

    The DATABASENAME in red is the only thing that you need to change to match your WordPress database name. It should looks like this after change, for instance,

    Tables_in_wp_mydigitallife

  5. Upload the convert_to_utf8_sql_generator.php (or you can rename to a shorter name such as convert.php) to the base root WordPress installation directory, where wp-config.php is also located.
  6. Now, call and browse the script from any web browser. To do this, simply add convert_to_utf8_sql_generator.php (or any name you give to the script) to the end of your blog URL (i.e http://www.mywebsite.com/convert_to_utf8_sql_generator.php) and press Enter. A long list of SQL statements will be generated on the web page.
  7. Ensure that your post_content and post title fields on wp_posts table DOES NOT belongs to any indexes or FULLTEXT indexes. Else the type of the fields may not be converted to BLOB with one of the errors list below. Some plugins, such as related posts tend to add indexes to these fields. In this case, temporarily drop the indexes.

    ERROR 1170 (42000): BLOB/TEXT column ‘post_content’ used in key specification without a key length

    ERROR 1283 (HY000): Column ‘post_content’ cannot be part of FULLTEXT index

  8. Login to your server shell by Telnet or SSH. You can skip this part of using Unix shell if you intend to use phpMyAdmin to do the dirty work, but I have not tried it. So if you do, do feedback on whether it can be done.
  9. Connect to MySQL server from the shell.
  10. Issue the following command first in MySQL prompt:

    use DATABASENAME;

    Again, replace DATABASENAME in red to the actual WordPress database name.

  11. Then copy and paste the whole list of SQL statements auto generated by the conversion script, and paste them into the MySQL prompt. Each and every SQL command should now be processed and executed by MySQL one by one. You may need to press Enter key to finish off the last one.
  12. During the processing, the similar error messages related to key length as mentioned may appears. In my case, the conversion to BLOB failed with such message in the following fields:

    wp_categories.category_nicename
    wp_comments.comment_approved
    wp_links.link_visible
    wp_options.option_name
    wp_postmeta.meta_key
    wp_posts.post_status
    wp_posts.post_name
    wp_posts.post_type
    wp_usermeta.meta_key
    wp_users.user_login

    All these fields are unlikely to contains non-ASCII characters. And fields such as category_nicename (category slug) and post_name (post slug) have been URL encoded (where your URL with unsafe non-alphanumeric characters will be replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs). Initial encoding of byte codes and character assignments for UTF-8 is consistent with ASCII, so direct conversion of these fields to UTF8 should not bring too much problem.

  13. Edit the wp-config.php file to add in DB_CHARSET and DB_COLLATE definitions. Add the following two lines, preferably under the section of MySQL Settings:

    define(’DB_CHARSET’, ‘utf8′);
    define(’DB_COLLATE’, ”);

    As explained in WordPress Codex, DB_COLLATE is left blank (null) so that the database collation will be automatically assigned by MySQL based on the database character set.

  14. Recreate the indexes and/or FULLTEXT indexes been dropped, if any.
  15. Activate the blog back into production mode.
  16. Check your blog to see if everything and every characters is okay.
  17. Delete the PHP script.

29 Responses to “How to Convert Character Set and Collation of WordPress Database”

Pages: [2] 1 » Show All

  1. Americanas Blogas Obviented » Blog Archive » Wordpress 2.8 – brak ogonków. Odsyfianie bazy.
    September 29th, 2009 06:35
    29

    [...] http://www.mydigitallife.info/2007/06/23/how-to-convert-character-set-and-collation-of-wordpress-dat... [...]

  2. blog from OUR kitchen » I think I have finally gotten rid of the stray Âs
    September 25th, 2009 12:29
    28

    [...] then, third time lucky, I got the instructions at mydigitallife.info: how to convert character set and collation of wordpress database to work for me! [...]

  3. MySQL Tech Bleg | Neptunus Lex
    June 14th, 2009 00:15
    27

    [...] encoding changes from latin to UTF-8 after version Wordpress 2.2. This occurred in May 0f 2007, but the potential solution is dauntingly complex for a mere [...]

  4. daniel
    May 29th, 2009 03:59
    26

    How does one “temporarily drop the indexes”? I’m getting the error referenced in Step 7 above. Thanks.

Pages: [2] 1 » Show All

Leave a Reply

You can use these tags: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Subscribe to comments feature has been disabled. To receive notification of latest comments posted, subscribe to My Digital Life Comments RSS feed or register to receive new comments in daily email digest.

New Articles

Incoming Search Terms for the Article

wordpress character encoding - mysql change collation utf8 - mysql change collation all columns - change wordpress to charset=iso - how to change oracle database encoding to UTF8 - convert char to char in column - wordpress rss wrong character set - changing mysql db collation - dotclear to wordpress charset trouble - latin1_swedish_ci converter - windows server 2005 unicode russian character set csv - character encoding wordpress - qtranslate chinese convert wordpress database UTF-8 - wordpress russian characters display charset] - change encoding sql database wordpress - latin1_swedish_ci convert utf-8 - MySQL connection collation for wordpress - BLOB/TEXT column 'post_content' used in key specification without a key length - converter Collation sql server - mysql connection collation for arabic wordpress - mysql collation español mexico numbers - phpmyadmin convert collation - sql convert latin - wordpress config collate - wordpress charset - as collation convert - convertCharset usage - convert charset php - dump database in exact charset - how to define database characterset phpmyadmin - mysql convert data from utf-8 bytes -mysqldump - mysql character converting - new wordpress tables use swedish collation - office 2007 convert to utf8 - php convert charset - php charset - phpmyadmin how to change Collation - problème charset wordpress etch - sql CONVERT collation - url slug foreign characters - utf-c conversion phpmyadmin - wordpress blog charset - wordpress database encoding - blob latin2 utf8 - cannot be part of FULLTEXT index - change the character set in wordpress - collation utf8 unicode - convert database latin2 utf8 - convert charset of excel file - convertir collation -