Article Image
read

I recently had to find the rows in a database that contained any multibyte characters. My first thought was to leverage the difference between length() and char_length(). The query would be similiar to this:

Select pKey, dataCol
From myTable
Where length(dataCol) != char_length(dataCol)

To my surprise this query didn't work. After mulling it over in my mind, I tried a few debugging statements such as:

Select pKey,
  dataCol,
  length(dataCol),
  char_length(dataCol)
From myTable

That only reaffirmed that all return values of the char_length() were equal to the length() results.

Then it hit me, I (or rather the database) was using the latin1 character set. So I quickly cast the data into utf8 and it worked! Below is the resulting query.

Select pKey, dataCol
From myTable
Where length( cast(dataCol using utf8) ) !=
        char_length( cast(dataCol using utf8) )
Blog Logo

Jon Dowdle

Jon Dowdle currently works on the internet at a lovely place called InVision.


Published

Color

Jon Dowdle's Blog

The personal blog of Jon Dowdle.

Back to Overview