I recently had to find the rows in a database that contained any multibyte characters. My first thought was to leverage the difference between length() and char_length(). The query would be similiar to this:
Select pKey, dataCol From myTable Where length(dataCol) != char_length(dataCol)
To my surprise this query didn't work. After mulling it over in my mind, I tried a few debugging statements such as:
Select pKey, dataCol, length(dataCol), char_length(dataCol) From myTable
That only reaffirmed that all return values of the char_length() were equal to the length() results.
Then it hit me, I (or rather the database) was using the latin1 character set. So I quickly cast the data into utf8 and it worked! Below is the resulting query.
Select pKey, dataCol From myTable Where length( cast(dataCol using utf8) ) != char_length( cast(dataCol using utf8) )