基本概念
查看可用字符集
mysql> show character set;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
。。。。。。
+----------+-----------------------------+---------------------+--------+
40 rows in set (0.00 sec)
#或者查看character_sets
mysql> desc information_schema.character_sets;
+----------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+----------------------+-------------+------+-----+---------+-------+
| CHARACTER_SET_NAME | varchar(32) | NO | | | |
| DEFAULT_COLLATE_NAME | varchar(32) | NO | | | |
| DESCRIPTION | varchar(60) | NO | | | |
| MAXLEN | bigint(3) | NO | | 0 | |
+----------------------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
mysql字符集包括字符集和校对规则。
字符集定义mysql存储字符串的方式;校对规则用来定义比较字符串的方式。
可以使用show collation like 。。。或information_schema.collations来查看字符集的校对规则
mysql> show collation like 'gbk%';
+----------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+----------------+---------+----+---------+----------+---------+
| gbk_chinese_ci | gbk | 28 | Yes | Yes | 1 |
| gbk_bin | gbk | 87 | | Yes | 1 |
+----------------+---------+----+---------+----------+---------+
2 rows in set (0.00 sec)
#或者查看collations
mysql> desc information_schema.collations;
+--------------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------------+-------------+------+-----+---------+-------+
| COLLATION_NAME | varchar(32) | NO | | | |
| CHARACTER_SET_NAME | varchar(32) | NO | | | |
| ID | bigint(11) | NO | | 0 | |
| IS_DEFAULT | varchar(3) | NO | | | |
| IS_COMPILED | varchar(3) | NO | | | |
| SORTLEN | bigint(3) | NO | | 0 | |
+--------------------+-------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
- 校对规则命名
- _ci 大小写不敏感
- _cs 大小写敏感
- _bin 二元 (比较基于字符编码的值而与language无关)
mysql> select case when 'A' collate utf8_general_ci = 'a' collate utf8_general_ci then 1 else 0 end;
+---------------------------------------------------------------------------------------+
| case when 'A' collate utf8_general_ci = 'a' collate utf8_general_ci then 1 else 0 end |
+---------------------------------------------------------------------------------------+
| 1 |
+---------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> select case when 'A' collate utf8_bin = 'a' collate utf8_bin then 1 else 0 end;
+-------------------------------------------------------------------------+
| case when 'A' collate utf8_bin = 'a' collate utf8_bin then 1 else 0 end |
+-------------------------------------------------------------------------+
| 0 |
+-------------------------------------------------------------------------+
1 row in set (0.00 sec)
字符集的设置
可以在my.cnf中设置
[mysqld]
character-set-server=utf8
也可以在启动时设置
# mysqld --character-set-server=utf8
查看当前字符集和校对规则
mysql> show variables like 'character_set_server';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_server | utf8 |
+----------------------+-------+
1 row in set (0.02 sec)
mysql> show variables like 'collation%';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database | utf8_general_ci |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)
连接字符集和校对规则
- 对于客户端与服务器交互,mysql提供三个参数:
- character_set_client 客户端
- character_set_connection 连接
- character_set_results 返回结果
大多数情况下三个参数值是一致的,这样才能确保数据写入和正确的读出。
可以使用如下命令设置三个参数,但是这是针对每次连接的
mysql> set names utf8;
Query OK, 0 rows affected (0.00 sec)
更简单的方式,是在my.cnf中设置,这样所有的连接默认使用utf8进行
[mysql]
default-character-set=utf8
字符集的修改步骤
应用在运行一段时间后,如果想更改字符集,不能使用”alter database character set “或”alter table character set *“来进行,因为该命令只对新数据有用,旧的数据没有效果。
step 1:导出表结构
#--default-character-set表示以什么字符串连接 -d表示只导出表结构
# mysqldump -uroot -p --default-character-set=utf8 -d dbname > createtab.sql
step 2:修改createtab.sql中表定义中的字符集为新字符集
step 3:确保记录不在更新,导出数据
#--quick:用于转储大表。强制mysqldump从服务器一次一行的检索表而不是一次检索所有行,并在输出前缓存在内存中
#--extended-insert:使用多行insert语法。可以加快插入速度。
#--no-create-info:不导出每个转储表的建表语句
#--default-character-set:按照原字符集导出数据,防止乱码
# mysqldump -uroot -p --quick --no-create-info --extended-insert --default-character-set=gbk dbname > data.sql
step 4:将set names gbk改成set names utf8
step 5:使用新的字符集创建新的数据库
mysql> create database dbname default charset utf8;
step 6:创建表
# mysql -uroot -p dbname < createtab.sql
step 7:导入数据
# mysql -uroot -p dbname < data.sql
注意:目标字符集最好是源字符集的超集,否则可能导致乱码