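The test walkthrough is fairly long-winded; feel free to skip straight to the conclusion at the bottom of the article.
Test environment
Python 2.7.5
Text to read and write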
    # -*- coding: utf-8 -*-

    poetry = """
    相思
    唐代:王维
    红豆生南国,春来发几枝。
    愿君多采撷,此物最相思。
    """
1. Reading and writing Chinese directly (works)

    # -*- coding: utf-8 -*-

    # Write (works)
    f = open("相思.txt", "w")
    f.write(poetry)
    f.close()

    # Read (works)
    f = open("相思.txt", "r")
    print(f.read())
    f.close()
2. Reading and writing Chinese after importing from __future__ (error)

    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals, print_function

    # Write (error)
    f = open("相思.txt", "w")
    f.write(poetry)
    f.close()

    """
    Traceback (most recent call last):
      File "code_demo.py", line 18, in <module>
        f.write(poetry)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
    """

    # Read (works)
    f = open("相思.txt", "r")
    print(f.read())
    f.close()
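The failure above comes from an implicit conversion: when a unicode object is passed to the built-in file's write() in Python 2, it is encoded with the default codec, which is ASCII. Below is a minimal sketch of that step and of one possible workaround (encoding to UTF-8 by hand and writing bytes). The u"" prefix stands in for unicode_literals, and the scratch file name poem.txt is a hypothetical name chosen for illustration.

```python
# -*- coding: utf-8 -*-
import os
import tempfile

# A unicode string, the same type unicode_literals gives to plain literals.
poetry = u"红豆生南国,春来发几枝。"

# Encoding Chinese text with the ASCII codec fails; this is the conversion
# that Python 2's file.write() attempts behind the scenes.
try:
    poetry.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Workaround sketch: encode explicitly and write bytes in binary mode.
path = os.path.join(tempfile.mkdtemp(), "poem.txt")  # hypothetical scratch file
with open(path, "wb") as f:
    f.write(poetry.encode("utf-8"))

# Read the bytes back and decode them to recover the original text.
with open(path, "rb") as f:
    restored = f.read().decode("utf-8")

print(ascii_ok)
print(restored == poetry)
```

Encoding by hand works, but it pushes the encode/decode step onto every caller, which is why the io and codecs approaches in the following sections are usually more convenient.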
3. Reading and writing Chinese through io after importing from __future__ (works)

    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals, print_function
    import io

    # Write (works)
    f = io.open("相思.txt", "w")
    f.write(poetry)
    f.close()

    # Read (works)
    f = io.open("相思.txt", "r")
    print(f.read())
    f.close()
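When io.open is called without an encoding, it falls back to the locale's preferred encoding, so the result can differ between machines. A hedged variant that passes encoding="utf-8" explicitly is sketched below; the scratch file name poem.txt is an assumption made for the example, and the u"" prefix stands in for unicode_literals.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

poetry = u"相思\n唐代:王维\n红豆生南国,春来发几枝。\n愿君多采撷,此物最相思。\n"

path = os.path.join(tempfile.mkdtemp(), "poem.txt")  # hypothetical scratch file

# With an explicit encoding, the bytes on disk do not depend on the locale.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(poetry)

# Text mode with the same encoding returns a unicode string again.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode exposes the raw UTF-8 bytes that were stored.
with io.open(path, "rb") as f:
    raw = f.read()

print(text == poetry)
```

The same code runs on Python 3, where io.open is simply an alias for the built-in open.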
4. Reading and writing Chinese directly with codecs (works)

    # -*- coding: utf-8 -*-
    import codecs

    # Write (works)
    f = codecs.open("相思.txt", "w")
    f.write(poetry)
    f.close()

    # Read (works)
    f = codecs.open("相思.txt", "r")
    print(f.read())
    f.close()
5. Reading and writing Chinese with codecs after importing from __future__ (error)

    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals, print_function
    import codecs

    # Write (error)
    f = codecs.open("相思.txt", "w")
    f.write(poetry)
    f.close()

    """
    Traceback (most recent call last):
      File "code_demo.py", line 19, in <module>
        f.write(poetry)
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
    """

    # Read (works)
    f = codecs.open("相思.txt", "r")
    print(f.read())
    f.close()
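6. Reading and writing Chinese with codecs and an explicit encoding after importing from __future__ (works)

    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals, print_function
    import codecs

    # Write (works); the encoding "utf-8" must be specified
    f = codecs.open("相思.txt", "w", "utf-8")
    f.write(poetry)
    f.close()

    # Read (works); no encoding is given, so the undecoded bytes are returned
    f = codecs.open("相思.txt", "r")
    print(f.read())
    f.close()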
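Passing the encoding on the read side as well makes codecs.open decode the file back into a unicode string, which gives a symmetric round trip. The sketch below illustrates this under a few assumptions: the scratch file name poem.txt is hypothetical, the u"" prefix stands in for unicode_literals, and only one line of the poem is used to keep it short.

```python
# -*- coding: utf-8 -*-
import codecs
import os
import tempfile

poetry = u"红豆生南国,春来发几枝。\n"

path = os.path.join(tempfile.mkdtemp(), "poem.txt")  # hypothetical scratch file

# Write: codecs encodes the unicode string to UTF-8 bytes.
with codecs.open(path, "w", encoding="utf-8") as f:
    f.write(poetry)

# Read: with the same encoding, the bytes are decoded back to unicode.
with codecs.open(path, "r", encoding="utf-8") as f:
    decoded = f.read()

# A plain binary read shows what is actually stored on disk.
with open(path, "rb") as f:
    raw = f.read()

print(decoded == poetry)
print(raw == poetry.encode("utf-8"))
```

When an encoding is given, codecs.open opens the underlying file in binary mode, so no newline translation is applied and the stored bytes match the encoded string exactly.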
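Summary

| Environment | String literal type | Read/write method | Read | Write |
| --- | --- | --- | --- | --- |
| Default | str (UTF-8 bytes) | open | OK | OK |
| unicode_literals imported | unicode | open | OK | Error |
| unicode_literals imported | unicode | io.open | OK | OK |
| Default | str (UTF-8 bytes) | codecs.open | OK | OK |
| unicode_literals imported | unicode | codecs.open | OK | Error |
| unicode_literals imported | unicode | codecs.open with encoding utf-8 | OK | OK |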
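In short, Python 2's default codec is ASCII. Without unicode_literals, string literals are byte strings (UTF-8 bytes here, because of the coding declaration), so they are written to the file as-is and direct reads and writes work. Once from __future__ import unicode_literals is in effect, the Chinese literals become unicode objects. Methods that do not handle unicode, namely the built-in open and codecs.open without an encoding, then try to encode them implicitly with the ASCII codec on write and fail with UnicodeEncodeError.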
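In Python 2.7 the built-in open function has no encoding parameter, so an encoding cannot be specified there. To read and write unicode Chinese text, use io.open, or use codecs.open with an explicit encoding.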
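From Python 3 onward, the built-in open function accepts an encoding parameter, so the encoding can be specified directly. Python 3 string literals are also Unicode by default, which removes most of these problems. When encoding is omitted the default still depends on the platform, so passing it explicitly is the safer habit.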
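For comparison, here is a minimal Python 3 sketch of the same round trip using only the built-in open. It is Python 3 only (the encoding keyword does not exist on Python 2's built-in open), and the scratch file name poem.txt is a hypothetical name chosen for illustration.

```python
import os
import tempfile

# In Python 3 a plain string literal is already Unicode text.
poetry = "相思\n唐代:王维\n红豆生南国,春来发几枝。\n愿君多采撷,此物最相思。\n"

path = os.path.join(tempfile.mkdtemp(), "poem.txt")  # hypothetical scratch file

# Built-in open takes the encoding directly; no io or codecs import is needed.
with open(path, "w", encoding="utf-8") as f:
    f.write(poetry)

with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(text == poetry)
```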