Tuesday, July 26, 2011

[Update] 关于 Mac OS X 的 filesystem encoding

跟上一个问题有关。在 stackoverflow 上问了问题,得到了如下答复 [1]:

The problem is that MacOS X's default filesystem changes all filenames you give it to an unusual normalization form which does not use precomposed characters. 

根据 Python unicodedata 的文档 [2]:

The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0327 (COMBINING CEDILLA) U+0043 (LATIN CAPITAL LETTER C).

其中的 U+00C7 是 'Ç'

对应的 python 源代码我也更新了。

请参考:

No comments: