Unicode really is a fascinating thing...

It turns out Unicode even contains a character like this: U+0489


҉

If you can't see it, your browser or operating system probably doesn't support Unicode well enough yet to display this character.
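For readers whose browser shows nothing, here is a minimal Python sketch that asks the standard library's `unicodedata` module what U+0489 is. The variable names are only for illustration.

```python
import unicodedata

ch = "\u0489"  # the character discussed above

# Look up the official name and general category in the Unicode database.
name = unicodedata.name(ch)
category = unicodedata.category(ch)  # "Me" stands for "Mark, enclosing"
utf8_hex = ch.encode("utf-8").hex()

print(f"U+{ord(ch):04X} {name} [{category}] utf-8={utf8_hex}")

# An enclosing mark is drawn around the base character that precedes it,
# so this string has two code points but is rendered as one decorated glyph.
decorated = "a" + ch
print(len(decorated))
```

Because it is a combining mark, the character attaches itself to whatever comes before it, which is why it tends to look like a ring of comma-like strokes around a neighboring letter.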

OK, now let's look at something even more interesting:

First, read this article:
http://www.tipotheday.com/2007/08/26/wtf-is-this-character/

Then open this link and look at the page and at the title bar:
http://www.google.com/search?hl=en&q=%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%D2%89language+&btnG=Search
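To see what the query in that link actually contains, the percent-encoded bytes can be decoded and listed one code point at a time. This is a rough sketch using only the Python standard library; the URL is copied from the link above and merely split across lines for readability.

```python
import unicodedata
from urllib.parse import urlsplit, parse_qs

url = (
    "http://www.google.com/search?hl=en&q="
    "%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD"
    "%E2%80%AB%E2%80%AC%E2%80%AD%E2%80%AE%E2%80%AA%E2%80%AB%E2%80%AC%E2%80%AD"
    "%E2%80%AE%D2%89language+&btnG=Search"
)

# parse_qs decodes the percent-escapes as UTF-8 and turns "+" into a space.
query = parse_qs(urlsplit(url).query)["q"][0]

# Print every code point with its Unicode name, in stored (logical) order.
for ch in query:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>')}")

# Category "Cf" (format) marks the invisible control characters.
controls = [ch for ch in query if unicodedata.category(ch) == "Cf"]
print(sorted({f"U+{ord(ch):04X}" for ch in controls}))
```

The listing shows a long run of invisible formatting characters, then U+0489, then the ordinary word "language" and a trailing space.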

The explanation is here:
http://en.wikipedia.org/wiki/Unicode_control_characters#Bidirectional_text_control
For friends who can't reach Wikipedia, here it is pasted in:

Bidirectional text control


Unicode supports standard bidirectional text without any special characters. In other words, Unicode-conforming software should display right-to-left characters such as Hebrew letters as right-to-left simply from the properties of those characters. Similarly, Unicode handles the mixture of left-to-right text alongside right-to-left text without any special characters. For example, one can quote Arabic (“بسملة”) right alongside English and the Arabic letters will flow from right-to-left and the Latin letters left-to-right. However, support for bidirectional text becomes more complicated when text flowing in opposite directions is embedded hierarchically, for example if one quotes an Arabic phrase that in turn quotes an English phrase. Other situations may complicate this, for example when an author wants the left-to-right characters overridden so that they flow from right-to-left. While these situations are fairly rare, Unicode provides seven characters (U+200E, U+200F, U+202A, U+202B, U+202C, U+202D, U+202E) to help control these embedded bidirectional text levels up to 61 levels deep.
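The seven characters named in the excerpt can be looked up in Python's bundled Unicode database. The sketch below prints each one's name and bidirectional class; the list `BIDI_CONTROLS` is just a name chosen here for illustration.

```python
import unicodedata

# The seven code points listed in the excerpt above.
BIDI_CONTROLS = [0x200E, 0x200F, 0x202A, 0x202B, 0x202C, 0x202D, 0x202E]

rows = [
    (cp, unicodedata.name(chr(cp)), unicodedata.bidirectional(chr(cp)))
    for cp in BIDI_CONTROLS
]

for cp, name, bidi_class in rows:
    print(f"U+{cp:04X}  {bidi_class:<3}  {name}")
```

The two marks (U+200E, U+200F) simply behave like an invisible strong left-to-right or right-to-left letter, while the embeddings and overrides (U+202A to U+202E) open or close a nested directional level.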

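In fact, in both of these examples the character ringed with commas is only a decoy. What really does the work is the run of bidirectional control characters in the range U+202A through U+202E. Since they are all invisible, a visible decoy is needed so that you have something to select and copy.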

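Even more fun: view the source of either of the two pages above and I guarantee you'll go crazy, because the source code is displayed backwards too...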

In reality, though, the data is still stored in its normal order. The problem lies in how the text is rendered.

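So if every editor faithfully follows the Unicode standard, how are we ever supposed to see the text in its true order? That seems to have turned into a paradox.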
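One possible way out is a tool that deliberately does not render the format characters and escapes them instead. The sketch below is only an illustration of that idea: `reveal` is a hypothetical helper, not something defined by the Unicode standard or by any particular editor.

```python
import unicodedata


def reveal(text: str) -> str:
    """Replace invisible format characters (category Cf) with visible tags."""
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cf":
            out.append(f"<U+{ord(ch):04X}>")
        else:
            out.append(ch)
    return "".join(out)


# RIGHT-TO-LEFT OVERRIDE, then "abc", then POP DIRECTIONAL FORMATTING.
sample = "\u202Eabc\u202C"

# In memory the letters are still in logical order, whatever the screen shows.
print(sample[1:4])

revealed = reveal(sample)
print(revealed)        # <U+202E>abc<U+202C>
print(ascii(sample))   # the built-in ascii() escapes them as well
```

Once the control characters are replaced by plain ASCII tags, nothing is left to trigger the reordering, so the text is displayed in the order in which it is stored.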

One more sigh: I18N really is a complicated problem...
