Thinking in Arithmandar Way: L10N


March 17, 2011

Localized Vocabulary

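Every region tends to have its own idioms, and small differences in everyday habits as well. The gap between city and countryside also shapes how people live. In a small restaurant in the city, for example, you are usually asked to pay as soon as you order, while in a small restaurant on the outskirts you normally settle the bill when you have finished eating and are about to leave. No rush!

After all, when a group of people live together in one area, they gradually develop their own ways of interacting, and language and writing emerge the same way. People in one township may be at their most enthusiastic during temple fairs, while the township next door may hold its biggest celebration for Pudu (the Ghost Festival rites). Habits change, too. Although the law now requires vehicles to yield to pedestrians on a zebra crossing, as a pedestrian you can cross with more confidence in the city, whereas in the countryside you had better watch carefully for cars.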

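As for vocabulary, imagine I leave home early in the morning and run into a neighbor: "嘿，早安！" (Hey, good morning!) At the breakfast shop: "老闆，包子一個，米漿一杯，帶走。" (Boss, one steamed bun and one cup of rice milk, to go.) At the office, I see the security guard: "早！" (Morning!) At noon, at McDonald's: "我要一號餐，外帶。" (I'd like meal number one, takeout.) In the afternoon, buying coffee at 7-Eleven: "中熱拿，不要糖！" (Medium hot latte, no sugar!) In the evening I get thirsty and stop at a drink shop: "綠茶一杯，無糖，去冰。" (One green tea, sugar-free, no ice.) Several similar expressions show up here: "早安 / 早" (good morning), "帶走 / 外帶" (to go), "不要糖 / 無糖" (no sugar), and so on. All of them are customary, commonly used local expressions, so whichever one you say, everyone understands easily.

Suppose a passer-by stops you on the street and asks, "請問，要到 101 ，打車過去快還是搭公車？" (Excuse me, to get to 101, is it faster to "打車" or to take the bus?) I can tell you that this passer-by is very likely from mainland China, because "打車" does not mean hitting (打) a car (車); it means hailing a taxi. Taiwan also uses the character "打", as in "打電話" (make a phone call) and "打卡" (punch a time card). So why do you never suspect that these mean smashing the phone or smashing the card? Because that is how we are used to saying it. But we do not say "打車"!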

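A-Song (阿松), the host of the TV show "大口吃遍台灣", often uses the wrong measure word when ordering snacks, for example "老闆，那個湯，我要一個，裡面用。" (Boss, that soup, I want "one", eating in.) Measure words are hard to learn in the first place: 一個 (one, generic), 一碗 (a bowl), 一份 (a portion), 一杯 (a cup), 一瓶 (a bottle), 一張 (a sheet), 一箱 (a box), 一桶 (a bucket)... Sometimes saying "一個" really is the quickest, even if it sounds odd. It is like saying one cup, one meal, one beer, one cloth... in English: people basically understand you, and you do not need to insist on "one piece of xxxx" and the like.

Vocabulary differences cause some small annoyances in online games as well. Take StarCraft, which is very popular right now: the Taiwan servers have many players from mainland China, so Taiwanese players have pointed out differences in vocabulary and phrasing between the two sides, for example: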

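Mainland China usage	Taiwan usage	Meaning
造房子	建房子	build a building
出2隊機槍兵	生2隊機槍兵	produce 2 squads of Marines
來2隊小狗	來2隊小蟲	send 2 squads of Zerglings (literally "little dogs" vs. "little bugs")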

July 7, 2009

Will conversion between Simplified and Traditional Chinese work?

I know that many people still do not know the difference between Simplified Chinese and Traditional Chinese. A Google search will turn up plenty of material to help them understand it. I am not going to cover that here. What I want to discuss is: "Will conversion between Simplified Chinese and Traditional Chinese work?"

A good conversion is not simply a matter of finding a character mapping, or an incomplete phrase mapping like the one below.

The conversion above seems fine, but it is not. Note the last two phrases: people in China usually use "设置" and "默认" instead of the ones in the table above.
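
To make the difference concrete, here is a minimal Python sketch that contrasts a character-level mapping with a phrase-level mapping. The tiny tables and the Traditional-side entry "設定" are assumptions for illustration only; a real converter would need far larger dictionaries, and even then the wording choices discussed in this post remain a matter of local preference.

```python
# Character-level table: each Traditional character maps to its Simplified form.
CHAR_MAP = {"設": "设", "預": "预", "認": "认"}

# Phrase-level table: regional wording, not just a different script.
# These entries are assumed for illustration.
PHRASE_MAP = {"設定": "设置", "預設": "默认"}


def convert_chars(text: str) -> str:
    """Convert character by character; unknown characters pass through."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def convert_phrases(text: str) -> str:
    """Replace known phrases first (longest first), then fall back to characters."""
    phrases = sorted(PHRASE_MAP, key=len, reverse=True)
    result = []
    i = 0
    while i < len(text):
        for phrase in phrases:
            if text.startswith(phrase, i):
                result.append(PHRASE_MAP[phrase])
                i += len(phrase)
                break
        else:
            # No phrase starts here, so convert the single character.
            result.append(convert_chars(text[i]))
            i += 1
    return "".join(result)


if __name__ == "__main__":
    for word in ("設定", "預設"):
        print(word, convert_chars(word), convert_phrases(word))
```

The character-level result is readable Simplified Chinese, but it is not what a reader in China expects to see in a user interface, which is the gap a plain character table cannot close.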

Some people get upset when they read translated articles that do not use local phrases. Some phrases come from other companies; some come from a different culture and different habits. Think about why you describe something as "big" in some cases but "huge" in others. "Huge" may generally be bigger than "big", but that is not my point. The point is that people in different cultures express the same thing in different ways.

Both "預設" and "默认" refer to the English word "default". Personally I think both translations are correct. Let's look at the definition from the Cambridge Dictionary:

what exists or happens if you do not change it intentionally by performing an action

So maybe I could also translate "default" as "原始" (original) or "最初" (initial) in Chinese, since they all have similar meanings.

Keep in mind that even when such phrases are similar and can express the same thing, people still have preferences. It is like what you prefer to eat in the morning, or what you are used to living with.

Some people think that conversion between Simplified and Traditional Chinese is like converting "colour" to "color" (en-GB to en-US). It is, in most cases, but not in all. Here is another example: how would you translate "male / female" into Spanish? I learned that in some cases "hombre / hembra" is fine for describing a person's gender, but in other cases (in some regions) these words are used to describe the sex of animals.

Here is an interesting example in Chinese. In casual conversation, if you ask about someone's body weight, you say "你多重？" in Taiwan and "你几斤？" in China. The latter sounds funny to people in Taiwan, because there "斤" normally refers to the Taiwan jin (台斤, 600 g), a unit of weight mostly used in traditional markets, as in "This pork weighs one Taiwan jin." In China, however, "斤" is an everyday unit equal to half a kilogram (500 g) and is commonly used for body weight.

So perhaps someone, or some company like Google, can build a very large phrase database that covers most phrase mappings between Simplified and Traditional Chinese, but I am not sure any technique can resolve the cultural issues. People in different cities or regions still talk differently, and the way they describe something may still differ. When you meet your neighbor in the street, you say "Hey, how's everything?" as a simple polite hello. That is fine. Many people in Taiwan may say "Hey, have you eaten yet?" or "Are you full?", which in Chinese is "你吃飽了沒？". Does that sound weird to you? It is just a polite greeting, no big deal. But I do not think converting the Traditional Chinese into Simplified Chinese will work here: people in China do not say "你吃饱了没？".

So, will conversion between Simplified and Traditional Chinese work?

No, I don't think so.

August 11, 2008

Talking about "Ready for Localization?" Again

Recently a colleague from the UK asked me (and our team) for a favor. They had run into a problem: they thought they had localized everything, but much of the text still showed up in English. I guess he has only limited knowledge of the code, but he had tried his best to dig deeper into the materials. The localization project did not seem to involve any engineers; I would say the project was underestimated.

There are several reasons why so much text was still in English, and it is pretty bad. The material seems never to have been checked for I18N or G11N readiness, let alone put through a pseudo-localization test.

The main product is a set of Flash files. Ideally, all the text presented in the front end is stored in the back-end database. The ActionScript code pulls the text from the database and generates several XML files at run time, and the Flash files then read those XML files to present the text.

I saw one common problem, a mistake that most UI designers are likely to make. Normally, when a UI designer creates a new object such as a button or a label (a text field), he or she also puts a default string in it. For example, when creating a cancel button, you might put "Cancel" in the button's text field, so that in design mode it looks just as it will at run time. This seems fine, but it is not. It becomes hard to detect which objects have not yet been externalized or linked to the external text source, because everything "looks fine" in your testing.

A UI designer may not know much about I18N or G11N. In some cases that is fine, but they had better learn more about these areas.

A simple way to improve the process and prevent the problem is to use code-like text. For example, use "_CANCEL_" as the default text for the button.

Since the product will be localized into many languages, never leave the English text as the default text. Treat English as just another language: load all the English text from the external text source, exactly as you plan to do for every other language.

That way the product will be easier to localize and there will be fewer problems.
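
The placeholder idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the string tables, the keys other than "_CANCEL_", and the function names are hypothetical, and the real product was built with Flash and ActionScript rather than Python.

```python
import re

# Hypothetical externalized string tables, keyed by the placeholder
# that the designer leaves in the UI object at design time.
STRINGS = {
    "en": {"_CANCEL_": "Cancel", "_OK_": "OK"},
    "de": {"_CANCEL_": "Abbrechen", "_OK_": "OK"},
}

# A placeholder is code-like text such as _CANCEL_.
PLACEHOLDER = re.compile(r"^_[A-Z0-9_]+_$")


def render(ui_texts, locale):
    """Replace each placeholder with the string loaded for the locale."""
    table = STRINGS[locale]
    return [table.get(text, text) for text in ui_texts]


def find_unexternalized(ui_texts):
    """Return design-time texts that are hard-coded instead of placeholders."""
    return [text for text in ui_texts if not PLACEHOLDER.match(text)]


def find_missing(ui_texts, locale):
    """Return placeholders still visible after rendering (no string loaded)."""
    return [text for text in render(ui_texts, locale) if PLACEHOLDER.match(text)]


if __name__ == "__main__":
    ui = ["_CANCEL_", "_OK_", "Help", "_RETRY_"]
    print(find_unexternalized(ui))   # hard-coded text to externalize
    print(find_missing(ui, "de"))    # keys with no German string yet
```

Because English is loaded from the same table as every other language, a string that was never externalized stands out in the English build too, instead of silently "looking fine".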

April 25, 2008

Ready for Localization?

Is your software or document ready for localization?
Localizing anything is usually not that simple. Even if I just give you "Hello world" and ask you to localize it into several languages, there may be problems to resolve before the localization can start.

People who speak English may not be aware that nouns have gender in several languages. People who write technical documents in German may not realize that others may be unable to translate them properly because the content is too specific to a particular field.

Imagine you receive the word "Entropy". How would you translate it properly without enough context to refer to?
"Entropy" is a physics term, widely used in physics, thermodynamics, and other fancy areas such as image processing.
If you are familiar with Photoshop, you probably know that the term "entropy" appears there. A translator would need to look it up in dictionaries for several different fields, and might even need to consult some professors, to translate it properly.

Imagine I now send you "%s Power" and ask you to translate it into French. The result may not be good, because I did not give you enough information. See the examples below (results from Google Translate):
Maximum Power ==> La puissance d'attaque
Minimum Power ==> Moins d'énergie
Efficient Power ==> Puissance efficace

So how can you translate it properly if all I give you is "%s Power"? If that is all I give you, the string is not ready to be localized. This is really an I18N issue: the code is manipulating strings, which should not happen if you are going to localize into several languages.
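
As a rough illustration of the alternative, the Python sketch below keeps every complete phrase as its own resource entry instead of assembling it in code. The keys, the catalog layout, and the French strings other than "Puissance efficace" are assumptions for illustration and have not been reviewed by a translator.

```python
def label_by_concatenation(adjective: str) -> str:
    """The pattern the post warns about: code builds the phrase from a fragment."""
    return "%s Power" % adjective


# One complete, translatable string per key (hypothetical keys).
# In French the adjective follows the noun and agrees with it in gender,
# so a single "%s Power" pattern cannot be translated once and reused.
CATALOG = {
    "en": {
        "power.maximum": "Maximum Power",
        "power.minimum": "Minimum Power",
        "power.efficient": "Efficient Power",
    },
    "fr": {
        "power.maximum": "Puissance maximale",
        "power.minimum": "Puissance minimale",
        "power.efficient": "Puissance efficace",
    },
}


def label(key: str, locale: str) -> str:
    """Look up the whole phrase; fall back to English if it is missing."""
    return CATALOG.get(locale, {}).get(key) or CATALOG["en"][key]


if __name__ == "__main__":
    print(label_by_concatenation("Maximum"))
    print(label("power.maximum", "fr"))
```

With whole phrases in the catalog, the translator sees each string in full and can choose word order and agreement per entry.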

April 2, 2008

Doing L10N Testing, Asking If It's I18N Ready?

Still, some vendors are not familiar with the localization industry, or perhaps some of their employees are doing the tasks without really being familiar with it.

I have seen project team members, including the PM and the testers, who were not aware of what they were testing. They thought they were doing software L10N testing, but were they? And were they sure the software was I18N ready?

If you are doing L10N testing, first ask yourself: do you think the software is I18N ready? Are you really sure that the testing you were asked to perform is part of L10N testing?

Are you spending too much time checking whether the functionality works in the localized version? And are you finding many functional defects that are reproducible in more than one localized version, or even in the English version?

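November 12, 2007

Translating Differential Versions

I have written before about synchronized translation (not simultaneous interpreting), and here I continue with that idea. It occurred to me that this is sometimes really a matter of translating the differences between versions.

In the translation field, computer-aided translation software (Computer-Aided Translation, usually abbreviated CAT) and translation memory (Translation Memory) technology offer many ways to reuse existing translations between an old version and a new one.

For example, suppose the first 10 lines of file A in version 1.0 are as follows: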
properties.A.01.name=File
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
#
properties.B.06.name=Open
properties.B.07.name=Save
properties.B.08.name=Save as
properties.B.09.name=Close
and the corresponding Chinese translation is:
properties.A.01.name=檔案
properties.A.02.name=編輯
properties.A.03.name=檢視
properties.A.04.name=格式
properties.A.05.name=工具
#
properties.B.06.name=開啟
properties.B.07.name=儲存
properties.B.08.name=儲存為
properties.B.09.name=關閉

When the next version, 1.1, adds one line, the first 11 lines become:
properties.A.01.name=File
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
properties.A.06.name=Insert
#
properties.B.06.name=Open
properties.B.07.name=Save
properties.B.08.name=Save as
properties.B.09.name=Close
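Our goal is to reuse the existing translations as much as possible. From a developer's point of view, comparing 1.0 with 1.1 shows that a single line was added between line 5 and line 6 and nothing else changed. In theory, the Chinese translation file only needs the Chinese translation of this new sixth line inserted at the corresponding place. The Chinese file for version 1.1 should therefore look like this: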
properties.A.01.name=檔案
properties.A.02.name=編輯
properties.A.03.name=檢視
properties.A.04.name=格式
properties.A.05.name=工具
properties.A.06.name=插入
#
properties.B.06.name=開啟
properties.B.07.name=儲存
properties.B.08.name=儲存為
properties.B.09.name=關閉
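
For a key=value file like this one, the reuse described above can be sketched in Python as a key-based lookup. This is a simplified illustration, not how any particular CAT tool works; the function names are hypothetical, and the parser ignores escapes and line continuations that real properties files may contain.

```python
def parse_properties(text):
    """Parse 'key=value' lines into a dict, skipping blanks and '#' comments."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        entries[key] = value
    return entries


def leverage(old_source, old_target, new_source):
    """Reuse a translation when the key exists and its source text is unchanged.

    Returns (translated, untranslated): keys mapped to reused translations,
    and keys mapped to source text that still needs translating.
    """
    translated, untranslated = {}, {}
    for key, text in new_source.items():
        if old_source.get(key) == text and key in old_target:
            translated[key] = old_target[key]
        else:
            untranslated[key] = text
    return translated, untranslated


if __name__ == "__main__":
    en_10 = parse_properties("properties.A.05.name=Tool\n#\nproperties.B.06.name=Open")
    zh_10 = parse_properties("properties.A.05.name=工具\n#\nproperties.B.06.name=開啟")
    en_11 = parse_properties(
        "properties.A.05.name=Tool\nproperties.A.06.name=Insert\n#\nproperties.B.06.name=Open"
    )
    done, todo = leverage(en_10, zh_10, en_11)
    print(done)
    print(todo)  # only the new Insert entry is left for the translator
```

Comparing the old source text as well as the key means that an entry whose English wording changed is sent back for translation rather than silently keeping a stale translation.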

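In reality it is not that simple. The differences between versions are rarely just a few added lines; in complicated cases there may be dozens of additions, deletions, and modifications. Synchronizing versions by hand, by eyeballing the differences, is then no longer feasible and is prone to omissions.

Still, staying with the simple example above, some translation technologies can detect whether an existing translation pair matches the new source, and reuse the existing translation if it does.

However, to make translation more precise and to handle the same source text being translated differently in different places, some technologies use context matching, which meets the need for one-to-many mappings between source text and target text.

Take the English word "State": a dictionary will tell you that it can mean "州" (a state of a country) or "狀態" (a status), depending on where it appears. Translating "State" as "狀態" in an address input field would be unfortunate. Context matching gives translation matching a wider range of choices.

However, because of context matching, even an example as simple as the one above can leave some existing translations unmatched and unused. Some context-matching methods take one to several lines before and after a segment as reference text, so the following two examples are treated as different segments (the lines around a segment serve as its preceding or following context):
A: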
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
#
B:
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
properties.A.06.name=Insert

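So merely adding the Insert line as the sixth line may cause the fourth line to be regarded as having a different context in the new version than in the old one, and the existing translation pair will not be applied automatically (it is no longer considered a Perfect Match). Even if its match rate only drops to 100% or 99%, this can still mean extra work.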
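
The effect can be reproduced with a small Python sketch. It assumes the simplest possible scheme, one line of context before and one after each segment, and hypothetical match labels; real tools differ in how much context they use and how they score matches. With this scheme, the segment directly above the insertion point (the Tool line) is the one that loses its context match.

```python
def segments_with_context(lines):
    """Build (previous, text, next) triples, one line of context each way."""
    padded = [None] + list(lines) + [None]
    return [
        (padded[i - 1], padded[i], padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def match_quality(old_lines, new_lines):
    """Classify each new line as 'context', 'text-only', or 'none'.

    Assumes each line occurs once; duplicates would overwrite each other.
    """
    old_triples = set(segments_with_context(old_lines))
    old_texts = set(old_lines)
    result = {}
    for triple in segments_with_context(new_lines):
        text = triple[1]
        if triple in old_triples:
            result[text] = "context"      # same text, same neighbors
        elif text in old_texts:
            result[text] = "text-only"    # same text, neighbors changed
        else:
            result[text] = "none"         # new text
    return result


if __name__ == "__main__":
    a = ["properties.A.04.name=Format", "properties.A.05.name=Tool", "#"]
    b = [
        "properties.A.04.name=Format",
        "properties.A.05.name=Tool",
        "properties.A.06.name=Insert",
    ]
    for line, quality in match_quality(a, b).items():
        print(quality, line)
```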

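The synchronized translation, or collaborative translation, that I mentioned before is really about how best to handle translation between such differing versions. From a developer's point of view, when several people are developing the same program at once, one routine task is the code merge: you compare the differences between versions and then merge your version with someone else's, or with the main version.

Put simply, if I copied version 1.0 of file A and a week later file A changed to version 1.1, I can compare 1.0 with 1.1, find that an Insert entry was added at line 6, and add that difference at line 6 of my copy of 1.0, bringing it in sync with 1.1.

The translation-reuse techniques in today's common translation software do not quite work this way. As a result, a job that really only requires translating the single line "properties.A.06.name=Insert" and adding it to the Chinese version becomes more complicated.

The cause is roughly that existing translation technology matches only on translation pairs (Translation Pairs), at most adding the text before and after each pair as reference data, and does not actually record the differences between file versions.

Many version control systems for code have existed for a long time, such as RCS (Revision Control System), CVS (Concurrent Versions System), SVN (Subversion), and Microsoft's Visual SourceSafe. Some offer plain version control and some are built for many people developing at the same time, but by and large they all solve the management of differences between versions. If translation technology could also handle this kind of relationship between versions, it should improve translation results considerably.

In short, the example above ideally requires translating only "properties.A.06.name=Insert" into "properties.A.06.name=插入", which is then added back into the target language through the code merge concept.

Anyone familiar with branch-based development should find this easy to understand. Conceptually it is like creating a branch off the main trunk of the program (a branch for another language). Whenever the trunk changes, you compare its old and new versions and synchronize the differences into the branch, keeping the branch in sync with the trunk.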
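
The merge idea can be sketched with Python's standard difflib module: compute the diff between the old and new source files, then replay it on the translated file. This is a minimal sketch under a strong assumption, namely that the old source and old translation are line-aligned (same number of lines in the same order); the function name is hypothetical.

```python
import difflib


def sync_translation(old_source, new_source, old_target):
    """Replay the source diff (old -> new) onto the translated file.

    Inserted or changed source lines are copied through untranslated and
    also returned in a to-do list for the translator.
    """
    if len(old_source) != len(old_target):
        raise ValueError("source and target must be line-aligned")
    matcher = difflib.SequenceMatcher(a=old_source, b=new_source, autojunk=False)
    new_target, todo = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged source lines keep their existing translations.
            new_target.extend(old_target[i1:i2])
        elif tag in ("insert", "replace"):
            new_target.extend(new_source[j1:j2])
            todo.extend(new_source[j1:j2])
        # tag == "delete": the matching translated lines are dropped.
    return new_target, todo


if __name__ == "__main__":
    en_10 = ["properties.A.05.name=Tool", "#", "properties.B.06.name=Open"]
    zh_10 = ["properties.A.05.name=工具", "#", "properties.B.06.name=開啟"]
    en_11 = [
        "properties.A.05.name=Tool",
        "properties.A.06.name=Insert",
        "#",
        "properties.B.06.name=Open",
    ]
    zh_11, todo = sync_translation(en_10, en_11, zh_10)
    print("\n".join(zh_11))
    print("to translate:", todo)
```

Only the Insert line ends up in the to-do list; every other line keeps its translation regardless of what its neighbors are.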

November 9, 2007

Upgrade path

9/14/2005

All software has its own upgrade path. Mature software gives you a very clear way to carry out an upgrade.

A former colleague asked my opinion about a software development project I used to be part of. My opinion is that I agree with providing a more flexible and more professional way for users to upgrade their software and configuration from an old version to the latest one. This includes upgrading to a newly implemented localized version.

Imagine a German user who can only get by with the English software and tries his best to read those unfriendly English strings. Now there is a new German version. He will want to upgrade to the German version, which is the friendliest option for him.

That is why I agree that a way to upgrade to a localized version should be provided. I wonder why some policymakers decide not to provide it!

March 21, 2005

To Leverage Translations and Apply Pseudo-Localization

My colleague asked me a localization question:
If the resource file has been heavily updated and I have the translation of an old version, how do I leverage that translation, and how do I apply pseudo-localization to the newly added, untranslated strings?

OK, here are the steps. The list looks long, but at least it works.
To leverage translations from the old version:
1. Create a new Catalyst TTK project file
2. Insert the original English RC file that was sent out for translation
3. Save this TTK file (OldRC.ttk)
4. Right-click Resource.rc in the Navigator window, then select the "Import Translations" function
5. Find the translated RC file that was translated from the same version mentioned in step 2, use it as the input file, and wait for the import to complete
6. Save OldRC.ttk
7. Create a new TTK project file
8. Insert the latest Resource.rc
9. Save this TTK file (NewRC.ttk)
10. Select Tools -> Leverage Expert from the menu
11. Select OldRC.ttk as the input file and wait for the leveraging to complete
12. Save this TTK file
13. Use the Extract function to extract the leveraged Resource.rc

To apply pseudo-localization to the newly added, untranslated strings:
1. Open the leveraged NewRC.ttk
2. Select Tools -> Extract Terminology -> Glossary from the menu
3. Select "Tab Delimited", "From untranslated strings", "Include ampersands (hotkeys)", and "Exclude locked/frozen strings", then generate the glossary file
4. Open the generated glossary file with Excel. There will be two columns of data; delete the second column, then save the file
5. In Catalyst, create a new TTK project file, insert the glossary file, then perform Pseudo translation. Save this TTK file
6. Open NewRC.ttk again and use the leverage function to leverage translations from the TTK file created in step 5
7. Save NewRC.ttk, then extract Resource.rc
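
Outside Catalyst, the result of the two procedures can be approximated with a short Python sketch: keep every leveraged translation and pseudo-localize whatever is still untranslated. This is a tool-independent illustration, not Catalyst's API; the resource IDs, the accent table, and the bracket style are assumptions.

```python
# Swap ASCII vowels for accented ones so untranslated strings stand out.
ACCENTED = str.maketrans("aeiouAEIOU", "àéîõüÀÉÎÕÜ")


def pseudo_localize(text: str) -> str:
    """Return a bracketed, accented copy of the string.

    Two-character '%' sequences such as %s, %d, and %% are copied unchanged,
    and '&' hotkey markers are left alone because '&' is not in the table.
    """
    out = []
    i = 0
    while i < len(text):
        if text[i] == "%" and i + 1 < len(text):
            out.append(text[i:i + 2])
            i += 2
        else:
            out.append(text[i].translate(ACCENTED))
            i += 1
    return "[" + "".join(out) + "]"


def fill_untranslated(new_source, leveraged):
    """Keep leveraged translations; pseudo-localize everything else."""
    result = {}
    for key, text in new_source.items():
        if key in leveraged:
            result[key] = leveraged[key]
        else:
            result[key] = pseudo_localize(text)
    return result


if __name__ == "__main__":
    # Hypothetical resource IDs for illustration.
    source = {"IDS_OPEN": "Open", "IDS_INSERT": "Insert", "IDS_SAVE_AS": "&Save as"}
    leveraged = {"IDS_OPEN": "開啟"}
    for key, value in fill_untranslated(source, leveraged).items():
        print(key, value)
```

In a test build, any string that appears with brackets and accents is new and untranslated, and any string that appears in plain English was never picked up from the resource file at all.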
