Thinking in Arithmandar Way: Translation


July 7, 2009

Will conversion between Simplified and Traditional Chinese work?

I know many people still don't know the difference between Simplified Chinese and Traditional Chinese. A Google search will turn up plenty of material that explains it. I am not going to cover that here. Instead, I want to discuss one question: "Will conversion between Simplified Chinese and Traditional Chinese work?"

It is not enough to find a character mapping, or an incomplete phrase mapping, and expect a good conversion.

A character-by-character conversion may look fine, but it isn't. For "Settings" and "Default", for example, people in mainland China usually say "设置" and "默认" rather than the character-converted forms of the terms used in Taiwan.
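To make the difference concrete, here is a minimal Python sketch contrasting a character-level mapping with a phrase table that is tried first (longest match wins). Both tables are hypothetical two-entry examples, not taken from any real converter. A phrase table only fixes vocabulary; it does nothing for the cultural differences described in the rest of this post.

```python
# Hypothetical, tiny tables for illustration only; real converters need far larger ones.
CHAR_MAP = {"設": "设", "預": "预"}            # Traditional -> Simplified characters
PHRASE_MAP = {"設定": "设置", "預設": "默认"}  # Taiwan term -> usual mainland term


def convert_chars(text: str) -> str:
    """Naive approach: convert each character independently."""
    return "".join(CHAR_MAP.get(ch, ch) for ch in text)


def convert_phrases(text: str) -> str:
    """Prefer the longest matching phrase; fall back to single characters."""
    longest = max(map(len, PHRASE_MAP), default=1)
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in PHRASE_MAP:
                out.append(PHRASE_MAP[chunk])
                i += size
                break
        else:  # no phrase matched at this position
            out.append(CHAR_MAP.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(convert_chars("預設值"))    # 预设值: readable, but not what mainland users say
print(convert_phrases("預設值"))  # 默认值
```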

Some people get upset when they read translated articles that don't use local phrases. Some phrases come from other companies' products; others come from a different culture and different habits. Think about why you sometimes describe something as "big" and sometimes as "huge". "Huge" usually means bigger than "big", but that's not my point: people in different cultures simply have different ways of expressing the same thing.

Both "預設" and "默认" mean "Default" in English, and personally I think both translations are correct. Here is the Cambridge Dictionary's definition:

what exists or happens if you do not change it intentionally by performing an action

So I could also translate "Default" as "原始" (original) or "最初" (initial), since their meanings are similar.

Keep in mind that even when phrases are similar and can describe the same situation, people still have preferences, much like what you prefer to eat in the morning or the way of life you are used to.

Some people think converting between Simplified and Traditional Chinese is like converting "colour" to "color" (en-GB to en-US). In most cases it is, but not in all. Here is another example: how would you translate "male / female" into Spanish? I learned that in some cases "hombre / hembra" is fine for describing a person's gender, but in some regions these words are used to describe the sex of animals.

An interesting example from Chinese: in casual conversation, to ask someone's body weight you would say "你多重？" ("How heavy are you?") in Taiwan, but "你几斤？" ("How many jin are you?") in mainland China. The latter sounds funny to people in Taiwan, because there "斤" normally means the Taiwan jin (台斤, 600 g), a traditional unit mostly used in markets, as in "This pork weighs one jin." In mainland China, however, "斤" is the shi jin (市斤), which is 500 g, or half a kilogram.
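The two jin are different units, which adds to the confusion. The sketch below uses the standard definitions (mainland 市斤 = 500 g, Taiwan 台斤 = 600 g); the helper name is made up for illustration. A 60 kg person in mainland China would answer "120 jin", which a listener thinking in Taiwan jin would hear as 72 kg.

```python
# Standard unit sizes in grams; the helper below is illustrative only.
SHI_JIN_G = 500     # 市斤, the jin used in mainland China
TAIWAN_JIN_G = 600  # 台斤, the jin used in Taiwan's traditional markets


def kg_to_jin(kg: float, grams_per_jin: int) -> float:
    """Convert kilograms to a count of jin of the given size."""
    return kg * 1000 / grams_per_jin


weight_kg = 60
answer = kg_to_jin(weight_kg, SHI_JIN_G)      # what a mainland speaker says
heard_as_kg = answer * TAIWAN_JIN_G / 1000    # the same number read as Taiwan jin
print(answer, heard_as_kg)  # 120.0 72.0
```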

So maybe someone, or a company like Google, could build a very large phrase database covering most of the phrase mappings between Simplified and Traditional Chinese, but I am not sure any technique can solve the cultural issues. People in different cities and regions still talk differently, and they may still describe the same thing in different ways. When you meet your neighbor in the street, you might say "Hey, how's everything?" as a simple, polite greeting. Many people in Taiwan instead say "Hey, have you eaten?" or "Are you full?"; in Chinese it's "你吃飽了沒？". Sounds weird to you? It's just a polite greeting, nothing special. But I don't think converting it from Traditional to Simplified Chinese works: people in mainland China don't say "你吃饱了没？".

So, will conversion between Simplified and Traditional Chinese work?

No, I don't think so.

April 25, 2008

Ready for Localization?

Is your software or document ready for localization?
It is usually not simple to localize anything. Even if I give you "Hello world" and ask you to localize it into several languages, there may already be problems to resolve before localization can start.

English speakers may not be aware that nouns have grammatical gender in many languages. People who write technical documents in German may not realize that others might not know how to translate them properly, because the content is too specific to one field.

Imagine you receive the word "Entropy". How should you translate it properly if there is not enough context for reference?
"Entropy" is a physics term, widely used in physics and thermodynamics as well as in other specialized areas such as image processing.
If you are familiar with Photoshop, you have probably seen the term "entropy" there. A translator may need to look it up in dictionaries from many different fields, and perhaps consult some professors, to translate it properly.

Now imagine I send you "%s Power" and ask you to translate it into French. The result may not be good, because I did not give you enough information. Here are some examples (results from Google Translate):
Maximum Power ==> La puissance d'attaque
Minimum Power ==> Moins d'énergie
Efficient Power ==> Puissance efficace

So how can you translate it properly if I only give you "%s Power"? You can't, which means the string is not ready to be localized. This is really an internationalization (I18N) issue: the code builds strings at runtime, which should be avoided if the text is going to be localized into several languages.
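One common way to avoid the problem, sketched here in Python under the assumption of a simple in-memory catalog (the message IDs and French strings are illustrative, not from a real product), is to give every complete phrase its own entry so the translator sees "Maximum Power" rather than "%s Power". French shows why this matters: the adjective follows the noun and agrees with its feminine gender ("puissance maximale").

```python
# Hypothetical message catalog: one entry per complete phrase, keyed by a stable ID,
# so each language can order and inflect the words as it needs.
CATALOG = {
    "en": {
        "power.max": "Maximum Power",
        "power.min": "Minimum Power",
        "power.efficient": "Efficient Power",
    },
    "fr": {  # illustrative translations only
        "power.max": "Puissance maximale",
        "power.min": "Puissance minimale",
        "power.efficient": "Puissance efficace",
    },
}


def fragile(adjective: str) -> str:
    """Builds text at runtime: a translator only ever sees '%s Power'."""
    return "%s Power" % adjective


def translate(msg_id: str, locale: str) -> str:
    """Look up a whole phrase, falling back to English if the locale lacks it."""
    return CATALOG.get(locale, {}).get(msg_id, CATALOG["en"][msg_id])


print(fragile("Maximum"))            # Maximum Power (English word order baked in)
print(translate("power.max", "fr"))  # Puissance maximale
```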

November 12, 2007

Translating the Differences Between Versions

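I have written before about synchronized translation (not simultaneous interpretation), and here I continue with that idea. It occurs to me that it is often really a matter of translating the differences between versions.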

In the translation field, computer-aided translation (CAT) software and translation memory (TM) technology offer many ways to reuse existing translations when moving from an old version to a new one.

For example, suppose these are the first 10 lines of file A in version 1.0:
properties.A.01.name=File
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
#
properties.B.06.name=Open
properties.B.07.name=Save
properties.B.08.name=Save as
properties.B.09.name=Close
and their Chinese translation is:
properties.A.01.name=檔案
properties.A.02.name=編輯
properties.A.03.name=檢視
properties.A.04.name=格式
properties.A.05.name=工具
#
properties.B.06.name=開啟
properties.B.07.name=儲存
properties.B.08.name=儲存為
properties.B.09.name=關閉

When the next version, 1.1, adds one line, the first 11 lines become:
properties.A.01.name=File
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
properties.A.06.name=Insert
#
properties.B.06.name=Open
properties.B.07.name=Save
properties.B.08.name=Save as
properties.B.09.name=Close
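Our goal is to reuse as much of the existing translation as possible. From a developer's point of view, a diff between 1.0 and 1.1 shows that exactly one line was inserted between lines 5 and 6 and nothing else changed. In theory, then, the Chinese file only needs the translation of the new line 6 inserted at the same position, so the Chinese version of 1.1 should look like this: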
properties.A.01.name=檔案
properties.A.02.name=編輯
properties.A.03.name=檢視
properties.A.04.name=格式
properties.A.05.name=工具
properties.A.06.name=插入
#
properties.B.06.name=開啟
properties.B.07.name=儲存
properties.B.08.name=儲存為
properties.B.09.name=關閉
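The developer's view can be reproduced with Python's standard difflib module. This is a sketch only: it assumes the files are compared line by line, and the indices in the output are 0-based, so ('insert', 5, 5, 5, 6) means one new line between the fifth and sixth lines of 1.0.

```python
import difflib

# The first lines of file A in versions 1.0 and 1.1, as in the example above.
v10 = [
    "properties.A.01.name=File",
    "properties.A.02.name=Edit",
    "properties.A.03.name=View",
    "properties.A.04.name=Format",
    "properties.A.05.name=Tool",
    "#",
    "properties.B.06.name=Open",
    "properties.B.07.name=Save",
    "properties.B.08.name=Save as",
    "properties.B.09.name=Close",
]
v11 = v10[:5] + ["properties.A.06.name=Insert"] + v10[5:]

# Each opcode is (tag, i1, i2, j1, j2): v10[i1:i2] becomes v11[j1:j2].
matcher = difflib.SequenceMatcher(a=v10, b=v11, autojunk=False)
changes = [op for op in matcher.get_opcodes() if op[0] != "equal"]
print(changes)  # [('insert', 5, 5, 5, 6)]
```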

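In practice it is not that simple. Differences between versions are rarely just a few added lines; a complex update may add, delete, and modify dozens of lines, and synchronizing versions by eyeballing the differences becomes impractical and error-prone.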

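Still, sticking with the simple example above: some translation tools can check whether an existing translation pair matches the new source text and, if it does, reuse the existing translation.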

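However, to make translations more accurate, and to handle source text that needs different translations in different places, some tools also compare the surrounding context, which supports one-to-many mappings from source text to target text.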

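Put simply, a dictionary lists several meanings for the English word "State", including "州" (a state of a country) and "狀態" (a status or condition). Translating State as "狀態" in an address field would be a mistake. Context matching lets the tool choose among such alternative translations instead of always using the same one.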

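However, because of context matching, even in an example as simple as the one above, some existing translations may fail to match and are not reused. Some context-matching methods use one or more lines before and after a segment as reference text, so the following two examples are treated as different segments. In each, the segment being matched is properties.A.04.name=Format, and the lines around it are the reference context:
A: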
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
#
B:
properties.A.02.name=Edit
properties.A.03.name=View
properties.A.04.name=Format
properties.A.05.name=Tool
properties.A.06.name=Insert
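The following Python sketch models one such scheme under stated assumptions: a segment's context is the two lines before and after it, and the memory also keeps a plain source-to-target lookup. Real CAT tools define context and match scores differently; the point is only to show how a single inserted line changes the context of the lines near it.

```python
# Assumption: context = up to WINDOW lines before and after a segment.
WINDOW = 2


def context_key(lines, i, window=WINDOW):
    """Key a segment by its own text plus its neighboring lines."""
    before = tuple(lines[max(0, i - window):i])
    after = tuple(lines[i + 1:i + 1 + window])
    return before, lines[i], after


def build_tm(source, target):
    """Build a context-keyed memory and a plain text-keyed memory."""
    context_tm = {context_key(source, i): t for i, t in enumerate(target)}
    exact_tm = dict(zip(source, target))
    return context_tm, exact_tm


def classify(lines, i, context_tm, exact_tm):
    if context_key(lines, i) in context_tm:
        return "context match"  # same text, same surroundings: reused automatically
    if lines[i] in exact_tm:
        return "exact match"    # same text, new surroundings: typically needs review
    return "no match"           # new text: needs translation


v10 = [
    "properties.A.01.name=File",
    "properties.A.02.name=Edit",
    "properties.A.03.name=View",
    "properties.A.04.name=Format",
    "properties.A.05.name=Tool",
    "#",
    "properties.B.06.name=Open",
]
zh10 = [
    "properties.A.01.name=檔案",
    "properties.A.02.name=編輯",
    "properties.A.03.name=檢視",
    "properties.A.04.name=格式",
    "properties.A.05.name=工具",
    "#",
    "properties.B.06.name=開啟",
]
v11 = v10[:5] + ["properties.A.06.name=Insert"] + v10[5:]

context_tm, exact_tm = build_tm(v10, zh10)
results = [classify(v11, i, context_tm, exact_tm) for i in range(len(v11))]
for line, result in zip(v11, results):
    print(f"{result:13} {line}")
```

With a window of two lines, the Format line (and the few lines after the insertion) drops from a context match to a plain text match, even though none of their own text changed.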

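So, merely because Insert was added as line 6, line 4 may be considered to have a different context in the new version than in the old one, and its existing translation will not be applied automatically (it is no longer treated as a Perfect Match). Even though its match score may drop only to 100% or 99%, this can still add extra work.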

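The synchronized (or collaborative) translation idea I discussed before is really about how best to handle translation between versions like these. From a developer's point of view, when several people work on the same program at the same time, one of their routine tasks is a code merge: comparing the differences between versions, then merging one's own version with someone else's or with the main version.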

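Put simply, if I copied version 1.0 of file A and, a week later, file A was updated to 1.1, I can diff 1.0 against 1.1, see that Insert was added as line 6, and add that change at line 6 of my own copy of 1.0, bringing it in sync with 1.1.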

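Today's common translation tools don't fully handle reuse across versions this way. As a result, what should be as simple as translating the single line "properties.A.06.name=Insert" and adding it to the Chinese file becomes much more complicated.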

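The main reason is that current translation technology matches translation pairs, at most adding the surrounding context of each pair as reference data; it does not actually record the differences between file versions.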

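Many version control systems have existed for a long time, such as RCS (Revision Control System), CVS (Concurrent Versions System), SVN (Subversion), and Microsoft's Visual SourceSafe. Some only track versions, while others also support many developers working at the same time, but in general they all solve the problem of managing differences between versions. If translation technology could also track these relationships between versions, it could noticeably improve translation results.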

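In short, in the example above we ideally only need to translate "properties.A.06.name=Insert" into "properties.A.06.name=插入", then use the code-merge approach to add it back into the target-language file.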

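This is easier to understand for anyone familiar with branching in version control. Conceptually, it is like creating a branch (one per language) off the program's main trunk. Whenever the trunk changes, you diff the old and new trunk versions and merge those differences into the branch, keeping it in sync with the trunk.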
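A minimal sketch of that branch-sync idea in Python: difflib computes the diff between the old and new source, and the diff is replayed onto the translation. It assumes the translation file is line-aligned with the source, and translate_new is a hypothetical callback standing in for a human translator or a TM lookup.

```python
import difflib


def sync_translation(old_src, new_src, old_tgt, translate_new):
    """Replay the source diff (old_src -> new_src) onto a line-aligned translation."""
    if len(old_src) != len(old_tgt):
        raise ValueError("source and translation must be line-aligned")
    matcher = difflib.SequenceMatcher(a=old_src, b=new_src, autojunk=False)
    new_tgt = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            new_tgt.extend(old_tgt[i1:i2])  # unchanged lines: reuse the old translation
        elif tag in ("insert", "replace"):
            # new or modified source lines: only these need translating
            new_tgt.extend(translate_new(line) for line in new_src[j1:j2])
        # tag == "delete": the source lines are gone, so their translations are dropped
    return new_tgt


old_src = ["properties.A.05.name=Tool", "#", "properties.B.06.name=Open"]
old_tgt = ["properties.A.05.name=工具", "#", "properties.B.06.name=開啟"]
new_src = ["properties.A.05.name=Tool", "properties.A.06.name=Insert",
           "#", "properties.B.06.name=Open"]

# Hypothetical stand-in for the translator: one new line, one new translation.
new_translations = {"properties.A.06.name=Insert": "properties.A.06.name=插入"}
new_tgt = sync_translation(old_src, new_src, old_tgt, new_translations.__getitem__)
print(new_tgt)
```

Only the inserted line reaches translate_new; every unchanged line is copied from the old translation regardless of its neighbors, which is the property context matching lacks.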

March 21, 2005

To Leverage Translations and Apply Pseudo-Localization

My colleague asked me a localization question:
If the resource file has been heavily updated and I have the translation of the old version, how do I leverage that translation, and how do I apply pseudo-localization to the newly added, untranslated strings?

OK, here are the steps. It is a long list, but it works.
To leverage the translation from the old version:
1. Create a new Catalyst TTK project file
2. Insert the original English RC file which was sent out for translation
3. Save this TTK file (OldRC.ttk)
4. Right-click Resource.rc in the Navigator window, and then select "Import Translations"
5. Select the translated RC file that was translated from the same version as in step 2 as the input file, and wait for the import to complete.
6. Save OldRC.ttk
7. Create a new TTK project file
8. Insert the latest Resource.rc
9. Save this TTK file (NewRC.ttk)
10. Select from menu Tools -> Leverage Expert
11. Select OldRC.ttk as the input file, and wait for leveraging to complete
12. Save this TTK file.
13. Use Extract function to extract the leveraged Resource.rc
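Leverage Expert is a GUI feature and its exact matching rules aren't described here, so the Python sketch below only illustrates the general idea: reuse an old translation when both the resource ID and the English text are unchanged, and list everything else as untranslated. The resource IDs are hypothetical, and an RC file is simplified to a dictionary.

```python
def leverage(old_en, old_tr, new_en):
    """Reuse old translations for strings whose ID and English text are unchanged.

    Inputs map a resource ID to its string (a simplification of an RC file).
    Returns (leveraged strings, IDs that still need translation).
    """
    leveraged, untranslated = {}, []
    for rid, text in new_en.items():
        if old_en.get(rid) == text and rid in old_tr:
            leveraged[rid] = old_tr[rid]  # unchanged string: reuse the translation
        else:
            leveraged[rid] = text         # new or changed: keep English for now
            untranslated.append(rid)
    return leveraged, untranslated


old_en = {"IDS_OPEN": "&Open", "IDS_SAVE": "&Save"}
old_tr = {"IDS_OPEN": "開啟(&O)", "IDS_SAVE": "儲存(&S)"}
new_en = {"IDS_OPEN": "&Open", "IDS_SAVE": "&Save", "IDS_SAVE_AS": "Save &As..."}

leveraged, untranslated = leverage(old_en, old_tr, new_en)
print(untranslated)  # ['IDS_SAVE_AS']
```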

To apply pseudo-localization to the newly added, untranslated strings:
1. Open the leveraged NewRC.ttk
2. From the menu, select Tools -> Extract Terminology -> Glossary
3. Select "Tab Delimited", "From untranslated strings", "Include ampersands (hotkeys)", and "Exclude locked/frozen strings". Then generate the glossary file.
4. Open the generated glossary file in Excel. It contains two columns of data; delete the second column, then save the file.
5. In Catalyst, create a new TTK project file, insert the glossary file, then run Pseudo Translation. Save this TTK file.
6. Open NewRC.ttk again, and use the leverage function to leverage translations from the TTK file created in step 5.
7. Save NewRC.ttk, then extract Resource.rc
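Steps 3 to 5 can be approximated outside Catalyst. The Python sketch below keeps the first column of a tab-delimited glossary and applies a generic pseudo-localization: accented vowels, roughly 30% padding for text expansion, and brackets that reveal truncation. This is a common convention, not Catalyst's own pseudo-translation output; the accent table and padding ratio are assumptions.

```python
import csv
import io
from typing import List

# Assumed accent table: a common pseudo-localization trick, not Catalyst's exact output.
ACCENTS = str.maketrans("AEIOUaeiou", "ÀÉÎÕÛàéîõû")


def pseudo_localize(text: str, expansion: float = 0.3) -> str:
    """Accent the vowels, pad for text growth, and bracket the result.

    '&' hotkey markers stay in place because the table only maps vowels.
    """
    accented = text.translate(ACCENTS)
    padding = "~" * int(len(text) * expansion)
    return f"[{accented}{padding}]"


def first_column(tab_delimited: str) -> List[str]:
    """Keep only the source-text column of a two-column tab-delimited glossary."""
    reader = csv.reader(io.StringIO(tab_delimited), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    return [row[0] for row in reader if row]


glossary = "Save &As...\tSave &As...\nE&xit\tE&xit\n"
for source in first_column(glossary):
    print(pseudo_localize(source))
```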
