Objective
Gather information about different i18n systems (gettext and Zend_Translate, other?) to support the discussion of whether we should change or keep the current system used in Tiki (a simple PHP array).
Questions
General questions:
- What are the problems we have with the current system? And the benefits?
- Is there any feature we would like to have that is missing because it is impossible to implement with the current system?
- What are the features in Tiki that depend on the current i18n system?
- What are other software projects (like WordPress, Drupal, MediaWiki and Joomla) using? Do they have anything similar to Interactive Translation? If so, how does it work?
- What about translating strings in JS files?
About each i18n system:
- How does it work?
- Is it natively supported in PHP?
- What should be done to change to this system?
- What are the pros and cons if we change to this system?
- Are they LTR <-> RTL safe and ready?
Resources
- Preparing Translatable Strings - article from the gettext manual (a summary can be found on the WordPress i18n page)
- How to add support for context when using PHP's gettext functions
- Reasons to use a gettext implementation written in PHP instead of PHP gettext extension
- http://translate.sourceforge.net/wiki/
- http://code.google.com/p/piins-smarty/
- http://code.google.com/p/s24n/
- http://code.google.com/p/intsmarty/
- http://smarty.incutio.com/?page=SmartyGettext
Results
General
- What are the problems we have with the current system?
  - It is not possible to have different translations for the same string in different contexts.
  - Strings declared in JS files are not handled by the same system that handles strings declared in PHP files.
  - The lang/ directory weighs 8.2 MB in Tiki 6.2. As language.php is a text file, a size reduction should be possible with a binary format.
  - It is not easy to use machine translation software to check or help with the translation of many words or strings. There seem to be many standard resources for handling strings based on .po and .mo files.
  - The size and memory footprint on the server is probably higher than it would be if we had one smaller PHP file with only the strings needed by each tiki-*.php file. That was the approach of phpnuke and claroline/dokeos, afaik. (Xavi)
- What are the benefits we have with the current system?
  - It is simple and does not require any external tool.
  - It is easy to find and replace, in a single file, one mistranslated word across all the strings used in Tiki.
  - It is relatively easy to diff changes between language.php from a branch and language.php from trunk, and to merge them, preferring the old-version strings over the new-version ones in case of conflict. Then, update get_strings.php on trunk (so that the deleted untranslated strings are re-written) and language.php in trunk is ready again (kdiff3 was great for this).
- What are the features in Tiki that depend on the current i18n system?
  - Exporting translations from the database to a language.php file
  - get_strings.php
What are others using?
Drupal
Drupal has a comparable interactive translation feature, but (apparently) nothing that lets you click on a string.
http://drupal.org/project/l10n_client
Drupal language packages are PO files. Drupal imports these into its database. For the actual translation, a cache (presumably file-based) is used.
Joomla
Joomla has a custom system for translations. The translations for one language are stored in one directory with a main .xml file and a set of .ini files. In the code, a "language label" is used as a token to be replaced by a string. Each .ini file maps "language labels" to their corresponding strings in a specific language. For more information about this system, see http://docs.joomla.org/Tutorial:Making_a_Language_Pack_for_Version_1.6 and http://docs.joomla.org/Language_Guidelines_for_3rd_Party_Extensions.
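As a rough illustration of the label-based lookup described above, the sketch below parses .ini-style content and resolves a "language label" to its string. It is written in Python only for brevity (Joomla itself is PHP). The label names, the sample strings and the parse_ini and translate helpers are hypothetical, falling back to the label when no string is found is an assumption of this sketch, and the real Joomla .ini syntax has more rules than are handled here.

```python
def parse_ini(text):
    """Parse KEY="value" lines into a dict, skipping blank lines and ';' comments."""
    strings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        label, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=value line, ignore it
        strings[label.strip()] = value.strip().strip('"')
    return strings


def translate(label, strings):
    """Return the string for a label, falling back to the label itself (assumption)."""
    return strings.get(label, label)


# Hypothetical content of one .ini file from a Spanish language pack.
SAMPLE_INI = '''
; comments start with a semicolon
COM_EXAMPLE_SAVE="Guardar"
COM_EXAMPLE_CANCEL="Cancelar"
'''

strings = parse_ini(SAMPLE_INI)
print(translate("COM_EXAMPLE_SAVE", strings))     # Guardar
print(translate("COM_EXAMPLE_MISSING", strings))  # COM_EXAMPLE_MISSING
```

Unlike the current Tiki system, where the English source string is itself the lookup key, the key here is an opaque label, so the same English text can map to different translations under different labels.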
MediaWiki
MediaWiki uses a PHP array to store translations. Translators use http://translatewiki.net, an online interface for translating strings. For more information see http://www.mediawiki.org/wiki/Localisation.
WordPress
WordPress uses its own gettext implementation (and not the PHP gettext extension). Their implementation was initially based on php-gettext.
They decided to use a solution written in PHP instead of the PHP extension mainly because, at the time of the decision (maybe a few years ago), too few servers had the extension installed.
Its use is documented at http://codex.wordpress.org/I18n_for_WordPress_Developers. It supports plurals and context (the latter is not supported by the PHP extension).
In their system, no string is declared in a JavaScript file. It is declared in a PHP file and accessed in the JS via a hash. For more on how this works, see http://codex.wordpress.org/I18n_for_WordPress_Developers#Handling_JavaScript_files.
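The idea of declaring strings on the server and exposing them to JavaScript as a hash can be sketched as follows. This is not WordPress code (WordPress does this in PHP, through wp_localize_script); it is a minimal Python illustration, and the catalog, the object name exampleL10n and the helper functions are all hypothetical.

```python
import json


def tr(text, catalog):
    """Look up a translation, falling back to the source string."""
    return catalog.get(text, text)


def js_strings_snippet(object_name, keys_to_source, catalog):
    """Build a JS statement that defines an object of already-translated strings."""
    translated = {key: tr(source, catalog) for key, source in keys_to_source.items()}
    return "var %s = %s;" % (object_name, json.dumps(translated, ensure_ascii=False))


# Hypothetical catalog: only "Close" has a translation.
catalog = {"Close": "Cerrar"}

snippet = js_strings_snippet(
    "exampleL10n",
    {"confirm": "Are you sure?", "close": "Close"},
    catalog,
)
print(snippet)
# var exampleL10n = {"confirm": "Are you sure?", "close": "Cerrar"};
```

The JS file would then read exampleL10n.close instead of containing a literal string, so all translatable strings stay in the server-side system and a single extraction tool can find them.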
When you download WordPress you can either get the version for a specific language, or get the default English version and download a language package afterwards.
phpBB
Typo3
Moodle
The default Moodle package is only in English. For other languages you have to download a specific language package from http://download.moodle.org/langpack/2.0/. Each package has several PHP files with the translations, which are stored in PHP arrays.
There is a list of translations here. Having a place to list who is helping with the translations is a good idea.
Conclusions
During TikiFestBoston7 we invested some time in this issue. Although PHP has a built-in gettext extension, using it is not a good idea, for the reasons given in the resource listed above on using a gettext implementation written in PHP instead of the PHP gettext extension. So we tried Zend_Translate with the gettext adapter and Zend_Cache. In terms of performance, it was just slightly worse than our current system.
Switching to gettext would bring a few benefits (like support for plurals and context), but migrating from one system to another is a big effort. There is no gettext system for Smarty that we can use out of the box without modification, and a few Tiki features rely on the language.php files.
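To make the two benefits concrete, here is a minimal sketch of what context and plural support add over a flat array keyed by source string. It is written in Python for illustration only and is neither Tiki nor Zend_Translate code. The catalog contents and the tr and ntr helpers are hypothetical. The context separator follows the convention gettext uses in .mo files, and the plural rule is the simple "one versus other" rule, which is an assumption that does not hold for every language.

```python
CONTEXT_SEP = "\x04"  # separator gettext uses between context and msgid in .mo keys

# Hypothetical Spanish catalog.
CATALOG = {
    "Open": "Abierto",                       # no context: the adjective
    "verb" + CONTEXT_SEP + "Open": "Abrir",  # same source string, different context
    ("%d comment", "%d comments"): ["%d comentario", "%d comentarios"],
}


def plural_index(n):
    """Plural rule assumed here: form 0 for exactly one item, form 1 otherwise."""
    return 0 if n == 1 else 1


def tr(msgid, context=None):
    """Translate a string, optionally disambiguated by a context."""
    key = msgid if context is None else context + CONTEXT_SEP + msgid
    return CATALOG.get(key, msgid)


def ntr(singular, plural, n):
    """Translate a string whose form depends on the count n."""
    forms = CATALOG.get((singular, plural))
    if forms is None:
        # Untranslated: fall back to the English singular/plural pair.
        return (singular if n == 1 else plural) % n
    return forms[plural_index(n)] % n


print(tr("Open"))                            # Abierto
print(tr("Open", context="verb"))            # Abrir
print(ntr("%d comment", "%d comments", 1))   # 1 comentario
print(ntr("%d comment", "%d comments", 3))   # 3 comentarios
print(ntr("%d file", "%d files", 2))         # 2 files
```

With a flat array keyed only by the source string, the two "Open" entries would collide, which is the first problem listed for the current system.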
So for now the decision is to keep our current system for translations and improve it, which has been done in Tiki8.