
    *************************************
    * SquirrelMail internationalization *
    *************************************

$Date: 2005/07/10 12:54:18 $
$Revision: 1.8 $

This document should explain how SquirrelMail internationalization works and
provide information about some aspects of implementation.

1. Supported languages
2. $languages array
3. XTRA_CODE functions
4. Display of different charsets
5. IMAP folder names
6. Plural forms
7. Language setup
8. Time zones

-------------------------------
1. Supported languages
-------------------------------
Valid language codes are:
* ar    - Arabic, windows-1256 charset
* bg_BG - Bulgarian, windows-1251 charset
* bn_IN - Bengali, utf-8 charset
* ca_ES - Catalan, iso-8859-1 charset
* cs_CZ - Czech, iso-8859-2 charset
* cy_GB - Welsh, iso-8859-1 charset
* da_DK - Danish, iso-8859-1 charset
* de_DE - German, iso-8859-1 charset
* el_GR - Greek, iso-8859-7 charset
* en_GB - British, iso-8859-15 charset
* en_US - English, charset depends on $default_charset
* es_ES - Spanish, iso-8859-1 charset
* et_EE - Estonian, iso-8859-15 charset
* eu_ES - Basque, iso-8859-1 charset
* fa_IR - Farsi, utf-8 charset
* fi_FI - Finnish, iso-8859-1 charset
* fo_FO - Faroese, iso-8859-1 charset
* fr_FR - French, iso-8859-1 charset
* he_IL - Hebrew, windows-1255 charset
* hr_HR - Croatian, iso-8859-2 charset
* hu_HU - Hungarian, iso-8859-2 charset
* id_ID - Indonesian, iso-8859-1 charset
* is_IS - Icelandic, iso-8859-1 charset
* it_IT - Italian, iso-8859-1 charset
* ja_JP - Japanese, euc-jp charset (emails are created in iso-2022-jp)
* ko_KR - Korean, euc-kr charset
* lt_LT - Lithuanian, utf-8 charset
* ms_MY - Malay, iso-8859-1 charset
* nb_NO - Norwegian (Bokmal), iso-8859-1 charset
* nl_NL - Dutch, iso-8859-1 charset
* nn_NO - Norwegian (Nynorsk), iso-8859-1 charset
* pl_PL - Polish, iso-8859-2 charset
* pt_BR - Portuguese (Brazil), iso-8859-1 charset
* pt_PT - Portuguese (Portugal), iso-8859-1 charset
* ro_RO - Romanian, iso-8859-2 charset
* ru_UA - Ukrainian Russian, koi8-r charset
* ru_RU - Russian, utf-8 charset
* sk_SK - Slovak, iso-8859-2 charset
* sl_SI - Slovenian, iso-8859-2 charset
* sr_YU - Serbian, iso-8859-2 charset
* sv_SE - Swedish, iso-8859-1 charset
* ug    - Uighur, utf-8 charset (some systems don't support Uighur system locale)
* th_TH - Thai, tis-620 charset
* tl_PH - Tagalog, iso-8859-1 charset (main translation is missing, only some plugins are translated)
* tr_TR - Turkish, iso-8859-9 charset
* uk_UA - Ukrainian, koi8-u charset
* zh_CN - Chinese Simplified, gb2312 charset
* zh_TW - Chinese Traditional, big5 charset

Charset totals:
* iso-8859-1   = 21
* iso-8859-2   = 8
* utf-8        = 5 
* iso-8859-15  = 2
* iso-8859-7   = 1
* iso-8859-9   = 1
* koi8-r       = 1
* koi8-u       = 1
* windows-1251 = 1
* windows-1255 = 1
* windows-1256 = 1
* tis-620      = 1
* gb2312       = 1
* big5         = 1
* euc-jp       = 1
* euc-kr       = 1

-------------------
2. $languages array
-------------------
$languages array is stored in functions/i18n.php and defines translations
that are enabled in SquirrelMail.

Format of array:
    $languages['language_code']['key'] = 'value'

Possible array key names:
* NAME      - Translation name in English. Any 8bit symbols must be html encoded.
* CHARSET   - Charset used by translation
* ALIAS     - 'language_code' should contain short language name
              (iso-639). 'value' should contain name of other 'language_code'
              that defines translation with NAME and CHARSET keys.
              Entry links short language form with long form (language+country).
              See: http://www.loc.gov/standards/iso639-2/langhome.html and
              http://www.iso.org/iso/en/prods-services/iso3166ma/02iso-3166-code-lists/list-en1.html
* ALTNAME   - Native translation name. Any 8bit symbols must be html encoded.
              Name is visible when $show_alternative_names is enabled.
* LOCALE    - Full locale name (in xx_XX.charset format or other format required
              by php gettext functions). From 1.4.4/1.5.1 'value' can contain
              array. If php version is older than 4.3.0, system uses only first
              locale name listed in array. First locale name must be compatible 
              with FreeBSD system locale names.
* DIR       - Text direction. Used to define Right-to-Left languages. Possible 
              values 'rtl' or 'ltr'. If undefined - defaults to 'ltr'.
* XTRA_CODE - translation uses special functions. (see chapter 3. XTRA_CODE functions)

Each 'language_code' definition requires NAME+CHARSET or ALIAS keys. Other keys are
optional.

----------------------
3. XTRA_CODE functions
----------------------
XTRA_CODE functions provide way to change interface behavior, when translation
requires special handling of some SquirrelMail functions. Functions are enabled
by setting XTRA_CODE option in $languages array and including appropriate
functions in functions/i18n.php. First part of function name is word listed in
$languages['language_code']['XTRA_CODE'] value. Second part is one of special
keywords. Possible keywords:
* _decode
Used in src/compose.php, src/i18n.php, src/view_text.php, functions/mime.php
Requires mbstring support

* _encode
Used in src/compose.php, src/read_body.php

* _encodeheader
Used in functions/mime.php
Returning function

* _decodeheader
Used in functions/mime.php
Returning function

* _downloadfilename
Used in functions/mime.php

* _utf7_imap_encode
Used in functions/imap_utf7_local.php
Returning function

* _utf7_imap_decode
Used in functions/imap_utf7_local.php
Returning function

* _strimwidth
Used in functions/mailbox_display.php
Returning function

* _wordwrap
Used in functions/strings.php (sqWordWrap)

--------------------------------
4. Display of different charsets
--------------------------------
When SquirrelMail generates html pages, it uses charset defined in translation
selected by end user. Interface can display emails encoded in different
charsets. In order to display characters that might be unsupported by user's
charset, SquirrelMail uses decoding functions that convert non us-ascii symbols
into html entities. All decoding functions are stored in functions/decode/
directory.

By default SquirrelMail includes decoding functions that support iso-8859-x,
windows-125x, utf-8, us-ascii, koi8-r, koi8-u, tis-620, ns-4551_1, iso-ir-111,
cp855 and cp866 charsets. Other decoding functions are distributed in separate
packages. Separate packaging of decoding functions is supported from
SquirrelMail 1.4.4 and 1.5.0. us-ascii decoding replaces all 8bit symbols with
question marks. utf-8 decoding function does not enable decoding of five and six
byte utf-8 symbols by default (code is commented) and replaces all incorrectly
formated 8bit symbols with question marks.

Some decoding functions might require php recode extension or php 4.3+ mbstring
extension. If your php installation does not support them, you might be using 
slower and cpu/memory intensive functions.

--------------------
5. IMAP folder names
--------------------
IMAP folder names use UTF7-IMAP charset. Folder names that are stored in
conf.pl must be encoded in UTF7-IMAP charset. SquirrelMail uses internal
functions that convert folder names from/to utf7-imap charset. By default those
functions work with iso-8859-1 charset. Other charsets are supported only 
when php mbstring extension supports them.

TODO: write independent implementation of charset to utf7-imap conversion.

---------------
6. Plural forms
---------------
From v.1.5.1 SquirrelMail includes support of plural forms. It allows to use
correct translation forms with numbers. For example. "We have %s squirrel on
the roof." and "We have %s squirrels on the roof." can be written in one 
function call without checking actual number for squirrels. Gettext functions
also deal with non English languages that might use different word forms for
two, five, ten or more units.

Support is provided by ngettext functions that exist in php gettext extension 
from php 4.2.0 and by ngettext function replacements from php-gettext classes 
(http://savannah.nongnu.org/projects/php-gettext). In order to use it correctly
when php gettext extension does not have ngettext support, SquirrelMail uses
bindtextdomain and textdomain wrappers that load missing functions.

If plugins want to use ngettext functions without increasing php requirements
to 4.2.0 with gettext support, they should require SquirrelMail 1.5.1, use 
sq_bindtextdomain function instead of bindtextdomain and use sq_textdomain 
function instead of textdomain function. If SquirrelMail wrapper functions 
are used, there is no need to issue sq_bindtextdomain when plugins reverts to
SquirrelMail domain.

More information about ngettext and plural forms can be found at:
http://www.gnu.org/software/gettext/manual/html_chapter/gettext_10.html#SEC150

-----------------
7. Language setup
-----------------
SquirrelMail uses set_up_language() function to setup language environment.
Environment is setup automatically when include/validate.php is loaded.

SquirrelMail gets interface language from three places:
 a) user preference. It is set in Options -> Display Preferences -> Language.
    preference uses language key. If user's preferences are not available (user
    is not logged in), system tries to extract language value from 
    'squirrelmail_language' cookie.
 b) default squirrelmail language that is set in configuration 
    ($squirrelmail_default_language variable).
 c) preferred language setting provided by browser. It is used only when default 
    squirrelmail language is set to empty string

If language information is not available, SquirrelMail falls back to US English
translation.

-------------
8. Time zones
-------------
If php install allows modifying environment variable TZ, SquirrelMail allows
end users to select different time zone in their preferences. It can be set in
Options -> Personal Information -> Your current timezone. Time zone is
setup automatically when include/validate.php is loaded.

If TZ variable can't be modified (php is running is safe mode and variable 
is not listed in php safe_mode_allowed_env_vars), user's time zone options are
not visible and interface uses default webserver's time zone.

SquirrelMail 1.5.0 and older store list of available time zones in 
locale/timezones.cfg. Since 1.5.1 standard times zones are moved to 
include/timezones/standard.php and time zone handling differs from older
SquirrelMail versions. Time zone configuration is controlled in SquirrelMail
configuration utility (conf.pl), 4. General Options -> 15. Time zone
configuration menu option. Administrator can select standard, strict, custom 
and custom strict time zone handling.

Standard handling does not differ from previous SquirrelMail versions and
SquirrelMail uses GNU C geographical location based time zone names. Strict
handling uses time zone codes with offset from GMT. Strict time zones should
work on systems that don't support GNU C time zone naming. Custom and custom
strict handling uses config/timezones.php file instead of
include/timezones/standard.php.

config/timezones.php file should store $aTimeZones array with different set of
time zones. See default time zone set in include/timezones/standard.php.For
example:

<?php
// World outside US border is a mirage

$aTimeZones=array();
$aTimeZones['America/New_York']['NAME']='US Eastern standard time';
$aTimeZones['America/New_York']['TZ']='EST5EDT';

$aTimeZones['America/Chicago']['NAME']='US Central standard time';
$aTimeZones['America/Chicago']['TZ']='CST6CDT';

// Oliver County, ND
$aTimeZones['America/North_Dakota/Center']['NAME']='US, Oliver County [ND]';
$aTimeZones['America/North_Dakota/Center']['TZ']='CST6CDT'; // CST since 1992

$aTimeZones['America/Denver']['NAME']='US Mountain standard time';
$aTimeZones['America/Denver']['TZ']='MST7MDT';

$aTimeZones['America/Los_Angeles']['NAME']='US Pacific standard time';
$aTimeZones['America/Los_Angeles']['TZ']='PST8PDT';

// Aliaska
$aTimeZones['America/Juneau']['NAME']='Aliaska, Juneau';
$aTimeZones['America/Juneau']['TZ']='NAST9NADT';
$aTimeZones['America/Yakutat']['NAME']='Aliaska, Yakutat';
$aTimeZones['America/Yakutat']['TZ']='NAST9NADT';
$aTimeZones['America/Anchorage']['NAME']='Aliaska, Anchorage';
$aTimeZones['America/Anchorage']['TZ']='NAST9NADT';
$aTimeZones['America/Nome']['NAME']='Aliaska, Nome';
$aTimeZones['America/Nome']['TZ']='NAST9NADT';
$aTimeZones['America/Adak']['NAME']='US, Aleutian Islands';
$aTimeZones['America/Adak']['TZ']='AST10ADT';

$aTimeZones['Pacific/Honolulu']['NAME']='US, Hawaii';
$aTimeZones['Pacific/Honolulu']['TZ']='UCT10';
$aTimeZones['America/Phoenix']['NAME']='US, Arizona';
$aTimeZones['America/Phoenix']['TZ']='MST7'; // gmt-7
$aTimeZones['America/Shiprock']['LINK']='America/Denver';

$aTimeZones['America/Boise']['NAME']='US, South Idaho';
$aTimeZones['America/Boise']['TZ']='MST7MDT';
$aTimeZones['America/Indianapolis']['NAME']='US, Indiana';
$aTimeZones['America/Indianapolis']['TZ']='EST5';
$aTimeZones['America/Indiana/Indianapolis']['LINK']='America/Indianapolis';
// Crawford County, Indiana
$aTimeZones['America/Indiana/Marengo']['NAME']='US, Crawford County [IN]';
$aTimeZones['America/Indiana/Marengo']['TZ']='EST5';
// Starke County, Indiana
$aTimeZones['America/Indiana/Knox']['NAME']='US, Starke County [IN]';
$aTimeZones['America/Indiana/Knox']['TZ']='EST5';
// Switzerland County, Indiana
$aTimeZones['America/Indiana/Vevay']['NAME']='US, Switzerland County [IN]';
$aTimeZones['America/Indiana/Vevay']['TZ']='EST5';
$aTimeZones['America/Louisville']['NAME']='US, Louisville [KY]';
$aTimeZones['America/Louisville']['TZ']='EST5EDT';
$aTimeZones['America/Kentucky/Louisville']['LINK']='America/Louisville';
// Wayne, Clinton, and Russell Counties, Kentucky
$aTimeZones['America/Kentucky/Monticello']['NAME']='US, Wayne, Clinton, and Russell Counties [KY]';
$aTimeZones['America/Kentucky/Monticello']['TZ']='EST5EDT';
// Michigan
$aTimeZones['America/Detroit']['NAME']='US, Michigan';
$aTimeZones['America/Detroit']['TZ']='EST5EDT';
// The Michigan border with Wisconsin switched from EST to CST/CDT in 1973.
$aTimeZones['America/Menominee']['NAME']='US, Menominee [MI]';
$aTimeZones['America/Menominee']['TZ']='CST6CDT';
?>

GNU C time zone naming should be supported by many Unix OSes. It is recommended
way of setting time zone, because it handles historical changes and daylight
savings specific to selected geographical location. Strict time zones might
provide inaccurate or outdated time zone settings.

If modifications in TZ environment are visible in your webserver's logs (time
offset is changed), make sure that you can reproduce such behavior in latest php
version and report bug to php developers. Issue can be fixed by blocking use of
time zone (php safe mode and TZ is not listed in safe_mode_allowed_env_vars
setting or forced_prefs plugin) or by attaching special php script with 
putenv('TZ=some time zone') call in php auto_append_file setting (suggestion is
not tested and you might have to fix all SquirrelMail exit calls).

Please note, that use of auto_append_file provides only temporally workaround
and does not fix your php setup. Script that runs as unprivileged user, should
be unable to affect webserver's logging system.

