Opened 7 years ago

Last modified 6 years ago

#4449 new defect

"Spanish" and other language names are not translated in My Settings language section

Reported by: manuq
Owned by: manuq
Priority: Normal
Milestone: 0.102.0
Component: Sugar
Version: Unspecified
Severity: Unspecified
Keywords:
Cc: erikos, cjl
Distribution/OS: Unspecified
Bug Status: Unconfirmed

Description

gettext fails to find a translation for 'Spanish' because that msgid doesn't exist. Looking at the es.po file in upstream iso-codes, the entry that does exist is "Spanish; Castilian", which is translated to "Español; Castellano":

http://anonscm.debian.org/gitweb/?p=iso-codes/iso-codes.git;a=blob;f=iso_639/es.po;h=ca63ee0f55be60ff35065037a8ab421d10429939;hb=HEAD#l1689

Here is where "Spanish" comes from: to construct the set of available locales (language, country, code), Sugar parses the output of the locale -av command. For Spanish it gets the string 'Spanish' by parsing locale entries like this:

locale: es_AR           archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | Spanish locale for Argentina
   source | RAP
  address | Sankt Jørgens Alle 8, DK-1615 København V, Danmark
    email | bug-glibc-locales@gnu.org
 language | Spanish
territory | Argentina
 revision | 1.0
     date | 2000-06-29
  codeset | ISO-8859-1
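
As an illustration of the parsing step described above, here is a minimal Python 3 sketch that extracts (code, language, territory) from text shaped like the 'locale -av' output. It is not Sugar's actual parser: the function name parse_locale_av and the shortened SAMPLE text (including the nl_NL entry) are assumptions made for the example.

```python
import re

# Shortened, illustrative sample; real 'locale -av' output has more fields.
SAMPLE = """\
locale: es_AR           archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | Spanish locale for Argentina
 language | Spanish
territory | Argentina
  codeset | ISO-8859-1

locale: nl_NL           archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | Dutch locale for the Netherlands
 language | Dutch
territory | Netherlands
  codeset | ISO-8859-1
"""


def parse_locale_av(text):
    """Return a list of (code, language, territory) tuples (hypothetical helper)."""
    entries = []
    current = None
    for line in text.splitlines():
        # A "locale: <code>" header starts a new entry.
        header = re.match(r"locale:\s+(\S+)", line)
        if header:
            current = {"code": header.group(1)}
            entries.append(current)
            continue
        # "key | value" lines belong to the entry currently being read.
        if current is not None and "|" in line:
            key, _, value = line.partition("|")
            current[key.strip()] = value.strip()
    return [(e["code"], e.get("language"), e.get("territory")) for e in entries]


print(parse_locale_av(SAMPLE))
```

The 'language' value recovered this way is the bare English name ('Spanish'), which is the string that later fails to match the iso_639 msgid.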

Attachments (10)

ugly-4449.patch (661 bytes) - added by manuq 7 years ago.
Ugly workaround
langs_table.py (5.4 KB) - added by manuq 7 years ago.
Script that outputs a table for comparison
langs_table.txt (68.7 KB) - added by manuq 7 years ago.
Output of the script in my system
profile_gettext.py (2.4 KB) - added by manuq 7 years ago.
profile_babel.py (1.3 KB) - added by manuq 7 years ago.
profile_icu.py (848 bytes) - added by manuq 7 years ago.
do_profile.sh (204 bytes) - added by manuq 7 years ago.
profile_gettext-output.txt (10.5 KB) - added by manuq 7 years ago.
profile_babel-output.txt (32.4 KB) - added by manuq 7 years ago.
profile_icu-output.txt (6.2 KB) - added by manuq 7 years ago.

Change History (28)

Changed 7 years ago by manuq

Ugly workaround

comment:1 Changed 7 years ago by erikos

You are right, we do not get the right translation as we do not pass the full 'key'.

python -c "import gettext; print gettext.dgettext('iso_639', 'Spanish; Castilian')"
Español; Castellano

Same is true for other languages:

Dutch: http://anonscm.debian.org/gitweb/?p=iso-codes/iso-codes.git;a=blob;f=iso_639/nl.po;h=6bc9d95b7863b37ee56a0b9f833531b1e72bc088;hb=HEAD

python -c "import gettext; print gettext.dgettext('iso_639', 'Dutch')"
Dutch

python -c "import gettext; print gettext.dgettext('iso_639', 'Dutch; Flemish')"
Nederlands

Greek: http://anonscm.debian.org/gitweb/?p=iso-codes/iso-codes.git;a=blob;f=iso_639/el.po;h=76251d21bda768c32441f8e2b9aa669360a7c39e;hb=HEAD

[erikos@t61 ~]$ export LANGUAGE=el_EL.utf8
[erikos@t61 ~]$ python -c "import gettext; print gettext.dgettext('iso_639', 'Greek, Modern (1453-)')"
Ελληνικά
[erikos@t61 ~]$ python -c "import gettext; print gettext.dgettext('iso_639', 'Greek')"
Greek
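
To make the key mismatch concrete, here is a small Python 3 sketch that looks for msgids whose first component equals the bare name. The list ISO_639_MSGIDS is a hand-picked, illustrative subset and candidate_msgids is a hypothetical helper; neither comes from Sugar or from the attached patch.

```python
import re

# Illustrative subset of iso_639 msgids; the real catalog is much larger.
ISO_639_MSGIDS = [
    "Spanish; Castilian",
    "Dutch; Flemish",
    "Greek, Modern (1453-)",
    "Greek, Ancient (to 1453)",
    "French",
]


def candidate_msgids(name, msgids=ISO_639_MSGIDS):
    """Return the msgids whose first ';' or ',' separated part equals name."""
    matches = []
    for msgid in msgids:
        first = re.split(r"[;,]", msgid, maxsplit=1)[0].strip()
        if first == name:
            matches.append(msgid)
    return matches


print(candidate_msgids("Spanish"))
print(candidate_msgids("Greek"))
```

With this sample, 'Spanish' maps to a single full key, while 'Greek' matches two, so a prefix match alone cannot pick the right entry in every case.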

comment:2 follow-up: Changed 7 years ago by erikos

  • Cc cjl added

Another strange thing is that 'français' is lower case, all the others are upper case.

http://anonscm.debian.org/gitweb/?p=iso-codes/iso-codes.git;a=blob;f=iso_639/fr.po;h=668064ef8a2d59403848a8cd189e0f2edc5301dc;hb=HEAD

#. name for fra, fr
msgid "French"
msgstr "français"

All in all, we have quite a few languages and countries that are wrong or untranslated. We would have to special case quite a few. Not sure we have a chance of special casing all of them.

Having a sugar internal list adds maintenance cost.

Maybe in the meantime, as the list is not fully updated, we could just display the native one and the English one or the localized one? e.g.

Deutsch (Deutschland) - German (Germany)
Deutsch (Deutschland) - Alemán (Alemania)

@Chris, for upstream fixes, shouldn't there be an additional entry in the es.po that just translates 'Spanish'?

comment:3 Changed 7 years ago by manuq

Using gettext domain 'iso_639_3' instead of 'iso_639' looks more promising. It has 'Spanish' and 'Dutch', but not 'Greek'.

[manuq@localhost iso-codes]$ python -c "import gettext; print gettext.dgettext('iso_639_3', 'Spanish')"
Español
[manuq@localhost iso-codes]$ python -c "import gettext; print gettext.dgettext('iso_639_3', 'Dutch')"
Neerlandés (Holandés)
[manuq@localhost iso-codes]$ python -c "import gettext; print gettext.dgettext('iso_639_3', 'Greek')"
Greek
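
Since each domain covers some names and not others, one option is to try several domains in order. The Python 3 sketch below assumes that a result equal to the input means "no translation found", which is how gettext.dgettext behaves when the msgid or catalog is missing. The helper name translate_language and the lookup parameter are hypothetical.

```python
import gettext


def translate_language(name, domains=("iso_639", "iso_639_3"),
                       lookup=gettext.dgettext):
    """Return the first translation that differs from name, else None."""
    for domain in domains:
        translated = lookup(domain, name)
        # dgettext returns the msgid unchanged when nothing is found.
        if translated != name:
            return translated
    return None


# The result depends on the installed iso-codes catalogs and on LANGUAGE.
print(translate_language("Spanish"))
```

A caveat of this approach: a legitimate translation that happens to be identical to the English name is indistinguishable from a missing one.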

comment:4 Changed 7 years ago by manuq

We can also consider other possibilities. I've found a Python package named 'babel' (http://babel.edgewall.org/ or yum install babel) that seems to map language codes to names:

>>> from babel import Locale
>>> locale = Locale('es', 'UY')
>>> print locale.display_name
español (Uruguay)
>>> locale = Locale('bs', 'BA')
>>> print locale.display_name
bosanski (Bosna i Hercegovina)

Unfortunately, it is not a perfect solution:

>>> locale = Locale('el', 'EL')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 137, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'el_EL'

comment:5 in reply to: ↑ 2 Changed 7 years ago by manuq

Replying to erikos:

Another strange thing is that 'français' is lower case, all the others are upper case.

Isn't that a peculiarity of the language? I tried Google Translate and the result is the same:

http://translate.google.com.ar/#en/fr/French

comment:6 Changed 7 years ago by manuq

For an external option, PyICU (yum install pyicu) looks better:

>>> from PyICU import Locale
>>> locale = Locale('es_UY')
>>> print locale.getDisplayName(locale)
español (Uruguay)
>>> locale = Locale('el_EL')
>>> print locale.getDisplayName(locale)
Ελληνικά (EL)

Changed 7 years ago by manuq

Script that outputs a table for comparison

Changed 7 years ago by manuq

Output of the script in my system

comment:7 Changed 7 years ago by manuq

Attached is a script that outputs a table for comparison.

The columns are:

  • Code: the language / country code
  • Original: the language / country names parsed from 'locale -av' command
  • ISO 639: using gettext, the language name is taken from ISO 639. This is what Sugar currently does
  • ISO 639-3: also using gettext, but the language name is taken from ISO 639-3
  • Babel: names obtained from the code using the external library Babel
  • ICU: names obtained from the code using the external library PyICU

I also attach the output of the script, written to a file on my system by running 'python langs_table.py > langs_table.txt'.

The script also prints the errors from each method to standard error.

The table columns are misaligned in some rows, sorry. I suspect this is because I'm using str.ljust() on RTL strings.
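
One possible contributor to the misalignment is that str.ljust() pads by code point count, while combining marks and format characters occupy no column and East Asian wide characters occupy two. The Python 3 sketch below pads by an approximate display width instead. It is only an approximation: it does not handle bidirectional reordering, and terminals differ in how they render these characters. The helper names display_width and ljust_display are hypothetical and not part of langs_table.py.

```python
import unicodedata


def display_width(text):
    """Approximate the number of terminal columns text occupies."""
    width = 0
    for ch in text:
        # Combining marks and format characters take no column.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian wide and fullwidth characters take two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width


def ljust_display(text, width, fill=" "):
    """Like str.ljust(), but pads according to display_width()."""
    return text + fill * max(0, width - display_width(text))


for name in ("español", "e\u0301", "日本語"):
    print(ljust_display(name, 12) + "|")
```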

comment:9 Changed 7 years ago by cjl

  • Original: the language / country names parsed from 'locale -av' command

If I understand correctly, to address this, it looks like we would need to touch many glibc locales. That is not an easy task, but things have gotten better there since Ulrich Drepper moved on. I've been making some inroads with the glibc community and have even been granted commit privileges for my work on locales.

  • ISO 639: using gettext, the language name is taken from ISO 639. This is what Sugar currently does
  • ISO 639-3: also using gettext, but the language name is taken from ISO 639-3

These PO files are hosted by the Translation Project (also started by Ulrich Drepper).

http://translationproject.org/domain/iso_639.html
http://translationproject.org/domain/iso_639_3.html

It is a somewhat closed-off community, and my interactions with them have been more limited. They host a lot of packages used by Sugar (see the lines starting with TP here as an example: http://translate.sugarlabs.org/es/upstream_l10n/)

In general, the ISO PO files are in pretty good shape for major languages, but there are glaring omissions in some, and unfortunately the TP does not have as wide an array of language projects as would be ideal. Many of these packages are hosted in duplicate in places like LaunchPad, possibly because the Translation Project refuses to employ a modern translation hosting infrastructure / workflow and still operates by a "complete the PO and mail it in" process.

Please let me know if there is something in particular you would like me to investigate.

comment:10 follow-up: Changed 7 years ago by erikos

Looking at the table the best result for this conversion is with PyICU, agreed?

What are the pros and cons of using one or the other? PyICU would add a new dependency (pyicu on Fedora and Ubuntu). The API usage looks straightforward as well. How is the performance? Any other downsides?

comment:11 in reply to: ↑ 10 Changed 7 years ago by manuq

Replying to erikos:

Looking at the table the best result for this conversion is with PyICU, agreed?

Yes. Chris, I would like to know your impression of the ICU project (http://site.icu-project.org/). It looks well tested and is used by several software projects and companies. The obvious con is that it adds a new dependency. The pro is that it could improve performance, as we wouldn't need to parse 'locale -av' looking for the original language/country names. Having the codes (the output of 'locale -a') is enough. I will provide a performance table soon.
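
As a rough Python 3 sketch of the "codes are enough" point, the snippet below reduces 'locale -a' style output to unique (language, territory) pairs, which could then be handed to a library such as ICU. The helper names split_locale_code and unique_locales, and the sample input, are assumptions made for the example.

```python
def split_locale_code(code):
    """Split e.g. 'es_UY.utf8' into ('es', 'UY'); territory may be None."""
    # Drop the codeset ('.utf8') and the modifier ('@euro') if present.
    base = code.split(".", 1)[0].split("@", 1)[0]
    language, _, territory = base.partition("_")
    return language, (territory or None)


def unique_locales(locale_a_output):
    """Return unique (language, territory) pairs in first-seen order."""
    seen = []
    for line in locale_a_output.splitlines():
        code = line.strip()
        if not code:
            continue
        pair = split_locale_code(code)
        # Skip the built-in C/POSIX locales, which name no language.
        if pair[0] in ("C", "POSIX"):
            continue
        if pair not in seen:
            seen.append(pair)
    return seen


sample = "C\nPOSIX\nes_AR\nes_AR.utf8\nes_UY.utf8\nca_ES@valencia\n"
print(unique_locales(sample))
```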

The gettext option would require upstream work on glibc and the Translation Project so that 'locale -av' returns names that match the msgids in the PO files for ISO 639 (or 639-3). The way I see to go down this path is to check on which side the problem lies and file a bug there.

Changed 7 years ago by manuq

comment:12 Changed 7 years ago by manuq

I have separated the different options (gettext, babel, icu) into separate scripts and profiled them:

gettext (current)    Babel             ICU
3.088 seconds        28.589 seconds    0.170 seconds

The profiling scripts and the output I got for XO-4 are attached.
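
For anyone wanting to repeat a comparison like this without the attachments, here is a generic Python 3 timing sketch. It does not reproduce the attached profile_*.py scripts; time_call and report are hypothetical helpers, and the callable passed in is a stand-in for whichever lookup method is being measured.

```python
import time


def time_call(func, repeat=3):
    """Return the best wall-clock time in seconds over repeat runs."""
    best = None
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


def report(candidates, repeat=3):
    """Map each candidate name to its best time in seconds."""
    return {name: time_call(func, repeat) for name, func in candidates.items()}


# Stand-in workload; replace with the gettext, Babel or ICU lookup to compare.
results = report({"stand-in": lambda: sorted(range(10000))})
for name, seconds in results.items():
    print("%s: %.3f seconds" % (name, seconds))
```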

comment:13 Changed 7 years ago by manuq

  • Summary changed from "Spanish" not translated in My Settings language section to "Spanish" and other language names are not translated in My Settings language section

comment:14 Changed 6 years ago by dnarvaez

What's the status here? We never decided the best approach?

comment:15 Changed 6 years ago by cjl

I've been trying slowly to improve the L10n of the iso files at the Translation Project (FWIW)

comment:16 Changed 6 years ago by manuq

I have spent a lot of time on this and couldn't find a good solution. Following up with the Translation Project is the best we can do. Using ICU is a radical change that takes more resources and adds dependencies.

comment:17 Changed 6 years ago by dnarvaez

  • Milestone changed from 1.0 to Unspecified

comment:18 Changed 6 years ago by mystery828

  • Milestone changed from Unspecified to 0.102.0
  • Priority changed from Unspecified by Maintainer to Normal