Opened 11 years ago
Last modified 10 years ago
#4449 new defect
"Spanish" and other language names are not translated in My Settings language section
Reported by: | manuq | Owned by: | manuq |
---|---|---|---|
Priority: | Normal | Milestone: | 0.102.0 |
Component: | Sugar | Version: | Unspecified |
Severity: | Unspecified | Keywords: | |
Cc: | erikos, cjl | Distribution/OS: | Unspecified |
Bug Status: | Unconfirmed |
Description
gettext fails to find a translation for 'Spanish' because that msgid doesn't exist. Looking at the es.po file in upstream iso-codes, the msgid that does exist is "Spanish; Castilian", which es.po translates to "Español; Castellano".
Here is where "Spanish" comes from: to construct the set of available locales (language, country, code), Sugar parses the output of the locale -av command. For Spanish it picks up the string 'Spanish' from the language field of entries like this:
locale: es_AR archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
title | Spanish locale for Argentina
source | RAP
address | Sankt J<F8>rgens Alle 8, DK-1615 K<F8>benhavn V, Danmark
email | bug-glibc-locales@gnu.org
language | Spanish
territory | Argentina
revision | 1.0
date | 2000-06-29
codeset | ISO-8859-1
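The parsing step described above can be sketched roughly as follows. This is not Sugar's actual parser; the function name parse_locale_av and the shortened SAMPLE text are made up for illustration, and the sketch assumes each entry starts with a "locale:" header followed by "field | value" lines.

```python
import re

# Shortened sample in the style of `locale -av` output (illustrative only).
SAMPLE = """\
locale: es_AR           archive: /usr/lib/locale/locale-archive
-------------------------------------------------------------------------------
    title | Spanish locale for Argentina
 language | Spanish
territory | Argentina
  codeset | ISO-8859-1
"""


def parse_locale_av(text):
    """Return {locale_code: {field: value}} from `locale -av`-style text."""
    locales = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"locale:\s+(\S+)", line)
        if header:
            # Start collecting fields for a new locale entry.
            current = locales.setdefault(header.group(1), {})
            continue
        if current is not None and "|" in line:
            key, _, value = line.partition("|")
            current[key.strip()] = value.strip()
    return locales


info = parse_locale_av(SAMPLE)["es_AR"]
print(info["language"], "/", info["territory"])
```

The language field is the English name that ends up being passed to gettext as the msgid, which is why a mismatch with the iso-codes catalog leaves it untranslated.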
Attachments (10)
Change History (28)
Changed 11 years ago by manuq
comment:1 Changed 11 years ago by erikos
You are right, we do not get the right translation as we do not pass the full 'key'.
python -c "import gettext; print gettext.dgettext('iso_639', 'Spanish; Castilian')"
Español; Castellano
Same is true for other languages:
python -c "import gettext; print gettext.dgettext('iso_639', 'Dutch')"
Dutch
python -c "import gettext; print gettext.dgettext('iso_639', 'Dutch; Flemish')"
Nederlands
[erikos@t61 ~]$ export LANGUAGE=el_EL.utf8
[erikos@t61 ~]$ python -c "import gettext; print gettext.dgettext('iso_639', 'Greek, Modern (1453-)')"
Ελληνικά
[erikos@t61 ~]$ python -c "import gettext; print gettext.dgettext('iso_639', 'Greek')"
Greek
comment:2 follow-up: ↓ 5 Changed 11 years ago by erikos
- Cc cjl added
Another strange thing is that 'français' is lower case, all the others are upper case.
#. name for fra, fr
msgid "French"
msgstr "français"
All in all, we have quite a few languages and countries that are wrong or untranslated. We would have to special case quite a few. Not sure we have a chance of special casing all of them.
Having a sugar internal list adds maintenance cost.
Maybe in the meantime, as the list is not fully updated, we could just display the native one and the English one or the localized one? e.g.
Deutsch (Deutschland) - German (Germany)
Deutsch (Deutschland) - Aleman (Alemania)
@Chris, for upstream fixes, shouldn't there be an additional entry in the es.po that just translates 'Spanish'?
comment:3 Changed 11 years ago by manuq
Using gettext domain 'iso_639_3' instead of 'iso_639' looks more promising. It has 'Spanish' and 'Dutch', but not 'Greek'.
[manuq@localhost iso-codes]$ python -c "import gettext; print gettext.dgettext('iso_639_3', 'Spanish')"
Español
[manuq@localhost iso-codes]$ python -c "import gettext; print gettext.dgettext('iso_639_3', 'Dutch')"
Neerlandés (Holandés)
[manuq@localhost iso-codes]$ python -c "import gettext; print gettext.dgettext('iso_639_3', 'Greek')"
Greek
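Since neither domain covers every name, one possibility would be to try both and keep the first result that differs from the msgid. The sketch below assumes the domain names used in this ticket ('iso_639' and 'iso_639_3'); the helper translate_language is hypothetical, and it relies only on the fact that gettext.dgettext returns the msgid unchanged when it finds no translation.

```python
import gettext

# Domain names as used in this ticket; other iso-codes releases may differ.
DOMAINS = ("iso_639", "iso_639_3")


def translate_language(name, domains=DOMAINS):
    """Return the first translation that differs from the msgid, else the msgid."""
    for domain in domains:
        translated = gettext.dgettext(domain, name)
        if translated != name:
            return translated
    # No catalog had a translation: fall back to the English name.
    return name


print(translate_language("Spanish"))
```

One limitation of this approach: a legitimate translation that happens to be identical to the English name cannot be told apart from a missing one, so the lookup simply moves on to the next domain.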
comment:4 Changed 11 years ago by manuq
We can also consider other possibilities. I've found a Python package named 'babel' (http://babel.edgewall.org/ or yum install babel) that seems to map language codes to names:
>>> from babel import Locale
>>> locale = Locale('es', 'UY')
>>> print locale.display_name
español (Uruguay)
>>> locale = Locale('bs', 'BA')
>>> print locale.display_name
bosanski (Bosna i Hercegovina)
Unfortunately, it is not a perfect solution:
>>> locale = Locale('el', 'EL')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/babel/core.py", line 137, in __init__
    raise UnknownLocaleError(identifier)
babel.core.UnknownLocaleError: unknown locale 'el_EL'
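If Babel were used, the unknown-locale case would need a fallback. A minimal sketch, assuming Babel may or may not be installed; the wrapper babel_display_name is a hypothetical helper, and returning None is just one way to signal "use another source for the name".

```python
try:
    from babel.core import Locale, UnknownLocaleError
except ImportError:  # Babel is treated as optional in this sketch
    Locale = None


def babel_display_name(language, territory=None):
    """Return Babel's native display name, or None if it cannot provide one."""
    if Locale is None:
        return None
    try:
        return Locale(language, territory).display_name
    except UnknownLocaleError:
        # e.g. 'el_EL', which Babel has no locale data for
        return None


print(babel_display_name("es", "UY"))
print(babel_display_name("el", "EL"))
```

The caller could then fall back to the gettext lookup or to the raw locale code whenever None comes back.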
comment:5 in reply to: ↑ 2 Changed 11 years ago by manuq
Replying to erikos:
Another strange thing is that 'français' is lower case, all the others are upper case.
Isn't that specific to the language? I tried Google Translate and the result is the same.
comment:6 Changed 11 years ago by manuq
For an external option, PyICU (yum install pyicu) looks better:
>>> from PyICU import Locale
>>> locale = Locale('es_UY')
>>> print locale.getDisplayName(locale)
español (Uruguay)
>>> locale = Locale('el_EL')
>>> print locale.getDisplayName(locale)
Ελληνικά (EL)
comment:7 Changed 11 years ago by manuq
Attached is a script that outputs a table for comparison.
The columns are:
- Code: the language / country code
- Original: the language / country names parsed from 'locale -av' command
- ISO 639: using gettext, the language name is taken from ISO 639. This is what Sugar does currently
- ISO 639-3: also using gettext, but the language name is taken from ISO 639-3
- Babel: names obtained from the code using the external library Babel
- ICU: names obtained from the code using the external library PyICU
I also attach the output of the script on my system, saved to a file with 'python langs_table.py > langs_table.txt'.
The script also writes the errors from each method to standard error.
The table columns are misaligned in some rows, sorry. I suspect this is because I'm using String.ljust() on RTL strings.
comment:8 Changed 11 years ago by manuq
A spreadsheet is much better in this case:
comment:9 Changed 11 years ago by cjl
- Original: the language / country names parsed from 'locale -av' command
If I understand correctly, to address this, it looks like we would need to touch many glibc locales, not an easy task, but things have gotten better there since Ulrich Drepper moved on. I've been making some in-roads with the glibc community and have even been granted commit priv. for my work on locales.
- ISO 639: using gettext, the language name is taken from ISO 639. This is what Sugar does currently
- ISO 639-3: also using gettext, but the language name is taken from ISO 639-3
These PO files are hosted by the Translation Project, (also started by Ulrich Drepper).
http://translationproject.org/domain/iso_639.html
http://translationproject.org/domain/iso_639_3.html
It is a somewhat closed-off community, and my interactions with them have been more limited. They host a lot of packages used by Sugar (see the lines starting with TP here as an example: http://translate.sugarlabs.org/es/upstream_l10n/)
In general, the ISO PO files are in pretty good shape for major languages, but there are glaring omissions in some, and unfortunately the TP does not have as wide an array of language projects as would be ideal. Many of these packages are hosted in duplicate in places like LaunchPad, possibly because the Translation Project refuses to employ a modern translation hosting infrastructure / workflow and still operates by a "complete the PO and mail it in" process.
Please let me know if there is something in particular you would like me to investigate.
comment:10 follow-up: ↓ 11 Changed 11 years ago by erikos
Looking at the table the best result for this conversion is with PyICU, agreed?
What are the pros and cons of using one or the other? PyICU would add a new dependency (pyicu on Fedora and Ubuntu). API usage looks straightforward as well. How is the performance? Any other downsides?
comment:11 in reply to: ↑ 10 Changed 11 years ago by manuq
Replying to erikos:
Looking at the table the best result for this conversion is with PyICU, agreed?
Yes. Chris, I would like to know your impression of the ICU project: http://site.icu-project.org/ . It looks well tested and is used by several software projects and companies. The obvious con is that it adds a new dependency. The pro is that it could improve performance, as we wouldn't need to parse 'locale -av' looking for the original language/country names. Having the codes (output of 'locale -a') is enough. I will provide a performance table soon.
For the gettext option, it would require upstream work on glibc and the Translation Project so that 'locale -av' returns names that match the po files for ISO 639 (or 639-3). The way I see to follow this path is to check on which side the problem is, and file a bug there.
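For the code-based options (Babel, ICU), the entries printed by 'locale -a' would need to be split into language and territory codes first. A minimal sketch, assuming entries of the form language[_territory][.codeset][@modifier]; the helper name split_locale_code is made up for illustration.

```python
def split_locale_code(code):
    """Split e.g. 'es_AR.utf8' into (language, territory-or-None)."""
    # Drop the codeset ('.utf8') and the modifier ('@valencia'), if present.
    base = code.split(".", 1)[0].split("@", 1)[0]
    language, _, territory = base.partition("_")
    return language, territory or None


for code in ["es_AR.utf8", "el_GR", "C", "ca_ES@valencia"]:
    print(code, "->", split_locale_code(code))
```

Entries such as 'C' or 'POSIX' carry no territory and would probably need to be filtered out before asking a library for display names.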
Changed 11 years ago by manuq
comment:12 Changed 11 years ago by manuq
I have separated the different options (gettext, babel, icu) in separate scripts and profiled them:
gettext (current) | Babel | ICU |
---|---|---|
3.088 seconds | 28.589 seconds | 0.170 seconds |
The profiling scripts and the output I got for XO-4 are attached.
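For anyone wanting to repeat the comparison without the attached scripts, a timing harness could look roughly like this. It is only a sketch and not the attached profiling code: time_call and dummy_lookup are hypothetical names, and the placeholder workload would have to be replaced by the real gettext, Babel or ICU lookup loop.

```python
import time


def time_call(func, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


def dummy_lookup():
    # Placeholder workload; substitute the gettext, Babel or ICU lookup loop.
    return [str(n) for n in range(1000)]


print("%.6f seconds" % time_call(dummy_lookup))
```

Taking the best of a few runs reduces noise from disk caching, which matters on slow storage such as the XO's.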
comment:13 Changed 11 years ago by manuq
- Summary changed from "Spanish" not translated in My Settings language section to "Spanish" and other language names are not translated in My Settings language section
comment:14 Changed 10 years ago by dnarvaez
What's the status here? We never decided the best approach?
comment:15 Changed 10 years ago by cjl
I've been trying slowly to improve the L10n of the iso files at the Translation Project (FWIW)
comment:16 Changed 10 years ago by manuq
I have spent a lot of time on this and couldn't find a good solution. Following up with the Translation Project is the best we can do. Using ICU is a radical change that takes more resources and adds dependencies.
comment:17 Changed 10 years ago by dnarvaez
- Milestone changed from 1.0 to Unspecified
comment:18 Changed 10 years ago by mystery828
- Milestone changed from Unspecified to 0.102.0
- Priority changed from Unspecified by Maintainer to Normal
Ugly workaround