The goal of Pontoon is to enable localization of as many website messages as possible. Sometimes messages are hidden behind a hard-to-reach sequence of actions, as is the case with JavaScript warnings. We try to minimize the number of such messages by designing Pontoon to operate in two modes:
Vanilla mode. Pontoon can open any HTML document and guess which messages are suitable for localization. This mode is useful when the content manager does not have access to the server-side document, or when the document is not generated by server-side code.
Aided mode. If the content manager has access to the code running on the server, Pontoon can receive background information on messages through hooks, which enables a more complete localization of a document. This mode is useful for websites developed and maintained in-house.
Hooks are responsible for two things: identifying messages and providing their background information, e.g. original message, translation, suggestions from other users, etc. But how exactly do we implement them? I’ve been playing with this question a lot lately, so I’d like to give you an overview of my conclusions. I identified four different approaches:
<span>. Perhaps the most obvious choice is to wrap every message in a span tag (or em) and use HTML5 data-* attributes for storing background information. The problem is that adding span tags to the markup can also change the style of the website.
<span data-original="Original">Translation</span>
<nonstandardtags>. The quickest solution to the problem above is using non-standard tags. Except that it’s not! We are Mozilla, and using non-standard tags is not an option.
<l10n data-original="Original">Translation</l10n>
<!-- Comment nodes -->. What about comment nodes? They do not affect the appearance, which is good, and they do not support data-* attributes, which is bad. We could simply ignore the latter and parse background information, but that sounds fugly.
<!-- l10n data-original="Original" -->Translation<!-- /l10n -->
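To make the parsing concern concrete, here is a minimal sketch of what reading such comment hooks from an HTML string could look like. The function name `parseCommentHooks` and the shape of the returned objects are assumptions made for illustration, not Pontoon's actual code, and the regular expressions only handle the simple, non-nested form shown above.

```javascript
// Sketch: extract hooked messages from an HTML string that uses the
// comment-node convention shown above. The function name and the returned
// shape are hypothetical, not part of Pontoon.
function parseCommentHooks(html) {
  // Matches <!-- l10n ATTRS -->TEXT<!-- /l10n --> (non-greedy, across lines).
  const hookPattern = /<!--\s*l10n\b([\s\S]*?)-->([\s\S]*?)<!--\s*\/l10n\s*-->/g;
  // Matches data-NAME="VALUE" pairs inside the opening comment.
  const attrPattern = /data-([\w-]+)="([^"]*)"/g;

  const messages = [];
  for (const hook of html.matchAll(hookPattern)) {
    // The text between the two comments is what the page displays.
    const message = { translation: hook[2] };
    for (const attr of hook[1].matchAll(attrPattern)) {
      message[attr[1]] = attr[2];
    }
    messages.push(message);
  }
  return messages;
}

const sample =
  '<h1><!-- l10n data-original="Original" -->Translation<!-- /l10n --></h1>';
console.log(parseCommentHooks(sample));
```

Every piece of background information has to be pulled out of the comment text by hand like this, which is the part that feels fragile.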
External file. The best option I’ve found so far is the use of an external file with all the background information stored in a structured format, e.g. JSON. There’s no need for custom parsing and it doesn’t affect the appearance. I still use comment tags for message identification, but the idea is to get rid of them and use XPath to match messages with JSON entities.
{"entities":
  [
    ...
    {
      "original": "Become a Test Pilot!",
      "translation": "Wird Test Pilot!",
      "comment": "Test Pilot should not be translated.",
      "suggestions": [
        {
          "translation": "Werden Test pilot"
        },
        {
          "translation": "Werden Sie Testni Pilot"
        }
      ]
    },
    ...
  ]}
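As a rough illustration of why the external file is convenient, the sketch below loads a file shaped like the example above with the built-in `JSON.parse` and looks up the background information for a message. Pairing a message with its entity by the displayed text is purely an assumption for this example (the helper `indexEntities` is hypothetical); the plan described above is to do the pairing with XPath instead.

```javascript
// Sketch: look up background information for a message in an external
// JSON file shaped like the example above. Matching by displayed text is
// an assumption for illustration only.
const manifestText = `{
  "entities": [
    {
      "original": "Become a Test Pilot!",
      "translation": "Wird Test Pilot!",
      "comment": "Test Pilot should not be translated.",
      "suggestions": [
        { "translation": "Werden Test pilot" },
        { "translation": "Werden Sie Testni Pilot" }
      ]
    }
  ]
}`;

function indexEntities(manifest) {
  // Key each entity by the text the page would display: the translation
  // when there is one, otherwise the original message.
  const byText = new Map();
  for (const entity of manifest.entities) {
    byText.set(entity.translation || entity.original, entity);
  }
  return byText;
}

const entities = indexEntities(JSON.parse(manifestText));
const entity = entities.get("Wird Test Pilot!");
console.log(entity.original);
console.log(entity.suggestions.map((s) => s.translation));
```

No hand-written parser is involved, and the page markup itself carries none of the metadata.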
Of course the list does not stop here. I’d love to hear your suggestions! What do you think about hooks and how would you like to see them implemented?

7 comments
Why not just add a specific data attribute to whatever tag surrounds the text that needs l10n? Something like data-l10n='{"entities": [{"original": "Become a Test Pilot!", "translation": "Wird Test Pilot!" ... }]}'
Then pontoon can walk the tree and grab any data-l10n attribute it finds, regardless of the tag.
6 July 2011
Ryan Freebern
Re non-standard tags, does HTML5 allow defining custom namespaces? Reading the spec, it seems like it doesn’t, but I’m not sure. Do you know?
How tied to the HTML elements do we want to be? Should the list of localizable messages be a one to one mapping with the HTML elements, or maybe should it be a super-/subset?
If we’re OK with the former, I like Ryan’s suggestion above to overload the existing elements with l10n meta data. Ryan puts a stringified JSON into data-l10n, but I wonder if we could get away with using more than one attribute:
(p data-pontoon-orig="Become a (a)Test Pilot(/a)!"
data-pontoon-comment="Header")
Wird (a)Test Pilot(/a)!
(/p)
This way all the required data stays in a single file.
Having said that, I’m not opposed to the manifest file idea. It could prove very flexible going forward, with uses that we don’t think about right now. It would also help mitigate the pain points of translating strings in JavaScript, although arguably putting UI messages in pure JS is not a good practice and instead hidden HTML elements should be used.
6 July 2011
Staś Małolepszy
RDF or a combination of lang attributes?
7 July 2011
Fabian
Ryan & Stas, I totally forgot to mention this idea, which was actually the first one I evaluated. :/ The problem with using the parent element is explained below.
(parent)
textnode
(img)
textnode
(/parent)
If you have more than one textnode per parent element, how are you going to store l10n information for them?
8 July 2011
mathjazz
Fabian, could you be more specific? In the meantime I’ll try to learn more about RDF, I have zero experience with it.
8 July 2011
mathjazz
mathjazz, if you have more than one textnode and you’re storing the l10n data as JSON, surely you could just store an array instead of a single object?
11 July 2011
Ryan Freebern
Ryan: you can do that, but how do you store this information in the parent element? For instance, if you take a look at our PHP sites, they use gettext and if a snippet looks like this:
(p)(?= _('Be the Difference'); ?)(/p)
The output will look like this:
(p)Fai la differenza(/p)
With Pontoon hooks, we (a) provide wrapper methods which add metadata to the HTML output and (b) perform a simple search and replace in the PHP files to call these wrapper methods:
(p)(?= _w('Be the Difference'); ?)(/p)
Now, how do you get to the output like this?
(p original="Be the Difference")Fai la differenza(/p)
I’ve been playing with using comment nodes to store metadata temporarily and then using JS to move them to data-* attributes, but using the external metafile is easier and leaves the original content clean.
11 July 2011
mathjazz