Internationalization and Text-Rendering Pitfalls
By Mykhailo Boichuk · Co-founder & Vice-President
In short
Internationalization breaks most often on assumptions that hold only for English: that text reads left to right, that it does not expand when translated, that strings can be split or sorted naively, and that one script behaves like another. Building software that works across locales means handling text as Unicode, deferring formatting to locale-aware system facilities, and never hard-coding language-specific assumptions into layout or logic.
English is not the default of the world
Software written without internationalization in mind quietly assumes its author’s language. It assumes text reads left to right, that a sentence is short enough for the space allotted, that a name has a first and last part, and that sorting means alphabetical in the English sense. Each assumption fails somewhere, and the failures are often invisible to the original author because their own locale never triggers them.
Internationalization is the work of removing these assumptions so the same code can serve many languages and regions. It is far cheaper to design in from the start than to retrofit, because the assumptions get baked into layout, data models, and logic in ways that are tedious to unwind.
Text is Unicode, and Unicode is subtle
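The foundation is treating text as Unicode rather than as bytes or ASCII. But Unicode introduces subtleties that naive code gets wrong. A single user-perceived character (a grapheme cluster) may be composed of several code points, so counting or splitting by code point can cut a character in half. The same text can also be represented by more than one sequence of code points: “é” can be stored as the single code point U+00E9 or as “e” followed by the combining accent U+0301. Both render identically, yet a naive comparison of the raw strings treats them as different.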
- Count and split text by user-perceived characters, not by code points or bytes.
- Normalize strings to a common form, such as NFC, before comparing them, since the same text can be represented by more than one sequence of code points.
- Never assume one byte, or one code unit, equals one character.
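The three rules above can be sketched in TypeScript. This is a minimal illustration, assuming a runtime that ships `Intl.Segmenter` (recent browsers and Node.js versions do). The helper names `graphemes`, `truncateGraphemes`, and `sameText` are invented for this example, and NFC is only one reasonable choice of normalization form.

```typescript
// Hypothetical helpers for illustration; the names are not from any library.

// Split text into user-perceived characters (grapheme clusters).
function graphemes(text: string, locale: string = "en"): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

// Truncate without cutting a perceived character in half.
function truncateGraphemes(text: string, max: number): string {
  return graphemes(text).slice(0, max).join("");
}

// Compare after normalizing both sides to the same form (NFC is assumed here).
function sameText(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

const composed = "\u00e9"; // "é" as a single code point
const decomposed = "e\u0301"; // "e" followed by a combining acute accent
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // one family emoji

console.log(family.length); // 8 UTF-16 code units
console.log(Array.from(family).length); // 5 code points
console.log(graphemes(family).length); // 1 perceived character

console.log(composed === decomposed); // false: different code points
console.log(sameText(composed, decomposed)); // true after normalization
console.log(truncateGraphemes(decomposed + "x", 1) === decomposed); // true: accent kept
```

The `length` property and plain slicing work on UTF-16 code units, so they are the wrong tools for anything a user would call a character. Segmenting first and normalizing before comparison keeps both operations aligned with what is actually displayed.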
Layout must absorb language
Translated text rarely matches the length of the original. A short English label can become a long phrase in another language, and a layout that fits the English exactly will clip or overflow. Some scripts, such as Arabic and Hebrew, are written right to left, which means the entire interface direction must flip, not just the text. Designing layouts that expand, wrap, and mirror gracefully is essential.
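Deciding when to mirror can be driven by the locale rather than hard-coded per language. The sketch below is one possible approach, not a complete solution. It uses `Intl.Locale` to infer the likely script of a language tag and checks it against a short list of right-to-left scripts. The `RTL_SCRIPTS` list is deliberately incomplete and `textDirection` is a hypothetical helper.

```typescript
// Scripts written right to left. Incomplete on purpose; an assumption for this sketch.
const RTL_SCRIPTS = new Set(["Arab", "Hebr", "Syrc", "Thaa", "Nkoo", "Adlm"]);

// Hypothetical helper: infer the base direction from a BCP 47 language tag.
function textDirection(languageTag: string): "ltr" | "rtl" {
  // maximize() fills in the likely script, so "ar" is treated as Arabic script.
  const script = new Intl.Locale(languageTag).maximize().script;
  return script !== undefined && RTL_SCRIPTS.has(script) ? "rtl" : "ltr";
}

console.log(textDirection("en")); // "ltr"
console.log(textDirection("ar")); // "rtl"
console.log(textDirection("he-IL")); // "rtl"
```

In a web interface the result could be assigned to the root element's `dir` attribute, with the layout written in terms of start and end rather than left and right so that it mirrors without per-language code.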
Defer formatting to the system
Dates, times, numbers, currencies, and sorting all vary by locale in ways that are easy to get wrong by hand. The order of day and month, the decimal separator, the currency symbol and its placement, and the rules for alphabetical order are all locale-specific. Platforms provide locale-aware facilities for exactly these tasks, and using them is far safer than formatting by hand.
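These locale differences are easy to see through the ECMAScript `Intl` formatters. The following is a brief sketch, assuming a runtime with full locale data (the default in current browsers and Node.js). The sample values, the chosen locales, and the helper names are illustrative only.

```typescript
const amount = 1234567.89;
const sampleDate = new Date(Date.UTC(2024, 2, 4)); // 4 March 2024 (months are zero-based)

// Numbers: grouping and decimal separators come from the locale.
const formatNumber = (locale: string): string =>
  new Intl.NumberFormat(locale).format(amount);
console.log(formatNumber("en-US")); // "1,234,567.89"
console.log(formatNumber("de-DE")); // "1.234.567,89"

// Dates: field order comes from the locale. The time zone is pinned to UTC
// so the result does not depend on the machine running the code.
const formatDate = (locale: string): string =>
  new Intl.DateTimeFormat(locale, { timeZone: "UTC" }).format(sampleDate);
console.log(formatDate("en-US")); // "3/4/2024"
console.log(formatDate("en-GB")); // "04/03/2024"

// Currency: the symbol and its placement come from the locale.
const formatPrice = (locale: string, currency: string): string =>
  new Intl.NumberFormat(locale, { style: "currency", currency }).format(1234.5);
console.log(formatPrice("en-US", "USD")); // "$1,234.50"
console.log(formatPrice("de-DE", "EUR")); // symbol placed after the amount

// Sorting: the same words sort differently in German and Swedish.
const words = ["zebra", "äpple", "apple"];
const sortFor = (locale: string): string[] =>
  [...words].sort(new Intl.Collator(locale).compare);
console.log(sortFor("de")); // ["apple", "äpple", "zebra"]
console.log(sortFor("sv")); // ["apple", "zebra", "äpple"]
```

None of these differences are encoded in the calling code. The only input that changes is the locale identifier, which is the point of deferring to the platform.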
The reliable posture is to keep user-facing text in resource files rather than hard-coded in the program, to format locale-dependent values through the system’s formatters, and to test the interface in a non-English, ideally right-to-left, locale early. Software that respects how the world actually writes and reads is more work to build once and far less work to maintain across every market it reaches.
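Keeping text in resources can be as simple as a keyed lookup with whole-sentence templates. The sketch below is a minimal stand-in rather than a real localization library: the in-memory `catalogs` object plays the role of per-locale resource files, and the keys, sample texts, and the `translate` helper are all invented for illustration.

```typescript
// Stand-in for per-locale resource files; keys and texts are invented.
type Catalog = Record<string, string>;

const catalogs: Record<string, Catalog> = {
  en: { greeting: "Hello, {user}!" },
  ja: { greeting: "{user}さん、こんにちは!" },
};

const DEFAULT_LOCALE = "en";

// Hypothetical lookup. Templates are whole sentences with named placeholders,
// so each translation can put the placeholder wherever its grammar needs it.
function translate(
  locale: string,
  key: string,
  params: Record<string, string> = {}
): string {
  const template =
    catalogs[locale]?.[key] ?? catalogs[DEFAULT_LOCALE]?.[key] ?? key;
  // Unknown placeholders are left visible rather than silently dropped.
  return template.replace(/\{(\w+)\}/g, (match, name: string) => params[name] ?? match);
}

console.log(translate("en", "greeting", { user: "Aiko" })); // "Hello, Aiko!"
console.log(translate("ja", "greeting", { user: "Aiko" })); // placeholder comes first
console.log(translate("fr", "greeting", { user: "Aiko" })); // falls back to English
```

Building the greeting by concatenating "Hello, " with the name would fix the English word order in code. A template per locale leaves that decision to the translator.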
Key takeaways
- Internationalization removes assumptions that hold only for the author’s own language.
- Treat text as Unicode and operate on user-perceived characters, not bytes or code points.
- Normalize strings before comparing them, since identical text can be represented by different code point sequences.
- Design layouts to absorb text expansion and to mirror for right-to-left languages.
- Defer dates, numbers, currency, and sorting to locale-aware system facilities.
Frequently asked questions
- Why does software break in other languages?
  Because it tends to assume the author’s language: left-to-right text, short strings, and English sorting. Those assumptions fail in other locales, often invisibly to the original developer.
- Why is counting characters tricky in Unicode?
  Because a single visible character can be made of several code points, so counting or splitting by code point can cut a character in half. Operate on user-perceived characters instead.
- Should I format dates and numbers manually?
  No. Their format varies by locale in ways easy to get wrong, so use the platform’s locale-aware formatters rather than building the formatting by hand.
About the author
Mykhailo Boichuk
Co-founder & Vice-President
Mykhailo is an engineer who builds native applications and the systems behind them. He concentrates on macOS and iOS performance, local-first data architecture, and the synchronization problems that come with offline-capable software.