January 13, 2026 · 7 min read · internationalization, localization, text-rendering, unicode

Internationalization and Text-Rendering Pitfalls

By Mykhailo Boichuk · Co-founder & Vice-President

In short

Internationalization breaks most often on assumptions that hold only for English: that text reads left to right, that it does not expand when translated, that strings can be split or sorted naively, and that one script behaves like another. Building software that works across locales means handling text as Unicode, deferring formatting to locale-aware system facilities, and never hard-coding language-specific assumptions into layout or logic.

English is not the default of the world

Software written without internationalization in mind quietly assumes its author’s language. It assumes text reads left to right, that a sentence is short enough for the space allotted, that a name has a first and last part, and that sorting means alphabetical in the English sense. Each assumption fails somewhere, and the failures are often invisible to the original author because their own locale never triggers them.

Internationalization is the work of removing these assumptions so the same code can serve many languages and regions. It is far cheaper to design for internationalization from the start than to retrofit it, because the assumptions get baked into layout, data models, and logic in ways that are tedious to unwind.

Text is Unicode, and Unicode is subtle

The foundation is treating text as Unicode rather than as bytes or ASCII. But Unicode introduces subtleties that naive code gets wrong. A single visible character may be composed of several code points, so counting or splitting by code point can cut a character in half. The same visible letter may also be represented by more than one valid sequence of code points (a precomposed “é” versus an “e” followed by a combining accent, for example), which breaks naive comparison.

Count and split text by user-perceived characters, not by code points or bytes.
Normalize strings before comparing them, since the same text can be encoded more than one way.
Never assume one byte, or one code unit, equals one character.
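
The rules above can be sketched in TypeScript using the standard `Intl.Segmenter` and `String.prototype.normalize` APIs. This is a minimal illustration, not a prescribed implementation: the helper names (`graphemes`, `truncate`, `sameText`) and the sample strings are assumptions made for the example, and NFC is just one reasonable choice of normalization form.

```typescript
// Sketch only: helper names and sample strings are illustrative assumptions.

/** Split text into user-perceived characters (grapheme clusters). */
function graphemes(text: string, locale: string = "en"): string[] {
  const segmenter = new Intl.Segmenter(locale, { granularity: "grapheme" });
  return Array.from(segmenter.segment(text), (part) => part.segment);
}

/** Keep at most `max` user-perceived characters without cutting one in half. */
function truncate(text: string, max: number): string {
  return graphemes(text).slice(0, max).join("");
}

/** Compare two strings after normalizing both to the same form (NFC here). */
function sameText(a: string, b: string): boolean {
  return a.normalize("NFC") === b.normalize("NFC");
}

// One visible family emoji: three people joined by zero-width joiners.
const family: string = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}";
console.log(family.length);             // UTF-16 code units: 8
console.log(Array.from(family).length); // code points: 5
console.log(graphemes(family).length);  // user-perceived characters: 1

const precomposed: string = "caf\u00E9"; // "é" as a single code point
const decomposed: string = "cafe\u0301"; // "e" plus a combining acute accent
console.log(precomposed === decomposed);        // false
console.log(sameText(precomposed, decomposed)); // true

// Truncating by grapheme keeps the accent attached to its base letter.
console.log(truncate(decomposed + "!", 4) === decomposed); // true
```

Slicing the same strings by code unit or code point could separate the accent from its letter or split the emoji into fragments, which is the failure the rules above guard against.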

Layout must absorb language

Translated text rarely matches the length of the original. A short English label can become a long phrase in another language, and a layout that fits the English exactly will clip or overflow. Some scripts are written right to left, which means the entire interface direction must flip, not just the text. Designing layouts that expand, wrap, and mirror gracefully is essential.

Allow for text expansion and right-to-left layouts from the beginning. Fixed-width controls sized to English text and layouts that assume left-to-right direction are two of the most common ways an interface breaks in other locales.
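
As one hedged illustration of treating direction as data rather than an assumption, the sketch below guesses a base direction from a locale tag using the standard `Intl.Locale` API. The function name `textDirection` and the short script list are assumptions for this example; the list is deliberately not exhaustive, and a real application would more likely take direction from the platform.

```typescript
// Sketch only: the script list is a short, non-exhaustive assumption.
const RTL_SCRIPTS = new Set(["Arab", "Hebr", "Thaa", "Syrc"]);

/** Guess the base direction for a BCP 47 locale tag from its likely script. */
function textDirection(localeTag: string): "rtl" | "ltr" {
  // maximize() fills in the likely script, e.g. "ar" becomes "ar-Arab-EG".
  const script = new Intl.Locale(localeTag).maximize().script;
  return script !== undefined && RTL_SCRIPTS.has(script) ? "rtl" : "ltr";
}

for (const tag of ["en-US", "ar-EG", "he", "fa-IR", "ja"]) {
  console.log(tag, textDirection(tag));
}
```

The point of the sketch is that layout code asks for the direction instead of assuming left to right; everything that is positioned "left" or "right" then becomes "leading" or "trailing" relative to that answer.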

Defer formatting to the system

Dates, times, numbers, currencies, and sorting all vary by locale in ways that are easy to get wrong by hand. The order of day and month, the decimal separator, the currency symbol and its placement, and the rules for alphabetical order are all locale-specific. Platforms provide locale-aware facilities for exactly these tasks, and using them is far safer than formatting by hand.
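
A minimal sketch of what deferring to the system can look like, here using the standard `Intl.NumberFormat`, `Intl.DateTimeFormat`, and `Intl.Collator` APIs in TypeScript. The helper names and sample values are assumptions for illustration, and the exact output depends on the locale data shipped with the runtime.

```typescript
// Sketch only: helper names and sample values are illustrative assumptions.

function formatNumber(value: number, locale: string): string {
  return new Intl.NumberFormat(locale).format(value);
}

function formatPrice(amount: number, locale: string, currency: string): string {
  return new Intl.NumberFormat(locale, { style: "currency", currency }).format(amount);
}

function formatDate(date: Date, locale: string): string {
  // Pin the time zone so the calendar day does not depend on the machine.
  return new Intl.DateTimeFormat(locale, {
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    timeZone: "UTC",
  }).format(date);
}

/** Return a sorted copy using the locale's own collation rules. */
function sortForLocale(words: string[], locale: string): string[] {
  return [...words].sort(new Intl.Collator(locale).compare);
}

const sampleDate = new Date(Date.UTC(2026, 2, 4)); // 4 March 2026
for (const locale of ["en-US", "en-GB", "de-DE"]) {
  console.log(locale, formatNumber(1234567.89, locale), formatDate(sampleDate, locale));
}
console.log(formatPrice(1234.5, "en-US", "USD"));

const words = ["\u00E4pple", "Zebra", "apple"]; // "äpple", "Zebra", "apple"
console.log([...words].sort());          // plain code-unit order
console.log(sortForLocale(words, "de")); // German collation
console.log(sortForLocale(words, "sv")); // Swedish collation
```

With full locale data, the same date prints with day and month in a different order for en-US and en-GB, the decimal and grouping separators swap between en-US and de-DE, and the three sorts put the words in three different orders. None of that logic lives in the application code.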

The reliable posture is to keep user-facing text in resource files rather than hard-coded in the program, to format locale-dependent values through the system’s formatters, and to test the interface in a non-English, ideally right-to-left, locale early. Software that respects how the world actually writes and reads is more work to build once and far less work to maintain across every market it reaches.
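
To make the resource-file idea concrete, here is a small sketch of looking up user-facing text by key instead of hard-coding it. The catalog shape, the keys, the translations, and the `localized` function are all hypothetical; real projects would normally load such catalogs from the platform's own resource format rather than an inline object.

```typescript
// Sketch only: catalog shape, keys, and translations are hypothetical.
type Catalog = Record<string, string>;

const catalogs: Record<string, Catalog> = {
  en: { "cart.title": "Your cart", "cart.greeting": "Hello, {user}!" },
  de: { "cart.title": "Ihr Warenkorb", "cart.greeting": "Hallo, {user}!" },
};

/** Look up a message: exact locale, then base language, then the default catalog. */
function localized(
  locale: string,
  key: string,
  params: Record<string, string> = {},
): string {
  const baseLanguage = locale.split("-")[0] ?? locale;
  for (const candidate of [locale, baseLanguage, "en"]) {
    const template = catalogs[candidate]?.[key];
    if (template !== undefined) {
      // Fill {placeholders}; leave unknown ones visible so gaps are easy to spot.
      return template.replace(
        /\{(\w+)\}/g,
        (placeholder: string, name: string) => params[name] ?? placeholder,
      );
    }
  }
  return key; // Surface the missing key instead of failing silently.
}

console.log(localized("de-AT", "cart.title"));                // falls back to "de"
console.log(localized("fr", "cart.greeting", { user: "Ana" })); // falls back to "en"
```

Keeping whole sentences in the catalog, with named placeholders, lets translators reorder words freely; assembling sentences from fragments in code would reintroduce the English word-order assumption.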

Key takeaways

Internationalization removes assumptions that hold only for the author’s own language.
Treat text as Unicode and operate on user-perceived characters, not bytes or code points.
Normalize strings before comparing them, since identical text can be encoded differently.
Design layouts to absorb text expansion and to mirror for right-to-left languages.
Defer dates, numbers, currency, and sorting to locale-aware system facilities.

Frequently asked questions

Why does software break in other languages? Because it tends to assume the author’s language: left-to-right text, short strings, and English sorting. Those assumptions fail in other locales, often invisibly to the original developer.
Why is counting characters tricky in Unicode? Because a single visible character can be made of several code points, so counting or splitting by code point can cut a character in half. Operate on user-perceived characters instead.
Should I format dates and numbers manually? No. Their format varies by locale in ways easy to get wrong, so use the platform’s locale-aware formatters rather than building the formatting by hand.

About the author

Mykhailo Boichuk

Co-founder & Vice-President

Mykhailo is an engineer who builds native applications and the systems behind them. He concentrates on macOS and iOS performance, local-first data architecture, and the synchronization problems that come with offline-capable software.