How I Built an Internationalization Pipeline in Eight Hours

Or, why I offloaded i18n to machine translation

  • i18n
  • CI/CD
  • Accessibility
  • UX
  • AI

TL;DR

DSPy with Gemini affords reliable translation at build time. A React Context provider, Next.js routing middleware, and a translation wrapper add up to a cheap, fast, and mostly reliable i18n pipeline with a simple DevEx. Automate it with GitHub Actions; voilà.

The Problem Space

If you want a web product to reach users globally, you need a way to express the same interface in multiple languages without multiplying authoring effort by every locale you support.

If you've bolted an internationalization (i18n) solution into a web app before, you're likely familiar with the pains of the solutions already on the market.

I've used Crowdin, which is extremely reliable, but it's expensive, slow (you can expect a 48-hour turnaround), and frankly its API is cumbersome to work with. i18next, on the other hand, is a popular open-source i18n framework, but I've found it brittle for what amounts to a simple architectural requirement: mapping keys to strings.

Beyond the overbaked nature of these platforms, there's something to be said for taking ownership of your toolchain, and plenty to learn about how machine translation works in the process.

In hustling to get this site ready, I realized I needed a simpler solution that enables fast feedback and a reliable translation engine.

What I landed on was build-time machine translation with DSPy, Gemini, and Next.js routing, orchestrated by GitHub Actions. With those constraints laid out, let's dig in.

The 8-Hour Solution

Instead of adopting a large runtime i18n framework, I built a small pipeline around one core idea: English is the source of truth, and every other locale is a generated artifact.

At a high level, the system has four parts:

  1. A canonical English locale file that defines the application's copy.
  2. A generation step that derives types and identifies untranslated keys.
  3. A batch translation script that produces locale JSON files at build time.
  4. A lightweight runtime layer—middleware for locale routing and a React context for dictionary lookup.

The crux of the design decision here is that translation does not happen at request time. By the time the app is deployed, the translated dictionaries already exist. That keeps runtime latency flat, avoids model calls in production, and makes the system much easier to reason about.
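Concretely, each locale artifact is just a flat JSON dictionary keyed by the English source string. A hypothetical pair of entries (the keys here are illustrative, not this site's actual copy) might look like:

```json
{
  "Read the blog": "Read the blog",
  "Subscribe for updates": "Subscribe for updates"
}
```

The generated es.json carries the same keys mapped to Spanish values, so a missing key is trivially detectable by set difference.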

Why Build Time Won

This isn't a live chat app, so we could rid ourselves of any pesky realtime concerns. Build-time translation gave me three things I cared about:

  • Predictability: deploys produce deterministic locale artifacts.
  • Performance: no per-request calls to a model or translation API.
  • Simplicity: the runtime only needs dictionary lookup, interpolation, and locale selection.

At a high level, the workflow runs in three stages: extract keys from the English source into en.json, translate any keys missing from each locale file, and commit the regenerated JSON artifacts alongside the code.

We also have static content (the blog homepage) that is itself compiled, so some compound GitHub Actions trickery was required to render that static content as part of the same pipeline; the markdown blog posts are another story (more on that under Challenges).

But the net result is a highly reliable system that essentially runs by itself. Let's look at how the client side acts as a provider and consumer of the i18n engine.
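As a rough sketch of that automation (the workflow name, paths, and script locations here are assumptions, not this site's actual configuration), the GitHub Actions job amounts to: extract keys, translate what's missing, and commit the regenerated locale files:

```yaml
name: translate
on:
  push:
    branches: [main]
jobs:
  translate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Hypothetical extractor entry point: scan t() calls into en.json
      - run: npm ci && npx tsx scripts/extract-keys.ts
      # Hypothetical translator path: fill in missing keys per locale
      - run: pip install dspy-ai && python scripts/translate.py
        env:
          GEMINI_API_KEY: ${{ secrets.GEMINI_API_KEY }}
      - run: |
          git config user.name "i18n-bot"
          git config user.email "i18n-bot@example.com"
          git add packages/i18n/locales
          git diff --cached --quiet || (git commit -m "chore: regenerate locales" && git push)
```

Because only missing keys are translated, re-runs are cheap and idempotent; an empty diff simply skips the commit.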

Do as the Romans Do (and use a GD foundation model)

The central piston of the implementation is a translation generator. The inputs and outputs are pure strings: English in, target language out. Pretty simple.

To generate the keys, we rigged up an AST parser with ts-morph that digs through our TypeScript for our i18n wrapper, the t() method (more on that below). For each wrapper found, we extract the wrapped string and drop it, along with the canonicalized English value, into /lib/en.json. This file serves as the canonical source of truth. For the model harness, the DSPy library delivered the coup de grâce, enabling structured generation with a tight contract for the expected inputs and outputs from the model.
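The real extraction pass walks the AST with ts-morph, which handles template literals, nesting, and formatting correctly. As a self-contained illustration of the idea only (a simplified stand-in, not the actual implementation), a regex scan for t('...') calls looks like:

```typescript
// Collect the first string argument of every t('...') call in a source string.
// The production version does this over the real AST with ts-morph; a regex
// sketch like this would miss template literals and multi-line calls.
export const extractKeys = (source: string): string[] => {
  const pattern = /\bt\(\s*(['"])((?:\\.|(?!\1).)*)\1/g;
  const keys = new Set<string>(); // dedupe repeated copy across components
  for (const match of source.matchAll(pattern)) {
    keys.add(match[2]);
  }
  return [...keys];
};
```

Each extracted key then becomes both the lookup key and the canonical English value in en.json.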

Translator

* Note that garbage collection, logging, and context parsing are omitted for brevity.

import os
import json
import dspy
from pydantic import BaseModel, Field


class TranslationOutput(BaseModel):
    translations: dict[str, str] = Field(
        description="A dictionary mapping the English keys to translated strings"
    )


class GenerateTranslations(dspy.Signature):
    """Generate professional, localized translations for a web application UI."""

    source_language: str = dspy.InputField(desc="The source language of the given keys.")
    keys: list[str] = dspy.InputField(desc="A list of English interface text keys to translate.")
    target_language: str = dspy.InputField(desc="The target language code (e.g., 'es', 'fr', 'de').")
    translations: TranslationOutput = dspy.OutputField(desc="A strictly structured JSON output of translations.")


class TranslationGenerator(dspy.Module):
    def __init__(self):
        super().__init__()
        # TypedPredictor was removed in newer DSPy releases; fall back to Predict.
        PredictComponent = getattr(dspy, "TypedPredictor", dspy.Predict)
        self.generator = PredictComponent(GenerateTranslations)

    def forward(self, keys: list[str], target_language: str) -> dict:
        """Generates translations from English to the target language for the given keys."""
        if not os.environ.get("GEMINI_API_KEY"):
            print(f"Skipping LLM call for {target_language} (no API key). Using mock translations.")
            return {key: f"[{target_language}] {key}" for key in keys}
        result = self.generator(source_language="English", keys=keys, target_language=target_language)
        # Defensively parse out the response from Gemini
        try:
            translations = getattr(result, "translations", None)
            if hasattr(translations, "model_dump"):
                return translations.model_dump().get("translations", {})
            elif hasattr(translations, "dict"):
                return translations.dict().get("translations", {})
            elif hasattr(translations, "translations"):
                return translations.translations
            elif isinstance(translations, dict):
                return translations.get("translations", translations)
            else:
                return {}
        except Exception as e:
            print(f"Failed to extract translations for {target_language}. Error: {e}")
            return {key: f"[{target_language}] {key}" for key in keys}


TARGET_LANGUAGES = {
    "es": "Spanish",
    "fr": "French",
    "de": "German",
}


def batch_missing_keys(keys: list[str], size: int) -> list[list[str]]:
    """Split keys into fixed-size batches so each model call stays small."""
    return [keys[i:i + size] for i in range(0, len(keys), size)]


def main():
    if os.environ.get("GEMINI_API_KEY"):
        # Point DSPy at Gemini; the model name is configurable.
        dspy.configure(lm=dspy.LM("gemini/gemini-2.0-flash", api_key=os.environ["GEMINI_API_KEY"]))
    generator = TranslationGenerator()
    current_dir = os.path.dirname(os.path.abspath(__file__))
    locales_dir = os.path.join(os.path.dirname(os.path.dirname(current_dir)), "packages", "i18n", "locales")
    os.makedirs(locales_dir, exist_ok=True)
    en_json_path = os.path.join(locales_dir, "en.json")
    with open(en_json_path, 'r', encoding='utf-8') as f:
        en_locale = json.load(f)
    english_keys = list(en_locale.keys())
    print(f"Loaded {len(english_keys)} keys from en.json")
    for lang_code, lang_name in TARGET_LANGUAGES.items():
        lang_json_path = os.path.join(locales_dir, f"{lang_code}.json")
        existing_translations = {}
        if os.path.exists(lang_json_path):
            with open(lang_json_path, 'r', encoding='utf-8') as f:
                existing_translations = json.load(f)
        # Find keys that have not been translated yet
        missing_keys = [k for k in english_keys if k not in existing_translations]
        # Batch LLM processing
        for batch in batch_missing_keys(missing_keys, 50):
            # Calling the module invokes its `forward` method
            new_translations = generator(keys=batch, target_language=lang_code)
            # Merge translations, falling back to the English key
            for k in batch:
                existing_translations[k] = new_translations.get(k, k)
        with open(lang_json_path, 'w', encoding='utf-8') as f:
            json.dump(existing_translations, f, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    main()

The actual logic is straightforward: iterate through the English keys and, for each target language, have the model generate its best guess at a translation, storing the result in the associated JSON file. Pretty neat.

Still, there's a catch with turning MT responsibilities over to an LLM. Despite the strict typing of our DSPy harness and Pydantic contracting, there's no assurance the model will give us what we want. To account for the probabilistic nature of generation, the script parses responses defensively and falls back to the English key whenever extraction fails.

With the translator out of the way, let's turn our attention to the client-side to better understand how we serve our polyglot prose.

Context is King

With the translator in place, we needed a semantic harness that can capture content and dynamically inject the relevant translation.

We built a custom I18nProvider that wraps our application payload. A lightweight React Context provides the locale, while the actual translation strings are fetched server-side when possible or loaded initially.

The crux of the matter is the i18n wrapper, t(key, fallback, values). It takes a key (which defaults to the content passed in), a fallback value used when no translation can be found, and an optional values object that gives us the capacity to interpolate dynamic content into the string, should we so choose.
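The interpolate helper the provider leans on is small. A minimal sketch, assuming a {placeholder} template syntax (that syntax is my assumption, not confirmed by the provider code):

```typescript
// Replace {name}-style placeholders with supplied values.
// Unknown placeholders are left intact rather than erased, so a missing
// value degrades visibly instead of silently dropping content.
export const interpolate = (
  template: string,
  values?: Record<string, string | number>,
): string => {
  if (!values) return template;
  return template.replace(/\{(\w+)\}/g, (token, name) =>
    name in values ? String(values[name]) : token,
  );
};
```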

Let's look a little closer at the implementation.

Server-side translation
// /i18n/provider.tsx
'use client';
import type React from 'react';
import { createContext, useContext } from 'react';
import type { ReactNode } from 'react';
import { interpolate } from './interpolate';
import type {
  I18nContextType,
  LocaleCode,
  Translations,
} from './types';

const I18nContext = createContext<I18nContextType | null>(null);

export interface I18nProviderProps {
  children: ReactNode;
  defaultLocale?: LocaleCode;
  dictionary?: Translations;
}

export const I18nProvider: React.FC<I18nProviderProps> = ({
  children,
  defaultLocale = 'en',
  dictionary,
}) => {
  const t = (
    key: string,
    fallbackOrValues?: string | Record<string, string | number>,
    values?: Record<string, string | number>,
  ): string => {
    let fallback = key;
    let interpolationValues = values;
    if (typeof fallbackOrValues === 'string') {
      fallback = fallbackOrValues;
    } else if (fallbackOrValues !== undefined) {
      interpolationValues = fallbackOrValues;
    }
    return interpolate(
      dictionary?.[key] || fallback,
      interpolationValues,
    );
  };

  const value: I18nContextType = {
    locale: defaultLocale,
    t,
  };

  return (
    <I18nContext.Provider value={value}>
      {children}
    </I18nContext.Provider>
  );
};

export const usei18n = (): I18nContextType => {
  const context = useContext(I18nContext);
  if (!context) {
    throw new Error('usei18n must be used within an I18nProvider');
  }
  return context;
};

// layout.tsx
const RootLayout = async ({
  children,
  params,
}: { children: React.ReactNode; params: Promise<any> }) => {
  const { locale } = (await params) as { locale: LocaleCode };
  const dictionary = await getDictionary(locale);
  return (
    <I18nProvider defaultLocale={locale} dictionary={dictionary}>
      <body className='flex min-h-screen flex-col bg-background text-foreground transition-colors overflow-x-hidden antialiased'>
        <AppMenu />
        {children}
      </body>
    </I18nProvider>
  );
};

This context does the heavy lifting on the server side: the layout reads the current locale from the request URL and injects the resolved dictionary into the React context provider for consumption.
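The routing middleware's locale detection reduces to a pure function over the pathname. A sketch of that piece (the supported-locale list and English default are assumptions drawn from the languages this site ships):

```typescript
const SUPPORTED_LOCALES = ['en', 'es', 'fr', 'de'] as const;
type Locale = (typeof SUPPORTED_LOCALES)[number];

// Pull a supported locale off the front of the path, defaulting to English.
// In the Next.js middleware, an unprefixed request would be rewritten or
// redirected to `/${resolveLocale(pathname)}${pathname}` before rendering.
export const resolveLocale = (pathname: string): Locale => {
  const first = pathname.split('/').filter(Boolean)[0];
  return (SUPPORTED_LOCALES as readonly string[]).includes(first ?? '')
    ? (first as Locale)
    : 'en';
};
```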

Challenges and Extensions

The solution I landed on is not without its limitations. Along the way, I discovered that the baseline MT setup was a naive approach. But let's take the bull by the horns: here are the main limitations of our approach and how they can be resolved.

Markdown

From an implementation standpoint, one hiccup was establishing a means of parsing MDX content (like this blog post) to extract text content without mangling the markdown or React components. The baseline extraction looks for text content that's wrapped in a t() call, but in markdown the semantic structure becomes much looser. Text is interleaved with markup, components, and prose, which makes naïve extraction brittle. To solve that, we had to implement a custom plugin using the remark library, which exposes a handy API for dealing with arbitrary ASTs.
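The plugin itself leans on remark's machinery, but the core idea, walking an mdast tree and collecting translatable text nodes while skipping code, can be sketched self-contained (the node shapes below mirror mdast, but this is an illustration, not the actual plugin):

```typescript
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
}

// Depth-first walk that collects `text` node values while skipping code
// blocks and inline code, so markup is never sent to the translator.
export const collectText = (node: MdNode, out: string[] = []): string[] => {
  if (node.type === 'code' || node.type === 'inlineCode') return out;
  if (node.type === 'text' && node.value?.trim()) out.push(node.value);
  for (const child of node.children ?? []) collectText(child, out);
  return out;
};
```

The real plugin additionally rewrites each collected node in place with its translated value, which is where remark's visitor API earns its keep.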

Isolated Copy

The hardest strings to translate are often the shortest ones. Buttons, labels, menu items, and fragments of UI chrome are semantically thin in isolation. A model shown only Open, Apply, or Charge has to guess which sense of the word you intend. Humans resolve that ambiguity from context; a batch translation pipeline has to provide it explicitly.

An industry standard solution to this problem is adding code comments with relevant context about the meaning of a given phrase that can guide the translator. These comments can then be coupled with their keys at generation time to extend the extraction prompt with a nudge to the model about the intended meaning.
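In practice that means the extracted locale file can carry an optional context field per key. A hypothetical shape (not the format this site currently ships) might be:

```json
{
  "Open": {
    "value": "Open",
    "context": "Verb on a button that opens a saved document"
  },
  "Charge": {
    "value": "Charge",
    "context": "Noun: a billing line item, not an electrical charge"
  }
}
```

At generation time the context string rides along in the prompt, disambiguating the key for the model the way a human translator's style guide would.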

(This was such a good idea, I went ahead and baked it in while writing the blog. No longer an eight-hour implementation.)

Low-Resource Languages

Though out of scope for an i18n implementation as such, it's worth calling out that machine translation is not a solved problem, particularly for low-resource languages. Ample research, like the No Language Left Behind study from 2022, has demonstrated that less popular languages suffer from a lack of high-quality training data, which can lead to poor translation quality. Suffice it to say, mileage may vary, and if you're writing in Zulu and targeting a Kurmanji audience, you're going to need to hire a human translator.


That's All She Wrote

Altogether it proved a highly instructive experience, and it mostly works as one would hope. This project convinced me that foundation models can serve as a formidable aid in broadcasting my message to a broader audience, and the quick hacking session that produced this bespoke i18n pipeline has already paid dividends.

You can check out the result on this page; use the language picker in the menu bar to switch between English, German, French, and Spanish. Dígame: ¿qué te parece? (Tell me, what do you think?)