Content translation has been around for as long as digital publishing has existed. Content has always been localized manually, because computer algorithms have not matched the cognitive abilities of humans, at least not yet. That is not for lack of effort: the idea of machine translation (lovingly known as “MT”) has been around since computers took on a serious role in our lives in the late eighties and early nineties.
I don’t think machine translation will be able to replace translators; that’s not happening anytime soon. But MT has made a very real place in our lives. I don’t know a single person who hasn’t used translate.google.com for a casual translation of some foreign-language text. Almost everyone has also experienced the silliness of the output MT engines produce in their native language.
This shouldn’t be a surprise, given the complexities and nuances of languages, not to mention the number of languages our planet speaks and the scripts we use. Add to that local dialects and the millions of words that are familiar only in a specific region.
It’s an artificial intelligence (AI) problem, and scores of studies are underway, all by scholars using a variety of algorithms. The problem is very easy to understand; the answer is equally elusive! Google is now researching neural machine translation, in which deep learning algorithms are fed millions of interconnected phrases and terms for language pairs (Ref: Link). This is a shift from the widely used statistical machine translation (SMT). (Link)
Efforts to enable machines to do the work of localization are likely to continue, because of the time and effort it takes to localize content manually and get it ready for regional markets.
Decisions such as which meaning of a word to pick depending on the context, or whether to transliterate a phrase, are nearly impossible to program. These conventions have taken generations to evolve and take each of us a lifetime to learn. Imagine the number of IF-THEN blocks we would need to write to cover every scenario in even the simple conversations we have on a daily basis.
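To get a feel for how quickly hand-written rules run out, here is a minimal sketch of IF-THEN rules for picking the meaning of a single English word. The function name and the keyword lists are made up for illustration; this is not how any real MT engine works.

```python
def sense_of_bank(sentence: str) -> str:
    """Pick a meaning for the word "bank" using hand-written keyword rules.

    The keyword sets below are illustrative assumptions, not a real lexicon.
    """
    words = set(sentence.lower().replace(".", "").split())

    # Rule 1: money-related context words suggest the financial sense.
    if words & {"money", "loan", "account", "deposit"}:
        return "financial institution"

    # Rule 2: water-related context words suggest the riverside sense.
    if words & {"river", "water", "fishing", "shore"}:
        return "edge of a river"

    # Anything the rules did not anticipate falls through.
    return "unknown"


for example in (
    "She opened an account at the bank.",
    "We sat on the bank of the river.",
    "I would not bank on it.",
):
    print(example, "->", sense_of_bank(example))
```

Two rules cover two meanings of one word in one language, and the third sentence already falls through. Every new meaning, idiom, inflection or language pair would need more rules of its own.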
Let us compare this problem with that of self-driven cars. Corporate giants like Google and Tesla have made serious investments and spent years on tests and trials to tune the performance of self-driven cars. In this case there are relatively few scenarios to program, and the cars still run into situations that a human driver would have handled better using evolutionary programming and instincts. (ref: Tesla self driven fatal accident)
But the technology has matured to the stage where it is used for fast delivery of shipments (also by remote-controlled drones) and now for human transportation as well. (Ref: https://www.google.com/selfdrivingcar/)
Coming back to the problem of translation: until machine learning is perfected, we can shift our thinking to using technology to make the process of translation smooth, efficient and error-free. The goal is to make it scalable, so that large volumes of content can be converted in an acceptable time frame without compromising on the quality that only humans are known to produce.
Take the example of an e-Commerce company in India that intends to sell products to customers in Tier-2/Tier-3 cities. Indians are perceived to be an English-speaking population (thanks to call centers and the outsourcing industry), but in reality only 10% of people (or 125 Million; Ref: Link) can read and understand simple English. That leaves a whopping ~1.12 Billion untapped. To make a dent in these segments, a company needs to communicate and sell in local languages. This large section can no longer be ignored, especially given the massive penetration of smartphones (30% or 500 Million users; Ref: Link).
Our example company is a marketplace whose sellers list upwards of 1,000,000 (1 Million) products, each with around 300-500 words of content, which adds up to around 300 Million words at the lower end. With a minimum of 9 languages commonly spoken in India, reaching the last smartphone owner means producing 2.7 Billion localized words. And it is really worth it!
A typical human translator produces 1000-1500 usable, proofread words a day. If we do the math, that comes to roughly 2,000,000 man-days of effort, which would be a massive project management and resourcing challenge. In other words, if we employ 1000 translators, it would take 2000 days (~5 years) to get the localized content out. That is too long, even if we are willing to invest in starting an operation like this.
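The back-of-the-envelope numbers above can be reproduced with a short script. This is only a sketch: the function name is made up for illustration, it uses the lower end of the 300-500 words-per-product range, and it assumes every translator works every calendar day with no review cycles or rework.

```python
def localization_effort(products, words_per_product, languages,
                        words_per_day, translators):
    """Return (localized_words, person_days, calendar_days).

    Simplifying assumption: output per translator is constant and
    translators work every day in parallel.
    """
    localized_words = products * words_per_product * languages
    # Integer ceiling division, so a partial day counts as a full day.
    person_days = -(-localized_words // words_per_day)
    calendar_days = -(-person_days // translators)
    return localized_words, person_days, calendar_days


# Figures quoted in the post.
PRODUCTS = 1_000_000
WORDS_PER_PRODUCT = 300   # lower end of the 300-500 range
LANGUAGES = 9
TRANSLATORS = 1000

for words_per_day in (1000, 1500):
    words, person_days, days = localization_effort(
        PRODUCTS, WORDS_PER_PRODUCT, LANGUAGES, words_per_day, TRANSLATORS
    )
    print(f"{words_per_day} words/day: {words:,} words, "
          f"{person_days:,} person-days, "
          f"{days:,} days with {TRANSLATORS} translators")
```

At 1000 words a day this gives 2,700,000 person-days, and at 1500 words a day it gives 1,800,000, so the rounded figure of 2,000,000 man-days sits inside that range. With 1000 translators the calendar time is between 1800 and 2700 days.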
This is the reason most e-Commerce companies are turning to machine translation APIs, which produce sub-standard content (to put it mildly!) but have it ready instantly. That may be a good first step, but the company needs to get the content reviewed and fixed quickly, or it may have the reverse effect.
Also, most Indian languages are not supported by popular machine translation engines (e.g. Microsoft only supports Hindi and Urdu – Link).
Why not take a step back and think about a process solution built around human translators, and see whether we can work around the problem of content localization rather than looking for a quick fix?
This would be something like providing an “Uber”-like cab service for human transportation while “self-driven” cars are being perfected. In terms of the number of service providers, the distributed nature of the work and the small deal sizes, the cab model is strikingly similar to localizing millions of words.
Translation Management Solutions (TMS) play a pivotal role in managing the fragmented service providers (viz. linguists). A TMS eliminates the manual steps involved in engaging multiple translators/linguists who collaborate on mass translation jobs. A typical TMS workflow looks like the flow diagram above.
The process seems simple, but running it requires a high level of management and coordination.
In the next post, we will discuss various options for deploying a TMS and strategies to effectively manage each step of this workflow.