Bespoke Neural Machine Translation Engines for Simplified Chinese
Our client is International Data Corporation (IDC).
IDC is one of the leading providers of market intelligence and analysis for the IT, telecoms and consumer technology markets, employing more than 1100 analysts around the world.
Their goal was to reduce the cost of translating their research reports between English and Simplified Chinese without compromising quality. They also wanted to shorten the turnaround time required to translate the reports.
To do this, they would replace their existing translation provider with Machine Translation (MT) technology.
What challenges did we take on?
- We undertook to develop two Neural Machine Translation engines: one translating English into Simplified Chinese and one translating Simplified Chinese into English.
- The engines would need to be able to translate at a level of accuracy and fluency at least as high as our client’s current translation provider.
- They would need to use our client’s preferred terminology.
- For the first 12 months, we would be responsible for post-editing the content to an agreed standard, whether or not the raw MT output reached that standard. After the initial 12 months, Asian Absolute would continue to provide quality management and maintenance of the Neural engines, but the client would perform the post-editing themselves in-house, reducing their cost even further.
- The engines would need to be vendor-neutral. This would mean that, after the initial 12-month period, our client could take over running the MT engines if they wished. Or they could choose to have us continue to manage and maintain the engines at a reduced cost.
- In order to facilitate our client taking over the running of the engines should they wish to, we also undertook to provide training for their in-house team. This would cover how to use the TMS (Translation Management System, the software which provides the project management interface) to edit translations and how to edit MT output efficiently and effectively.
How did we build the engines?
There are several different types of MT engine available. For this task, Neural Machine Translation engines were the most suitable.
In order to train the engines, we would need a great deal of data. This data would need to be in-domain and it would need to be clean. Initially, we received 500,000 words of bilingual data from the client. We then cleaned the data, checking for misaligned segments, duplicates, noise and so on. This is necessary to make sure it is suitable for use as MT training data.
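The kind of checks described above can be sketched in a few lines of Python. This is a minimal illustration only, not the pipeline used on the project: the function name `clean_parallel_corpus`, the length-ratio thresholds and the sample segments are all assumptions made for the example.

```python
import unicodedata


def clean_parallel_corpus(pairs, min_ratio=0.5, max_ratio=4.0):
    """Filter (english, chinese) segment pairs for use as MT training data.

    The ratio thresholds are illustrative assumptions, not project values.
    """
    seen = set()
    cleaned = []
    for en, zh in pairs:
        # Normalise Unicode composition and collapse stray whitespace.
        en = " ".join(unicodedata.normalize("NFC", en).split())
        zh = " ".join(unicodedata.normalize("NFC", zh).split())

        # Noise: drop pairs where either side is empty.
        if not en or not zh:
            continue

        # Noise: drop pairs where the source was copied into the target.
        if en == zh:
            continue

        # Crude alignment check: Chinese characters per English word.
        ratio = len(zh) / len(en.split())
        if not (min_ratio <= ratio <= max_ratio):
            continue

        # Duplicates: keep only the first occurrence of each pair.
        key = (en.lower(), zh)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append((en, zh))
    return cleaned


# Hypothetical sample data: only the first pair survives cleaning.
sample = [
    ("Market share grew.", "市场份额增长。"),
    ("  Market  share grew. ", "市场份额增长。"),
    ("", "空"),
    ("IDC", "IDC"),
]
print(clean_parallel_corpus(sample))
```

A length-ratio test is only a rough proxy for alignment quality. Suspect pairs would normally still be checked by a linguist.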
Our client also undertook to provide 90,000 characters of data per quarter. One of the more important tasks would be to keep training the engines, using the post-edited versions of this additional data as feedback.
We also needed to fine-tune the engines’ runtime rules. These cover glossary terms, time, date and currency formats, measurements, capitalisation rules, non-translatable terms and the like.
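One common way to handle non-translatable terms is to swap them for placeholders before translation and restore them afterwards. The sketch below assumes that approach and assumes the engine passes placeholders through unchanged. The source does not describe how the engines' runtime rules actually work, and the names `protect_terms`, `restore_terms` and the `__NT0__` placeholder format are hypothetical.

```python
def protect_terms(text, terms):
    """Replace each non-translatable term with an opaque placeholder."""
    mapping = {}
    # Process longer terms first so a short term cannot split a longer one.
    for i, term in enumerate(sorted(terms, key=len, reverse=True)):
        placeholder = f"__NT{i}__"
        if term in text:
            text = text.replace(term, placeholder)
            mapping[placeholder] = term
    return text, mapping


def restore_terms(text, mapping):
    """Put the original terms back into the (translated) text."""
    for placeholder, term in mapping.items():
        text = text.replace(placeholder, term)
    return text


# Usage: protect, send to the MT engine, then restore.
source = "Asian Absolute built the engines for IDC."
protected, mapping = protect_terms(source, ["IDC", "Asian Absolute"])
print(protected)
print(restore_terms(protected, mapping))
```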
In addition, we would need to set up the Translation Management System and integrate it with the engines.
How did we begin?
For the first four months of the project and possibly longer, our expert human translators will check the quality of the MT output.
We will receive the research reports from our client via email, then use the TMS and MT engines to complete the translation prior to review by specialist editors. The report will then be returned to our client for their own review. Once approved, this data can then be used in the next round of MT engine training.
By the fifth month, this process should be refined to the point where it takes less than a day.
Each month we will conduct an overall analysis and review of the translations to identify error patterns. BLEU scores (a measurement of translation quality) comparing the initial translation against the final version reviewed by our client’s editors will also be calculated regularly.
BLEU isn’t the only score we will be using to assess quality. But it will help us identify areas for improvement as we continually refine the dataset and retrain the engines.
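To make the BLEU comparison concrete, here is a minimal, unsmoothed corpus-level BLEU sketch in Python. It scores the initial translation (the hypothesis) against the reviewed final version (the reference). It is an illustration under stated assumptions, not the scoring setup used on the project: it takes one reference per segment, and it tokenises Chinese by character, which is a simplification. In practice an established tool such as sacreBLEU would normally be used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100), one reference per segment."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clipped matches: an n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, any n-gram order with no matches gives a score of 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # The brevity penalty punishes output shorter than the reference.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Hypothetical segments, tokenised by character.
raw_mt = [list("市场份额增加")]
reviewed = [list("市场份额增长")]
print(round(corpus_bleu(raw_mt, reviewed), 1))
```

A score of 100 means the raw output needed no changes at all. The further the reviewers had to edit, the lower the score.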
How did we manage costs?
One of our client’s goals in replacing their research report translation provider with MT engines was to lower their translation costs in the long term. This is one of the major advantages of Machine Translation when used for tasks of this kind.
The vast majority of the costs of building MT solutions are incurred up front. This is also when most of the development work takes place, usually in the first few months.
We were able to offer our client a below-market rate, and to absorb all the extra post-editing work and risk, partly by reusing the work done in developing the first engine to develop the second.
Our client’s costs would also be mitigated by the fact that they would not need to spend money on human translation from their usual vendor during this time. Until the MT engines we were developing were able to produce work of the agreed standard on their own, we would take responsibility for bringing the output up to the agreed standard.
How will we proceed?
The project is set to continue into at least 2021, when the initial 12-month set-up and monitoring period ends.
After this, we move into a period where our client may decide to have us continue to handle the TMS management, project management and MT maintenance for a minimum monthly fee.
Alternatively, our client might ask us to hand everything over to them. To do this, we will either clone the solution into their own SaaS user accounts or create these accounts for them, and then provide the agreed training on how to administer and use the TMS to complete projects.
What was the outcome?
At the end of the first round of refinement, the BLEU scores for the two engines have reached rather impressive levels. But the best way to measure quality is to monitor how much human post-editing is needed to bring the output up to standard.
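One simple way to quantify post-editing effort is the character-level edit distance between the raw MT output and the post-edited text, divided by the length of the post-edited text. The sketch below assumes that metric. The source does not say how post-editing effort is actually tracked on the project, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def post_edit_rate(raw_mt, post_edited):
    """Edits per character of final text; 0.0 means no post-editing was needed."""
    if not post_edited:
        return 0.0 if not raw_mt else 1.0
    return edit_distance(raw_mt, post_edited) / len(post_edited)


# Hypothetical segment: one character changed out of six.
print(post_edit_rate("市场份额增加", "市场份额增长"))
```

Tracked over successive retraining rounds, a falling rate would indicate that the raw output is getting closer to the agreed standard.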
With our client telling us it’s looking good and our MT experts continuing work on refinement, we’re sure this is a trend which is set to continue.