Desktop Publishing (DTP)
Pre-processing and Post-processing Documents for CAT Tools
Preparing documents for translation (desktop publishing, or DTP) is now an essential step, both for the efficiency of the translation process as a whole and for the use of the specialized programs available in this field: computer-assisted translation software (CAT tools), translation memories, term bases and quality assurance programs (QA and QC). Without a properly formatted document, you may not be able to benefit from any of these, because the text can be altered in many ways during the conversion process.
Furthermore, in the translation process, the involvement of a DTP specialist may be needed in the following two stages:
- During the pre-processing stage of the source documents that need translation
- During the post-processing stage of the resulting translation
The pre-processing stage takes place when the source documents are not in an editable format (PDF, PNG, DWG and others). Most clients place great importance on the final format of the translated documents. To meet their needs, and to benefit throughout the translation phase from specialized programs that significantly streamline a translator’s work through terminology resources and automated processes, the documents must first be prepared for translation.
Pre-processing involves the following steps:
- analyzing the source document to identify the sections that need detailed pre-processing and checking aspects that may affect the final translated document (double spaces, tabs, incorrect alignment, symbols, text converted into images and many more);
- converting the document into an editable format with specialized programs (for example, ABBYY FineReader, Smallpdf, Adobe, InDesign, CorelDRAW, among others);
- “cleaning” the document properly, that is, deleting symbols and characters that do not belong to the source text (which may appear during the conversion process), extra spaces that are often not visible, and tags that may make the translation process more difficult (if using a CAT tool); this can be done with the help of a dedicated program (for example, TransTools);
- reformatting the document, which involves checking the document structure and faithfully reconstructing the source formatting in the resulting document by eliminating any issues that appeared during the conversion, including transcribing text sections that may not have been processed and are still displayed in .png, .jpg or other image formats (this usually happens if the source document is a non-editable scanned document);
- ensuring that the source text is correct (including spelling, spaces, line breaks), using the various tools that Microsoft Word offers;
- preparing the document for a CAT tool (where applicable, using processing rules such as regular expressions, both for efficiency and to ensure that the elements of the document have not changed);
- analyzing the document in the CAT tool to ensure that it is segmented correctly and that no further issues arise in the process. This makes it possible to leverage all the entries of a translation memory (TM) and supports its consolidation after the translation process.
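As an illustration of the “cleaning” step, here is a minimal Python sketch that removes a few invisible characters and collapses repeated spaces in text extracted from a converted document. The function name and the list of characters are assumptions made for this example; dedicated tools cover far more cases, and some of these characters can be legitimate in certain languages or layouts.

```python
import re

# Invisible characters that conversion often leaves behind:
# soft hyphen, zero-width space, byte order mark.
# (Assumed list for this sketch; adjust it to the languages in the project.)
INVISIBLE = re.compile("[\u00ad\u200b\ufeff]")

# Two or more consecutive spaces or tabs.
MULTI_SPACE = re.compile(r"[ \t]{2,}")


def clean_converted_text(text: str) -> str:
    """Remove invisible characters, collapse repeated spaces, trim line ends."""
    text = INVISIBLE.sub("", text)
    cleaned_lines = []
    for line in text.splitlines():
        line = MULTI_SPACE.sub(" ", line)
        cleaned_lines.append(line.rstrip())
    return "\n".join(cleaned_lines)


if __name__ == "__main__":
    sample = "Trans\u00adlation  memory \nterm\u200b base"
    print(clean_converted_text(sample))
```

A single tab is left untouched on purpose, since tabs may carry layout in tables; only runs of two or more whitespace characters are collapsed.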
Main issues to consider during the verification phase:
- line breaks that result in an erroneous segmentation;
- spelling issues – ensuring that the source text is correct after the conversion;
- unnecessary section/page breaks;
- fonts that are unavailable for a specific language.
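The first issue in this list, line breaks that split a sentence, can be spotted before the file is imported into a CAT tool. The sketch below is a simple heuristic with an assumed function name: it flags a line when it does not end with sentence-final punctuation and the next line starts with a lowercase letter. It will miss some cases and flag some false positives, so the results still need a human check.

```python
# Punctuation that normally closes a segment (assumption for this sketch).
SENTENCE_END = (".", "!", "?", ":", ";")


def find_suspect_line_breaks(text: str) -> list[int]:
    """Return 1-based numbers of lines that probably end in the middle of a sentence."""
    lines = text.splitlines()
    suspects = []
    for i in range(len(lines) - 1):
        current = lines[i].rstrip()
        following = lines[i + 1].lstrip()
        if not current or not following:
            continue  # blank lines are treated as intentional paragraph breaks
        if not current.endswith(SENTENCE_END) and following[0].islower():
            suspects.append(i + 1)
    return suspects


if __name__ == "__main__":
    sample = (
        "The pump must be\n"
        "switched off before cleaning.\n"
        "Safety instructions\n"
        "Read the manual."
    )
    print(find_suspect_line_breaks(sample))
```

In the sample, only the first line is flagged: the heading “Safety instructions” is followed by a capitalized line, so it is treated as a separate segment.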
Without proper processing of the layout of the document to be translated, a translator may encounter difficulties, both during the actual translation phase and when using or updating a translation memory:
- when a single document from a project is analyzed (by comparison with the other documents in the project or with the help of translation memories and term bases), faulty segmentation of its elements (phrases, short sentences, even words) produces an incorrect analysis; as a result, the segments cannot be matched in the translation memories, which makes those memories unusable for that content;
- elements or characters resulting from the conversion of documents are signaled through tags, which may complicate the translation process and can, in certain situations, generate errors when importing and exporting documents in the CAT tool used for the translation.
However, note that not all tags must be removed from the document: many of them are important formatting placeholders (space breaks, strikethrough, bold, italic, punctuation marks and many more). These can be protected by implementing regex rules or by converting them into actual symbols and formatting, which assists the translation process and prevents their accidental deletion.
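To show how a regex rule can help protect tags, here is a minimal Python sketch that compares the placeholder tags in a source segment with those in its translation and reports any that were lost or added. The tag syntax (`<b>`, `</b>`, `{1}`) and the function name are assumptions for the example; each CAT tool uses its own tag format, so the pattern would need to be adapted.

```python
import re
from collections import Counter

# Assumed tag syntax for this sketch: simple inline tags such as <b>, </b>, <br/>
# and numbered placeholders such as {1}.
TAG = re.compile(r"</?[A-Za-z][A-Za-z0-9]*/?>|\{\d+\}")


def tag_differences(source: str, target: str) -> tuple[list[str], list[str]]:
    """Return (tags missing from the target, tags that appear only in the target)."""
    source_tags = Counter(TAG.findall(source))
    target_tags = Counter(TAG.findall(target))
    missing = sorted((source_tags - target_tags).elements())
    extra = sorted((target_tags - source_tags).elements())
    return missing, extra


if __name__ == "__main__":
    source = "Press <b>Start</b> to continue{1}."
    target = "Appuyez sur <b>Démarrer pour continuer{1}."
    print(tag_differences(source, target))
```

Counting tags rather than just checking for their presence also catches the case where a tag occurs twice in the source but only once in the translation.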
In our translation projects, more often than not, we must outsource the pre-processing and post-processing stages to a DTP specialist, as the documents we receive for translation are usually in PDF format and the native files (INDD, IDML and others) are not available. Neither PDF files nor other non-editable file formats are compatible with CAT tools, which makes the pre-processing stage mandatory in order to reap the benefits a CAT tool can offer.
Communication with the end client is the foundation of a successful translation project: understanding the translation process and involving the client from the start are crucial for a seamless process. Moreover, clarifying certain aspects from the beginning (format processing, preferred terminology, what should be left out of the translation) contributes to achieving excellent results.
In an ideal scenario, a translation project should include the following:
- a source file for each document, that is, the native file created in specialized programs such as Adobe InDesign, CorelDRAW, Visio, Adobe Illustrator, Microsoft Office and others. If the project contains a large number of files, an organized transfer method is recommended (a centralizing file listing all documents and their information, transfer via SharePoint or a server, and so on);
- a file containing instructions and specifications on the target language, including the target audience and the country in which the document will be used (for example, a translation into Spanish calls for different linguists and a quite different approach depending on whether it will be used in Spain, Argentina or Colombia);
- the purpose of the translation: where will it be used, is it meant for internal use or does it have a specific scope, and what is the desired result (marketing, information or other)? The purpose helps determine the approach to the translation process, from the tone of voice (formal or informal) to the stages (native revision, adaptation, localization) the text will go through until the final version is reached;
- where applicable, specific aspects to highlight (to what extent adaptation is necessary, whether images containing text need processing and translation, preferences regarding the tone of voice, such as formal or informal, passive or active, and so on);
- a terminology glossary, a style guide, a term base, a translation memory, or reference materials for the content that needs translation;
- information on the desired deadline;
- information on the final format of the resulting translation, based on the requirements of the client (monolingual or bilingual, editable or non-editable, suitable for a CAT tool, in extract or in full and many more);
- information on the internal revision process: whether someone within the company will revise the translation or can help with questions and clarifications.
The post-processing stage involves verifying the target documents exported from a CAT tool. There are also fortunate cases in which little post-processing is needed: the documents sent to us are native files, have already been pre-processed by the client, or come out of our internal pre-processing with no formatting errors (slight differences may appear because of the specifics of each language, but these do not require significant work).
This post-processing stage involves:
- including the external elements in the translated IDML file;
- ensuring the text is correctly displayed and that it is visible in the entire document;
- ensuring that all elements are correctly positioned and visible;
- ensuring that all links or other functional elements are working;
- ensuring that the correct fonts are used consistently throughout the entire document, matching the original document;
- ensuring that the formatting of the final translated document reflects the original document.
Before finalizing the post-processing stage, a final verification of the translated file in an exported format (for example, PDF) is highly recommended. This way, the DTP specialist will be able to make the final amendments before saving the final package for the client.
The following are the types of files processed by the DTP specialist: Microsoft Office (Microsoft Word, Excel, PowerPoint, Visio), Adobe Creative Cloud (PDF, Adobe Illustrator, InDesign), CorelDRAW and many more.
In conclusion, technology has become a major part of the translation process, and tools have become a must: the question is no longer whether to use them, but which ones best suit your needs. Today, at least for specialized translations, a dictionary is no longer the only resource translators need to perform their work to the highest standards. In recent years, along with advances in artificial intelligence and machine learning, we have seen an explosion of programs and tools that support and simplify a translator’s work, from software that converts documents to CAT tools with a wide range of built-in functions for translation memories and term bases. Translation technology puts linguists in charge of the creative part of the process, while it takes care of the automated and repetitive tasks.
Date of publication: 10.01.2023