Hub's sentence alignment issues: punctuation and spatial alignment
Hello, I am encountering issues with the Translator Hub's sentence alignment tool when uploading parallel documents. In particular, the tool seems to be quite sensitive to punctuation and to the spatial alignment of the text. For instance, if two independent clauses are written in one sentence and separated by a comma in the source-language version of the document, but separated by a semi-colon in the target-language version, the sentence alignment tool registers the semi-colon as a sentence separator, giving a 1:2 sentence ratio for source:target language. While this separation makes grammatical sense, it makes sentence alignment quite difficult, as most of my translated documents are not 100% identical in punctuation. The same problem applies to spatial alignment. Say that, for a multi-page document, the source-language version is written in 12 pt font, while the target-language version is written in 14 pt font. The text in the target-language version occupies more space and "enters" the next page "sooner" than the text in the source-language version. The issue that arises is this: if the document has footnotes (which most of mine do), the sentence alignment tool picks up the footnote from the first page as a separate sentence, and because the two versions do not break at the same point on the first page, the footnote lands at a different position in each version and throws off the alignment of everything after it. Is there a way to circumvent these issues, or does one have to extract the sentences into a text file and then manually correct everything? Thank you.
For the first issue: the semi-colon is treated as a sentence breaker, and this is by design. We can't change this, since it would have a big impact on the system.
For the second issue: this is caused by the footnote being inserted between sentences. Can you remove the footnotes before uploading the file?
MT Hub Support
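Both suggestions above amount to cleaning the documents before upload, which could be scripted rather than done by hand. The following is only a minimal sketch, assuming the documents have first been exported to plain text. The function names are hypothetical, and the footnote pattern (a line starting with a bracketed number such as "[1] ") is an assumption that would need adjusting to how footnotes actually appear in your export.

```python
import re


def replace_semicolons(text: str) -> str:
    """Replace every ASCII semi-colon with a comma so it is no longer
    a candidate sentence breaker. Assumes plain text, not HTML/XML,
    where ';' may be part of an entity such as '&amp;'."""
    return text.replace(";", ",")


def drop_footnote_lines(text: str, pattern: str = r"^\[\d+\]\s") -> str:
    """Remove lines that look like footnotes.

    The default pattern is hypothetical: it matches lines that begin
    with a bracketed number followed by whitespace, e.g. '[1] ...'.
    """
    footnote = re.compile(pattern)
    kept = [line for line in text.splitlines() if not footnote.match(line)]
    return "\n".join(kept)


def clean_for_alignment(text: str) -> str:
    """Apply both clean-up steps: footnotes first, then semi-colons."""
    return replace_semicolons(drop_footnote_lines(text))
```

Running the same clean-up on both the source-language and the target-language file keeps the punctuation treatment symmetric. Note that replacing semi-colons changes the training text itself, so it is worth checking a sample of the output before uploading.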