In this workshop we look at some methodological extensions of Translation Mining, a methodology that was developed in the Time in Translation project to study cross-linguistic variation in the domain of tense. Translation Mining uses parallel corpora (corpora of translated texts) as a source of data, and multidimensional scaling (MDS) to visualize variation. Three talks will be centered around the following questions:
Time | Speaker | Title |
---|---|---|
10.30 - 11.15 | Jos Tellings | Conditionals in Translation: towards Translation Mining in a compositional setting |
break | | |
11.30 - 12.15 | Maarten Bogaards | Heuristic Translation Mining and Distributional Analysis: Using parallel and non-parallel corpora side by side |
Lunch break | | |
13.30 - 14.30 | Bert Le Bruyn | Traduttore Traditore Squared: limits of parallel corpora for linguistic research? |
break | | |
14.45 - 15.45 | TinT team + speakers | group discussion |
Heuristic Translation Mining and Distributional Analysis: Using parallel and non-parallel corpora side by side
The conditional perfect, a quantitative analysis in English-French comparable-parallel corpora [CANCELLED]
This is a study of the frequency of the conditional perfect in English (WOULD HAVE DONE, COULD HAVE DONE, etc.) and French (AURAIT FAIT, AURAIT PU FAIRE, etc.), and of its translation from one language to the other. The corresponding constructions in both languages were observed in a corpus of almost 12 million words, consisting of four 2.9-million-word comparable and parallel subcorpora, tagged by POS and lemma and analyzed using regular expressions. Intra-linguistically, authors and translators were compared using the Wilcoxon-Mann-Whitney test to determine whether they were sufficiently different to be considered separate "sub-species", with Cliff's delta taken as a measure of effect size (i.e. how large the difference between sub-species is). Potential inter-linguistic influences were assessed by means of Spearman's correlation test. French translated from English was found to be distinct from original French in its use of all conditional perfect forms, while the differences between original and translated English were less obvious. Significant and more prominent differences between translators and authors were observed in both languages when COULD, MIGHT and POUVOIR were used as auxiliaries.
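As a rough illustration of the effect-size measure mentioned in this abstract, the sketch below computes the Mann-Whitney U statistic and Cliff's delta for two small samples. The function names and the frequency values are invented for illustration and are not taken from the study; no p-value is computed, and the Spearman correlation step is left out.

```python
from itertools import product


def mann_whitney_u(xs, ys):
    """U statistic for xs: pairs with x > y count 1, ties count 0.5."""
    return sum(
        1.0 if x > y else 0.5 if x == y else 0.0
        for x, y in product(xs, ys)
    )


def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y), a value between -1 and 1."""
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Invented per-text frequencies of a construction (toy data, not real results).
authors = [1.0, 2.0, 3.0]
translators = [2.0, 4.0, 5.0]

u_stat = mann_whitney_u(authors, translators)
delta = cliffs_delta(authors, translators)
print(u_stat, delta)
```

The two quantities are related by delta = 2U / (n * m) - 1, where n and m are the sample sizes, so a delta close to zero corresponds to heavily overlapping groups.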
Conditionals in Translation: towards Translation Mining in a compositional setting
The Time in Translation project uses data from parallel corpora, together with multidimensional scaling (MDS), to create semantic maps. This methodology, dubbed Translation Mining in van der Klis et al. (2017), is used to investigate the cross-linguistic semantics of tense forms. I start with an introduction to the main ideas and results of the project over the past three years.
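To give a feel for the kind of input an MDS routine needs, here is a minimal sketch that turns tense annotations of translated contexts into a dissimilarity matrix. It assumes, purely for illustration, that the distance between two contexts is the share of languages in which their tense labels differ; the abstract does not spell out the measure, and the labels and function names below are invented.

```python
def dissimilarity(a, b):
    """Share of shared languages in which two contexts use different tense labels."""
    languages = a.keys() & b.keys()
    if not languages:
        raise ValueError("contexts share no languages")
    return sum(a[lang] != b[lang] for lang in languages) / len(languages)


def distance_matrix(contexts):
    """Symmetric matrix of pairwise dissimilarities between contexts."""
    return [[dissimilarity(a, b) for b in contexts] for a in contexts]


# Toy annotations: one dict per source context, mapping language to tense label.
contexts = [
    {"en": "present perfect", "nl": "perfect", "fr": "passé composé"},
    {"en": "simple past", "nl": "perfect", "fr": "passé composé"},
    {"en": "simple past", "nl": "simple past", "fr": "imparfait"},
]

matrix = distance_matrix(contexts)
for row in matrix:
    print([round(value, 2) for value in row])
```

A matrix of this shape could then be passed to any MDS implementation that accepts precomputed dissimilarities; that step is not shown here.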
Tellings and van der Klis (2020) describe how linguists have used MDS in a range of applications from typology to formal linguistics, but always involving variation with respect to a single semantic parameter. Semanticists are also interested in compositional structures in which multiple semantic parameters conspire to produce semantic and pragmatic effects.
I show how Translation Mining can be used in a compositional setting, by studying conditional sentences. I extracted 1000 conditional sentences from the Europarl parallel corpus, which I annotated in Dutch and English with respect to the tense forms used in the antecedent and consequent clauses, as well as the modal structure.
I will show the resulting semantic maps for conditionals, and demonstrate how our online interface works for compositional data. I then discuss some of the empirical patterns that this process reveals, in particular concerning the distribution of the Dutch modal zou (≈ 'would') in conditionals.
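One way to picture compositional annotations of this kind is as a small record per language, with separate fields for the antecedent tense, the consequent tense and the modal. The sketch below is a hypothetical data layout, not the project's actual annotation scheme or interface; the class names, labels and three example pairs are invented. It cross-tabulates the English consequent modal against the Dutch one, which is the sort of count one might inspect when looking at zou.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalAnnotation:
    antecedent_tense: str
    consequent_tense: str
    consequent_modal: str  # empty string when the consequent has no modal


@dataclass(frozen=True)
class AlignedConditional:
    english: ConditionalAnnotation
    dutch: ConditionalAnnotation


def modal_crosstab(pairs):
    """Count each (English modal, Dutch modal) combination in the consequent."""
    return Counter(
        (p.english.consequent_modal, p.dutch.consequent_modal) for p in pairs
    )


# Invented toy annotations (not corpus data).
pairs = [
    AlignedConditional(
        ConditionalAnnotation("past", "past", "would"),
        ConditionalAnnotation("past", "past", "zou"),
    ),
    AlignedConditional(
        ConditionalAnnotation("past", "past", "would"),
        ConditionalAnnotation("past", "past", ""),
    ),
    AlignedConditional(
        ConditionalAnnotation("present", "future", "will"),
        ConditionalAnnotation("present", "present", ""),
    ),
]

table = modal_crosstab(pairs)
for (english_modal, dutch_modal), count in sorted(table.items()):
    print(repr(english_modal), repr(dutch_modal), count)
```

Keeping each component as its own field means the same records can be compared on one parameter at a time or on all of them together.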
van der Klis, Martijn, Bert Le Bruyn & Henriëtte de Swart (2017). Mapping the PERFECT via Translation Mining. Proceedings of the EACL, 497-502.
Tellings, Jos & Martijn van der Klis (2020). Multidimensional scaling and linguistic theory. Manuscript.