Workshop "Conditionals, Corpora, and Translation"

30 October 2020, online (Microsoft Teams)

In this workshop we look at some methodological extensions of Translation Mining, a methodology that was developed in the Time in Translation project to study cross-linguistic variation in the domain of tense. Translation Mining uses parallel corpora (corpora of translated texts) as a source of data, and multidimensional scaling (MDS) to visualize variation. Three talks will be centered around the following questions:

how can we apply Translation Mining and MDS to compositional structures, in particular conditional sentences?
how can we use Translation Mining with not directly comparable languages, and how can we combine data from parallel and non-parallel corpora?
how are translated texts different from non-translated texts, and are we justified in drawing conclusions about one domain on the basis of data from the other?

Programme

Time	Speaker	Title
10.30 - 11.15	Jos Tellings	Conditionals in Translation: towards Translation Mining in a compositional setting
Break
11.30 - 12.15	Maarten Bogaards	Heuristic Translation Mining and Distributional Analysis: Using parallel and non-parallel corpora side by side
Lunch break
13.30 - 14.30	Bert Le Bruyn	Traduttore Traditore Squared: limits of parallel corpora for linguistic research? (replaces the cancelled talk by Daniel Henkel, "The conditional perfect, a quantitative analysis in English-French comparable-parallel corpora")
Break
14.45 - 15.45	TinT team + speakers	Group discussion

Speaker info (abstracts)

Maarten Bogaards (Universiteit Leiden)

Heuristic Translation Mining and Distributional Analysis: Using parallel and non-parallel corpora side by side

Abstract

Daniel Henkel (Université Paris 8)

The conditional perfect, a quantitative analysis in English-French comparable-parallel corpora [CANCELLED]

This is a study of the frequency of the conditional perfect in English (WOULD HAVE DONE, COULD HAVE DONE, etc.) and French (AURAIT FAIT, AURAIT PU FAIRE, etc.), both in original texts and in translations from one language into the other. The corresponding constructions in both languages were observed in a corpus of almost 12 million words, consisting of four comparable and parallel subcorpora of 2.9 million words each, tagged for part of speech (POS) and lemma and analyzed using regular expressions. Within each language, authors and translators were compared using the Wilcoxon-Mann-Whitney test to determine whether they were sufficiently different to be considered separate "sub-species", with Cliff's delta as a measure of effect size (i.e. how large the difference between sub-species is). Potential cross-linguistic influences were assessed by means of Spearman's correlation test. French translated from English was found to be distinct from original French in its use of all conditional perfect forms, whereas the differences between original and translated English were less clear-cut. In both languages, significant and more pronounced differences between translators and authors were observed when COULD, MIGHT and POUVOIR were used as auxiliaries.

Jos Tellings (Universiteit Utrecht)

Conditionals in Translation: towards Translation Mining in a compositional setting

Slides

The Time in Translation project uses data from parallel corpora, together with multidimensional scaling (MDS), to create semantic maps. This methodology, dubbed Translation Mining in van der Klis et al. (2017), is used to investigate the cross-linguistic semantics of tense forms. I start by introducing the main ideas and results of the project over the past three years.

Tellings and van der Klis (2020) describe how linguists have used MDS in a range of applications, from typology to formal linguistics, but always for variation along a single semantic parameter. Semanticists are also interested in compositional structures, in which multiple semantic parameters conspire to produce semantic and pragmatic effects.

I show how Translation Mining can be used in a compositional setting by studying conditional sentences. I extracted 1000 conditional sentences from the Europarl parallel corpus and annotated them in Dutch and English for the tense forms used in the antecedent and consequent clauses, as well as for their modal structure.

I will show the resulting semantic maps for conditionals and demonstrate how our online interface works for compositional data. I then discuss some of the empirical patterns that this process reveals, in particular regarding the distribution of Dutch modal zou (≈ 'would') in conditionals.

van der Klis, Martijn, Bert Le Bruyn & Henriëtte de Swart (2017). Mapping the PERFECT via Translation Mining. Proceedings of the EACL, 497-502.

Tellings, Jos & Martijn van der Klis (2020). Multidimensional scaling and linguistic theory. Manuscript.