Publications by Vilelmini Sosoni

Translation, interpreting, cognition: The way out of the box, 2021
In recent years, Post-Editing (PE) has been increasingly gaining ground, especially following the advent of neural machine translation (NMT) models. However, translators still approach PE with caution and skepticism and question its real benefits. This study investigates the perception of both experienced and novice translators vis-à-vis PE, compares the technical, temporal and cognitive effort expended by experienced translators during the full PE of NMT output with the effort expended by novice translators, focusing on the English-Greek language pair, and explores potential differences in the quality of the post-edited texts. The findings reveal a more negative stance of the experienced translators as opposed to novice translators vis-à-vis Machine Translation (MT) and a more pragmatic approach vis-à-vis PE. However, the novice translators' more positive attitude does not seem to positively affect the temporal and cognitive effort that they expend. Finally, experienced translators have a tendency to overcorrect the NMT output, thus carrying out more redundant edits.

Applied Sciences, 2021
Evaluation of machine translation (MT) into morphologically rich languages has not been well studied despite its importance. This paper proposes a classifier, that is, a deep learning (DL) schema for MT evaluation, based on different categories of information (linguistic features, natural language processing (NLP) metrics and embeddings), by using a model for machine learning based on noisy and small datasets. The linguistic features are string based for the language pairs English (EN)–Greek (EL) and EN–Italian (IT). The paper also explores the linguistic differences that affect evaluation accuracy between different kinds of corpora. A comparative study between using a simple embedding layer (mathematically calculated) and pre-trained embeddings is conducted. Moreover, an analysis of the impact of feature selection and dimensionality reduction on classification accuracy has been conducted. Results show that using a neural network (NN) model with different input representations produces results that clearly outperform the state of the art for MT evaluation for EN–EL and EN–IT, by an increase of almost 0.40 points in correlation with human judgments on pairwise MT evaluation. It is observed that the proposed algorithm achieved better results on noisy and small datasets. In addition, for a more integrated analysis of the accuracy results, a qualitative linguistic analysis has been carried out in order to address complex linguistic phenomena.

1st Workshop on Post-Editing in Modern-Day Translation/The 14th Conference of The Association for Machine Translation in the Americas, 2020
Machine Translation (MT) has been increasingly used in industrial translation production scenarios thanks to the development of Neural Machine Translation (NMT) models and the improvement of MT output, especially at the level of fluency. In particular, in an effort to speed up the translation process and reduce costs, MT output is used as raw translation to be subsequently post-edited by translators. However, post-editing (PE) has been found to differ from both human translation and revision of human translation in terms of the cognitive processes and the practical goals and processes employed. In addition, translators remain sceptical towards PE and question its real benefits. The paper seeks to investigate the effort required for full PE and compare it with the effort required for manual translation, focusing on the English-Greek language pair and NMT output. In particular, eye-tracking and keystroke logging data are used to measure the effort expended by translators while translating from scratch and the effort required while post-editing the NMT output. The findings indicate that the effort is lower when post-editing than when translating from scratch, while they also suggest that experience in PE plays a role.

Proceedings of the 22nd Annual Conference of the European Association for Machine Translation, 2020
The present study aims to compare three systems: a generic statistical machine translation (SMT) system, a generic neural machine translation (NMT) system and a tailored-NMT system, focusing on the English to Greek language pair. The comparison is carried out following a mixed-methods approach, i.e. automatic metrics, as well as side-by-side ranking, adequacy and fluency rating, measurement of actual post-editing effort and human error analysis performed by 16 postgraduate Translation students. The findings reveal a higher score for both the generic NMT and the tailored-NMT outputs as regards automatic metrics and human evaluation metrics, with the tailored-NMT output faring even better than the generic NMT output.

Translating and the Computer 41 Proceedings, 2019
In an effort to meet the demands for speed, productivity and low cost, the translation industry has turned to Machine Translation (MT) and Post-editing (PE). Nowadays, MT output is used as raw translation to be further post-edited by a translator (Lommel and DePalma, 2016). Yet, translators still approach PE with caution and scepticism and question its real benefits (Koponen 2012; Gaspari et al 2014; Moorkens 2018). In addition, attitudes to MT and PE seem to affect PE effort and performance (Witczak, 2016; Çetiner and İşisağ, 2019). In that light, this study aims to investigate the attitudes and perceptions of undergraduate translation students towards MT and PE and their performance before and after they receive training in MT and PE. Questionnaires are used to capture their attitudes and perceptions, the technical and temporal effort expended by the students while post-editing is calculated, and a human evaluation of the post-edited output is carried out to assess their performance and the quality of the post-edited texts. The analysis reveals a change in the students’ attitudes and perceptions: they report a more positive attitude toward MT and PE, and they are more confident and faster, while they avoid over-editing.

Fit-For-Market Translator and Interpreter Training in a Digital Age (Language and Linguistics), 2019
Recent technological advances have given rise to a wider availability of Machine Translation (MT) systems for various language pairs, while the advent of neural machine translation (NMT) models has led to improved MT quality, especially regarding fluency and in comparison to statistical machine translation (SMT) models. MT is thus increasingly used in industrial settings, a fact that has also attracted interest in the ways that translators and post-editors are, or should be, trained. This case study seeks to explore the effort involved in post-editing NMT and SMT outputs, and the ways in which the quality of MT systems used as well as the errors found in the raw MT output should be taken into consideration in post-editing (PE) training. To this end, the study examines, by means of eye-tracking and keystroke logging data, the performance of twenty professional Greek translators while post-editing SMT and NMT outputs.

EU Legal Culture and Translation in the Era of Globalisation. The Hybridisation of EU Terminology on the Example of Competition Law, 2019
The present chapter has a twofold aim: first, it reports on the panel EU Legal Culture and Translation at the ILLA (International Law and Language Association) relaunch conference, focusing on the main topics which emerged from the contributions – most notably, the hybridity of translator-mediated EU legal culture; second, it explores EU hybridity by focusing on the terminology of EU competition law which clearly demonstrates how concepts and ideas have travelled from outside the EU, colonising and/or merging with existing concepts, and how they have travelled within the EU primarily through translation. The main argument set forward is that EU terminology is the result of the Europeanisation of law which is achieved through the convergence of national laws and law harmonisation, but is also strongly affected by global trends which are in turn influenced by socio-political and historical factors. The final section discusses the ‘side effects’ of hybridity, including instability of meaning, graphic/surface similarity and semantic opacity, asymmetries of terms between official languages and the complex relation between supranational and national levels of meaning.

We present a parallel wikified data set of parallel texts in eleven language pairs from the educational domain. English sentences are lined up to sentences in eleven other languages (BG, CS, DE, EL, HR, IT, NL, PL, PT, RU, ZH) where names and noun phrases (entities) are manually annotated and linked to their respective Wikipedia pages. For every linked entity in English, the corresponding term or phrase in the target language is also marked and linked to its Wikipedia page in that language. The annotation process was performed via crowdsourcing. In this paper we present the task, the annotation process, the difficulties encountered with crowdsourcing for complex annotation, and the data set in more detail. We demonstrate the usage of the data set for Wikification evaluation. This data set is valuable as it constitutes a rich resource consisting of annotated data of English text linked to translations in eleven languages, including several languages such as Bulgarian and Greek for which not many LT resources are available.

The present work describes a multilingual corpus of online content in the educational domain, i.e. Massive Open Online Course material, ranging from course forum text to subtitles of online video lectures, that has been developed via large-scale crowdsourcing. The English source text is manually translated into 11 European and BRIC languages using the CrowdFlower platform. During the process several challenges arose which mainly involved the in-domain text genre, the large text volume, the idiosyncrasies of each target language, the limitations of the crowdsourcing platform, as well as the quality assurance and workflow issues of the crowdsourcing process. The corpus constitutes a product of the EU-funded TraMOOC project and is utilised in the project in order to train, tune and test machine translation engines.

The limited availability of in-domain training data is a major issue in the training of application-specific neural machine translation models. Professional outsourcing of bilingual data collections is costly and often not feasible. In this paper we analyze the influence of using crowdsourcing as a scalable way to obtain translations of target in-domain data, bearing in mind that the translations can be of a lower quality. We apply crowdsourcing with carefully designed quality controls to create parallel corpora for the educational domain by collecting translations of texts from MOOCs from English to eleven languages, which we then use to fine-tune neural machine translation models previously trained on general-domain data. The results from our research indicate that crowdsourced data collected with proper quality controls consistently yields performance gains over general-domain baseline systems, and systems fine-tuned with pre-existing in-domain corpora.

Łucja Biel & Vilelmini Sosoni (2017). “The translation of economics and the economics of translation.” Perspectives. Studies in Translation Theory and Practice, 25:3, 351-361, DOI: 10.1080/0907676X.2017.1313281

This paper reports on an empirical study concerning professional translators’ attitudes towards and experience with translation crowdsourcing. In particular, it seeks to explore how professional translators perceive translation crowdsourcing and what concerns they raise, if any. It also aims at identifying any problems they may face as crowdworkers during the translation process. The investigation takes place in the framework of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project where a crowd of professional translators is used for the translation into Greek (EL) of English (EN) MOOC (Massive Open Online Courses) educational data on the CrowdFlower platform with the end goal of using such translations to train and tune a statistical machine translation (SMT) system. It concludes by highlighting the unexpected benefits that crowdsourcing may bring to professional translators.

Although central in translation practice, and increasing in volume as well as impact due to the growing globalisation and explosion of financial transactions and increasing business activity, economic translation – including business and financial translation – has been little researched and discussed over the years. Yet it constitutes a fascinating and robust area that grows hand-in-hand with the evolution of human civilisation and the development of societies or the developing world. In this global village, the concept of ‘economics’ in translation has become even more relevant lately, due to the ever-increasing technicalisation of the profession and the alteration of the translation habitus in Bourdieu’s terms, which unavoidably affects the translation profession, not least with respect to the diminishing rates and deteriorating working conditions. This special issue aims to explore the specificities and particularities of economic translation as it has been practised over the years and as it is being currently practised around the globe, and also investigate new research trends that appear in the field. At the same time, it wishes to cast some light on the economics of the profession and the changing habitus of the translator.

Proceedings of the 11th International Workshop on Semantic and Social Media Adaptation and Personalization, 2016
In the framework of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, NLP tasks for parallel translation and entity annotation have been implemented, using a crowdsourcing platform. The educational genre (video lecture subtitles, forum discussions, course assignments), the type of text (segmentation, misspellings, syntax errors, specialized terminology, science formulas, limited knowledge of context) of the source data, and the multilingual approach of the involved activities (a total of 12 European and BRIC languages were covered) were very challenging for the success of our undertaking. Experimental trials revealed significant findings for the purposes of Language Technology research as well as limitations in crowdsourcing linguistic data collections for multilingual tasks.

The present work is an overview of the TraMOOC (Translation for Massive Open Online Courses) research and innovation project, a machine translation approach for online educational content. More specifically, video lectures, assignments, and MOOC forum text are automatically translated from English into eleven European and BRIC languages. Unlike previous approaches to machine translation, the output quality in TraMOOC relies on a multimodal evaluation schema that involves crowdsourcing, error type markup, an error taxonomy for translation model comparison, and implicit evaluation via text mining, i.e. entity recognition and its performance comparison between the source and the translated text, and sentiment analysis on the students' forum posts. Finally, the evaluation output will result in more and better quality in-domain parallel data that will be fed back to the translation engine for higher quality output. The translation service will be incorporated into the Iversity MOOC platform and into the VideoLectures.net digital library portal.
Proceedings of the 18th Annual Conference of the European Association for Machine Translation

Translation and Hybridity in the European Union: Fighting the Demons
G. Androulakis (ed.) (2002) Conference Proceedings. Translating in the 21st Century: Trends and Prospects. Thessaloniki: University of Thessaloniki, 2002
The European Union is a democratic federation of equal nations and as a result all the citizens of its member states have a right to use their national official language. This right is enshrined in the policy of multilingualism which dictates the use by the European institutions of the official languages of the member states and which was agreed by the founding fathers with Council Regulation No 1 (15 April 1958). Today there are 11 official languages for the current 15 member states.
At the heart of multilingualism we find text production, translation and interpretation, activities which are inherently complex but which become even more complex in the context of the European Union. The present paper will focus on text production and translation and on their end products which have been labelled 'hybrid texts' and which have been fiercely criticised by the public and press alike, at least in English and Greek (Goffin, 1994: 636; Koutsivitis, 1989: 31-46; Deilinos, 1981). Hybrid texts are texts that appear as an outcome of negotiations between different languages and cultures and may involve features which are contradictory to TL (Target Language) and TC (Target Culture) norms (Trosborg, 1997: 329-330). The aim of the present paper is to investigate this notion of 'hybridity' both in original text production and in translation and to come to a conclusion as to whether Europeans are in fact dealing with a "demon" or simply an "exotic animal".

Lexical Cohesion: The Case of European Union Texts in English and Greek
M. Cazzoli-Goeta, S. Pourcel & L. Van Espen (eds.) (2002) Conference Proceedings of the Fifth Durham Postgraduate Conference in Theoretical and Applied Linguistics. Durham: University of Durham, 2002
According to Halliday and Hasan, the property that distinguishes a text from a non-text is texture (1976). Two necessary conditions for the success of a text in terms of texture are cohesion, which is a characteristic of the surface text, and coherence, which is the underlying characteristic of textual worlds. Halliday and Hasan have identified five broad types of cohesion: reference, substitution, ellipsis, conjunction and lexical cohesion. Out of the five, lexical cohesion, which is usually further divided into reiteration/repetition and collocation, constitutes a major way of joining one sentence to another in a text.
The present paper will focus on lexical cohesion, and in particular on lexical repetition in all its forms, i.e. repetition mediated through "general lexical relations" and repetition mediated through "instantial lexical relations" (Hasan, 1984). In particular, seven categories of lexical cohesion are used in the present study, namely Simple Lexical Repetition, Complex Lexical Repetition, Simple Paraphrase, Hyponymy, Meronymy, Equivalence and Semblance. These categories stem from a combination of Hoey's theory of patterns of lexis in text (1991) and Hasan's framework of coherence and cohesive harmony (1984) and are applied to six texts, with the aim of establishing whether there are any differences in the frequency with which they occur.
More specifically, starting from the assumption that the larger the number of cohesion links in a language piece, the easier it is for readers/hearers to process it, an attempt is made to investigate how the number of cohesion links differs with respect to three variables: the language variable, the translation variable and the hybridity variable. Consequently, the texts that are analysed are divided into Greek and English, originals and translations, and hybrid and non-hybrid. The findings of this analysis can cast some light on the way lexical cohesion works with respect to the three above-mentioned variables and can be of considerable value to translators and drafters of EU official documents whose work has, at times, been attacked by the public and the media as obscure, unnatural and even erroneous (Goffin, 1994: 636; Deilinos, 1981).