Applied Corpus Linguistics Research Papers - Academia.edu
Applied Corpus Linguistics
791 papers
11,728 followers
About this topic
Applied Corpus Linguistics is the interdisciplinary field that utilizes corpus linguistics methodologies to address practical language-related issues. It involves the analysis of large, structured sets of linguistic data (corpora) to inform language teaching, translation, lexicography, and other applied linguistic domains, enhancing understanding of language use in real-world contexts.
Key research themes
1. How can corpus linguistics methodologies rigorously capture and quantify language change and grammaticalization processes?
This research theme addresses the integration of corpus linguistics techniques with grammaticalization theory to empirically track and analyze language change phenomena, such as grammaticalization, subjectification, and semantic shifts. It matters because it moves the study of language change beyond qualitative observations and anecdotal evidence toward statistically supported, replicable findings derived from large, diachronic corpora, thereby enriching both subfields and offering a more precise understanding of linguistic evolution.
Continuing the dialogue between corpus linguistics and grammaticalization theory: Three case studies
by
María José López Couso
2023, Corpus Linguistics and Linguistic Theory
Key finding: By utilizing large-scale diachronic and contemporary English corpora (e.g., Helsinki Corpus, ARCHER, COCA), the study demonstrates that corpus linguistics offers robust empirical grounds to recognize and document grammaticalization processes. Specifically, it shows how frequency data and discourse contexts gleaned from corpora reveal gradual category transitions and interdependencies between form and meaning, offering statistical validation to grammaticalization trajectories such as the existential 'there', epistemic parentheticals, and discourse markers.
"Corpus Linguistics and Grammaticalisation Theory: Beyond Statistics and Frequency?" 121-150.
by
Christian Mair (English Linguistics)
2016
Key finding: This work argues that while both corpus linguistics and grammaticalization theory emphasize frequency and discourse context, many grammaticalization studies initially relied on small, manually compiled corpora, limiting statistical rigor. The paper advances the methodological conversation by showing how larger, digitized corpora enable precise identification of incipient grammaticalization and differentiate between dynamic grammatical change and occasional grammatical usage, reconciling quantitative data with interpretive theoretical frameworks.
Some proposals towards more rigorous corpus linguistics
by
Stefan Th. Gries
2015, Zeitschrift für Anglistik und Amerikanistik
Key finding: The paper critiques prevalent corpus linguistic studies for methodological oversights such as failing to distinguish between 'by-subjects' and 'by-items' analyses, which are critical for evaluating the generalizability of frequency and co-occurrence findings. By advocating for rigorous statistical methods adapted from psycholinguistics, including analyses of variance and reliability testing, it contributes to enhancing the empirical robustness of corpus-based investigations of linguistic phenomena including syntactic priming and grammatical variation.
2. What are the effective corpus-based approaches and tools for applied language teaching and learner engagement?
This theme focuses on applied corpus linguistics in language education, emphasizing how corpora and corpus analysis techniques can be harnessed to improve language teaching, resource design, syllabus development, and learner autonomy. It matters because empirical frequency data, authentic examples, and learner-specific corpus analysis inform pedagogical decisions, enabling evidence-based teaching and enhancing learner awareness and linguistic competence through corpus-driven materials and activities.
Classroom applications of corpus analysis
by
Alex Boulton
2018
Key finding: The paper identifies three principal applied uses of corpora in language teaching: (1) informing more accurate descriptions of language varieties and features via frequency analysis; (2) providing teachers with accessible corpus tools to analyze texts and tailor instruction; (3) directly involving learners with corpus data through concordance-driven activities that promote inductive and deductive exploration of language use, thereby fostering data-driven learning and enhancing awareness of frequency, collocations, and register.
Understanding Corpus Linguistics by Danielle Barth & Stefan Schnell, 2022
by
Zahra Ghane
2023, Corpus Pragmatics
Key finding: This comprehensive textbook equips applied linguists and language educators with a state-of-the-art overview of corpus linguistics, including corpus types, annotation schemes, query tools, and statistical measures. It emphasizes practical considerations in corpus design, metadata handling, querying (e.g., concordances, frequency plots, keyness), and annotation techniques, thereby facilitating effective corpus use for empirical language description and pedagogical application.
English Language Teaching: Global Perspectives
by
Vijay Kumar Roy
2025, Paragon International Publishers, New Delhi
Key finding: This edited volume highlights empirical studies employing corpus methodologies to analyze English language teaching in diverse cultural contexts. It addresses corpus-informed syllabus design, language learning strategies for weak learners, and challenges posed by digital age pedagogies, thereby illustrating the pivotal role of corpus linguistics in adapting teaching approaches to globalized, multilingual education settings.
3. How can corpus linguistics methodologies be systematically applied to analyze specialized discourse genres such as tourism and pragmatics?
This theme explores corpus linguistics as a method to investigate specific language domains and pragmatic phenomena within authentic discourse usage. It prioritizes quantitative and qualitative corpus techniques to elucidate patterns, rhetorical strategies, and contextual meanings in specialized genres, facilitating cross-disciplinary insights and practical applications for discourse analysis, translation studies, and pragmatic interpretation.
Corpus pragmatics: laying the foundations
by
Karin Aijmer
2024, Corpus Pragmatics
Key finding: The work pioneers corpus pragmatics by integrating corpus linguistics methodologies with pragmatic theory, demonstrating empirical approaches to analyzing context-dependent meanings and inferencing processes in actual language use. It stresses the importance of discourse context, multimodality, and interactional dynamism, positioning corpus data as essential for investigating pragmatic phenomena such as implicature, politeness, and dialogicity.
The Use of Corpus Analysis in Analysing Tourism Texts
by
Bashar Abdulkareem Alali
2025, South Asian Journal of Social Sciences and Humanities
Key finding: This study articulates the application of corpus linguistics in tourism discourse, illustrating how concordancers and statistical tools (e.g., WordSmith, AntConc) uncover recurrent lexical patterns, collocations, and rhetorical strategies across diverse tourism materials like brochures and websites. It evidences corpus analysis’ versatility in exposing cultural values, discourse construction of destinations, and genre-specific language, thereby informing translation and language teaching in tourism.
Principles of corpus querying: A discussion note
by
Sass Bálint
2023, Acta Linguistica Academica
Key finding: Providing foundational guidance for effective corpus querying, this paper formalizes eight principles that underpin robust data retrieval from annotated corpora. It emphasizes the necessity of mastering formal query languages (e.g., CQL), accounting for annotation imperfections, and carefully contextualizing queries to ensure relevant and comprehensive data extraction, thereby underpinning empirical studies across specialized discourse domains including pragmatics and applied linguistics.
Related Topics
Corpus Linguistics and Discourse Analysis
Corpus Linguistics & Language Pedagogy
Learner corpora
Corpus Linguistics and Translation Studies
Corpus-Based Translation Studies
Corpus compilation and design
Formulaic Language
Corpus-based discourse analysis
Corpus-Based Studies
Applied Linguistics
All papers in Applied Corpus Linguistics
"Why Isn't Anything Showing Up?": Interactional Practices and the Development of Corpus Literacy in Data-Driven Writing
by
Peter Crosthwaite
2026, Digital Studies in Language and Literature
Corpus-based data-driven learning is widely recognised in applied linguistics, yet how students interact with and around corpus tools in Vietnamese classrooms remains underexplored. Grounded in social constructivist learning theory, this study investigates how Vietnamese EFL students engage with a corpus tool during collaborative writing, and the difficulties and strategies emerging from this process. The study involved six English-major undergraduates working in small groups during a 60-minute corpus-based writing session in a Vietnamese university classroom. Students' interactions with the corpus tool, peers, and the teacher were video-recorded and analysed qualitatively. The findings show that corpus use unfolded along two closely connected dimensions: operational interaction, related to navigating the corpus and retrieving data, and interpretive interaction, related to making sense of corpus evidence for writing decisions. Students also encountered technical difficulties during corpus use. These difficulties were managed through peer-supported step recovery, evidence-based comparison, and, when uncertainty persisted, teacher mediation. Rather than indicating failure, such difficulties reflected learners' emerging corpus literacy. The data also reveal that learners occasionally oriented to other digital tools such as Google search or AI-based writing assistants as potential resources for verifying corpus findings. Overall, the study highlights the interactional and developmental nature of corpus-assisted writing and emphasises the importance of structured support and flexible use of multiple tools during corpus-based instruction.
Ecolinguistic analysis of ecological framing in agricultural weather reports
by
Muhammad Saleem
2026, GeoJournal
This study investigates how weather reports construct agricultural discourse through language. It focuses on how weather discourse frames agricultural damage and ecological response. The research analyzes a corpus of 750 weather reports (150 each from CNN, BBC, DW, Aljazeera, and GNN) published from December 1, 2024 to October 30, 2025, with a total of 112,587 tokens. The study uses AntConc (version 4.3.1; Anthony, 2024), a corpus analysis tool, to extract keywords, collocations, and concordance lines relevant to agriculture. Qualitative analysis was conducted using Stibbe's (2021) Stories We Live By framework, and quantitative data were analyzed through AntConc. Findings reveal that warlike metaphors like parched landscapes, string of storms, encroaching desert, hold back, buffer zones, fuel rapid growth, gobble up vegetation, and greening the land dominate across Western news outlets. Narratives follow a predictable disaster-response-outcome structure, which promotes episodic crisis framing while underplaying systemic causes (climate change or policy failure). Identity constructions present farmers either as helpless or heroic, while meteorological and governmental institutions are elevated as technocratic saviors. Ideological analysis shows that economic framings often override ecocentric concerns, with plant life viewed predominantly through lenses of yield, loss, and commodity value. Reports from Aljazeera and GNN occasionally diverge from this trend by foregrounding subsistence agriculture, community resilience, and traditional knowledge systems, presenting an ecocentric and culturally grounded discourse. The study concludes that weather reporting constitutes a potent genre for shaping public understanding of agricultural risk and protection. It argues for the integration of beneficial discourses that promote ecological resilience, farmer agency, and sustainable land use.
עדויות לקיום עדותיות (evidentiality) בעברית
by
Danny Kalev
2026, לשוננו
Emerging Evidential Constructions in Contemporary Hebrew
A novel syntactic construction consisting of an alleged complement clause that lacks the complementizer še ‘that’ is emerging in Contemporary Hebrew, e.g., xašavti ⓪ ʔata vepaʔulina tavoʔu ‘I thought ⓪ you and Paulina were coming’. Although utterances in this pattern are traditionally analyzed as complex sentences, I argue that they are evidential constructions (ECs). ECs are simplex sentences consisting of an evidential expression that indicates the source of information, e.g., sensory perception or hearsay, and a proposition:
[xašavti]EVIDENTIAL [ʔata vepaʔulina tavoʔu]PROPOSITION
Since nonevidential verbs in a matrix clause, e.g., volitives and hortatives, do require an overt še before a complement clause, the lack of še is neither coincidental nor attributable to phonetic attrition. Rather, I contend that ECs’ syntactic formulation reflects the “demotion” of the matrix clause to a parenthetic evidential, alongside insubordination of the primary proposition. Certain evidential expressions exhibit typical grammaticalization indicators such as phonological reduction and semantic bleaching. Furthermore, a layering-oriented reconstruction suggests that the ECs have evolved from direct discourse forms. Finally, a comparison with a canonical evidential language, Tariana, reveals semantic similarities to Hebrew’s ECs, thereby supporting the proposed analysis.
Artificial Intelligence Meets e-Lexicography
by
Gilles-Maurice de Schryver
2026, eLexicography in the 21st century: New challenges, new applications
The future of lexicography is digital, so much is certain. Yet what that digital future will look like is far less certain. In the current paper, the Web takes centre stage, and the novel type of lexicography that is proposed revolves entirely around 'you'. What is needed is a dictionary that is truly adaptive — meaning that it will physically take on different forms in different situations, and one that would do so as intelligently as possible — meaning that it would have the ability to study and understand its user, and based on that to learn how to best present itself to that user. With this, the field has moved to a very different paradigm indeed, to that where artificial intelligence meets e-lexicography.
VAGUENESS AND GENDER COMPREHENSIBILITY IN ONLINE NEWS REPORTAGE
by
Olayemi Mahmud
2026, JABU International Journal of Humanities and Social Sciences Volume 1 Issue 1. 159-184.
The study sets out to find out whether women understand and interpret vague language better than their male counterparts. To this end, the paper focuses on Nigerian L2 users of English, with attention to their gender. Using corpus linguistics methodologies, the study preselected 20 hedging elements (10 approximators and 10 quantifiers). The preselected elements were identified and classified, and examples of their usage in real-life situations were searched for in a purpose-built corpus of 200 online news reports. In addition, a 20-item questionnaire, drawn from the corpus and using a five-point Likert rating scale, was administered to 350 purposively selected Nigerian L2 English user-respondents through Google Forms. One null hypothesis was formulated to test the comprehensibility of the respondents’ responses. The test-item questionnaire was also used to determine the respondents’ sociolinguistic competence in some instances of the vague expressions in the corpus. Respondents’ responses were subjected to the inferential statistic of chi-square at the 0.05 significance level. The study found a difference in the comprehensibility of, and sociolinguistic competence in, vague language on the basis of the respondents’ gender.
Keywords: approximators, corpus linguistics, hedging, news reportage, Nigerian English L2 users, quantifiers, vagueness
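The chi-square test mentioned in this abstract can be illustrated with a minimal sketch. The function name `chi_square`, the 2 x 2 layout, and the counts are all hypothetical and are not taken from the paper, whose actual tables are not shown here. The sketch computes the Pearson statistic by hand and compares it with the standard critical value for one degree of freedom at the 0.05 level.

```python
def chi_square(observed):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical counts for illustration only (not data from the study):
# rows = respondent gender, columns = (item understood, item not understood).
table = [[10, 20], [20, 10]]
stat, dof = chi_square(table)

CRITICAL_0_05_DF1 = 3.841  # standard critical value for df = 1 at alpha = 0.05
print(round(stat, 3), dof, stat > CRITICAL_0_05_DF1)
```

A five-point Likert item cross-tabulated by gender would give a 2 x 5 table with four degrees of freedom, so a different critical value would apply.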
Fundamental Approaches in Corpora Linguistics and Perspectives of This Field in Azerbaijani Linguistics
by
Könül Həbibova
2026, Terminology Issues
[...] researchers' approaches to language corpora are compared, and their theoretical and methodological foundations are analyzed. The study highlights the role of language corpora in fields such as lexicography, semantics, syntax, the translation process, machine translation, and speech technologies. In addition, the creation of a national corpus of the Azerbaijani language, the research conducted in this area, and the available technological capabilities are discussed. The article also reports on the research carried out in this direction by the Department of Computational Linguistics of the I. Nasimi Institute of Linguistics of the Azerbaijan National Academy of Sciences (AMEA), and on the steps taken to develop the national corpus. The article emphasizes the importance of corpus linguistics not only as a branch of computational linguistics but also as an independent empirical research method. Finally, it examines the relationship between corpus linguistics and philological research, its contribution to a deeper study of the semantic and syntactic features of language, and its prospects for future development. This study shows that language corpora are important not only for scientific linguistic research but also for pedagogical and technological applications. Keywords: language corpus, corpus linguistics, database, electronic dictionary, speech technologies.
Functionality in Stemming: Processing Complex Verb Forms Considering Vowel Harmony (based on Azerbaijani language)
by
Könül Həbibova
2026, СБОРНИК МАТЕРИАЛОВ круглого стола «НАЦИОНАЛЬНЫЙ КОРПУС КАЗАХСКОГО ЯЗЫКА – ЛИНГВИСТИЧЕСКАЯ БАЗА ДЛЯ ОБУЧЕНИЯ БОЛЬШОЙ ЯЗЫКОВОЙ МОДЕЛИ (LLM)» в рамках регулярной научно-языковой площадки «Аскар Жубановские чтения-4»
This study examines the linguistic and algorithmic aspects of automatic analysis and stemming of verb word forms in the Azerbaijani language. The key sources of complexity are identified as agglutination, vowel harmony, and embedded constructions (e.g., gәlmәyәcәkdilәr). A hybrid architecture is proposed: a rule-based analysis modeling allomorphy and suffix order; stepwise “outside-in” stemming; and contextual disambiguation using corpus data (transformer tagger).
Integration into the corpus enables lemmatized concordances, morphological search, and normalization for collocation statistics. Evaluation metrics and application scenarios (NER, parsing, indexing) are demonstrated. The advantages of the approach – accuracy, flexibility, and scalability – are emphasized, along with a roadmap for further development. Validation includes a gold subcorpus with genre stratification and an assessment of vowel harmony consistency in allomorph selection.
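As a rough illustration of what stepwise "outside-in" stemming with a vowel-harmony check might look like, here is a toy sketch. It is not the paper's system: the suffix inventory, the slot labels, and the function names `vowel_class` and `strip_suffixes` are assumptions made for this example. Harmony is reduced to a front/back check and rounding harmony is ignored. The rule-based allomorphy modelling and the transformer-based disambiguation described in the abstract are not attempted.

```python
# Toy "outside-in" suffix stripper with a front/back vowel-harmony check.
# The suffix inventory is a small illustrative assumption, not the paper's rule set.

FRONT_VOWELS = set("eəiöü")
BACK_VOWELS = set("aıou")

# Suffix slots ordered from the outermost (word-final) to the innermost.
SUFFIX_SLOTS = [
    ("plural/3pl", ("lar", "lər")),
    ("past", ("dı", "di", "du", "dü")),
    ("future", ("yacaq", "yəcək", "acaq", "əcək")),
    ("negation", ("ma", "mə")),
]


def vowel_class(text):
    """Return 'front' or 'back' for the last vowel in text, or None if there is none."""
    for char in reversed(text):
        if char in FRONT_VOWELS:
            return "front"
        if char in BACK_VOWELS:
            return "back"
    return None


def strip_suffixes(word):
    """Peel suffixes off the right edge, one slot at a time, outermost first."""
    stem, trace = word, []
    for label, allomorphs in SUFFIX_SLOTS:
        for suffix in allomorphs:
            candidate = stem[: -len(suffix)]
            # Accept the cut only if a vowel-bearing stem remains and it
            # agrees in frontness with the suffix allomorph.
            if (stem.endswith(suffix)
                    and vowel_class(candidate) is not None
                    and vowel_class(candidate) == vowel_class(suffix)):
                stem = candidate
                trace.append((label, suffix))
                break
    return stem, trace


print(strip_suffixes("gəlməyəcəkdilər"))
```

The harmony check doubles as a cheap filter: a form such as the made-up "gəllar" is left untouched because the back-vowel allomorph does not agree with the front-vowel stem.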
The Courtesy of Uncertainty: An Analysis of Hedging in Academic English
by
Tahti A Korpela
2026
This paper investigates the use of hedging in the British Academic Spoken English (BASE) corpus by focusing on a selection of modal verbs and how they are employed in lectures versus seminars. At a broader sociocultural level, British English tends to favour indirectness in professional interactions, while at an institutional level, hedging as a speech strategy serves a distinct epistemic function. These functions, though both frequently occurring in academic discourse, are analytically distinct and should not be conflated.
Hedging contributes towards successful academic discourse by addressing uncertainty politely, supporting professional decorum, and maintaining courtesy. By using hedging to introduce an element of tentativeness, academics are able to critically engage in discussions while maintaining epistemic caution. Using a corpus-driven approach, the BASE corpus was analysed for the modal verbs “might,” “may,” and “should.” Frequency data were normalised for each subcorpus (lectures and seminars) and sample concordance lines were extracted in order to analyse their roles in expressing uncertainty, possibility, or politeness. Findings indicated that lectures used hedges to express tentative possibility while maintaining authority and cautious speculation, while seminars reflected their collaborative nature by using hedges to soften statements, encourage participation, and signal polite recommendations or shared obligations.
The results highlight the role modal verbs play as hedging devices in academic discourse, reflecting their effectiveness in expressing tentativeness, facilitating collaborative dialogue and constructive critique, signalling caution, and ensuring that academic claims are presented with the appropriate humility. Hedging thus shows adaptability in academic settings, and by contributing to both epistemic precision and interpersonal politeness, is therefore essential for producing and maintaining productive academic discourse.
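The frequency normalisation mentioned in this abstract can be sketched as follows. This is a minimal illustration, not the author's procedure: the tokeniser, the per-1,000-token base, and the function name `normalised_frequencies` are assumptions, and the sample sentence is invented rather than taken from the BASE corpus.

```python
import re
from collections import Counter

MODALS = ("might", "may", "should")


def normalised_frequencies(text, per=1000):
    """Return occurrences of each modal per `per` tokens of text."""
    # Very simple tokeniser (an assumption): lowercase runs of letters and apostrophes.
    # Note that this also counts the month name "May" as the modal "may".
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {modal: 0.0 for modal in MODALS}
    counts = Counter(token for token in tokens if token in MODALS)
    return {modal: counts[modal] * per / len(tokens) for modal in MODALS}


# Invented sample text; in practice each subcorpus (lectures, seminars)
# would be passed in separately so the rates can be compared.
sample = "you might say that it may or should be so"
print(normalised_frequencies(sample))
```

Normalising by subcorpus size is what makes the lecture and seminar counts comparable when the two subcorpora differ in length.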
Gender fair strategies in job advertisements in Italy: a first update on the current situation
by
Anita Perra
2026, Linguistik Online
This research aims to examine the use of gender-fair language in job advertisements within the contemporary Italian job market. The study is based on a sample of 240 job announcements collected online in 2024, with results compared to those of previous research and analyzed using corpus linguistics tools. The findings confirm the predominance of neutralizing strategies and masculine forms, while revealing a slight increase in the use of split forms and the emergence of non-binary language, such as neomorphemes. This trend underscores the persistence of linguistic asymmetries in job announcements: problematic gender-related linguistic strategies may contribute to maintaining access barriers to specific segments of the labor market for women and to the overall persistence of social and cultural stereotypes.
Güzel Sözcüğünün Derlem-temelli incelenmesi: Yayılan Ağ görünümleri A corpus-based study of the Turkish Word güzel: Radial network Occurrences
by
gulsum atasoy
2026, Derlem Dilbilim ve Sözcük Anlambilim
The radial network developed within cognitive semantics is valuable because it displays prototype-based category structure. In particular, the radial network is a rich resource showing how near or far the senses of a polysemous word lie from one another according to their typicality (Vandenbergen, Marie and Aijmer, 2007: 19). A radial network is a model of how related but distinct word senses are stored in semantic memory, and it represents a conceptual category (Lakoff, 1988). The aim of this study is to lay out the polysemy of the Turkish word güzel and to describe the similarities and differences among its senses using natural language data. To this end, its semantic continuity is examined and a radial network is constructed that runs from the typical sense of güzel to its peripheral senses. The senses of güzel were retrieved using the demo version of the Turkish National Corpus (Türkçe Ulusal Derlemi), and only adjectival uses were examined. The results were compared with the definitions of güzel in the Ötüken Türkçe Sözlük (ÖTS) and Türk Dil Kurumu (TDK) dictionaries. ÖTS lists 15 senses for güzel and TDK lists 12. However, some of these definitions overlap, and because they do not conform to the principle of linguistic economy and the differences between senses are unclear, the senses are hard to follow. Based on the findings, a radial network showing the polysemy of Turkish güzel was constructed. In it, the central sense is the most typical one, and senses become more peripheral with distance from the center. The things described as güzel in Turkish cluster into three basic senses: sense perception, character, and achievement.
THE LANGUAGE OF LINGUISTICS CHAPTER OF NEW PARADIGM OF COMMUNICATION THE MISCOMMUNICATION TRILOGY "The Conspiracy of Speech, Vol. I." Part 6
by
Peter Ayolov
2026, THE LANGUAGE OF LINGUISTICS CHAPTER OF NEW PARADIGM OF COMMUNICATION THE MISCOMMUNICATION TRILOGY "The Conspiracy of Speech, Vol. I." Part 6
The extended version expands the original text to accommodate a significantly larger body of material that could not be contained within the first four parts, allowing the argument to unfold with greater depth, continuity, and conceptual precision. It is written as a more comprehensive and elaborated version of the original, preserving its core structure while integrating additional analyses, examples, and theoretical developments that extend the scope of the work.
Yeung, K. K. A., & Hu, G. (2026). Direct and indirect data-driven learning: An experimental study of that-complementation. Journal of Second Language Writing, 72, Article 101301.
by
Guangwei Hu
2026, Journal of Second Language Writing
While previous studies have demonstrated the pedagogical utility of data-driven learning (DDL), little research has compared direct (computer-based) and indirect (paper-based) DDL in English-as-a-second-language writing instruction. To address this gap, this study examined the effectiveness of the two DDL approaches in improving first-year college students' knowledge and use of that-clauses in an English-for-academic-purposes course. Using a pre-post-delayed quasi-experimental design, two experimental groups received either direct or indirect DDL interventions, while a control group received traditional teacher-fronted instruction. Both DDL interventions led to short-term gains in the frequency of that-clause use. While the indirect DDL intervention was effective in sustaining such gains beyond the post-test, the gains for the direct DDL group disappeared on the delayed test. In terms of variety of use, only the direct DDL group showed improvement from the pre-test to the post-test and maintained this improvement on the delayed test. The direct DDL intervention was also somewhat more effective than the indirect DDL intervention in improving participants' variety scores on the post-test. Finally, neither the direct nor the indirect DDL intervention had any significant effect on the accuracy with which that-clauses were used. Implications for second language writing instruction and future DDL research are discussed.
Student Learning Engagement with Emerging Technologies in the EFL Classroom in China: A Case Study
by
Gurnam Kaur Sidhu
2026, Environment-Behaviour Proceedings Journal
This study investigated learner engagement with emerging technologies as a viable teaching and learning tool. This exploratory study was conducted at a public university in Sichuan Province, China, and involved 160 sophomore students. Data were collected via a questionnaire consisting of open- and closed-ended questions. The findings revealed that EFL students frequently used Tencent as the preferred platform for online distance learning. In addition, the findings demonstrated students' positive perception of learner-to-learner, learner-to-instructor, and learner-to-content interaction through the support of emerging technologies. This suggests that emerging technologies have the potential to support a high-quality learning environment.
Deutsch V3 Diachron. Hauptsätze mit mehrfacher Vorfeldbesetzung in der Geschichte des Deutschen. Annotationshandbuch
by
Christopher Saure
2026
This manual describes the design of the database DV3D (German V3 Diachron). Besides general technical details and the structure of the database, the annotation rules of the used layers and the underlying mechanisms of analyses are described herein. It is additionally provided as a separate document so that the manual can be cited individually.
Quand la terminologie revisite les archives pour la constitution d'un patrimoine (linguistique) vivant : vers une eco-terminologie
by
Laurent Gautier
2026, Analele Universității din Craiova. Seria Științe Filologice. Lingvistică.
This study presents an innovative methodological approach to terminology through the archives of the Grande Saline of Salins-les-Bains (Jura, France). By combining terminological analysis, historical sociolinguistics, and eco-terminology, it highlights the value of organizational archives for understanding linguistic, cultural, and professional dynamics within a specialized milieu. The exploration of an original corpus of written and oral data reveals the diachronic construction of terms related to salt production and its socio-economic environment. Linking memory, heritage, and linguistic practice, the research examines how language contributes to the transmission of technical and cultural knowledge. It also outlines concrete perspectives for heritage and museum valorization through collaboration between institutions and researchers, situating terminology within a living, interdisciplinary, and context-sensitive framework.
Albanian classical poetry in the Albanian National Corpus: Between linguistics and philology
by
Maria Morozova
2026, Studime albanologjike në indoeuropianistikë, filologji dhe gjuhësi kontakti. Vëllim në nderim të prof. Bardhyl Demirajt
The article discusses some important problems, and their possible solutions, that arise in connection with the representation of Albanian classical poetry in the Albanian National Corpus, with a focus on literary Gheg texts. Different lifetime editions of the literary works of the two most influential Gheg Albanian poets of the beginning of the previous century, Ndre Mjedja (1866–1937) and Gjergj Fishta (1871–1940), are considered and compared in order to show the variation between versions of the same text and to propose solutions to the problem of choosing an optimal edition for the corpus.
Kontrastywna analiza językowa z wykorzystaniem modeli sztucznej inteligencji: porównanie języka niemieckiego i innych języków / Contrastive Linguistic Analysis Using Artificial Intelligence Models: A Comparison of German and Other Languages
by
Linguistische Treffen in Wrocław
2026, Linguistische Treffen in Wrocław 28
With the rapid development of artificial intelligence (AI) and natural language processing (NLP), new opportunities are emerging in linguistic analysis, particularly in contrastive and comparative studies. Traditional methods of linguistic analysis are time-consuming and resource-intensive, making AI a valuable tool for such research. This paper aims to present an innovative approach to contrastive analysis of the German language in comparison with other languages, utilizing the latest AI models, such as transformer-based models (e.g., BERT, GPT). The analysis includes a comparison of grammatical and semantic structures, as well as an attempt at large-scale automated identification of differences and similarities. Compared to previous research, the discussed approach stands out due to the use of advanced AI tools that enable faster and more precise identification of linguistic differences. The research objectives included comparing AI-based methods in contrastive linguistics and assessing their effectiveness. The analysis revealed that AI can successfully identify structural patterns in languages; however, it remains dependent on the quality of the provided data. Based on the study’s results, conclusions regarding the role of AI in linguistic research will be presented. A potential practical application of these findings is the acceleration of research processes and the enhancement of precision in analyzing linguistic differences and similarities.
Korpus dilçiliyi və konkordans: metodoloji yanaşmalar və yeni tətbiqlər
by
Könül Həbibova
2026, Dilçilik araşdırmaları
This article examines the core concepts of modern corpus linguistics, in particular the application of concordance and collocation analysis and its role in the study of scholarly discourse and poetic texts. It finds that the development of corpus linguistics should proceed together with the creation of national corpora, automatic text processing, and the integration of artificial intelligence technologies. The wide application of concordance and collocation methods in the study of scholarly discourse, media, and poetic texts can contribute to the development of new methodological approaches in Azerbaijani linguistics.
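As a rough illustration of the concordance method this abstract refers to, the following Python sketch produces keyword-in-context (KWIC) lines. It is not taken from the paper: the function name `kwic`, the simple regex tokenization, and the sample sentence are all assumptions made for the example.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, keyword, right context) for every hit of keyword."""
    # Naive tokenization on word characters; a real corpus tool would use
    # a language-aware tokenizer.
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Hypothetical sample text, used only to show the output shape.
sample = "The corpus shows patterns. A corpus is a collection of texts."
for left, kw, right in kwic(sample, "corpus", window=2):
    print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in a fixed column, as the `print` call does, is what lets a reader scan recurring left and right contexts, which is also the starting point for spotting collocations.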
A Corpus-Based Approach in Vocabulary Research: Defining the Word of the Year 2023 in Kazakh
by
Assel Ormanova
2026, Theory and Practice in Language Studies
The Word of the Year (WOTY) is an event held in various countries and regions to determine the most relevant, significant, and popular words and expressions that reflect not only the linguistic but also the socio-cultural aspects of the country. This paper aims to identify the most frequently used words/phrases in Kazakh for 2023 to be nominated for the WOTY title. The research methods include media discourse analysis and quantitative analysis using a corpus-based approach. A computer program, #LancsBox 6.0, generated a dataset: a research corpus consisting of 500 texts published on Kazakh news platforms throughout 2023. The results indicated that: 1) the conjunction "jáne" [and] had the highest frequency and occurrence in the research corpus; 2) the extracted words with high frequency indicators might serve as candidates for WOTY 2023, such as "Kazakhstan", "jana" [new], "jyly" [year], "kerek" [need], and "jumys" [work]; 3) the WOTY "artificial intelligence" named by other global sources showed a high frequency indicator in Kazakh media texts. The study contributes the generated corpus of Kazakh media texts for 2023. The significance of the study lies in its pioneering linguistic assessment of the Kazakh language, which analyzes media discourse publications based on corpus outcomes.
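The kind of frequency list this abstract describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not reproduce how #LancsBox works, and the function name `frequency_list`, the regex tokenization, and the toy English texts are hypothetical.

```python
import re
from collections import Counter

def frequency_list(texts, top_n=5):
    """Count word forms across a list of texts and return the top_n most frequent."""
    counts = Counter()
    for text in texts:
        # Lowercase and split on word characters; no lemmatization is attempted.
        counts.update(re.findall(r"\w+", text.lower()))
    return counts.most_common(top_n)

# Toy stand-in for a news corpus (not real data).
texts = ["new year and new work", "work is needed and new"]
print(frequency_list(texts, top_n=3))
```

Because raw counts like these are typically dominated by function words, candidate lists usually need a further filtering or interpretation step before content words stand out.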
A Corpus Approach in Language Discovery: A Word Frequency Analysis Based on the Corpus Outcomes in Kazakh
by
Assel Ormanova
2026, Forum for Linguistic Studies
This study examines the most frequently used parts of speech and grammatical forms in the texts of the sub-corpora of the National Corpus of the Kazakh Language (qazcorpora.kz). The frequency of word forms, based on the 13 million word usages in the 2023 corpus database, was collected and analyzed both manually and using the functional settings of the corpus software. The study provided key insights into Kazakh journalistic texts' frequency distribution, grammatical variability, and comparative patterns. The results indicated that: (1) the conjunction 'žäne' [and], demonstrative pronoun 'bul' [this], auxiliary verb 'dep' [no translation], noun 'Kazakh' [Kazakh], modal verb 'žoq' [not], adjective 'aq' [white], adverb 'köp' [many/much], and numeral 'eki' [two] showed the highest frequency indicators in their word classes, emphasizing their functional and stylistic roles in text construction; (2) function words were the most frequently used part of speech; (3) the conjunction 'žäne' [and], postposition 'üšın' [for], and particle 'ɣana' [only] had the highest frequency indicators among function words. This corpus-based research highlights the alignment of Kazakh frequency patterns with global linguistic trends, such as Zipf's law, while also showcasing unique features attributed to the language's
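Zipf's law, mentioned in this abstract, predicts that a word's frequency is roughly proportional to 1/rank, so a log-log plot of frequency against rank should have a slope near -1. The Python sketch below estimates that slope by ordinary least squares. It is an illustrative assumption, not the study's procedure; the function name `zipf_slope` and the toy data are hypothetical.

```python
import math
from collections import Counter

def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank).

    Requires at least two distinct word forms.
    """
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Toy data constructed so that frequency = 120 / rank exactly.
tokens = []
for rank, word in enumerate(["w1", "w2", "w3", "w4"], start=1):
    tokens += [word] * (120 // rank)
print(round(zipf_slope(tokens), 2))
```

On real corpus counts the fitted slope only approximates -1, and the fit is usually worst at the very top and the long tail of the ranking.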
Voz media y estructura argumental. Sintaxis histórica de la lengua española
by
Margot Vivanco
2026, Sintaxis histórica de la lengua española. Cuarta parte: estructura argumental y estructura informativa
This chapter explores the diachronic evolution of the various middle-voice constructions from Latin to the present day, with special emphasis on medieval Castilian. After establishing the Latin antecedents of the structures under study, it first analyzes full verbs of change of state, which are divided into alternating verbs (anticausative, suppletive, and labile alternations) and non-alternating verbs (the so-called "pure unaccusatives"). Within this group, in addition to the best-known phenomena, it studies others such as the spread of middle 'se' to verbs of motion, the causative uses of supposedly non-alternating verbs, and other structures characteristic of medieval Castilian that are now lost, such as the construction with a change-of-state meaning. Second, it analyzes the light verbs that form complex change-of-state predicates (of the type 'ponerse enfermo' or 'hacerse famoso'), beginning with those inherited from Latin, which are specialized in changes in individual-level properties, and continuing with the Romance creation of new light verbs specialized in changes in stage-level properties.
Avoidance of the English Present Perfect by L1 Thai Learners
by
Pichet PRAKAIANURAT
2026
This study investigated avoidance behavior in the use of the English present perfect among intermediate-level Thai learners of English, employing a mixed-methods approach. Two contrasting theoretical frameworks were explored: the Avoidance Behavior Hypothesis (ABH) (Laufer & Eliasson, 1993; Schachter, 1974), which attributes avoidance to L1-L2 differences or the absence of a corresponding L1 form, and the Factors of L2 Non-avoidance Hypothesis (FNAH) (Thiamtawan & Pongpairoj, 2013, 2019; Wang & Pongpairoj, 2021), which suggests that avoidance is not always observed in L2 learners due to some contributing factors. Thirty participants were recruited from a tertiary-level institution, with data collected through a comprehension task, an Indirect Preference Elicitation (IPE) task, and semi-structured interviews. The results indicated that the learners did not exhibit avoidance behavior with the continuative and resultative perfects. However, they showed avoidance specifically toward the experiential perfect, despite the presence of the aspectual marker kʰɤɤj in Thai, semantically encoding the English experiential perfect. The findings therefore confirmed FNAH. It was assumed that the Thai learners' (non-)avoidance behavior could be accounted for by intralingual transfer (Mahmoud, 2011), which stems from the syntax-semantics interface of the English present perfect, differences in L1-L2 semantic mappings, and transfer of training (Selinker, 1972).
Från aspekt till övergripande – en ordlista över svensk akademisk vokabulär
by
Håkan Jansson
2026, Nordiske Studier i Leksikografi
This report describes a project to develop an academic word list for Swedish. The resulting word list is published at . It comprises 655 headwords, extracted from a 25 million word corpus of Swedish academic texts. Both the word list and the corpus are openly accessible through Språkbanken's lexical and corpus infrastructures.
Nodirbek. Monografiya
by
Nodirbek Nosirjon o'g'li Habibullayev
2026, Sunrise-pro
The monograph examines the functional and grammatical aspects of morphemes, their place in word structure, the principles of their classification, and the question of whether the morpheme exists as an independent unit in Uzbek. It discusses criteria for determining the compositional structure of the word and studies the differences between morpheme and affix and their effect on morphological analysis. It also analyzes the problems of creating dictionaries of word structure for Uzbek and proposes theoretical principles for compiling such dictionaries.
The monograph is intended for linguists, researchers working in linguistics, and undergraduate and master's students studying philology at higher education institutions.
La reformulación discursiva y su aplicación didáctica en la enseñanza del alemán como lengua extranjera / Discursive reformulation and its didactic application in teaching German as a foreign language
by
Bettina Kaminski
2026, Magazin
As a consequence, didactic solutions are offered that focus on reformulation processes in German and their application based on work with oral corpora.
Квантитативна лінгвістика в епоху генеративного ШІ: трансдисциплінарна парадигма університетського курс
by
Solomija Buk
2026, Вісник Львівського університету. Серія філологічна.
The article examines the transformation of quantitative linguistics as a university discipline in the context of the rapid development of generative artificial intelligence (AI) and large language models (LLMs). It argues that the stochastic nature of language, which has historically underpinned statistical methods of analysis, became a key prerequisite for building modern generative models. The focus is on integrating corpus technologies, machine learning algorithms, and prompt engineering into the educational process. The article analyzes the change in the epistemological status of language data, the expansion of research practices, and the renewal of didactic strategies. Particular attention is given to the transdisciplinarity of the course, which lies at the intersection of applied linguistics, computer science, statistics, and the cognitive sciences. It outlines the ethical challenges and risks associated with automating knowledge generation and using AI output in the learning environment. Keywords: applied linguistics, quantitative linguistics, generative artificial intelligence (AI), large language models (LLMs), corpus technologies.
CORPUS CONCEPT AND CORPUS LINGUISTICS ANALYSIS
by
Ibrohim Voxitov
2026
This article explores the concept of a corpus and its application in corpus linguistics analysis. A corpus, in linguistic terms, refers to a collection of authentic language texts gathered for linguistic study. Corpus linguistics utilizes these corpora to extract valuable insights into language patterns, usage, and structures. The article discusses the importance of corpora in linguistic research and delves into the methodologies employed in corpus linguistics analysis. Key findings showcase how corpus linguistics contributes to our understanding of language evolution, usage variations, and contextual meaning. The conclusion emphasizes the significance of corpora in advancing linguistic studies and highlights potential future developments in this field.
THE DUAL FACES OF FEMININITY IN IRISH MYTHOLOGY: A CORPUS ANALYSIS OF FEMALE MYTHOLOGEMES
by
Indira Baissydyk
2026, Tiltanym, Almaty. – №2 (98). – P. 126-139
This study investigates the dualistic representations of femininity in Irish mythology through a corpus-based analysis of prominent female mythological figures. By categorizing female mythologemes according to positive and negative archetypes, the research examines their historical significance, linguistic evolution, and ongoing relevance within contemporary cultural discourse. Employing Google Ngram Viewer and Sketch Engine, the study quantitatively tracks the frequency and semantic shifts of negatively framed figures, such as the Banshee, Witch, Cailleach Beara, and Morrigan, in juxtaposition with their positively framed counterparts: Áine, the Sidhe, Étaín, and Airmid. The findings reveal a nuanced landscape wherein certain mythologemes have undergone reclamation and reinterpretation within feminist and neo-pagan contexts. In contrast, others remain culturally marginalized or relegated to niche spheres of influence. The study underscores the dynamic interplay between language, mythology, and gender identity, illuminating how mythological archetypes adapt to evolving cultural discourses and reflect shifting societal values.
CFPs special issue in Research in Corpus Linguistics (RiCL) 'Learner Corpus Research meets the Common European Framework of Reference for Languages and the Companion Volume'
by
María-Belen Diez-Bedmar
2026
THE INFLUENCE OF TEACHER-STUDENT POWER DYNAMICS ON LEARNING: A LINGUISTIC AND DISCOURSE-ANALYTICAL STUDY
by
Ayesha Batool
2026, Journal of Media Horizons
The paper explores how teacher-student power relations are constructed and maintained in everyday classroom discourse, drawing on Critical Discourse Analysis, Conversation and Interaction Analysis, Interactional Sociolinguistics, and corpus-based linguistic methods. Data were collected from classroom video recordings, transcriptions, and corpus collections in order to explore the linguistic and interactional techniques that signal authority, control over access, and student identity. The results indicate that teacher dominance appears in directive speech acts, evaluative feedback, the IRF (initiation-response-feedback) interactional pattern, and repetitive lexical clusters that support institutional control. Student speech was mainly weakly informative, tentative, and permission-seeking, which points to low agency and heightened sensitivity to teacher judgment. The research also found that the asymmetrical relation was exacerbated by tone changes, hesitations, and culturally ingrained code-switching. These findings are in line with previous studies that identify classroom discourse as a key process through which power is performed and learning opportunities are created. The study concludes that power is reproduced not through individual commands but through patterned linguistic practices embedded in classroom interaction. It suggests that more dialogic, student-centered discourse practices should be adopted to encourage equal participation and create more collaborative learning processes.
Reflections on the "Bootcamp Debate"
by
Bill Louw
2026
Two drunks were disputing whether it was day or night. Finally, in a state of near paralysis and exhaustion, they decide to appeal to a passer-by to settle the matter: "Say, Buddie, can you tell us, is it day or night?" The stranger replies: "I dunno. I'm noo here."
Do Large Language Models Encode Second-Language Writing Proficiency? A CALF-Based Perspective
by
Osamu Takeuchi
2026, Digital Studies in Language and Literature
This study investigates how Large Language Models (LLMs) encode second language (L2) writing proficiency distinctions compared to human learners, focusing on the structural alignment between synthetic outputs and human developmental patterns. We analyzed CEFR-graded Write & Improve 2024 learner essays and matched LLM generations (A2-C1) using statistical models controlling for prompt and text length. With a CALF feature set (Complexity, Accuracy, Lexical Complexity, Fluency), level explained 2.9% of variance in human texts (R² = 0.029) but 10.6% in pooled LLM outputs (R² = 0.106), indicating clearer A2-C1 features in LLM-generated texts overall. LLM level effects were robust (R² range = 0.038-0.454), with sharper differences between neighboring CEFR bands than in learner essays (e.g., A2-B1 distance, D² = 4.81 versus 1.07). Prompt-wise analyses found effects for all 10 prompts in the LLM set but only 2/10 in the learner set, with much greater separation than in the learner cohort. These results support CALF and CEFR construct validity: with prompt and length controlled, A2-C1 texts show ordered CALF gradients, especially for LLM texts across prompts. The sharper LLM separation likely reflects idealized, low-variance production; therefore, CALF captures core structural proficiency (syntax, lexis, accuracy, fluency) but not discourse-pragmatic qualities or human variability. The analyses provide level-calibrated evidence that LLM texts show clearer distinctions than learners', under identical prompts, whose development is gradual and overlapping. This positions LLMs as both tool (exemplars, rubric calibration) and challenge (assessment validity and fairness), while offering SLA researchers insight into how proficiency constructs are encoded in human versus model-based writing.
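To make the "variance explained by level" figures in this abstract more concrete, here is a deliberately simplified Python sketch. It computes the share of variance in a single score that is accounted for by group membership (between-group sum of squares over total sum of squares). This is not the authors' model, which controls for prompt and text length over a multi-feature set; the function name `r_squared_by_group` and the toy scores are assumptions.

```python
def r_squared_by_group(values, groups):
    """Share of variance in values explained by group membership (SS_between / SS_total)."""
    grand_mean = sum(values) / len(values)
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
        for vs in by_group.values()
    )
    return ss_between / ss_total

# Hypothetical feature scores for four texts at two CEFR levels (not real data).
scores = [1.0, 2.0, 3.0, 4.0]
levels = ["A2", "A2", "B1", "B1"]
print(r_squared_by_group(scores, levels))
```

A value near 0 means texts at different levels overlap heavily on the feature, while a value near 1 means level almost fully separates them, which is the contrast the abstract draws between learner essays and LLM outputs.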
Expanding and Refining RoMEMEs: A Multimodal Corpus of Romanian Memes for Advanced AI Analysis
by
Daniela Gifu
2026, 2025 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
This paper introduces an expanded and refined version of RoMEMEs, a multimodal corpus of Romanian memes collected from social media. Recognizing the limitations of the initial release, which contained 462 manually classified memes, this work details the methodology employed for a significant expansion of the corpus and briefly addresses refinements to the annotation guidelines to account for special cases identified in prior annotations. The expanded RoMEMEs corpus aims to enhance the resources available for training and evaluating advanced natural language processing and multimodal analysis models tailored for the specific characteristics of Romanian language memes. This paper outlines the data collection and annotation processes, describes the key features of the resulting dataset, and discusses the challenges encountered during its creation. The significantly enlarged and curated corpus, along with the updated annotation guidelines, is made publicly available to the research community to facilitate further investigation into the linguistic and cultural nuances of Romanian internet memes.
EXPLORING THE INTEGRATION OF DIGITAL COMMUNICATION TOOLS IN LANGUAGE EDUCATION: A COMPREHENSIVE REVIEW
by
ShodhGyan-NU: Journal of Literature and Culture Studies
2026, ShodhGyan-NU: Journal of Literature and Culture Studies
The integration of digital communication tools is increasingly viewed as a means of enhancing language acquisition and fostering collaborative learning. This comprehensive review examines a selection of research papers investigating the implementation of these tools in language education contexts. The reviewed research collectively suggests that digital communication tools have the potential to increase interaction, engagement, and access to authentic language materials. The studies indicate that common tools include social media platforms, language learning apps, and video conferencing. These tools have been found to facilitate peer assessment, collaborative learning, and cross-cultural exchanges. However, it is crucial to acknowledge potential drawbacks such as unequal access to technology, varying learning styles, distractions, and data privacy concerns. The review emphasizes the importance of educator training and thoughtful integration strategies to fully leverage the benefits of digital communication tools. Future research exploring the optimal balance between technology-driven practices and human interaction would further illuminate their value in the language learning process.
The Secret to Legal Foretelling: Generic and Inter-Generic Aspects of Vagueness in Contracts, Patents and Regulations
by
Ismael Arinas Pellón
2026, DOAJ (DOAJ: Directory of Open Access Journals)
In this genre analysis research paper, we compare U.S. patents, contracts, and regulations on technical matters with a focus upon the relation between vagueness and communicative purposes and subpurposes of these three genres. Our main interest is the investigation of intergeneric conventions across the three genres, based on the software analysis of three corpora (one for each genre, 1 million words per corpus). The result of the investigation is that intergeneric conventions are found at the level of types of expressed linguistic vagueness, but that intergeneric conventions at the level of actual formulations are rare. The conclusion is that at this latter level the influence from the situation type underlying the individual genre is more important than the overarching legal character of the genres, when we talk about introducing explicit vagueness in the text.
Morfología: del léxico a la sintaxis oracional
by
Nora Múgica
2026
The theory used for this purpose is the Generative Lexicon, which makes it possible to capture the compositional aspects of lexical meaning at several levels (event structure, argument structure, and qualia structure). It is proposed that affixes have an underspecified, relational lexical meaning, and that this meaning is made concrete depending on the meaning of the base of the derivation and on the syntactic context in which the derived word is inserted. It is shown that the variations in meaning of the causative suffix -iza(r) arise because different roles of the qualia structure of the predicate's theme undergo selective binding.
Język w Poznaniu 11
by
Władysław Zabrocki
2026
Chair: prof. UAM dr hab. Dominika Skrzypek, Vice-Dean for Research. Vice-Chair: prof. UAM dr hab. Marta Woźnicka. Members: prof. UAM dr hab. Sylwia Adamczak-Krysztofowicz, prof. UAM dr hab. Barbara Łuczak, prof. zw. dr hab. Piotr Muchowski, prof. UAM dr hab. Wawrzyniec Popiel-Machnicki, prof. UAM dr hab. Krzysztof Stroński, prof. UAM dr hab. Janusz Taborek, prof. UAM dr hab. Władysław Zabrocki. Language editing: prof. dr hab. Izolda Kiec, mgr Martin Stosik, dr Stefan Wiertlewski.
Szymon Czarnecki, Uniwersytet im. Adama Mickiewicza w Poznaniu
The formatives in the denominal word-formation in Romanian
The paper describes the types of formatives that occur in denominal word-formation in Romanian, based on a sample containing the 60 most frequently used unmotivated nouns (20 for each gender class) and over 1,000 derivatives. The sample was built using the nest derivatology methodology developed in Slavistics. The author also discusses the distribution and complexity of the types of formatives according to the gender class of the word-formation base, the structure of the nests, and the grammatical class of the derivatives. The author identified 18 types of formatives in the sample: 4 simple formatives (incl. suffixes, prefixes and paradigmatic formatives among the most frequent) and 14 complex formatives (incl. suffixal-reductional formatives as the most frequent, esp. in the formation of verbs). The author identifies the Romanian se formative in motivated reflexiva tantum verbs as an ambifix with extrafixal positions. The nests with feminine nouns as word-formation bases have the most diversified range of formative types. While suffixes are the most common formatives in the sample, they predominate mainly in first-level derivatives. The further along the word-formation chain a derivative sits, the more frequent formatives with a paradigmatic component become, while the occurrence of affixal formatives declines. The paradigmatic formative is also the most common formative in the derivation of adverbs.
Los pronombres y determinantes demostrativos aqueste y aquese en la primera modernidad temprana (1480-1649): frecuencia y usos en corpus epistolares
by
María Heredia Mantis
2026, Onomázein. Revista de lingüística, filología y traducción
In this study we analyze the presence, frequency, and semantic-pragmatic functions of the compound or reinforced demonstrative pronouns and determiners aqueste and aquese in epistolary documents dated between 1480 and 1649, the period in which Classical Spanish is situated. We set out the diatopic marking as eastern variants that these forms had received in previous research and, as had already been hypothesized, their association with certain grammatical and pragmatic uses as well as with certain textual genres. Using data gathered from several digital corpora, among them the epistolary corpus H15Corpus, we show how their frequencies of use evolved, taking the shape of a failed-change curve, and how the forms were progressively assigned to discourse traditions other than the epistolary one, following a route staggered over time.
La interpretación estativa de la percepción visual desde un punto de vista tipológico
by
Carmen Horno-Chéliz
2026, El valor de la diversidad (meta)lingüística: Actas del VIII congreso de Lingüística General, 2008, ISBN 978-84-691-4124-3, pág. 55
In this paper we are interested in sentences in which visual perception appears as a property of the object (of the type Juan se ve bien or John looks ok), so that the object appears as the subject of a stative verb. From a typological point of view, languages have various resources (including suppletion) for obtaining this reading. Among all of them, here we focus on those morphosyntactic mechanisms that involve a reduction in the number of the predicate's arguments and a promotion of the theme (the perceived object) to subject position. Specifically, we distinguish between syntactic mechanisms, such as the use of a special, non-eventive passive (for example, in Japanese), and lexical mechanisms (the medio-passive construction), with or without the use of specific morphemes.
Teaching and assessing oral skills in the advent of oral language testing in the Finnish Matriculation Examination
by
Eliisa Kemiläinen
2026, Helsingin yliopisto
Análisis sintáctico de metáforas conceptuales a partir de la gramática de dependencias y la estructura argumental del verbo
by
Víctor Julián Vallejo
2026, Lenguaje
Syntactic Analysis of Conceptual Metaphors from the Dependency Grammar and the Argument Structure of the Verb
This article explores the relationships between conceptual metaphors and syntax, based on the analysis of 415 metaphorical expressions present in 353 statements taken from two reference works published by experts in the field of cognitive linguistics. To this end, syntactic categories and functions were identified using dependency grammar and the argument structure model, with the help of natural language processing tools. Descriptive and inferential statistics were then applied to determine the frequencies of, and associations between, the identified categories, the verbal valence variables, the argument structure of the verb, and the grammatical person. The results illustrate the importance of the verb in configuring the grammatical structures of metaphorical expressions. In this sense, the approaches based on verbal valence and on argument structure can serve as a basis for developing a theoretical-practical model for the automated detection of conceptual metaphors.
SCRIPTING THE SCREEN: MULTIDIMENSIONAL VARIATION IN URDU-TO-ENGLISH SUBTITLES - A CASE STUDY OF 'AAS PAAS'
by
Arslan Tahir
2026, JOURNAL OF APPLIED LINGUISTICS AND TESOL
The exponential increase in the consumption of South Asian media on international digital platforms has created a critical need to understand the linguistic mechanisms involved in the cross-cultural transmission of narratives. In response, this study presents a comprehensive Multi-Dimensional (MD) Analysis of the English subtitles of the Pakistani Urdu drama Aas Paas (2025). The aim is to outline the lexico-grammatical profile of this particular audiovisual translation register. A Python-based corpus analysis tool was used to build a corpus of 148,288 words from the 32 episode files, after which computational feature extraction and factor analysis were applied. Drawing upon Biber's (1988) seminal framework for register variation, the research identifies five functional dimensions of variation in the subtitle corpus: (1) syntactic complexity versus simplified orality; (2) informational density versus interactive inquiry; (3) stylistic elaboration versus fragmented coordination; (4) descriptive density; and (5) lexical sophistication. The results expose a "hybrid register" that structurally imitates the syntactic density of written fiction but pragmatically retains the interactive volatility of face-to-face conversation. Specifically, the analysis reveals a high level of nominalisation and verbal complexity in narrative-heavy segments (probably an artefact of explicating the translation) and pronoun-heavy, fragmented discourse in scenes of conflict between Pakistanis, in line with Pakistani English norms. By plotting these dimensions against the development of the drama's story, the research shows how linguistic variation is a proxy for thematic changes between the professional and domestic spheres that are intrinsic to the plot. The research is essential to corpus linguistics, translation studies, and South Asian media studies, as it provides a quantitative multidimensional profile of the Urdu-English subtitle register.
WhatsApp en el foco del análisis lingüístico: aproximaciones desde la lingüística alemana y española
by
Bettina Kaminski
2026
This article presents a literature review that offers a comparative analysis of scientific research on WhatsApp digital discourse, drawing on linguistic traditions developed in Spain and Germany. The primary objective is to identify similarities and differences in each tradition's methodological approaches and objects of analysis. The study examines how each framework conceptualizes the medium, the types of discourse phenomena addressed, and the nature of the corpora used. In addition to mapping and systematizing over a decade of research (much of which remains under-recognized or untranslated across academic contexts), the article also aims to outline implications for future investigations.
Exploring digital genre analysis in LSP. A key thematic lemma-based approach
by
Alejandro Curado Fuentes
2026, Digital genres for academic and professional communication. Mapping research and practice
Evolving methodologies in genre studies have been reported in applied linguistics by different authors (e.g., Kessler and Polio, 2024), marking out LSP as an area where different analytical instruments are deployed according to dynamic communicative practices. However, to the best of my knowledge, there has been no systematic exploration (methodologically speaking) of the abovementioned changes and developments. This chapter describes a methodological approach, the key lemma approach, which is applied to the analysis of research articles about digital genres in the field of LSP to provide empirically tested evidence of major research developments. This framework employs a dual methodological approach, combining a quantitative focus on key lemmas and key lemma-based collocational items with a qualitative analysis of salient themes. The proposed methodology was applied to a case study that focuses specifically on articles about digital genres published in the journal Ibérica over a ten-year period (2012–2022). To gain a more systematic understanding, the two corpora were compared with a large corpus of research articles about LSP topics different from digital genres.
Hebrew as L4+ Learner Mistakes Corpus
by
Karolina Bieganowska
2026, RODBUK JU
The "Hebrew as L4+ learner mistakes corpus" is a specialized linguistic dataset containing 753 documented errors collected from multilingual students acquiring Hebrew over the span of one semester. The dataset is provided in three formats: the original .xlsx file and two open-access versions (.csv and .ods). The collection is unique because it focuses on "L4+ learners", that is, subjects who already know Polish, English, and Arabic and are learning Hebrew as their fourth or subsequent language. The data is organized into a single table documenting the mistaken form (in IPA), the target form (in IPA, in Hebrew script, and with Leipzig glossing), English translations, and categorical metadata such as the learner's level (Year 1 vs. Year 2), the specific linguistic skill (speaking/reading), and the error type (phonology, syntax, morphology, lexis, or mixed). This dataset is particularly valuable for researchers studying cross-linguistic influence (CLI) and the acquisition of Semitic languages by multilinguals.
SCRICREA: A Learner Corpus of Creative Writing in Italian as a Foreign Language
by
Ioanna Tyrou and Katerina Florou
2026, International Journal of Literature, Linguistics, and Humanities
SCRICREA (from "Scrittura Creativa") is a learner corpus of creative writing in Italian as a foreign language, developed at the National and Kapodistrian University of Athens. It consists of texts written by Greek-speaking adult learners of Italian in the context of structured creative writing activities. The corpus, comprising over one million words, is organized into thematic sub-corpora, each corresponding to a specific writing task or genre. Its interdisciplinary nature positions SCRICREA both as a tool for linguistic research and as a pedagogical resource for encouraging creativity and language proficiency. This paper presents the motivation behind SCRICREA, its composition, annotation and enrichment strategies, as well as its applications in language pedagogy and future directions.
EL NOMBRE DE LA ROSA.
by
LUIS VARGAS
2026
ROSA
The Mangalam Dictionary of Buddhist Sanskrit: automating lexicographic data with generative LLMs
by
Ligeia Lugli
2026, The Mangalam Dictionary of Buddhist Sanskrit: automating lexicographic data with generative LLMs
This paper reports on recent experiments on the use of Large Language Models (LLMs) for semantically tagging a corpus of Buddhist Sanskrit literature dating approximately from the II century BC to the XII century CE (bit.ly/MangalamCorpusOfBuddhist-Sanskrit). This corpus was created specifically for lexicographic purposes, with a view to enabling the development of the Mangalam Dictionary of Buddhist Sanskrit, the first corpus-driven dictionary for this language variety (bit.ly/VisualDictionary-BuddhistSanskrit and mangalamresearch.shinyapps.io/MangalamDictionaryOfBud-dhistSanskrit). 'Buddhist Sanskrit' is understood here as the domain-specific type of Sanskrit attested in historical Buddhist sources. It differs from classical Sanskrit mainly in vocabulary and semantics, but often also in syntax and morphology. Since Buddhist Sanskrit constitutes an extremely low-resource language, we have designed our lexicographic workflow to maximize the re-usability output for linguistic and natural language
Remembering Michael Hoey's Work
by
Alan Partington
2026, Journal of corpora and discourse studies