Matt Gee - Birmingham City University

Birmingham City University, English, Research Fellow

Researcher and software developer in the R&D Unit for English Studies at Birmingham City University.
Address: United Kingdom

Papers by Matt Gee

Conversation Analysis and the XML method

In this paper we introduce the XML method, a trio of technologies that can benefit conversation-analytic research. Specifically, we make a case for converting the centrepiece of CA research, the Jeffersonian transcript, into the format of the eXtensible Mark-up Language (XML). XML essentially turns documents into hierarchically ordered networks of nodes. As a network, an XML document can be exhaustively searched, and any node or node set it contains can be extracted. We argue that the main benefit of formatting CA transcripts in XML lies in the quantifiability that the format facilitates: CA-as-XML can provide precise “numbers and statistics” (Robinson 2007: 65), thus helping to efficiently quantify observations and statistically substantiate claims about the ‘generalizability’ of observed practices of social action. We also introduce XPath and XQuery, two related query languages designed to exploit the XML format. Further, we describe XTranscript, a free online tool developed to convert completed CA transcripts to XML. Central to our approach is that the methodology be accessible to linguists of varying levels of technical experience. We therefore also describe how this aim, and common concerns relating to the treatment of spoken data, have shaped our work in this area thus far.
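To illustrate what exhaustively searching an XML transcript and extracting node sets might look like, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. That module supports only a limited subset of XPath; the fuller XPath and XQuery languages discussed in the paper require other processors. The markup is an assumption for illustration only: the element and attribute names (turn, speaker, line, pause, duration, stress) and the helper functions are hypothetical and do not reproduce XTranscript's actual output format.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical CA-as-XML markup. Element and attribute names are illustrative
# assumptions, not the format produced by XTranscript.
TRANSCRIPT = """
<transcript>
  <turn speaker="A" line="1">so what did you <stress>think</stress></turn>
  <turn speaker="B" line="2"><pause duration="0.8"/> well</turn>
  <turn speaker="A" line="3">yeah I know</turn>
  <turn speaker="B" line="4"><pause duration="0.3"/> it was <stress>fine</stress></turn>
</transcript>
"""


def turns_per_speaker(root):
    """Count <turn> nodes by their (hypothetical) speaker attribute."""
    return Counter(turn.get("speaker") for turn in root.findall(".//turn"))


def long_pauses(root, threshold=0.5):
    """Return the durations of all <pause> nodes longer than threshold seconds."""
    durations = (float(p.get("duration")) for p in root.findall(".//pause"))
    return [d for d in durations if d > threshold]


def stressed_words(root, speaker):
    """Extract the text of <stress> nodes that are children of one speaker's turns."""
    return [s.text for s in root.findall(f".//turn[@speaker='{speaker}']/stress")]


if __name__ == "__main__":
    root = ET.fromstring(TRANSCRIPT.strip())
    print(turns_per_speaker(root))
    print(long_pauses(root))
    print(stressed_words(root, "B"))
```

Run as a script, it reports two turns for each speaker, a single pause longer than half a second (0.8) and speaker B's stressed word "fine". Each query returns a node set or a count, which is the kind of quantification the abstract argues CA-as-XML makes possible.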

'I just found your blog'. The Pragmatics of initiating comments on blog posts

Highlights
• Focus on the pragmatic means used at the onset of blog comments
• Insights into the interaction between commenters and blog authors
• Study based on the Birmingham Blog Corpus http://www.webcorp.org.uk/blogs
• Introducing a new approach to the study of speech acts in large corpora
• Uncovering medium-specific uses of expressive speech acts in blog comments

“I apologise for my poor blogging”: Pragmatic annotation in the Birmingham Blog Corpus

by Andrew Kehoe and Matt Gee

This study approaches the pragmatic annotation of a large corpus of blog posts and associated reader comments, focusing in particular on the tagging of Illocutionary Force Indicating Devices (IFIDs). It is based on the Birmingham Blog Corpus, a diachronically structured corpus totalling 600 million words. We concentrate on IFIDs relating to the speech act category of expressives (Searle 1979: 15-16), which convey the speaker's feelings, for example by thanking, praising or apologising.

Weaving web data into a diachronic corpus patchwork

by Matt Gee and Andrew Kehoe

A. Renouf & A. Kehoe (eds.) Corpus Linguistics: Refinements and Reassessments, Amsterdam: Rodopi, 2009

This paper offers a reassessment of the role of web data in diachronic linguistic analysis. We introduce the diachronic search facilities provided by the WebCorp Linguist’s Search Engine, including the use of a new ‘heat map’ graph for the analysis of changes in collocational patterns over time. We illustrate how web data can be used to supplement data from standard corpora in lexicological studies. Our focus is on the vogue phrase ‘credit crunch’, and the paper compares examples from standard corpora (BNC, Brown, LOB, Frown, FLOB) with those found in web-accessible newspaper texts. In contrast to previous studies, we do not rely on the web solely for the most up-to-date usage examples. Instead, we show how web-accessible texts dating back to the beginning of the 20th century can be used to fill gaps in, and sharpen, the picture provided by standard corpora.

Social Tagging: A new perspective on textual 'aboutness'

by Matt Gee and Andrew Kehoe

P. Rayson, S. Hoffmann & G. Leech (eds.) Studies in Variation, Contacts and Change in English Volume 6: Methodological and Historical Dimensions of Corpus Linguistics, University of Helsinki e-journal, 2011

Society is increasingly dependent on digital information. Much of this is available online free of charge, but metadata is at a premium. This has encouraged the emergence of a new online phenomenon known as social (or collaborative) tagging. The predominant social tagging site is Delicious, which allows users to assign keywords (or ‘tags’) to their bookmarks (favourite web pages) to describe their content. These tags are then shared with other users, who can search the collection by tag. However, many of the linguistic problems that exist in traditional keyword search remain. Most research on tagging to date has been conducted by information scientists, but this paper describes new work that examines social tagging from a corpus linguistic perspective. Our discussion compares the new, text-external aboutness indicators offered by social tagging with text-internal aboutness indicators. We illustrate how we are using this multi-layered approach to aboutness both to make better sense of existing social tags and to suggest guidelines for better tagging practice. Our work aims to reconcile the worlds of formal textual analysis and intuition.

Reader comments as an aboutness indicator in online texts: introducing the Birmingham Blog Corpus

by Matt Gee and Andrew Kehoe

S. Oksefjell Ebeling, J. Ebeling & H. Hasselgård (eds.) Studies in Variation, Contacts and Change in English Volume 12: Aspects of Corpus Linguistics: Compilation, Annotation, Analysis, 2012

eMargin: A collaborative text annotation tool

by Matt Gee and Andrew Kehoe

An increasing number of researchers are using corpus linguistic techniques in the study of literary texts. In recent years, the corpus stylistic approach has been used to analyse the works of Austen (Fischer-Starcke 2009), Dickens (Hori 2004; Mahlberg forthcoming) and Shakespeare (Ravassat & Culpeper 2011), amongst many others.

Despite the growth in corpus stylistics, there remains some resistance to seemingly abstract, ‘mathematical’ models within the wider field of literary studies. In the teaching of English Literature, the dominant approach is still ‘close reading’: the detailed manual examination and interpretation of short textual extracts. This paper introduces eMargin, an online tool for the collaborative analysis and annotation of literary texts.

eMargin: A Collaborative Textual Annotation Tool

by Matt Gee and Andrew Kehoe

Ariadne: The Web Magazine for Information Professionals, July 2013

We describe our Jisc-funded eMargin collaborative textual annotation tool, showing how it has wi...

New corpora from the web: making web text more 'text-like'