Papers by Matt Gee

Conversation Analysis and the XML method
In this paper we introduce the XML method, a trio of technologies that can benefit conversation-analytic research. Specifically, we make a case for converting the centerpiece of CA research, the Jeffersonian transcript, into the format of the eXtensible Mark-up Language (XML). XML essentially turns documents into hierarchically ordered networks of nodes. As a network, an XML document can be exhaustively searched and any node or node set it contains can be extracted. We argue that the main benefit of formatting CA transcriptions in XML lies in the quantifiability that the format facilitates: CA-as-XML can provide precise “numbers and statistics” (Robinson 2007: 65), thus helping to efficiently quantify observations and statistically substantiate claims about the ‘generalizability’ of observed practices of social action. We also introduce XPath and XQuery, two related query languages designed to exploit the XML format. Further, we describe XTranscript, a free online tool developed to convert completed CA transcripts to XML. Central to our approach is that the methodology be accessible to linguists of varying levels of technical experience. Therefore, we also describe how this, and common concerns relating to the treatment of spoken data, have shaped our work in this area thus far.
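The idea of searching a transcript as a network of nodes and counting what is found can be sketched roughly as follows. This is only an illustration under stated assumptions: the element and attribute names (`transcript`, `turn`, `pause`, `speaker`) are hypothetical and are not XTranscript's actual output schema, and Python's standard `xml.etree.ElementTree` module supports only a limited subset of XPath rather than the full XPath or XQuery languages the paper discusses.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a short transcript. The element and
# attribute names are illustrative assumptions, not XTranscript's schema.
TRANSCRIPT = """
<transcript>
  <turn speaker="A"><w>hello</w><pause length="0.5"/><w>there</w></turn>
  <turn speaker="B"><overlap><w>hi</w></overlap></turn>
  <turn speaker="A"><w>how</w><w>are</w><w>you</w><pause length="1.2"/></turn>
</transcript>
"""


def count_nodes(xml_text, path):
    """Count the nodes matched by an XPath-style expression."""
    root = ET.fromstring(xml_text)
    return len(root.findall(path))


def pauses_by_speaker(xml_text):
    """Tally pause nodes per speaker across all turns."""
    root = ET.fromstring(xml_text)
    counts = {}
    for turn in root.findall("turn"):
        speaker = turn.get("speaker")
        # ".//pause" matches pause nodes at any depth inside this turn.
        counts[speaker] = counts.get(speaker, 0) + len(turn.findall(".//pause"))
    return counts


if __name__ == "__main__":
    print(count_nodes(TRANSCRIPT, ".//pause"))
    print(count_nodes(TRANSCRIPT, "turn[@speaker='A']"))
    print(pauses_by_speaker(TRANSCRIPT))
```

Counts of this kind are the sort of raw figures that could then feed a statistical comparison; a full XPath or XQuery processor would allow considerably richer queries than this subset.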
'I just found your blog'. The Pragmatics of initiating comments on blog posts
Highlights
• Focus on the pragmatic means used at the onset of blog comments
• Insights into the interaction between commenters and blog authors
• Study based on the Birmingham Blog Corpus http://www.webcorp.org.uk/blogs
• Introducing a new approach to the study of speech acts in large corpora
• Uncovering medium-specific uses of expressive speech acts in blog comments
This study approaches the pragmatic annotation of a large corpus of blog posts and associated reader comments by focusing in particular on the tagging of Illocutionary Force Indicating Devices (IFIDs). Our study is based on the Birmingham Blog Corpus, a diachronically-structured corpus totalling 600 million words. Our work focuses in particular on IFIDs relating to the speech act category of expressives (Searle 1979: 15-16), which convey the speaker's feelings, as for example, by thanking, praising, or apologising.

A. Renouf & A. Kehoe (eds.) Corpus Linguistics: Refinements and Reassessments, Amsterdam: Rodopi, 2009
This paper offers a reassessment of the role of web data in diachronic linguistic analysis. We introduce the diachronic search facilities provided by the WebCorp Linguist’s Search Engine, including the use of a new ‘heat map’ graph for the analysis of changes in collocational patterns over time. We illustrate how web data can be used to supplement data from standard corpora in lexicological studies. Our focus is on the vogue phrase credit crunch and the paper compares examples from standard corpora (BNC, Brown, LOB, Frown, FLOB) with those found in web-accessible newspaper texts. In contrast to previous studies, we do not rely on the web solely for the most up-to-date usage examples. Instead, we show how web-accessible texts dating back to the beginning of the 20th Century can be used to fill gaps in and sharpen the picture provided by standard corpora.

Social Tagging: A new perspective on textual 'aboutness'
P. Rayson, S. Hoffmann & G. Leech (eds.) Studies in Variation, Contacts and Change in English Volume 6: Methodological and Historical Dimensions of Corpus Linguistics, University of Helsinki e-journal, 2011
Society is increasingly dependent on digital information. Much of this is available online free of charge but metadata is at a premium. This has encouraged the emergence of a new online phenomenon known as social (or collaborative) tagging. The predominant social tagging site is Delicious, which allows users to assign keywords (or ‘tags’) to their bookmarks (favourite web pages) to describe their content. These tags are then shared with other users, who can search the collection by tag. However, many of the linguistic problems which exist in traditional keyword search remain. Most research on tagging to date has been conducted by information scientists, but this paper describes new work which is examining social tagging from a corpus linguistic perspective. Our discussion compares the new, text-external aboutness indicators offered by social tagging with text-internal aboutness indicators. We illustrate how we are using this multi-layered approach to aboutness both to make better sense of the existing social tagging and to suggest guidelines for better tagging practice. Our work aims to reconcile the worlds of formal textual analysis and intuition.
Reader comments as an aboutness indicator in online texts: introducing the Birmingham Blog Corpus
S. Oksefjell Ebeling, J. Ebeling & H. Hasselgård (eds.) Studies in Variation, Contacts and Change in English Volume 12: Aspects of Corpus Linguistics: Compilation, Annotation, Analysis, 2012
An increasing number of researchers are using corpus linguistic techniques in the study of literary texts. In recent years, the corpus stylistic approach has been used to analyse the works of Austen (Fischer-Starcke 2009), Dickens (Hori 2004, Mahlberg forthcoming), and Shakespeare (Ravassat & Culpeper 2011), amongst many others.
Despite the growth in corpus stylistics, there remains some resistance to seemingly abstract, ‘mathematical’ models within the wider field of literary studies. In the teaching of English Literature, the dominant approach is still ‘close reading’: the detailed manual examination and interpretation of short textual extracts. This paper introduces eMargin, an online tool for the collaborative analysis and annotation of literary texts.
eMargin: A Collaborative Textual Annotation Tool
Ariadne: the web magazine for information professionals, Jul 2013
We describe our Jisc-funded eMargin collaborative textual annotation tool, showing how it has widened its focus through integration with Virtual Learning Environments.
New corpora from the web: making web text more 'text-like'
This paper offers a reassessment of the role of web data in diachronic linguistic analysis. We introduce the diachronic search facilities provided by the WebCorp Linguist's Search Engine, including the use of a new 'heat map' graph for the analysis of changes in collocational patterns over time. We illustrate how web data can be used to supplement data from standard corpora in lexicological studies. Our focus is on the vogue phrase credit crunch and the paper compares examples from standard corpora (BNC, Brown, LOB, Frown, FLOB) with those found in web-accessible newspaper texts. Contrary to previous studies, we do not rely on the web solely for the most up-to-date usage examples. Instead, we show how web-accessible texts dating back to the beginning of the 20th Century can be used to fill gaps in and sharpen the picture provided by standard corpora.
Software by Matt Gee
eMargin - An online tool for collaborative textual annotation
WebCorp Linguist's Search Engine
WebCorp Live - Concordance the web in real-time
Talks by Matt Gee
The analysis of online interaction has been the subject of a range of psychological and linguistic research over the past two decades.
Conference presentations by Matt Gee