William W. Cohen
Professor,
Machine Learning Department
Biography
William Cohen is a Professor at Carnegie Mellon University in the
Machine Learning Department. He also holds a 20%-time appointment as a
Principal Scientist at Google, where he worked full-time between May
2018 and March 2024. He received his bachelor's degree in Computer
Science from Duke University in 1984, and a PhD in Computer Science
from Rutgers University in 1990. From 1990 to 2000 Dr. Cohen worked at
AT&T Bell Labs and later AT&T Labs-Research, and from April 2000 to
May 2002 he worked at Whizbang Labs, a company specializing in
extracting information from the web. From 2002 to 2018, Dr. Cohen
worked at Carnegie Mellon University in the Machine Learning
Department, with a joint appointment in the Language Technology
Institute.
Dr. Cohen is a past president of the International Machine Learning
Society. In the past he has also served as an action editor for the AI
and Machine Learning series of books published by Morgan Claypool, for
the journal Machine Learning, the journal Artificial Intelligence, the
Journal of Machine Learning Research, and the Journal of Artificial
Intelligence Research. He was General Chair for the 2008 International
Machine Learning Conference, held July 6-9 at the University of
Helsinki in Finland; Program Co-Chair of the 2006 International
Machine Learning Conference; and Co-Chair of the 1994 International
Machine Learning Conference. Dr. Cohen was also the co-Chair for the
3rd Int'l AAAI Conference on Weblogs and Social Media, which was held
May 17-20, 2009 in San Jose, and was the co-Program Chair for the 4th
Int'l AAAI Conference on Weblogs and Social Media. He is an AAAI
Fellow, and was a winner of the 2008 SIGMOD "Test of Time" Award for
the most influential SIGMOD paper of 1998, the 2014 SIGIR "Test of
Time" Award for the most influential SIGIR paper of 2002-2004, and the
2023 Semantic Web Science Association's Ten-Year Award for the most
influential paper of the ISWC-2013 conference.
Dr. Cohen's research interests include question answering,
machine learning for NLP tasks, and neuro-symbolic reasoning, and he
has a long-standing interest in statistical relational learning. He
holds seven patents related to learning, discovery, information
retrieval, and data integration, and is the author of more than 300
publications.
Announcements and FAQs
May 2024: A new edition of A Computer Scientist's Guide To Biology
will be out later this summer! More information and an excerpt are
available from my co-author's website. I'm possibly biased but I think
Charles Cohen did a great job with the update - the book is still
quite compact, but pretty much the whole book has been rewritten and
updated. For example, the new version includes several new chapters on
topics like CRISPR which weren't even a thing back in 2007.
On a related note, here's a nice non-technical description of LLM
hallucinations written by Charlie (based in part on an interview with
Vidhisha Balachandran).
March 2024: As you can see from my updated bio above, I have
returned to CMU's ML department full-time (although I still have a
20% involvement at Google, so that email will work!). I'm really
looking forward to re-engaging with my friends and colleagues at CMU.
Publications
My database: recent, all
my Scholar page
my Arxiv page
The second edition of Computer Scientist's Guide To Cell Biology is
out, with better graphics and many updates. (If you are an academic you
can probably get the ebook free from Springer.) It is still a compact
guide to biology written from the perspective of computer science, but
includes updated material and punchier writing, thanks to my
co-author, a young science writer who worked for many years doing
hands-on wetlab biology.
Projects, Software, Datasets, and Talks
These are now being distributed from
my Github page.
Teaching
In Spring 2026:
10-718 Machine Learning in Practice
10-905 Speaking Skills
In Fall 2025:
10-605/10-805 Learning from Large Datasets
10-905 Speaking Skills
Past courses:
Spring 2025:
Machine Learning with Large Datasets
, 10-605/10-405, Mon-Wed GHC 4401.
Spring 2018:
Undergraduate Level Machine Learning with Large Datasets
, 10-405, Mon-Wed 3:30-4:20 in GHC 4307
Fall 2017:
Machine Learning with Large Datasets, 10-605 and 10-805
, Tues-Thurs 1:30-2:50pm, PH 100.
Fall 2016:
Machine Learning with Large Datasets, 10-605 and 10-805
, Tues-Thurs 1:30-2:50pm, Wean Hall 7500.
Spring 2016:
Machine Learning 10-601
, Mon-Wed 10:30-11:50am, GHC 4401, with Maria-Florina Balcan.
Fall 2015:
Machine Learning with Large Datasets, 10-605 and 10-805
, Tu-Thu 4:30-5:50pm in Doherty Hall 2210.
Spring 2015:
Machine Learning with Large Datasets, 10-605 and 10-805
, Tu-Thu 10:30-11:50am in BH A51
Fall 2014:
10-601 Machine Learning
, Tu-Thu 1:30-2:50, Wean 7500
Spring 2014:
10-605 Machine Learning with Large Datasets
, Mon-Wed 1:30-2:50, Doherty Hall 1112
Fall 2013:
10-601 Machine Learning
, Mon-Wed 4:30-5:50, Doherty Hall 2315 (with Eric Xing).
Spring 2013:
Machine Learning with Large Datasets
, Mon-Wed 1:30-2:50, 4307 GHC
Fall 2012:
ML 10-802 and LTI 11-772 (Analysis of Social Media)
, 10:30-11:50am Tues & Thurs, 4303 Gates Building.
Fall 2012:
10-915, the MLD Journal Club
, 12-1:20pm Tue & Thu, 4101 Gates Building (with Roy Maxion).
Spring 2012:
Machine Learning with Large Datasets
, Tues-Thurs 1:30-2:50pm, NSH 1305
Fall 2011:
Structured
Prediction for Language and Other Discrete Data (SPLODD-2011)
, ML
10-710 and LTI 11-763, Tues-Thursday 3:00-4:20 in Gates-Hillman 4211.
This is co-taught by Noah Smith and me, and will include some
subjects from
Information
Extraction
and some from
Language and Stats 2
. A
machine learning course (10-701 or consent of the instructors) is a
prereq; we don't recommend that you take the course if you have
already taken Information Extraction or Language and Stats 2.
Spring 2011:
ML 10-802 and LTI 11-772 (Analysis of Social Media)
, 10:30-11:50am Tues & Thurs, 4303 Gates Building.
Spring 2011:
10-915, the MLD Journal Club
, 3-4pm Mon & Wed, 4101 Gates Building.
Fall 2010:
10-707
(Information Extraction - cross-listed in LTI as 11-748)
1:30-2:50pm Mon & Wed, Gates 4101. The first class is 9/8, the
Wed after Labor Day, to allow incoming students time to attend the IC
courses.
Spring 2010:
10-802 (Analysis of Social Media)
Fall 2009:
10-707
(Information Extraction)
, 1:30-2:50pm Mon & Wed, 5222 Gates
Building.
Spring 2008:
10-601 (Machine Learning)
with
Tom Mitchell
, 3-4:30 Mon & Wed in Wean Hall 5409.
Fall 2007:
Analysis of Social
Media
, Machine Learning 10-802 and LTI 11-772, with Natalie Glance
(of Google Pittsburgh) - a brand-new seminar course. 4:30-6:30
Tuesdays in Wean Hall 4623.
Note: This site is the shattered remains of a once-beautiful wiki,
created by the students of 10-802, generously hosted for free by
ScribbleWiki
, tragically lost (due to
a combination of RAID drive failures and low-bidder backup schemes),
and then largely recovered using
Warrick
from various internet caches and archives.
Fall 2007:
Current Topics
in Computational Biology (Journal Club)
, 02-701. (
Announcements
). Thursdays from 4:00-5:00 in 411
Mellon Institute (after Cell & Systems Modeling).
Spring 2007:
Information Extraction
, Machine
Learning 10-707 and LTI 11-748 - back by popular demand for the first time since 2004!
Fall 2006:
Current Topics in Computational Biology (Journal Club)
, 02-701.
Announcements
Spring 2006:
Read the Web
, CALD 10-709.
June 21,23,25, 2005: A mini-course on Minorthird. Materials are below.
Slides, notes, and sample files from first
day's lecture
Slides, notes, and sample files from second
day's lecture
Powerpoint slides from third
day's lecture
Jar file for minorThird
, if you
only want to run the code, not compile it or read it.
The installation process here is:
Install Java 1.4 or higher (actually, the JRE is all you need).
Download the jar for minorThird and stick it in some directory.
Optionally, download the sample data repository and unpack it into the same directory.
Change to that same directory and then run Minorthird with the command
java -Xmx500M -jar minorthird.jar
What will pop up will be a small launch pad that can be used to
start any of the UI programs. You can also start a particular
main by specifying minorthird.jar as your classpath, for
instance:
java -Xmx500M -cp minorthird.jar edu.cmu.minorthird.ui.Help
If you want to do a real install, here's the
home page on Sourceforge
, and
a document on
how to do a CVS
install of Minorthird.
Spring 2004:
"Learning to Turn Words into Data:
Machine Learning Approaches to Information Extraction and Information Integration"
, CALD 10-707 and LTI 11-748.
Students and other colleagues
Daniel Spokoyny, LTI PhD student, co-supervised with Taylor Berg-Kirkpatrick.
Long-term colleagues
Katie Rivard Mazaitis
, research programmer/analyst, CMU
Former students
Haitian Sun
(former MLD PhD student, now at Apple).
Zhilin Yang
(former LTI PhD student, then Tsinghua professor, now doing a startup)
Bhuwan Dhingra
(former LTI PhD student, now an Assistant Professor at Duke)
Fan Yang
(former MLD PhD student)
Rose Catherine Kanjirathinkal
(former LTI PhD student, now at Meta)
Yifeng Tao
, (former CMU Comp Bio PhD student, co-supervised with Xinghua Lu.)
William Yang Wang
(former LTI PhD student, now a Professor at UCSB).
Dana Movshovitz-Attias
(former CSD PhD student, now at Google).
Bhavana Dalvi Mishra
(former LTI PhD student co-advised with
Jamie Callan
, now at Google)
Tae Yano
, (former LTI PhD student, co-advised with
Noah Smith)
Nan Li
, (former CSD PhD student, co-advised with
Ken Koedinger)
Ramnath Balasubramanyan
, (LTI PhD student)
Mahesh Joshi
, (former LTI PhD student,
co-advised with
Carolyn Rosé)
Frank Lin
, (former LTI PhD student)
Ni Lao
(former LTI PhD student, now at Google)
Richard C. Wang
, (former LTI PhD student, co-advised with
Bob Frederking
).
Andrew Arnold
(former MLD PhD student)
Einat Minkov
(former LTI PhD student, now at Haifa University)
Vitor Rocha de Carvalho
(former LTI PhD student)
Zhenzhen Kou
(former MLD PhD student)
Qiao Jin
, School of Medicine, Tsinghua University
Ezra Winston, MLD Master's student.
Lanxio (Karen) Xu, MLD Master's student.
Yuxing Zhang, MLD Master's student.
Jakob Bauer, MLD 5th-year Master's student
Kavya Srinet, MCDS Master's student.
Bhawna Juneja, MCDS Master's student.
Tom Shen, CMU CSD undergrad
Yu-Hsin Allen Kuo
, LTI MLT student, formerly co-advised with
Natasa Miskov-Zivanov
Rahul Goutam
, former LTI MLT student, co-advised with
Natasa Miskov-Zivanov
Malcolm Greaves
, former CSD master's student.
Edoardo Airoldi
(former MLD/Stats PhD student, co-advised with
Steve Fienberg)
Ja-Hui Chang
(visiting faculty from National Central University, Taiwan, 2007-2008)
Wen Haw Chong (PhD student at Singapore Management University,
visited CMU in 2015-2016).
Tuan
Ahn Hoang
, (PhD student at Singapore Management University,
visited CMU for 2012-2013 academic year in my group).
Freddy
Chong Tat Chua
(PhD student at Singapore Management University,
visited CMU for the academic year 2011-2012 in my group.)
Gustavo Lacerda
(former research assistant, co-supervised with Noboru Matsuda and Ken Koedinger, now at UBC)
Lidong Bing
, former
postdoc, now at Tencent.
Ramesh Nallapati
(former postdoc, co-supervised with
John Lafferty)
Noboru Matsuda
(former postdoc, co-supervised with
Ken Koedinger,
now Associate Professor at NC State)
Pradeep Ravikumar
(former MLD PhD student, co-advised with
Steve Fienberg)
I have been an external committee member for the PhD theses of
John Zelle
(degree
from U Texas)
Misha
Bilenko
(from U Texas)
Daniel Kudenko
(Rutgers)
Chumki Basu (Rutgers)
Ananlada Chotimongkol (CMU)
Wei-Hao Lin (CMU)
Cenk Gazen (CMU)
David Nadeau (U Ottawa)
Hanghang Tong
(CMU)
Ben van Durme (Rochester)
Partha Talukdar
(U Penn)
Andy Carlson
(CMU)
Yifen Huang
(CMU)
Swapna Sundaran
(U Pitt)
Michael Heilman
(CMU)
Jon Elsas
(CMU)
Dipanjan Das
(CMU)
Fan Guo
(CMU)
Jana Diesner
(CMU)
Freddy Chong Tat Chua
(Singapore Management University).
Qirong Ho
(CMU)
Danai Koutra (CMU)
Reyyan Yeniterzi (CMU)
YiChi Wang (CMU)
Steven Gardiner (CMU)
Jay Pujara (Univ Maryland)
Derry Wijaya (CMU)
Lingjia Deng (Univ of Pittsburgh)
Chenyan Xiong (CMU)
Pradeep Dasigi (CMU)
Tiancheng Zhao (CMU)
Abulhair Saparov (CMU)
Danish Pruthi (CMU)
Sanket Vaibhav Mehta (CMU)
Luyu Gao (CMU)
Vidhisha Balachandran (CMU)
and some others...
I have also been an external committee member for the Master's theses of
Mehrbod Sharifi
(CMU) and
Weam Abu-Zaki (CMU).
Contact Info
Office: 8015 Gates-Hillman Center
My preferred email address for CMU-related matters is:
wcohen AT cmu DOT edu