Chairs: Martha Palmer, Chris Brew, Fei Xia
Date: June 19, 2008

Third Workshop on Issues in Teaching Computational Linguistics

ACL 2008

Teaching Computational Linguistics to a Large, Diverse Student Body: Courses, Tools, and Interdepartmental Interaction

Jason Baldridge and Katrin Erk

  • Large, diverse student body
  • Dealing with diversity: different classes for different student audiences
  • working with corpora (no programming requirement). Provides practical skills to linguists, so they can manipulate their own corpora, and corpora that are interesting to them. This class works better for people who already have data, and "feel the pain" of not being able to process it themselves, extract things they want, etc.
    • A big part of the course is a very slow introduction to python.
    • Also includes search (regexps, tregex, etc)
    • Annotation & corpus formats. xml, meta-data, etc.
    • intra- and inter- annotator agreement, kappa
    • existing resources
    • statistics (freq, correlations, etc) using R
    • students really like visualization. they really liked using R, and didn't have trouble picking it up after python.
  • CL 1 (algorithms & data structures)
  • CL 2 (machine learning methods)
  • language & computers -- adapted from OSU class. study language technology applications, and look at how they work. discuss the technology, but without any programming. topics include writing systems, classification, spell checking, etc.
    • undergrads like high-level issues like "can a computer think?"
    • combine that with low-level aspects (eg edit distance, regular languages, freq dists, n-grams are useful)
  • working with corpora
  • NLP
    • cross-listed in CS, has a programming prereq
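
As an illustration of the inter-annotator agreement topic in the corpora class, here is a minimal sketch of a kappa computation. The notes do not say which kappa variant was taught; this assumes Cohen's kappa for two annotators labelling the same items. The function name and the toy POS labels are made up for the example.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, estimated from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy annotations (N = noun, V = verb), not data from the course.
ann1 = ["N", "V", "N", "N", "V", "N", "V", "N", "N", "V"]
ann2 = ["N", "V", "N", "V", "V", "N", "N", "N", "N", "V"]
kappa = cohen_kappa(ann1, ann2)
print(f"kappa = {kappa:.3f}")
```

Here the annotators agree on 8 of 10 items, and chance agreement is 0.52, so kappa comes out noticeably lower than the raw agreement rate.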

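Edit distance is one of the low-level topics mentioned for the language & computers class, alongside spell checking. That class has no programming, so the following is only a sketch for instructors, assuming plain Levenshtein distance with unit costs. The function names and the tiny vocabulary are hypothetical.

```python
def edit_distance(source, target):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    # previous[j] holds the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def suggest(word, vocabulary):
    """Return the vocabulary entry closest to word (first one wins on ties)."""
    return min(vocabulary, key=lambda entry: edit_distance(word, entry))


# Hypothetical mini-vocabulary for a spell-checking demo.
print(edit_distance("kitten", "sitting"))
print(suggest("speling", ["spelling", "speaking", "sapling"]))
```

Keeping only two rows of the table keeps memory linear in the target length, which is enough when only the distance (not the alignment) is needed.
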
New tools:

  • OpenCCG: grammar engineering workbench
  • Shalmaneser: shallow semantic parser. use as a teaching tool for machine learning, automatic semantic analysis, mapping syntax to semantics.

Use a wiki for tutorials, link collections, tools. Esp. gets used by grad students

Building a Flexible, Collaborative, Intensive Master’s Program in Computational Linguistics

Emily M. Bender, Fei Xia and Erik Bansleben

  • CLMA program (Computational Linguistics Master of Arts)
  • some turnover into the phd program
  • designed as a 12-month full-time intensive program
  • evolved to be more flexible (part-time)
  • 9 courses (3 quarters) plus a thesis/internship

does not include traditional "intro nlp" class -- doesn't fit into the calendar, and the student population for the program is already motivated etc.


  • Shallow methods
  • Deep methods
  • Advanced statistical nlp
  • Systems/applications


  • multilingual grammar engineering
  • MT
  • corpus processing
  • text to speech
  • language models in multilingual nlp
  • lexical acquisition
  • etc

hands on, collaborative projects are important. motivational, allows exploring larger systems, models real-world, etc.


Prerequisites:

  • ability to program
  • college-level probability & stats
  • calculus
  • basic linguistics concepts (POS, CFG)

(though there's flexibility in the prereqs)

there were some remote students -- lectures were broadcast, etc.

Freshmen’s CL Curriculum: The Benefits of Redundancy

Heike Zinsmeister

Newly revised bachelor's program in CL at Heidelberg.

establish a common higher education program.

  • new teaching staff
  • four mandatory courses
  • end-of-semester evaluation: is the new curriculum design reasonable? do students benefit from interdependencies?

Defining a Core Body of Knowledge for the Introductory Computational Linguistics Curriculum

Steven Bird

  • what is the core body of knowledge for CL?
  • there's a lot of diversity, and not necessarily a lot of overlap between different classes

(read this paper)

Panel on Curricula

  • CMU
  • Stanford (Dan Jurafsky & Chris Manning)
    • is CL too marginalized from CS?
    • define a bigger tent -- "information track"
    • acm curriculum -- cs is too big to really cover it all
    • define an option that leads into CL but picks up from other things. e.g., take up ownership of formal languages, automata, bio sequence models, etc. reclaim the web?
    • linguists who want to go into CL vs linguists who want to know a little programming.
  • Edinburgh
    • different ways to structure the material
    • need to be sensitive to what's taught in other courses
    • much that is essential to CL belongs to and is taught by its parent disciplines
    • take care of homeworks
    • computational linguists need to think about both the long tail and the short fat head.
    • many things in linguistics are Zipfian -- cf. handwriting recognition
  • Michigan (Dragomir Radev)
    • on his webpage, list of skills, topics, collected from students
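
The "long tail and short fat head" point from the Edinburgh notes can be made concrete with a rank-frequency count. This is a minimal sketch on a made-up sentence, not a real corpus. The function name, the `head_size` parameter, and the two summary statistics are choices made for illustration.

```python
from collections import Counter


def head_and_tail(tokens, head_size=3):
    """Rank word types by frequency and summarise the head and the tail."""
    if not tokens:
        raise ValueError("need at least one token")
    counts = Counter(tokens)
    ranked = counts.most_common()  # (type, frequency), most frequent first
    total = sum(counts.values())
    # Head: share of all tokens covered by the few most frequent types.
    head_share = sum(freq for _, freq in ranked[:head_size]) / total
    # Tail: share of types that occur exactly once (hapax legomena).
    hapax_share = sum(1 for _, freq in ranked if freq == 1) / len(ranked)
    return ranked, head_share, hapax_share


# Hypothetical toy text; a real demonstration would use a large corpus.
text = "the cat saw the dog and the dog saw the bird near the old tree"
ranked, head_share, hapax_share = head_and_tail(text.split())
for rank, (word, freq) in enumerate(ranked, start=1):
    print(rank, word, freq)
print(f"top-3 types cover {head_share:.0%} of tokens")
print(f"{hapax_share:.0%} of types occur only once")
```

Even in this tiny sample a few types account for most tokens while most types are seen once. On real text the same pattern is much more pronounced.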

Final Session Notes

  • wiki for core curriculum
    • define a core, plus a broad range of topics
    • when we send it out to acl, ask people to look at their minors, concentrations, etc, and see how they line up with the core curriculum description
    • collect suggestions of mailing lists that this should be sent to (not just acl)
  • wiki for assignments. associate assignments with learning objectives. sort by learning objective. once the core curriculum is fleshed out, this could be links.
    • also visualization tools, web demos, etc
  • also wiki for slides. there was a request for slide collections for non-core material from an NLP point of view.
  • password accounts for solutions
  • diverse objectives
  • make sure that your materials are available to the outside world via the web
    • if you use courseware, this may be hard -- talk to administrators, or mirror the content manually
  • guerrilla approach -- inject CL into other fields (as problems, etc)
    • try putting together standalone exercises, which don't necessarily need much background info. fit within a set of goals for what we want to teach.
  • textbooks? most people need to dip in and out of textbooks. suggestion: an online textbook with a large number of chapters on various topics, making it easier to pick and choose.
  • many people who are relevant to this discussion (e.g., those who teach corpus linguistics courses) do not come to ACL -- make sure to think about including their ideas as well.

meta-deadline -- September 1st.

Target dates: we didn't agree on these.

  • first draft on core curriculum
  • comments on core curriculum: first pass --
  • comments on core curriculum: second pass --