NLTK, the Natural Language Toolkit, is a suite of open-source Python modules, data, and documentation for research and development in natural language processing. NLTK contains code supporting dozens of NLP tasks, along with 30 popular corpora and extensive documentation, including a 360-page online book.