A collection of utilities for splitting UTF-8 encoded Thai text at word boundaries, a task also known as word tokenization or word breaking. The utilities use Emacs, swath, Perl, and a C++ program built on ICU (icu-project). All of them use dictionary-based word splitting. Also included are a merged dictionary file of Thai words, a Perl script to grep Thai UTF-8 words, and an Emacs library that can split, unsplit, spellcheck, and play audio for Thai words.
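Thai is written without spaces between words, so a dictionary-based splitter has to find the word boundaries by looking up candidate substrings in a word list. The following is a minimal Python sketch of that idea using greedy longest matching. It is an illustration only, not the algorithm used by Emacs, swath, or ICU, which handle ambiguity and unknown words more carefully. The function name `split_words` and the three-word `toy_dictionary` are hypothetical and are not part of this collection.

```python
def split_words(text, dictionary):
    """Greedy longest-match segmentation (illustrative sketch only).

    At each position, take the longest dictionary word that starts there.
    If no dictionary word matches, emit a single character and move on.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: one character when nothing matches
        # Try candidate substrings from longest to shortest.
        for end in range(min(len(text), i + max_len), i + 1, -1):
            if text[i:end] in dictionary:
                match = text[i:end]
                break
        words.append(match)
        i += len(match)
    return words


# Hypothetical toy dictionary; a real one holds tens of thousands of words.
toy_dictionary = {"ภาษา", "ไทย", "คำ"}

print(" ".join(split_words("ภาษาไทย", toy_dictionary)))  # prints: ภาษา ไทย
```

Greedy matching is simple, but it can choose a wrong boundary when a long dictionary word overlaps the start of the next word, which is one reason production tools go beyond this approach.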