On Wed, Jun 18, 2008 at 02:49:48PM +0200, Sabbiolina wrote:
> www.google.com is only treated as a unique word? Why not producing multiple
> tokens like www.google.com, www, ., google, ., com? (obviously www and . can
> be nulled or stopworded).

You wouldn't want to get the token ".". It's not a token, but a label
boundary. So in your analogy of treating the labels in a FQDN as "words",
the "." needs to be treated the way spaces are between words.

A

-- 
Andrew Sullivan
ajs@xxxxxxxxxxxxxxxxx
+1 503 667 4564 x104
http://www.commandprompt.com/
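
To make the label-boundary point concrete, here is a minimal Python sketch. It is only an illustration, not how any particular parser works; the function name hostname_tokens is made up for this example. It emits the whole name plus each label, and treats "." purely as a separator, the way whitespace is treated between words.

```python
from typing import List


def hostname_tokens(host: str) -> List[str]:
    """Return the full name followed by its labels (hypothetical helper).

    The "." is a label boundary, so it is used to split the name and is
    never emitted as a token of its own.
    """
    name = host.rstrip(".")  # drop a trailing root dot, if present
    labels = [label for label in name.split(".") if label]
    # For a single-label name the full name and the label are identical,
    # so emit it once rather than twice.
    return [name] + labels if len(labels) > 1 else labels


print(hostname_tokens("www.google.com"))
# prints ['www.google.com', 'www', 'google', 'com']
```

Labels such as "www" or "com" could then be stopworded downstream if they are not useful, without ever having to deal with a "." token.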