> The approach is problematic, especially when
> using "three letter word" combinations, which match
> arbitrary, harmless URLs.

The dreaded "unintended match in the middle of a word" problem can torpedo almost any approach. Solving it is not a matter of changing approaches, but rather of changing tools. It can adversely affect virtually any approach; conversely it can be "fixed" in virtually any approach.

What's needed is a way to specify "word boundaries" when matching regular expressions. Unfortunately the regular expression syntax for word boundaries varies from tool to tool. Perl and its derivatives let you specify \b at the beginning and/or end of a word (or its opposite, \B, for not-a-word-boundary). Classic `egrep` provides the same functionality but with a different regular expression syntax: \< at the beginning of a word and \> at the end of a word. GNU egrep, GNU awk, and GNU Emacs support both syntaxes. Tcl provides word boundary functionality with \m, \M, and \y. Both Java and .NET are Perl-like.

The "-F" command line switch turns GNU `grep` into its even stupider cousin `fgrep`, neither of which lets you specify word boundaries in regular expressions at all. GNU grep does however let you use Perl-style regular expressions by specifying "-P" on the command line. And perhaps most importantly, GNU grep (and GNU egrep, which is the same program with different switches) lets you quickly and automatically turn _everything_ in your regular expressions into full words with the "-w" command line switch (lots of convenience, not much control :-).

In summary: If you want to specify word boundaries inside the regular expressions, use either Perl or GNU grep -P or some other fairly modern tool. If you want word boundary functionality withOUT specifying word boundaries in the regular expressions themselves, use GNU grep -w. If you have no other choice, you can make it work with classic egrep by inserting \< and \> appropriately in your regular expressions.
But classic grep won't do word boundaries no matter what. (You can sorta fake it, but it's a lot of effort and it doesn't work in all cases.)

Note in particular that the easy-to-overlook "-w" command line switch on GNU grep can make a night/day difference.

Please do let this list know your results after a few months. (It sounds like I'm not the only one who's a bit skeptical that the "bad words in URL" approach that seemed to work reasonably well a couple of years ago will give even ballpark results these days...)

thanks!
-Chuck Kollars
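
For anyone who wants to see the word-boundary effect for themselves, here is a minimal sketch in Python, whose `re` module accepts the Perl-style \b. The word list and the URLs are made up purely for illustration (they are not from this thread), and the names `BAD_WORDS`, `URLS`, and `matching_urls` are hypothetical.

```python
import re

# Hypothetical blocklist and URLs, invented for illustration only.
BAD_WORDS = ["sex", "ass"]
URLS = [
    "http://example.com/sex/pics",
    "http://www.essex.ac.uk/",
    "http://example.org/class-notes/assignment1.html",
]

# Escape each word and join them into one alternation.
alternation = "|".join(re.escape(word) for word in BAD_WORDS)

# Naive pattern: matches the words anywhere, even inside longer words.
naive = re.compile(alternation, re.IGNORECASE)

# Bounded pattern: Perl-style \b on both sides, so only whole words match.
bounded = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


def matching_urls(pattern, urls):
    """Return the URLs in which the compiled pattern matches somewhere."""
    return [url for url in urls if pattern.search(url)]


naive_hits = matching_urls(naive, URLS)
bounded_hits = matching_urls(bounded, URLS)

print("naive:  ", naive_hits)
print("bounded:", bounded_hits)
```

With this sample data the naive pattern flags all three URLs (it hits "essex", "class", and "assignment"), while the bounded pattern flags only the first. One caveat: \b counts digits and underscores as word characters, so a URL containing something like "sex_pics" would slip past the bounded pattern.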