On 14/03/07, Ryan Simpkins <centos@xxxxxxxxxxxxxxxx> wrote:
On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
> On 14/03/07, Ryan Simpkins <centos@xxxxxxxxxxxxxxxx> wrote:
>> Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
>
> You can avoid the unnecessary 'cat' by just passing the filename to grep directly:
>
> # grep -c 'check_relay.*spamhaus' /var/log/maillog
> # grep -c 'check_relay.*spamcop' /var/log/maillog
> # grep -c 'check_relay.*njabl' /var/log/maillog
>
> This would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.

Am I using time right to measure it?
Yep.
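
To make the redundant-'cat' point above concrete, here is a minimal sketch that runs both forms against a small throwaway file. The sample lines are made up for illustration (they only loosely resemble sendmail output); substitute /var/log/maillog on a real box.

```shell
#!/bin/sh
# Hypothetical sample data standing in for /var/log/maillog.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 14 14:00:01 mx sendmail[101]: ruleset=check_relay, arg1=a.example, reject=550 listed at spamhaus
Mar 14 14:00:02 mx sendmail[102]: ruleset=check_relay, arg1=b.example, reject=550 listed at njabl
Mar 14 14:00:03 mx sendmail[103]: ruleset=check_mail, arg1=c.example, reject=550 listed at njabl
Mar 14 14:00:04 mx sendmail[104]: ruleset=check_relay, arg1=d.example, reject=550 listed at njabl
EOF

# Three processes: cat feeds grep, which feeds a second grep.
piped=$(cat "$log" | grep check_relay | grep -c njabl)

# One process: grep opens the file itself.
direct=$(grep -c 'check_relay.*njabl' "$log")

echo "piped=$piped direct=$direct"
rm -f "$log"
```

Both forms agree on this data, but they are not strictly equivalent: the two chained greps accept the words in either order on a line, while 'check_relay.*njabl' only matches when check_relay comes first.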
# time cat /var/log/maillog | grep check_relay | grep -c njabl
8

real    0m0.299s
user    0m0.289s
sys     0m0.009s

# time grep -c 'check_relay.*njabl' /var/log/maillog
8

real    0m0.404s
user    0m0.402s
sys     0m0.000s

Is the first 'time' measuring the whole one-liner, or just the time it takes to 'cat'?
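It should be the time taken for the whole command line to execute. In bash, 'time' is a shell keyword that times the entire pipeline, not just the first command in it.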
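
A quick way to check what 'time' covers is a pipeline whose first stage exits immediately while the last stage takes about a second. This is only a sketch, and it assumes bash is installed, because the 'time' keyword and the TIMEFORMAT variable used here are bash features rather than plain sh.

```shell
#!/bin/sh
# Run under bash explicitly: 'time' as a pipeline keyword and TIMEFORMAT
# are bash features (assumption: bash is available on this box).
elapsed=$(LC_ALL=C bash -c '
    TIMEFORMAT=%R            # print only the elapsed (real) seconds
    # First stage exits at once; second stage runs for about a second.
    { time sleep 0 | sleep 1; } 2>&1
')
echo "elapsed for the whole pipeline: ${elapsed}s"
```

If 'time' only covered the first command, the figure would be close to zero; a value of roughly one second shows it spans the whole pipeline.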
I also tried this:

time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
8

real    0m0.325s
user    0m0.312s
sys     0m0.012s

time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
8

real    0m0.411s
user    0m0.408s
sys     0m0.002s

I ran these several times, mixed back and forth, to see whether they were flukes; these numbers appear to be representative of the average. What do you get on your system? Maybe passing the file name to grep gets faster as the file size increases?

wc /var/log/maillog
12323 142894 1588860 /var/log/maillog

I wonder if the issue here is actually the 'stuff.*morestuff', as that might be a more expensive match:
I think you're correct: that regexp wildcard is slower. I've done similar cat/grep/awk tests myself, and in *some* cases using awk's pattern matching '/foo/ { awkstuff }' has been quicker than grep, so it's always worth running the numbers a couple of times to see what's most effective for a given/typical dataset.

The removal of the redundant cat still stands though. There really is no conceivable benefit to forking that additional process. I don't think, anyway. :)

And of course, when you start to loop through running

for i in `list of stuff`
do
    grep blah | grep -c snee
done

for example, depending on the number of iterations through the loop it's worth thinking about how you're doing stuff.

There is an element of early overoptimisation, mind; if something's working on a box that's NOT heavily loaded then don't sweat it.

Will.
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
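
For anyone who wants to run the numbers on the loop-versus-awk point above, the following is a rough sketch, not a benchmark result. The log is synthetic (generated lines, hypothetical format), the round count of ten is arbitrary, and the timing part assumes bash for the 'time' keyword. It first checks that one grep per blocklist and a single awk pass agree on the counts, then times each.

```shell
#!/bin/sh
# Hypothetical stand-in for /var/log/maillog; real logs will differ.
log=$(mktemp)
awk 'BEGIN {
    split("spamhaus spamcop njabl", bl, " ")
    for (i = 1; i <= 30000; i++) {
        rule = (i % 2 == 0) ? "check_relay" : "check_mail"
        printf "Mar 14 14:08:%02d mx sendmail[%d]: ruleset=%s, reject=550 listed at %s\n", i % 60, i, rule, bl[i % 3 + 1]
    }
}' > "$log"

# Variant 1: one grep per blocklist, so the file is read three times.
by_grep=$(for name in spamhaus spamcop njabl; do
    printf '%s=%s\n' "$name" "$(grep -c "check_relay.*$name" "$log")"
done)

# Variant 2: a single awk pass that keeps one counter per blocklist.
by_awk=$(awk '/check_relay/ {
        if (/spamhaus/) n["spamhaus"]++
        if (/spamcop/)  n["spamcop"]++
        if (/njabl/)    n["njabl"]++
    }
    END { printf "spamhaus=%d\nspamcop=%d\nnjabl=%d\n", n["spamhaus"], n["spamcop"], n["njabl"] }' "$log")

echo "grep loop:"; echo "$by_grep"
echo "awk pass:";  echo "$by_awk"

# Time ten rounds of each variant; 'time' here is the bash keyword and
# writes its report to stderr.
bash -c '
    TIMEFORMAT="%R seconds"
    echo "timing grep loop:"
    time for round in {1..10}; do
        for name in spamhaus spamcop njabl; do
            grep -c "check_relay.*$name" "$1" >/dev/null
        done
    done
    echo "timing awk pass:"
    time for round in {1..10}; do
        awk "/check_relay/ { if (/spamhaus/) a++; if (/spamcop/) b++; if (/njabl/) c++ } END { print a+0, b+0, c+0 }" "$1" >/dev/null
    done
' bash "$log"

rm -f "$log"
```

Which variant wins will depend on the grep and awk builds, the locale, and the data, so treat the output as a prompt to measure on your own logs rather than as a general answer.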