Re: sendmail and rbl blocking - generating statistics

"Will McDonald" <wmcdonald@xxxxxxxxx> · Wed, 14 Mar 2007 22:16:33 +0000

On 14/03/07, Ryan Simpkins <centos@xxxxxxxxxxxxxxxx> wrote:
On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
> On 14/03/07, Ryan Simpkins <centos@xxxxxxxxxxxxxxxx> wrote:
>> Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
>
> You can avoid the unnecessary 'cat' by just passing the filename to grep directly:
>
> # grep -c 'check_relay.*spamhaus' /var/log/maillog
> # grep -c 'check_relay.*spamcop' /var/log/maillog
> # grep -c 'check_relay.*njabl' /var/log/maillog
>
> This would probably be more efficient and faster; you can test with 'time' to
> verify this. You're spawning one process, 'grep', instead of three separate
> processes: 'cat', 'grep' and 'grep' again.

Am I using time right to measure it?

Yep.

# time cat /var/log/maillog | grep check_relay | grep -c njabl
8

real    0m0.299s
user    0m0.289s
sys     0m0.009s

# time grep -c 'check_relay.*njabl' /var/log/maillog
8

real    0m0.404s
user    0m0.402s
sys     0m0.000s

Is the first 'time' measuring the whole one-liner, or just the time it takes to 'cat'?

It should be the time taken for the command line to execute.
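
In bash, at least, 'time' is a reserved word that covers the whole pipeline rather than just its first command. A minimal sketch to check that, assuming bash (TIMEFORMAT and the keyword behaviour are bash features; an external /usr/bin/time may behave differently):

```bash
#!/usr/bin/env bash
# Assumes bash: `time` is a shell keyword here and covers the entire pipeline.
# If it only timed the first command (`true`), the result would be close to 0.
TIMEFORMAT='%R'                             # report only elapsed (real) seconds
elapsed=$( { time true | sleep 1; } 2>&1 )  # the timing report goes to stderr
echo "pipeline took ${elapsed}s"
```

The reported time should be about one second, which is the cost of the 'sleep' at the end of the pipe, not of 'true'.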

I also tried this:
time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
8

real    0m0.325s
user    0m0.312s
sys     0m0.012s

time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
8

real    0m0.411s
user    0m0.408s
sys     0m0.002s

I ran these several times, mixed back and forth, to see whether they were flukes;
these numbers appear to be representative of the average. What do you get on your
system? Maybe passing the file name to grep gets faster as the file size increases?

wc /var/log/maillog
  12323  142894 1588860 /var/log/maillog

I wonder if the issue here is actually the 'stuff.*morestuff' as that might be a more
expensive match:

I think you're correct, that regexp wildcard is slower. I've done
similar cat/grep/awk tests myself and in *some* cases using awk's
pattern matching '/foo/ { awkstuff }' has been quicker than grep so
it's always worth running the numbers a couple of times to see what's
most effective for a given/typical dataset.
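
As a rough illustration of the alternatives worth timing, here is a sketch that counts the same thing three ways. The sample lines are made up for the demonstration and are NOT real sendmail output; point 'log' at /var/log/maillog to try it on real data.

```sh
#!/bin/sh
# Hypothetical sample data: simplified stand-ins, not real sendmail log lines.
log=$(mktemp)
printf '%s\n' \
  'ruleset=check_relay reject=blocked by njabl' \
  'ruleset=check_relay reject=blocked by spamhaus' \
  'ruleset=check_mail reject=something else' \
  'ruleset=check_relay reject=blocked by njabl' > "$log"

# 1. One grep with a regexp wildcard.
regex_count=$(grep -c 'check_relay.*njabl' "$log")

# 2. Two fixed-string greps in a pipeline (still no cat needed).
fixed_count=$(grep -F 'check_relay' "$log" | grep -c -F 'njabl')

# 3. awk pattern matching. Unlike the regexp, this does not care
#    which of the two words comes first on the line.
awk_count=$(awk '/check_relay/ && /njabl/ { n++ } END { print n + 0 }' "$log")

echo "regex=$regex_count fixed=$fixed_count awk=$awk_count"
rm -f "$log"
```

All three should agree on the count; prefix each one with 'time' against a real log to see which is quickest for your data.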

The removal of the redundant cat still stands though. There really is
no conceivable benefit to forking that additional process. I don't
think, anyway. :)

And of course, when you start to loop through running

for i in `list of stuff`
do
 grep blah | grep -c snee
done

for example, depending on the number of iterations through the loop
it's worth thinking about how you're doing stuff. There is an element
of premature optimisation here, mind: if something's working on a box
that's NOT heavily loaded, then don't sweat it.
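
One way to avoid re-reading the log once per list is a single awk pass that tallies all of them. A minimal sketch, assuming the three list names mentioned earlier in the thread; the sample lines are invented for the demo and are NOT real sendmail output:

```sh
#!/bin/sh
# Hypothetical sample data: simplified stand-ins, not real sendmail log lines.
log=$(mktemp)
printf '%s\n' \
  'ruleset=check_relay reject=blocked by spamhaus' \
  'ruleset=check_relay reject=blocked by njabl' \
  'ruleset=check_relay reject=blocked by spamhaus' \
  'ruleset=check_mail reject=something else' > "$log"

counts=$(awk '
  # Names to tally, in the order they should be reported.
  BEGIN { n = split("spamhaus spamcop njabl", rbl, " ") }
  # Only look at check_relay lines; plain substring match, order not enforced.
  /check_relay/ {
    for (i = 1; i <= n; i++)
      if (index($0, rbl[i])) count[rbl[i]]++
  }
  END {
    for (i = 1; i <= n; i++)
      printf "%s %d\n", rbl[i], count[rbl[i]]
  }
' "$log")

echo "$counts"
rm -f "$log"
```

The file is read once however many names are in the list, where the loop above would start two greps per name.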

Will.
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos