Re: sendmail and rbl blocking - generating statistics

"Ryan Simpkins" <centos@xxxxxxxxxxxxxxxx> · Thu, 15 Mar 2007 13:56:12 -0600 (MDT)

On Wed, March 14, 2007 16:16, Will McDonald wrote:
> On 14/03/07, Ryan Simpkins <centos@xxxxxxxxxxxxxxxx> wrote:
>> On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
>> > On 14/03/07, Ryan Simpkins <centos@xxxxxxxxxxxxxxxx> wrote:
>> >> Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
>> >
>> > You can avoid the unnecessary 'cat' by just passing the filename to grep
>> > directly:
>> >
>> > # grep -c 'check_relay.*spamhaus' /var/log/maillog
>> > # grep -c 'check_relay.*spamcop' /var/log/maillog
>> > # grep -c 'check_relay.*njabl' /var/log/maillog
>> >
>> > Would probably be more efficient and faster; you can test with 'time' to verify
>> > this. You're spawning one process, 'grep', instead of three separate processes:
>> > 'cat', 'grep' and 'grep' again.
>>
>> Am I using time right to measure it?

I see from other posts that I wasn't using it right, so I rewrote the tests and ran
them again on the same system, with about the same log size:

##########################
$ cat timetest1
#!/bin/bash

for x in `seq 1 3000`; do
        cat /var/log/maillog | grep check_relay | grep -c njabl > /dev/null
done

$ time ./timetest1

real    0m36.685s
user    0m12.505s
sys     0m24.136s

##########################
$ cat timetest2
#!/bin/bash

for x in `seq 1 3000`; do
        grep -c 'check_relay.*njabl' /var/log/maillog > /dev/null
done

$ time ./timetest2

real    2m57.914s
user    2m50.574s
sys     0m7.134s

##########################
$ cat timetest3
#!/bin/bash

for x in `seq 1 3000`; do
        grep -c njabl /var/log/maillog > /dev/null
done

$ time ./timetest3

real    0m13.331s
user    0m6.895s
sys     0m6.429s

##########################
$ cat timetest4
#!/bin/bash

for x in `seq 1 3000`; do
        cat /var/log/maillog | grep -c njabl > /dev/null
done

$ time ./timetest4

real    0m28.442s
user    0m9.520s
sys     0m18.905s

I think this proves the original poster right on his main point: getting rid of the
cat speeds things up quite a bit (timetest3 took about 13s against 28s for timetest4).
However, it could be argued that it only matters if you are running quite a few in a
row, in this case 3000. It also suggests that a 'pattern.*pattern' regex is not a good
idea at all (at least not with grep): timetest2 took almost 3 minutes, against 37s for
the chained greps in timetest1.
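
For anyone who wants to repeat the comparison without touching a real maillog, here
is a rough sketch that runs three of the variants from one script. The sample log
lines, the RUNS variable and the function names are all made up for illustration, and
the numbers will depend on your grep version, locale and log contents, so treat it
as a starting point rather than a proper benchmark.

```shell
#!/bin/bash
# Sketch: time each grep variant against a generated sample log.
# LOG, RUNS and the fake log lines are assumptions, not real maillog data.

LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT
RUNS=${RUNS:-100}

# Build a small fake maillog: every third line mentions njabl.
for i in $(seq 1 3000); do
    if [ $((i % 3)) -eq 0 ]; then
        echo "sendmail[$i]: ruleset=check_relay, reject=550 blocked by njabl"
    else
        echo "sendmail[$i]: ruleset=check_relay, reject=550 blocked by spamhaus"
    fi
done > "$LOG"

# The variants under test; each prints a match count.
with_cat()   { cat "$LOG" | grep check_relay | grep -c njabl; }
one_regex()  { grep -c 'check_relay.*njabl' "$LOG"; }
plain_grep() { grep -c njabl "$LOG"; }

# Run one variant RUNS times, discarding its output.
bench() {
    local i
    for i in $(seq 1 "$RUNS"); do
        "$1" > /dev/null
    done
}

for variant in with_cat one_regex plain_grep; do
    echo "== $variant"
    time bench "$variant"
done
```

Run it with something like 'RUNS=3000 ./benchsketch' (the script name is hypothetical)
to get closer to the loop counts used above.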

One poster also argued on ease of coding. I typically work like this (my thoughts
are between the '*'):

cat file | less; *yes, that is the right data, and I see the pattern I wanna match*
cat file | grep pattern | less; *ahh, mistake*
cat file | grep pattern2 | less; *yes, that is right, but still need to reduce*
cat file | grep pattern2 | grep pattern3 | less; *yes, that is looking about right*

The alternate method?

less file; *Right data, I see the patterns*
grep pattern file | less; *mistake*
grep pattern2 file | less; *right, time to reduce*
grep 'pattern2.*pattern3' file | less; *Yes, that is right*

What I don't like about the alternate method is that the file name changes position
between the first two lines: it follows the command in the first and the pattern in
the second. Also, the pattern comes before the file name in the grep, making it harder
to adjust the pattern (which some of us need to do quite a lot). It makes more sense
to me to just add a | on the end and keep going. Further, for me, it is easier to
reduce data by stringing greps together than to come up with the regex-fu to do it
all in one pattern. Maybe if I were better at regex...
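
One possible middle ground, sketched below: the shell allows an input redirection
before the command name, so the file name can stay at the front of the line and each
refinement still gets appended at the end, with no cat involved. The sample log
lines are invented for the example. The awk line is an alternative I am assuming
would suit the "two patterns, no regex-fu" case; it matches lines containing both
patterns in either order.

```shell
#!/bin/bash
# Sketch only: the log contents below are placeholders, not real maillog lines.

LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT
cat > "$LOG" <<'EOF'
ruleset=check_relay, reject=550 listed at njabl
ruleset=check_relay, reject=550 listed at spamcop
ruleset=check_mail, reject=451 njabl lookup deferred
EOF

# File name first, no cat: the redirection may come before the command,
# so the pattern stays at the end of the line where it is easy to edit.
< "$LOG" grep check_relay

# Refine by appending another grep, just like the cat-based workflow.
< "$LOG" grep check_relay | grep njabl

# Both patterns in one process, without joining them into a single regex.
awk '/check_relay/ && /njabl/' "$LOG"
```

When working interactively, '| less' can be appended to any of these lines as before.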

However, I 100% agree that stringing | together produces inefficient commands more
often. I think it is wise to go back and find efficiencies when needed.
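
As an example of that kind of clean-up, the three per-blocklist counts from earlier
in the thread could probably be collected in a single pass over the log. This is a
minimal sketch, assuming each rejection line contains both 'check_relay' and the
blocklist name; the count_rbl function and the sample lines are hypothetical.

```shell
#!/bin/bash
# Sketch: count check_relay rejections per blocklist in one read of the log.
# The sample lines are invented; real maillog entries will look different.

LOG=$(mktemp)
trap 'rm -f "$LOG"' EXIT
cat > "$LOG" <<'EOF'
ruleset=check_relay, reject=550 listed at spamhaus
ruleset=check_relay, reject=550 listed at spamhaus
ruleset=check_relay, reject=550 listed at spamcop
ruleset=check_relay, reject=550 listed at njabl
ruleset=check_mail, reject=451 njabl lookup deferred
EOF

# Print one "name count" line per blocklist for the given log file.
count_rbl() {
    awk '
        /check_relay/ {
            if (/spamhaus/) n["spamhaus"]++
            if (/spamcop/)  n["spamcop"]++
            if (/njabl/)    n["njabl"]++
        }
        END {
            printf "spamhaus %d\nspamcop %d\nnjabl %d\n",
                   n["spamhaus"], n["spamcop"], n["njabl"]
        }
    ' "$1"
}

count_rbl "$LOG"
```

On a real system the call would be something like 'count_rbl /var/log/maillog'.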

-Ryan
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos