Re: Optimizing grep, sort, uniq for speed




Woodchuck wrote:
> On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote:
>> This snippet of code pulls an array of hostnames from some log files.
>> It has to parse around 3GB of log files, so I'm keen on making it as
>> efficient as possible.  Can you think of any way to optimize this to
>> run faster?
>
> If the key phrase is *as efficient as possible*, then I would say
> you want a compiled pattern search.  Lex is the tool for this, and

That, to me, would be a Big Deal.
<snip>
> BTW, you could easily incorporate a sorting function in lex that
> would eliminate the need for an external sort.  This might be done in awk,
> too, but in lex it would be more natural.  You simply would not
<snip>
Hello, mark, wake up.

Of course, there's an even easier way, just using awk:

awk '{ if (/[-.0-9a-z][-.0-9a-z]*\.com/) { hostarray[$9] = 1 } }
     END { for (i in hostarray) { print i } }'

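This dumps it into an associative array - that's one whose indices are
strings - so each hostname is stored only once and the uniq step goes
away. Note that awk does not specify the order in which "for (i in
hostarray)" visits the indices, so pipe the output through sort if you
need it in order.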
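
For anyone who wants to try this, here is a minimal sketch of the same idea wrapped in a shell function. The function name unique_hosts and the array name seen are made up for illustration, and it assumes (as the one-liner above does) that the hostname is in field 9. Since awk's array traversal order is unspecified, it sorts the already-deduplicated list at the end, which should be far cheaper than sorting the raw log lines.

```shell
# Hypothetical helper: print each distinct value of field 9 once, sorted.
# Reads the files named as arguments, or stdin if none are given.
unique_hosts() {
    awk '
        # Same test as the one-liner: the line must contain something
        # that looks like a .com hostname.
        /[-.0-9a-z][-.0-9a-z]*\.com/ { seen[$9] = 1 }

        # Array indices are unique, so each hostname is printed once.
        END { for (h in seen) print h }
    ' "$@" | sort
}
```

Call it with the log files as arguments, or pipe the logs into it.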

       mark

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos

