Re: Sort logfiles on common lines?

John R Pierce <pierce@xxxxxxxxxxxx> · Sun, 25 Sep 2011 12:43:38 -0700

On 09/25/11 12:18 PM, Dotan Cohen wrote:
> On Sun, Sep 25, 2011 at 22:06, John R Pierce<pierce@xxxxxxxxxxxx>  wrote:
>>> Is there a way to get the most common (unique) lines of the file?
>> sort -k 3 | uniq -f 2
>>
>>
>> which will sort starting at field 3, and then print lines that are
>> unique, skipping the first 2 fields, where fields by default are blank
>> separated.
>>
> Thanks, John. This looks to me that it will sort alphabetically, not
> by commonness. For instance:
> ERROR b
> ERROR a
> ERROR b
>
> Since "ERROR b" was reported more often than "ERROR a", I would prefer
> that the output be:
> ERROR b
> ERROR a
>
> I'm sorry for not making that so clear! Is there a good word for "most
> common" or "used most often" that would be concise in this context?

uniq can count occurrences (uniq -c).  this will require two sorts:  one
to get all similar errors adjacent, the other to order by count.  instead
of using field selects, let's just clip the timestamps off up front...

   cut -c 17- | sort | uniq -c | sort -rn

(17- means from char 17 on... I may have miscounted)
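
a quick way to sanity check the pipeline above, as a sketch only: the
sample lines and the file name sample.log are made up for illustration,
and they assume a 16-character syslog-style timestamp prefix
("Sep 25 12:00:01 "), so the message text starts at char 17.  adjust the
offset if your timestamps are a different width.

```shell
# Hypothetical sample log: each line starts with a 16-character
# timestamp prefix ("Sep 25 12:00:01 "), so the message begins at char 17.
printf '%s\n' \
  'Sep 25 12:00:01 ERROR b' \
  'Sep 25 12:00:02 ERROR a' \
  'Sep 25 12:00:03 ERROR b' > sample.log

# 1. cut -c 17-  drops the timestamp so identical messages compare equal
# 2. sort        puts identical messages on adjacent lines
# 3. uniq -c     collapses each run and prefixes it with its count
# 4. sort -rn    orders by that count, highest first
cut -c 17- sample.log | sort | uniq -c | sort -rn
```

on this sample, "ERROR b" (count 2) should be listed ahead of "ERROR a"
(count 1).  the amount of padding in front of the counts depends on the
uniq implementation.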

-- 
john r pierce                            N 37, W 122
santa cruz ca                         mid-left coast

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos