Re: Text file manipulation in CentOS?

sheraznaz@xxxxxxxxx · Tue, 11 May 2010 08:25:43 +0000

>>To be more specific, I need to find how many distinct records are there in say column#1?

awk '{print $1}' filename | sort -u | wc -l

This shows how many unique entries are present in column one. Use awk -F to change the delimiter, e.g. awk -F ":" for a colon-delimited file.
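
As a rough illustration of the delimiter case, here is a minimal sketch. The sample data, the temporary file and the variable names are made up for the example; substitute your own filename.

```shell
# Hypothetical sample file: colon-delimited, first field is the record key.
sample=$(mktemp)
printf '%s\n' 'alice:10' 'bob:20' 'alice:30' 'carol:40' > "$sample"

# Print column 1 using ":" as the field separator, drop duplicates, count lines.
# tr strips the padding that some wc implementations put before the number.
distinct=$(awk -F ':' '{print $1}' "$sample" | sort -u | wc -l | tr -d ' ')
echo "$distinct"

rm -f "$sample"
```

With this sample the distinct keys are alice, bob and carol, so it prints 3.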

>> How can I filter out the distinct records with number of occurrences less than a pre-determined threshold?

I don't quite understand this part.

awk '{print $1}' filename | sort | uniq -c | sort -rn

This gives you the number of occurrences of each unique value in column one, sorted numerically in descending order.

Now I think you want to put that through a loop and show only the values whose count is below the threshold?
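
If that is what you are after, a loop may not be needed: a second awk can compare the count that uniq -c prints in front of each value. This is only a sketch under that assumption; the sample data and the threshold of 3 are invented for the example, and it assumes the column values contain no whitespace.

```shell
# Hypothetical sample file: whitespace-delimited, column 1 is the record key.
sample=$(mktemp)
printf '%s\n' 'a x' 'b x' 'a y' 'c z' 'a w' 'b q' > "$sample"

threshold=3   # assumed cutoff; set this to your own value

# uniq -c emits "count value" per distinct key. The second awk keeps only
# the rows whose count ($1) is below the threshold and prints the value ($2).
rare=$(awk '{print $1}' "$sample" | sort | uniq -c \
       | awk -v t="$threshold" '$1 < t {print $2}')
echo "$rare"

rm -f "$sample"
```

Here a occurs 3 times, b twice and c once, so only b and c are printed. Change `$1 < t` to `$1 >= t` if you want to keep the frequent records and drop the rare ones instead.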

Thanks
Sheraz

------Original Message------
From: sheraznaz@xxxxxxxxx
Sender: centos-bounces@xxxxxxxxxx
To: CentOS mailing list
ReplyTo: CentOS mailing list
Subject: Re:  Text file manipulation in CentOS?
Sent: May 11, 2010 1:14 AM

Can you post some sample input and the expected result?

Sent from my Verizon Wireless BlackBerry

-----Original Message-----
From: hadi motamedi <motamedi24@xxxxxxxxx>
Date: Tue, 11 May 2010 09:09:23 
To: CentOS mailing list<centos@xxxxxxxxxx>
Subject:  Text file manipulation in CentOS?

_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos
