On Tue, May 11, 2010 at 08:25:43AM +0000, sheraznaz@xxxxxxxxx wrote:
> >>To be more specific, I need to find how many distinct records there are in, say, column #1?
>
> awk '{print $1}' filename | sort -u | wc -l
>
> This will show how many unique entries are present in column one (use awk -F to change the delimiter, e.g. awk -F ":" for a : delimiter).
>
> >> How can I filter out the distinct records with number of occurrences less than a pre-determined threshold?
>
> I don't quite understand this part.
>
> awk '{print $1}' filename | sort | uniq -c | sort -rn
>
> This will give you the number of occurrences (reverse numerically sorted) of each unique value in column one.
>
> Now I think you want to put that through a loop and only show those that are less than the threshold?

If I understand correctly, you can pipe your output to:

    awk '{a=$1} {if (a > 3) print a}'

Here `a' is an awk variable and 3 stands in for your threshold. `$1' is the first column of awk's input, so you may need to change it to match your data. After `uniq -c' the first column is the count, so this prints only the counts greater than 3; use `print $0' instead of `print a' to keep the value as well, and use `<' in place of `>' to show the entries below the threshold.

-- 
Dominik Zyla
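
For reference, the two pipelines above can be combined into one small script. This is only a sketch: the sample data, the temporary file and the threshold of 2 are made up for illustration and are not from the thread. It keeps values seen at least `threshold' times; flip the comparison to list the rare ones instead.

```shell
#!/bin/sh
# Hypothetical sample data: the first column is the field of interest.
tmpfile=$(mktemp)
printf '%s\n' 'a 1' 'b 2' 'a 3' 'c 4' 'a 5' 'b 6' > "$tmpfile"

threshold=2   # assumed cutoff, adjust as needed

# Number of distinct values in column 1.
# tr strips the padding some wc implementations add.
distinct=$(awk '{print $1}' "$tmpfile" | sort -u | wc -l | tr -d ' ')

# Values seen at least $threshold times, as "count value",
# highest count first. After uniq -c, $1 is the count and $2 the value.
frequent=$(awk '{print $1}' "$tmpfile" | sort | uniq -c |
    awk -v t="$threshold" '$1 >= t {print $1, $2}' | sort -rn)

echo "distinct: $distinct"
echo "$frequent"

rm -f "$tmpfile"
```

Passing the threshold in with `awk -v' avoids editing the awk program each time the cutoff changes.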
_______________________________________________
CentOS mailing list
CentOS@xxxxxxxxxx
http://lists.centos.org/mailman/listinfo/centos