Re: Awk and sort (of text files)

jd1008 <jd1008@xxxxxxxxx> · Mon, 29 Jun 2015 12:57:12 -0600

On 06/29/2015 11:48 AM, Bill Oliver wrote:
On Mon, 29 Jun 2015, jd1008 wrote:

On 06/29/2015 03:39 AM, Dario Lesca wrote:
On Sun, 28/06/2015 at 18:38 -0600, jd1008 wrote:
> Hi,
> I have text files made of paragraphs of text, separated by
> blank lines.
>
> Each "paragraph" is information about a different item.
>
> I need to sort these paragraphs based on the first line
> of each paragraph.
>
> Need some hints on how to accomplish this.
>
> Thanx.
An example of your text file would help us to help you.

I described them perfectly.
They are text paragraphs made of a few or several lines.

The paragraphs are separated by an empty line.

Try something like this.  It's buggy, but what can you expect from 5
minutes of work?

This takes a text with lines separated by hard breaks, and an empty line
between paragraphs, and sorts it.

Here are the obvious problems I haven't bothered to debug:

1) It counts the empty lines as paragraphs, so you get blank space at the
top.

2) I'm doing something wrong with asort (see comment).

3) It looks like I'm sorting twice -- once with asort, and then to
reindex.  There should be a smart way to do this.

Here's the awk code:

BEGIN{newparagraph=0; numlines=0; paranum=0;}

        {
        #if the line is blank, it's time to start a new paragraph
        if ($0==""){
                paranum++;
                numlines=0;
                }
        #if it's not blank, buffer it
        else {
                numl[paranum]=numlines;
                paragraph[paranum][numlines++] = $0;
                }

        }

END{

        for (i=0;i<=paranum;i++){
                firstline[i] = paragraph[i][0]

                }

        #for a reason I don't understand, "sorted" has one index more than firstline!?
        #I'm probably making some mistake with starting with 0 vs 1, but I'm not going to fix it.
        # so, I'll just increment paranum, because I'm lazy
        asort(firstline,sorted);
        paranum++

        #Renumber the indices
        for (i=0;i<=paranum;i++){

                found=0;
                newindex[i] = 999;
                for(j=0;((j<=paranum) && (found==0));j++){
                        if(sorted[i] == firstline[j]){
                                newindex[i]=j;
                                found=1;
                                }
                }

        }

        #print it out
        for(i=0;i<=paranum;i++){

                current_paragraph = newindex[i];
                new_numlines = numl[current_paragraph];

                for (j=0;j<=new_numlines;j++){
                        print (paragraph[newindex[i]][j]);
                        }
                print("");
                }

        }
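For what it's worth, the three problems listed above can probably be sidestepped rather than debugged. gawk's asort() fills its destination array with indices starting at 1, while firstline here starts at 0, which is the likely source of the "one index more" confusion. The sketch below is not the code from this post: it assumes awk's paragraph mode (RS set to the empty string), which treats runs of blank lines as record separators, and it swaps asort for a plain insertion sort so that it does not depend on gawk. The function name sort_paragraphs is made up for illustration.

```shell
# sort_paragraphs is a hypothetical helper name, not from the original post.
sort_paragraphs() {
    awk '
    BEGIN { RS = ""; FS = "\n" }     # paragraph mode: one record per paragraph
    {
        key[NR]  = $1                # first line of the paragraph
        para[NR] = $0                # the whole paragraph, newlines included
    }
    END {
        # Insertion sort on the first lines; appending "" forces a string
        # comparison even when a first line looks like a number.
        for (i = 2; i <= NR; i++) {
            k = key[i]; p = para[i]
            for (j = i - 1; j >= 1 && (key[j] "") > (k ""); j--) {
                key[j + 1] = key[j]; para[j + 1] = para[j]
            }
            key[j + 1] = k; para[j + 1] = p
        }
        # Print the paragraphs back out, separated by one blank line.
        for (i = 1; i <= NR; i++) {
            if (i > 1) print ""
            print para[i]
        }
    }' "$@"
}
```

It reads the files named on its command line, or standard input if none are given, so something like "sort_paragraphs somefile > somefile.sorted.txt" should work (the file names are placeholders). Because each paragraph is stored whole next to its key, there is no second pass to map sorted first lines back to paragraph numbers, and paragraphs with identical first lines keep their original order.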

Here is the simplest solution, and it does what I want without resorting
to awk:
for i in lists*; do
    # $!d keeps the last paragraph even when the file has no trailing blank line
    sed '/./{H;$!d;};x;s/\n/={NL}=/g' "$i" | sort |
        sed '1s/={NL}=//;s/={NL}=/\n/g' > "$i.sorted.txt"
done
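
To see why this works, it can help to look at what sort actually receives. The sketch below isolates the first sed stage under a made-up function name (flatten) and feeds it a made-up two-paragraph input. It uses $!d in place of d so that the last paragraph is emitted even when the input does not end with a blank line. The second sed in the pipeline then undoes the marker substitution (its \n in the replacement text assumes GNU sed).

```shell
# flatten is a hypothetical name for the first stage of the pipeline.
flatten() {
    # Non-blank lines are appended to the hold space (H) and deleted.
    # On a blank line (or the last line), x swaps the collected paragraph
    # into the pattern space and s/// replaces its embedded newlines with
    # the ={NL}= marker, so each paragraph becomes a single line.
    sed '/./{H;$!d;};x;s/\n/={NL}=/g'
}

# Hypothetical sample input: two paragraphs separated by a blank line.
printf 'pear\nsecond line\n\napple\n' | flatten | LC_ALL=C sort
```

Each paragraph arrives at sort as one line that begins with the marker followed by its first line, which is why an ordinary line sort ends up ordering whole paragraphs. One caveat: the marker also sits between the first and second lines, so when one first line is a prefix of another, the order can depend on how the marker characters collate.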
