Re: Awk and sort (of text files)

On 06/29/2015 11:48 AM, Bill Oliver wrote:
On Mon, 29 Jun 2015, jd1008 wrote:



On 06/29/2015 03:39 AM, Dario Lesca wrote:
 On Sun, 28/06/2015 at 18:38 -0600, jd1008 wrote:
> Hi,
> I have text files made of paragraphs of text, separated by
> blank lines.
> Each "paragraph" is information about a different item.
> I need to sort these paragraphs based on the first line
> of each paragraph.
> Need some hints how to accomplish this.
> Thanx.
 An example of your text file can help us to help you.

I described them perfectly:
Text paragraphs made of a few or several lines.

The paragraphs are separated by an empty line.
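The format described here maps directly onto awk's paragraph mode: when `RS` is set to the empty string, records are separated by one or more blank lines, so `$0` holds a whole paragraph and its first line is everything up to the first newline. A minimal sketch, assuming any POSIX awk; the function name `first_lines` is hypothetical:

```shell
# Hypothetical helper: print each paragraph's number and its first line.
first_lines() {
    awk 'BEGIN { RS = "" }                  # paragraph mode: one or more blank lines end a record
         {
             split($0, line, "\n")          # split the record into its lines
             print NR ": " line[1]          # NR counts paragraphs, not lines
         }' "$@"
}
```

For example, `printf 'pear\nyellow\n\n\napple\nred\n' | first_lines` prints `1: pear` and `2: apple`; the two blank lines in a row do not create an empty record.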



Try something like this.  It's buggy, but what can you expect for 5
minutes of work?

This takes text with hard line breaks and an empty line between
paragraphs, and sorts the paragraphs by their first line.

Here are the obvious problems I haven't bothered to debug:

1) It counts the empty lines as paragraphs, so you get blank space at the
top.

2) I'm doing something wrong with asort (see comment).

3) It looks like I'm sorting twice: once with asort, and then to
reindex.  There should be a smarter way to do this.

Here's the awk code:


# Requires gawk 4.0 or later: arrays of arrays (paragraph[i][j]) and asort() are gawk extensions.
BEGIN{newparagraph=0; numlines=0; paranum=0;}

        {
        #if the line is blank, it's time to start a new paragraph
        if ($0==""){
                paranum++;
                numlines=0;
                }
        #if it's not blank, buffer it
        else {
                numl[paranum]=numlines;
                paragraph[paranum][numlines++] = $0;
                }

        }


END{

        for (i=0;i<=paranum;i++){
                firstline[i] = paragraph[i][0]

                }

        # for a reason I don't understand, "sorted" has one index more than firstline!?
        # I'm probably making some mistake with starting with 0 vs 1, but I'm not going to fix it.
        # so, I'll just increment paranum, because I'm lazy
        asort(firstline,sorted);
        paranum++



        #Renumber the indices
        for (i=0;i<=paranum;i++){

                found=0;
                newindex[i] = 999;
                for(j=0;((j<=paranum) && (found==0));j++){
                        if(sorted[i] == firstline[j]){
                                newindex[i]=j;
                                found=1;
                                }
                }

        }

        #print it out
        for(i=0;i<=paranum;i++){

                current_paragraph = newindex[i];
                new_numlines = numl[current_paragraph];

                for (j=0;j<=new_numlines;j++){
                        print (paragraph[newindex[i]][j]);
                        }
                print("");
                }

        }

Here is the simplest solution; it does what I want without resorting to awk:
for i in lists*; do
    # join each paragraph onto one line, sort, then split the lines back apart
    sed '/./{H;$!d;};x;s/\n/={NL}=/g' "$i" | sort | sed '1s/={NL}=//;s/={NL}=/\n/g' > "$i.sorted.txt"
done
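The same join, sort, and split idea can also be written with awk's paragraph mode instead of the sed hold space, which copes with runs of several blank lines and with a last paragraph that has no blank line after it. A minimal sketch, assuming a POSIX awk and `sort`; the helper name `sort_paragraphs` is hypothetical, and the control character `\001` is an assumed marker that must not occur in the text. Because `\001` sorts below every printable character under `LC_ALL=C`, the joined lines order by first line, then by the lines that follow:

```shell
# Hypothetical helper: join each paragraph onto one line, sort, then split it back.
# Assumes the byte \001 never appears in the input.
sort_paragraphs() {
    awk 'BEGIN { RS = "" }                       # paragraph mode
         { gsub(/\n/, "\001"); print }' "$@" |   # one paragraph per output line
    LC_ALL=C sort |                              # plain byte-order sort
    awk '{ gsub(/\001/, "\n")                    # restore the line breaks
           if (NR > 1) print ""                  # blank line between paragraphs
           print }'
}
```

For example, `sort_paragraphs somefile > somefile.sorted.txt`.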


--
users mailing list
users@xxxxxxxxxxxxxxxxxxxxxxx


