On 06/29/2015 11:48 AM, Bill Oliver wrote:
On Mon, 29 Jun 2015, jd1008 wrote:
On 06/29/2015 03:39 AM, Dario Lesca wrote:
On Sun, 28/06/2015 at 18:38 -0600, jd1008 wrote:
> Hi,
> I have text files made of paragraphs of text, separated by
> blank lines.
>
> Each "paragraph" is information about a different item.
> I need to sort these paragraphs based on the first line
> of each paragraph.
>
> Need some hints on how to accomplish this.
>
> Thanx.
An example of your text file can help us to help you.
I described them perfectly.
Text paragraphs made of a few or several lines.
The paragraphs are separated by an empty line.
Try something like this. It's buggy, but what can you expect for 5
minutes of work?
This takes a text with lines separated by hard breaks, and an empty line
between paragraphs, and sorts it.
Here are the obvious problems I haven't bothered to debug:
1) It counts the empty lines as paragraphs, so you get blank space at the
top.
2) I'm doing something wrong with asort (see comment).
3) It looks like I'm sorting twice -- once with asort, and then to
reindex. There should be a smart way to do this.
Here's the awk code:
BEGIN{newparagraph=0; numlines=0; paranum=0;}
{
    #if the line is blank, it's time to start a new paragraph
    if ($0==""){
        paranum++;
        numlines=0;
    }
    #if it's not blank, buffer it
    else {
        numl[paranum]=numlines;
        paragraph[paranum][numlines++] = $0;
    }
}
END{
    for (i=0;i<=paranum;i++){
        firstline[i] = paragraph[i][0]
    }
    #for a reason I don't understand, "sorted" has one index more
    #than firstline!?
    #I'm probably making some mistake with starting with 0 vs 1,
    #but I'm not going to fix it.
    # so, I'll just increment paranum, because I'm lazy
    asort(firstline,sorted);
    paranum++
    #Renumber the indices
    for (i=0;i<=paranum;i++){
        found=0;
        newindex[i] = 999;
        for(j=0;((j<=paranum) && (found==0));j++){
            if(sorted[i] == firstline[j]){
                newindex[i]=j;
                found=1;
            }
        }
    }
    #print it out
    for(i=0;i<=paranum;i++){
        current_paragraph = newindex[i];
        new_numlines = numl[current_paragraph];
        for (j=0;j<=new_numlines;j++){
            print (paragraph[newindex[i]][j]);
        }
        print("");
    }
}
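For comparison, here is a minimal sketch of the same sort using awk's paragraph mode (RS set to the empty string). In that mode awk treats each blank-line-separated block as one record, so empty lines are never counted as paragraphs. The sketch keeps to POSIX awk and therefore uses a small insertion sort instead of gawk's asort(); asort() numbers its output from 1, which would explain the extra index noted in the comments above. The function name sort_paragraphs is made up for illustration, and the sort is only meant for modest file sizes.

```shell
# sort_paragraphs: hypothetical helper, not from the original post.
# Reads blank-line-separated paragraphs on stdin and prints them
# ordered by their first line.
sort_paragraphs() {
    awk '
    BEGIN { RS = "" }               # paragraph mode: one record per paragraph
    {
        n++
        para[n] = $0                # whole paragraph, inner newlines kept
        split($0, lines, "\n")
        key[n] = lines[1]           # the first line is the sort key
    }
    END {
        # insertion sort on the keys; appending "" forces string comparison
        for (i = 2; i <= n; i++) {
            k = key[i]; p = para[i]
            for (j = i - 1; j >= 1 && (key[j] "") > (k ""); j--) {
                key[j + 1] = key[j]; para[j + 1] = para[j]
            }
            key[j + 1] = k; para[j + 1] = p
        }
        for (i = 1; i <= n; i++) {
            if (i > 1) print ""     # blank line between paragraphs
            print para[i]
        }
    }'
}
```

It could be used as `sort_paragraphs < lists.txt > lists.txt.sorted.txt`, where lists.txt stands in for one of the input files.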
Here is the simplest solution and it does what I want without resorting
to awk:
for i in lists*; do
    sed '/./{H;d;};x;s/\n/={NL}=/g' "$i" | sort |
        sed '1s/={NL}=//;s/={NL}=/\n/g' > "$i.sorted.txt"
done
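A sketch of how that pipeline works: the first sed gathers each paragraph in the hold space and, when it reaches a blank line, emits the paragraph as a single line with every newline replaced by the marker ={NL}=. sort then orders those single lines, and the last sed turns the markers back into newlines. As posted, a paragraph is only emitted when a blank line follows it, so a final paragraph with no blank line after it is dropped. The variant below adds `$!d` and `/./!d` to cover that case and to skip the empty records produced by repeated blank lines. Both additions and the function name sort_paragraphs_sed are assumptions for illustration, not part of the original command, and the `\n` in the replacement assumes GNU sed.

```shell
# sort_paragraphs_sed: hypothetical helper; reads paragraphs on stdin.
# Stage 1 (sed):  append non-blank lines to the hold space; on a blank
#                 line or the last line, swap the paragraph in, skip it
#                 if empty, and join its lines with the ={NL}= marker.
# Stage 2 (sort): every paragraph is now one line, so sort orders them.
# Stage 3 (sed):  drop the leading marker on line 1, then turn each
#                 remaining marker back into a newline (GNU sed assumed).
sort_paragraphs_sed() {
    sed '/./{H;$!d;};x;/./!d;s/\n/={NL}=/g' |
        sort |
        sed '1s/={NL}=//;s/={NL}=/\n/g'
}

# Example run on made-up sample data:
printf 'pear\n1\n\napple\n2\n' | sort_paragraphs_sed
```

Because sort compares the whole joined line, paragraphs with identical first lines are ordered by their following lines.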