Killfile bug causes "random" unsubscribing

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Trn 4.0-test76 says:

# Send bug reports, suggestions, etc. to:  trn-workers@lists.sourceforge.net

Here's a bug report.  And a side comment at the end.


Back in 2000, on August 27 and November 26, I emailed the following
bug report as part of an omnibus message to trn@clari.net:

> Vex has trn 4.0-test72 (19 April 1999) installed.  I normally read news
> using this on a VT100-like terminal from sh.
> 
> [0] There is a bug that from time to time randomly unsubscribes me from
> one newsgroup.  I believe it only happens to groups where I have a killfile,
> and most often (maybe always), to ones where I have a large killfile
> containing complicated regular expressions.  A symptom that the bug is
> about to hit is that the count of remaining articles in the newsgroup
> goes negative -- if I notice that, I can "q" out of the group, then "-"
> back to it, whereupon I'm offered the option of resubscribing.  Otherwise,
> of course, I first have to notice that I've been unsubscribed, then use
> "g" to resubscribe, which is a bit of a pain.
> 
> This bug was also present in the trn version previously installed here,
> which I think was 4.0-test69, and it hit *more often* in that version than
> in 4.0-test72.  I don't remember whether the negative article count
> applied there.


Vex now has trn 4.0-test76 (Apr 2, 2001) installed, and this bug
is still present in it.  And I finally decided to try and find it
myself.  Over the last few weeks I have been trying to get it pinned
down, and after a goodish number of hours spread over that time,
I have found it.  Eureka.

However, although I know what goes wrong, I don't know how to fix it.
So I turn to trn-workers for help.

In the original bug report, the part about "more often" in the older
version of trn is probably wrong; if there was a change that really
affected the probability of the bug hitting, it was in my own killfile.
The critical thing about the killfile is not the complexity of the
regular expressions, but the fact that it kills based on the article
body ("a" or "B" modifier).  Also critical is that this is an NNTP-based
installation.

The "random" unsubscribing happens when

	- I'm reading a newsgroup
	- Before I get to the end of the group, additional news comes in
	  and gets processed by my killfile
	- The last article that came in is killed by an entry in my
	  killfile that uses an article-body modifier.

You should be able to reproduce this as follows.  (Some of the steps can
be varied, I'm sure.)

	- Start trn, using an NNTP server.
	- Subscribe to a newsgroup with heavy traffic.
	- Enter the group and select a few threads.
	- Edit the killfile and add a line like /./a:j that will
	  junk lots of articles (what I actually used when confirming
	  the bug was /the/a:=:j).
	- Keep pressing space to read articles in a normal manner.
	- When the selected threads run out, watch the killfile
	  junk the new articles that came in while you were reading.
	  (If there weren't any yet, go back to "select a few threads"
	  and read slower this time.)
	- Select and read another article.
	- Quit out of the newsgroup, then "-" back to it.
	- It will say that you're unsubscribed and ask if you
	  want to resubscribe.


The bug happens because, when the killfile pattern matches, trn
immediately stops reading the article body from the NNTP server.
The NNTP part of trn's code knows it didn't finish, and when
another part of the code attempts a different NNTP transaction,
things start going wrong.


*** In more detail, THIS is what happens. ***


We are reading a newsgroup and are inside the function do_newsgroup().
We get to the top of the article-selection loop:

ng.c:243:    for (; art <= lastart+1; ) {	/* for each article */

ng.c:248:	if ((art > lastart || forcegrow) && !keep_the_group_static) {

and find that art == lastart+1, so it's time to check for more articles.
These lines are executed:

ng.c:253:	    if (datasrc->flags & DF_REMOTE) {
ng.c:254:		if (datasrc->act_sf.fp || getngsize(ngptr) > lastart) {

And we find that there is one new article, so getngsize() > lastart.
After calling nntp_group(), we then get to this:

ng.c:259:		    grow_ng(ngptr->ngmax);

Which brings us to this:

ngdata.c:184:	    kill_unwanted(tmpfirst,buf,TRUE);

Which brings us to this:

kfile.c:365:	    intr = do_kfile(localkfp,entering);

Which brings us to this:

kfile.c:207:	    switch (art_search(bp, (sizeof buf) - (bp - buf), FALSE)) {

Which brings us to this:

artsrch.c:346:	    if (wanted(compex,art,howmuch)) {

Now, in the problematic case, the particular killfile pattern that
we're working on here has a modifier such as "a" or "B" that causes
the whole article body to be scanned.  And the article hasn't been
parsed already.  So inside wanted(), we will get to this line:

artsrch.c:423:	    if (!artopen(artnum,(ART_POS)0))

Now artopen() calls artclose() in case there is already an article
open, and then gets to this:

artio.c:62:	nntp_body(artnum);

In the source for nntp_body(), there is an initial comment reading:

nntp.c:179:/* copy the body of an article to a temporary file */

but despite this, what actually happens here is that it causes the
article header to be copied to a temporary file.  First it confirms
that the article isn't already in a tempfile, then it allocates a
tempfile for it, and then, since the article's header wasn't already
parsed, it executes these lines:

nntp.c:210:    sprintf(ser_line, "ARTICLE %ld", (long)artnum);

nntp.c:211:    if (nntp_command(ser_line) <= 0)

nntp.c:213:    switch (nntp_check()) {

Thus it is telling the NNTP server to send the article header and body,
and checking the server's response, which is code 220 followed by the
article number, message-ID, and "article retrieved - head and body follow".
That's fine, so now we get to this loop:

nntp.c:231:	while (nntp_copybody(b, sizeof b, body_end+1) > 0) {
nntp.c:232:	    if (*b == '\n' && body_end - prev_pos < sizeof b)
nntp.c:233:		break;
nntp.c:234:	    prev_pos = body_end;
nntp.c:235:	}

Again, despite the name nntp_copybody(), what actually happens *here*
is that at each call to the function, one line is read from the header
of the open article (and written to the tempfile, but I won't reproduce
that code here).  When we get to the empty line at the end of the
header, the *b == '\n' test at line 232 succeeds and we leave the loop.

Within nntp_copybody(), we read a line or part-line from the NNTP
server by doing:

nntp.c:258:	found_nl = nntp_gets(s, limit-1);

And if we got a complete line, we do:

nntp.c:262:	    if (nntp_at_list_end(s)) {

In effect, this tests whether the line we just got was ".", which
indicates EOF on the article:

nntpclient.c:239:bool
nntpclient.c:240:nntp_at_list_end(s)
nntpclient.c:241:char* s;
nntpclient.c:242:{
nntpclient.c:243:    if (!s || (*s == '.' && (s[1] == '\0' || s[1] == '\r'))) {
nntpclient.c:244:	nntplink.flags |= NNTP_NEW_CMD_OK;
nntpclient.c:245:	return 1;
nntpclient.c:246:    }
nntpclient.c:247:    nntplink.flags &= ~NNTP_NEW_CMD_OK;
nntpclient.c:248:    return 0;
nntpclient.c:249:}

Note that it sets the NNTP_NEW_CMD_OK flag to true if the "." (for EOF)
*was* read, and false if it was not.

Now, as I said, this all happens for each line of the header, then we
hit the break at nntp.c line 233 above.  nntp_body() then sets the
NNTP_NEW_CMD_OK flag to false, presumably because it knows that not
all of the article has been read, and returns.

nntp.c:238:    nntplink.flags &= ~NNTP_NEW_CMD_OK;
nntp.c:239:}

This brings us back into artopen(), in which we drop through to the
end and return the pointer to the tempfile.  This brings us back into
wanted().  We next call

artsrch.c:425:	    if (!parseheader(artnum))

which parses the article header, and then proceed to process the body.

artsrch.c:428:	/* loop through each line of the article */
artsrch.c:429:	seekartbuf(htype[PAST_HEADER].minpos);
artsrch.c:430:	while ((s = readartbuf(FALSE)) != NULL) {

I have not traced through readartbuf() in detail, but when it finds
that the next line of the body has not been buffered, it causes
readart() to be called.  This in turn calls

artio.c:154:	return nntp_readart(s,limit);

which again calls

nntp.c:354:	    if (nntp_copybody(s, limit, body_pos+1) <= 0)

and this time this causes a line of the article body to be read from
the NNTP server.  Remember this point!  Now back in wanted(), this
line is compared against the the killfile pattern that we're working on:

artsrch.c:441:	    success = success || execute(compex,s) != NULL;
artsrch.c:442:	    if (nlptr)
artsrch.c:443:		*nlptr = ch;
artsrch.c:444:	    if (success && !in_sig)		/* does it match? */
artsrch.c:445:		return TRUE;			/* say, "Eureka!" */

And wanted() returns true.  NOTE that this happens within the article-
reading loop, as soon as a line is found that matches the pattern.

Okay, now we return to art_search() and perform the action specified
by the killfile, which is to junk the article:

artsrch.c:350:		if (perform(cmdlst,output_level && page_line == 1) < 0) {

And this was the last article, so art_search() then returns.

Now kill_unwanted() will continue examining killfile entries via
do_kfile(), but since there was only one new article and we already
junked it, no further action will take place.  So we return to
grow_ng(), which also has no more to do, so we finally return to
do_newsgroup().

We execute the following; I'm not sure what it does.

ng.c:268:	if (art != 0 || (artp && !(artp->flags & AF_TMPMEM)))
ng.c:269:	    artp = article_find(art);

Then we test

ng.c:289:	if (art > lastart) {		/* are we off the end still? */

which is false.  So we get to here and this happens:

ng.c:357:	else if (!reread && (was_read(art)
ng.c:358:		|| (selected_only && !(artp->flags & AF_SEL)))) {
ng.c:359:					/* has this article been read? */
ng.c:360:	    inc_art(selected_only,FALSE);/* then skip it */
ng.c:361:	    continue;

Which takes us back to here.

ng.c:243:    for (; art <= lastart+1; ) {	/* for each article */
ng.c:244:	set_mode('r','a');
ng.c:246:	/* do we need to "grow" the newsgroup? */
ng.c:248:	if ((art > lastart || forcegrow) && !keep_the_group_static) {
ng.c:249:	    ART_NUM oldlast = lastart;

Now, in case you didn't notice, the last transaction with the NNTP server
was to read a line of the article body -- this happened at the words
"Remember this point!" above.  So the NNTP server is still waiting to
give us the rest of the article body.  And this also means that when
we were in nntp_copybody() last time, it would have called
nntp_at_list_end(), which would have made sure the NNTP_NEW_CMD_OK
flag was set to false.

Now, the next line we reach is

ng.c:251:	    if (artsize < 0)
ng.c:252:		nntp_finishbody(FB_SILENT);

And if nntp_finishbody() was called, it would read and discard the rest
of the article and return NNTP_NEW_CMD_OK to true.  But this *does not
happen*, because artsize is not negative.

And now things go badly wrong.  The next thing that happens is

ng.c:253:	    if (datasrc->flags & DF_REMOTE) {
ng.c:254:		if (datasrc->act_sf.fp || getngsize(ngptr) > lastart) {

And getngsize() does this:

ngdata.c:342:    if (!find_actgrp(gp->rc->datasrc,tmpbuf,nam,len,gp->ngmax)) {

And find_actgrp() does this:

datasrc.c:502:    if ((dp->flags & DF_USELISTACT)
datasrc.c:503:     && (DATASRC_NNTP_FLAGS(dp) & NNTP_NEW_CMD_OK)) {

This condition is supposed to be true, after which it will do

datasrc.c:506:	switch (nntp_list("active", nam, len)) {

But because we are stuck in the middle of reading an article, the
NNTP_NEW_CMD_OK flag is actually false.  This causes find_actgrp()
to fail to find the desired information, and ultimately return 0,
indicating this failure.

And getngsize() thinks that if find_actgrp() couldn't find the group
then it must not exist, and silently unsubscribes.

ngdata.c:343:	if (gp->subscribechar == ':') {
ngdata.c:344:	    gp->subscribechar = NEGCHAR;
ngdata.c:345:	    gp->rc->flags |= RF_RCCHANGED;
ngdata.c:346:	    newsgroup_toread--;
ngdata.c:347:	}



So that's what goes wrong.  What I don't know is how to fix it.
One possibility is that in this sequence of events, artsize is
supposed to end up negative, causing nntp_finishbody() to be called
at the line above.

But it doesn't seem all that likely to me that this is really the
right thing to do.  The problem arises in the first place when
wanted() returns (at artsrch.c line 445) part way through reading
the article -- it would seem more appropriate to handle it there.
Or maybe at the end of kill_unwanted().  In either case we might
want to call artclose() rather than calling nntp_finishbody()
directly.  Or something.  I dunno.

Help?


Oh yes, I said there would be a side comment.  While working on
this, I had occasion to look at the code for nntp_finish_list().
This function contains a triple nested loop -- do-while, while,
while -- but the middle one of the three levels serves no purpose
whatever.  If you delete the first "while" header and the corre-
sponding braces, it will work exactly as before, and the next
person who needs to read and understand it may have an easier time.
-- 
Mark Brader   |   "I'm a little worried about the bug-eater", she said.
Toronto       |   "We're embedded in bugs, have you noticed?"
msb@vex.net   |                          -- Niven, "The Integral Trees"


-------------------------------------------------------
This SF.net email is sponsored by OSDN developer relations
Here's your chance to show off your extensive product knowledge
We want to know what you know. Tell us and you have a chance to win $100
http://www.zoomerang.com/survey.zgi?HRPT1X3RYQNC5V4MLNSV3E54

[Index of Archives]     [Photo]     [Yosemite]     [Epson Inkjet]     [Mhonarc]     [Nntpcache]

  Powered by Linux