Trn 4.0-test76 says: # Send bug reports, suggestions, etc. to: trn-workers@lists.sourceforge.net Here's a bug report. And a side comment at the end. Back in 2000, on August 27 and November 26, I emailed the following bug report as part of an omnibus message to trn@clari.net: > Vex has trn 4.0-test72 (19 April 1999) installed. I normally read news > using this on a VT100-like terminal from sh. > > [0] There is a bug that from time to time randomly unsubscribes me from > one newsgroup. I believe it only happens to groups where I have a killfile, > and most often (maybe always), to ones where I have a large killfile > containing complicated regular expressions. A symptom that the bug is > about to hit is that the count of remaining articles in the newsgroup > goes negative -- if I notice that, I can "q" out of the group, then "-" > back to it, whereupon I'm offered the option of resubscribing. Otherwise, > of course, I first have to notice that I've been unsubscribed, then use > "g" to resubscribe, which is a bit of a pain. > > This bug was also present in the trn version previously installed here, > which I think was 4.0-test69, and it hit *more often* in that version than > in 4.0-test72. I don't remember whether the negative article count > applied there. Vex now has trn 4.0-test76 (Apr 2, 2001) installed, and this bug is still present in it. And I finally decided to try and find it myself. Over the last few weeks I have been trying to get it pinned down, and after a goodish number of hours spread over that time, I have found it. Eureka. However, although I know what goes wrong, I don't know how to fix it. So I turn to trn-workers for help. In the original bug report, the part about "more often" in the older version of trn is probably wrong; if there was a change that really affected the probability of the bug hitting, it was in my own killfile. The critical thing about the killfile is not the complexity of the regular expressions, but the fact that it kills based on the article body ("a" or "B" modifier). Also critical is that this is an NNTP-based installation. The "random" unsubscribing happens when - I'm reading a newsgroup - Before I get to the end of the group, additional news comes in and gets processed by my killfile - The last article that came in is killed by an entry in my killfile that uses an article-body modifier. You should be able to reproduce this as follows. (Some of the steps can be varied, I'm sure.) - Start trn, using an NNTP server. - Subscribe to a newsgroup with heavy traffic. - Enter the group and select a few threads. - Edit the killfile and add a line like /./a:j that will junk lots of articles (what I actually used when confirming the bug was /the/a:=:j). - Keep pressing space to read articles in a normal manner. - When the selected threads run out, watch the killfile junk the new articles that came in while you were reading. (If there weren't any yet, go back to "select a few threads" and read slower this time.) - Select and read another article. - Quit out of the newsgroup, then "-" back to it. - It will say that you're unsubscribed and ask if you want to resubscribe. The bug happens because, when the killfile pattern matches, trn immediately stops reading the article body from the NNTP server. The NNTP part of trn's code knows it didn't finish, and when another part of the code attempts a different NNTP transaction, things start going wrong. *** In more detail, THIS is what happens. *** We are reading a newsgroup and are inside the function do_newsgroup(). We get to the top of the article-selection loop: ng.c:243: for (; art <= lastart+1; ) { /* for each article */ ng.c:248: if ((art > lastart || forcegrow) && !keep_the_group_static) { and find that art == lastart+1, so it's time to check for more articles. These lines are executed: ng.c:253: if (datasrc->flags & DF_REMOTE) { ng.c:254: if (datasrc->act_sf.fp || getngsize(ngptr) > lastart) { And we find that there is one new article, so getngsize() > lastart. After calling nntp_group(), we then get to this: ng.c:259: grow_ng(ngptr->ngmax); Which brings us to this: ngdata.c:184: kill_unwanted(tmpfirst,buf,TRUE); Which brings us to this: kfile.c:365: intr = do_kfile(localkfp,entering); Which brings us to this: kfile.c:207: switch (art_search(bp, (sizeof buf) - (bp - buf), FALSE)) { Which brings us to this: artsrch.c:346: if (wanted(compex,art,howmuch)) { Now, in the problematic case, the particular killfile pattern that we're working on here has a modifier such as "a" or "B" that causes the whole article body to be scanned. And the article hasn't been parsed already. So inside wanted(), we will get to this line: artsrch.c:423: if (!artopen(artnum,(ART_POS)0)) Now artopen() calls artclose() in case there is already an article open, and then gets to this: artio.c:62: nntp_body(artnum); In the source for nntp_body(), there is an initial comment reading: nntp.c:179:/* copy the body of an article to a temporary file */ but despite this, what actually happens here is that it causes the article header to be copied to a temporary file. First it confirms that the article isn't already in a tempfile, then it allocates a tempfile for it, and then, since the article's header wasn't already parsed, it executes these lines: nntp.c:210: sprintf(ser_line, "ARTICLE %ld", (long)artnum); nntp.c:211: if (nntp_command(ser_line) <= 0) nntp.c:213: switch (nntp_check()) { Thus it is telling the NNTP server to send the article header and body, and checking the server's response, which is code 220 followed by the article number, message-ID, and "article retrieved - head and body follow". That's fine, so now we get to this loop: nntp.c:231: while (nntp_copybody(b, sizeof b, body_end+1) > 0) { nntp.c:232: if (*b == '\n' && body_end - prev_pos < sizeof b) nntp.c:233: break; nntp.c:234: prev_pos = body_end; nntp.c:235: } Again, despite the name nntp_copybody(), what actually happens *here* is that at each call to the function, one line is read from the header of the open article (and written to the tempfile, but I won't reproduce that code here). When we get to the empty line at the end of the header, the *b == '\n' test at line 232 succeeds and we leave the loop. Within nntp_copybody(), we read a line or part-line from the NNTP server by doing: nntp.c:258: found_nl = nntp_gets(s, limit-1); And if we got a complete line, we do: nntp.c:262: if (nntp_at_list_end(s)) { In effect, this tests whether the line we just got was ".", which indicates EOF on the article: nntpclient.c:239:bool nntpclient.c:240:nntp_at_list_end(s) nntpclient.c:241:char* s; nntpclient.c:242:{ nntpclient.c:243: if (!s || (*s == '.' && (s[1] == '\0' || s[1] == '\r'))) { nntpclient.c:244: nntplink.flags |= NNTP_NEW_CMD_OK; nntpclient.c:245: return 1; nntpclient.c:246: } nntpclient.c:247: nntplink.flags &= ~NNTP_NEW_CMD_OK; nntpclient.c:248: return 0; nntpclient.c:249:} Note that it sets the NNTP_NEW_CMD_OK flag to true if the "." (for EOF) *was* read, and false if it was not. Now, as I said, this all happens for each line of the header, then we hit the break at nntp.c line 233 above. nntp_body() then sets the NNTP_NEW_CMD_OK flag to false, presumably because it knows that not all of the article has been read, and returns. nntp.c:238: nntplink.flags &= ~NNTP_NEW_CMD_OK; nntp.c:239:} This brings us back into artopen(), in which we drop through to the end and return the pointer to the tempfile. This brings us back into wanted(). We next call artsrch.c:425: if (!parseheader(artnum)) which parses the article header, and then proceed to process the body. artsrch.c:428: /* loop through each line of the article */ artsrch.c:429: seekartbuf(htype[PAST_HEADER].minpos); artsrch.c:430: while ((s = readartbuf(FALSE)) != NULL) { I have not traced through readartbuf() in detail, but when it finds that the next line of the body has not been buffered, it causes readart() to be called. This in turn calls artio.c:154: return nntp_readart(s,limit); which again calls nntp.c:354: if (nntp_copybody(s, limit, body_pos+1) <= 0) and this time this causes a line of the article body to be read from the NNTP server. Remember this point! Now back in wanted(), this line is compared against the the killfile pattern that we're working on: artsrch.c:441: success = success || execute(compex,s) != NULL; artsrch.c:442: if (nlptr) artsrch.c:443: *nlptr = ch; artsrch.c:444: if (success && !in_sig) /* does it match? */ artsrch.c:445: return TRUE; /* say, "Eureka!" */ And wanted() returns true. NOTE that this happens within the article- reading loop, as soon as a line is found that matches the pattern. Okay, now we return to art_search() and perform the action specified by the killfile, which is to junk the article: artsrch.c:350: if (perform(cmdlst,output_level && page_line == 1) < 0) { And this was the last article, so art_search() then returns. Now kill_unwanted() will continue examining killfile entries via do_kfile(), but since there was only one new article and we already junked it, no further action will take place. So we return to grow_ng(), which also has no more to do, so we finally return to do_newsgroup(). We execute the following; I'm not sure what it does. ng.c:268: if (art != 0 || (artp && !(artp->flags & AF_TMPMEM))) ng.c:269: artp = article_find(art); Then we test ng.c:289: if (art > lastart) { /* are we off the end still? */ which is false. So we get to here and this happens: ng.c:357: else if (!reread && (was_read(art) ng.c:358: || (selected_only && !(artp->flags & AF_SEL)))) { ng.c:359: /* has this article been read? */ ng.c:360: inc_art(selected_only,FALSE);/* then skip it */ ng.c:361: continue; Which takes us back to here. ng.c:243: for (; art <= lastart+1; ) { /* for each article */ ng.c:244: set_mode('r','a'); ng.c:246: /* do we need to "grow" the newsgroup? */ ng.c:248: if ((art > lastart || forcegrow) && !keep_the_group_static) { ng.c:249: ART_NUM oldlast = lastart; Now, in case you didn't notice, the last transaction with the NNTP server was to read a line of the article body -- this happened at the words "Remember this point!" above. So the NNTP server is still waiting to give us the rest of the article body. And this also means that when we were in nntp_copybody() last time, it would have called nntp_at_list_end(), which would have made sure the NNTP_NEW_CMD_OK flag was set to false. Now, the next line we reach is ng.c:251: if (artsize < 0) ng.c:252: nntp_finishbody(FB_SILENT); And if nntp_finishbody() was called, it would read and discard the rest of the article and return NNTP_NEW_CMD_OK to true. But this *does not happen*, because artsize is not negative. And now things go badly wrong. The next thing that happens is ng.c:253: if (datasrc->flags & DF_REMOTE) { ng.c:254: if (datasrc->act_sf.fp || getngsize(ngptr) > lastart) { And getngsize() does this: ngdata.c:342: if (!find_actgrp(gp->rc->datasrc,tmpbuf,nam,len,gp->ngmax)) { And find_actgrp() does this: datasrc.c:502: if ((dp->flags & DF_USELISTACT) datasrc.c:503: && (DATASRC_NNTP_FLAGS(dp) & NNTP_NEW_CMD_OK)) { This condition is supposed to be true, after which it will do datasrc.c:506: switch (nntp_list("active", nam, len)) { But because we are stuck in the middle of reading an article, the NNTP_NEW_CMD_OK flag is actually false. This causes find_actgrp() to fail to find the desired information, and ultimately return 0, indicating this failure. And getngsize() thinks that if find_actgrp() couldn't find the group then it must not exist, and silently unsubscribes. ngdata.c:343: if (gp->subscribechar == ':') { ngdata.c:344: gp->subscribechar = NEGCHAR; ngdata.c:345: gp->rc->flags |= RF_RCCHANGED; ngdata.c:346: newsgroup_toread--; ngdata.c:347: } So that's what goes wrong. What I don't know is how to fix it. One possibility is that in this sequence of events, artsize is supposed to end up negative, causing nntp_finishbody() to be called at the line above. But it doesn't seem all that likely to me that this is really the right thing to do. The problem arises in the first place when wanted() returns (at artsrch.c line 445) part way through reading the article -- it would seem more appropriate to handle it there. Or maybe at the end of kill_unwanted(). In either case we might want to call artclose() rather than calling nntp_finishbody() directly. Or something. I dunno. Help? Oh yes, I said there would be a side comment. While working on this, I had occasion to look at the code for nntp_finish_list(). This function contains a triple nested loop -- do-while, while, while -- but the middle one of the three levels serves no purpose whatever. If you delete the first "while" header and the corre- sponding braces, it will work exactly as before, and the next person who needs to read and understand it may have an easier time. -- Mark Brader | "I'm a little worried about the bug-eater", she said. Toronto | "We're embedded in bugs, have you noticed?" msb@vex.net | -- Niven, "The Integral Trees" ------------------------------------------------------- This SF.net email is sponsored by OSDN developer relations Here's your chance to show off your extensive product knowledge We want to know what you know. Tell us and you have a chance to win $100 http://www.zoomerang.com/survey.zgi?HRPT1X3RYQNC5V4MLNSV3E54