performance enhancements and other questions

dalgoda@ix.netcom.com (Mike Castle) · Sat, 7 Jun 1997 03:24:48 -0600 (MDT)

First, I just upgraded from 1.0.3 to 1.0.6.5.  I ended up having to delete
the cache.mmap in order for the program to work.  Otherwise the GROUP
command kept causing a SEGV in nntpcached.  Still testing to see how it
works.   Is this normal/expected?  I didn't see it mentioned in any
documentation.  Btw, the system is a Linux box with a recent vintage kernel
and libraries.  I can boot in several configurations, and the problem shows
up in them all.

The -h option isn't mentioned in the man page.  Minor annoyance.  I've not
yet used the program long enough to suggest a patch.

Calling setproctitle (something I abhor with a passion anyway), before
getopt in main causes a segfault.  Simply moving it down after getopt fixes
the problem (again, under Linux).

Now, for the big thing.  Performance.

Take, for instance, the group misc.jobs.offered. 

On netcom, there are currently 45,000 posts in that group.  Now, I go and
fire up trn to read that group, and even through nntpcache, well, let's say
I've not yet been able to read the group.

Now, nntpcached is pretty damned good for the smaller groups that I read
(i.e., less than 3000 articles per group).  But once you get around 45000
articles, you start filling that directory up with a hell of a lot of files
(*_head and *_xover).  And, well, most file systems in the unix world just
aren't logarithmic when it comes to directory lookups.  As a result, it
takes forever to test for each _head file, and to create it once you get it
from the remote server.  _xover files are a bit better, due to the
consolidation of 512 articles per file.  
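
To put rough numbers on that, here is a small Python sketch (not nntpcache
code).  It only uses the two figures quoted above, 45,000 articles and 512
articles per _xover file, and assumes the group is densely numbered; the
function name is made up for illustration.

```python
import math

def cache_files(articles, per_xover_file=512):
    """Return (head_files, xover_files) for a group of the given size.

    Assumes one _head file per article and one _xover file per block of
    per_xover_file articles, with no gaps in the numbering.
    """
    head_files = articles
    xover_files = math.ceil(articles / per_xover_file)
    return head_files, xover_files

print(cache_files(45000))
```

So the _head files outnumber the _xover files by a factor of several hundred,
which is why the directory scans hurt so much more for HEAD.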

So, I suggest taking a hint from INN and getting rid of direct support for
HEAD.

XOVER and HEAD have the same information.  Why duplicate it with both _head
and _xover files?   

Instead, use only _xover files, and whenever a HEAD request is made,
regenerate the header from the xover information.  This is what INN does.
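
A minimal sketch of that regeneration step, in Python for brevity.  It
assumes the usual tab-separated overview layout (article number, Subject,
From, Date, Message-ID, References, byte count, Lines, then optional
"Name: value" extras); the function name is hypothetical.  Note that only
headers carried in the overview can be rebuilt this way.

```python
# Field order of a standard overview line, after the article number.
OVERVIEW_FIELDS = ["Subject", "From", "Date", "Message-ID",
                   "References", "Bytes", "Lines"]

def head_from_xover(xover_line):
    """Rebuild header lines from one tab-separated XOVER line."""
    parts = xover_line.rstrip("\r\n").split("\t")
    artnum, values = parts[0], parts[1:]
    headers = []
    for name, value in zip(OVERVIEW_FIELDS, values):
        if name == "Bytes" or not value:
            # The byte count is metadata rather than a header, and an
            # empty field means the header was absent.
            continue
        headers.append(f"{name}: {value}")
    # Optional trailing fields (e.g. Xref) already carry their own name.
    for extra in values[len(OVERVIEW_FIELDS):]:
        if extra:
            headers.append(extra)
    return int(artnum), headers
```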

Also, always provide xover information, even if the remote server doesn't
provide it.   Take header information from the remote server and generate
xover information from it.  
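
Going the other way might look roughly like this (again Python, again a
sketch rather than a patch).  The function name is hypothetical, and the
byte-count field is left empty because HEAD alone does not tell you the
article size.

```python
# Field order of a standard overview line, after the article number.
OVERVIEW_FIELDS = ["Subject", "From", "Date", "Message-ID",
                   "References", "Bytes", "Lines"]

def xover_from_head(artnum, header_lines):
    """Build one tab-separated XOVER line from raw header lines."""
    headers = {}
    last = None
    for line in header_lines:
        if line[:1] in (" ", "\t") and last:
            # Continuation line: unfold it into the previous header.
            headers[last] += " " + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a header line, ignore it
        last = name.strip().lower()
        headers[last] = value.strip()
    # "Bytes" is never a real header, so that slot stays empty here.
    fields = [headers.get(name.lower(), "") for name in OVERVIEW_FIELDS]
    # A tab inside a value would break the format, so flatten it.
    fields = [f.replace("\t", " ") for f in fields]
    return "\t".join([str(artnum)] + fields)
```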

Basically, I would suggest the following path:

client calls HEAD.

nntpcache tries to regen header from xover information.  If it can, great.
If it can't, then try to get xover info from remote server.  

If remote server supports xover, great.  Populate the cache and fulfill
request.

If remote server does not support xover, then get head, turn into xover,
and populate cache.  At this point, depending on code design, either use
that header info to fulfill request or regen head from xover info.  Whichever
would be cleaner (I don't think performance would matter one way or
the other, and non-convoluted code should be the deciding factor).
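
The path above, sketched in Python.  Everything here is a stand-in: the
cache is a plain dict instead of _xover files, `remote` is a hypothetical
object with xover() and head() methods, and `head_to_xover` is whatever
routine flattens a header into an overview line.

```python
class XoverUnsupported(Exception):
    """Raised by the (hypothetical) remote when it lacks XOVER."""

def serve_head(artnum, cache, remote, head_to_xover):
    """Return (xover_line, source) for artnum, filling the cache on a miss.

    The caller would then regenerate the header from the returned line.
    """
    if artnum in cache:
        return cache[artnum], "cache"
    try:
        # Preferred: ask the remote server for overview data directly.
        line = remote.xover(artnum)
        source = "remote-xover"
    except XoverUnsupported:
        # Fallback: fetch the header and turn it into overview form.
        line = head_to_xover(artnum, remote.head(artnum))
        source = "remote-head"
    cache[artnum] = line
    return line, source
```

Either way the cache ends up holding only overview data, so there is a
single code path for answering later HEAD requests.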

Also, xover is a streaming protocol.  It seems to me that if the remote
server supports xover, it would be more efficient to go ahead and try to
get multiple ones at once rather than just the ones the client requests.
Possibly even spawning off a subtask to do this.  A user-written process
to prepopulate the database would also work (I use a simple perl script I
wrote using a couple of modules off CPAN, namely News::Newsrc and
News::NNTPClient).
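
One way the "get multiple ones at once" idea could work is to widen each
request to the whole 512-article block the article falls in.  The Python
sketch below assumes blocks start at multiples of 512, which may not match
how nntpcache actually names its _xover files; the function names are made
up.

```python
CHUNK = 512  # articles per _xover file, per the figure mentioned earlier

def prefetch_range(artnum, low, high, chunk=CHUNK):
    """Return (first, last) covering artnum's whole block, clipped to the group."""
    start = (artnum // chunk) * chunk
    first = max(start, low)
    last = min(start + chunk - 1, high)
    return first, last

def xover_command(artnum, low, high):
    """Format a single ranged XOVER request for the block containing artnum."""
    first, last = prefetch_range(artnum, low, high)
    return f"XOVER {first}-{last}"

print(xover_command(1000, 1, 45000))
```

A client asking for article 1000 then costs one round trip for 512
overview lines instead of 512 separate requests.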

Also, it seems to me that the XHDR commands could be handled by using the
XOVER database rather than hitting the server.  Assuming, of course, that
the xover information that the server provides is sufficient to meet the
request of the xhdr command.  The valid lists of required xover fields and
those of rfc1036 should be identical.
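
Roughly, in Python, with the same assumed overview layout (article number,
then Subject, From, Date, Message-ID, References, byte count, Lines).  The
function name is hypothetical, and returning None is just this sketch's way
of saying "go ask the server".

```python
# Lower-cased overview field order, after the article number.
OVERVIEW_FIELDS = ["subject", "from", "date", "message-id",
                   "references", "bytes", "lines"]

def xhdr_from_xover(header, xover_lines):
    """Answer an XHDR request from cached overview lines.

    Returns a list of "artnum value" strings, or None when the header
    is not carried in the overview and the server must be asked.
    """
    name = header.lower()
    if name == "bytes" or name not in OVERVIEW_FIELDS:
        return None
    idx = OVERVIEW_FIELDS.index(name) + 1  # +1 skips the article number
    out = []
    for line in xover_lines:
        parts = line.rstrip("\r\n").split("\t")
        # An empty or missing field is reported as "(none)".
        value = parts[idx] if idx < len(parts) and parts[idx] else "(none)"
        out.append(f"{parts[0]} {value}")
    return out
```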

And finally, the nntpcache code comes with version 4 of Barber's Internet
draft for nntp extensions.  It expired last November.  Version 7, expiring
next November, is available.

mrc
-- 
       Mike Castle       Life is like a clock:  You can work constantly
  dalgoda@ix.netcom.com  and be right all the time, or not work at all
 [Note the 4 line .sig]  and be right at least twice a day.  -- mrc
    We are all of us living in the shadow of Manhattan.  -- Watchmen