Richard, this sounds like a kernel bug. I'm not sure whether it is what
you are checking for, but your use of the file system is fairly heavy,
and you may be up against uncharted effects. I would suggest switching
to ext2 to see if the problem stops. That would show that it is an ext3
problem. Also, if you are not used to the Redhat Bugzilla service, I can
help you get started.

On Fri, 6 Sep 2002, Richard Eames wrote:

> Any help with this problem would be very much appreciated (even "it's not
> 7.3 or ext3 pointers, look somewhere else").
>
> I've seen a similar post to ext3-users, but since that one received no reply
> and I'm not convinced it's an ext3 problem (it only appears on our 7.3
> hosts), I'm CCing to the valhalla list.
>
> We have the same problem on ALL our Redhat 7.3 machines (various dual
> processor Dell 2400, 2500, 2600 machines with RAID cards). There appears to
> be no consistent time it starts (except that it's always outside business
> hours, 9-5), but once it does the machine eventually dies. At first we
> thought it was our amanda backups, but they're not always running when it
> starts. We've tightened the machines down as far as we can afford and
> patched one to the latest rpm and kernel versions available from Redhat,
> but no joy. Once it starts, file opens become unreliable, and syslog and
> other processes that rely on sockets start to behave in strange ways. At
> one stage, on our amanda host, the amanda backup kept going long after
> everything else stopped working, until it needed to rename the log files,
> and then it died too (amanda keeps its log file open the entire time until
> the end, unlike syslog).
>
> I've read through the archives for ext3 and valhalla and only found one
> email concerning this problem (no reply), and looked through the Redhat
> errata, google, etc. I've also checked the proc filesystem and can't find
> any large numbers in inode-nr etc.
>
> The only way I've found to get rid of the problem is a reboot.
>
> Here's a copy of the sar output from one host. Note the interesting
> dentunusd values at one stage.
>
> 00:01:01  dentunusd file-sz %file-sz   inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz
> 01:01:01     351831     506     0.24 4252446106        0      0.00        0      0.00        2      0.20
> 01:06:01     351832     469     0.22 4252446106        0      0.00        0      0.00        2      0.20
> 01:11:01     351832     507     0.24 4252446106        0      0.00        0      0.00        2      0.20
> 01:16:01     351833     507     0.24 4252446106        0      0.00        0      0.00        2      0.20
> 01:21:01     351834     467     0.22 4252446106        0      0.00        0      0.00        2      0.20
> 01:26:01     351835     508     0.24 4252446106        0      0.00        0      0.00        2      0.20
> 01:31:01 4294965457     461     0.22 4251971351        0      0.00        0      0.00        2      0.20
> 01:36:01 4294965457     460     0.22 4251971351        0      0.00        0      0.00        2      0.20
> 01:41:01 4294965461     459     0.22 4251971356        0      0.00        0      0.00        2      0.20
> *
> * deleted to save bandwidth
> *
> 03:36:01 4294966740     509     0.24 4251971696        0      0.00        0      0.00        2      0.20
> 03:41:01 4294966741     508     0.24 4251971696        0      0.00        0      0.00        2      0.20
> 03:46:01 4294965710     468     0.22 4251971527        0      0.00        0      0.00        2      0.20
> 03:51:01 4294965736     507     0.24 4251971527        0      0.00        0      0.00        2      0.20
> 03:56:01 4294965752     509     0.24 4251971539        0      0.00        0      0.00        2      0.20
> 04:01:00 4294965763     508     0.24 4251971546        0      0.00        0      0.00        2      0.20
> 04:06:00     227450     470     0.22 4252135348        0      0.00        0      0.00        2      0.20
> 04:11:00     227935     470     0.22 4251950501        0      0.00        0      0.00        2      0.20
> 04:16:01     203080     472     0.23 4251887721
>
> And another host (note how fast it happens, it's not a gradual build up).
>
> 00:01:00  dentunusd file-sz %file-sz   inode-sz super-sz %super-sz dquot-sz %dquot-sz rtsig-sz %rtsig-sz
> *
> * boring stuff edited out
> *
> 04:50:59      64932     992     0.95      61428        0      0.00        0      0.00        2      0.20
> 04:55:59      64947     992     0.95      61442        0      0.00        0      0.00        2      0.20
> 05:01:01      64970     992     0.95      61461        0      0.00        0      0.00        2      0.20
> 05:06:01      65098     983     0.94      61312        0      0.00        0      0.00        2      0.20
> 05:11:01      65121     983     0.94      61314        0      0.00        0      0.00        2      0.20
> 05:16:01         68     977     0.93 4294960298        0      0.00        0      0.00        3      0.29
> 05:21:01        622     992     0.95 4294960717        0      0.00        0      0.00        2      0.20
> 05:26:01       1252    1153     1.10 4294961116        0      0.00        0      0.00        1      0.10
> 05:31:01       1500    1175     1.12 4294961376        0      0.00        0      0.00        1      0.10
> 05:36:01       1499    1160     1.11 4294961380        0      0.00        0      0.00        1      0.10
> 05:41:01       1500    1160     1.11 4294961380        0      0.00        0      0.00        1      0.10
> 05:46:01       1503    1175     1.12 4294961376        0      0.00        0      0.00        1         0
>
> One very strange thing, the average line from sar for the last one is
>
> Average:       9306     842     0.80       4298        0      0.00        0      0.00        1      0.10
>
> But given that the sar file only has less than 50% of inode-sz values less
> than 4 billion I'm a little perplexed by this line.
>
> _______________________________________________
> Valhalla-list mailing list
> Valhalla-list@redhat.com
> https://listman.redhat.com/mailman/listinfo/valhalla-list
>

--
72, Karl K5DI

_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users
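
One possible way to read the multi-billion values in the sar listings above: they all sit just below 2^32 (4294967296), which is what a small negative number looks like when a 32-bit counter is printed as unsigned. That the counters really are signed 32-bit values that went negative is an assumption here, not something the sar output states. The helper name `as_signed32` is made up for illustration. A minimal sketch of the arithmetic:

```python
# Reinterpret sar counters as signed 32-bit integers.
# ASSUMPTION: the kernel counters are 32-bit and went negative; the sar
# output itself does not say this. as_signed32 is a hypothetical helper.

def as_signed32(value: int) -> int:
    """Return value reinterpreted as a two's-complement 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value

# Samples copied from the two sar listings in the quoted message.
samples = {
    "host 1 dentunusd 01:31:01": 4294965457,
    "host 1 inode-sz  01:31:01": 4251971351,
    "host 2 inode-sz  05:16:01": 4294960298,
    "host 2 dentunusd 05:11:01": 65121,
}

for label, raw in samples.items():
    print(f"{label}: raw={raw} signed={as_signed32(raw)}")
```

Under that reading, the host 1 dentunusd value at 01:31:01 would be -1839 and the host 2 inode-sz value at 05:16:01 would be -6998. If sar averages these values as signed numbers, the negative samples would pull the inode-sz average down to a small figure, which might be one explanation for the Average line, but that is a guess.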