Re: redhat 9 dies randomly

"kalin mintchev" <kalin@xxxxxx> · Tue, 10 Aug 2004 06:06:03 -0400 (EDT)

it's kinda good to know that i'm not alone. but the fact that there isn't
any clear solution to this is a bit worrisome.

i doubt it's hardware either. i kinda suspect the problem is the cpu load
and how the system manages it. the same machine ran qmail on redhat 7.2
for months before i put a new disk in it and rebuilt it with redhat 9.
back then it didn't have spamd, and it didn't have squirrelmail, which
executes too many imap commands every time it refreshes the main inbox
page.
i've seen 85 - 90% cpu load just from refreshing SM's inbox, for about
5 - 6 seconds in a row. it works with courier-imap, and it's not the imap
server, because with any other client the cpu load is between 0.1 and 3%.
the load with SM is mostly folder filtering commands...
so i think it has something to do with memory management when the cpu is
at almost 100% usage (imap + spamd): some threshold where the system just
freaks out and crashes. that's why i was wondering if there is a way to
know what the cpu and memory usage were at the time of the crash...
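
one way to get at this, as a rough sketch rather than a tested fix: a
small logger that samples /proc/loadavg and /proc/meminfo every few
seconds and fsyncs each line to disk, so the last entries written before a
crash should still be there after the reboot. the log path, the interval
and all the function names below are made up for illustration, and it
assumes python 3, which a stock redhat 9 box may not have.

```python
import os
import time


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def parse_meminfo(text):
    """Return the 'Name: N kB' entries of /proc/meminfo as a dict of ints (kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only lines shaped like "MemFree:  4200 kB"; skip anything else.
        if sep and len(parts) >= 2 and parts[0].isdigit() and parts[1] == "kB":
            info[name.strip()] = int(parts[0])
    return info


def format_sample(stamp, loadavg_text, meminfo_text):
    """Build one log line from a timestamp string and the two /proc snapshots."""
    load1, load5, load15 = parse_loadavg(loadavg_text)
    mem = parse_meminfo(meminfo_text)
    return "%s load=%.2f/%.2f/%.2f memfree_kb=%s swapfree_kb=%s\n" % (
        stamp, load1, load5, load15,
        mem.get("MemFree", "?"), mem.get("SwapFree", "?"))


def run(path="/var/log/loadmem.log", interval=5):
    """Append one sample every `interval` seconds (hypothetical path/interval)."""
    with open(path, "a") as log:
        while True:
            with open("/proc/loadavg") as f:
                loadavg_text = f.read()
            with open("/proc/meminfo") as f:
                meminfo_text = f.read()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(format_sample(stamp, loadavg_text, meminfo_text))
            # Push the line to disk now, so it is not lost in a buffer
            # if the machine dies a moment later.
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

calling run() from a root shell or an init script would start the logging.
after the next crash, the tail of the log would show the load averages and
free memory from a few seconds before the machine went down.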

and i don't run dyndns - it's BIND...

thanks...

> I posted a question yesterday about bad blocks in an xfs file system.
> to summarize: I have a 1 TB xfs filesystem, and the kernel complains about
> unreadable sectors on the HDs. if I run the low-level tools (from western
> or maxtor), the disks are ok. when I rebuild the fs, everything is fine (I
> do a complete soft scan by dd'ing the whole fs to /dev/null). after a few
> days, I get new unreadable-sector errors again.
>
> two things make me think we have the same problem:
> - my machine also died unexpectedly (and rebooted due to some configuration
> option somewhere) during the last days before I reformatted the disks
> - the errors (including reboots) seem to occur at moments when the system
> load is very high
>
> obviously this is NOT a h/w problem, because I actually have 2 identical
> machines which behave the same
>
> also, these machines ARE heavily loaded: they provide 1 TB of file space
> for nightly backups, and can have a sustained io load of 20 MB/sec for hours
>
> my current idea is to reduce io speed on the disks (either by deactivating
> UDMA, just lowering the UDMA level, or forcing it by using 40-wire cables
> instead of 80-wire ones)
>
> I'll let you know
> Please tell me what you're doing and what results you get
>
> At 20:16 09/08/2004 -0400, you wrote:
>>> What exactly do you mean by dying?
>>crashing....  stops any processing...
>>
>>> Try turning off DMA on all your disk drives.  It craps
>>> out performance a bit, but if you have/had multiple ide devices and one
>>> of them doesn't like DMA it can make your machine just hang.
>>
>>ok. anything else besides turning these 2 off?
>>CONFIG_BLK_DEV_IDEDMA_PCI
>>CONFIG_BLK_DEV_IDEDMA
>>
>>>
>>> Wayner
>>
>>what about this:
>>also is there a way to find out what the cpu load or memory usage was at
>>the time of the crash?
>
> 			- * - * - * - * - * - * -
> Of course I'm a perfectionist!
> But couldn't I be a better one?
> 	Thierry ITTY
> eMail : Thierry.Itty@xxxxxxxxxxxx		FRANCE
>
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
>

--
Software is like sex: It's better when it's free. (Linus Torvalds)
