Re: RHEL4 Sun Java Messaging Server deadlock (was: redhat-list Digest, Vol 84, Issue 3)

> Date: Wed, 02 Feb 2011 13:14:02 -0500
> From: John Dalbec <jpdalbec@xxxxxxx>
> To: redhat-list@xxxxxxxxxx
> Subject: Re: RHEL4 Sun Java Messaging Server deadlock
> Message-ID: <4D499EEA.5000002@xxxxxxx>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 1/6/2011 12:00 PM, redhat-list-request@xxxxxxxxxx wrote:
>> Message: 3
>> Date: Wed, 5 Jan 2011 22:37:29 +0100
>> From: Imed Chihi <imed.chihi@xxxxxxxxx>
>> To:redhat-list@xxxxxxxxxx
>> Subject: Re: RHEL4 Sun Java Messaging Server deadlock (was:
>>    redhat-list Digest, Vol 83, Issue 3)
>>> >  Date: Tue, 04 Jan 2011 12:46:47 -0500
>>> >  From: John Dalbec <jpdalbec@xxxxxxx>
>>> >  To: redhat-list@xxxxxxxxxx
>>> >  Subject: Re: redhat-list Digest, Vol 83, Issue 2
>>> >
>>> >  I got Alt+SysRq+t output, but it looks corrupted in /var/log/messages.
>>> >  I suspect that syslogd couldn't keep up with klogd and the ring buffer
>>> >  wrapped.  The system is not starved for CPU, but then I have 24 cores.
>>> >  If 32-bit + >16GB is trouble then why does the kernel-hugemem package
>>> >  even exist?  Or is that actually a 64-bit kernel?
>> The kernel-hugemem is a 32-bit-only kernel.  When RHEL 4 was released
>> (around February 2005), 64-bit systems were not yet ubiquitous, and it
>> still made sense to run a 32-bit OS on the typical 8 GB servers.
>>
>> The context has changed since then, and Red Hat no longer supports
>> more than 16 GB on 32-bit platforms as of RHEL 5.
>>
>> The output from Alt+SysRq+m should always fit in the kernel log buffer.
>> Try collecting that one to be certain that the issue is related to VM
>> management.  However, I'd seriously suggest moving to RHEL 4 for
>> x86_64, as you should still be able to run your 32-bit application,
>> minus a whole class of VM hassles.
>>
>>  -Imed
>
> The SysRq-m output had some per-cpu stuff that I don't trust because
> every core was reporting the same numbers.  The system-wide information
> follows.  Under DMA it says "all_unreclaimable? yes" but there appears
> to be plenty of free space.  Do you see any problems?
> Thanks,
> John
>
> Free pages:      26020480kB (25984832kB HighMem)
> Active:719264 inactive:873204 dirty:2982 writeback:0 unstable:0
> free:6505120 slab:198687 mapped:367369 pagetables:6741
> DMA free:12528kB min:32kB low:64kB high:96kB active:0kB inactive:0kB
> present:16384kB pages_scanned:0 all_unreclaimable? yes
> protections[]: 0 0 0
> Normal free:23120kB min:7976kB low:15952kB high:23928kB active:443852kB
> inactive:505268kB present:4014080kB pages_scanned:0 all_unreclaimable? no
> protections[]: 0 0 0
> HighMem free:25984832kB min:512kB low:1024kB high:1536kB
> active:2433204kB inactive:2987548kB present:31621120kB pages_scanned:0
> all_unreclaimable? no
> protections[]: 0 0 0
> DMA: 4*4kB 6*8kB 3*16kB 2*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB
> 1*2048kB 2*4096kB = 12528kB
> Normal: 1982*4kB 867*8kB 22*16kB 7*32kB 4*64kB 2*128kB 2*256kB 1*512kB
> 2*1024kB 2*2048kB 0*4096kB = 23120kB
> HighMem: 6*4kB 8975*8kB 22157*16kB 25695*32kB 44932*64kB 21426*128kB
> 9300*256kB 4706*512kB 3330*1024kB 1529*2048kB 1901*4096kB = 25984832kB
> 1293500 pagecache pages
> Swap cache: add 0, delete 0, find 0/0, race 0+0
> 0 bounce buffer pages
> Free swap:       16777208kB
> 8912896 pages of RAM
> 7864320 pages of HIGHMEM
> 597583 reserved pages
> 719482 pages shared
> 0 pages swap cached

all_unreclaimable is a flag which, when set, tells the virtual memory
daemons not to bother scanning pages in the zone in question when trying
to free memory.  In any case, the DMA zone is so tiny (16 MB) that it
cannot possibly have any noticeable effect on a 32 GB machine.
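
If you want to keep an eye on the per-zone state while the problem builds
up, here is a rough sketch (assuming magic SysRq support is compiled in and
/proc/buddyinfo is present on your 2.6.9 kernel; adjust as needed):

  # allow magic SysRq through /proc (it may already be enabled)
  sysctl -w kernel.sysrq=1

  # per-zone free-page buckets: one line each for DMA, Normal and HighMem
  cat /proc/buddyinfo

  # dump the same zone summary as above into the kernel log, then read it
  echo m > /proc/sysrq-trigger
  dmesg | tail -40

Running that every few minutes around the time of a stall should show
whether the Normal zone is the one that gets exhausted.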

By the way, there seems to be plenty of free HighMem, so the problem
cannot be due to memory overcommit.

Based on the above, I could suggest two theories to explain what's happening:

1. You have Normal (low) zone starvation.
Try setting vm.lower_zone_protection to something large enough, e.g. 100 MB:
sysctl -w vm.lower_zone_protection=100
If this theory is correct, the setting should fix the issue; a short sketch
for verifying and persisting it follows below.

2. You have a pagecache flushing storm.
A huge amount of dirty pages from the I/O of large data sets would stall
the system while being synced to disk.  This typically occurs once the
pagecache has grown to a significant size.  Mounting the filesystem in
sync mode (mount -o remount,sync /dev/device) would "fix" the issue.
Synchronous I/O is painfully slow, but the test would at least tell you
where the problem is; a sketch follows below.  If this turns out to be
the problem, we can then think of other, less annoying options for a
bearable fix.
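
For theory 1, a rough sketch for applying, verifying and persisting the
setting (assuming the vm.lower_zone_protection sysctl is exposed by your
RHEL 4 kernel and that 100 is a sensible value, as suggested above):

  # apply immediately
  sysctl -w vm.lower_zone_protection=100

  # confirm the running value
  sysctl vm.lower_zone_protection

  # keep it across reboots
  echo "vm.lower_zone_protection = 100" >> /etc/sysctl.conf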
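
For theory 2, a rough sketch of the test and of what to watch while a
stall is happening (the device and mount point are placeholders, and the
dirty-ratio knobs at the end are only one possible gentler follow-up, not
something I have verified against your setup):

  # remount the filesystem holding the message store synchronously (test only)
  mount -o remount,sync /dev/device /mountpoint

  # watch dirty and writeback pages during a stall
  watch -n 5 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

  # possible gentler knobs for later: start flushing earlier, in smaller batches
  sysctl -w vm.dirty_background_ratio=2
  sysctl -w vm.dirty_ratio=10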

Good luck,

 -Imed

-- 
Imed Chihi
http://perso.hexabyte.tn/ichihi/

-- 
redhat-list mailing list
unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
https://www.redhat.com/mailman/listinfo/redhat-list


