Date: Fri, 4 Feb 2011 20:47:27 +0100
From: Imed Chihi <imed.chihi@xxxxxxxxx>
To: redhat-list@xxxxxxxxxx
Subject: Re: RHEL4 Sun Java Messaging Server deadlock (was: redhat-list Digest, Vol 84, Issue 3)
Message-ID: <AANLkTikZOdvX-tTkd0Z133bJTh+ooqWQp50A+eaHKgX+@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset=UTF-8
all_unreclaimable is a flag which, when set, tells the virtual memory
daemons not to bother scanning pages in the zone in question in order
to try to free memory. Anyway, the DMA zone is so tiny (16MB) that it
cannot possibly have any effect on a 32GB machine.
By the way, there seems to be plenty of free HighMem memory, so the
problem cannot possibly be due to overcommit.
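To put numbers on the two points above, here is a small Python sketch
using the per-zone present: and free: figures (in kB) from the Apr 26
dump quoted below. The ZONES table and zone_report() are illustrative
names only, not anything from the kernel.

```python
# Per-zone figures (kB) copied from the "present:" and "free:" fields
# of the Apr 26 Mem-info dump in this thread.
ZONES = {
    "DMA": {"present": 16384, "free": 12528},
    "Normal": {"present": 4014080, "free": 29904},
    "HighMem": {"present": 31621120, "free": 20896192},
}


def zone_report(zones):
    """Return total RAM (kB) and, per zone, its share of RAM and its free fraction (%)."""
    total = sum(z["present"] for z in zones.values())
    rows = []
    for name, z in zones.items():
        share = 100.0 * z["present"] / total     # zone size relative to all RAM
        free = 100.0 * z["free"] / z["present"]  # how much of the zone is free
        rows.append((name, share, free))
    return total, rows


if __name__ == "__main__":
    total_kb, rows = zone_report(ZONES)
    print(f"total present: {total_kb} kB ({total_kb / 1024 / 1024:.1f} GiB)")
    for name, share, free in rows:
        print(f"{name:8s} {share:7.3f}% of RAM, {free:5.1f}% of zone free")
```

With these figures the DMA zone is well under 0.1% of RAM, while
roughly two thirds of HighMem is free.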
Based on the above, I could suggest two theories to explain what's happening:
1. You have Normal zone starvation
Try setting vm.lower_zone_protection to something large enough, like 100 MB:
sysctl -w vm.lower_zone_protection=100
If this theory is correct, then the setting should fix the issue.
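As a rough illustration of what the tunable does, the sketch below
models the zone fallback check as a simplified version of the 2.6-era
page allocator logic: a zone is used for an allocation only if its
free pages reach pages_low + 2^order + protections[alloc_type]. This
is an assumption about the RHEL4 code path, not a copy of it. The
Normal zone numbers come from the Apr 26 dump below, where the
protections[] entry for HighMem allocations is 25600 pages, i.e.
100 MB at 4 kB pages.

```python
PAGE_KB = 4  # page size on i386

# Column order of protections[] in the dump: DMA, Normal, HighMem allocations.
ALLOC_INDEX = {"DMA": 0, "Normal": 1, "HighMem": 2}


def may_allocate_from(zone, alloc_type, order=0):
    """Simplified model (an assumption, not the RHEL4 source): the allocator
    takes pages from `zone` only if its free pages reach
    pages_low + 2**order + protections[alloc_type]."""
    threshold = (zone["low_pages"] + (1 << order)
                 + zone["protections"][ALLOC_INDEX[alloc_type]])
    return zone["free_pages"] >= threshold


# Normal zone from the Apr 26 dump, converted from kB to pages.
NORMAL = {
    "free_pages": 29904 // PAGE_KB,  # 7476 pages
    "low_pages": 15952 // PAGE_KB,   # 3988 pages
    "protections": [0, 0, 25600],    # as printed under the Normal line
}

if __name__ == "__main__":
    for kind in ("Normal", "HighMem"):
        print(kind, "allocation may use Normal zone:", may_allocate_from(NORMAL, kind))
```

Under this model, Normal-zone allocations still succeed (7476 free
pages against a threshold of 3989), while HighMem allocations can no
longer fall back into the Normal zone (threshold 29589).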
2. You have a pagecache flushing storm
A large amount of dirty pages from the IO of large data sets would stall
the system while being synced to disk. This typically occurs once
the pagecache has grown very large. Mounting the filesystem in sync
mode (mount -o remount,sync /dev/device) would "fix" the issue.
Synchronous IO is painfully slow, but the test would at least tell
where the problem is. If this turns out to be the problem, then we
could think of other, less annoying options for a bearable fix.
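Before resorting to a sync mount, a cheaper way to look for a flushing
storm may be to watch the Dirty and Writeback counters in /proc/meminfo
around the time of a stall. A minimal Python sketch, assuming the
kernel exposes those two fields (2.6 kernels do); parse_meminfo() and
dirty_summary() are hypothetical helper names. For scale, the Apr 26
dump below reports dirty:6213 pages, about 24 MB at 4 kB pages.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.
    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def dirty_summary(values):
    """Return (Dirty, Writeback) in kB; raises KeyError if the kernel lacks them."""
    return values["Dirty"], values["Writeback"]


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    # Take one sample; run this repeatedly (e.g. every second) around a stall.
    with open("/proc/meminfo") as f:
        dirty_kb, writeback_kb = dirty_summary(parse_meminfo(f.read()))
    print(f"Dirty: {dirty_kb} kB  Writeback: {writeback_kb} kB")
```

A storm would show up as Dirty climbing into the gigabytes and then
Writeback spiking while the machine stalls.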
Good luck,
-Imed
It's baack...
/proc/sys/vm/lower_zone_protection:100
I don't think running for two months with synchronous I/O is an option.
Apr 26 11:08:26 myysumail kernel: cpu 23 hot: low 32, high 96, batch 16
Apr 26 11:08:26 myysumail kernel: cpu 23 cold: low 0, high 32, batch 16
Apr 26 11:08:26 myysumail kernel:
Apr 26 11:08:26 myysumail kernel: Free pages: 20938624kB (20896192kB HighMem)
Apr 26 11:08:26 myysumail kernel: Active:1158012 inactive:1741131 dirty:6213 writeback:1 unstable:0 free:5234656 slab:162944 mapped:345466 pagetables:6398
Apr 26 11:08:26 myysumail kernel: DMA free:12528kB min:32kB low:64kB high:96kB active:0kB inactive:0kB present:16384kB pages_scanned:0 all_unreclaimable? yes
Apr 26 11:08:26 myysumail kernel: protections[]: 0 398800 424400
Apr 26 11:08:26 myysumail kernel: Normal free:29904kB min:7976kB low:15952kB high:23928kB active:520188kB inactive:565376kB present:4014080kB pages_scanned:0 all_unreclaimable? no
Apr 26 11:08:26 myysumail kernel: protections[]: 0 0 25600
Apr 26 11:08:26 myysumail kernel: HighMem free:20896192kB min:512kB low:1024kB high:1536kB active:4111860kB inactive:6399148kB present:31621120kB pages_scanned:0 all_unreclaimable? no
Apr 26 11:08:26 myysumail kernel: protections[]: 0 0 0
Apr 26 11:08:26 myysumail kernel: DMA: 4*4kB 6*8kB 3*16kB 2*32kB 3*64kB 3*128kB 2*256kB 0*512kB 1*1024kB 1*2048kB 2*4096kB = 12528kB
Apr 26 11:08:26 myysumail kernel: Normal: 3830*4kB 717*8kB 85*16kB 8*32kB 11*64kB 7*128kB 6*256kB 4*512kB 0*1024kB 1*2048kB 0*4096kB = 29904kB
Apr 26 11:08:26 myysumail kernel: HighMem: 2904*4kB 1080*8kB 482*16kB 76*32kB 23694*64kB 7005*128kB 9663*256kB 3151*512kB 705*1024kB 86*2048kB 3288*4096kB = 20896192kB
Apr 26 11:08:26 myysumail kernel: 2632760 pagecache pages
Apr 26 11:08:26 myysumail kernel: Swap cache: add 0, delete 0, find 0/0, race 0+0
Apr 26 11:08:26 myysumail kernel: 0 bounce buffer pages
Apr 26 11:08:26 myysumail kernel: Free swap: 16777208kB
Apr 26 11:08:26 myysumail kernel: 8912896 pages of RAM
Apr 26 11:08:26 myysumail kernel: 7864320 pages of HIGHMEM
Apr 26 11:08:26 myysumail kernel: 597583 reserved pages
Apr 26 11:08:26 myysumail kernel: 1548546 pages shared
Apr 26 11:08:26 myysumail kernel: 0 pages swap cached
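Each buddy-allocator line in the dump above lists free blocks per
order as count*size, and the products should add up to the total after
the "=". A small Python sketch to check that and to report the largest
free contiguous block in a zone; buddy_totals() and
largest_free_block_kb() are illustrative names. For the Normal zone the
largest free block is 2048kB, and most of its free memory sits in 4kB
and 8kB pieces.

```python
import re

# Matches one "count*sizekB" term of a buddy free-list line.
TERM = re.compile(r"(\d+)\*(\d+)kB")


def buddy_totals(line):
    """Return (computed_kB, reported_kB) for a line like
    'Normal: 3830*4kB 717*8kB ... = 29904kB'."""
    lhs, _, rhs = line.partition("=")
    computed = sum(int(count) * int(size) for count, size in TERM.findall(lhs))
    reported = int(re.search(r"(\d+)kB", rhs).group(1))
    return computed, reported


def largest_free_block_kb(line):
    """Size of the biggest block order that still has at least one free block."""
    lhs = line.partition("=")[0]
    sizes = [int(size) for count, size in TERM.findall(lhs) if int(count) > 0]
    return max(sizes, default=0)


if __name__ == "__main__":
    normal = ("Normal: 3830*4kB 717*8kB 85*16kB 8*32kB 11*64kB 7*128kB "
              "6*256kB 4*512kB 0*1024kB 1*2048kB 0*4096kB = 29904kB")
    print(buddy_totals(normal), largest_free_block_kb(normal))
```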
205 205 TS - 0 24 22 0.0 D start_this_handle pdflush
5021 5021 TS - 0 24 8 0.0 D journal_commit_trans kjournald
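The two lines above look like ps thread output with STAT and WCHAN
columns (the exact ps options used are not shown). A rough Python
equivalent walks /proc and lists every thread in uninterruptible sleep
(state D) with its wait channel, which may help spot user threads
blocked alongside pdflush and kjournald. It assumes the Linux 2.6
per-thread layout /proc/<pid>/task/<tid>/stat and .../wchan; the helper
names are hypothetical, and threads that exit mid-scan are skipped.

```python
import os


def parse_stat(text):
    """Return (tid, comm, state) from a /proc/<pid>/task/<tid>/stat line.
    comm is wrapped in parentheses and may contain spaces or ')',
    so split on the last ')'."""
    lparen = text.index("(")
    rparen = text.rindex(")")
    tid = int(text[:lparen].strip())
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return tid, comm, state


def read_text(path):
    with open(path) as f:
        return f.read().strip()


def d_state_threads(proc="/proc"):
    """Yield (pid, tid, comm, wchan) for threads in uninterruptible sleep (D)."""
    for pid in filter(str.isdigit, os.listdir(proc)):
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            base = os.path.join(task_dir, tid)
            try:
                _, comm, state = parse_stat(read_text(os.path.join(base, "stat")))
                if state != "D":
                    continue
                wchan = read_text(os.path.join(base, "wchan"))
            except (OSError, ValueError, IndexError):
                continue  # thread vanished or file unreadable
            yield int(pid), int(tid), comm, wchan


if __name__ == "__main__" and os.path.isdir("/proc"):
    for row in d_state_threads():
        print(*row)
```

Running it a few times during a hang shows which threads stay in D
state and where they are waiting.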
PID: 205 TASK: 81515830 CPU: 22 COMMAND: "pdflush"
#0 [814f3d44] rwsem_down_read_failed at 22d49de
#1 [814f3d98] add_wait_queue_exclusive at 2120dbb
#2 [814f3e5c] dio_bio_end_io at 217b638
#3 [814f3ed0] __pdflush at 21461be
#4 [814f3ee4] sync_sb_inodes at 2179f67
#5 [814f3f28] mpage_end_io_read at 217a485
#6 [814f3f38] dirty_writeback_centisecs_handler at 2145ab1
#7 [814f3f40] do_IRQ at 2107e0a
#8 [814f3f9c] __pdflush at 21462b5
#9 [814f3fd0] kthread_create at 2134227
#10 [814f3ff0] kernel_thread_helper at 21041f3
PID: 5021 TASK: 7fd2adf0 CPU: 8 COMMAND: "kjournald"
#0 [7e85fd84] rwsem_down_read_failed at 22d49de
#1 [7e85fd90] finish_wait at 2120ea5
#2 [7e85fda0] scheduler_tick at 211f273
#3 [7e85fdd8] add_wait_queue_exclusive at 2120dbb
#4 [7e85fe98] find_busiest_group at 211e8f2
#5 [7e85ff04] rwsem_down_read_failed at 22d4a09
#6 [7e85ff0c] finish_wait at 2120ea5
#7 [7e85ff1c] scheduler_tick at 211f273
#8 [7e85ff54] del_timer_sync at 212a271
#9 [7e85ffa4] schedule_tail at 211e12c
#10 [7e85fff0] kernel_thread_helper at 21041f3
If one of the user threads is involved, how can I identify it?
Thanks,
John
--
redhat-list mailing list
https://www.redhat.com/mailman/listinfo/redhat-list