On 9/11/2009 12:43 PM, Noriko Hosoi wrote:
On 09/10/2009 07:46 PM, Kevin Bowling wrote:
Hi,
I have been running FDS/389 on an F11 Xen DomU for several months. I
use it as the backend for UNIX usernames/passwords and also for
Redmine (a Ruby on Rails bug tracker) for http://www.gnucapplus.org/.
This VM would regularly lock up every week or so when 389 was still
called FDS. I've since upgraded to 389 by issuing 'yum upgrade' as
well as running the 'setup-...-.pl -u' script and now it barely goes
a day before crashing. When ldap crashes, the whole box basically
becomes unresponsive.
I left the Xen hardware console open to see what was up, and the only
thing I could conclude was that 389 was crashing (if I issued a
service start it came back to life). Running anything, even top or ls,
completely hangs the box. Likewise, the logs show nothing at or
before the time of the crash. I suspected too few file descriptors, but
raising the limit to a very high number had no impact.
I was about to do a rip and replace with OpenLDAP, which I use very
successfully for our corporate systems, but figured I ought to see if
anyone here can help or if I can submit any kind of meaningful bug
report first. I assume I will need to run 389's slapd without
daemonizing it and hope it spits something useful out to stderr. Any
advice here would be greatly appreciated, as would any success
stories of using 389 on F11.
Hello Kevin,
You specified the platform "F11 xen DomU". Did you have a chance to
run the 389 server on any other platform? I'm wondering whether the crash
is observed only on that specific platform. Is the server
running on a 64-bit or a 32-bit machine?
If you start the server with the "-d 1" option, the server will run in
trace mode. (E.g., /usr/lib[64]/dirsrv/slapd-YOURID/start-slapd -d 1)
I'm afraid it might be a memory leak. After you restart the 389
server, could you check the size of ns-slapd periodically, say every hour,
and see whether it keeps growing or levels off? Also, the server
quits if it fails to write to the errors log. If that happens, it's
logged in the system log. Does the messages file on the system
happen to have any entries related to the 389 server?
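The hourly size check suggested above could be scripted roughly as follows. This is only a sketch, assuming a Linux /proc filesystem where /proc/<pid>/status carries "Name:" and "VmRSS:" lines; the function names (parse_status, sample, is_growing, monitor) are made up for illustration and are not part of 389.

```python
import glob
import re
import time


def parse_status(status_text):
    """Return (process name, VmRSS in kB or None) from /proc/<pid>/status text."""
    name = None
    rss_kb = None
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            match = re.search(r"(\d+)\s*kB", line)
            if match:
                rss_kb = int(match.group(1))
    return name, rss_kb


def sample(process_name="ns-slapd"):
    """Return {pid: resident size in kB} for every process with the given name."""
    sizes = {}
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as handle:
                name, rss_kb = parse_status(handle.read())
        except OSError:
            continue  # the process exited between listing and reading
        if name == process_name and rss_kb is not None:
            # path looks like /proc/<pid>/status, so element 2 is the pid
            sizes[int(path.split("/")[2])] = rss_kb
    return sizes


def is_growing(samples, min_samples=3):
    """True if there are enough samples and each is larger than the one before."""
    if len(samples) < min_samples:
        return False
    return all(later > earlier for earlier, later in zip(samples, samples[1:]))


def monitor(interval_seconds=3600, process_name="ns-slapd"):
    """Print a timestamped sample forever; the interval is an assumed default."""
    while True:
        print(time.strftime("%Y-%m-%d %H:%M:%S"), sample(process_name))
        time.sleep(interval_seconds)
```

Calling monitor() from a terminal and leaving it running gives one line per hour. A resident size that rises in every sample would be consistent with a leak, while one that levels off would point elsewhere.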
Thanks,
--noriko
I'm not subscribed to the list so please CC.
Regards,
Kevin Bowling
It was stable for 17 days while running with debug enabled to console.
I upgraded to the F11 2.6.30 kernel rebase, and now I get some debugging
info on the console. I'm taking a wild guess that it is timing
related. Where should I place a bug report?
Regards,
Kevin
[root@buildbox-a2 ~]# xm console 8
INFO: task kjournald:61 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald D ffff88003e932000 0 61 2
ffff88003e919d40 0000000000000246 ffffffff8100e45c 0000000000000000
000000001cee5db8 ffff88003e919d20 ffffffff8100ee82 0000000000000202
ffff88003e9c83a8 000000000000e2e8 ffff88003e9c83a8 0000000000012d00
Call Trace:
[<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
[<ffffffff8100ee82>] ? check_events+0x12/0x20
[<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
[<ffffffff81496bf6>] schedule+0x21/0x49
[<ffffffff811b8b33>] journal_commit_transaction+0x13d/0xe42
[<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
[<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
[<ffffffff810632bc>] ? try_to_del_timer_sync+0x69/0x87
[<ffffffff811bcdf7>] kjournald+0xfd/0x253
[<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
[<ffffffff811bccfa>] ? kjournald+0x0/0x253
[<ffffffff81070709>] kthread+0x6d/0xae
[<ffffffff8101313a>] child_rip+0xa/0x20
[<ffffffff81012afd>] ? restore_args+0x0/0x30
[<ffffffff81013130>] ? child_rip+0x0/0x20
INFO: task ns-slapd:1034 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ns-slapd D ffffc20000000000 0 1034 1
ffff88003dd87908 0000000000000282 ffff88003dd87868 ffffffff8100ed0d
ffff88003dd86000 00000000e59205a0 ffff88003dd87888 ffffffff8107957a
ffff88003d4fe0e8 000000000000e2e8 ffff88003d4fe0e8 0000000000012d00
Call Trace:
[<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x32
[<ffffffff8107957a>] ? clocksource_read+0x22/0x38
[<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d
[<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
[<ffffffff81496bf6>] schedule+0x21/0x49
[<ffffffff81496c62>] io_schedule+0x44/0x6c
[<ffffffff8113b141>] sync_buffer+0x53/0x6b
[<ffffffff81497294>] __wait_on_bit_lock+0x55/0xb2
[<ffffffff810d2d1f>] ? find_get_page+0x64/0xa3
[<ffffffff8149736e>] out_of_line_wait_on_bit_lock+0x7d/0x9c
[<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
[<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a
[<ffffffff8113b380>] __lock_buffer+0x3d/0x53
[<ffffffff811b6eda>] lock_buffer+0x49/0x64
[<ffffffff811b7a15>] do_get_write_access+0x82/0x3f3
[<ffffffff811bbdb3>] ? journal_add_journal_head+0xce/0x162
[<ffffffff811b7dc0>] journal_get_write_access+0x3a/0x65
[<ffffffff8118c209>] __ext3_journal_get_write_access+0x34/0x74
[<ffffffff8117e464>] ext3_reserve_inode_write+0x50/0xaa
[<ffffffff8117e50d>] ext3_mark_inode_dirty+0x4f/0x80
[<ffffffff8117e6b8>] ext3_dirty_inode+0x79/0xa7
[<ffffffff81135095>] __mark_inode_dirty+0x45/0x190
[<ffffffff81129603>] file_update_time+0xc0/0x113
[<ffffffff810eb167>] do_wp_page+0x610/0x658
[<ffffffff8100bc21>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
[<ffffffff810eccd9>] handle_mm_fault+0x6a2/0x72e
[<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
[<ffffffff8149be99>] do_page_fault+0x226/0x24f
[<ffffffff81499965>] page_fault+0x25/0x30
INFO: task ns-slapd:1040 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ns-slapd D ffff88003e932024 0 1040 1
ffff88003bc119f8 0000000000000282 ffffffff8100e45c ffffc20000025410
00000000f1efb74c ffff88003bc119d8 ffffffff8100ee82 0000000000000004
ffff88003bc0b248 000000000000e2e8 ffff88003bc0b248 0000000000012d00
Call Trace:
[<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
[<ffffffff8100ee82>] ? check_events+0x12/0x20
[<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
[<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
[<ffffffff8100ee82>] ? check_events+0x12/0x20
[<ffffffff81496bf6>] schedule+0x21/0x49
[<ffffffff811b841d>] start_this_handle+0x2d4/0x373
[<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
[<ffffffff811b865d>] journal_start+0xb7/0x106
[<ffffffff81187903>] ext3_journal_start_sb+0x62/0x78
[<ffffffff8117d60b>] ext3_journal_start+0x28/0x3e
[<ffffffff8117e67d>] ext3_dirty_inode+0x3e
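For a bug report it may help to condense console output like the above into one entry per blocked task. The following is a rough sketch, not an established tool: it assumes the hung-task message format shown in this output, and the names HUNG_TASK, FRAME and summarize are hypothetical. It keeps only the frames the kernel did not mark with "?".

```python
import re

# "INFO: task <name>:<pid> blocked for more than <n> seconds."
HUNG_TASK = re.compile(r"INFO: task (.+?):(\d+) blocked for more than (\d+) seconds\.")
# "[<address>] function+0x..." with an optional "? " marking an unreliable frame
FRAME = re.compile(r"\[<[0-9a-f]+>\] (\? )?([A-Za-z_][\w.]*)\+0x")


def summarize(console_text):
    """Map (task name, pid) to the list of frame names not marked with '?'."""
    reports = {}
    current = None
    for line in console_text.splitlines():
        hung = HUNG_TASK.search(line)
        if hung:
            current = (hung.group(1), int(hung.group(2)))
            reports[current] = []  # a repeated report replaces the earlier one
            continue
        frame = FRAME.search(line)
        if frame and current is not None and not frame.group(1):
            reports[current].append(frame.group(2))
    return reports
```

Fed the console output above, this would list kjournald and both ns-slapd threads with their unmarked frames, which makes it easier to see at a glance which kernel functions each task was waiting in.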
--
389 users mailing list
389-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users