Re: [389-users] 389 unusable on F11?

On 9/11/2009 12:43 PM, Noriko Hosoi wrote:
On 09/10/2009 07:46 PM, Kevin Bowling wrote:
Hi,

I have been running FDS/389 on an F11 Xen DomU for several months. I use it as the backend for UNIX usernames/passwords and also for Redmine (a Ruby on Rails bug tracker) for http://www.gnucapplus.org/.

This VM would regularly lock up every week or so back when 389 was still called FDS. I've since upgraded to 389 by issuing 'yum upgrade' and running the 'setup-...-.pl -u' script, and now it barely goes a day before crashing. When LDAP crashes, the whole box basically becomes unresponsive.

I left the Xen hardware console open to see what was going on, and the only thing I could conclude was that 389 was crashing (issuing a service start brought it back to life). Running anything like top or ls will completely kill the box. The logs show nothing at or before the time of the crash. I suspected too few file descriptors, but raising the limit to a very high number had no impact.
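One way to check whether descriptors are really the bottleneck is to compare how many ns-slapd actually has open against its soft "Max open files" limit. The script below is a minimal sketch, assuming Linux /proc (a kernel that provides /proc/PID/limits) and that it is run as root so it can list another process's fd directory; the function names open_fds and fd_soft_limit are hypothetical.

```shell
#!/bin/sh
# Sketch: compare open descriptors with the soft "Max open files" limit,
# both read from /proc (Linux-specific). Run as root for other users' PIDs.

# Count the entries in /proc/PID/fd (one per open descriptor).
open_fds() {
    ls -1 "/proc/$1/fd" | wc -l
}

# Print the soft limit (4th field) of the "Max open files" line in /proc/PID/limits.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Report descriptor usage for every running ns-slapd process (none -> no output).
for pid in $(pidof ns-slapd); do
    echo "ns-slapd pid $pid: $(open_fds "$pid") of $(fd_soft_limit "$pid") descriptors open"
done
```

If the open count stays far below the limit right up to a hang, raising the limit would not be expected to help.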

I was about to rip it out and replace it with OpenLDAP, which I use very successfully for our corporate systems, but figured I ought to first see if anyone here can help or if I can submit any kind of meaningful bug report. I assume I will need to run 389's slapd without daemonizing it and hope it spits something useful out to stderr. Any advice here would be greatly appreciated, as would any success stories of using 389 on F11.
Hello Kevin,

You specified the platform as "F11 xen DomU". Have you had a chance to run the 389 server on any other platform? I'm wondering whether the crash occurs only on this specific platform. Is the server running on a 64-bit or a 32-bit machine?

If you start the server with the "-d 1" option, the server will run in trace mode (e.g., /usr/lib[64]/dirsrv/slapd-YOURID/start-slapd -d 1).

I'm afraid it might be a memory leak. After you restart the 389 server, could you check the size of ns-slapd periodically, say every hour, and see whether it keeps growing or levels off? Also, the server quits if it fails to write to the errors log; if that happens, it is logged in the system log. Does the system's messages file happen to contain any entries related to the 389 server?
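The hourly size check could be automated along these lines. This is a minimal sketch, assuming Linux /proc and that pidof is installed; the script name slapd-mem.sh, the --watch flag, the log path, and the one-hour interval are hypothetical choices, not anything shipped with 389.

```shell
#!/bin/sh
# Sketch: sample ns-slapd's memory size periodically so steady growth
# (a possible leak) becomes visible. Log path and interval are assumptions.

# Print a field (in kB) such as VmRSS or VmSize from /proc/PID/status.
proc_kb() {
    awk -v key="$2:" '$1 == key { print $2 }' "/proc/$1/status"
}

# Append one timestamped line per running ns-slapd process to the log file $1.
sample_slapd() {
    for pid in $(pidof ns-slapd); do
        printf '%s pid=%s VmSize=%skB VmRSS=%skB\n' \
            "$(date '+%Y-%m-%d %H:%M:%S')" "$pid" \
            "$(proc_kb "$pid" VmSize)" "$(proc_kb "$pid" VmRSS)" >> "$1"
    done
}

# Loop only when asked explicitly: sh slapd-mem.sh --watch LOGFILE
if [ "${1:-}" = "--watch" ] && [ -n "${2:-}" ]; then
    while true; do
        sample_slapd "$2"
        sleep 3600   # one sample per hour
    done
fi
```

For example, start it in the background with "sh slapd-mem.sh --watch /var/tmp/slapd-mem.log &". A VmSize column that climbs steadily across samples would support the leak theory; a flat one would point elsewhere.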

Thanks,
--noriko

I'm not subscribed to the list, so please CC me.

Regards,

Kevin Bowling

It was stable for 17 days while running with debug output enabled to the console. I then upgraded to the F11 2.6.30 kernel rebase, and now I get some debugging info on the console. My wild guess is that it is timing-related. Where should I file a bug report?

Regards,
Kevin

[root@buildbox-a2 ~]# xm console 8
INFO: task kjournald:61 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kjournald     D ffff88003e932000     0    61      2
 ffff88003e919d40 0000000000000246 ffffffff8100e45c 0000000000000000
 000000001cee5db8 ffff88003e919d20 ffffffff8100ee82 0000000000000202
 ffff88003e9c83a8 000000000000e2e8 ffff88003e9c83a8 0000000000012d00
Call Trace:
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff8100ee82>] ? check_events+0x12/0x20
 [<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff811b8b33>] journal_commit_transaction+0x13d/0xe42
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff810632bc>] ? try_to_del_timer_sync+0x69/0x87
 [<ffffffff811bcdf7>] kjournald+0xfd/0x253
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff811bccfa>] ? kjournald+0x0/0x253
 [<ffffffff81070709>] kthread+0x6d/0xae
 [<ffffffff8101313a>] child_rip+0xa/0x20
 [<ffffffff81012afd>] ? restore_args+0x0/0x30
 [<ffffffff81013130>] ? child_rip+0x0/0x20
INFO: task ns-slapd:1034 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ns-slapd      D ffffc20000000000     0  1034      1
 ffff88003dd87908 0000000000000282 ffff88003dd87868 ffffffff8100ed0d
 ffff88003dd86000 00000000e59205a0 ffff88003dd87888 ffffffff8107957a
 ffff88003d4fe0e8 000000000000e2e8 ffff88003d4fe0e8 0000000000012d00
Call Trace:
 [<ffffffff8100ed0d>] ? xen_clocksource_get_cycles+0x1c/0x32
 [<ffffffff8107957a>] ? clocksource_read+0x22/0x38
 [<ffffffff81074986>] ? ktime_get_ts+0x61/0x7d
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff81496c62>] io_schedule+0x44/0x6c
 [<ffffffff8113b141>] sync_buffer+0x53/0x6b
 [<ffffffff81497294>] __wait_on_bit_lock+0x55/0xb2
 [<ffffffff810d2d1f>] ? find_get_page+0x64/0xa3
 [<ffffffff8149736e>] out_of_line_wait_on_bit_lock+0x7d/0x9c
 [<ffffffff8113b0ee>] ? sync_buffer+0x0/0x6b
 [<ffffffff81070c5a>] ? wake_bit_function+0x0/0x5a
 [<ffffffff8113b380>] __lock_buffer+0x3d/0x53
 [<ffffffff811b6eda>] lock_buffer+0x49/0x64
 [<ffffffff811b7a15>] do_get_write_access+0x82/0x3f3
 [<ffffffff811bbdb3>] ? journal_add_journal_head+0xce/0x162
 [<ffffffff811b7dc0>] journal_get_write_access+0x3a/0x65
 [<ffffffff8118c209>] __ext3_journal_get_write_access+0x34/0x74
 [<ffffffff8117e464>] ext3_reserve_inode_write+0x50/0xaa
 [<ffffffff8117e50d>] ext3_mark_inode_dirty+0x4f/0x80
 [<ffffffff8117e6b8>] ext3_dirty_inode+0x79/0xa7
 [<ffffffff81135095>] __mark_inode_dirty+0x45/0x190
 [<ffffffff81129603>] file_update_time+0xc0/0x113
 [<ffffffff810eb167>] do_wp_page+0x610/0x658
 [<ffffffff8100bc21>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e
 [<ffffffff810eccd9>] handle_mm_fault+0x6a2/0x72e
 [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff8149be99>] do_page_fault+0x226/0x24f
 [<ffffffff81499965>] page_fault+0x25/0x30
INFO: task ns-slapd:1040 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ns-slapd      D ffff88003e932024     0  1040      1
 ffff88003bc119f8 0000000000000282 ffffffff8100e45c ffffc20000025410
 00000000f1efb74c ffff88003bc119d8 ffffffff8100ee82 0000000000000004
 ffff88003bc0b248 000000000000e2e8 ffff88003bc0b248 0000000000012d00
Call Trace:
 [<ffffffff8100e45c>] ? xen_force_evtchn_callback+0x20/0x36
 [<ffffffff8100ee82>] ? check_events+0x12/0x20
 [<ffffffff8100ee6f>] ? xen_restore_fl_direct_end+0x0/0x1
 [<ffffffff814993de>] ? _spin_unlock_irqrestore+0x4e/0x64
 [<ffffffff8100ee82>] ? check_events+0x12/0x20
 [<ffffffff81496bf6>] schedule+0x21/0x49
 [<ffffffff811b841d>] start_this_handle+0x2d4/0x373
 [<ffffffff81070bfb>] ? autoremove_wake_function+0x0/0x5f
 [<ffffffff811b865d>] journal_start+0xb7/0x106
 [<ffffffff81187903>] ext3_journal_start_sb+0x62/0x78
 [<ffffffff8117d60b>] ext3_journal_start+0x28/0x3e
 [<ffffffff8117e67d>] ext3_dirty_inode+0x3e


--
389 users mailing list
389-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/fedora-directory-users
