Hi Sage,

On Fri, Oct 29, 2010 at 1:08 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Fri, 29 Oct 2010, Henry C Chang wrote:
>> Hi,
>>
>> getattr on mds hung again.
>>
>> I have already reverted d91f2438d881514e4a923fd786dbd94b764a9440.
>> Although the probability is significantly lower, there is still a
>> chance of hanging on getattr.
>>
>> Attached are the logs of mds and the hanging client. :(
>>
>> I'm using ceph-client-standalone master-backport branch on 2.6.32 kernel.
>
> It looks like ceph_check_caps is hung somehow:
>
> ceph:ceph: handle_caps from mds0
> ceph:ceph: mds0 seq 99 cap seq 28
> ceph:ceph: op revoke ino 10000000bd1.fffffffffffffffe inode ffff8800a6251d88
> ceph:ceph: handle_cap_grant inode ffff8800a6251d88 cap ffff8800a635b780 mds0 seq 28 pAsLsXsFr
> ceph:ceph: size 4294967296 max_size 8594128896, i_size 4294967296
> ceph:ceph: try_nonblocking_invalidate ffff8800a6251d88 success
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780 issued pAsLsXsFscr
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780 issued pAsLsXsFscr
> ceph:ceph: ffff8800a6251d88 mode 0100644 uid.gid 0.0
> ceph:ceph: my wanted = pAsxXsxFsxcrwb, used = pFcr, dirty -
> ceph:ceph: revocation: pAsLsXsFscr -> pAsLsXsFr (revoking Fsc)
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780 issued pAsLsXsFr
> ceph:ceph: check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFcr dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl AUTHONLY NODELAY
> ceph:ceph: mds0 revoking Fsc
> ceph:ceph: mdsc put_session ffff8800b41c6000 3 -> 2
> ceph:ceph: mdsc con_put ffff8800b41c6000 (2)
> ceph:ceph: aio_read ffff8800a6251d88 10000000bd1.fffffffffffffffe dropping cap refs on Fcr = 512
> ceph:ceph: put_cap_refs ffff8800a6251d88 had Fcr last
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780 issued pAsLsXsFr
> ceph:ceph: check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFc dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> ceph:ceph: check_caps trying to invalidate on ffff8800a6251d88
> ceph:ceph: try_nonblocking_invalidate ffff8800a6251d88 failed
> ceph:ceph: check_caps queuing invalidate
>
> --> this means queue_invalidate = 1, and check_caps will call
> ceph_queue_invalidate on exit, which will always print something...
>
> ceph:ceph: __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780 issued pAsLsXsFr
> ceph:ceph: check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFc dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> ceph:ceph: mds0 revoking Fsc
> ceph:ceph: __cap_delay_cancel ffff8800a6251d88
>
> ...but that never happens.  Probably the CPU got blocked somewhere?  Can
> you see what the system is doing at this point?  sysrq-t, or check the
> process list for ceph-msgr and cat its stack (/proc/$pid/stack)?  The
> task should be blocked in ceph_check_caps() somewhere...

It didn't go to ceph_queue_invalidate() because delayed and is_delayed
were both 0:

    if (delayed && is_delayed)
        force_requeue = 1;   /* __send_cap delayed release; requeue */
    if (!delayed && !is_delayed)
        __cap_delay_cancel(mdsc, ci);
    else if (!is_delayed || force_requeue)
        __cap_delay_requeue(mdsc, ci);

    spin_unlock(&inode->i_lock);

    if (queue_invalidate)
        ceph_queue_invalidate(inode);

so it went to __cap_delay_cancel(). Should I take it that the correct
behavior is to go to ceph_queue_invalidate(inode)?

>
> (BTW, if you're building your own kernel, one thing that I've found
> helpful is enabling the CONFIG_PRINTK_TIME option in .config, and updating
> kernel/printk.c to also include current->pid in the line prefix.  That
> helps sort out what tasks are doing what when.  But if you're stuck on
> 2.6.32 for some reason that's probably not the case!)
>
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
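
To make the question about the exit path concrete, here is a small userspace sketch of only the branch structure in the snippet quoted above. It is not the kernel code: check_caps_exit(), struct exit_trace and record() are hypothetical names made up for illustration, and the real helpers are replaced by a trace string. It assumes the quoted snippet is complete and that nothing else runs between the delay branch and the queue_invalidate test. Read that way, the final test looks independent of the if/else chain, so this is worth checking against the real source before drawing conclusions from the missing log line.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical trace buffer standing in for the kernel helpers. */
struct exit_trace {
    char calls[96];
};

/* Append a helper name to the comma-separated trace. */
static void record(struct exit_trace *t, const char *name)
{
    if (t->calls[0] != '\0')
        strncat(t->calls, ",", sizeof(t->calls) - strlen(t->calls) - 1);
    strncat(t->calls, name, sizeof(t->calls) - strlen(t->calls) - 1);
}

/*
 * Mirrors the branch structure of the quoted snippet only.
 * The flag names come from the snippet; everything else is assumed.
 */
static void check_caps_exit(struct exit_trace *t, bool delayed,
                            bool is_delayed, bool queue_invalidate)
{
    bool force_requeue = false;

    t->calls[0] = '\0';

    if (delayed && is_delayed)
        force_requeue = true;   /* delayed release; requeue */
    if (!delayed && !is_delayed)
        record(t, "__cap_delay_cancel");
    else if (!is_delayed || force_requeue)
        record(t, "__cap_delay_requeue");

    /* spin_unlock(&inode->i_lock) sits here in the quoted snippet */

    /* Tested separately from the if/else chain above. */
    if (queue_invalidate)
        record(t, "ceph_queue_invalidate");
}
```

Calling check_caps_exit() with delayed = 0, is_delayed = 0 and queue_invalidate = 1 records both __cap_delay_cancel and ceph_queue_invalidate in this model, in that order.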