Re: mds getattr locked again

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Sage,

On Fri, Oct 29, 2010 at 1:08 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Fri, 29 Oct 2010, Henry C Chang wrote:
>> Hi,
>>
>> getattr on mds hanged again.
>>
>> I have already reverted d91f2438d881514e4a923fd786dbd94b764a9440.
>> Although the probability is significant lowered down, it still has the
>> chance to hang on getattr.
>>
>> Attached are the logs of mds and the hanging client. :(
>>
>> I'm using ceph-client-standalone master-backport branch on 2.6.32 kernel.
>
> It looks like ceph_check_caps is hung somehow:
>
> ceph:ceph:  handle_caps from mds0
> ceph:ceph:   mds0 seq 99 cap seq 28
> ceph:ceph:   op revoke ino 10000000bd1.fffffffffffffffe inode
> ffff8800a6251d88
> ceph:ceph:  handle_cap_grant inode ffff8800a6251d88 cap ffff8800a635b780
> mds0 seq 28 pAsLsXsFr
> ceph:ceph:   size 4294967296 max_size 8594128896, i_size 4294967296
> ceph:ceph:  try_nonblocking_invalidate ffff8800a6251d88 success
> ceph:ceph:  __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFscr
> ceph:ceph:  __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFscr
> ceph:ceph:  ffff8800a6251d88 mode 0100644 uid.gid 0.0
> ceph:ceph:   my wanted = pAsxXsxFsxcrwb, used = pFcr, dirty -
> ceph:ceph:  revocation: pAsLsXsFscr -> pAsLsXsFr (revoking Fsc)
> ceph:ceph:  __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFr
> ceph:ceph:  check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFcr
> dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> AUTHONLY NODELAY
> ceph:ceph:   mds0 revoking Fsc
> ceph:ceph:  mdsc put_session ffff8800b41c6000 3 -> 2
> ceph:ceph:  mdsc con_put ffff8800b41c6000 (2)
> ceph:ceph:  aio_read ffff8800a6251d88 10000000bd1.fffffffffffffffe
> dropping cap refs on Fcr = 512
> ceph:ceph:  put_cap_refs ffff8800a6251d88 had Fcr last
> ceph:ceph:  __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780
> issued pAsLsXsFr
> ceph:ceph:  check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFc
> dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> ceph:ceph:  check_caps trying to invalidate on ffff8800a6251d88
> ceph:ceph:  try_nonblocking_invalidate ffff8800a6251d88 failed
> ceph:ceph:  check_caps queuing invalidate
>
> --> this means queue_invalidate = 1, and check_caps will call
> ceph_queue_invalidate on exit, which will always print something...
>
> ceph:ceph:  __ceph_caps_issued ffff8800a6251d88 cap ffff8800a635b780 issued pAsLsXsFr
> ceph:ceph:  check_caps ffff8800a6251d88 file_want pAsxXsxFsxcrwb used pFc
> dirty - flushing - issued pAsLsXsFr revoking Fsc retain pAsxLsxXsxFsxcrwbl
> ceph:ceph:   mds0 revoking Fsc
> ceph:ceph:  __cap_delay_cancel ffff8800a6251d88
>
> ...but that never happens.  Probably the CPU got blocked somewhere?  Can
> you see what the system is doing at this point?  sysrq-t, or check the
> process list for ceph-msgr and cat it's stack (/proc/$pid/stack)?  The
> task should be blocked in ceph_check_caps() somewhere...

it didn't go to ceph_queue_invalidate because delayed and is_delayed = 0.

         if (delayed && is_delayed)
                 force_requeue = 1;   /* __send_cap delayed release; requeue */
         if (!delayed && !is_delayed)
                 __cap_delay_cancel(mdsc, ci);
         else if (!is_delayed || force_requeue)
                 __cap_delay_requeue(mdsc, ci);

         spin_unlock(&inode->i_lock);

         if (queue_invalidate)
                 ceph_queue_invalidate(inode);

so it goes to __cap_delay_cancel().

Should I said, the correct behavior is to go to ceph_queue_invalidate(inode)?


>
> (BTW, if you're building your own kernel, one thing that I've found
> helpful is enabling the CONFIG_PRINTK_TIME option in .config, and updating
> kernel/printk.c to also include current->pid in the line prefix.  That
> helps sort out what tasks are doing what when.  But if you're stuck on
> 2.6.32 for some reason that probably not the case!)
>
> Thanks!
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux