Re: kernel crash after 'ceph: mds0 caps stale' and 'mds0 hung' -- issue with timestamps or HVM virtualization on EC2?

Gregory Farnum <greg@xxxxxxxxxxx> · Mon, 9 Feb 2015 12:51:30 -0800

On Mon, Feb 9, 2015 at 11:58 AM, Christopher Armstrong
<chris@xxxxxxxxxxxx> wrote:
> Hi folks,
>
> One of our users is seeing machine crashes almost daily. He's using Ceph
> v0.87 giant, and is seeing this crash:
> https://gist.githubusercontent.com/ianblenke/b74e5aa5547130ebc0fb/raw/c3eeab076310d149443fd6118113b9d94f176303/gistfile1.txt
>
> It seems easy to trigger this by rsyncing to the CephFS mount. We're using
> the kernel client here, so I'm wondering if it's related to this timestamp
> bug:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-January/045838.html

These are definitely not related.

> Does anyone have any insight into the crash? Some confirmation that it's
> related to system clocks/timestamps would be helpful.
>
> Also note that we're using HVM virtualization on EC2. I'm not sure whether
> people have run into this before.

Zheng might have some idea about these, but my guess is a code bug
that leads to a deadlock around file capabilities.

If you can, look at the MDS's admin socket and dump the ops in flight
and the session info; that might be helpful too ("ceph daemon mds.a
dump_ops_in_flight", etc.).
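A minimal sketch of capturing that state while the client is hung, assuming the MDS is named "a" (as in the command above), that the script runs on the MDS host where the admin socket lives, and that this release's MDS admin socket accepts "session ls" for the session info. The function name collect_mds_state and the output file names are hypothetical.

```shell
#!/bin/sh
# Sketch: snapshot MDS admin-socket state during a CephFS client hang.
# Must run on the host where the MDS daemon (and its admin socket) lives.
# Assumption: "session ls" is available on this release's MDS admin socket.
collect_mds_state() {
    mds_id="${1:-a}"                                  # MDS name, e.g. "a" for mds.a
    out_dir="${2:-mds-debug-$(date +%Y%m%d-%H%M%S)}"  # hypothetical output directory
    mkdir -p "$out_dir" || return 1
    # Requests the MDS is currently processing; long-lived ones point at the hang
    ceph daemon "mds.$mds_id" dump_ops_in_flight > "$out_dir/ops_in_flight.json" || return 1
    # Per-client session state as seen by the MDS
    ceph daemon "mds.$mds_id" session ls > "$out_dir/session_ls.json" || return 1
    echo "$out_dir"
}
```

For example, running `collect_mds_state a /tmp/mds-hang` a few times while the rsync is stuck shows whether the same ops stay in flight between snapshots.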
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com