So I think I can reliably reproduce this crash from a ceph client.
```
root@kh08-8:~# ceph -s
  cluster:
    id:     9f58ee5a-7c5d-4d68-81ee-debe16322544
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
    mgr: kh08-8(active)
    mds: cephfs-1/1/1 up {0=kh09-8=up:active}, 1 up:standby
    osd: 570 osds: 570 up, 570 in
```
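(For context: /cephfs in the mount command below is an existing CephFS kernel-client mount on the client box. The monitor address is one of my mons, but the secretfile path in this sketch is just a placeholder:)
```
# illustrative kernel-client mount only -- the secretfile path is a placeholder
mount -t ceph kh08-8:6789:/ /cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
```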
Then, from a client, try to mount aufs over cephfs:
```
mount -vvv -t aufs -o br=/cephfs=rw:/mnt/aufs=rw -o udba=reval none /aufs
```
Now watch as your ceph mds servers fail:
```
root@kh08-8:~# ceph -s
  cluster:
    id:     9f58ee5a-7c5d-4d68-81ee-debe16322544
    health: HEALTH_WARN
            insufficient standby MDS daemons available

  services:
    mon: 3 daemons, quorum kh08-8,kh09-8,kh10-8
    mgr: kh08-8(active)
    mds: cephfs-1/1/1 up {0=kh10-8=up:active(laggy or crashed)}
```
I am now stuck in a degraded state and I can't seem to get the MDS daemons to start again.
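In case it is relevant, the state above comes from, and can be dug into further with, commands like these (the ceph-mds systemd unit name is an assumption based on a default deployment where the MDS is named after the host):
```
# cluster-level view
ceph health detail
ceph mds stat
ceph fs status

# on an MDS host (assuming systemd units named after the host, e.g. kh10-8)
systemctl status ceph-mds@kh10-8
journalctl -u ceph-mds@kh10-8 --since "1 hour ago"
```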
On Mon, Apr 30, 2018 at 5:06 PM, Sean Sullivan <lookcrabs@xxxxxxxxx> wrote:
I had 2 MDS servers (one active, one standby) and both were down. I took a dumb chance and marked the active as down (it said it was up but laggy), then started the primary again, and now both are back up. I have never seen this before and I am also not sure of what I just did.

On Mon, Apr 30, 2018 at 4:32 PM, Sean Sullivan <lookcrabs@xxxxxxxxx> wrote:

I was creating a new user and mount point. On another hardware node I mounted CephFS as admin to mount as root. I created /aufstest and then unmounted it. From there it seems that both of my MDS nodes crashed for some reason and I can't start them any more.
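(Adding this for the archives: the "marked the active as down / started the primary again" step above translates, roughly, to something like the following; this is a sketch assuming systemd-managed daemons with unit names based on the hostname:)
```
# mark the laggy active MDS (rank 0) as failed so the cluster stops waiting on it
ceph mds fail 0

# then restart the MDS daemon on the host that should take over
systemctl restart ceph-mds@kh09-8
```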
https://pastebin.com/1ZgkL9fa -- my mds log
I have never had this happen in my tests, so now I have live data here. If anyone can lend a hand or point me in the right direction while troubleshooting, that would be a godsend!
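If more verbose MDS logs would be useful I can bump the debug levels and try to reproduce; as far as I understand it, something like this in ceph.conf on the MDS hosts (followed by a daemon restart) should do it:
```
# /etc/ceph/ceph.conf on the MDS hosts -- restart ceph-mds afterwards
[mds]
    debug mds = 20
    debug ms = 1
```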
I tried cephfs-journal-tool inspect and it reports that the journal should be fine. I am not sure why it's crashing:
```
/home/lacadmin# cephfs-journal-tool journal inspect
Overall journal integrity: OK
```
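If it does turn out to be a journal problem, my reading of the disaster-recovery docs is that the next steps would look roughly like the sketch below; the destructive steps are commented out because they should only be a last resort:
```
# back up the journal before touching anything
cephfs-journal-tool journal export backup.bin

# last-resort steps from the disaster recovery docs:
# cephfs-journal-tool event recover_dentries summary
# cephfs-journal-tool journal reset
```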