Dear Cephers,

For some time now I have been running a small Ceph cluster made of 4 OSD + 1 MON servers, evaluating possible Ceph usages in our storage infrastructure. Until a few weeks ago I was running the Hammer release, mostly with RBD clients mounting images from replicated pools, and everything was stable. Recently I updated to Jewel v10.2.2 (actually did a clean install).

Last week I tested Ceph's tiering capabilities, backed by an erasure-coded pool. Everything was running just fine until one moment when I couldn't map my images anymore. I get the following error when mapping an RBD image:

    sudo rbd map test
    rbd: sysfs write failed
    In some cases useful info is found in syslog - try "dmesg | tail" or so.
    rbd: map failed: (5) Input/output error

The error is accompanied by a hex dump of several MB of osdmap in /var/log/syslog and /var/log/kern.log, filling several GB of logs in a short time (dump attached).

I thought I must have done something wrong, since I did a lot of testing: in the process I recreated lots of pools and shuffled a lot of data while trying various CRUSH ruleset combinations. So I started from scratch and reinstalled the cluster. Everything worked for some time, but then the error occurred again (also for a normal replicated pool).

Ceph servers and clients are running Ubuntu 14.04 (kernel 3.19).
Cluster state is HEALTH_OK.
The RBD images have the new Jewel features disabled:

    rbd -p ecpool feature disable test exclusive-lock object-map fast-diff deep-flatten

Does anyone have any tips here? Has something similar happened to anyone else? Should I just go ahead and do a v4.4 kernel update?

Thank you,
Ivan
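P.S. For reference, the cache tier over the erasure-coded pool was set up along these lines. This is only a sketch from memory; the pool names, PG counts and EC profile here are illustrative, not necessarily the exact values I used:

    # erasure-coded base pool with a simple 2+1 profile
    ceph osd erasure-code-profile set ecprofile k=2 m=1
    ceph osd pool create ecpool 64 64 erasure ecprofile

    # replicated cache pool layered on top of the base pool
    ceph osd pool create cachepool 64
    ceph osd tier add ecpool cachepool
    ceph osd tier cache-mode cachepool writeback
    ceph osd tier set-overlay ecpool cachepool
    ceph osd pool set cachepool hit_set_type bloom

    # image created in the base pool, accessed through the tier
    rbd create ecpool/test --size 10240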
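On the client side I check the images before mapping like this (again, names are illustrative). As far as I understand, the 3.19 krbd only supports the layering feature, so newer images get created with just that:

    # verify which features remain enabled on the image
    rbd info ecpool/test | grep features

    # create new images with only the layering feature,
    # so the old kernel client can map them
    rbd create ecpool/test2 --size 10240 --image-feature layering

    # after a failed map, the kernel's complaint shows up here
    dmesg | tail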
Attachment: ceph_osd_dump (binary data)