On Wed, May 24, 2017 at 1:47 PM, Shain Miley <SMiley@xxxxxxx> wrote:
> Hello,
> We just upgraded from Hammer to Jewel, and after the cluster once again
> reported a healthy state I set the crush tunables to ‘optimal’ (from
> legacy).
> 12 hours later the cluster is almost done with the pg remapping under
> the new rules.
>
> The issue I am having is that the server where we mount the krbd images
> is showing errors in kern.log:
>
> May 24 07:28:14 rbd1 kernel: [5600763.226208] libceph: osd192
> 10.35.1.235:6844 feature set mismatch, my 2b84a042a42 < server's
> 40002b84a042a42, missing 400000000000000
>
> And I can no longer list any of the mounted filesystems or unmap the rbd
> images, etc.
>
> My options seem to be:
>
> 1) set the tunables back to legacy and see if the rbd server starts
> responding.
>
> 2) upgrade the kernel on the rbd server to at least version 4.5 (currently
> using 3.18 on Ubuntu 14.04).

As per [1], I'd recommend upgrading to 4.9.z.

> 3) disable some features on our current images?

This is a "cluster" feature bit, not an image feature.  Don't enable new
image features just because they are there in jewel, though ;)

> I would like to try option 2 first… but I am wondering if it is safe to
> reboot the server with the rbd images still mapped… is there any chance
> of data loss from an rbd image getting corrupted?

Take a look at /sys/kernel/debug/ceph/*/osdc.  If it's empty, there are
no in-flight requests and you should be able to cold reboot safely.  If
there are a lot of pending requests, the safest option is to revert the
tunables setting.

[1] http://docs.ceph.com/docs/master/start/os-recommendations/

Thanks,

                Ilya
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
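P.S. For what it's worth, the "missing" value in that kern.log line is just the set of server feature bits the client kernel lacks. A quick sketch of the arithmetic, using the values from the log quoted above (the bit-name comment is my reading of the jewel feature list, not something the log itself says):

```shell
# Values copied from the "feature set mismatch" log line above.
client=0x2b84a042a42       # "my" (the 3.18 krbd client)
server=0x40002b84a042a42   # "server's" (osd192)

# Bits the OSD requires that the client kernel doesn't advertise.
missing=$(( server & ~client ))
printf 'missing: %#x\n' "$missing"   # missing: 0x400000000000000

# That value is a single bit; find its index.
bit=0; v=$missing
while [ $(( v >> 1 )) -ne 0 ]; do v=$(( v >> 1 )); bit=$(( bit + 1 )); done
echo "bit $bit"   # bit 58 -- CRUSH_TUNABLES5 in the jewel feature bits
```

Bit 58 is the jewel-era tunables feature, which is exactly why a kernel that supports it (4.5+) fixes the mismatch.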
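P.P.S. The osdc check described above can be scripted so it gives a single verdict before you reboot. This is only a sketch, assuming debugfs is mounted at /sys/kernel/debug (the default; reading it needs root), and `osdc_pending` is just a name I made up here:

```shell
# Report whether any mapped ceph/rbd client still has in-flight OSD
# requests, by checking each per-client osdc file under debugfs.
osdc_pending() {                  # usage: osdc_pending <debug-ceph-dir>
    local dir=$1 f found=0
    for f in "$dir"/*/osdc; do
        [ -e "$f" ] || continue   # glob matched nothing: no clients mapped
        if [ -s "$f" ]; then      # non-empty osdc file = in-flight requests
            echo "pending requests in $f"
            found=1
        fi
    done
    return "$found"
}

if osdc_pending /sys/kernel/debug/ceph; then
    echo "osdc empty: cold reboot should be safe"
else
    echo "in-flight requests present: consider reverting tunables instead"
fi
```

Note that this is only a point-in-time snapshot; with the feature mismatch blocking the client, stuck requests will stay listed rather than drain.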