Re: Jewel upgrade and feature set mismatch


 



Hi,

Thanks for all your help so far. This is very useful information indeed.


Here is the debug output from the file you referenced below:


root@rbd1:/sys/kernel/debug/ceph/504b5794-34bd-44e7-a8c3-0494cf800c23.client67751889# cat osdc
2311 osd144 3.1347f3bc rb.0.25f2ab0.238e1f29.000000000000 read
14216 osd65 3.bd82049c rb.0.1ae3061.238e1f29.000000000000 read
14391 osd44 3.875890a0 rb.0.fe307e.238e1f29.000000393889 set-alloc-hint,write
14560 osd61 3.1ab27784 rb.0.17d451c.238e1f29.000000131308 set-alloc-hint,write
14561 osd33 3.cc377593 rb.0.e411a0.238e1f29.0000001e007b set-alloc-hint,write
14568 osd192 3.1b4f6fbd rb.0.113e639.238e1f29.000000393a11 set-alloc-hint,write
15319 osd192 3.b61f59fd npr_archive_library_img.rbd 942122'299183126872064 watch
15320 osd100 3.2d0fc3c8 npr_archive_music_img.rbd 365920'299183126872064 watch
15321 osd108 3.93b6741d npr_archive_multimedia_img.rbd 836232'299183126872064 watch
15322 osd64 3.27bf5fe npr_archive_online_production_img.rbd 945218'299183126872064 watch
15323 osd154 3.1ca3def1 npr_archive_design_img.rbd 359827'299183126872064 watch
15324 osd161 3.edeaca14 npr_archive_orpheus_img.rbd 871904'299183126872064 watch

Do you think those 4 write operations are enough to make me think twice about a reboot?
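
For anyone who wants to tally the pending requests rather than count them by eye, here is a minimal Python sketch. It assumes each osdc entry sits on its own line with the request id first and the operation string last, as in the output quoted in this message; the layout may differ on other kernel versions. The function names are made up for illustration. The `watch` entries appear to be the long-lived watches on the image header objects rather than in-flight I/O, so the sketch reports them separately.

```python
from collections import Counter

# Entries copied from the osdc output in this message; one request per line.
SAMPLE = """\
2311 osd144 3.1347f3bc rb.0.25f2ab0.238e1f29.000000000000 read
14216 osd65 3.bd82049c rb.0.1ae3061.238e1f29.000000000000 read
14391 osd44 3.875890a0 rb.0.fe307e.238e1f29.000000393889 set-alloc-hint,write
14560 osd61 3.1ab27784 rb.0.17d451c.238e1f29.000000131308 set-alloc-hint,write
14561 osd33 3.cc377593 rb.0.e411a0.238e1f29.0000001e007b set-alloc-hint,write
14568 osd192 3.1b4f6fbd rb.0.113e639.238e1f29.000000393a11 set-alloc-hint,write
15319 osd192 3.b61f59fd npr_archive_library_img.rbd 942122'299183126872064 watch
15320 osd100 3.2d0fc3c8 npr_archive_music_img.rbd 365920'299183126872064 watch
15321 osd108 3.93b6741d npr_archive_multimedia_img.rbd 836232'299183126872064 watch
15322 osd64 3.27bf5fe npr_archive_online_production_img.rbd 945218'299183126872064 watch
15323 osd154 3.1ca3def1 npr_archive_design_img.rbd 359827'299183126872064 watch
15324 osd161 3.edeaca14 npr_archive_orpheus_img.rbd 871904'299183126872064 watch
"""


def summarize_osdc(text):
    """Count entries per operation string (assumed to be the last field)."""
    counts = Counter()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        counts[fields[-1]] += 1
    return counts


def pending_writes(text):
    """Return the request ids whose operation list includes a write."""
    ids = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and "write" in fields[-1].split(","):
            ids.append(fields[0])
    return ids


if __name__ == "__main__":
    for op, n in sorted(summarize_osdc(SAMPLE).items()):
        print(f"{op}: {n}")
    print("pending writes:", ", ".join(pending_writes(SAMPLE)))
```

To run it against a live client, read the contents of the osdc file under /sys/kernel/debug/ceph/ (root is required) and pass the text to the same functions.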

Thanks again,

Shain



On 05/24/2017 08:31 AM, Ilya Dryomov wrote:
> On Wed, May 24, 2017 at 1:47 PM, Shain Miley <SMiley@xxxxxxx> wrote:
>> Hello,
>> We just upgraded from Hammer to Jewel, and after the cluster once again
>> reported a healthy state I set the crush tunables to ‘optimal’ (from
>> legacy).
>> 12 hours later and the cluster is almost done with the pg remapping under
>> the new rules.
>>
>> The issue I am having is the server where we mount the krbd images is
>> showing errors in the kern.log:
>>
>> May 24 07:28:14 rbd1 kernel: [5600763.226208] libceph: osd192
>> 10.35.1.235:6844 feature set mismatch, my 2b84a042a42 < server's
>> 40002b84a042a42, missing 400000000000000
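
The three hex values in that log line are related by simple bit arithmetic: the "missing" mask is the set of bits the server advertises that the client does not. A small Python sketch of that check follows, using the values from the log line above. The function names are hypothetical, and mapping a bit index to a named Ceph feature is not attempted here; that requires the feature table for the Ceph release in use.

```python
def missing_features(mine, server):
    """Bits set in the server's feature mask but not in the client's."""
    return server & ~mine


def set_bits(mask):
    """Indices of the bits that are set in mask, lowest first."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


# Values copied from the kern.log line quoted above.
mine = 0x2B84A042A42
server = 0x40002B84A042A42

missing = missing_features(mine, server)
print(hex(missing))       # 0x400000000000000, matching the log
print(set_bits(missing))  # [58]
```

Only one bit differs (index 58), which fits the thread's reading that a single cluster feature, introduced by the tunables change, is what the 3.18 kernel client lacks.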

>> And I can no longer list any of the mounted filesystems or unmap the rbd
>> images, etc.
>>
>> My options seem to be:
>>
>> 1) Set the tunables back to legacy and see if the rbd server starts
>> responding.
>>
>> 2) Upgrade the kernel on the rbd server to at least version 4.5 (currently
>> using 3.18 on Ubuntu 14.04).
>
> As per [1], I'd recommend upgrading to 4.9.z.
>
>> 3) Disable some features on our current images?
>
> This is the "cluster" feature bit, not the image feature.  Don't enable
> new image features just because they are there in jewel though ;)
>
>> I would like to try option 2 first… but I am wondering if it is safe to reboot
>> the server with the rbd images still mapped… is there any chance of data loss
>> from an rbd image getting corrupted?
>
> Take a look at /sys/kernel/debug/ceph/*/osdc.  If it's empty, there are
> no in-flight requests and you should be able to cold reboot safely.  If
> there are a lot of pending requests, the safest option is to revert the
> tunables setting.
>
> [1] http://docs.ceph.com/docs/master/start/os-recommendations/
>
> Thanks,
>
>                  Ilya

--
NPR | Shain Miley | Manager of Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



