Hi,
Thanks for all your help so far. This is very useful information indeed.
Here is the debug output from the file you referenced below:
root@rbd1:/sys/kernel/debug/ceph/504b5794-34bd-44e7-a8c3-0494cf800c23.client67751889# cat osdc
2311 osd144 3.1347f3bc rb.0.25f2ab0.238e1f29.000000000000 read
14216 osd65 3.bd82049c rb.0.1ae3061.238e1f29.000000000000 read
14391 osd44 3.875890a0 rb.0.fe307e.238e1f29.000000393889 set-alloc-hint,write
14560 osd61 3.1ab27784 rb.0.17d451c.238e1f29.000000131308 set-alloc-hint,write
14561 osd33 3.cc377593 rb.0.e411a0.238e1f29.0000001e007b set-alloc-hint,write
14568 osd192 3.1b4f6fbd rb.0.113e639.238e1f29.000000393a11 set-alloc-hint,write
15319 osd192 3.b61f59fd npr_archive_library_img.rbd 942122'299183126872064 watch
15320 osd100 3.2d0fc3c8 npr_archive_music_img.rbd 365920'299183126872064 watch
15321 osd108 3.93b6741d npr_archive_multimedia_img.rbd 836232'299183126872064 watch
15322 osd64 3.27bf5fe npr_archive_online_production_img.rbd 945218'299183126872064 watch
15323 osd154 3.1ca3def1 npr_archive_design_img.rbd 359827'299183126872064 watch
15324 osd161 3.edeaca14 npr_archive_orpheus_img.rbd 871904'299183126872064 watch
Do you think those 4 write operations are enough to make me think twice
about a reboot?
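
For reference, the in-flight requests can be tallied by operation type with a short script. This is only a sketch: it assumes each osdc entry sits on a single line and that the last whitespace-separated field is the comma-separated operation list, which matches the output from this 3.18 kernel but may not hold for other kernel versions. The name `count_ops` is made up for illustration.

```python
from collections import Counter

# Entries copied from the osdc output in this message, one request per line.
OSDC_SAMPLE = """\
2311 osd144 3.1347f3bc rb.0.25f2ab0.238e1f29.000000000000 read
14216 osd65 3.bd82049c rb.0.1ae3061.238e1f29.000000000000 read
14391 osd44 3.875890a0 rb.0.fe307e.238e1f29.000000393889 set-alloc-hint,write
14560 osd61 3.1ab27784 rb.0.17d451c.238e1f29.000000131308 set-alloc-hint,write
14561 osd33 3.cc377593 rb.0.e411a0.238e1f29.0000001e007b set-alloc-hint,write
14568 osd192 3.1b4f6fbd rb.0.113e639.238e1f29.000000393a11 set-alloc-hint,write
15319 osd192 3.b61f59fd npr_archive_library_img.rbd 942122'299183126872064 watch
15320 osd100 3.2d0fc3c8 npr_archive_music_img.rbd 365920'299183126872064 watch
15321 osd108 3.93b6741d npr_archive_multimedia_img.rbd 836232'299183126872064 watch
15322 osd64 3.27bf5fe npr_archive_online_production_img.rbd 945218'299183126872064 watch
15323 osd154 3.1ca3def1 npr_archive_design_img.rbd 359827'299183126872064 watch
15324 osd161 3.edeaca14 npr_archive_orpheus_img.rbd 871904'299183126872064 watch
"""


def count_ops(osdc_text):
    """Tally requests by operation, counting any request that includes a write as 'write'."""
    counts = Counter()
    for line in osdc_text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # Assumption: the last field is the comma-separated op list.
        ops = fields[-1].split(",")
        key = "write" if "write" in ops else ops[-1]
        counts[key] += 1
    return counts


print(dict(count_ops(OSDC_SAMPLE)))
```

The writes are the entries that matter for the reboot question; the watch entries are the long-lived registrations on the image header objects rather than pending data.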
Thanks again,
Shain
On 05/24/2017 08:31 AM, Ilya Dryomov wrote:
On Wed, May 24, 2017 at 1:47 PM, Shain Miley <SMiley@xxxxxxx> wrote:
Hello,
We just upgraded from Hammer to Jewel, and after the cluster once again
reported a healthy state I set the crush tunables to ‘optimal’ (from
legacy).
12 hours later, the cluster is almost done with the pg remapping under
the new rules.
The issue I am having is that the server where we mount the krbd images is
showing errors in the kern.log:
May 24 07:28:14 rbd1 kernel: [5600763.226208] libceph: osd192 10.35.1.235:6844 feature set mismatch, my 2b84a042a42 < server's 40002b84a042a42, missing 400000000000000
And I can no longer list any of the mounted filesystems or unmap the rbd
images, etc.
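
The "missing" value in that log line appears to be the set of feature bits the OSD advertises that this client does not. A minimal sketch of the arithmetic, using the two hex masks from the message (the helper name `missing_feature_bits` is hypothetical, not part of any Ceph tool):

```python
def missing_feature_bits(mine, server):
    """Return the positions of bits set in `server` but not in `mine`."""
    missing = server & ~mine
    return [bit for bit in range(missing.bit_length()) if (missing >> bit) & 1]


mine = 0x2b84a042a42          # "my" mask from the kernel log line
server = 0x40002b84a042a42    # "server's" mask from the same line

print(hex(server & ~mine))                  # 0x400000000000000
print(missing_feature_bits(mine, server))   # [58]
```

So a single feature bit (bit 58) is what the 3.18 client lacks.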
My options seem to be:
1) Set the tunables back to legacy and see if the rbd server starts
responding.
2) Upgrade the kernel on the rbd server to at least version 4.5 (currently
using 3.18 on Ubuntu 14.04).
As per [1], I'd recommend upgrading to 4.9.z.
3) Disable some features on our current images?
This is the "cluster" feature bit, not the image feature. Don't enable
new image features just because they are there in jewel though ;)
I would like to try option 2 first…but I am wondering if it is safe to reboot
the server with the rbd images still mapped…is there any chance of data loss
from an rbd image getting corrupted?
Take a look at /sys/kernel/debug/ceph/*/osdc. If it's empty, there are
no in-flight requests and you should be able to cold reboot safely. If
there are a lot of pending requests, the safest option is to revert the
tunables setting.
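
The check above could be scripted roughly as follows. This is a sketch, not a definitive test: the function name `inflight_requests` is made up, reading debugfs normally requires root, and it treats any non-blank line as a pending entry, which fits the output shown in this thread but may not fit other kernel versions if they format the file differently.

```python
import glob


def inflight_requests(pattern="/sys/kernel/debug/ceph/*/osdc"):
    """Map each osdc file matching `pattern` to its list of non-blank lines."""
    pending = {}
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            # Assumption: every non-blank line describes one in-flight request.
            pending[path] = [line.rstrip("\n") for line in f if line.strip()]
    return pending


if __name__ == "__main__":
    for path, lines in inflight_requests().items():
        state = "empty" if not lines else f"{len(lines)} line(s) pending"
        print(f"{path}: {state}")
```

An "empty" result for every client directory corresponds to the safe-to-reboot case; anything else is worth inspecting by hand before deciding.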
[1] http://docs.ceph.com/docs/master/start/os-recommendations/
Thanks,
Ilya
--
NPR | Shain Miley | Manager of Infrastructure, Digital Media | smiley@xxxxxxx | 202.513.3649