Hi Jason,

On 24.07.19 00:40, Jason Dillaman wrote:
> >> Sure, which kernel do you prefer?
> You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen
> environment. Can you use a matching kernel version?

That's true: the virtual machines of our Xen environments run completely
on rbd-nbd devices. Every host runs dozens of rbd-nbd maps, which are
visible as Xen disks in the virtual systems.
(https://github.com/vico-research-and-consulting/RBDSR)

It seems that XenServer handles device timings in a special way: 1.5
years ago we had a 1.5-hour outage of our Ceph cluster which blocked all
write requests (overfull disks because of huge usage growth). In this
situation, all virtual machines continued their work without problems
after the cluster was back. We haven't set any timeouts using
nbd_set_timeout.c on these systems. We have never experienced problems
with these rbd-nbd instances.

[root@xen-s31 ~]# rbd nbd ls
pid    pool                                                            image                                     snap  device
10405  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-72f4e61d-acb9-4679-9b1d-fe0324cb5436  -     /dev/nbd3
12731  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84  RBD-88f8889a-05dc-49ab-a7de-8b5f3961f9c9  -     /dev/nbd4
13123  RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282  RBD-37243066-54b0-453a-8bf3-b958153a680d  -     /dev/nbd5
15342  RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5  RBD-2bee9bf7-4fed-4735-a749-2d4874181686  -     /dev/nbd6
15702  RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282  RBD-5b93eb93-ebe7-4711-a16a-7893d24c1bbf  -     /dev/nbd7
27568  RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282  RBD-616a74b5-3f57-4123-9505-dbd4c9aa9be3  -     /dev/nbd8
21112  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-5c673a73-7827-44cc-802c-8d626da2f401  -     /dev/nbd9
15726  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-1069a275-d97f-48fd-9c52-aed1d8ac9eab  -     /dev/nbd10
 4368  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84  RBD-23b72184-0914-4924-8f7f-10868af7c0ab  -     /dev/nbd11
 4642  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-bf13cf77-6115-466e-85c5-aa1d69a570a0  -     /dev/nbd12
 9438  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-a2071aa0-5f63-4425-9f67-1713851fc1ca  -     /dev/nbd13
29191  RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282  RBD-fd9a299f-dad9-4ab9-b6c9-2e9650cda581  -     /dev/nbd14
 4493  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84  RBD-1bbb4135-e9ed-4720-a41a-a49b998faf42  -     /dev/nbd15
 4683  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-374cadac-d969-49eb-8269-aa125cba82d8  -     /dev/nbd16
 1736  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-478a20cc-58dd-4cd9-b8b1-6198014e21b1  -     /dev/nbd17
 3648  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-6e28ec15-747a-43c9-998d-e9f2a600f266  -     /dev/nbd18
 9993  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84  RBD-61ae5ef3-9efb-4fe6-8882-45d54558313e  -     /dev/nbd19
10324  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-f7d27673-c268-47b9-bd58-46dcd4626bbb  -     /dev/nbd20
19330  RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282  RBD-0d4e5568-ac93-4f27-b24f-6624f2fa4a2b  -     /dev/nbd21
14942  RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5  RBD-69832522-fd68-49f9-810f-485947ff5e44  -     /dev/nbd22
20859  RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282  RBD-5025b066-723e-48f5-bc4e-9b8bdc1e9326  -     /dev/nbd23
19247  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-095292a0-6cc2-4112-95bf-15cb3dd33e9a  -     /dev/nbd24
22356  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84  RBD-f8229ea0-ad7b-4034-9cbe-7353792a2b7c  -     /dev/nbd25
22537  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-e8c0b841-50ec-4765-a3cb-30c78a4b9162  -     /dev/nbd26
15105  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-6d3d3503-2b45-45e9-a17b-30ab65c2be3d  -     /dev/nbd27
28192  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84  RBD-e04ec9e6-da4c-4b7a-b257-2cf7022ac59f  -     /dev/nbd28
28507  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0  RBD-e6d213b3-89d6-4c09-bc65-18ed7992149d  -     /dev/nbd29
23206  RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282  RBD-638ef476-843e-4c26-8202-377f185d9d26  -     /dev/nbd30

[root@xen-s31 ~]# uname -a
Linux xen-s31 4.4.0+10 #1 SMP Wed Dec 6 13:56:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

[root@xen-s31 ~]# rpm -qa|grep -P "ceph|rbd"
librbd1-12.2.5-0.el7.x86_64
python-rbd-12.2.5-0.el7.x86_64
ceph-common-12.2.5-0.el7.x86_64
python-cephfs-12.2.5-0.el7.x86_64
rbd-fuse-12.2.5-0.el7.x86_64
libcephfs2-12.2.5-0.el7.x86_64
rbd-nbd-12.2.5-0.el7.x86_64

I will therefore try a 4.4 release - but I suppose there are some patch
differences between my Ubuntu 4.4 kernel and the XenServer 4.4 kernel.
I will test with "4.4.0-154".

Regards
Marc

> >> I can test with the following releases:
> >>
> >> # apt-cache search linux-image-4.*.*.*-*-generic 2>&1|sed '~s,\.[0-9]*-[0-9]*-*-generic - .*,,;~s,linux-image-,,'|sort -u
> >> 4.10
> >> 4.11
> >> 4.13
> >> 4.15
> >> 4.4
> >> 4.8
> >>
> >> We can also perform tests using another filesystem (e.g. ext4).
> >>
> >> From my point of view, I suppose that there is something wrong with
> >> nbd.ko or with rbd-nbd (excluding the rbd-cache functionality) -
> >> therefore I do not think that this is very promising...
> Agreed. I would also attempt to see if you have blocked ops on the
> OSDs during these events (see Mykola’s ticket comment).
>
> >> Regards
> >> Marc

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
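P.S. The kernel-series filter quoted above can be sketched in a self-contained form. This is a simplified equivalent of the quoted `apt-cache search ... | sed | sort -u` pipeline; the sample package names stand in for real apt-cache output, so it runs on any host, and the exact sed expressions differ slightly from the ones in the quote:

```shell
# Sketch of the kernel-series pipeline quoted above. The sample lines
# stand in for real `apt-cache search linux-image-...-generic` output,
# so this runs without apt; the sed expressions are a simplified
# equivalent of the quoted ones.
pkgs='linux-image-4.4.0-154-generic - Linux kernel image
linux-image-4.15.0-54-generic - Linux kernel image
linux-image-4.4.0-157-generic - Linux kernel image'

# Strip the ".<patch>-<abi>-generic - <description>" tail and the
# "linux-image-" prefix, then deduplicate to get the kernel series.
series=$(printf '%s\n' "$pkgs" |
  sed 's/\.[0-9]*-[0-9]*-generic - .*//; s/linux-image-//' |
  sort -u)
echo "$series"
```

On a real Ubuntu host the same filter would be fed directly from `apt-cache search`, as in the quoted command, to list which kernel series are available to install for testing.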