On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>
> Hi Jason,
>
> I installed kernel 4.4.0-154.181 (from the Ubuntu package sources) and performed the crash reproduction.
> The problem also re-appeared with that kernel release.
>
> A gunzip run with 10 parallel gunzip processes threw 1600 write and 330 read IOPS against the cluster/the rbd_ec volume, with a transfer rate of 290 MB/s, for 10 minutes.
> After that the same problem re-appeared.
>
> What should we do now?
>
> Testing with a 12.2.5 librbd/rbd-nbd is currently not that easy for me, because the Ceph apt source does not contain that version.
> Do you know a package source?

All the upstream packages should be available here [1], including 12.2.5.
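The apt index usually only advertises the latest point release, but the older builds should still be sitting in the pool directory, so on a 16.04 test host you could fetch the 12.2.5 debs directly. The exact file names below are assumptions on my part, so check the directory listing in [1] for the real version/arch strings:

  wget http://download.ceph.com/debian-luminous/pool/main/c/ceph/librados2_12.2.5-1xenial_amd64.deb
  wget http://download.ceph.com/debian-luminous/pool/main/c/ceph/librbd1_12.2.5-1xenial_amd64.deb
  wget http://download.ceph.com/debian-luminous/pool/main/c/ceph/rbd-nbd_12.2.5-1xenial_amd64.deb
  dpkg -i librados2_*.deb librbd1_*.deb rbd-nbd_*.deb   # install in dependency order
  apt-mark hold librados2 librbd1 rbd-nbd               # keep apt from upgrading them back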
> How can I support you?

Did you pull the OSD blocked ops stats to figure out what is going on with the OSDs?
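To be concrete, while the gunzip test is running and the hang appears, something like the following is what I'm after (osd.NN is a placeholder for whichever OSDs "ceph health detail" reports as having slow/blocked requests; run the daemon commands on the node hosting that OSD):

  ceph health detail                      # shows which OSDs have slow/blocked requests
  ceph daemon osd.NN dump_blocked_ops     # ops currently blocked, with their state
  ceph daemon osd.NN dump_ops_in_flight
  ceph daemon osd.NN dump_historic_ops    # recently completed slow ops with per-step timings

If ops pile up there during the hang, the nbd timeout is more likely a symptom of slow OSDs than of rbd-nbd itself.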
> Regards
> Marc
>
> Am 24.07.19 um 07:55 schrieb Marc Schöchlin:
> > Hi Jason,
> >
> > Am 24.07.19 um 00:40 schrieb Jason Dillaman:
> >>> Sure, which kernel do you prefer?
> >> You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen environment. Can you use a matching kernel version?
> >
> > That's true, the virtual machines of our Xen environments run completely on rbd-nbd devices.
> > Every host runs dozens of rbd-nbd maps which are visible as Xen disks in the virtual systems.
> > (https://github.com/vico-research-and-consulting/RBDSR)
> >
> > It seems that XenServer has a special behavior with device timings, because 1.5 years ago we had a 1.5-hour outage of our Ceph cluster which blocked all write requests
> > (overfull disks because of huge usage growth). In this situation all virtual machines continued their work without problems after the cluster was back.
> > We haven't set any timeouts using nbd_set_timeout.c on these systems.
> >
> > We never experienced problems with these rbd-nbd instances.
> >
> > [root@xen-s31 ~]# rbd nbd ls
> > pid   pool                                                            image                                     snap device
> > 10405 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-72f4e61d-acb9-4679-9b1d-fe0324cb5436 -    /dev/nbd3
> > 12731 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-88f8889a-05dc-49ab-a7de-8b5f3961f9c9 -    /dev/nbd4
> > 13123 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-37243066-54b0-453a-8bf3-b958153a680d -    /dev/nbd5
> > 15342 RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5 RBD-2bee9bf7-4fed-4735-a749-2d4874181686 -    /dev/nbd6
> > 15702 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-5b93eb93-ebe7-4711-a16a-7893d24c1bbf -    /dev/nbd7
> > 27568 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-616a74b5-3f57-4123-9505-dbd4c9aa9be3 -    /dev/nbd8
> > 21112 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-5c673a73-7827-44cc-802c-8d626da2f401 -    /dev/nbd9
> > 15726 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-1069a275-d97f-48fd-9c52-aed1d8ac9eab -    /dev/nbd10
> > 4368  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-23b72184-0914-4924-8f7f-10868af7c0ab -    /dev/nbd11
> > 4642  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-bf13cf77-6115-466e-85c5-aa1d69a570a0 -    /dev/nbd12
> > 9438  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-a2071aa0-5f63-4425-9f67-1713851fc1ca -    /dev/nbd13
> > 29191 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-fd9a299f-dad9-4ab9-b6c9-2e9650cda581 -    /dev/nbd14
> > 4493  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-1bbb4135-e9ed-4720-a41a-a49b998faf42 -    /dev/nbd15
> > 4683  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-374cadac-d969-49eb-8269-aa125cba82d8 -    /dev/nbd16
> > 1736  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-478a20cc-58dd-4cd9-b8b1-6198014e21b1 -    /dev/nbd17
> > 3648  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-6e28ec15-747a-43c9-998d-e9f2a600f266 -    /dev/nbd18
> > 9993  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-61ae5ef3-9efb-4fe6-8882-45d54558313e -    /dev/nbd19
> > 10324 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-f7d27673-c268-47b9-bd58-46dcd4626bbb -    /dev/nbd20
> > 19330 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-0d4e5568-ac93-4f27-b24f-6624f2fa4a2b -    /dev/nbd21
> > 14942 RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5 RBD-69832522-fd68-49f9-810f-485947ff5e44 -    /dev/nbd22
> > 20859 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-5025b066-723e-48f5-bc4e-9b8bdc1e9326 -    /dev/nbd23
> > 19247 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-095292a0-6cc2-4112-95bf-15cb3dd33e9a -    /dev/nbd24
> > 22356 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-f8229ea0-ad7b-4034-9cbe-7353792a2b7c -    /dev/nbd25
> > 22537 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-e8c0b841-50ec-4765-a3cb-30c78a4b9162 -    /dev/nbd26
> > 15105 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-6d3d3503-2b45-45e9-a17b-30ab65c2be3d -    /dev/nbd27
> > 28192 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-e04ec9e6-da4c-4b7a-b257-2cf7022ac59f -    /dev/nbd28
> > 28507 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-e6d213b3-89d6-4c09-bc65-18ed7992149d -    /dev/nbd29
> > 23206 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-638ef476-843e-4c26-8202-377f185d9d26 -    /dev/nbd30
> >
> >
> > [root@xen-s31 ~]# uname -a
> > Linux xen-s31 4.4.0+10 #1 SMP Wed Dec 6 13:56:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> >
> > [root@xen-s31 ~]# rpm -qa|grep -P "ceph|rbd"
> > librbd1-12.2.5-0.el7.x86_64
> > python-rbd-12.2.5-0.el7.x86_64
> > ceph-common-12.2.5-0.el7.x86_64
> > python-cephfs-12.2.5-0.el7.x86_64
> > rbd-fuse-12.2.5-0.el7.x86_64
> > libcephfs2-12.2.5-0.el7.x86_64
> > rbd-nbd-12.2.5-0.el7.x86_64
> >
> > Therefore I will try to use a 4.4 release - but I suppose that there are some patch differences between my Ubuntu 4.4 kernel and the XenServer 4.4 kernel.
> > I will test with "4.4.0-154".
> >
> > Regards
> > Marc
> >
> >
> >>> I can test with the following releases:
> >>>
> >>> # apt-cache search linux-image-4.*.*.*-*-generic 2>&1|sed '~s,\.[0-9]*-[0-9]*-*-generic - .*,,;~s,linux-image-,,'|sort -u
> >>> 4.10
> >>> 4.11
> >>> 4.13
> >>> 4.15
> >>> 4.4
> >>> 4.8
> >>>
> >>> We can also perform tests by using another filesystem (e.g. ext4).
> >>>
> >>> From my point of view I suppose that there is something wrong with nbd.ko or with rbd-nbd (excluding the rbd cache functionality) - therefore I do not think that this is very promising....
> >> Agreed. I would also attempt to see if you have blocked ops on the OSD during these events (see Mykola's ticket comment).
> >>
> >>> Regards
> >>> Marc

[1] http://download.ceph.com/debian-luminous/pool/main/c/ceph/

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com