Re: reproducible rbd-nbd crashes

On Wed, Jul 24, 2019 at 12:47 PM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>
> Hi Jason,
>
> I installed kernel 4.4.0-154.181 (from the Ubuntu package sources) and ran the crash reproduction.
> The problem reappeared with that kernel release as well.
>
> Running 10 parallel gunzip processes drove 1,600 write and 330 read IOPS against the cluster (the rbd_ec volume) at a transfer rate of 290 MB/s for 10 minutes.
> After that, the same problem reappeared.
>
> What should we do now?
>
> Testing with 12.2.5 librbd/rbd-nbd is currently not that easy for me, because the Ceph apt source does not contain that version.
> Do you know a package source?

All the upstream packages should be available here [1], including 12.2.5.

> How can I support you?

Did you pull the OSD blocked ops stats to figure out what is going on
with the OSDs?
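
For reference, the blocked-ops dump can be pulled on each OSD host with `ceph daemon osd.<id> dump_blocked_ops` (or `dump_ops_in_flight`). A minimal sketch for summarizing such a dump; the sample JSON below is hypothetical, but the `ops`/`description`/`age`/`flag_point` fields follow the shape a luminous OSD emits:

```python
import json

# Hypothetical sample shaped like `ceph daemon osd.<id> dump_blocked_ops`
# output (the op itself is invented for illustration).
sample = """
{
  "ops": [
    {
      "description": "osd_op(client.4123.0:67 2.2f [write 0~4194304] snapc 0=[])",
      "age": 45.1,
      "flag_point": "waiting for sub ops"
    }
  ],
  "num_blocked_ops": 1
}
"""

def summarize_blocked_ops(dump: str, min_age: float = 30.0):
    """Return (age, flag_point, description) for ops blocked at least min_age seconds."""
    data = json.loads(dump)
    return [(op["age"], op.get("flag_point", "?"), op["description"])
            for op in data.get("ops", []) if op.get("age", 0) >= min_age]

for age, flag, desc in summarize_blocked_ops(sample):
    print(f"{age:6.1f}s  {flag}: {desc}")
```

Ops stuck for tens of seconds at a flag point such as "waiting for sub ops" would line up with the I/O stalls seen on the rbd-nbd client side.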

> Regards
> Marc
>
> On 24.07.19 at 07:55, Marc Schöchlin wrote:
> > Hi Jason,
> >
> > On 24.07.19 at 00:40, Jason Dillaman wrote:
> >>> Sure, which kernel do you prefer?
> >> You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen environment. Can you use a matching kernel version?
> >
> > That's true; the virtual machines in our Xen environments run completely on rbd-nbd devices.
> > Every host runs dozens of rbd-nbd maps, which are visible as Xen disks inside the virtual systems.
> > (https://github.com/vico-research-and-consulting/RBDSR)
> >
> > It seems that XenServer handles device timings specially: 1.5 years ago we had a 1.5-hour outage of our Ceph cluster that blocked all write requests
> > (overfull disks caused by huge usage growth). In that situation, all virtual machines continued their work without problems once the cluster was back.
> > We haven't set any timeouts using nbd_set_timeout.c on these systems.
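
For reference, nbd_set_timeout.c boils down to a single NBD_SET_TIMEOUT ioctl on the device. A minimal Python sketch of the same call; the ioctl number comes from <linux/nbd.h>, and the device path below is only an example:

```python
import fcntl
import os

# NBD_SET_TIMEOUT is defined as _IO(0xab, 9) in <linux/nbd.h>; _IO encodes
# no data transfer, so the request number is simply (type << 8) | nr.
NBD_SET_TIMEOUT = (0xab << 8) | 9

def set_nbd_timeout(device: str, seconds: int) -> None:
    """Ask the kernel to time out requests on an attached nbd device."""
    fd = os.open(device, os.O_RDWR)
    try:
        fcntl.ioctl(fd, NBD_SET_TIMEOUT, seconds)
    finally:
        os.close(fd)

# Example (requires root and an attached device, e.g. one of the maps above):
# set_nbd_timeout("/dev/nbd3", 60)
print(hex(NBD_SET_TIMEOUT))  # -> 0xab09
```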
> >
> > We never experienced problems with these rbd-nbd instances.
> >
> > [root@xen-s31 ~]# rbd nbd ls
> > pid   pool                                                           image                                    snap device
> > 10405 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-72f4e61d-acb9-4679-9b1d-fe0324cb5436 -    /dev/nbd3
> > 12731 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-88f8889a-05dc-49ab-a7de-8b5f3961f9c9 -    /dev/nbd4
> > 13123 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-37243066-54b0-453a-8bf3-b958153a680d -    /dev/nbd5
> > 15342 RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5 RBD-2bee9bf7-4fed-4735-a749-2d4874181686 -    /dev/nbd6
> > 15702 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-5b93eb93-ebe7-4711-a16a-7893d24c1bbf -    /dev/nbd7
> > 27568 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-616a74b5-3f57-4123-9505-dbd4c9aa9be3 -    /dev/nbd8
> > 21112 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-5c673a73-7827-44cc-802c-8d626da2f401 -    /dev/nbd9
> > 15726 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-1069a275-d97f-48fd-9c52-aed1d8ac9eab -    /dev/nbd10
> > 4368  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-23b72184-0914-4924-8f7f-10868af7c0ab -    /dev/nbd11
> > 4642  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-bf13cf77-6115-466e-85c5-aa1d69a570a0 -    /dev/nbd12
> > 9438  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-a2071aa0-5f63-4425-9f67-1713851fc1ca -    /dev/nbd13
> > 29191 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-fd9a299f-dad9-4ab9-b6c9-2e9650cda581 -    /dev/nbd14
> > 4493  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-1bbb4135-e9ed-4720-a41a-a49b998faf42 -    /dev/nbd15
> > 4683  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-374cadac-d969-49eb-8269-aa125cba82d8 -    /dev/nbd16
> > 1736  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-478a20cc-58dd-4cd9-b8b1-6198014e21b1 -    /dev/nbd17
> > 3648  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-6e28ec15-747a-43c9-998d-e9f2a600f266 -    /dev/nbd18
> > 9993  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-61ae5ef3-9efb-4fe6-8882-45d54558313e -    /dev/nbd19
> > 10324 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-f7d27673-c268-47b9-bd58-46dcd4626bbb -    /dev/nbd20
> > 19330 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-0d4e5568-ac93-4f27-b24f-6624f2fa4a2b -    /dev/nbd21
> > 14942 RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5 RBD-69832522-fd68-49f9-810f-485947ff5e44 -    /dev/nbd22
> > 20859 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-5025b066-723e-48f5-bc4e-9b8bdc1e9326 -    /dev/nbd23
> > 19247 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-095292a0-6cc2-4112-95bf-15cb3dd33e9a -    /dev/nbd24
> > 22356 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-f8229ea0-ad7b-4034-9cbe-7353792a2b7c -    /dev/nbd25
> > 22537 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-e8c0b841-50ec-4765-a3cb-30c78a4b9162 -    /dev/nbd26
> > 15105 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-6d3d3503-2b45-45e9-a17b-30ab65c2be3d -    /dev/nbd27
> > 28192 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-e04ec9e6-da4c-4b7a-b257-2cf7022ac59f -    /dev/nbd28
> > 28507 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-e6d213b3-89d6-4c09-bc65-18ed7992149d -    /dev/nbd29
> > 23206 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-638ef476-843e-4c26-8202-377f185d9d26 -    /dev/nbd30
> >
> >
> > [root@xen-s31 ~]# uname -a
> > Linux xen-s31 4.4.0+10 #1 SMP Wed Dec 6 13:56:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> >
> > [root@xen-s31 ~]# rpm -qa|grep -P "ceph|rbd"
> > librbd1-12.2.5-0.el7.x86_64
> > python-rbd-12.2.5-0.el7.x86_64
> > ceph-common-12.2.5-0.el7.x86_64
> > python-cephfs-12.2.5-0.el7.x86_64
> > rbd-fuse-12.2.5-0.el7.x86_64
> > libcephfs2-12.2.5-0.el7.x86_64
> > rbd-nbd-12.2.5-0.el7.x86_64
> >
> > Therefore I will try a 4.4 release, but I suppose there are some patch differences between my Ubuntu 4.4 kernel and the XenServer 4.4 kernel.
> > I will test with "4.4.0-154".
> >
> > Regards
> > Marc
> >
> >
> >>> I can test with following releases:
> >>>
> >>> # apt-cache search linux-image-4.*.*.*-*-generic 2>&1|sed '~s,\.[0-9]*-[0-9]*-*-generic - .*,,;~s,linux-image-,,'|sort -u
> >>> 4.10
> >>> 4.11
> >>> 4.13
> >>> 4.15
> >>> 4.4
> >>> 4.8
> >>>
> >>> We can also perform tests using another filesystem (e.g. ext4).
> >>>
> >>> From my point of view, I suspect there is something wrong with nbd.ko or with rbd-nbd (excluding the rbd cache functionality), so I do not think this is very promising...
> >> Agreed. I would also attempt to see if you have blocked ops on the OSD during these events (see Mykola’s ticket comment).
> >>
> >>> Regards
> >>> Marc


[1] http://download.ceph.com/debian-luminous/pool/main/c/ceph/


-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



