Re: reproducible rbd-nbd crashes

Hi Jason,

I installed kernel 4.4.0-154.181 (from the Ubuntu package sources) and performed the crash reproduction.
The problem also reappeared with that kernel release.

A decompression run with 10 parallel gunzip processes threw 1,600 write and 330 read IOPS against the cluster/the rbd_ec volume, at a transfer rate of 290 MB/s, for 10 minutes.
After that, the same problem reappeared.
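
For reference, a minimal sketch of the load generation I used (the mount point /srv/rbd_ec and the file names are placeholders for our actual test data):

# start 10 parallel gunzip processes on the rbd_ec-backed filesystem
cd /srv/rbd_ec
for i in $(seq 1 10); do
    gunzip -k "testfile_$i.gz" &
done
wait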

What should we do now?

Testing with a 12.2.5 librbd/rbd-nbd is currently not that easy for me, because the Ceph apt source does not contain that version.
Do you know a package source?
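
(A sketch of how I verified this: apt-cache madison lists every package version offered by the configured sources.)

# show all rbd-nbd versions available from the configured apt sources
apt-cache madison rbd-nbd
apt-cache policy rbd-nbd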

How can I support you?

Regards
Marc

On 24.07.19 at 07:55, Marc Schöchlin wrote:
> Hi Jason,
>
> On 24.07.19 at 00:40, Jason Dillaman wrote:
>>> Sure, which kernel do you prefer?
>> You said you have never had an issue w/ rbd-nbd 12.2.5 in your Xen environment. Can you use a matching kernel version? 
>
> That's true; the virtual machines in our Xen environments run entirely on rbd-nbd devices.
> Every host runs dozens of rbd-nbd maps, which are visible as Xen disks in the virtual systems.
> (https://github.com/vico-research-and-consulting/RBDSR)
>
> It seems that XenServer has special behavior regarding device timeouts: 1.5 years ago we had a 1.5-hour outage of our Ceph cluster which blocked all write requests
> (overfull disks because of huge usage growth). In that situation, all virtual machines continued their work without problems after the cluster was back.
> We haven't set any timeouts using nbd_set_timeout.c on these systems.
>
> We never experienced problems with these rbd-nbd instances.
>
> [root@xen-s31 ~]# rbd nbd ls
> pid   pool                                                           image                                    snap device    
> 10405 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-72f4e61d-acb9-4679-9b1d-fe0324cb5436 -    /dev/nbd3 
> 12731 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-88f8889a-05dc-49ab-a7de-8b5f3961f9c9 -    /dev/nbd4 
> 13123 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-37243066-54b0-453a-8bf3-b958153a680d -    /dev/nbd5 
> 15342 RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5 RBD-2bee9bf7-4fed-4735-a749-2d4874181686 -    /dev/nbd6 
> 15702 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-5b93eb93-ebe7-4711-a16a-7893d24c1bbf -    /dev/nbd7 
> 27568 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-616a74b5-3f57-4123-9505-dbd4c9aa9be3 -    /dev/nbd8 
> 21112 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-5c673a73-7827-44cc-802c-8d626da2f401 -    /dev/nbd9 
> 15726 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-1069a275-d97f-48fd-9c52-aed1d8ac9eab -    /dev/nbd10
> 4368  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-23b72184-0914-4924-8f7f-10868af7c0ab -    /dev/nbd11
> 4642  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-bf13cf77-6115-466e-85c5-aa1d69a570a0 -    /dev/nbd12
> 9438  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-a2071aa0-5f63-4425-9f67-1713851fc1ca -    /dev/nbd13
> 29191 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-fd9a299f-dad9-4ab9-b6c9-2e9650cda581 -    /dev/nbd14
> 4493  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-1bbb4135-e9ed-4720-a41a-a49b998faf42 -    /dev/nbd15
> 4683  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-374cadac-d969-49eb-8269-aa125cba82d8 -    /dev/nbd16
> 1736  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-478a20cc-58dd-4cd9-b8b1-6198014e21b1 -    /dev/nbd17
> 3648  RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-6e28ec15-747a-43c9-998d-e9f2a600f266 -    /dev/nbd18
> 9993  RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-61ae5ef3-9efb-4fe6-8882-45d54558313e -    /dev/nbd19
> 10324 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-f7d27673-c268-47b9-bd58-46dcd4626bbb -    /dev/nbd20
> 19330 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-0d4e5568-ac93-4f27-b24f-6624f2fa4a2b -    /dev/nbd21
> 14942 RBD_XenStorage-PROD-SSD-1-cb933ab7-a006-4046-a012-5cbe0c5fbfb5 RBD-69832522-fd68-49f9-810f-485947ff5e44 -    /dev/nbd22
> 20859 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-5025b066-723e-48f5-bc4e-9b8bdc1e9326 -    /dev/nbd23
> 19247 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-095292a0-6cc2-4112-95bf-15cb3dd33e9a -    /dev/nbd24
> 22356 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-f8229ea0-ad7b-4034-9cbe-7353792a2b7c -    /dev/nbd25
> 22537 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-e8c0b841-50ec-4765-a3cb-30c78a4b9162 -    /dev/nbd26
> 15105 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-6d3d3503-2b45-45e9-a17b-30ab65c2be3d -    /dev/nbd27
> 28192 RBD_XenStorage-PROD-SSD-2-edcf45e6-ca5b-43f9-bafe-c553b1e5dd84 RBD-e04ec9e6-da4c-4b7a-b257-2cf7022ac59f -    /dev/nbd28
> 28507 RBD_XenStorage-PROD-HDD-1-2d80bec4-0f74-4553-9d87-5ccf650c87a0 RBD-e6d213b3-89d6-4c09-bc65-18ed7992149d -    /dev/nbd29
> 23206 RBD_XenStorage-PROD-HDD-2-08fdb4aa-81e3-433a-87d7-d5b37012a282 RBD-638ef476-843e-4c26-8202-377f185d9d26 -    /dev/nbd30
>
>
> [root@xen-s31 ~]# uname -a
> Linux xen-s31 4.4.0+10 #1 SMP Wed Dec 6 13:56:09 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
>
> [root@xen-s31 ~]# rpm -qa|grep -P "ceph|rbd"
> librbd1-12.2.5-0.el7.x86_64
> python-rbd-12.2.5-0.el7.x86_64
> ceph-common-12.2.5-0.el7.x86_64
> python-cephfs-12.2.5-0.el7.x86_64
> rbd-fuse-12.2.5-0.el7.x86_64
> libcephfs2-12.2.5-0.el7.x86_64
> rbd-nbd-12.2.5-0.el7.x86_64
>
> Therefore I will try to use a 4.4 release, but I suppose that there are some patch differences between my Ubuntu 4.4 kernel and the XenServer 4.4 kernel.
> I will test with "4.4.0-154".
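>
> On the Ubuntu side that should boil down to something like this (a sketch; the exact package name depends on what the configured sources offer):
>
> # install a specific kernel build and reboot into it
> apt-get install linux-image-4.4.0-154-generic
> reboot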
>
> Regards
> Marc
>
>
>>> I can test with following releases:
>>>
>>> # apt-cache search linux-image-4.*.*.*-*-generic 2>&1 | sed 's,\.[0-9]*-[0-9]*-*-generic - .*,,;s,linux-image-,,' | sort -u
>>> 4.10
>>> 4.11
>>> 4.13
>>> 4.15
>>> 4.4
>>> 4.8
>>>
>>> We can also perform tests using another filesystem (e.g. ext4).
>>>
>>> From my point of view, I suspect that there is something wrong in nbd.ko or in rbd-nbd (excluding the rbd-cache functionality), so I do not think that this is very promising...
>> Agreed. I would also attempt to see if you have blocked ops on the OSD during these events (see Mykola’s ticket comment).
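>
> Good point; a sketch of how I plan to check for blocked ops while the test runs (osd.0 is just an example id, to be repeated per OSD):
>
> # cluster-wide view of slow/blocked requests
> ceph health detail
> # per-OSD list of currently blocked operations via the admin socket
> ceph daemon osd.0 dump_blocked_ops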
>>
>>> Regards
>>> Marc

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



