[ceph] [nautilus][ceph-ansible] - Dynamic bucket resharding problem

Hello!

I've run into a bit of an issue with one of our radosgw production clusters.

The setup is two radosgw nodes behind haproxy load balancing, which in turn connect to the Ceph cluster. Everything is running 14.2.2, so Nautilus. It's tied to an OpenStack cluster, so Keystone is the authentication backend (that shouldn't really matter, though).

Today both rgw backends crashed. Checking the logs, it seems to be related to dynamic resharding of a bucket, causing lock errors:

Logs snippet: https://pastebin.com/uBCnhinF

Following http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021368.html (an old thread), I performed a manual reshard of the affected bucket, which succeeded (radosgw-admin bucket reshard --bucket="XXX/YYY" --num-shards=256).

Checking the bucket's metadata, it now correctly shows 256 shards, up from 128.
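
For completeness, this is roughly how I checked (the bucket ID placeholder comes from the first command's output; I'm going from memory, so the exact metadata keys may differ slightly on your version):

    radosgw-admin metadata get bucket:XXX/YYY
    radosgw-admin metadata get bucket.instance:XXX/YYY:<bucket-id>

The num_shards field in the instance metadata is what now reads 256.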

HOWEVER, dynamic resharding still kept happening and bringing down the backends. I suspect it is because of the old reshard op still hanging around, which shows up in `reshard list`: https://pastebin.com/dPChwBCT
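
For reference, these are the commands I used to inspect the pending entry (bucket name is a placeholder):

    radosgw-admin reshard list
    radosgw-admin reshard status --bucket="XXX/YYY"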

As the resharding seems to have been successful when run manually, I now want to remove that stale reshard op, but I can't; I get this error when trying: https://pastebin.com/071kfAsa
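
Concretely, this is what fails (as far as I know, reshard cancel is the supported way to drop a pending entry, so I'd expect it to work here):

    radosgw-admin reshard cancel --bucket="XXX/YYY"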

For now I've had to resort to setting rgw_dynamic_resharding = false in ceph.conf to stop the problem from recurring.
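
In case anyone wants the exact workaround (the section name depends on your rgw instance names; on Nautilus the config database route should also work, though I haven't tried it):

    # ceph.conf on both rgw nodes, followed by a radosgw restart
    [client.rgw.<instance>]
    rgw_dynamic_resharding = false

    # alternatively, untested here:
    ceph config set client.rgw rgw_dynamic_resharding false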

Ideas? 

Cheers
Erik

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


