Re: slow requests and short OSD failures in small cluster

On 04/13/17 10:34, Jogi Hofmüller wrote:
> Dear David,
>
> Am Mittwoch, den 12.04.2017, 13:46 +0000 schrieb David Turner:
>> I can almost guarantee what you're seeing is PG subfolder splitting. 
> Every day there's something new to learn about Ceph ;)
>
>> When the subfolders in a PG get X number of objects, it splits into
>> 16 subfolders.  Every cluster I manage has blocked requests and OSDs
>> that get marked down while this is happening.  To stop the OSDs
>> getting marked down, I increase the osd_heartbeat_grace until the
>> OSDs no longer mark themselves down during this process.
> Thanks for the hint. I adjusted the values accordingly and will monitor
> our cluster. This morning there were no troubles at all btw. Still
> wondering what caused yesterday's mayhem ...
>
> Regards,
Also, some more things to consider...

Ceph snapshots really slow things down. They aren't efficient like snapshots
on zfs and btrfs: one snapshot might cost some percentage of performance, two
can potentially cost double, and so on until the cluster is crawling. And it's
not just the CoW overhead; even operations like rbd snap rm, rbd diff, etc.
start to take many times longer. See http://tracker.ceph.com/issues/10823 for
an explanation of the CoW behaviour. My goal is just to keep at most one
long-term snapshot per image.
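
For example, rotating to a single long-term snapshot with the rbd CLI looks
roughly like this (pool and image names are just placeholders):

  # create the new long-term snapshot
  rbd snap create rbd/myimage@keep-2017-04
  # list what's there
  rbd snap ls rbd/myimage
  # drop the previous long-term one so only a single snapshot remains
  rbd snap rm rbd/myimage@keep-2017-03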

Also there's snap trimming, which I found to be far worse than directory
splitting. The settings I have for this and splitting are:
osd_pg_max_concurrent_snap_trims=1
osd_snap_trim_sleep=0
filestore_split_multiple=8
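
In ceph.conf that looks roughly like this; the same values can also be pushed
into running OSDs with injectargs (the numbers are just what works for me, not
a recommendation):

  [osd]
  osd_pg_max_concurrent_snap_trims = 1
  osd_snap_trim_sleep = 0
  filestore_split_multiple = 8

  # change running OSDs without a restart, e.g.:
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0'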

osd_snap_trim_sleep is bugged: it holds a lock while sleeping, so make sure
it's 0.
filestore_split_multiple makes subfolders split less often, I think... I'm not
sure how much this helps, but it subjectively seems to improve things.
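
As far as I understand the filestore behaviour, a subfolder splits once it
holds more than filestore_split_multiple * abs(filestore_merge_threshold) * 16
objects, so with the default merge threshold of 10:

  default (2):  2 * 10 * 16 =  320 objects per subfolder before a split
  with 8:       8 * 10 * 16 = 1280 objects per subfolder before a split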

And I find that bcache greatly reduces the load that small metadata operations
like these (and xattrs, leveldb, the xfs journal, etc.) put on the OSD disk.
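
If anyone wants to try bcache, the basic setup is roughly this (device names
are placeholders, and make-bcache wipes the devices, so be careful):

  # SSD partition as cache device, OSD data disk as backing device
  make-bcache -C /dev/nvme0n1p1 -B /dev/sdb
  # writeback mode so small metadata writes hit the SSD first
  echo writeback > /sys/block/bcache0/bcache/cache_mode
  # then build the OSD on /dev/bcache0 as usual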

I have not changed any timeouts and don't get any OSDs marked down, but I
didn't get those before I tried bcache and the other settings either. I just
got blocked requests (and still do, though fewer), and hanging librbd client
VMs (disabling exclusive-lock fixes that).
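
For reference, turning that off looks like this (object-map and fast-diff
depend on exclusive-lock, so they have to be disabled along with it if they're
enabled; the image name is a placeholder):

  rbd feature disable rbd/myimage object-map fast-diff exclusive-lock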
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



