Re: Bluestore "separate" WAL and DB (and WAL/DB size?) [and recovery sleep]

Mark Nelson <mnelson@xxxxxxxxxx> · Thu, 14 Sep 2017 09:02:40 -0500

I'm really glad to hear that it wasn't bluestore! :)

It raises another concern though. We didn't expect to see that much of a 
slowdown with the current throttle settings.  An order of magnitude 
slowdown in recovery performance isn't ideal at all.

I wonder if we could improve things dramatically if we kept track of 
client IO activity on the OSD and removed the throttle whenever there 
has been no client activity for X seconds.  Theoretically more advanced 
heuristics might cover this, but in the interim it seems to me like this 
would solve the very specific problem you are seeing while still 
throttling recovery when client IO is happening.
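
A minimal sketch of that idea, in Python for brevity rather than as 
actual OSD code.  The class, method and parameter names are made up for 
illustration and do not correspond to existing Ceph options; 
"idle_after" plays the role of the X seconds above.

```python
import time
from typing import Callable, Optional


class RecoveryThrottle:
    """Hypothetical idle-aware recovery throttle (not Ceph's implementation)."""

    def __init__(self, recovery_sleep: float, idle_after: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.recovery_sleep = recovery_sleep  # sleep applied while clients are active
        self.idle_after = idle_after          # "X seconds" without client IO
        self._clock = clock                   # injectable so the logic can be tested
        self._last_client_io: Optional[float] = None  # None: no client IO seen yet

    def note_client_io(self) -> None:
        """Record that a client op was just serviced."""
        self._last_client_io = self._clock()

    def sleep_before_recovery_op(self) -> float:
        """Return how long to sleep before the next recovery op."""
        if self._last_client_io is None:
            return 0.0
        idle_for = self._clock() - self._last_client_io
        # Drop the throttle entirely once clients have been idle long enough.
        return 0.0 if idle_for >= self.idle_after else self.recovery_sleep
```

The recovery path would call sleep_before_recovery_op() before each op 
and the client path would call note_client_io(), so the throttle comes 
back as soon as a client op arrives.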

Mark

On 09/14/2017 06:19 AM, Richard Hesketh wrote:
Yeah, that hit the nail on the head. Significantly reducing or eliminating the recovery sleep times brings the recovery speed back up to (and beyond!) the levels I was expecting to see; recovery is almost an order of magnitude faster now. Thanks for educating me about those changes!

Rich

On 14/09/17 11:16, Richard Hesketh wrote:
Hi Mark,

No, I wasn't familiar with that work. I am in fact comparing speed of recovery to maintenance work I did while the cluster was on Jewel; I haven't manually done anything to the sleep settings, only adjusted the OSD max backfills setting. New options that introduce an arbitrary slowdown to recovery operations to preserve client performance would explain what I'm seeing! I'll have a tinker with adjusting those values (in my particular case client load on the cluster is very low and I don't have to honour any guarantees about client performance, so getting back into HEALTH_OK ASAP is preferable).

Rich

On 13/09/17 21:14, Mark Nelson wrote:
Hi Richard,

Regarding recovery speed, have you looked through any of Neha's results on recovery sleep testing earlier this summer?

https://www.spinics.net/lists/ceph-devel/msg37665.html

She tested bluestore and filestore under a couple of different scenarios.  The gist of it is that time to recover changes pretty dramatically depending on the sleep setting.
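
As a back-of-the-envelope illustration of why the sleep matters so much, assume a deliberately simplified model in which every recovery op takes a fixed time and is followed by a fixed sleep, with nothing running in parallel. The function name and the numbers are hypothetical and are not taken from Neha's results.

```python
def recovery_slowdown(op_time: float, sleep: float) -> float:
    """Slowdown factor when `sleep` seconds are added after every recovery op.

    Simplified serial model: each op costs op_time, then the thread sleeps.
    """
    if op_time <= 0:
        raise ValueError("op_time must be positive")
    if sleep < 0:
        raise ValueError("sleep must not be negative")
    return (op_time + sleep) / op_time


# Illustrative values only: a 10 ms recovery op followed by a 100 ms sleep.
factor = recovery_slowdown(op_time=0.010, sleep=0.100)
```

Under those assumptions the op rate drops by roughly a factor of eleven, so a sleep that is large relative to the op time dominates total recovery time.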

I don't recall if you said earlier, but are you comparing filestore and bluestore recovery performance on the same version of ceph with the same sleep settings?

Mark

On 09/12/2017 05:24 AM, Richard Hesketh wrote:
Thanks for the links. That does seem to largely confirm that I haven't horribly misunderstood anything and haven't been doing anything obviously wrong while converting my disks: there's no point specifying separate WAL/DB partitions if they're going to go on the same device; you should throw as much space as you have available at the DB partitions, since they'll use all the space they can; and significantly reduced I/O on the DB/WAL device compared to Filestore is expected, since Bluestore has nixed the write amplification as much as possible.

I'm still seeing much reduced recovery speed on my newly Bluestored cluster, but I guess that's a tuning issue rather than evidence of catastrophe.

Rich

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
