Ok, so I did some testing on each of these parameters one by one: removing each one from the config, watching the latency for a few minutes, then adding it back again. None of them had any conclusive, statistically significant impact on latency except bluestore_prefer_deferred_size.

I removed it like this:

sudo ceph config rm osd/class:hdd bluestore_prefer_deferred_size

and my latency immediately increased from 2ms to 6ms. So I added it back again:

sudo ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768

and latency immediately dropped back to 2ms. So this parameter can definitely be applied at runtime, and it makes a difference to how my OSDs perform. As I am using separate DB partitions on the SSD, this is to be expected when more is being pushed through the WAL, which I believe is what this parameter is doing. I also notice that WAL activity increases across these OSDs.

I also tested it the other way, by removing all the parameters I mentioned earlier and adding just this one. The results were the same.

So I guess an update to my original post is: when using bcache, make sure you tweak bluestore_prefer_deferred_size at the very least. The bluestore_prefer_deferred_size_hdd value of 32768 works well for me, but there may be other values that are better.

Rich

On Mon, 11 Apr 2022 at 09:23, Richard Bade <hitrich@xxxxxxxxx> wrote:
>
> Hi Frank,
> Thanks for your insight on this. I had done a bunch of testing on this over a year ago and found improvements with these settings. I then applied them all at once to our production cluster and confirmed the 3x reduction in latency; however, I did not test the settings individually.
> It could well be, as you say, that the settings cannot be changed at runtime and that in fact only the other settings, such as op queue and throttle cost, are making the difference. I'll attempt to test the settings again this week and see which ones actually affect latency when set at runtime.
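A minimal sketch of the one-option-at-a-time runtime test described at the top of this message, assuming the OSDs under test are in the hdd device class. osd.0 is a hypothetical example OSD, `ceph osd perf` stands in for whatever latency monitoring you actually use, and the 300-second observation window is an arbitrary choice, not the one used in the thread.

```shell
#!/bin/sh
# Runtime on/off test for bluestore_prefer_deferred_size.
# Assumptions: the OSDs under test are in device class "hdd";
# osd.0 is a hypothetical example OSD in that class.
OPT=bluestore_prefer_deferred_size
OSD=osd.0

# Baseline: the value the running daemon reports for this option.
ceph config show "$OSD" "$OPT"

# 1. Remove the class-scoped override, then watch latency for a while.
ceph config rm osd/class:hdd "$OPT"
ceph config show "$OSD" "$OPT"
ceph osd perf      # per-OSD commit/apply latency in ms
sleep 300          # arbitrary observation window
ceph osd perf

# 2. Put the override back and compare.
ceph config set osd/class:hdd "$OPT" 32768
ceph config show "$OSD" "$OPT"
sleep 300
ceph osd perf
```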
>
> > I'm not sure why with your OSD creation procedure the data part is created with the correct HDD parameters.
> I believe that at prepare time my OSDs get all SSD parameters. That's why I manually change the class and these runtime settings.
>
> Rich
>
> On Sat, 9 Apr 2022 at 00:22, Frank Schilder <frans@xxxxxx> wrote:
> >
> > Hi Richard,
> >
> > thanks for the additional info; now I understand the whole scenario and what might be different when using LVM and dm_cache.
> >
> > > In my process, bcache is added before osd creation as bcache creates a
> > > disk device called /dev/bcache0 for example. This is used for the data
> >
> > This is an important detail. As far as I know, dm_cache is transparent. It can be added/removed at run time and doesn't create a new device. However, I don't know if it changes the rotational attribute of the LVM device.
> >
> > I'm not sure why, with your OSD creation procedure, the data part is created with the correct HDD parameters. I believe at least these, if not more, parameters are used at prepare time only and cannot be changed after the OSD is created:
> >
> > bluestore_prefer_deferred_size = 32768
> > bluestore_compression_max_blob_size = 524288
> > bluestore_max_blob_size = 524288
> > bluestore_min_alloc_size = 65536
> >
> > If you set these for osd/class:hdd, they should *not* be used if the initial device class is ssd. If I understood you correctly, you create an OSD with class=ssd and then change its class to class=hdd. At this point, however, it is too late; the hard-coded ssd options should persist. I wonder if using a command like
> >
> > ceph-volume lvm batch --crush-device-class hdd ...
> >
> > will select the right parameters irrespective of the rotational flag. How did you do it? I believe the only way to get the burned-in bluestore values is to start an OSD with high debug logging. The "config show" commands will show what is in the config DB and not what is burned onto disk (and actually used).
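To look into the questions about the rotational attribute and what BlueStore detected, a sketch like the following may help. /dev/bcache0, OSD id 0 and the DB device /dev/sdX are placeholders. The `--report` flag makes `ceph-volume` only print what it would create, so this does not by itself answer whether `--crush-device-class` changes the prepare-time values.

```shell
#!/bin/sh
# What the kernel reports for the bcache device: 1 = rotational, 0 = non-rotational.
cat /sys/block/bcache0/queue/rotational

# What the OSD recorded about its devices; keys such as
# "bluestore_bdev_rotational" reflect what BlueStore detected.
ceph osd metadata 0 | grep rotational

# Dry run of the ceph-volume idea: report what would be created with an
# explicit CRUSH device class and a separate DB device (placeholder paths).
ceph-volume lvm batch --report --crush-device-class hdd /dev/bcache0 --db-devices /dev/sdX
```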
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Richard Bade <hitrich@xxxxxxxxx>
> > Sent: 08 April 2022 00:08
> > To: Frank Schilder
> > Cc: Igor Fedotov; Ceph Users
> > Subject: [Warning Possible spam] Re: [Warning Possible spam] Re: [Warning Possible spam] Re: Ceph Bluestore tweaks for Bcache
> >
> > Hi Frank,
> > Yes, I think you have got to the crux of the issue.
> > > - some_config_value_hdd is used for "rotational=1" devices and
> > > - osd/class:hdd values are used for "device_class=hdd" OSDs,
> >
> > The class is something that is user-defined, and you can actually define your own class names. By default the class is set to ssd for rotational=0 and hdd for rotational=1. I override this so my OSDs end up in the right pools, as my pools are class-based. I also have another class called nvme for all NVMe storage.
> > So the rotational flag and the device class are actually disconnected and used for two different purposes.
> >
> > > Or are you observing that an HDD+bcache OSD comes up in device class hdd *but* bluestore thinks it is an ssd and applies SSD defaults (some_config_value_ssd) *unless* you explicitly set the config option for device class hdd?
> >
> > Yes, this is what I am observing, because I am manually changing the device class to HDD.
> >
> > > - OSD is prepared on HDD and put into device class hdd (with correct persistent prepare-time options)
> > > - bcache is added *after* OSD creation (???)
> > > - after this, on (re-)start the OSD comes up in device class hdd, but bluestore now thinks it's an SSD and uses some incorrect run-time config option defaults
> > > - to fix the incorrect run-time options, you explicitly copy some hdd defaults to the config database with the filter "osd/class:hdd"
> >
> > In my process, bcache is added before OSD creation, as bcache creates a disk device, for example /dev/bcache0. This is used for the data disk. As you have surmised, bluestore thinks my disks are SSDs and applies settings accordingly. I set the class to HDD and then correct the runtime settings based on the class.
> >
> > > There is actually an interesting follow-up on this. With a large enough bcache/dm_cache it should make sense to use SSD RocksDB settings, because the database will fit into the cache. Are there any recommendations for tweaking the prepare-time config options, in particular the RocksDB options, for such hybrid drives?
> >
> > In my case, this doesn't apply, as I have used volumes on the SSD specifically for the DB. This means I know the DB will always be on the fast storage.
> > But yes, a larger cache size may change the performance and make it closer to what Ceph expects from an SSD. In my experience, on bcache the SSD settings made performance considerably worse than the HDD settings (3x the average latency).
> >
> > Regards,
> > Rich
> >
> > On Fri, 8 Apr 2022 at 02:03, Frank Schilder <frans@xxxxxx> wrote:
> > >
> > > Hi Richard,
> > >
> > > so you are tweaking run-time config values, not OSD prepare-time config values. There is something I don't understand here:
> > >
> > > > What I do for my settings is to set them for the hdd class (ceph config set osd/class:hdd bluestore_setting_blah=blahblah.
> > > > I think that's the correct syntax, but I'm not currently at a computer) in the config database.
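For reference, `ceph config set` takes the option name and the value as separate arguments (as in the commands at the top of the thread), not `option=value`. The device-class override and the class-scoped runtime setting described here can be sketched as below; osd.12, the rule name hdd-rule, the CRUSH root default and the failure domain host are placeholders, and 32768 is the value reported in this thread.

```shell
#!/bin/sh
# Move a (hypothetical) OSD from the auto-assigned "ssd" class to "hdd".
# An existing class has to be removed before a new one can be set.
ceph osd crush rm-device-class osd.12
ceph osd crush set-device-class hdd osd.12

# Class-based pools: a replicated rule that only selects hdd-class OSDs
# (rule name, root and failure domain are placeholders).
ceph osd crush rule create-replicated hdd-rule default host hdd

# Runtime setting scoped to the hdd class via a config mask.
# Option and value are separate arguments, not option=value.
ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768

# Check the stored entry and the value the running OSD reports.
ceph config dump | grep prefer_deferred
ceph config show osd.12 bluestore_prefer_deferred_size
```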
> > >
> > > If the OSD comes up as class=hdd, then the hdd defaults should be applied anyway and there is no point setting these values explicitly to their defaults. How do you make the OSD come up in class hdd? Wasn't your original problem that the OSDs came up in class ssd? Or are you observing that an HDD+bcache OSD comes up in device class hdd *but* bluestore thinks it is an ssd and applies SSD defaults (some_config_value_ssd) *unless* you explicitly set the config option for device class hdd?
> > >
> > > I think I am confused about the OSD device class, the drive type detected by bluestore, and which options are used if there is a mismatch, if there is any. If I understand you correctly, it seems you observe that:
> > >
> > > - OSD is prepared on HDD and put into device class hdd (with correct persistent prepare-time options)
> > > - bcache is added *after* OSD creation (???)
> > > - after this, on (re-)start the OSD comes up in device class hdd, but bluestore now thinks it's an SSD and uses some incorrect run-time config option defaults
> > > - to fix the incorrect run-time options, you explicitly copy some hdd defaults to the config database with the filter "osd/class:hdd"
> > >
> > > If this is correct, then I believe the underlying issue is that:
> > >
> > > - some_config_value_hdd is used for "rotational=1" devices and
> > > - osd/class:hdd values are used for "device_class=hdd" OSDs,
> > >
> > > which is not the same thing, despite the string "hdd" suggesting that it is.
> > >
> > > There is actually an interesting follow-up on this. With a large enough bcache/dm_cache it should make sense to use SSD RocksDB settings, because the database will fit into the cache. Are there any recommendations for tweaking the prepare-time config options, in particular the RocksDB options, for such hybrid drives?
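The distinction between the two selectors can be illustrated with a small, hypothetical shell helper (check_osd_profile is not a Ceph tool). It assumes, following the explanation above, that BlueStore picks its *_hdd or *_ssd defaults from the device's rotational flag, while osd/class:hdd config entries follow the CRUSH device class. It does not query a cluster, and it only compares the default hdd/ssd class names.

```shell
#!/bin/sh
# Hypothetical helper: compares the two independent selectors.
#  - BlueStore *_hdd / *_ssd defaults: chosen from the rotational flag.
#  - "osd/class:<class>" config-mask entries: chosen from the CRUSH class.
# Usage: check_osd_profile <rotational 0|1> <crush class>
# Returns 1 on an hdd/ssd mismatch, 0 otherwise.
check_osd_profile() {
    rotational=$1
    class=$2
    if [ "$rotational" = "1" ]; then
        defaults=hdd
    else
        defaults=ssd
    fi
    echo "bluestore defaults: *_${defaults}; config mask: osd/class:${class}"
    # Custom classes (e.g. nvme) have no hdd/ssd counterpart to compare.
    case "$class" in
        hdd|ssd) ;;
        *) echo "custom class"; return 0 ;;
    esac
    if [ "$defaults" != "$class" ]; then
        echo "mismatch"
        return 1
    fi
    echo "consistent"
}

# The situation in this thread: a bcache device reported as
# non-rotational, placed in class hdd by hand.
check_osd_profile 0 hdd
```

On a real host the first argument would come from /sys/block/<dev>/queue/rotational and the second from the CLASS column of `ceph osd tree`.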
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx