Re: [Warning Possible spam] Re: [Warning Possible spam] Re: Ceph Bluestore tweaks for Bcache

Hi Richard,

so you are tweaking run-time config values, not OSD prepare-time config values. There is something I don't understand here:

> What I do for my settings is to set them for the hdd class (ceph config set osd/class:hdd bluestore_setting_blah=blahblah.
> I think that's the correct syntax, but I'm not currently at a computer) in the config database.

If the OSD comes up as class=hdd, then the hdd defaults should be applied anyway and there is no point setting these values explicitly to their defaults. How do you make the OSD come up in class hdd? Wasn't it your original problem that the OSDs came up in class ssd? Or are you observing that an HDD+bcache OSD comes up in device class hdd *but* bluestore thinks it is an ssd and applies SSD defaults (some_config_value_ssd) *unless* you explicitly set the config option for device class hdd?

I think I am confused about the OSD device class, the drive type detected by bluestore and what options are used if there is a mismatch - if there is any. If I understand you correctly, it seems you observe that:

- OSD is prepared on HDD and put into device class hdd (with correct persistent prepare-time options)
- bcache is added *after* OSD creation (???)
- after this, on (re-)start the OSD comes up in device class hdd but bluestore now thinks it's an SSD and uses some incorrect run-time config option defaults
- to fix the incorrect run-time options, you explicitly copy some hdd defaults to the config database with the filter "osd/class:hdd"

If this is correct, then I believe the underlying issue is that:

- some_config_value_hdd is used for "rotational=1" devices and
- osd/class:hdd values are used for "device_class=hdd" OSDs,

which is not the same despite the string "hdd" indicating that it is.
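
For reference, this is how I would check both sides on an affected OSD (osd.0 / id 0 is just an example):

  # CRUSH device class of the OSD - this is what an osd/class:hdd config mask matches on
  ceph osd tree | grep 'osd.0 '

  # rotational flag as recorded by the OSD - the _hdd/_ssd defaults follow this detection
  ceph osd metadata 0 | grep -i rotational

If the first reports class hdd while the second reports rotational=0, that would confirm the mismatch.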

There is actually an interesting follow-up to this. With a bcache/dm_cache large enough, it should make sense to use the SSD RocksDB settings, because the database will fit into the cache. Are there any recommendations for tweaking the prepare-time config options, in particular the RocksDB options, for such hybrid drives?
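
For reference, I would check per option whether it can be changed at run time or is only applied at mkfs/prepare time with something like the following (assuming ceph config help reports this on your release):

  # shows type, defaults and whether the option can be updated at runtime
  ceph config help bluestore_min_alloc_size_hdd
  ceph config help bluestore_rocksdb_options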

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Richard Bade <hitrich@xxxxxxxxx>
Sent: 07 April 2022 14:05:19
To: Frank Schilder
Cc: Igor Fedotov; Ceph Users
Subject: [Warning Possible spam] Re: [Warning Possible spam] Re: Ceph Bluestore tweaks for Bcache

Hi Frank,
I can't speak for the bluestore debug enforce settings as I don't have this setting, but I would guess it's the same.
What I do for my settings is to set them for the hdd class in the config database (ceph config set osd/class:hdd bluestore_setting_blah=blahblah; I think that's the correct syntax, but I'm not currently at a computer). The class is a permanent setting on the osd, so when it starts or the server reboots, it automatically applies these settings based on the osd class. That way any new osds also get them as soon as the class is defined for the osd.
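
The commands are roughly as below (untested here, using two of the options from my earlier mail as examples; osd.0 is just an example id):

  # apply an option only to osds whose device class is hdd
  ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768
  ceph config set osd/class:hdd osd_op_queue_cut_off high

  # check what is stored in the config database and what a running osd reports
  ceph config get osd.0 bluestore_prefer_deferred_size
  ceph config show osd.0 bluestore_prefer_deferred_size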
Hopefully that helps.

Rich


On Thu, 7 Apr 2022, 23:40 Frank Schilder, <frans@xxxxxx> wrote:
Hi Richard and Igor,

are these tweaks required at build-time (osd prepare) only or are they required for every restart?

Is this setting "bluestore debug enforce settings=hdd" in the ceph config database or set somewhere else? How does this work if deploying HDD- and SSD-OSDs at the same time?
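
If the class masks in the config database apply to this option as well, I would guess one could scope it per device class, something like the following (just a guess, assuming the underscore form of the option name; I have not tested this):

  # guess: enforce HDD behaviour only for OSDs in device class hdd
  ceph config set osd/class:hdd bluestore_debug_enforce_settings hdd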

Ideally, all these tweaks should be applicable and settable at creation time only without affecting generic settings (that is, at the ceph-volume command line and not via config side effects). Otherwise it becomes really tedious to manage these.

For example, would the following work-flow apply the correct settings *permanently* across restarts:

1) Prepare OSD on fresh HDD with ceph-volume lvm batch --prepare ...
2) Assign dm_cache to logical OSD volume created in step 1
3) Start OSD, restart OSDs, boot server ...

I would assume that the HDD settings are burned into the OSD in step 1 and will be used in all future (re-)starts without the need to do anything despite the device being detected as non-rotational after step 2. Is this assumption correct?
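
For what it's worth, after step 3 I would at least check what the kernel reports for the cached device at start-up (device names below are just examples):

  # rotational flag per block device as the kernel sees it (1 = rotating, 0 = non-rotating)
  lsblk -o NAME,TYPE,ROTA
  cat /sys/block/dm-0/queue/rotational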

Thanks and best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Richard Bade <hitrich@xxxxxxxxx>
Sent: 06 April 2022 00:43:48
To: Igor Fedotov
Cc: Ceph Users
Subject: [Warning Possible spam] Re: Ceph Bluestore tweaks for Bcache

Just for completeness for anyone following this thread: Igor
added that setting in Octopus, so unfortunately I am unable to use it
as I am still on Nautilus.

Thanks,
Rich

On Wed, 6 Apr 2022 at 10:01, Richard Bade <hitrich@xxxxxxxxx> wrote:
>
> Thanks Igor for the tip. I'll see if I can use this to reduce the
> number of tweaks I need.
>
> Rich
>
> On Tue, 5 Apr 2022 at 21:26, Igor Fedotov <igor.fedotov@xxxxxxxx> wrote:
> >
> > Hi Richard,
> >
> > just FYI: one can use the "bluestore debug enforce settings=hdd" config
> > parameter to manually enforce HDD-related settings for a BlueStore OSD
> >
> >
> > Thanks,
> >
> > Igor
> >
> > On 4/5/2022 1:07 AM, Richard Bade wrote:
> > > Hi Everyone,
> > > I just wanted to share a discovery I made about running bluestore on
> > > top of Bcache in case anyone else is doing this or considering it.
> > > We've run Bcache under Filestore for a long time with good results but
> > > recently rebuilt all the osds on bluestore. This caused some
> > > degradation in performance that I couldn't quite put my finger on.
> > > Bluestore osds have some smarts where they detect the disk type.
> > > Unfortunately in the case of Bcache it detects as SSD, when in fact
> > > the HDD parameters are better suited.
> > > I changed the following parameters to match the HDD default values and
> > > immediately saw my average osd latency during normal workload drop
> > > from 6ms to 2ms. Peak performance didn't really change, but a test
> > > machine that I have running a constant iops workload was much more
> > > stable, as was the average latency.
> > > Performance has returned to Filestore levels or better.
> > > Here are the parameters.
> > >
> > >   ; Make sure that we use values appropriate for HDD not SSD - Bcache
> > > gets detected as SSD
> > >   bluestore_prefer_deferred_size = 32768
> > >   bluestore_compression_max_blob_size = 524288
> > >   bluestore_deferred_batch_ops = 64
> > >   bluestore_max_blob_size = 524288
> > >   bluestore_min_alloc_size = 65536
> > >   bluestore_throttle_cost_per_io = 670000
> > >
> > >   ; Try to improve responsiveness when some disks are fully utilised
> > >   osd_op_queue = wpq
> > >   osd_op_queue_cut_off = high
> > >
> > > Hopefully someone else finds this useful.
> > > _______________________________________________
> > > ceph-users mailing list -- ceph-users@xxxxxxx
> > > To unsubscribe send an email to ceph-users-leave@xxxxxxx
> >
> > --
> > Igor Fedotov
> > Ceph Lead Developer
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> >
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx