Re: per-pool bluestore options

On 17.08.2016 19:27, Sage Weil wrote:
On Mon, 15 Aug 2016, Igor Fedotov wrote:
On 13.08.2016 21:56, Sage Weil wrote:
Hi Igor,

I took another look at

	https://github.com/ceph/ceph/pull/10556

You define three settings:

   compress_hint - determines whether the pool contains compressible or incompressible data
   compress_algorithm - allows specifying a different compression algorithm
   compress_ratio - specifies the maximum compression ratio

I think we should extend this to include csum-related options.  And use a
consistent naming scheme that aligns with the config options where we just
strip off the bluestore_ prefix.  The relevant options are:
Sounds good!
The major question here is how these per-pool options correlate with the
corresponding per-store ones.
One might suggest that per-pool options take priority over the per-store ones,
but I'm not sure that's the best choice. Sometimes a user might want to
disable or alter an option with a single switch at the store level, without
enumerating all the pools. Hence we need to provide some means for that.

   bluestore_csum = {true, false}
   bluestore_csum_type = {crc32c, crc32c_{8,16}, ...}
   bluestore_csum_min_chunk_size = 4k      (*)
   bluestore_csum_max_chunk_size = 64k     (*)
   bluestore_compression = {force, aggressive, passive, none}
Actually, this option together with compression_hint boils down to a single flag:
compress or not. Is there any rationale for not using that simple flag?
There are currently two persistent hints: compressible and incompressible.
Aggressive will compress unless incompressible, whereas passive will
compress if compressible.  Either way it varies per object, though, so a
single flag isn't sufficient.
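
To illustrate how those pieces might combine, here is a minimal sketch (purely
illustrative names, not the actual BlueStore code) of the compress/don't-compress
decision as a function of the pool/store compression mode and the per-object hint:

  // Sketch only: how a compression mode and a per-object hint could combine.
  enum class CompressionMode { NONE, PASSIVE, AGGRESSIVE, FORCE };
  enum class ObjectHint { NONE, COMPRESSIBLE, INCOMPRESSIBLE };

  bool should_compress(CompressionMode mode, ObjectHint hint) {
    switch (mode) {
    case CompressionMode::FORCE:      return true;   // always compress
    case CompressionMode::AGGRESSIVE: return hint != ObjectHint::INCOMPRESSIBLE;
    case CompressionMode::PASSIVE:    return hint == ObjectHint::COMPRESSIBLE;
    case CompressionMode::NONE:       return false;  // never compress
    }
    return false;
  }
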
So the final approach is to have a bluestore_compression switch at the pool level
(and at the store level too) and compression_hint at the object level, right?
In that case we don't need compression_hint per pool. In fact, my original point
was that the latter (along with bluestore_compression) is IMHO overkill and can be
replaced with a simple enable_compression flag.

My original idea here was to have bluestore_compression on a per-store basis and
compression_hint on a per-pool one. This way one gets fairly flexible
control at both the store and the pool level - see my question above.
Yeah, I'm not sure how complex it's worth getting.  To get complete
control, we probably need a default *and* the min/max allowed range for
each option in order to bound what you can choose per-pool, but I think
that is probably overkill.

A bit easier would be to have

  bluestore_csum_override = {,true,false}
  bluestore_csum_type_override = {, crc32c, crc32c_{8,16}}
  bluestore_compression_override = {, force, aggressive, passive, none}

and have them blank by default (no override).  For the numeric options
we'd need to use 0 to mean 'no override'.

What do you think?
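
For illustration, one plausible reading of that blank-default scheme, as a tiny
resolution helper (boost::optional stands in for a "blank-able" value; for the
numeric options a value of 0 would play the same 'no override' role):

  // Sketch only: resolving an effective csum setting under the
  // "blank means no override" scheme.
  #include <boost/optional.hpp>

  bool effective_csum(boost::optional<bool> store_override,  // bluestore_csum_override
                      boost::optional<bool> pool_setting,    // per-pool csum option, if set
                      bool store_default)                     // bluestore_csum
  {
    if (store_override)
      return *store_override;   // an explicit override wins over everything
    if (pool_setting)
      return *pool_setting;     // otherwise the pool's own choice applies
    return store_default;       // otherwise fall back to the store-wide default
  }
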
IMO per-pool settings (if set) should override the corresponding per-store ones by
default. But one should be able to turn that behavior off with 'one click' and force
a specific per-store setting to prevail. Hence we don't need 'override' settings at
the per-pool level, but we should have them at the per-store level. E.g.:

Pool A has:
bluestore_csum = true
Pool B has:
bluestore_csum = <not set>
The OSD has:
bluestore_csum = false ['force' optionally set]

If 'force' is omitted, the actual settings are:
Pool A:
bluestore_csum = true
Pool B:
bluestore_csum = false

If 'force' is set, they are:
Pool A:
bluestore_csum = false
Pool B:
bluestore_csum = false

An additional 'bluestore_force_all_settings = true' could be introduced to force
all per-store settings to prevail at once.
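
For comparison, a sketch of this proposed precedence (the pool's value wins unless
the store-level value is marked as forced); names and types are illustrative only:

  // Sketch only: the pool's setting prevails by default, unless the
  // store-level setting carries the proposed 'force' modifier.
  #include <boost/optional.hpp>

  struct StoreSetting {
    bool value;   // e.g. bluestore_csum at the OSD level
    bool force;   // the proposed 'force' modifier
  };

  bool effective_setting(const StoreSetting& store,
                         boost::optional<bool> pool_value)  // per-pool value, if set
  {
    if (store.force || !pool_value)
      return store.value;   // forced store setting, or the pool did not set anything
    return *pool_value;     // otherwise the pool's choice prevails
  }
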
   bluestore_compression_algorithm = {snappy, zlib, ...}
   bluestore_compression_min_blob_size = 256k
   bluestore_compression_max_blob_size = 4M
   bluestore_compression_required_ratio = .875

(*) These currently have a different name but aren't used yet.  Working on
a PR to change that.
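
As for bluestore_compression_required_ratio above, the check is presumably along
these lines - keep the compressed blob only if it saves enough space (a sketch,
not the exact BlueStore code):

  // Sketch: with required_ratio = .875 the compressed length must be at
  // most 87.5% of the raw length for the compressed blob to be kept.
  #include <cstdint>

  bool keep_compressed(uint64_t raw_len, uint64_t compressed_len,
                       double required_ratio)  // e.g. 0.875
  {
    return compressed_len <= static_cast<uint64_t>(raw_len * required_ratio);
  }
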

What's missing is your 'compress_hint'.  We can call that
'compression_hint' to align with the names above?

   compression_hint = {compressible, incompressible, ...}

The main changes from your PR that I think we need to make are:

* These options should be part of the pool_opts_t structure in pg_pool_t
(which is a set of optional key/value-like parameters for the pool).

* We can add a new ObjectStore operation that passes down parameters for a
collection, and have the OSD pass these all in for each PG collection when
the pool properties change.  That way ObjectStore doesn't need to persist
these options at all--just store the ones it understands in memory, and
the OSD will always reset them on startup etc.
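
Following up on the first point, a rough sketch of what adding these as
pool_opts_t keys might look like, loosely following the existing key/descriptor
pattern in osd_types.h (key names are illustrative, not a committed interface):

  // Sketch only: new per-pool keys for pool_opts_t, alongside the
  // existing optional, typed pool parameters.
  enum key_t {
    // ... existing keys (SCRUB_MIN_INTERVAL, RECOVERY_PRIORITY, ...) ...
    CSUM_TYPE,                   // e.g. crc32c, crc32c_16, crc32c_8
    CSUM_MIN_BLOCK,
    CSUM_MAX_BLOCK,
    COMPRESSION_MODE,            // force / aggressive / passive / none
    COMPRESSION_ALGORITHM,       // snappy / zlib / ...
    COMPRESSION_REQUIRED_RATIO,
    COMPRESSION_MIN_BLOB_SIZE,
    COMPRESSION_MAX_BLOB_SIZE,
  };

which would then presumably be settable with something like
'ceph osd pool set <pool> compression_algorithm snappy'.
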
One, probably silly, question here - do pool and collection have a 1:1
relation? It seemed to me that they don't, and hence we can't store per-pool
settings at the collection level without some additional mapping: pool -> setting.
Also, this requires some means to remove a pool's settings entry when the pool
goes away...
It's a 1:1 mapping of PG to collection, but there are lots of PGs per pool.  So when
the OSD gets a pool change, it'll set the same flags for all local PGs in
that pool.  A bit of duplication, but it keeps the interface
uncomplicated (no need to teach ObjectStore about pools).
And PG belongs to single pool only, right?

And another question - is there any means to receive a notification when a pool
setting changes, similar to the one we have for OSD settings? We need that to
trigger the new ObjectStore operation that updates the pool settings.
There is already some machinery to do this (the pg_pool_t struct has an
epoch field).  See void PGPool::update(OSDMapRef map) in PG.cc.
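
A hedged sketch of how that could tie together: when PGPool::update() notices
the pool's options have changed, the PG could queue a transaction that carries
the new options down to its collection via the new (at this point hypothetical)
ObjectStore operation:

  // Sketch only: set_collection_opts() is the proposed new ObjectStore
  // transaction op discussed above, not an existing call.
  #include "os/ObjectStore.h"   // ObjectStore::Transaction (Ceph tree)
  #include "osd/osd_types.h"    // coll_t, pool_opts_t (Ceph tree)

  void push_pool_opts_to_collection(ObjectStore::Transaction& t,
                                    const coll_t& coll,
                                    const pool_opts_t& opts)
  {
    // ObjectStore keeps these in memory only; the OSD re-sends them whenever
    // the pool changes and again on startup, so nothing has to be persisted.
    t.set_collection_opts(coll, opts);
  }
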
Will do, thanks.
sage
Thanks,
Igor