On Thu, 18 Aug 2016, Igor Fedotov wrote: > On 17.08.2016 19:27, Sage Weil wrote: > > On Mon, 15 Aug 2016, Igor Fedotov wrote: > > > On 13.08.2016 21:56, Sage Weil wrote: > > > > Hi Igor, > > > > > > > > I took another look at > > > > > > > > https://github.com/ceph/ceph/pull/10556 > > > > > > > > You define three settings: > > > > > > > > compress_hint - determines if pool contains compressible / > > > > incompressible > > > > data > > > > compress_algorithm - permits to specify different compression > > > > algorithm > > > > compress_ratio - specifies maximum compression ratio > > > > > > > > I think we should extend this to include csum-related options. And use > > > > a > > > > consistent naming scheme that aligns with the config options where we > > > > just > > > > strip off the bluestore_ prefix. The relevant options are: > > > Sounds good! > > > The major questions here is how do these per-pool options correlate with > > > corresponding per-store ones? > > > One might suggest per-pool options to have higher priority over per-store > > > one. > > > But I'm not sure that the best option. Sometimes user might want to > > > disable/alter corresponding option without enumerating all the pools by > > > simple > > > switching at per-store level. Hence we need to consider some means for > > > that. > > > > > > bluestore_csum = {true, false} > > > > bluestore_csum_type = {crc32c, crc32c_{8,16}, ...} > > > > bluestore_csum_min_chunk_size = 4k (*) > > > > bluestore_csum_max_chunk_size = 64k (*) > > > > bluestore_compression = {force, aggressive, passive, none} > > > Actually this option along with compression_hint result in a single flag: > > > compress or not. Any rationale for not using that simple flag? > > There are currently two persistent hints: compressible and incompressible. > > Aggressive will compress unless incompressible, whereas passive will > > compress if compressible. Either way it varies per object, though, so a > > single flag isn't sufficient. > So final approach is to have bluestore_compression switch at pool level (and > per-store too) and compression_hint at object one, right? Hence we don't need > compression_hint per pool. In fact my original point was that the latter(along > with bluestore_compression) is IMHO an overkill and can be substituted with > simple enable_compression flag. I think so. Having a pool compression_hint = compressible will accomplish the same thing as setting the pool compression = aggressive. > > > My original idea here was to have bluestore_compression on per-store basis > > > and > > > compression_hint of per pool one. This way one can receive pretty flexible > > > control at both storage and pool level - see my question above. > > Yeah, I'm not sure how complex it's worth getting. To get complete > > control, we probably need a default *and* the min/max allowed range for > > each option in order to bound what you can choose per-pool, but I think > > that is probably overkill. > > > > A bit easier would be to have > > > > bluestore_csum_override = {,true,false} > > bluestore_csum_type_override = {, crc32c, crc32c_{8,16}} > > bluestore_compression_override = {, force, aggressive, passive, none} > > > > and have them blank by default (no override). For the numeric options > > we'd need to use 0 to mean 'no override'. > > > > What do you think? > IMO per-pool settings (if set) have to override corresponding per-store > one by default. But one should be able to turn off that behavior with > 'one-click' and force specific per-store setting to prevail. > Hence we don't need 'override' settings at per-pool level but should have one > at per-store one. E.g. > > Pool A has: > bluestore_csum = true > Pool B: > bluestore_csum=<not set> > OSD: > bluestore_csum= false [/force] > > if /'force' is omitted - actual settings are > Pool A: > bluestore_csum=true > Pool B: > bluestore_csum=false > > else > Pool A: > bluestore_csum=false > Pool B: > bluestore_csum=false Hmm, this makes parsing more complicated but is probably less confusing. But what about numeric values? bluestore_compression_max_blob_size=1048576/force? This is partly why I like the (implementation) simplicity of bluestore_compression_max_blob_size_force=1048576 (or _override, whatever). > An additional 'bluestore_force_all_settings=true' can be introduced to force > all per-store settings to prevail too. Or maybe bluestore_ignore_per_collection_settings=true or the converse bluestore_per_collection_settings=false to be precise? > > > > bluestore_compression_algorithm = {snappy, zlib, ...} > > > > bluestore_compression_min_blob_size = 256k > > > > bluestore_compression_max_blob_size = 4M > > > > bluestore_compression_required_ratio = .875 > > > > > > > > (*) These currently have a different name but aren't used yet. Working > > > > on > > > > a PR to change that. > > > > > > > > What's missing is your 'compress_hint'. We can call that > > > > 'compression_hint' to align with the names above? > > > > > > > > compression_hint = {compressible, incompressible, ...} > > > > > > > > The main changes from your PR that I think we need to make are: > > > > > > > > * These options should be part of the pool_opts_t structure in pg_pool_t > > > > (which is a set of optional key/value-like parameters for the pool). > > > > > > > > * We can add a new ObjectStore operation that passes down parameters for > > > > a > > > > collection, and have the OSD pass these all in for each PG collection > > > > when > > > > the pool properties change. That way ObjectStore doesn't need to > > > > persist > > > > these options at all--just store the ones it understands in memory, and > > > > the OSD will always reset them on startup etc. > > > One, probably silly, question here - do pool and collection have 1 to 1 > > > relation? It seemed to me that they don't and hence we can't store > > > per-pool > > > settings at collection level without some additional mapping: pool -> > > > setting. > > > Also this requires some means to remove pool settings entry when pool goes > > > away... > > It's 1:1 mapping of PG to collection, but lots of PGs per pool. So when > > the OSD gets a pool change, it'll set the same flags for all local PGs in > > that pool. A bit of duplication, but it keeps the interface > > uncomplicated (no need to teach ObjectStore about pools). > And PG belongs to single pool only, right? Right. sage -- To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html