Dear David and Igor,

thank you very much for your help. I have one more question about chunk sizes and data granularity on bluestore and will summarize the information I got on bluestore compression at the end.

1) Compression ratio
---------------------------

Following Igor's explanation, I tried to understand the numbers for compressed_allocated and compressed_original and am somewhat stuck figuring out how the bluestore arithmetic works. I created a 32GB file of zeros using dd with write size bs=8M on a cephfs with

ceph.dir.layout="stripe_unit=4194304 stripe_count=1 object_size=4194304 pool=con-fs-data-test"

The data pool is an 8+2 erasure coded pool with properties

pool 37 'con-fs-data-test' erasure size 10 min_size 9 crush_rule 11 object_hash rjenkins pg_num 900 pgp_num 900 last_change 9970 flags hashpspool,ec_overwrites stripe_width 32768 compression_mode aggressive application cephfs

As I understand EC pools, a 4M object is split into 8x0.5M data shards that are stored, together with 2x0.5M coding shards, one shard per OSD. So I would expect a full object write to put a 512K chunk on each OSD in the PG. Looking at some config options of one of the OSDs, I see:

"bluestore_compression_max_blob_size_hdd": "524288",
"bluestore_compression_min_blob_size_hdd": "131072",
"bluestore_max_blob_size_hdd": "524288",
"bluestore_min_alloc_size_hdd": "65536",

From this, I would conclude that the largest chunk size is 512K, which also equals compression_max_blob_size. The minimum allocation size is 64K for any object. What I would expect now is that the full object writes to cephfs create chunks of 512K per OSD in the PG, meaning that with an all-zero file I should observe a compressed_allocated ratio of 64K/512K=0.125 instead of the 0.5 reported below. It looks like chunks of 128K are written instead of 512K (0.5 = 64K/128K). I'm happy with the 64K granularity, but the observed maximum chunk size seems a factor of 4 too small. Where am I going wrong, what am I overlooking? (The arithmetic is written out explicitly at the end of this mail.)

2) Bluestore compression configuration
---------------------------------------------------

If I understand David correctly, pool and OSD settings do *not* override each other, but are rather *combined* into a resulting setting as follows. Let

0 - (n)one
1 - (p)assive
2 - (a)ggressive
3 - (f)orce
? - (u)nset

be the 4+1 possible settings of compression modes with numeric values assigned as shown. Then, the resulting numeric compression mode for data in a pool on a specific OSD is

res_compr_mode = min(mode OSD, mode pool)

or, in form of a table:

        pool
      | n p a f u
   ---+-----------
    n | n n n n n
O   p | n p p p ?
S   a | n p a a ?
D   f | n p a f ?
    u | n ? ? ? u

which would allow for the flexible configuration David mentioned below. I'm actually not sure I can confirm this: I have some pools where compression_mode is not set and which reside on separate OSDs with compression enabled, yet there is compressed data on these OSDs. I wonder if I polluted my test with the "ceph config set bluestore_compression_mode aggressive" that I executed earlier, or if my interpretation above is still wrong. Does the setting issued with "ceph config set bluestore_compression_mode aggressive" apply to pools where 'compression_mode' is not set on the pool (see the question marks in the table above; what is the resulting mode)?

What I would like to do is enable compression on all OSDs, enable compression on all data pools and disable compression on all metadata pools. Data and metadata pools might share OSDs in the future. The table above says I should be able to do just that by being explicit.
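Concretely, assuming the min() interpretation above is right, I would make everything explicit roughly along these lines; the metadata pool name is just a placeholder here, and the OSD-side setting could equally go into ceph.conf under [osd] as David suggested:

# all OSDs: allow compression (runtime change via the tell interface)
ceph tell "osd.*" config set bluestore_compression_mode aggressive

# every cephfs data pool: compress explicitly
ceph osd pool set con-fs-data-test compression_mode aggressive

# every metadata pool: explicitly no compression (pool name is a placeholder)
ceph osd pool set con-fs-meta-test compression_mode none

Being explicit on every pool would also make the question-mark cases in the table irrelevant.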
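For completeness, here is the arithmetic behind question 1 written out, so it is easy to point at the step where I go wrong (all numbers are taken from the layout and OSD config shown above):

# data chunk per OSD for one 4M cephfs object on the 8+2 EC pool
echo $((4194304 / 8))                # 524288, i.e. 512K per shard
# allocation ratio I would expect if an all-zero 512K chunk is compressed as one 512K blob
echo "scale=3; 65536 / 524288" | bc  # .125 (one 64K allocation unit per 512K blob)
# ratio that would match the reported compressed_allocated/compressed_original
echo "scale=3; 65536 / 131072" | bc  # .500, i.e. blobs of only 128K (= bluestore_compression_min_blob_size_hdd)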
Many thanks again and best regards,

=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Igor Fedotov <ifedotov@xxxxxxx>
Sent: 19 October 2018 23:41
To: Frank Schilder; David Turner
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: bluestore compression enabled but no data compressed

Hi Frank,

On 10/19/2018 2:19 PM, Frank Schilder wrote:
> Hi David,
>
> sorry for the slow response, we had a hell of a week at work.
>
> OK, so I had compression mode set to aggressive on some pools, but the global option was not changed, because I interpreted the documentation as "pool settings take precedence". To check your advice, I executed
>
> ceph tell "osd.*" config set bluestore_compression_mode aggressive
>
> and dumped a new file consisting of null-bytes. Indeed, this time I observe compressed objects:
>
> [root@ceph-08 ~]# ceph daemon osd.80 perf dump | grep blue
> "bluefs": {
> "bluestore": {
> "bluestore_allocated": 2967207936,
> "bluestore_stored": 3161981179,
> "bluestore_compressed": 24549408,
> "bluestore_compressed_allocated": 261095424,
> "bluestore_compressed_original": 522190848,
>
> Obvious questions that come to my mind:
>
> 1) I think either the documentation is misleading or the implementation does not follow the documented behaviour. I observe that per-pool settings do *not* override globals, but the documentation says they will. (From the doc: "Sets the policy for the inline compression algorithm for underlying BlueStore. This setting overrides the global setting of bluestore compression mode.") Will this be fixed in the future? Should this be reported?
>
> Remark: When I look at "compression_mode" under "http://docs.ceph.com/docs/luminous/rados/operations/pools/?highlight=bluestore%20compression#set-pool-values" it actually looks like a copy-and-paste error. The doc here talks about the compression algorithm (see quote above) while the compression mode should be explained. Maybe that is worth looking at?
>
> 2) If I set the global to aggressive, do I now have to disable compression explicitly on pools where I don't want compression, or is the pool default still "none"? Right now, I seem to observe that compression is still disabled by default.
>
> 3) Do you know what the output means? What is the compression ratio? bluestore_compressed/bluestore_compressed_original=0.04 or bluestore_compressed_allocated/bluestore_compressed_original=0.5? The second ratio does not look too impressive given the file contents.

"bluestore_compressed_original" is the amount of user data subjected to compression.

"bluestore_compressed" is the pure compressed data size, i.e. the amount of data the compression algorithm produced. Hence the actual compression rate is 0.04.

"bluestore_compressed_allocated" is the amount of space required to keep that compressed data. This shows the overhead caused by allocation granularity (which is 16K (SSD) or 64K (HDD) by default). E.g. if you need to keep just a single-byte object, BlueStore needs 16/64K of disk space for it anyway. It's somewhat unrelated to compression, and uncompressed data suffers from the same penalties. But this exact counter shows the actual on-disk size specifically for compressed data.

So the actual on-disk ratio in your case is 0.5, and that's not a compression algorithm fault. You can improve this ratio to some degree by writing in larger block sizes, but the granularity overhead will exist in most cases anyway.
>
> 4) Is there any way to get uncompressed data compressed as a background task like scrub?

Generally no, but perhaps it might occur during backfilling. Never verified that, though.

> If you have the time to look at these questions, this would be great. Most important right now is that I got it to work.
>
> Thanks for your help,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: ceph-users <ceph-users-bounces@xxxxxxxxxxxxxx> on behalf of Frank Schilder <frans@xxxxxx>
> Sent: 12 October 2018 17:00
> To: David Turner
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: bluestore compression enabled but no data compressed
>
> Hi David,
>
> thanks, now I see what you mean. If you are right, that would mean that the documentation is wrong. Under "http://docs.ceph.com/docs/master/rados/operations/pools/#set-pool-values" it is stated that "Sets inline compression algorithm to use for underlying BlueStore. This setting overrides the global setting of bluestore compression algorithm". In other words, the global setting should be irrelevant if compression is enabled on a pool.
>
> Well, I will try out how setting both to "aggressive" or "force" works and let you know.
>
> Thanks and have a nice weekend,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: David Turner <drakonstein@xxxxxxxxx>
> Sent: 12 October 2018 16:50:31
> To: Frank Schilder
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: bluestore compression enabled but no data compressed
>
> If you go down just a little farther, you'll see the settings that you put into your ceph.conf under the osd section (although I'd probably do global). That's where the OSDs get the settings from. As a note, once these are set, future writes will be compressed (if they match the compression settings which you can see there, about minimum ratios, blob sizes, etc.). To compress current data, you need to re-write it.
>
> On Fri, Oct 12, 2018 at 10:41 AM Frank Schilder <frans@xxxxxx> wrote:
> Hi David,
>
> thanks for your quick answer. When I look at both references, I see exactly the same commands:
>
> ceph osd pool set {pool-name} {key} {value}
>
> where on one page only the keys specific to compression are described. This is the command I found and used. However, I can't see any compression happening. If you know about something other than the "ceph osd pool set" commands, please let me know.
>
> Best regards,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: David Turner <drakonstein@xxxxxxxxx>
> Sent: 12 October 2018 15:47:20
> To: Frank Schilder
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: bluestore compression enabled but no data compressed
>
> It's all of the settings that you found in your first email when you dumped the configurations and such. http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/#inline-compression
>
> On Fri, Oct 12, 2018 at 7:36 AM Frank Schilder <frans@xxxxxx> wrote:
> Hi David,
>
> thanks for your answer. I did enable compression on the pools as described in the link you sent below (ceph osd pool set sr-fs-data-test compression_mode aggressive; I also tried force, to no avail).
> However, I could not find anything on enabling compression per OSD. Could you possibly provide a source or sample commands?
>
> Thanks and best regards,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: David Turner <drakonstein@xxxxxxxxx>
> Sent: 09 October 2018 17:42
> To: Frank Schilder
> Cc: ceph-users@xxxxxxxxxxxxxx
> Subject: Re: bluestore compression enabled but no data compressed
>
> When I've tested compression before, there are 2 places you need to configure compression: on the OSDs in the configuration settings that you mentioned, but also on the pools themselves [1]. If you have the compression mode on the pools set to none, then it doesn't matter what the OSDs' configuration is, and vice versa, unless you are using the setting of force. If you want to compress everything by default, set pools to passive and OSDs to aggressive. If you want to compress only specific pools, set the OSDs to passive and the specific pools to aggressive. Good luck.
>
> [1] http://docs.ceph.com/docs/mimic/rados/operations/pools/#set-pool-values
>
> On Tue, Sep 18, 2018 at 7:11 AM Frank Schilder <frans@xxxxxx> wrote:
> I seem to have a problem getting bluestore compression to do anything. I followed the documentation and enabled bluestore compression on various pools by executing "ceph osd pool set <pool-name> compression_mode aggressive". Unfortunately, it seems like no data is compressed at all. As an example, below is some diagnostic output for a data pool used by a cephfs:
>
> [root@ceph-01 ~]# ceph --version
> ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
>
> All defaults are OK:
>
> [root@ceph-01 ~]# ceph --show-config | grep compression
> [...]
> bluestore_compression_algorithm = snappy
> bluestore_compression_max_blob_size = 0
> bluestore_compression_max_blob_size_hdd = 524288
> bluestore_compression_max_blob_size_ssd = 65536
> bluestore_compression_min_blob_size = 0
> bluestore_compression_min_blob_size_hdd = 131072
> bluestore_compression_min_blob_size_ssd = 8192
> bluestore_compression_mode = none
> bluestore_compression_required_ratio = 0.875000
> [...]
>
> Compression is reported as enabled:
>
> [root@ceph-01 ~]# ceph osd pool ls detail
> [...]
> pool 24 'sr-fs-data-test' erasure size 8 min_size 7 crush_rule 10 object_hash rjenkins pg_num 50 pgp_num 50 last_change 7726 flags hashpspool,ec_overwrites stripe_width 24576 compression_algorithm snappy compression_mode aggressive application cephfs
> [...]
>
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_mode
> compression_mode: aggressive
> [root@ceph-01 ~]# ceph osd pool get sr-fs-data-test compression_algorithm
> compression_algorithm: snappy
>
> We dumped a 4GiB file with dd from /dev/zero. It should be easy to compress with an excellent ratio.
> Search for a PG:
>
> [root@ceph-01 ~]# ceph pg ls-by-pool sr-fs-data-test
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES LOG DISK_LOG STATE STATE_STAMP VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
> 24.0 15 0 0 0 0 62914560 77 77 active+clean 2018-09-14 01:07:14.593007 7698'77 7735:142 [53,47,36,30,14,55,57,5] 53 [53,47,36,30,14,55,57,5] 53 7698'77 2018-09-14 01:07:14.592966 0'0 2018-09-11 08:06:29.309010
>
> There is about 250MB of data on the primary OSD, but nothing seems to be compressed:
>
> [root@ceph-07 ~]# ceph daemon osd.53 perf dump | grep blue
> [...]
> "bluestore_allocated": 313917440,
> "bluestore_stored": 264362803,
> "bluestore_compressed": 0,
> "bluestore_compressed_allocated": 0,
> "bluestore_compressed_original": 0,
> [...]
>
> Just to make sure, I checked one of the objects' contents:
>
> [root@ceph-01 ~]# rados ls -p sr-fs-data-test
> 10000000004.0000039c
> [...]
> 10000000004.0000039f
>
> It is 4M chunks ...
> [root@ceph-01 ~]# rados -p sr-fs-data-test stat 10000000004.0000039f
> sr-fs-data-test/10000000004.0000039f mtime 2018-09-11 14:39:38.000000, size 4194304
>
> ... with all zeros:
>
> [root@ceph-01 ~]# rados -p sr-fs-data-test get 10000000004.0000039f obj
>
> [root@ceph-01 ~]# hexdump -C obj
> 00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
> *
> 00400000
>
> All as it should be, except for compression. Am I overlooking something?
>
> Best regards,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com