Re: RGW compression not compressing

On 11/7/19 10:35 AM, Bryan Stillwell wrote:
Thanks Casey!

Hopefully this makes it into 14.2.5.  Is there any way to tell the Python boto or swiftclient modules not to send those headers?

It is likely to make 14.2.5.

You'd actually want to force these clients to send the headers as a workaround. If you're using boto2, you can inject the header in calls to set_contents_from_*(). For boto3, calls like put_object() take a StorageClass parameter.
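
Something along these lines should send the header (a minimal sketch; the endpoint, credentials, and object name below are placeholders, not values from your cluster):

# boto3: setting StorageClass makes the client send x-amz-storage-class,
# so the zone's compression_type for that storage class is consulted.
import boto3

s3 = boto3.client('s3', endpoint_url='http://rgw.example.com:8080',
                  aws_access_key_id='KEY', aws_secret_access_key='SECRET')
with open('zeros.bin', 'rb') as f:
    s3.put_object(Bucket='bs-test', Key='zeros.bin', Body=f,
                  StorageClass='STANDARD')

# boto2: inject the header on the set_contents_from_*() call, e.g.
#   key.set_contents_from_filename('zeros.bin',
#       headers={'x-amz-storage-class': 'STANDARD'})

# swiftclient: pass the header explicitly on put_object(), e.g.
#   conn.put_object('bs-test', 'zeros.bin', contents=f,
#       headers={'x-object-storage-class': 'STANDARD'})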


Bryan

On Nov 7, 2019, at 8:04 AM, Casey Bodley <cbodley@xxxxxxxxxx> wrote:

Hi Bryan,

This is a bug related to storage classes. Compression does take effect
for requests that specify the storage class via the S3
x-amz-storage-class or Swift x-object-storage-class header. But when
that header is absent, we default to the STANDARD storage class without
consulting its compression type. The fix is pending backport to Nautilus
in https://tracker.ceph.com/issues/41981.

On 11/6/19 5:54 PM, Bryan Stillwell wrote:
Today I tried enabling RGW compression on a Nautilus 14.2.4 test cluster and found it wasn't doing any compression at all.  I figure I must have missed something in the docs, but I haven't been able to find out what that is and could use some help.

This is the command I used to enable zlib-based compression:

# radosgw-admin zone placement modify --rgw-zone=default --placement-id=default-placement --compression=zlib


I then restarted the radosgw process to activate the change (there's only 1 RGW in this test cluster):

# systemctl restart ceph-radosgw@radosgw.$(hostname -s)


I verified it was enabled correctly with:

# radosgw-admin zone get | jq -r '.placement_pools'
[
   {
     "key": "default-placement",
     "val": {
       "index_pool": "default.rgw.buckets.index",
       "storage_classes": {
         "STANDARD": {
           "data_pool": "default.rgw.buckets.data",
           "compression_type": "zlib"
         }
       },
       "data_extra_pool": "default.rgw.buckets.non-ec",
       "index_type": 0
     }
   }
]


Before starting the test I had nothing in the default.rgw.buckets.data pool:

# ceph df | grep default.rgw.buckets.data
     default.rgw.buckets.data      16         0 B           0         0 B         0       230 TiB


I then tried uploading a 1 GiB file consisting of all zeroes (from /dev/zero) with both S3 (boto) and Swift (swiftclient), and each upload used the full 1 GiB of data on the cluster:
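
The uploads were along these lines (a simplified sketch rather than the exact script; endpoint, credentials, and object names are placeholders):

# Test file: dd if=/dev/zero of=zeros.bin bs=1M count=1024
import boto
import boto.s3.connection
import swiftclient

# S3 upload via boto2 -- note no x-amz-storage-class header is sent:
s3 = boto.connect_s3(aws_access_key_id='KEY', aws_secret_access_key='SECRET',
                     host='rgw.example.com', port=8080, is_secure=False,
                     calling_format=boto.s3.connection.OrdinaryCallingFormat())
bucket = s3.create_bucket('bs-test')
bucket.new_key('zeros.bin').set_contents_from_filename('zeros.bin')

# Swift upload via swiftclient -- likewise no x-object-storage-class header:
swift = swiftclient.client.Connection(
    authurl='http://rgw.example.com:8080/auth/v1.0',
    user='test:tester', key='SECRET')
with open('zeros.bin', 'rb') as f:
    swift.put_object('bs-test', 'zeros.bin', contents=f)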

# ceph df -f json | jq -r '.' | grep -A9 default.rgw.buckets.data
       "name": "default.rgw.buckets.data",
       "id": 16,
       "stats": {
         "stored": 1073741824,
         "objects": 256,
         "kb_used": 1048576,
         "bytes_used": 1073741824,
         "percent_used": 1.4138463484414387e-06,
         "max_avail": 253148744646656
       }


The same thing was reported by bucket stats:

# radosgw-admin bucket stats --bucket=bs-test | jq -r '.usage'
{
   "rgw.main": {
     "size": 1073741824,
     "size_actual": 1073741824,
     "size_utilized": 1073741824,
     "size_kb": 1048576,
     "size_kb_actual": 1048576,
     "size_kb_utilized": 1048576,
     "num_objects": 1
   }
}


What am I missing?

Thanks,
Bryan
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx