On Mon, Apr 2, 2018 at 8:21 AM Kevin Hrpcek <kevin.hrpcek@xxxxxxxxxxxxx> wrote:
Hello,
We use the Python librados bindings for object operations on our cluster. For a long time we've been using two EC pools with k=4 m=1 and a fixed 4MB read/write size with the Python bindings. While preparing to migrate all of our data to a k=6 m=2 pool, we discovered that the EC pool alignment size is dynamic, and that the librados bindings for Python and Go fail to write objects because they are not aware of the pool alignment size and therefore cannot adjust the write block size to be a multiple of it. The EC pool alignment size seems to be (k * 4K) on new pools, but is only 4K on old pools from the Hammer days.

We haven't been able to find much useful documentation for this pool alignment setting other than the librados docs (http://docs.ceph.com/docs/master/rados/api/librados): rados_ioctx_pool_requires_alignment, rados_ioctx_pool_requires_alignment2, rados_ioctx_pool_required_alignment, rados_ioctx_pool_required_alignment2. After going through the rados binary source, we found that the binary rounds the write op size for an EC pool up to a multiple of the pool alignment size (line ~1945, https://github.com/ceph/ceph/blob/master/src/tools/rados/rados.cc#L1945). The minimum write op size can be discovered by writing to an EC pool with something like `rados -b 1k -p $pool put .....`, which makes the binary round the size up and print it out. All of the support for being alignment aware is clearly there in librados, but it simply isn't exposed in the bindings; we've only tested Python and Go.
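To illustrate the rounding the binary does, here is a minimal Python sketch; the helper name is ours, just for illustration, and the alignment value is whatever the pool reports:

def round_up_to_alignment(op_size, alignment):
    """Round op_size up to the nearest multiple of the pool's required
    alignment, mirroring what the rados binary does before writing."""
    if alignment == 0:
        return op_size  # pool does not require alignment
    return ((op_size + alignment - 1) // alignment) * alignment

# Example: a k=6 m=2 pool reporting a 24K (6 * 4K) alignment turns a
# 4MB op size into the next multiple of 24K.
print(round_up_to_alignment(4 * 1024 * 1024, 6 * 4096))  # 4202496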
We've gone ahead and submitted a patch and pull request to the pycradox project, which appears to be what was merged into the Ceph project for the Python bindings: https://github.com/sileht/pycradox/pull/4. It exposes the pool's alignment size through the Python bindings so that we can calculate the proper op sizes for writing to a pool.
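For reference, here's roughly how we use it. This is only a sketch: it assumes the methods added by the patch (pool_requires_alignment() and pool_required_alignment() on the Ioctx) are available, and the conffile path, pool name, object name, and input file are placeholders:

import rados

# Connect to the cluster; the conffile path, pool name, object name, and
# input file below are placeholders.
cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
cluster.connect()
ioctx = cluster.open_ioctx('my-ec-pool')

# With the patch, the bindings can report the pool's required alignment,
# so the write op size can be rounded up to a multiple of it.
op_size = 4 * 1024 * 1024
if ioctx.pool_requires_alignment():
    align = ioctx.pool_required_alignment()
    op_size = ((op_size + align - 1) // align) * align

# Write the object in op_size chunks at aligned offsets; only the final
# chunk may be smaller than op_size.
with open('payload.bin', 'rb') as f:
    data = f.read()
for offset in range(0, len(data), op_size):
    ioctx.write('my-object', data[offset:offset + op_size], offset)

ioctx.close()
cluster.shutdown()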
We find it hard to believe that we're the only ones to have run into this problem when using the bindings. Have we missed something obvious in our cluster configuration? Or maybe we're just doing things differently compared to most users... Any insight would be appreciated, as we'd prefer to use an official solution rather than our own bindings fix for long-term use.
It's not impossible that you're the only user both using the Python bindings and targeting EC pools. Even now with overwrites, EC pools are limited in terms of object class and omap support, and I think all the direct-access users I've heard about required at least one of omap or overwrites.
Just submit the patch to the Ceph github repo and it'll get fixed up! :)
-Greg
Tested on Luminous 12.2.2 and 12.2.4.
Thanks,
Kevin
--
Kevin Hrpcek
Linux Systems Administrator
NASA SNPP Atmospheric SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com