Thanks for the input, Greg. We've submitted the patch
to the Ceph GitHub repo: https://github.com/ceph/ceph/pull/21222
Kevin
On 04/02/2018 01:10 PM, Gregory Farnum
wrote:
On Mon, Apr 2, 2018 at 8:21 AM Kevin Hrpcek <kevin.hrpcek@xxxxxxxxxxxxx> wrote:
Hello,
We use the Python librados bindings for object operations on our cluster. For a long time we've been using two EC pools with k=4 m=1 and a fixed 4MB read/write size with the Python bindings. While preparing to migrate all of our data to a k=6 m=2 pool, we discovered that the EC pool alignment size is dynamic, and that the librados bindings for Python and Go fail to write objects because they are not aware of the pool alignment size and therefore cannot adjust the write block size to be a multiple of it. The EC pool alignment size seems to be (k * 4K) on new pools, but is only 4K on old pools from the Hammer days.

We haven't been able to find much useful documentation for this pool alignment setting other than the librados docs (http://docs.ceph.com/docs/master/rados/api/librados): rados_ioctx_pool_requires_alignment, rados_ioctx_pool_requires_alignment2, rados_ioctx_pool_required_alignment, rados_ioctx_pool_required_alignment2. After going through the rados binary source we found that the binary rounds the write op size for an EC pool up to a multiple of the pool alignment size (line ~1945, https://github.com/ceph/ceph/blob/master/src/tools/rados/rados.cc#L1945). The minimum write op size can be discovered by writing to an EC pool with something like `rados -b 1k -p $pool put .....`, which gets the binary to round the size up and print it. So all of the support for being alignment aware clearly exists; it just isn't exposed in the bindings. We've only tested Python and Go.
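For anyone else hitting this, here is a minimal Python sketch of the rounding we believe the rados binary is doing in rados.cc. The 24K alignment below is just an illustrative assumption for a k=6 pool (k * 4K), not something we've confirmed for every pool:

    def align_op_size(op_size, alignment):
        """Round a write op size up to a multiple of the pool alignment,
        roughly mirroring what the rados CLI does for EC pools."""
        if alignment == 0:  # pool does not require alignment
            return op_size
        return ((op_size + alignment - 1) // alignment) * alignment

    # e.g. a fixed 4MB write against a k=6 EC pool with an assumed 24K alignment
    print(align_op_size(4 * 1024 * 1024, 6 * 4096))  # -> 4202496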
We've gone ahead and submitted a patch and pull request to the pycradox project, which seems to be what was merged into the Ceph project for the Python bindings: https://github.com/sileht/pycradox/pull/4. It exposes the pool alignment size in the Python bindings so that we can calculate the proper op sizes for writing to a pool.
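With something like that in place, a writer could look roughly like the sketch below. The method name pool_required_alignment() is only a placeholder for whatever the binding ends up exposing (presumably wrapping rados_ioctx_pool_required_alignment2), so treat it as hypothetical:

    def put_object(ioctx, name, data, target_op_size=4 * 1024 * 1024):
        # ioctx would come from rados.Rados(...).open_ioctx(pool)
        alignment = ioctx.pool_required_alignment()  # hypothetical binding call
        op_size = target_op_size
        if alignment:
            # round the write size up to a multiple of the pool alignment
            op_size = ((op_size + alignment - 1) // alignment) * alignment
        offset = 0
        while offset < len(data):
            chunk = data[offset:offset + op_size]
            ioctx.write(name, chunk, offset)
            offset += len(chunk)

Every write except the final partial chunk is then a multiple of the alignment, which is what the EC pool expects.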
We find it hard to believe that we're the only ones to have run into this problem when using the bindings. Have we missed something obvious in our cluster configuration? Or maybe we're just doing things differently compared to most users... Any insight would be appreciated, as we'd prefer to use an official solution rather than our own bindings fix for long-term use.
It's not impossible you're the only user both using the
python bindings and targeting EC pools. Even now with
overwrites they're limited in terms of object class and omap
support, and I think all the direct-access users I've heard
about required at least one of omap or overwrites.
Just submit the patch to the Ceph github repo and it'll
get fixed up! :)
-Greg
Tested on Luminous 12.2.2 and 12.2.4.
Thanks,
Kevin
--
Kevin Hrpcek
Linux Systems Administrator
NASA SNPP Atmospheric SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com