Thanks for the input, Greg. We've submitted the patch
to the Ceph GitHub repo: https://github.com/ceph/ceph/pull/21222
Kevin
On 04/02/2018 01:10 PM, Gregory Farnum
wrote:
On Mon, Apr 2, 2018 at 8:21 AM Kevin Hrpcek <kevin.hrpcek@xxxxxxxxxxxxx> wrote:
Hello,
We use the Python librados bindings for object operations on our cluster. For a long time we've been using two EC pools with k=4 m=1 and a fixed 4MB read/write size with the Python bindings. While preparing to migrate all of our data to a k=6 m=2 pool, we discovered that the EC pool alignment size is dynamic, and that the librados bindings for Python and Go fail to write objects because they are not aware of the pool alignment size and therefore cannot adjust the write block size to be a multiple of it. The EC pool alignment size seems to be (k * 4K) on new pools, but is only 4K on old pools from the Hammer days.

We haven't been able to find much useful documentation for this pool alignment setting other than the librados docs (http://docs.ceph.com/docs/master/rados/api/librados): rados_ioctx_pool_requires_alignment, rados_ioctx_pool_requires_alignment2, rados_ioctx_pool_required_alignment, rados_ioctx_pool_required_alignment2. After going through the rados binary source we found that the binary rounds the write op size for an EC pool up to a multiple of the pool alignment size (line ~1945, https://github.com/ceph/ceph/blob/master/src/tools/rados/rados.cc#L1945). The minimum write op size can be discovered by writing to an EC pool with something like `rados -b 1k -p $pool put .....`, which gets the binary to round the size up and print it. So all of the support for being alignment aware clearly exists; it just isn't exposed in the bindings. We've only tested Python and Go.
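For anyone else hitting this, here is a minimal Python sketch of the rounding we believe the rados binary is doing in rados.cc. The 24K alignment below is just an illustrative assumption for a k=6 pool (k * 4K), not something we've confirmed for every pool:

    def align_op_size(op_size, alignment):
        """Round a write op size up to a multiple of the pool alignment,
        roughly mirroring what the rados CLI does for EC pools."""
        if alignment == 0:  # pool does not require alignment
            return op_size
        return ((op_size + alignment - 1) // alignment) * alignment

    # e.g. a fixed 4MB write against a k=6 EC pool with an assumed 24K alignment
    print(align_op_size(4 * 1024 * 1024, 6 * 4096))  # -> 4202496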
We've gone ahead and submitted a patch and pull request to the pycradox project, which seems to be what was merged into the Ceph project for the Python bindings: https://github.com/sileht/pycradox/pull/4. It exposes the pool alignment size in the Python bindings so that we can calculate the proper op sizes for writing to a pool.
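With something like that in place, a writer could look roughly like the sketch below. The method name pool_required_alignment() is only a placeholder for whatever the binding ends up exposing (presumably wrapping rados_ioctx_pool_required_alignment2), so treat it as hypothetical:

    def put_object(ioctx, name, data, target_op_size=4 * 1024 * 1024):
        # ioctx would come from rados.Rados(...).open_ioctx(pool)
        alignment = ioctx.pool_required_alignment()  # hypothetical binding call
        op_size = target_op_size
        if alignment:
            # round the write size up to a multiple of the pool alignment
            op_size = ((op_size + alignment - 1) // alignment) * alignment
        offset = 0
        while offset < len(data):
            chunk = data[offset:offset + op_size]
            ioctx.write(name, chunk, offset)
            offset += len(chunk)

Every write except the final partial chunk is then a multiple of the alignment, which is what the EC pool expects.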
We find it hard to believe that we're the only ones to have run into this problem when using the bindings. Have we missed something obvious in our cluster configuration? Or maybe we're just doing things differently compared to most users... Any insight would be appreciated, as we'd prefer to use an official solution rather than our own bindings fix for long-term use.
It's not impossible you're the only user both using the
python bindings and targeting EC pools. Even now with
overwrites they're limited in terms of object class and omap
support, and I think all the direct-access users I've heard
about required at least one of omap or overwrites.
Just submit the patch to the Ceph github repo and it'll
get fixed up! :)
-Greg
Tested on Luminous 12.2.2 and 12.2.4.
Thanks,
Kevin
--
Kevin Hrpcek
Linux Systems Administrator
NASA SNPP Atmospheric SIPS
Space Science & Engineering Center
University of Wisconsin-Madison
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com