Re: Sparse file info in filestore not propagated to other OSDs

On Thu, 6 Apr 2017, Piotr Dałek wrote:
> On 04/06/2017 03:25 PM, Sage Weil wrote:
> > On Thu, 6 Apr 2017, Piotr Dałek wrote:
> > > Hello,
> > > 
> > > We recently had an interesting issue with RBD images and filestore on
> > > Jewel 10.2.5:
> > > We have a pool with RBD images, all of them mostly untouched (large
> > > areas of those images unused), and once we added 3 new OSDs to cluster,
> > > objects representing these images grew substantially on new OSDs:
> > > objects hosting unused areas of these images on original OSDs remained
> > > small (~8K of space actually used, 4M allocated), but on new OSDs were
> > > large (4M allocated *and* actually used). After investigation we
> > > concluded that Ceph didn't propagate sparse file information during
> > > cluster rebalance, resulting in correct data contents on all OSDs, but
> > > no sparse file data on new OSDs, hence disk space usage increase on
> > > those.
> > > 
> > > [..]
> > 
> > I think the solution here is to use sparse_read during recovery.  The
> > PushOp data representation already supports it; it's just a matter of
> > skipping the zeros.  The recovery code could also have an option to check
> > for fully-zero regions of the data and turn those into holes as well.  For
> > ReplicatedBackend, see build_push_op().
> 
> Can we abuse that to reduce amount of regular (client/inter-osd) network
> traffic?

Yeah... I wouldn't call it abuse :).  sparse_read() will use 
SEEK_HOLE/SEEK_DATA on filestore (if enabled).  On bluestore we have the 
metadata on-hand.  It may be a bit slower, though... more complexity 
and such.  They recently implemented something like this for the kernel 
NFS server and found it was faster for very sparse files but the rest of 
the time it was a fair bit slower.
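
For readers unfamiliar with SEEK_HOLE/SEEK_DATA, here is a rough userspace
sketch of the idea in Python. It is not Ceph's sparse_read() implementation;
the helper names data_extents() and sparse_read() below are made up for
illustration, and it assumes Linux and a filesystem that reports holes (on
one that does not, the whole file simply comes back as a single data extent).

```python
import errno
import os


def data_extents(fd):
    """Return [(offset, length), ...] for regions the filesystem reports as data.

    Hypothetical helper for illustration only; not part of Ceph.
    """
    size = os.fstat(fd).st_size
    extents = []
    pos = 0
    while pos < size:
        try:
            # Jump to the next byte at or after pos that is backed by data.
            start = os.lseek(fd, pos, os.SEEK_DATA)
        except OSError as e:
            if e.errno == errno.ENXIO:
                # Nothing but a hole between pos and end of file.
                break
            raise
        # Find where this data run ends; end of file counts as a hole.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        extents.append((start, end - start))
        pos = end
    return extents


def sparse_read(fd):
    """Read only the data extents and return them as {offset: bytes}.

    A sketch: it assumes each os.pread() call returns the full extent,
    which is fine for small objects but not guaranteed for huge reads.
    """
    return {off: os.pread(fd, length, off) for off, length in data_extents(fd)}
```

A receiver that writes each chunk at its recorded offset into a file
truncated to the original size gets the same contents back, and the ranges
it never wrote stay unallocated.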

sage
