On Wed, 21 Jun 2017, Piotr Dałek wrote:
> On 17-06-14 03:44 PM, Sage Weil wrote:
> > On Wed, 14 Jun 2017, Paweł Sadowski wrote:
> > > On 04/13/2017 04:23 PM, Piotr Dałek wrote:
> > > > On 04/06/2017 03:25 PM, Sage Weil wrote:
> > > > > On Thu, 6 Apr 2017, Piotr Dałek wrote:
> > > > > > [snip]
> > > > >
> > > > > I think the solution here is to use sparse_read during recovery. The
> > > > > PushOp data representation already supports it; it's just a matter of
> > > > > skipping the zeros. The recovery code could also have an option to
> > > > > check for fully-zero regions of the data and turn those into holes
> > > > > as well. For ReplicatedBackend, see build_push_op().
> > > >
> > > > So far it turns out that there's an even easier solution: we just
> > > > enabled "filestore seek hole" on a test cluster and that seems to fix
> > > > the problem for us. We'll see if fiemap works too.
> > > >
> > >
> > > Is it safe to enable "filestore seek hole"? Are there any tests that
> > > verify that everything related to RBD works fine with this enabled?
> > > Can we make this enabled by default?
> >
> > We would need to enable it in the qa environment first. The risk here is
> > that users run a broad range of kernels and we are exposing ourselves to
> > any bugs in any kernel version they may run. I'd prefer to leave it off
> > by default.
>
> Is that a common regression? If not, we could blacklist particular kernels
> and call it a day.
> > We can enable it in the qa suite, though, which covers
> > centos7 (latest kernel) and ubuntu xenial and trusty.
>
> +1. Do you need some particular PR for that?
Sure. How about a patch that adds the config option to several of the
files in qa/suites/rados/thrash/thrashers?
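Roughly something like this in each of those yamls (just a sketch, assuming
the option's full name is "filestore seek data hole" and the usual overrides
layout of those fragments):

  overrides:
    ceph:
      conf:
        osd:
          filestore seek data hole: true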
> > > I tested a few of our production images and it seems that about 30% of
> > > the data is sparse. This sparseness will be lost on any cluster-wide
> > > event (adding/removing nodes, PG growth, recovery).
> > >
> > > How is this handled, or how will it be handled, in BlueStore?
> >
> > BlueStore exposes the same sparseness metadata that enabling the
> > filestore seek hole or fiemap options does, so it won't be a problem
> > there.
> >
> > I think the only thing that we could potentially add is zero detection
> > on writes (so that explicitly writing zeros consumes no space). We'd
> > have to be a bit careful measuring the performance impact of that check on
> > non-zero writes.
>
> I saw that RBD (librbd) already does that - replacing writes with discards
> when the buffer contains only zeros. Some code that does the same could be
> added to librados, and it shouldn't impact performance much: the current
> implementation of mem_is_zero is fast, so the extra check shouldn't be a
> big problem.
I'd rather not have librados silently translating requests; I think it
makes more sense to do any zero checking in bluestore. _do_write_small
and _do_write_big already break writes into (aligned) chunks; that would
be an easy place to add the check.
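For illustration only, the check itself is trivial (a standalone sketch with
a made-up helper name, not the actual BlueStore code): scan the chunk and
bail out at the first non-zero byte.

  // Illustration only: report whether a write chunk is entirely zero, so the
  // caller could turn it into a hole/deallocation instead of writing it out.
  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  static bool chunk_is_zero(const char *data, size_t len)  // hypothetical name
  {
    const char *end = data + len;
    const char *word_end = data + (len & ~(sizeof(uint64_t) - 1));
    while (data < word_end) {            // compare 8 bytes at a time
      uint64_t v;
      std::memcpy(&v, data, sizeof(v));  // memcpy avoids unaligned-access UB
      if (v != 0)
        return false;
      data += sizeof(v);
    }
    while (data < end)                   // byte-by-byte tail
      if (*data++ != 0)
        return false;
    return true;
  }

Since it returns at the first non-zero byte, the cost on typical non-zero
data is a few bytes' worth of scanning; the case worth measuring is data
with long runs of leading zeros.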
sage