On Wed, 21 Jun 2017, Piotr Dałek wrote:
> On 17-06-14 03:44 PM, Sage Weil wrote:
> > On Wed, 14 Jun 2017, Paweł Sadowski wrote:
> > > On 04/13/2017 04:23 PM, Piotr Dałek wrote:
> > > > On 04/06/2017 03:25 PM, Sage Weil wrote:
> > > > > On Thu, 6 Apr 2017, Piotr Dałek wrote:
> > > > > > [snip]
> > > > >
> > > > > I think the solution here is to use sparse_read during recovery. The
> > > > > PushOp data representation already supports it; it's just a matter of
> > > > > skipping the zeros. The recovery code could also have an option to
> > > > > check for fully-zero regions of the data and turn those into holes
> > > > > as well. For ReplicatedBackend, see build_push_op().
> > > >
> > > > So far it turns out that there's an even easier solution: we just
> > > > enabled "filestore seek hole" on a test cluster and that seems to fix
> > > > the problem for us. We'll see if fiemap works too.
> > > >
> > >
> > > Is it safe to enable "filestore seek hole"? Are there any tests that
> > > verify that everything related to RBD works fine with this enabled?
> > > Can we make this enabled by default?
> >
> > We would need to enable it in the qa environment first. The risk here is
> > that users run a broad range of kernels and we are exposing ourselves to
> > any bugs in any kernel version they may run. I'd prefer to leave it off
> > by default.
>
> Is that a common regression? If not, we could blacklist particular kernels
> and call it a day.
> > We can enable it in the qa suite, though, which covers
> > centos7 (latest kernel) and ubuntu xenial and trusty.
>
> +1. Do you need some particular PR for that?
Sure. How about a patch that adds the config option to several of the
files in qa/suites/rados/thrash/thrashers?
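Roughly something like this in each of those yamls (just a sketch, assuming
the option's full name is "filestore seek data hole" and the usual overrides
layout of those fragments):

  overrides:
    ceph:
      conf:
        osd:
          filestore seek data hole: true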
> > > I tested a few of our production images and it seems that about 30% of
> > > the data is sparse. This sparseness will be lost on any cluster-wide
> > > event (adding/removing nodes, PG growth, recovery).
> > >
> > > How is this handled, or how will it be handled, in BlueStore?
> >
> > BlueStore exposes the same sparseness metadata that enabling the
> > filestore seek hole or fiemap options does, so it won't be a problem
> > there.
> >
> > I think the only thing that we could potentially add is zero detection
> > on writes (so that explicitly writing zeros consumes no space). We'd
> > have to be a bit careful measuring the performance impact of that check on
> > non-zero writes.
>
> I saw that RBD (librbd) already does that - replacing writes with discards
> when the buffer contains only zeros. Some code that does the same could be
> added to librados, and it shouldn't impact performance much: the current
> implementation of mem_is_zero is fast, so the extra check shouldn't be a
> big problem.
I'd rather not have librados silently translating requests; I think it
makes more sense to do any zero checking in bluestore. _do_write_small
and _do_write_big already break writes into (aligned) chunks; that would
be an easy place to add the check.
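For illustration only, the check itself is trivial (a standalone sketch with
a made-up helper name, not the actual BlueStore code): scan the chunk and
bail out at the first non-zero byte.

  // Illustration only: report whether a write chunk is entirely zero, so the
  // caller could turn it into a hole/deallocation instead of writing it out.
  #include <cstddef>
  #include <cstdint>
  #include <cstring>

  static bool chunk_is_zero(const char *data, size_t len)  // hypothetical name
  {
    const char *end = data + len;
    const char *word_end = data + (len & ~(sizeof(uint64_t) - 1));
    while (data < word_end) {            // compare 8 bytes at a time
      uint64_t v;
      std::memcpy(&v, data, sizeof(v));  // memcpy avoids unaligned-access UB
      if (v != 0)
        return false;
      data += sizeof(v);
    }
    while (data < end)                   // byte-by-byte tail
      if (*data++ != 0)
        return false;
    return true;
  }

Since it returns at the first non-zero byte, the cost on typical non-zero
data is a few bytes' worth of scanning; the case worth measuring is data
with long runs of leading zeros.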
sage