On 17-06-14 03:44 PM, Sage Weil wrote:
On Wed, 14 Jun 2017, Paweł Sadowski wrote:
On 04/13/2017 04:23 PM, Piotr Dałek wrote:
On 04/06/2017 03:25 PM, Sage Weil wrote:
On Thu, 6 Apr 2017, Piotr Dałek wrote:
[snip]
I think the solution here is to use sparse_read during recovery. The
PushOp data representation already supports it; it's just a matter of
skipping the zeros. The recovery code could also have an option to
check for fully-zero regions of the data and turn those into holes as
well. For ReplicatedBackend, see build_push_op().
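A minimal sketch of the "turn fully-zero regions into holes" idea, written in Python for brevity. It is not Ceph's build_push_op(). The helper name `nonzero_extents` and the 4 KiB granularity are assumptions made for illustration. Only the returned extents would need to be pushed, and the gaps between them would read back as zeros on the target.

```python
from typing import List, Tuple


def nonzero_extents(data: bytes, chunk: int = 4096) -> List[Tuple[int, int]]:
    """Return (offset, length) extents covering every chunk of `data` that
    contains at least one non-zero byte. Fully-zero chunks are skipped, so
    they would become holes instead of being sent over the wire."""
    extents: List[Tuple[int, int]] = []
    for off in range(0, len(data), chunk):
        piece = data[off:off + chunk]
        if piece == bytes(len(piece)):
            continue  # fully-zero chunk: leave a hole
        if extents and extents[-1][0] + extents[-1][1] == off:
            # Contiguous with the previous extent: extend it instead of
            # starting a new one, to keep the extent list short.
            prev_off, prev_len = extents[-1]
            extents[-1] = (prev_off, prev_len + len(piece))
        else:
            extents.append((off, len(piece)))
    return extents
```

The chunk size trades precision for cost: a smaller chunk finds more holes but produces longer extent lists.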
So far it turns out that there's an even easier solution: we just
enabled "filestore seek hole" on a test cluster, and that seems to fix
the problem for us. We'll see if fiemap works too.
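The mechanism that option relies on is lseek(2) with SEEK_DATA/SEEK_HOLE, which asks the filesystem where a file's data actually lives instead of reading holes back as zeros. Below is a minimal Python sketch (not FileStore code) of enumerating a file's data extents that way. `data_extents` is a hypothetical helper, and the fallback behavior described in the comments is that of Linux.

```python
import errno
import os
from typing import List, Tuple


def data_extents(path: str) -> List[Tuple[int, int]]:
    """Return (offset, length) for each data region of `path`, skipping holes.

    Uses lseek(SEEK_DATA/SEEK_HOLE). On Linux, if the filesystem does not
    track holes, the generic fallback reports the whole file as one region.
    """
    extents: List[Tuple[int, int]] = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        off = 0
        while off < size:
            try:
                start = os.lseek(fd, off, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # no data at or after `off`: the tail is a hole
                raise
            # End of file always counts as a hole, so this always returns.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            off = end
    finally:
        os.close(fd)
    return extents
```

Extents are reported at the filesystem's allocation granularity, so a small write may show up as a whole block of data.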
Is it safe to enable "filestore seek hole"? Are there any tests that
verify that everything related to RBD works fine with it enabled?
Can we make it enabled by default?
We would need to enable it in the qa environment first. The risk here is
that users run a broad range of kernels and we are exposing ourselves to
any bugs in any kernel version they may run. I'd prefer to leave it off
by default.
Is that a common regression? If not, we could blacklist particular
kernels and call it a day.
We can enable it in the qa suite, though, which covers centos7 (latest
kernel) and ubuntu xenial and trusty.
+1. Do you need some particular PR for that?
I tested a few of our production images, and it seems that about 30% of
their data is sparse. This sparseness will be lost on any cluster-wide
event (adding/removing nodes, PG count increase, recovery).
How is/will this be handled in BlueStore?
BlueStore exposes the same sparseness metadata that enabling the
filestore seek hole or fiemap options does, so it won't be a problem
there.
I think the only thing that we could potentially add is zero detection
on writes (so that explicitly writing zeros consumes no space). We'd
have to be a bit careful measuring the performance impact of that check on
non-zero writes.
I saw that RBD (librbd) already does that, replacing writes with
discards when the buffer contains only zeros. Similar code could be
added to librados without much performance impact: the current
implementation of mem_is_zero is fast, so the check shouldn't be a big
problem.
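A rough sketch of that write-to-discard check on the client side, assuming the backend reads discarded ranges back as zeros. `is_zero`, `submit`, and the `write`/`discard` callbacks are hypothetical stand-ins, not librados or librbd APIs, and Ceph's C++ mem_is_zero is not reproduced here.

```python
from typing import Callable


def is_zero(buf: bytes) -> bool:
    """Cheap all-zero test. A short prefix is checked first so that typical
    non-zero payloads are rejected after a few bytes; only buffers whose
    prefix is all zeros get a full scan (bytes.count runs in C)."""
    head = buf[:64]
    if head.count(0) != len(head):
        return False
    return buf.count(0) == len(buf)


def submit(offset: int, buf: bytes,
           write: Callable[[int, bytes], None],
           discard: Callable[[int, int], None]) -> str:
    """Hypothetical hook: turn a non-empty, all-zero write into a discard of
    the same range; otherwise pass the write through unchanged."""
    if buf and is_zero(buf):
        discard(offset, len(buf))
        return "discard"
    write(offset, buf)
    return "write"
```

The prefix check targets the cost on non-zero writes mentioned above: most real payloads fail it almost immediately, so only mostly-zero buffers pay for a full scan.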
--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovh.com/us/