Re: Sparse file info in filestore not propagated to other OSDs

On 17-06-14 03:44 PM, Sage Weil wrote:
On Wed, 14 Jun 2017, Paweł Sadowski wrote:
On 04/13/2017 04:23 PM, Piotr Dałek wrote:
On 04/06/2017 03:25 PM, Sage Weil wrote:
On Thu, 6 Apr 2017, Piotr Dałek wrote:
[snip]

I think the solution here is to use sparse_read during recovery.  The
PushOp data representation already supports it; it's just a matter of
skipping the zeros.  The recovery code could also have an option to check
for fully-zero regions of the data and turn those into holes as well.  For
ReplicatedBackend, see build_push_op().
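
To illustrate the "check for fully-zero regions and turn those into holes" idea, here is a minimal Python sketch. It is not Ceph code: the function name `data_extents` and the 4096-byte block size are assumptions made for illustration. It scans a buffer block by block and returns only the (offset, length) runs that contain non-zero data, which is roughly the shape of information a sparse push would need to carry.

```python
from typing import List, Tuple


def data_extents(buf: bytes, block_size: int = 4096) -> List[Tuple[int, int]]:
    """Return (offset, length) runs of blocks holding at least one non-zero byte.

    Hypothetical helper for illustration; block_size is an assumed granularity.
    """
    zero_block = bytes(block_size)
    extents: List[Tuple[int, int]] = []
    for off in range(0, len(buf), block_size):
        chunk = buf[off:off + block_size]
        if chunk == zero_block[:len(chunk)]:
            continue  # fully-zero block: leave it out so it can become a hole
        if extents and extents[-1][0] + extents[-1][1] == off:
            # Directly follows the previous data run: extend that run.
            start, length = extents[-1]
            extents[-1] = (start, length + len(chunk))
        else:
            extents.append((off, len(chunk)))
    return extents
```

For a buffer made of one data block, two zero blocks and a short data tail, only the first block and the tail are reported; the zero blocks in between are skipped.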

So far it turns out that there's an even easier solution: we just enabled
"filestore seek hole" on a test cluster, and that seems to fix the
problem for us. We'll see if fiemap works too.
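
For readers unfamiliar with the mechanism, the sketch below shows how data extents of a sparse file can be listed with lseek() and SEEK_DATA/SEEK_HOLE on Linux. It assumes that this is the kernel interface the "filestore seek hole" option relies on; the function name `file_data_extents` is made up for this example. On filesystems without real hole support the kernel typically reports the whole file as one data extent, so the result is still correct, just not sparse.

```python
import errno
import os
from typing import List, Tuple


def file_data_extents(path: str) -> List[Tuple[int, int]]:
    """List (offset, length) data extents of a file via SEEK_DATA/SEEK_HOLE."""
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        if not hasattr(os, "SEEK_DATA"):
            # Platform without SEEK_DATA: treat the whole file as data.
            return [(0, size)] if size else []
        extents: List[Tuple[int, int]] = []
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # no data after offset: the rest is a hole
                if e.errno == errno.EINVAL:
                    return [(0, size)]  # not supported here: assume all data
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)
            if end <= start:
                break  # defensive guard against a non-advancing offset
            extents.append((start, end - start))
            offset = end
        return extents
    finally:
        os.close(fd)
```

Anything outside the returned extents reads back as zeros, so copying only these ranges keeps the destination file sparse.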


Is it safe to enable "filestore seek hole"? Are there any tests that
verify that everything related to RBD works fine with this enabled?
Can we make this enabled by default?

We would need to enable it in the qa environment first.  The risk here is
that users run a broad range of kernels and we are exposing ourselves to
any bugs in any kernel version they may run.  I'd prefer to leave it off
by default.

Is that a common regression? If not, we could blacklist particular kernels
and call it a day.

We can enable it in the qa suite, though, which covers
centos7 (latest kernel) and ubuntu xenial and trusty.

+1. Do you need some particular PR for that?

I tested a few of our production images and it seems that about 30% is
sparse. That sparseness will be lost on any cluster-wide event
(adding/removing nodes, PG growth, recovery).

How is/will this be handled in BlueStore?

BlueStore exposes the same sparseness metadata that enabling the
filestore seek hole or fiemap options does, so it won't be a problem
there.

I think the only thing that we could potentially add is zero detection
on writes (so that explicitly writing zeros consumes no space).  We'd
have to be a bit careful measuring the performance impact of that check on
non-zero writes.

I saw that RBD (librbd) does that: it replaces writes with discards when
the buffer contains only zeros. Code that does the same could be added to
librados, and it shouldn't impact performance much; the current
implementation of mem_is_zero is fast and shouldn't be a big problem.
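
A rough Python sketch of that write-path check, for illustration only: `is_all_zero` and `plan_write` are hypothetical names, not librbd or librados functions, and the tuple returned is just a stand-in for whatever operation the client would actually issue. The substitution is only valid where a discarded range is guaranteed to read back as zeros.

```python
from typing import Tuple, Union

Op = Union[Tuple[str, int, int], Tuple[str, int, bytes]]


def is_all_zero(buf: bytes) -> bool:
    """True if every byte in buf is zero (the counting runs in C)."""
    return buf.count(b"\x00") == len(buf)


def plan_write(offset: int, buf: bytes) -> Op:
    """Turn an all-zero write into a discard; pass other writes through."""
    if buf and is_all_zero(buf):
        # Nothing but zeros: punch a hole instead of storing them.
        return ("discard", offset, len(buf))
    return ("write", offset, buf)
```

The check costs one pass over the buffer and stops mattering as soon as the data is handed on, which is why the overhead on non-zero writes is expected to be small; measuring it on real workloads would still be needed.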

--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovh.com/us/