Sparse file info in filestore not propagated to other OSDs

Hello,

We recently hit an interesting issue with RBD images and filestore on Jewel 10.2.5. We have a pool of RBD images that are mostly untouched (large areas of each image are unused). Once we added 3 new OSDs to the cluster, the objects backing these images grew substantially on the new OSDs: objects covering unused areas of the images remained small on the original OSDs (~8K actually used, 4M allocated), but on the new OSDs they became large (4M allocated *and* actually used). After investigation we concluded that Ceph does not propagate sparse-file information during rebalance, so the data contents are correct on all OSDs, but the holes are lost on the new OSDs, hence the increased disk space usage there.
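For reference, this is roughly the layout information that gets lost when the object is rewritten in full during recovery. A minimal sketch in C (not filestore code, just an illustration using lseek(SEEK_DATA)/lseek(SEEK_HOLE)) that prints the allocated extents of a given file:

/* sparse_map.c - print allocated extents of a file via SEEK_DATA/SEEK_HOLE.
 * Illustration only; error handling is simplified. */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    off_t end = lseek(fd, 0, SEEK_END);
    off_t off = 0;
    while (off < end) {
        off_t data = lseek(fd, off, SEEK_DATA);   /* start of next data extent */
        if (data < 0)
            break;                                /* only holes remain up to EOF */
        off_t hole = lseek(fd, data, SEEK_HOLE);  /* end of that data extent */
        printf("data: %lld..%lld (%lld bytes)\n",
               (long long)data, (long long)hole, (long long)(hole - data));
        off = hole;
    }
    close(fd);
    return 0;
}

Running this against the object file on an original OSD shows only the small written extents, while on a newly recovered OSD the whole 4M object is reported as data.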

Example from a test cluster, before growing it by one OSD:

ls:

osd-01-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-01-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-02-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-03-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0

du:

osd-01-cluster: 12 /var/lib/ceph/osd-01-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-02-cluster: 12 /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-03-cluster: 12 /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0


mon-01-cluster:~ # rbd diff test
Offset   Length  Type
8388608  4194304 data
16777216 4096    data
33554432 4194304 data
37748736 2048    data

And after growing it:

ls:

clush> find /var/lib/ceph/osd-*/current/0.*head/ -type f -name '*data*' -exec ls -l {} \+
osd-02-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-03-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:18 /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-04-cluster: -rw-r--r-- 1 root root 4194304 Apr 6 09:25 /var/lib/ceph/osd-04-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0

du:

clush> find /var/lib/ceph/osd-*/current/0.*head/ -type f -name '*data*' -exec du -k {} \+
osd-02-cluster: 12 /var/lib/ceph/osd-02-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-03-cluster: 12 /var/lib/ceph/osd-03-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0
osd-04-cluster: 4100 /var/lib/ceph/osd-04-cluster/current/0.27_head/rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0

Note that "rbd\udata.12a474b0dc51.0000000000000008__head_2DD64767__0" grew from 12 to 4100KB when copied from other OSDs to osd-04.

Is this something to be expected? Is there any way to make Ceph propagate the sparse file info? Or should we think about submitting a "fallocate -d"-like patch for writes on filestore?
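To illustrate what I mean by a "fallocate -d"-like approach, here is a minimal sketch (not an actual filestore patch) that scans a file for all-zero blocks and deallocates them with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE, which is essentially what fallocate --dig-holes does; the 4K block size is an assumption matching our XFS bsize:

/* dig_holes.c - punch holes over all-zero blocks, like "fallocate -d".
 * Illustration only; block size and error handling are simplified. */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <linux/falloc.h>

#define BLK 4096  /* assumed filesystem block size */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    int fd = open(argv[1], O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    char buf[BLK], zero[BLK] = {0};
    off_t off = 0;
    ssize_t n;
    while ((n = pread(fd, buf, BLK, off)) > 0) {
        if (n == BLK && memcmp(buf, zero, BLK) == 0) {
            /* deallocate the all-zero block but keep the file size unchanged */
            if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                          off, BLK) < 0)
                perror("fallocate");
        }
        off += n;
    }
    close(fd);
    return 0;
}

In filestore the same idea would apply to zero-filled regions at write/recovery time rather than as an offline pass; the sketch above is only meant to show the mechanism.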

(We're using kernel 3.13.0-45-generic, but the issue persists on 4.4.0-31-generic; our XFS uses a 4K bsize.)

--
Piotr Dałek
piotr.dalek@xxxxxxxxxxxx
https://www.ovh.com/us/