Hello,
I would like to know whether anything is planned to correct the
"forever growing" effect seen when using RBD images.
My experience shows that the replicas of an RBD image are never
discarded and never overwritten. Say my physical capacity is about 30
TB and I create an image of 13 TB (roughly half the real space, minus
some headroom to survive failed OSDs). In practice, once I have filled
the 13 TB image, 26 TB of real space is used (replica count set to 2).
If I then delete 8 TB of those 13 TB, the real space used stays
unchanged. If I write 4 TB back, the cluster collapses into a near-full
state and I have to buy another 30 TB and integrate it into the cluster
just to contain the problem. Very soon my cluster holds more useless
replicas of "deleted" data than useful data and its replicas.
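
To make the numbers concrete, here is a rough sketch of that arithmetic
in Python. The model is mine, for illustration only, and it simply
assumes what I am observing: freed blocks are never reclaimed on the
RADOS side, so deletes do not reduce raw usage while new writes keep
allocating fresh objects.

    # Toy capacity model; assumes freed space is never reclaimed (my complaint).
    RAW_CAPACITY_TB = 30      # physical capacity of the cluster
    REPLICAS = 2              # replica count of the pool

    raw_used_tb = 0.0

    def write_tb(tb):
        """Every write allocates fresh objects, each stored REPLICAS times."""
        global raw_used_tb
        raw_used_tb += tb * REPLICAS

    def delete_tb(tb):
        """Deleting inside the image frees nothing on the RADOS side."""
        pass  # raw_used_tb unchanged -- this is exactly the problem

    write_tb(13)        # fill the 13 TB image once
    print(raw_used_tb)  # 26.0 TB of raw space used
    delete_tb(8)        # free 8 TB at the filesystem level
    print(raw_used_tb)  # still 26.0 TB
    write_tb(4)         # write 4 TB of "new" data into the freed area
    print(raw_used_tb)  # 34.0 TB
    print(raw_used_tb > RAW_CAPACITY_TB)  # True: more raw space than exists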
Usually when I talk to the dev team about this problem, they tell me
that the real problem is the lack of trim support in XFS, but my own
analysis shows that the real problem is the way Ceph handles data
internally: Ceph never discards any replicas and never "cleans" itself
up to keep only the data that is actually in use.
If Ceph behaved properly, then with a replica count of 2 I would have
my 13 TB RBD image, the corresponding 13 TB of replicas, and a fixed
26 TB of overall space used. When I "free" data in the RBD image, the
corresponding replicas would be considered discarded by Ceph, and when
data in the RBD image is overwritten, the corresponding replicas would
be overwritten with the new data as well. The overall space used would
then stay fixed.
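
For clarity, the behaviour I am describing is what I would expect when
the client issues discards down to librbd, roughly as in the sketch
below (the pool name 'rbd', the image name 'myimage' and the offsets
are placeholders I made up for illustration):

    import rados
    import rbd

    # Connect to the cluster and open the pool holding the image.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    image = rbd.Image(ioctx, 'myimage')
    try:
        # Deallocate a byte range of the image. The expectation is that the
        # backing RADOS objects covering this range, and all their replicas,
        # get removed or truncated, so the raw space used actually shrinks.
        image.discard(0, 1024 ** 3)   # first 1 GiB, purely illustrative
    finally:
        image.close()
        ioctx.close()
        cluster.shutdown()

This is what mounting the filesystem inside the image with the discard
option, or running fstrim, should end up triggering, provided the whole
stack passes the discard requests through to librbd.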
In the case of the failure of 2 OSDs, the cluster would then have just
enough space to re-replicate the missing data, which is currently not
the case in an environment that has reached the near-full state.
So my question is: what is planned to correct this problem?
Best regards,
--
Alphe Salas
I.T. engineer