Hello,
I would like to know whether anything is planned to correct the
"forever growing" effect seen when using RBD images.
My experience shows that the replicas of an RBD image are never
discarded and never overwritten. Say my physical capacity is about 30
TB and I create an image of 13 TB (roughly half the real space, minus
some headroom to survive failed OSDs). In practice, once I have filled
the 13 TB image, 26 TB of real space is used (replica count set to 2).
If I then delete 8 TB of those 13 TB, the real space used stays
unchanged. If I write 4 TB back, the cluster collapses into a near-full
state and I have to buy another 30 TB and integrate it into the cluster
just to contain the problem. Very soon my cluster holds more useless
replicas of "deleted" data than useful data and its replicas.
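
To make the numbers concrete, here is a rough sketch of that arithmetic
in Python. The model is mine, for illustration only, and it simply
assumes what I am observing: freed blocks are never reclaimed on the
RADOS side, so deletes do not reduce raw usage while new writes keep
allocating fresh objects.

    # Toy capacity model; assumes freed space is never reclaimed (my complaint).
    RAW_CAPACITY_TB = 30      # physical capacity of the cluster
    REPLICAS = 2              # replica count of the pool

    raw_used_tb = 0.0

    def write_tb(tb):
        """Every write allocates fresh objects, each stored REPLICAS times."""
        global raw_used_tb
        raw_used_tb += tb * REPLICAS

    def delete_tb(tb):
        """Deleting inside the image frees nothing on the RADOS side."""
        pass  # raw_used_tb unchanged -- this is exactly the problem

    write_tb(13)        # fill the 13 TB image once
    print(raw_used_tb)  # 26.0 TB of raw space used
    delete_tb(8)        # free 8 TB at the filesystem level
    print(raw_used_tb)  # still 26.0 TB
    write_tb(4)         # write 4 TB of "new" data into the freed area
    print(raw_used_tb)  # 34.0 TB
    print(raw_used_tb > RAW_CAPACITY_TB)  # True: more raw space than exists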
Usually when I talk to the dev team about this problem, they tell me
that the real problem is the lack of trim support in XFS, but my own
analysis shows that the real problem is the way Ceph handles data
internally: Ceph never discards any replicas and never "cleans" itself
up to keep only the data that is actually in use.
If Ceph behaved properly, then with a replica count of 2 I would have
my 13 TB RBD image, the corresponding 13 TB of replicas, and a fixed
26 TB of overall space used. When I "free" data in the RBD image, the
corresponding replicas would be considered discarded by Ceph, and when
data in the RBD image is overwritten, the corresponding replicas would
be overwritten with the new data as well. The overall space used would
then stay fixed.
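
For clarity, the behaviour I am describing is what I would expect when
the client issues discards down to librbd, roughly as in the sketch
below (the pool name 'rbd', the image name 'myimage' and the offsets
are placeholders I made up for illustration):

    import rados
    import rbd

    # Connect to the cluster and open the pool holding the image.
    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()
    ioctx = cluster.open_ioctx('rbd')

    image = rbd.Image(ioctx, 'myimage')
    try:
        # Deallocate a byte range of the image. The expectation is that the
        # backing RADOS objects covering this range, and all their replicas,
        # get removed or truncated, so the raw space used actually shrinks.
        image.discard(0, 1024 ** 3)   # first 1 GiB, purely illustrative
    finally:
        image.close()
        ioctx.close()
        cluster.shutdown()

This is what mounting the filesystem inside the image with the discard
option, or running fstrim, should end up triggering, provided the whole
stack passes the discard requests through to librbd.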
In the case of the failure of 2 OSDs, the cluster would then have just
enough space to re-replicate the missing data, which is currently not
the case in an environment that has reached the near-full state.
So my question is: what is planned to correct this problem?
Best regards,
--
Alphe Salas
I.T. engineer