Self-heal impact on performance: is there a definitive answer?

milanraf at gmail.com (R.C.) · Fri, 25 Mar 2011 09:46:19 +0100

> I had been looking at doing the same for a while now. While Gluster is
> a great technology and I was very excited by what it could offer, I
> don't think Gluster would be suitable for this particular purpose.
> Pardon me if I'm wrong but since Gluster works on the file level, I
> feel that it isn't really optimal for huge VM images and VM drives.
> Any rebuild would require copying entire files and therefore consume
> heavy bandwidth and IO.

My tests haven't yet involved 2 GB+ files such as VM disks; I'm still
working with "average size" use cases.
I think GlusterFS is meant to be a "live data" repository, not a static
one.
If the data were static or quasi-static, what would be the point of
hardware failure resistance, inherent read/write parallelism, on-the-fly
data reconstruction and adding bricks?

Even at file level, data reconstruction should be moved to the background
for files that are not being accessed, while accessed files are
reconstructed automatically when their data is overwritten.
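
To make the idea concrete, here is a toy sketch of that policy. It is not
GlusterFS code: the HealScheduler class and the file names are made up for
illustration, and "healing" is only simulated by moving a name from one list
to another.

```python
from collections import deque


class HealScheduler:
    """Toy model: stale files are healed on access, or later in the background."""

    def __init__(self, stale_files):
        self.pending = deque(stale_files)  # files whose replica is out of date
        self.healed = []                   # order in which files were repaired

    def _heal(self, name):
        # Stand-in for the real copy; here we only record the order.
        self.pending.remove(name)
        self.healed.append(name)

    def on_access(self, name):
        # A client touched this file: repair it right away if it is stale.
        if name in self.pending:
            self._heal(name)

    def background_step(self):
        # Idle time: repair the oldest stale file nobody has touched yet.
        if self.pending:
            self._heal(self.pending[0])
            return True
        return False


sched = HealScheduler(["a.img", "b.log", "c.db"])  # hypothetical file names
sched.on_access("c.db")          # accessed file jumps the queue
while sched.background_step():   # the rest is healed in the background
    pass
print(sched.healed)
```

The accessed file is repaired first and the untouched ones follow in queue
order, so clients never wait on files they are not using.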

In hardware RAID, mirror reconstruction covers even unused disk areas, and
that is real overhead!
Consider DRBD: there, data reconstruction happens at block level, yet it
doesn't choke system performance.
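
A rough back-of-the-envelope comparison of the two approaches, as a sketch
only. The sizes are invented for the example (a 20 GiB image, 4 MiB blocks,
50 changed blocks), and the functions are a simplified model of "copy the
whole file" versus "copy only changed blocks", not the actual GlusterFS or
DRBD algorithms.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def file_level_bytes(file_size, dirty_blocks):
    # File-level rebuild: any change means the whole file is copied again.
    return file_size if dirty_blocks > 0 else 0


def block_level_bytes(block_size, dirty_blocks):
    # Block-level rebuild: only the changed blocks are copied.
    return block_size * dirty_blocks


# Assumed numbers, for illustration only.
image_size = 20 * GIB
block_size = 4 * MIB
dirty = 50

whole = file_level_bytes(image_size, dirty)
partial = block_level_bytes(block_size, dirty)

print(f"file-level : {whole // MIB} MiB")
print(f"block-level: {partial // MIB} MiB")
```

With these assumed numbers the file-level rebuild moves 20480 MiB and the
block-level one moves 200 MiB. The gap grows with file size, which is why
the concern is mostly about huge VM images.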

As you said, GlusterFS is a really good approach to a cluster filesystem,
with very interesting features unseen elsewhere. At this point in
development, with an already stable system, I think two major aspects have
to be taken into consideration:
- automatic self-heal
- background data reconstruction

Always IMHO

Raf