How does GlusterFS handle data traffic when it does a 'rebalance layout and migrate'?


 



I have read architecture documents about GlusterFS, Ceph, and Swift. The part I can't quite understand, and that worries me, is GlusterFS's rebalancing. From my understanding, it looks like GlusterFS moves too much data when it does a 'rebalance layout and migrate', especially compared with Ceph's and Swift's approaches. I'm not sure I understand it correctly.


It looks like Ceph and Swift map files to disks in a similar way:

file name -> hash of file name -> PG (placement group, in Ceph) or partition (in Swift) -> disk
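That two-stage mapping can be sketched roughly like this (a minimal illustration, not Ceph's CRUSH or Swift's actual ring code; the hash function, PG count, and round-robin assignment are all assumptions for the example):

```python
import hashlib

NUM_PGS = 256  # fixed partition/PG count, chosen at cluster creation (assumed)

def hash32(name: str) -> int:
    # stable 32-bit hash of the file name (MD5 used purely for illustration)
    return int(hashlib.md5(name.encode()).hexdigest()[:8], 16)

def file_to_pg(name: str) -> int:
    # stage 1: file name -> placement group; this never changes when disks
    # are added, because NUM_PGS is fixed
    return hash32(name) % NUM_PGS

# stage 2: PG -> disk is a separate, editable table; growing the cluster
# only rewrites entries in this table, never the file->PG mapping
pg_to_disk = {pg: pg % 4 for pg in range(NUM_PGS)}  # 4 disks, round-robin

def locate(name: str) -> int:
    return pg_to_disk[file_to_pg(name)]
```

The point of the indirection is that expansion only edits `pg_to_disk`, so only the files in reassigned PGs have to move.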


And, if I'm not mistaken, GlusterFS does it this way:

file name -> hash of file name -> disk
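A simplified model of that single-stage scheme: split one 32-bit hash space into contiguous, equal ranges, one per brick (this is a sketch of the idea behind GlusterFS's distribute translator, not its actual code; the MD5 stand-in hash and the equal-split layout are assumptions):

```python
import hashlib

HASH_SPACE = 2 ** 32

def hash32(name: str) -> int:
    # stand-in for GlusterFS's real file-name hash (MD5 for illustration)
    return int(hashlib.md5(name.encode()).hexdigest()[:8], 16)

def layout(num_bricks: int):
    # one contiguous, equal-sized hash range per brick
    step = HASH_SPACE // num_bricks
    return [(i * step, HASH_SPACE if i == num_bricks - 1 else (i + 1) * step)
            for i in range(num_bricks)]

def locate(name: str, ranges) -> int:
    h = hash32(name)
    for brick, (lo, hi) in enumerate(ranges):
        if lo <= h < hi:
            return brick
```

Going from, say, `layout(3)` to `layout(4)` redraws every range boundary, so a file's brick can change even though its hash did not.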


The existence of the PG (or partition) layer means only minimal data has to migrate from old disks to new disks, so the cluster can be flattened with minimal movement. Theoretically, in the worst case, Ceph and Swift move an amount of data equal to the size of the newly added disks (assuming, for simplicity, homogeneous file sizes).
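The difference is easy to simulate. The comparison below uses two assumed models: equal contiguous hash ranges for the single-stage case, and highest-random-weight (rendezvous) hashing as a stand-in for any PG-to-disk map with the minimal-movement property; neither is the real Gluster, Ceph, or Swift implementation:

```python
import hashlib

SPACE = 2 ** 32
NUM_PGS = 1024
FILES = [f"file-{i}" for i in range(20000)]

def h32(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest()[:8], 16)

def range_brick(x: int, n: int) -> int:
    # single-stage model: equal contiguous hash ranges, one per brick
    return x * n // SPACE

def pg_disk(x: int, disks) -> int:
    # two-stage model: file -> fixed PG, then PG -> disk by
    # highest-random-weight hashing, which only reassigns the PGs
    # that the newly added disk "wins"
    pg = x % NUM_PGS
    return max(disks, key=lambda d: h32(f"{pg}:{d}"))

moved_range = sum(range_brick(h32(f), 4) != range_brick(h32(f), 5) for f in FILES)
moved_pg = sum(pg_disk(h32(f), range(4)) != pg_disk(h32(f), range(5)) for f in FILES)
print(f"range split : {moved_range / len(FILES):.1%} of files moved")  # roughly half
print(f"PG + HRW    : {moved_pg / len(FILES):.1%} of files moved")     # roughly 1/5
```

Growing 4 bricks to 5 under the range-split model moves about 50% of the files, while the indirected model moves only about the share destined for the new disk (~20%), which matches the worst-case intuition above.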


But GlusterFS, as far as I understand, just adds the new disk to the tail of the hash ring. I also found an article written by Jeff Darcy in which he says this issue really does exist:

http://hekafs.org/index.php/2012/03/glusterfs-algorithms-distribution/

As the storage cluster grows larger and larger, replacing 50% of the data on each expansion becomes a more and more serious problem, yet I could not find much discussion of it. How does this work out in the field? I'm worried there will be too much data traffic during 'rebalance and migration'. Maybe rebalancing only the layout and skipping migration could be a solution: trading a little extra lookup latency to relieve the migration traffic and time. But that looks like a patchwork solution, and without migration and flattening, write requests will converge on the newly added disks.


Do I understand GlusterFS correctly? The article above is quite old, and I couldn't find any other architecture documents. If my understanding is correct, how do GlusterFS users deal with these issues (huge migration traffic, writes converging on the new disks)? Are there design decisions or target use cases behind this?


Thanks in advance for any help.


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://supercolony.gluster.org/mailman/listinfo/gluster-users
