On Friday 16 October 2015 08:11 PM, Lindsay Mathieson wrote:
> On 17 October 2015 at 00:26, Udo Giacomozzi <udo.giacomozzi@xxxxxxxxxx> wrote:
>
>> To me this sounds like Gluster is not really suited for big files, such as the main storage for VMs, since they are being modified constantly.
>
> Depends :) Any replicated storage will have to heal its copies if they are written to while a node is down. As long as the files can still be read/written while being healed and the resource usage (CPU/network) is not too high, it should be transparent - that's the whole point of a replicated filesystem.
>
> I'm guessing that, like me, you are running your gluster storage on your VM hosts and, like me, you are a chronic tweaker, so you tend to reboot the hosts more than you should. In that case you might want to consider moving your gluster storage to separate dedicated nodes that you can leave up.
>
>> Or am I missing something? Perhaps Gluster can be configured to heal only modified parts of the files?
>
> Not that I know of.
Self-healing in gluster by default syncs only the modified parts of a file from the source node. Gluster computes a rolling checksum over a file needing self-heal to identify the regions that need to be synced over the network. This rolling checksum computation can sometimes be expensive, and there are plans for lighter self-healing in 3.8, with more granular changelogs that do away with the need for a rolling checksum.
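To illustrate the idea (not gluster's actual implementation - real self-heal uses rolling checksums negotiated between bricks), here is a minimal Python sketch of block-level checksum comparison: two replicas are split into fixed-size blocks, and only blocks whose checksums differ would need to be copied over the network. The block size and function names are made up for the example.

```python
import zlib

BLOCK_SIZE = 128 * 1024  # illustrative granularity, not gluster's

def changed_blocks(source: bytes, sink: bytes, block_size: int = BLOCK_SIZE):
    """Return indices of fixed-size blocks whose checksums differ.

    A toy stand-in for the checksum comparison self-heal performs:
    only the mismatching regions would be synced to the stale copy.
    """
    length = max(len(source), len(sink))
    diffs = []
    for offset in range(0, length, block_size):
        a = source[offset:offset + block_size]
        b = sink[offset:offset + block_size]
        if zlib.adler32(a) != zlib.adler32(b):
            diffs.append(offset // block_size)
    return diffs
```

With a multi-gigabyte VM image where only a few blocks changed while a node was down, this kind of comparison lets the heal copy a few hundred kilobytes instead of the whole file - at the cost of checksumming the entire file on both sides, which is the expense mentioned above.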
You may also want to check out sharding (currently in beta in 3.7), where large files are chunked into smaller fragments. With this scheme, self-healing (and thereby the rolling checksum computation) happens only on the fragments that were changed while one of the nodes in a replica set was offline. This has shown nice improvements in gluster's resource utilization during self-healing.
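As a sketch of how this would be enabled (option names as I recall them from the 3.7 beta - the volume name and block size below are illustrative, so check the release notes for your version):

```shell
# Enable the sharding translator on an existing volume ("myvol" is a
# placeholder). Large files written afterwards are stored as fragments.
gluster volume set myvol features.shard on

# Fragment size is tunable; smaller shards mean finer-grained self-heal
# at the cost of more backend files. 64MB here is just an example.
gluster volume set myvol features.shard-block-size 64MB
```

Note that sharding applies to files created after the option is enabled; existing whole files on the volume are not retroactively split.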
Regards,
Vijay

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users