Hi,

I've spotted what may be a small bug (or an unavoidable feature?) in the way a gluster volume reports free space while a replicated brick is re-syncing, or it may be that there's a setting I need to change.

Using gluster 3.5.2 on CentOS 7, I created a volume across 3 servers with 3 replicas. The servers had very different specs from each other, with varying disk sizes.

# gluster volume status
Status of volume: gv0
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick data210:/data/brick/gv0                           49153   Y       2131
Brick data310:/data/brick/gv0                           49152   Y       2211
Brick data410:/data/brick/gv0                           49152   Y       2346
NFS Server on localhost                                 2049    Y       2360
Self-heal Daemon on localhost                           N/A     Y       2364
NFS Server on data310                                   2049    Y       2225
Self-heal Daemon on data310                             N/A     Y       2229
NFS Server on 172.20.1.2                                2049    Y       2146
Self-heal Daemon on 172.20.1.2                          N/A     Y       2141

Task Status of Volume gv0
------------------------------------------------------------------------------

Sensibly, to the clients mounting the volume, gluster reported the free space as the amount of free space on the smallest brick. I wrote about 120GB of data to the volume, and then simulated a brick failure and replacement by doing the following on the server with the smallest disks (I didn't have a 4th server to hand to introduce):

* stopped the gluster service
* killed any remaining gluster processes
* uninstalled gluster (yum remove glusterfs-server glusterfs)
* deleted /var/lib/glusterd/
* deleted /data/brick/gv0/

I then re-installed gluster and re-introduced the server to the cluster by following the instructions here:

http://gluster.org/community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server

What I noticed while the data was re-syncing to the 'new' brick was that, on the client, the free space *and* the used space were values taken from this smallest brick, which had not yet finished rebuilding its data. As these are test servers, /data/brick is on the root file system:

client# df
Filesystem              1K-blocks     Used Available Use% Mounted on
/dev/mapper/centos-root 146933660  1273948 145659712   1% /
data310:gv0             234708992 90838272 143870720  39% /mnt/gv0

brick-server# df
Filesystem              1K-blocks     Used Available Use% Mounted on
/dev/mapper/centos-root 234709020 90860416 143848604  39% /
/dev/sda1                  508588   177704    330884  35% /boot

The problem with this is that once the data has finished replicating, the smallest brick will be over 50% full. While the brick is rebuilding, it is therefore possible for a client to write so much data to the volume that the smallest brick will be unable to hold all of the volume's data once the rebuild has finished, i.e. to oversubscribe the space on the brick. This is obviously more likely to cause a problem if clients write to a rebuilding volume that is already quite full and the new brick has only just started to replicate, so hopefully it's a rare case.

I think a more failsafe behaviour would be for gluster to report the volume size to clients based on the smallest brick in the replica group, but the space used based on the most space used on any of the up-to-date bricks. I appreciate this may not be such an easy value to derive if the file system containing the brick also contains other data.

Is this a setting I need to change, or is this a bug?

Cheers,
Kingsley.
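
P.S. In case it helps with reproducing this, the volume was created roughly as follows (from memory, so the exact commands may differ slightly; hostnames and brick paths are as shown above), and I've been watching the heal progress on the re-introduced brick with "gluster volume heal gv0 info":

data210# gluster peer probe data310
data210# gluster peer probe data410
data210# gluster volume create gv0 replica 3 data210:/data/brick/gv0 data310:/data/brick/gv0 data410:/data/brick/gv0
data210# gluster volume start gv0

client# mount -t glusterfs data310:gv0 /mnt/gv0

data210# gluster volume heal gv0 info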