Hi all,

We are running glusterfs 3.0.3, installed from the RHEL RPMs, on 30 nodes (not virtual machines). Our config pairs the machines two by two under the replicate translator as mirrors, and aggregates the resulting 15 mirrors under the stripe translator. We were using distribute instead before, but we hit the same problem.

We are copying (using cp) a lot of files that live in the same directory, and I have been monitoring the whole copy process to see where the failure starts. In the middle of the copy we get errors like these (hundreds of times):

cp: cannot create regular file `/mnt/gluster_new/videos/1251512-3CA86758640A31E7770EBC7629AEC10F.mpg': No space left on device
cp: cannot create regular file `/mnt/gluster_new/videos/1758650-3AF69C6B7FDAC0A40D85EABA8C85490D.mswmm': No space left on device
cp: cannot create regular file `/mnt/gluster_new/videos/179183-A018B5FBE6DCCF04A3BB99C814CD9EAB.wmv': No space left on device
cp: cannot create regular file `/mnt/gluster_new/videos/2448602-568B1ACF53675DC762485F2B26539E0D.wmv': No space left on device
cp: cannot create regular file `/mnt/gluster_new/videos/626249-7B7FFFE0B9C56E9BE5733409CB73BCDF_300.jpg': No space left on device
cp: cannot create regular file `/mnt/gluster_new/videos/1962299-B7CDFF12FB1AD41DF3660BF0C7045CBC.avi': No space left on device

When I look at the storage distribution on the bricks, I see this:

node 10   37G  14G  23G   38%  /glusterfs_storage
node 11   37G  14G  23G   37%  /glusterfs_storage
node 12   37G  14G  23G   37%  /glusterfs_storage
node 13   37G  14G  23G   37%  /glusterfs_storage
node 14   37G  13G  24G   36%  /glusterfs_storage
node 15   37G  13G  24G   36%  /glusterfs_storage
node 16   37G  13G  24G   35%  /glusterfs_storage
node 17   49G  12G  36G   26%  /glusterfs_storage
node 18   37G  12G  25G   33%  /glusterfs_storage
node 19   37G  12G  25G   33%  /glusterfs_storage
node 20   37G  14G  23G   38%  /glusterfs_storage
node 21   37G  14G  23G   37%  /glusterfs_storage
node 22   37G  14G  23G   37%  /glusterfs_storage
node 23   37G  14G  23G   37%  /glusterfs_storage
node 24   37G  13G  24G   36%  /glusterfs_storage
node 25   37G  13G  24G   36%  /glusterfs_storage
node 26   37G  13G  24G   35%  /glusterfs_storage
node 27   49G  12G  36G   26%  /glusterfs_storage
node 28   37G  12G  25G   33%  /glusterfs_storage
node 29   37G  12G  25G   33%  /glusterfs_storage
node 35   40G  40G    0  100%  /glusterfs_storage
node 36   40G  22G  18G   56%  /glusterfs_storage
node 37   40G  18G  22G   45%  /glusterfs_storage
node 38   40G  16G  24G   40%  /glusterfs_storage
node 39   40G  15G  25G   37%  /glusterfs_storage
node 45   40G  40G    0  100%  /glusterfs_storage
node 46   40G  22G  18G   56%  /glusterfs_storage
node 47   40G  18G  22G   45%  /glusterfs_storage
node 48   40G  16G  24G   40%  /glusterfs_storage
node 49   40G  15G  25G   37%  /glusterfs_storage

(The mirror pairings are nodes 10-19 with 20-29, and 35-39 with 45-49.)

As you can see, space is distributed more or less evenly over most of the nodes, except for the pair 35/45, which has run out of space. So every time I try to copy more data onto the cluster, I run into the "No space left on device" errors above.

From the mount point, the free space looks like this:

Filesystem                        1M-blocks    Used Available Use% Mounted on
[...]
/etc/glusterfs/glusterfs.vol.new     586617  240197    340871  42% /mnt/gluster_new

So basically, I get "no space left" messages while the cluster still reports around 340 GB free.

I tried the distribute translator instead of stripe (in fact, that was our first setup), but we thought that a really big file (we usually store very large .tar.gz backups here) might exhaust a node partway through a copy, so we switched to stripe, since in theory glusterfs would then place the next block of the file on another node. But with both distribute and stripe we run into the same problem.

So I am wondering: is this a limit on the maximum number of files in a single directory or filesystem, or something else? Any ideas on this issue?
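If I understand the stripe translator correctly, this is what I would expect once any one subvolume fills up: with fixed round-robin striping, a write fails as soon as the target subvolume for one of its blocks is full, no matter how much space the rest of the cluster has. Here is a toy model I wrote to reason about it (my own simplification, not GlusterFS code; the block numbering and placement rule are assumptions on my part):

```python
# Toy model of round-robin striping (an illustration, not GlusterFS internals):
# block k of a file goes to subvolume k % N. The write fails as soon as the
# subvolume targeted by some block is full, even if the others have room.

BLOCK = 4  # MB, matching our "option block-size 4MB"

def write_file(free_mb, size_mb, block=BLOCK):
    """free_mb: list of per-subvolume free space (MB), mutated on success.
    Returns True if the whole file fits, False on the first full subvolume."""
    n = len(free_mb)
    blocks = -(-size_mb // block)  # ceiling division
    for k in range(blocks):
        target = k % n
        chunk = min(block, size_mb - k * block)
        if free_mb[target] < chunk:
            return False  # one full subvolume fails the whole file (ENOSPC)
        free_mb[target] -= chunk
    return True

# Three mirrors: one completely full, two with plenty of space.
free = [0, 20_000, 20_000]
ok = write_file(free, size_mb=12)
print(ok, sum(free))  # prints "False 40000": the copy fails with ~40 GB free
```

That would match what we see: the mount point reports hundreds of GB free, but cp fails because the 35/45 pair is at 100%.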
Our config is as follows. Each node has:

--------------
volume posix
  type storage/posix
  option directory /glusterfs_storage
end-volume

volume locks
  type features/posix-locks
  subvolumes posix
end-volume

volume server
  type protocol/server
  option transport-type tcp
  option auth.addr.locks.allow 10.20.0.*
  subvolumes locks
end-volume
--------------

And the mount client has:

=======
##### Old blades (37 GB each, except rsid-a-27, 49 GB)
volume rsid-a-10
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.150
  option remote-subvolume locks
end-volume

volume rsid-a-11
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.151
  option remote-subvolume locks
end-volume

volume rsid-a-12
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.152
  option remote-subvolume locks
end-volume

volume rsid-a-13
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.153
  option remote-subvolume locks
end-volume

volume rsid-a-14
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.154
  option remote-subvolume locks
end-volume

volume rsid-a-15
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.155
  option remote-subvolume locks
end-volume

volume rsid-a-16
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.156
  option remote-subvolume locks
end-volume

volume rsid-a-17
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.157
  option remote-subvolume locks
end-volume

volume rsid-a-18
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.158
  option remote-subvolume locks
end-volume

volume rsid-a-19
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.159
  option remote-subvolume locks
end-volume

volume rsid-a-20
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.160
  option remote-subvolume locks
end-volume

volume rsid-a-21
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.161
  option remote-subvolume locks
end-volume

volume rsid-a-22
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.162
  option remote-subvolume locks
end-volume

volume rsid-a-23
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.163
  option remote-subvolume locks
end-volume

volume rsid-a-24
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.164
  option remote-subvolume locks
end-volume

volume rsid-a-25
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.165
  option remote-subvolume locks
end-volume

volume rsid-a-26
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.166
  option remote-subvolume locks
end-volume

volume rsid-a-27
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.167
  option remote-subvolume locks
end-volume

volume rsid-a-28
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.168
  option remote-subvolume locks
end-volume

volume rsid-a-29
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.169
  option remote-subvolume locks
end-volume

##### New blades (40 GB each)
volume rsid-a-35
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.180
  option remote-subvolume locks
end-volume

volume rsid-a-36
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.181
  option remote-subvolume locks
end-volume

volume rsid-a-37
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.182
  option remote-subvolume locks
end-volume

volume rsid-a-38
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.183
  option remote-subvolume locks
end-volume

volume rsid-a-39
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.184
  option remote-subvolume locks
end-volume

volume rsid-a-45
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.190
  option remote-subvolume locks
end-volume
volume rsid-a-46
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.191
  option remote-subvolume locks
end-volume

volume rsid-a-47
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.192
  option remote-subvolume locks
end-volume

volume rsid-a-48
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.193
  option remote-subvolume locks
end-volume

volume rsid-a-49
  type protocol/client
  option transport-type tcp
  option remote-host 10.20.0.194
  option remote-subvolume locks
end-volume

### mirroring blade volumes, 1x <---> 2x and 3x <---> 4x
volume mirror1
  type cluster/replicate
  subvolumes rsid-a-10 rsid-a-20
end-volume

volume mirror2
  type cluster/replicate
  subvolumes rsid-a-11 rsid-a-21
end-volume

volume mirror3
  type cluster/replicate
  subvolumes rsid-a-12 rsid-a-22
end-volume

volume mirror4
  type cluster/replicate
  subvolumes rsid-a-13 rsid-a-23
end-volume

volume mirror5
  type cluster/replicate
  subvolumes rsid-a-14 rsid-a-24
end-volume

volume mirror6
  type cluster/replicate
  subvolumes rsid-a-15 rsid-a-25
end-volume

volume mirror7
  type cluster/replicate
  subvolumes rsid-a-16 rsid-a-26
end-volume

volume mirror8
  type cluster/replicate
  subvolumes rsid-a-17 rsid-a-27
end-volume

volume mirror9
  type cluster/replicate
  subvolumes rsid-a-18 rsid-a-28
end-volume

volume mirror10
  type cluster/replicate
  subvolumes rsid-a-19 rsid-a-29
end-volume

volume mirror11
  type cluster/replicate
  subvolumes rsid-a-35 rsid-a-45
end-volume

volume mirror12
  type cluster/replicate
  subvolumes rsid-a-36 rsid-a-46
end-volume

volume mirror13
  type cluster/replicate
  subvolumes rsid-a-37 rsid-a-47
end-volume

volume mirror14
  type cluster/replicate
  subvolumes rsid-a-38 rsid-a-48
end-volume

volume mirror15
  type cluster/replicate
  subvolumes rsid-a-39 rsid-a-49
end-volume

### final volume, striped mirrors: 15x2 blades.
### 4 MB block size to allow small files to fit completely in a single storage node
### (currently only new blades are mirrored)
volume stripe
  type cluster/stripe
  option block-size 4MB
  subvolumes mirror1 mirror2 mirror3 mirror4 mirror5 mirror6 mirror7 mirror8 mirror9 mirror10 mirror11 mirror12 mirror13 mirror14 mirror15
end-volume

volume writebehind
  type performance/write-behind
  option cache-size 4MB
  option disable-for-first-nbytes 128KB
  subvolumes stripe
end-volume

volume iocache
  type performance/io-cache
  subvolumes writebehind
  option cache-size 4MB
  option cache-timeout 5
end-volume
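To double-check the "small files fit completely in a single storage node" comment above, here is a quick sketch of how many of the 15 mirrors a file of a given size would touch with a 4 MB block size. This is my own helper for reasoning about the layout (the ceiling-division placement is an assumption, not a gluster tool):

```python
BLOCK_MB = 4   # "option block-size 4MB" in the stripe volume above
MIRRORS = 15   # mirror1 .. mirror15

def mirrors_touched(size_mb, block=BLOCK_MB, n=MIRRORS):
    """Number of stripe subvolumes that receive at least one block of a file."""
    blocks = max(1, -(-size_mb // block))  # ceiling; even tiny files get one block
    return min(blocks, n)

print(mirrors_touched(1))    # prints 1:  a small .jpg stays on one mirror pair
print(mirrors_touched(100))  # prints 15: a big .tar.gz spreads over all mirrors
```

So the small videos and thumbnails should each land whole on a single mirror pair, while the large .tar.gz backups get spread across every pair, including the full 35/45 one.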