Hi,

In this same environment, when I try to create a new directory on the mount point (client side), I get this error:

profile3:/mnt # mkdir gluster_new/newdir
mkdir: cannot create directory `gluster_new/newdir': Software caused connection abort
profile3:/mnt # mkdir gluster_new/newdir
mkdir: cannot create directory `gluster_new/newdir': Transport endpoint is not connected
profile3:/mnt # mount

If I check the log file, I can see:

[2010-04-06 07:58:26] W [fuse-bridge.c:477:fuse_entry_cbk] glusterfs-fuse: 4373613: MKDIR() /newdir returning inode 0
pending frames:
frame : type(1) op(MKDIR)
frame : type(1) op(MKDIR)

patchset: v3.0.2-41-g029062c
signal received: 11
time of crash: 2010-04-06 07:58:26
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.3
/lib64/libc.so.6[0x7f49e1c7f6e0]
/usr/lib64/libglusterfs.so.0(inode_link+0x23)[0x7f49e23e73b3]
/usr/lib64/glusterfs/3.0.3/xlator/mount/fuse.so[0x7f49e07b8a43]
/usr/lib64/glusterfs/3.0.3/xlator/mount/fuse.so[0x7f49e07b8f92]
/usr/lib64/libglusterfs.so.0[0x7f49e23e0cd5]
/usr/lib64/libglusterfs.so.0[0x7f49e23e0cd5]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/stripe.so(stripe_stack_unwind_inode_cbk+0x1aa)[0x7f49e0de19ba]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_unwind+0x113)[0x7f49e0ffa4c3]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0xbe)[0x7f49e0ffb1de]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(client_mkdir_cbk+0x405)[0x7f49e1242d35]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7f49e123024a]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(notify+0x212)[0x7f49e12376c2]
/usr/lib64/libglusterfs.so.0(xlator_notify+0x43)[0x7f49e23d93e3]
/usr/lib64/glusterfs/3.0.3/transport/socket.so(socket_event_handler+0xd3)[0x7f49dfda6173]
/usr/lib64/libglusterfs.so.0[0x7f49e23f3045]
/usr/sbin/glusterfs(main+0xa28)[0x404268]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f49e1c6b586]
/usr/sbin/glusterfs[0x402749]
---------

Again, I am totally clueless...

On 04/06/2010 12:07 PM, Kali Hernandez wrote:
>
> Hi all,
>
> We are running glusterfs 3.0.3, installed from the RHEL RPMs, over 30
> nodes (not virtual machines). Our config pairs every 2 machines under
> the replicate translator as mirrors, and on top of that aggregates the
> 15 resulting mirrors under the stripe translator. Before this we were
> using distribute instead, but we had the same problem.
>
> We are copying (using cp) a lot of files which reside under the same
> directory, and I have been monitoring the whole copy process to check
> where the failure starts.
>
> In the middle of the copy process we get this error:
>
> cp: cannot create regular file `/mnt/gluster_new/videos/1251512-3CA86758640A31E7770EBC7629AEC10F.mpg': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/1758650-3AF69C6B7FDAC0A40D85EABA8C85490D.mswmm': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/179183-A018B5FBE6DCCF04A3BB99C814CD9EAB.wmv': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/2448602-568B1ACF53675DC762485F2B26539E0D.wmv': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/626249-7B7FFFE0B9C56E9BE5733409CB73BCDF_300.jpg': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/1962299-B7CDFF12FB1AD41DF3660BF0C7045CBC.avi': No space left on device
>
> (hundreds of times)
>
> When I look at the storage distribution, I can see this:
>
>             Size  Used  Avail  Use%  Mounted on
> node 10      37G   14G    23G   38%  /glusterfs_storage
> node 11      37G   14G    23G   37%  /glusterfs_storage
> node 12      37G   14G    23G   37%  /glusterfs_storage
> node 13      37G   14G    23G   37%  /glusterfs_storage
> node 14      37G   13G    24G   36%  /glusterfs_storage
> node 15      37G   13G    24G   36%  /glusterfs_storage
> node 16      37G   13G    24G   35%  /glusterfs_storage
> node 17      49G   12G    36G   26%  /glusterfs_storage
> node 18      37G   12G    25G   33%  /glusterfs_storage
> node 19      37G   12G    25G   33%  /glusterfs_storage
> node 20      37G   14G    23G   38%  /glusterfs_storage
> node 21      37G   14G    23G   37%  /glusterfs_storage
> node 22      37G   14G    23G   37%  /glusterfs_storage
> node 23      37G   14G    23G   37%  /glusterfs_storage
> node 24      37G   13G    24G   36%  /glusterfs_storage
> node 25      37G   13G    24G   36%  /glusterfs_storage
> node 26      37G   13G    24G   35%  /glusterfs_storage
> node 27      49G   12G    36G   26%  /glusterfs_storage
> node 28      37G   12G    25G   33%  /glusterfs_storage
> node 29      37G   12G    25G   33%  /glusterfs_storage
> node 35      40G   40G      0  100%  /glusterfs_storage
> node 36      40G   22G    18G   56%  /glusterfs_storage
> node 37      40G   18G    22G   45%  /glusterfs_storage
> node 38      40G   16G    24G   40%  /glusterfs_storage
> node 39      40G   15G    25G   37%  /glusterfs_storage
> node 45      40G   40G      0  100%  /glusterfs_storage
> node 46      40G   22G    18G   56%  /glusterfs_storage
> node 47      40G   18G    22G   45%  /glusterfs_storage
> node 48      40G   16G    24G   40%  /glusterfs_storage
> node 49      40G   15G    25G   37%  /glusterfs_storage
>
> (node mirror pairings are 10-19 paired to 20-29, and 35-39 to 45-49)
>
> As you can see, the distribution of space over the cluster is more or
> less even across most of the nodes, except for the node pair 35/45,
> which has run out of space. Thus, every time I try to copy more data
> onto the cluster, I run into the mentioned "no space left on device".
>
> From the mount point's point of view, the gluster free space looks
> like this:
>
> Filesystem                        1M-blocks    Used  Available  Use%  Mounted on
> [...]
> /etc/glusterfs/glusterfs.vol.new     586617  240197     340871   42%  /mnt/gluster_new
>
> So basically, I get out-of-space messages when there is around 340 GB
> free on the cluster.
>
> I tried using the distribute translator instead of stripe (in fact
> that was our first setup), but we thought that maybe we start copying
> a big file (usually we store really big .tar.gz backups here) and it
> runs out of space on one node partway through, so we switched to
> stripe, because in theory glusterfs would then place the next block of
> the file on another node. But in both cases (distribute and stripe) we
> run into the same problems.
>
> So I am wondering whether this is a problem with a maximum number of
> files in the same directory or filesystem, or something else?
>
> Any ideas on this issue?
>
>
> Our config is as follows:
>
> <snip>
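
For reference, the layout described in the quoted message (every 2 nodes mirrored with cluster/replicate, the 15 resulting mirrors aggregated under cluster/stripe) boils down to a client volfile shaped roughly like the sketch below. This is only an illustration, not the snipped config: the volume names, the exported subvolume name "brick" and the block-size value are placeholders, and only the first of the 15 mirror pairs is written out.

volume node10
  type protocol/client
  option transport-type tcp
  option remote-host node10          # first node of the pair
  option remote-subvolume brick      # placeholder name for the exported subvolume
end-volume

volume node20
  type protocol/client
  option transport-type tcp
  option remote-host node20          # its mirror partner
  option remote-subvolume brick
end-volume

volume mirror-0
  type cluster/replicate             # each pair of machines becomes one mirror
  subvolumes node10 node20
end-volume

# ... mirror-1 through mirror-14 defined the same way for the other pairs ...

volume stripe0
  type cluster/stripe                # was cluster/distribute in the first setup
  option block-size 1MB              # placeholder value
  subvolumes mirror-0 mirror-1 mirror-2 mirror-3 mirror-4 mirror-5 mirror-6 mirror-7 mirror-8 mirror-9 mirror-10 mirror-11 mirror-12 mirror-13 mirror-14
end-volume

The block-size option is what decides how much of a file lands on one mirror before stripe moves on to the next one, which is the behaviour the quoted message was counting on when switching from distribute to stripe.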