Hi,

In this same environment, when I try to create a new directory on the mount point (client side), I get this error:

profile3:/mnt # mkdir gluster_new/newdir
mkdir: cannot create directory `gluster_new/newdir': Software caused connection abort
profile3:/mnt # mkdir gluster_new/newdir
mkdir: cannot create directory `gluster_new/newdir': Transport endpoint is not connected
profile3:/mnt # mount

If I check the log file, I can see:

[2010-04-06 07:58:26] W [fuse-bridge.c:477:fuse_entry_cbk] glusterfs-fuse: 4373613: MKDIR() /newdir returning inode 0
pending frames:
frame : type(1) op(MKDIR)
frame : type(1) op(MKDIR)

patchset: v3.0.2-41-g029062c
signal received: 11
time of crash: 2010-04-06 07:58:26
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.0.3
/lib64/libc.so.6[0x7f49e1c7f6e0]
/usr/lib64/libglusterfs.so.0(inode_link+0x23)[0x7f49e23e73b3]
/usr/lib64/glusterfs/3.0.3/xlator/mount/fuse.so[0x7f49e07b8a43]
/usr/lib64/glusterfs/3.0.3/xlator/mount/fuse.so[0x7f49e07b8f92]
/usr/lib64/libglusterfs.so.0[0x7f49e23e0cd5]
/usr/lib64/libglusterfs.so.0[0x7f49e23e0cd5]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/stripe.so(stripe_stack_unwind_inode_cbk+0x1aa)[0x7f49e0de19ba]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_unwind+0x113)[0x7f49e0ffa4c3]
/usr/lib64/glusterfs/3.0.3/xlator/cluster/replicate.so(afr_mkdir_wind_cbk+0xbe)[0x7f49e0ffb1de]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(client_mkdir_cbk+0x405)[0x7f49e1242d35]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(protocol_client_pollin+0xca)[0x7f49e123024a]
/usr/lib64/glusterfs/3.0.3/xlator/protocol/client.so(notify+0x212)[0x7f49e12376c2]
/usr/lib64/libglusterfs.so.0(xlator_notify+0x43)[0x7f49e23d93e3]
/usr/lib64/glusterfs/3.0.3/transport/socket.so(socket_event_handler+0xd3)[0x7f49dfda6173]
/usr/lib64/libglusterfs.so.0[0x7f49e23f3045]
/usr/sbin/glusterfs(main+0xa28)[0x404268]
/lib64/libc.so.6(__libc_start_main+0xe6)[0x7f49e1c6b586]
/usr/sbin/glusterfs[0x402749]
---------

Again, I am totally clueless...

On 04/06/2010 12:07 PM, Kali Hernandez wrote:
>
> Hi all,
>
> We are running glusterfs 3.0.3, installed from the RHEL RPMs, over 30
> nodes (not virtual machines). Our config pairs every 2 machines under
> the replicate translator as mirrors, and on top of that aggregates the
> 15 resulting mirrors under the stripe translator. Before this we were
> using distribute instead, but we had the same problem.
>
> We are copying (using cp) a lot of files which reside under the same
> directory, and I have been monitoring the whole copy process to check
> where the failure starts.
>
> In the middle of the copy process we get this error:
>
> cp: cannot create regular file `/mnt/gluster_new/videos/1251512-3CA86758640A31E7770EBC7629AEC10F.mpg': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/1758650-3AF69C6B7FDAC0A40D85EABA8C85490D.mswmm': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/179183-A018B5FBE6DCCF04A3BB99C814CD9EAB.wmv': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/2448602-568B1ACF53675DC762485F2B26539E0D.wmv': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/626249-7B7FFFE0B9C56E9BE5733409CB73BCDF_300.jpg': No space left on device
> cp: cannot create regular file `/mnt/gluster_new/videos/1962299-B7CDFF12FB1AD41DF3660BF0C7045CBC.avi': No space left on device
>
> (hundreds of times)
>
> When I look at the storage distribution, I can see this:
>
>             Size  Used  Avail  Use%  Mounted on
> node 10      37G   14G    23G   38%  /glusterfs_storage
> node 11      37G   14G    23G   37%  /glusterfs_storage
> node 12      37G   14G    23G   37%  /glusterfs_storage
> node 13      37G   14G    23G   37%  /glusterfs_storage
> node 14      37G   13G    24G   36%  /glusterfs_storage
> node 15      37G   13G    24G   36%  /glusterfs_storage
> node 16      37G   13G    24G   35%  /glusterfs_storage
> node 17      49G   12G    36G   26%  /glusterfs_storage
> node 18      37G   12G    25G   33%  /glusterfs_storage
> node 19      37G   12G    25G   33%  /glusterfs_storage
> node 20      37G   14G    23G   38%  /glusterfs_storage
> node 21      37G   14G    23G   37%  /glusterfs_storage
> node 22      37G   14G    23G   37%  /glusterfs_storage
> node 23      37G   14G    23G   37%  /glusterfs_storage
> node 24      37G   13G    24G   36%  /glusterfs_storage
> node 25      37G   13G    24G   36%  /glusterfs_storage
> node 26      37G   13G    24G   35%  /glusterfs_storage
> node 27      49G   12G    36G   26%  /glusterfs_storage
> node 28      37G   12G    25G   33%  /glusterfs_storage
> node 29      37G   12G    25G   33%  /glusterfs_storage
> node 35      40G   40G      0  100%  /glusterfs_storage
> node 36      40G   22G    18G   56%  /glusterfs_storage
> node 37      40G   18G    22G   45%  /glusterfs_storage
> node 38      40G   16G    24G   40%  /glusterfs_storage
> node 39      40G   15G    25G   37%  /glusterfs_storage
> node 45      40G   40G      0  100%  /glusterfs_storage
> node 46      40G   22G    18G   56%  /glusterfs_storage
> node 47      40G   18G    22G   45%  /glusterfs_storage
> node 48      40G   16G    24G   40%  /glusterfs_storage
> node 49      40G   15G    25G   37%  /glusterfs_storage
>
> (node mirror pairings are 10-19 paired to 20-29, and 35-39 to 45-49)
>
> As you can see, the distribution of space over the cluster is more or
> less even across most of the nodes, except for the node pair 35/45,
> which has run out of space. Thus, every time I try to copy more data
> onto the cluster, I run into the mentioned "no space left on device".
>
> From the mount point's point of view, the gluster free space looks
> like this:
>
> Filesystem                        1M-blocks    Used  Available  Use%  Mounted on
> [...]
> /etc/glusterfs/glusterfs.vol.new     586617  240197     340871   42%  /mnt/gluster_new
>
> So basically, I get out-of-space messages when there is around 340 GB
> free on the cluster.
>
> I tried using the distribute translator instead of stripe (in fact
> that was our first setup), but we thought that maybe we start copying
> a big file (usually we store really big .tar.gz backups here) and it
> runs out of space on one node partway through, so we switched to
> stripe, because in theory glusterfs would then place the next block of
> the file on another node. But in both cases (distribute and stripe) we
> run into the same problems.
>
> So I am wondering whether this is a problem with a maximum number of
> files in the same directory or filesystem, or something else?
>
> Any ideas on this issue?
>
>
> Our config is as follows:
>
> <snip>
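
For reference, the layout described in the quoted message (every 2 nodes mirrored with cluster/replicate, the 15 resulting mirrors aggregated under cluster/stripe) boils down to a client volfile shaped roughly like the sketch below. This is only an illustration, not the snipped config: the volume names, the exported subvolume name "brick" and the block-size value are placeholders, and only the first of the 15 mirror pairs is written out.

volume node10
  type protocol/client
  option transport-type tcp
  option remote-host node10          # first node of the pair
  option remote-subvolume brick      # placeholder name for the exported subvolume
end-volume

volume node20
  type protocol/client
  option transport-type tcp
  option remote-host node20          # its mirror partner
  option remote-subvolume brick
end-volume

volume mirror-0
  type cluster/replicate             # each pair of machines becomes one mirror
  subvolumes node10 node20
end-volume

# ... mirror-1 through mirror-14 defined the same way for the other pairs ...

volume stripe0
  type cluster/stripe                # was cluster/distribute in the first setup
  option block-size 1MB              # placeholder value
  subvolumes mirror-0 mirror-1 mirror-2 mirror-3 mirror-4 mirror-5 mirror-6 mirror-7 mirror-8 mirror-9 mirror-10 mirror-11 mirror-12 mirror-13 mirror-14
end-volume

The block-size option is what decides how much of a file lands on one mirror before stripe moves on to the next one, which is the behaviour the quoted message was counting on when switching from distribute to stripe.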