On 02/07/2011 11:49 PM, Raghavendra G wrote:
> Hi Steve,
>
> Are the back-end file systems working correctly? I am seeing lots of errors in server log files while accessing back-end filesystem.
>
> gluster-01-brick.log.1:[2011-01-26 03:43:07.353445] E [posix.c:2193:posix_open] post-posix: open on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.cmd: Read-only file system
> gluster-01-brick.log.1:[2011-01-26 03:43:07.353857] E [posix.c:678:posix_setattr] post-posix: setattr (utimes) on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.cmd failed: Read-only file system
> gluster-01-brick.log.1:[2011-01-26 03:43:07.354827] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
> gluster-01-brick.log.1:[2011-01-26 03:43:07.357396] E [posix.c:2193:posix_open] post-posix: open on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.ps: Read-only file system
> gluster-01-brick.log.1:[2011-01-26 03:43:07.357794] E [posix.c:678:posix_setattr] post-posix: setattr (utimes) on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.ps failed: Read-only file system
> gluster-01-brick.log.1:[2011-01-26 03:43:07.358865] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
> gluster-01-brick.log.1:[2011-01-26 03:43:07.359264] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
> gluster-01-brick.log.1:[2011-01-26 03:43:07.359548] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
> gluster-01-brick.log.1:[2011-01-26 03:43:07.367163] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f
>
> I am seeing other errors, which indicate that the backend is read-only filesystem.
> Due to this distribute and replicate are not able to store the metadata (using xattrs), which in turn is resulting in lots of split-brains and layout NULL errors. Can you please check the backend file system?
>
> regards,

Yes, the filesystem was read-only for a time when a disk failed. We then rebuilt the brick on that disk from the corresponding brick in the second server (with the volume stopped, of course) using:

    rsync -aXv brick/ stanley:/gluster/06/brick/

Following some instructions we found on the mailing list we then:
  1) deleted the volume
  2) ran "find /gluster -exec setfattr -x trusted.gfid \{\} \;" on the bricks
  3) created the volume again
  4) mounted the volume
  5) ran "find . -print0 | xargs --null stat > /dev/null" on the mounted volume

This returned us to what seemed to be a stable state (i.e., no errors from running "ls -alR" from the top of the volume). Then after putting the volume back into service, these errors started occurring again.

I have noticed that turning off "performance.stat-prefetch" has brought about a great improvement.
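
A side note on the read-only condition discussed above: the following is a minimal sketch, not something taken from this thread, of one way to check whether any brick filesystem is currently mounted read-only. It assumes Linux, where /proc/mounts lists one mount per line as "device mountpoint fstype options dump pass". The function name ro_mounts is made up for illustration, and the /gluster prefix in the example simply mirrors the brick paths seen in the logs.

```shell
#!/bin/sh
# List mount points under PREFIX whose mount options include "ro".
# Usage: ro_mounts PREFIX [MOUNTS_FILE]
# MOUNTS_FILE defaults to /proc/mounts (assumption: a Linux host).
ro_mounts() {
    prefix=$1
    mounts_file=${2:-/proc/mounts}
    awk -v prefix="$prefix" '
        # $2 is the mount point, $4 the comma-separated option list.
        index($2, prefix) == 1 {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) {
                # Exact match only, so "errors=remount-ro" is not counted.
                if (opts[i] == "ro") { print $2; break }
            }
        }
    ' "$mounts_file"
}
```

For example, running "ro_mounts /gluster" on each server prints nothing when every filesystem mounted under /gluster is read-write, and prints the affected mount points otherwise.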

We continue to see some errors like this on one of the servers:

[2011-02-08 14:22:08.360799] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-1 returned -1 (Invalid argument)
[2011-02-08 14:22:08.836672] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-4 returned -1 (Invalid argument)
[2011-02-08 14:22:39.468388] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-0 returned -1 (Invalid argument)
[2011-02-08 14:22:39.468436] W [fuse-bridge.c:184:fuse_entry_cbk] glusterfs-fuse: 22465136: LOOKUP() /home/lev/.Xauthority => -1 (Invalid argument)
[2011-02-08 14:22:40.462910] I [dht-common.c:369:dht_revalidate_cbk] post-dht: subvolume post-replicate-5 returned -1 (Invalid argument)
[2011-02-08 14:22:40.462958] W [fuse-bridge.c:184:fuse_entry_cbk] glusterfs-fuse: 22466110: LOOKUP() /home/lev/.viminfo => -1 (Invalid argument)

And the user sees:

root@stanley:/net/post/lev# ls -al .viminfo .Xauthority
ls: cannot access .viminfo: Invalid argument
ls: cannot access .Xauthority: Invalid argument

But only from one client (which also happens to be the server giving the errors above). Another client (the other server) shows these same files without problem:

root@pablo:/net/post/lev# ls -al .viminfo .Xauthority
-rw------- 1 lev post 9400 2011-02-07 22:52 .viminfo
-rw------- 1 lev post 7401 2011-02-08 00:27 .Xauthority

Steve

> ----- Original Message -----
>> From: "Steve Wilson"<stevew at purdue.edu>
>> To: "Lakshmipathi"<lakshmipathi at gluster.com>
>> Cc: "Raghavendra G"<raghavendra at gluster.com>
>> Sent: Thursday, February 3, 2011 7:21:36 PM
>> Subject: Re: 3.1.2 with "No such file" and "Invalid argument" errors
>> Hi,
>>
>> Thanks for looking into this. Any ideas so far? Or anything you'd like
>> me to try?
>>
>> Here's some other perhaps relevant information:
>> * all bricks are formatted ext4 and mounted with the noatime option
>>   in addition to default options
>> * servers and clients are running Ubuntu 10.04
>> * I did try mounting the GlusterFS volume with direct-io-mode
>>   disabled but that didn't fix the problem
>>
>> Thanks!
>>
>> Steve
>>
>> On 02/01/2011 07:35 AM, Lakshmipathi wrote:
>>> Hi,
>>> Could you please send us client and server log files?
>>>
>>>
>> --
>> Steven M. Wilson, Systems and Network Manager
>> Markey Center for Structural Biology
>> Purdue University
>> (765) 496-1946

--
Steven M. Wilson, Systems and Network Manager
Markey Center for Structural Biology
Purdue University
(765) 496-1946