3.1.2 with "No such file" and "Invalid argument" errors

On 02/08/2011 02:32 PM, Steve Wilson wrote:
> On 02/07/2011 11:49 PM, Raghavendra G wrote:
>> Hi Steve,
>>
>> Are the back-end file systems working correctly? I am seeing lots of 
>> errors in the server log files when accessing the back-end filesystem.
>>
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.353445] E [posix.c:2193:posix_open] post-posix: open on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.cmd: Read-only file system
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.353857] E [posix.c:678:posix_setattr] post-posix: setattr (utimes) on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.cmd failed: Read-only file system
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.354827] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.357396] E [posix.c:2193:posix_open] post-posix: open on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.ps: Read-only file system
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.357794] E [posix.c:678:posix_setattr] post-posix: setattr (utimes) on /gluster/01/brick/home/lev/deltah/aadimers/serd/converge/0..75000/serd_phi-psi_hist.4deg.0..75000_map.ps failed: Read-only file system
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.358865] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.359264] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.359548] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f28e50dc1c8: Input/output error
>> gluster-01-brick.log.1:[2011-01-26 03:43:07.367163] E [posix.c:2318:posix_readv] post-posix: read failed on fd=0x7f
>>
>> I am also seeing other errors which indicate that the backend is a 
>> read-only filesystem. Because of this, distribute and replicate are 
>> not able to store their metadata (using xattrs), which in turn results 
>> in lots of split-brains and layout NULL errors. Can you please check 
>> the backend file system?
>>
>> regards,
>
>
> Yes, the filesystem was read-only for a time when a disk failed.  We 
> then rebuilt the brick on that disk from the corresponding brick in 
> the second server (with the volume stopped, of course) using:
>     rsync -aXv brick/ stanley:/gluster/06/brick/
>
> Following some instructions we found on the mailing list we then:
>     1)  deleted the volume
>     2)  ran "find /gluster -exec setfattr -x trusted.gfid \{\} \;" on 
> the bricks
>     3)  created the volume again
>     4)  mounted the volume
>     5)  ran "find . -print0 | xargs --null stat > /dev/null" on the 
> mounted volume
>
> This returned us to what seemed to be a stable state (i.e., no errors 
> from running "ls -alR" from the top of the volume).  Then after 
> putting the volume back into service, these errors started occurring 
> again.  I have noticed that turning off "performance.stat-prefetch" 
> has brought about a great improvement.  We continue to see some errors 
> like this on one of the servers:
>
>    [2011-02-08 14:22:08.360799] I [dht-common.c:369:dht_revalidate_cbk]
>    post-dht: subvolume post-replicate-1 returned -1 (Invalid argument)
>    [2011-02-08 14:22:08.836672] I [dht-common.c:369:dht_revalidate_cbk]
>    post-dht: subvolume post-replicate-4 returned -1 (Invalid argument)
>    [2011-02-08 14:22:39.468388] I [dht-common.c:369:dht_revalidate_cbk]
>    post-dht: subvolume post-replicate-0 returned -1 (Invalid argument)
>    [2011-02-08 14:22:39.468436] W [fuse-bridge.c:184:fuse_entry_cbk]
>    glusterfs-fuse: 22465136: LOOKUP() /home/lev/.Xauthority => -1
>    (Invalid argument)
>    [2011-02-08 14:22:40.462910] I [dht-common.c:369:dht_revalidate_cbk]
>    post-dht: subvolume post-replicate-5 returned -1 (Invalid argument)
>    [2011-02-08 14:22:40.462958] W [fuse-bridge.c:184:fuse_entry_cbk]
>    glusterfs-fuse: 22466110: LOOKUP() /home/lev/.viminfo => -1 (Invalid
>    argument)
>
> And the user sees:
>
>    root@stanley:/net/post/lev# ls -al .viminfo .Xauthority
>    ls: cannot access .viminfo: Invalid argument
>    ls: cannot access .Xauthority: Invalid argument
>
> But only from one client (which also happens to be the server giving 
> the errors above).  Another client (the other server) shows these same 
> files without problem:
>
>    root@pablo:/net/post/lev# ls -al .viminfo .Xauthority
>    -rw------- 1 lev post 9400 2011-02-07 22:52 .viminfo
>    -rw------- 1 lev post 7401 2011-02-08 00:27 .Xauthority
>
>
> Steve
>
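
The read-only back end that Raghavendra asks about can be confirmed 
directly on each server before any rebuild is attempted. A minimal 
sketch, assuming the bricks are mounted under /gluster (as in the paths 
above) and a Linux-style /proc/mounts table; the function name ro_mounts 
is made up for illustration:

```shell
#!/bin/sh
# List mount points under a prefix that are currently mounted read-only.
# $1: mount-point prefix (e.g. /gluster)
# $2: mounts table to read (defaults to /proc/mounts)
ro_mounts() {
    prefix=$1
    table=${2:-/proc/mounts}
    awk -v p="$prefix" '
        index($2, p) == 1 {
            # Field 4 holds the comma-separated mount options.
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] == "ro") { print $2; break }
        }
    ' "$table"
}

# Only query the live table where it exists.
if [ -r /proc/mounts ]; then
    ro_mounts /gluster
fi
```

Empty output means no mount under the prefix carries the "ro" option 
at the moment; it says nothing about whether one was read-only earlier.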

Just a quick update on this...  We decided to reformat the bricks on 
both replicated servers and rebuild the volume.  That was two days ago 
and so far we've not seen these problems again.  It seems like a pretty 
drastic measure, though, to have to recover in this manner.  I'm 
continuing to run with both performance.stat-prefetch and 
performance.write-behind turned off.

Steve
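
For reference, both options are per-volume settings changed with 
"gluster volume set". A sketch that only prints the commands so they 
can be reviewed first; the volume name "post" is an assumption inferred 
from the "post-dht" and "post-posix" prefixes in the logs, and 
print_disable_cmds is a hypothetical helper:

```shell
#!/bin/sh
# Print (do not run) the commands that switch both options off.
# $1: volume name; "post" below is assumed, not confirmed in the thread.
print_disable_cmds() {
    for opt in performance.stat-prefetch performance.write-behind; do
        printf 'gluster volume set %s %s off\n' "$1" "$opt"
    done
}

print_disable_cmds post
```

Piping the output to "sh" on one of the servers would apply the 
settings once the volume name has been checked.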

>> ----- Original Message -----
>>> From: "Steve Wilson"<stevew at purdue.edu>
>>> To: "Lakshmipathi"<lakshmipathi at gluster.com>
>>> Cc: "Raghavendra G"<raghavendra at gluster.com>
>>> Sent: Thursday, February 3, 2011 7:21:36 PM
>>> Subject: Re: 3.1.2 with "No such file" and "Invalid argument" errors
>>> Hi,
>>>
>>> Thanks for looking into this. Any ideas so far? Or anything you'd like
>>> me to try?
>>>
>>> Here's some other perhaps relevant information:
>>> * all bricks are formatted ext4 and mounted with the noatime option
>>> in addition to default options
>>> * servers and clients are running Ubuntu 10.04
>>> * I did try mounting the GlusterFS volume with direct-io-mode
>>> disabled but that didn't fix the problem
>>>
>>> Thanks!
>>>
>>> Steve
>>>
>>> On 02/01/2011 07:35 AM, Lakshmipathi wrote:
>>>> Hi,
>>>> Could you please send us the client and server log files?
>>>>
>>>>
>>> -- 
>>> Steven M. Wilson, Systems and Network Manager
>>> Markey Center for Structural Biology
>>> Purdue University
>>> (765) 496-1946
>
>

-- 
Steven M. Wilson, Systems and Network Manager
Markey Center for Structural Biology
Purdue University
(765) 496-1946


