Re: gluster warning remote operation failed during recovery from backups

Yes, same message on gluster03's brick log:
[2016-06-16 10:07:55.619621] E [MSGID: 115059] [server-rpc-fops.c:811:server_getxattr_cbk] 0-storage-server: 23783173: GETXATTR /data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc (0e98a94b-7b86-4a72-88a9-a99a787e059d) ((null)) ==> (Numerical result out of range) [Numerical result out of range]

Nothing relevant is logged in etc-glusterfs-glusterd.vol.log.

Also, it seems hard to believe that TSM would send back a larger value than it received during the initial backup from gluster storage, i.e. file on gluster fuse mount -> TSM backup ... TSM restore -> file restored to gluster fuse mount.

I can't actually get the xattrs from the file, because the file doesn't exist after TSM errors out. My guess is that TSM restores the file, then tries to verify the xattrs and removes the file on failure. But I suppose that if there were some corruption on the TSM side, it might be trying to send garbage too large to store in the xattr (if I'm understanding the issue correctly).

If I restore the single file above after the failure, I don't get any errors, which is why I started to suspect gluster as the culprit.

I'm capturing the strace output now; hopefully it shows something useful.

Thanks

On Thu, Jun 16, 2016 at 7:31 PM, Vijay Bellur <vbellur@xxxxxxxxxx> wrote:
On Thu, Jun 16, 2016 at 3:05 PM, Steve Dainard <sdainard@xxxxxxxx> wrote:
> I'm restoring some data to gluster from TSM backups and the client errors
> out trying to retrieve xattrs at some point during the restore, killing
> progress:
> ...
> Restoring       8,118,878
> /storage/data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_04.asc
> [Done]
> ANS1587W Unable to read extended attributes for object
> /storage/data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc
> due to errno: 34, reason: Numerical result out of range
>  ** Unsuccessful **
> ...
>
> In the gluster fuse logs for the volume I see this:
> [2016-06-16 10:07:55.622020] W [MSGID: 114031]
> [client-rpc-fops.c:1161:client3_3_getxattr_cbk] 0-storage-client-2: remote
> operation failed. Path:
> /data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc
> (0e98a94b-7b86-4a72-88a9-a99a787e059d). Key: (null) [Numerical result out of
> range]
> [2016-06-16 10:07:55.622110] W [fuse-bridge.c:3353:fuse_xattr_cbk]
> 0-glusterfs-fuse: 76197165: GETXATTR((null))
> /data/climate/ANUSPLIN/ANUSPLIN300/monthly/pcp_grids/1918/pcp300_08.asc =>
> -1 (Numerical result out of range)
>
> I'm trying to understand if gluster is bubbling up errors to the TSM client
> (gluster fault), or reporting errors the TSM client is generating (TSM
> fault).
>

Do you happen to see the same error reported by the posix translator(s)
in any of the brick logs? Checking that might help in figuring out where
the problem is stemming from.

As per the getxattr(2) man page, ERANGE is returned when the value
buffer is too small to hold the result. Would it be possible to strace
the TSM client and see the size of the value buffer being passed?
Also, doing an extended attribute dump of the file on the brick
directory (either through attr or getfattr) can help in determining
the size necessary to hold all attributes.
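
For anyone who prefers a script to attr or getfattr, the following is a minimal sketch of the same check, assuming Linux and Python 3. It reports how many bytes each extended attribute value occupies and how large a buffer the attribute name list would need. The function names xattr_sizes and list_buffer_size are made up for this example and are not part of gluster or TSM.

```python
import os
import sys


def xattr_sizes(path):
    """Map each extended attribute name on path to the size of its value in bytes."""
    sizes = {}
    # follow_symlinks=False inspects the path itself, like lgetxattr(2).
    for name in os.listxattr(path, follow_symlinks=False):
        sizes[name] = len(os.getxattr(path, name, follow_symlinks=False))
    return sizes


def list_buffer_size(sizes):
    """Bytes a listxattr(2) caller needs: every name plus its NUL terminator."""
    return sum(len(os.fsencode(name)) + 1 for name in sizes)


if __name__ == "__main__":
    for path in sys.argv[1:]:
        try:
            sizes = xattr_sizes(path)
        except OSError as exc:
            print(f"{path}: {exc}", file=sys.stderr)
            continue
        for name, size in sorted(sizes.items()):
            print(f"{path}: {name} = {size} bytes")
        largest = max(sizes.values(), default=0)
        print(f"{path}: name list needs {list_buffer_size(sizes)} bytes, "
              f"largest value needs {largest} bytes")
```

Saved under a name of your choosing and run as root against the file's path inside the brick directory, it should also show the trusted.* attributes, which unprivileged processes cannot see. Comparing the reported sizes with the buffer size visible in the strace output would indicate whether the client's buffer is simply too small.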

HTH,
Vijay
