Re: VM going down

Niels de Vos <ndevos@xxxxxxxxxx> · Wed, 10 May 2017 15:41:34 +0200

On Wed, May 10, 2017 at 04:08:22PM +0530, Pranith Kumar Karampuri wrote:
> On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> 
> > ...
> > > > client from
> > > > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > > > (version: 3.8.11)
> > > > [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
> > > > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> > > > device or address]
> >
> > The SEEK procedure translates to lseek() in the posix xlator. This can
> > return with "No such device or address" (ENXIO) in only one case:
> >
> >     ENXIO    whence is SEEK_DATA or SEEK_HOLE, and the file offset is
> >              beyond the end of the file.
> >
> > This means that an lseek() was executed where the requested offset was
> > beyond the end of the file. I'm not sure how that could happen...
> > Sharding prevents using SEEK at all atm.
> >
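A minimal sketch of that ENXIO case outside of Gluster. It assumes a platform that provides SEEK_DATA (e.g. Linux). The helper seek_data() and the file name are illustrative only. SEEK_DATA succeeds for an offset inside the file and fails with ENXIO once the offset is past the end of the file:

```python
import errno
import os
import tempfile


def seek_data(path: str, offset: int):
    """Run lseek(fd, offset, SEEK_DATA) on `path`.

    Returns the resulting offset, or the errno name (e.g. "ENXIO") if the
    call fails. Requires a platform that defines os.SEEK_DATA (e.g. Linux).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "image.raw")  # hypothetical file name
    with open(path, "wb") as f:
        f.write(b"x" * 4096)  # a 4 KiB file containing only data

    inside = seek_data(path, 0)     # offset within the file
    beyond = seek_data(path, 8192)  # offset past EOF, like the failing SEEK

print(inside, beyond)  # 0 ENXIO
```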
> > ...
> > > > The strange part is that I cannot seem to find any other error.
> > > > If I restart the VM everything works as expected (it stopped at ~9:51
> > > > UTC and was started at ~10:01 UTC).
> > > >
> > > > This is not the first time that this happened, and I do not see any
> > > > problems with networking or the hosts.
> > > >
> > > > Gluster version is 3.8.11.
> > > > This is the affected volume (though it happened on a different one too):
> > > >
> > > > Volume Name: datastore2
> > > > Type: Replicate
> > > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > > Status: Started
> > > > Snapshot Count: 0
> > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > Transport-type: tcp
> > > > Bricks:
> > > > Brick1: srvpve2g:/data/brick2/brick
> > > > Brick2: srvpve3g:/data/brick2/brick
> > > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > > Options Reconfigured:
> > > > nfs.disable: on
> > > > performance.readdir-ahead: on
> > > > transport.address-family: inet
> > > >
> > > > Any hint on how to dig more deeply into the reason would be greatly
> > > > appreciated.
> >
> > Probably the problem is with SEEK support in the arbiter functionality.
> > Just like with a READ or a WRITE on the arbiter brick, SEEK can only
> > succeed on bricks where the files with content are located. It does not
> > look like arbiter handles SEEK, so the offset in lseek() will likely be
> > higher than the size of the file on the brick (empty, 0 size file). I
> > don't know how the replication xlator responds on an error return from
> > SEEK on one of the bricks, but I doubt it likes it.
> >
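To illustrate the failure mode suspected above, here is a scaled-down model of one replica set. It has two data-brick copies of a file and a zero-size arbiter copy. The brick names, sizes and helper are hypothetical, and Linux is assumed. Whether the arbiter actually receives SEEK is exactly what is in question. The model only shows that the same SEEK_DATA request succeeds on a full copy and returns ENXIO on an empty one:

```python
import errno
import os
import tempfile


def seek_data_or_errno(path: str, offset: int):
    """lseek(fd, offset, SEEK_DATA) on `path`; return the offset or errno name."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(fd)


# Hypothetical stand-ins for one replica set, scaled down from the ~40 GB
# image in the log: two data bricks hold the full content, while the
# arbiter brick only holds a zero-size copy of the file.
layout = [("brick1", 8192), ("brick2", 8192), ("arbiter", 0)]

with tempfile.TemporaryDirectory() as tmpdir:
    results = {}
    for name, size in layout:
        path = os.path.join(tmpdir, name)
        with open(path, "wb") as f:
            f.write(b"x" * size)
        # Send the same SEEK_DATA request (offset 4096) to every copy.
        results[name] = seek_data_or_errno(path, 4096)

print(results)  # {'brick1': 4096, 'brick2': 4096, 'arbiter': 'ENXIO'}
```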
> 
> inode-read fops don't get sent to the arbiter brick, so this won't happen.

Yes, I see that the arbiter xlator returns on reads without going to the
bricks. Should that not be done for seek as well? It's the first time I
actually looked at the code of the arbiter xlator, so I might well be
misunderstanding how it works :)
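Below is a toy model of the routing being discussed. It assumes, as Pranith's reply states for inode-read fops, that read-type operations are answered only by bricks that hold file content. It also assumes SEEK would be classified the same way. This is not the AFR or arbiter xlator code. The Brick class, the READ_FOPS set and targets_for() are hypothetical names for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Brick:
    name: str
    is_arbiter: bool = False


# Hypothetical set of read-type fops; in this model SEEK is grouped with them.
READ_FOPS = {"readv", "stat", "seek"}


def targets_for(fop: str, bricks: list[Brick]) -> list[str]:
    """Return the brick names a toy replicate layer would send `fop` to."""
    if fop in READ_FOPS:
        # Read-type fops need file content, so the arbiter (which only keeps
        # zero-size files) is never chosen. For simplicity this toy takes the
        # first data brick instead of doing real read-subvolume selection.
        data_bricks = [b for b in bricks if not b.is_arbiter]
        return [data_bricks[0].name]
    # Everything else (e.g. writev) is sent to all bricks in this model.
    return [b.name for b in bricks]


bricks = [
    Brick("srvpve2g:/data/brick2/brick"),
    Brick("srvpve3g:/data/brick2/brick"),
    Brick("srvpve1g:/data/brick2/brick", is_arbiter=True),
]
print(targets_for("seek", bricks))    # ['srvpve2g:/data/brick2/brick']
print(targets_for("writev", bricks))  # all three bricks
```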

Thanks,
Niels

> 
> 
> >
> > We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> > SEEK for sharding. I suggest you open a bug for getting SEEK in the
> > arbiter xlator as well.
> >
> > HTH,
> > Niels
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> 
> 
> 
> -- 
> Pranith