Re: VM going down

Niels de Vos <ndevos@xxxxxxxxxx> · Fri, 12 May 2017 11:13:51 +0200

On Thu, May 11, 2017 at 07:40:15PM +0530, Pranith Kumar Karampuri wrote:
> On Thu, May 11, 2017 at 5:39 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> 
> > On Thu, May 11, 2017 at 12:35:42PM +0530, Krutika Dhananjay wrote:
> > > Niels,
> > >
> > > Allesandro's configuration does not have shard enabled. So it has
> > > definitely not got anything to do with shard not supporting seek fop.
> >
> > Yes, but in case sharding would have been enabled, the seek FOP would be
> > handled correctly (detected as not supported at all).
> >
> > I'm still not sure how arbiter prevents doing shards though. We normally
> > advise to use sharding *and* (optional) arbiter for VM workloads,
> > arbiter without sharding has not been tested much. In addition, the seek
> > functionality is only available in recent kernels, so there has been
> > little testing on CentOS or similar enterprise Linux distributions.
> >
> 
> That is not true. Both are independent. There are quite a few questions we
> answered in the past ~1 year on gluster-users which don't use
> sharding+arbiter but plain old 2+1 configuration.

Yes, of course. But that does not take away the *advice* to use
sharding (+arbiter as option) for VM workloads. I am not aware of
regular testing that may use seek on 2+1 configurations. The oVirt team,
which runs regular tests, has sharding enabled, I think?

Seek is only usable in two cases (for VMs):
 1. QEMU + libgfapi integration
 2. FUSE on a recent kernel (not available in CentOS/RHEL/... yet)

Given this, there are not many deployments that could run into problems
with seek, simply because the functionality is not (by default) available
in enterprise distributions.

It is quite possible that there is a bug in the seek implementation, and
because of our default testing (with shard) on enterprise distributions
(no seek for FUSE), we may not have hit it yet.

From the information in this (sub)thread, I cannot see which client
(QEMU+gfapi or FUSE) is used, or which brick returns the seek error. As
a workaround I recommend enabling sharding; I would not expect to see
any more problems with seek after that (because sharding blocks those
requests).
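
To make the ENXIO case from my earlier mail (quoted below) concrete, here
is a minimal sketch that reproduces it on a local filesystem, without
Gluster involved. It assumes Linux with SEEK_DATA support; the file names
are hypothetical and only mimic a data brick (file with content) and an
arbiter brick (empty, 0 size file). It does not show how the replication
xlator reacts to the error.

```python
import errno
import os
import tempfile


def seek_data(path, offset):
    """Return the next data offset at or after `offset`,
    or the errno name (e.g. "ENXIO") if lseek() fails."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


def demo():
    """Compare SEEK_DATA on a file with content and on an empty file."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-ins, not real brick paths.
        data_file = os.path.join(tmp, "data-brick-copy")
        empty_file = os.path.join(tmp, "arbiter-brick-copy")
        with open(data_file, "wb") as f:
            f.write(b"x" * 4096)
        open(empty_file, "wb").close()

        return {
            "data, offset inside file": seek_data(data_file, 0),
            "data, offset past EOF": seek_data(data_file, 8192),
            "empty, offset 0": seek_data(empty_file, 0),
        }


if __name__ == "__main__":
    for case, result in demo().items():
        print(f"{case}: {result}")
```

Per lseek(2), the two cases where the offset is at or beyond the end of
the file should report ENXIO, which is the same error the brick logged.
Running the same check against the file on each brick (rather than the
temporary files used here) would be one way to see which brick can
return it.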

Niels

> 
> >
> >
> > HTH,
> > Niels
> >
> >
> > > Copy-pasting volume-info output from the first mail:
> > >
> > > Volume Name: datastore2
> > > Type: Replicate
> > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x (2 + 1) = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: srvpve2g:/data/brick2/brick
> > > Brick2: srvpve3g:/data/brick2/brick
> > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > Options Reconfigured:
> > > nfs.disable: on
> > > performance.readdir-ahead: on
> > > transport.address-family: inet
> > >
> > >
> > > -Krutika
> > >
> > >
> > > On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
> > >
> > > > ...
> > > > > > client from
> > > > > > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > > > > > (version: 3.8.11)
> > > > > > [2017-05-08 10:01:06.237433] E [MSGID: 113107]
> > > > [posix.c:1079:posix_seek]
> > > > > > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No
> > such
> > > > > > device or address]
> > > >
> > > > The SEEK procedure translates to lseek() in the posix xlator. This can
> > > > > return with "No such device or address" (ENXIO) in only one case:
> > > >
> > > >     ENXIO    whence is SEEK_DATA or SEEK_HOLE, and the file offset is
> > > >              beyond the end of the file.
> > > >
> > > > This means that an lseek() was executed where the current offset of the
> > > > filedescriptor was higher than the size of the file. I'm not sure how
> > > > that could happen... Sharding prevents using SEEK at all atm.
> > > >
> > > > ...
> > > > > > The strange part is that I cannot seem to find any other error.
> > > > > > > If I restart the VM everything works as expected (it stopped at ~9.51
> > > > > > > UTC and was started at ~10.01 UTC).
> > > > > >
> > > > > > This is not the first time that this happened, and I do not see any
> > > > > > problems with networking or the hosts.
> > > > > >
> > > > > > Gluster version is 3.8.11
> > > > > > > this is the incriminated volume (though it happened on a different one
> > > > > > > too)
> > > > > >
> > > > > > Volume Name: datastore2
> > > > > > Type: Replicate
> > > > > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > > > > Status: Started
> > > > > > Snapshot Count: 0
> > > > > > Number of Bricks: 1 x (2 + 1) = 3
> > > > > > Transport-type: tcp
> > > > > > Bricks:
> > > > > > Brick1: srvpve2g:/data/brick2/brick
> > > > > > Brick2: srvpve3g:/data/brick2/brick
> > > > > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > > > > Options Reconfigured:
> > > > > > nfs.disable: on
> > > > > > performance.readdir-ahead: on
> > > > > > transport.address-family: inet
> > > > > >
> > > > > > Any hint on how to dig more deeply into the reason would be greatly
> > > > > > appreciated.
> > > >
> > > > Probably the problem is with SEEK support in the arbiter functionality.
> > > > Just like with a READ or a WRITE on the arbiter brick, SEEK can only
> > > > succeed on bricks where the files with content are located. It does not
> > > > look like arbiter handles SEEK, so the offset in lseek() will likely be
> > > > higher than the size of the file on the brick (empty, 0 size file). I
> > > > don't know how the replication xlator responds on an error return from
> > > > SEEK on one of the bricks, but I doubt it likes it.
> > > >
> > > > We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> > > > SEEK for sharding. I suggest you open a bug for getting SEEK in the
> > > > arbiter xlator as well.
> > > >
> > > > HTH,
> > > > Niels
> > > >
> >
> > _______________________________________________
> > Gluster-users mailing list
> > Gluster-users@xxxxxxxxxxx
> > http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> 
> 
> 
> -- 
> Pranith