Re: VM going down

Krutika Dhananjay <kdhananj@xxxxxxxxxx> · Thu, 11 May 2017 12:35:42 +0530

Niels,

Allesandro's configuration does not have sharding enabled, so this definitely has nothing to do with shard not supporting the seek fop.

Copy-pasting volume-info output from the first mail:

Volume Name: datastore2

Type: Replicate

Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea

Status: Started

Snapshot Count: 0

Number of Bricks: 1 x (2 + 1) = 3

Transport-type: tcp

Bricks:

Brick1: srvpve2g:/data/brick2/brick

Brick2: srvpve3g:/data/brick2/brick

Brick3: srvpve1g:/data/brick2/brick (arbiter)

Options Reconfigured:

nfs.disable: on

performance.readdir-ahead: on

transport.address-family: inet
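
For anyone who wants to check this mechanically: sharding is off by default, so it would only be active if `features.shard: on` showed up under "Options Reconfigured". Below is a minimal Python sketch, assuming the volume-info text has been captured into a string (abbreviated here). The helper name `reconfigured_options` is made up for illustration and is not part of any Gluster tool.

```python
# Abbreviated copy of the volume-info output pasted above.
VOLUME_INFO = """\
Volume Name: datastore2
Type: Replicate
Status: Started
Number of Bricks: 1 x (2 + 1) = 3
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
"""


def reconfigured_options(volume_info: str) -> dict:
    """Return the key/value pairs listed after 'Options Reconfigured:'."""
    options = {}
    in_options = False
    for line in volume_info.splitlines():
        line = line.strip()
        if line == "Options Reconfigured:":
            in_options = True
            continue
        if in_options and ": " in line:
            key, value = line.split(": ", 1)
            options[key] = value
    return options


opts = reconfigured_options(VOLUME_INFO)
# An option that is absent from the list is still at its default;
# for features.shard the default is "off".
shard_enabled = opts.get("features.shard", "off") == "on"
print("sharding enabled:", shard_enabled)
```

On a live system the same text would come from `gluster volume info datastore2`.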

-Krutika

On Tue, May 9, 2017 at 7:40 PM, Niels de Vos <ndevos@xxxxxxxxxx> wrote:
...

> > client from

> > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0

> > (version: 3.8.11)

> > [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]

> > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such

> > device or address]

The SEEK procedure translates to lseek() in the posix xlator. This can

return with "No such device or address" (ENXIO) in only one case:

    ENXIO    whence is SEEK_DATA or SEEK_HOLE, and the file offset is

             beyond the end of the file.

This means that an lseek() was executed where the current offset of the

file descriptor was higher than the size of the file. I'm not sure how

that could happen... Sharding prevents using SEEK at all atm.
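
The ENXIO case quoted from the man page is easy to reproduce outside Gluster. The following is a rough sketch, assuming Linux and a Python build that exposes `os.SEEK_DATA`. The helper `seek_data` and the file sizes are made up for illustration and have nothing to do with the posix xlator code.

```python
import errno
import os
import tempfile


def seek_data(fd: int, offset: int):
    """Return the offset of the next data at or after `offset`,
    or the symbolic errno name if lseek() fails."""
    try:
        return os.lseek(fd, offset, os.SEEK_DATA)
    except OSError as exc:
        return errno.errorcode[exc.errno]


with tempfile.TemporaryFile() as f:
    f.write(b"x" * 4096)   # a 4 KiB file that contains data
    f.flush()
    fd = f.fileno()
    inside = seek_data(fd, 0)      # offset within the file
    beyond = seek_data(fd, 8192)   # offset past the end of the file
    print("inside:", inside, "beyond:", beyond)
```

The second call fails with ENXIO, which is the same errno that posix_seek logged as "No such device or address".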

...

> > The strange part is that I cannot seem to find any other error.

> > If I restart the VM everything works as expected (it stopped at ~9.51

> > UTC and was started at ~10.01 UTC) .

> >

> > This is not the first time that this happened, and I do not see any

> > problems with networking or the hosts.

> >

> > Gluster version is 3.8.11

> > this is the affected volume (though it happened on a different one too)

> >

> > Volume Name: datastore2

> > Type: Replicate

> > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea

> > Status: Started

> > Snapshot Count: 0

> > Number of Bricks: 1 x (2 + 1) = 3

> > Transport-type: tcp

> > Bricks:

> > Brick1: srvpve2g:/data/brick2/brick

> > Brick2: srvpve3g:/data/brick2/brick

> > Brick3: srvpve1g:/data/brick2/brick (arbiter)

> > Options Reconfigured:

> > nfs.disable: on

> > performance.readdir-ahead: on

> > transport.address-family: inet

> >

> > Any hint on how to dig more deeply into the reason would be greatly

> > appreciated.

Probably the problem is with SEEK support in the arbiter functionality.

Just like with a READ or a WRITE on the arbiter brick, SEEK can only

succeed on bricks where the files with content are located. It does not

look like arbiter handles SEEK, so the offset in lseek() will likely be

higher than the size of the file on the brick (empty, 0 size file). I

don't know how the replication xlator responds to an error returned from

SEEK on one of the bricks, but I doubt it likes it.
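
To illustrate the suspected mechanism, here is a local simulation with two ordinary temporary files standing in for the copies on a data brick and on the arbiter brick. It is only a sketch under those assumptions (Linux, `os.SEEK_DATA` available). The file names, sizes and the helper `try_seek_data` are hypothetical, and no Gluster code is involved.

```python
import errno
import os
import tempfile


def try_seek_data(path: str, offset: int) -> str:
    """Run lseek(fd, offset, SEEK_DATA) on `path`; return 'ok' or the errno name."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.lseek(fd, offset, os.SEEK_DATA)
        return "ok"
    except OSError as exc:
        return errno.errorcode[exc.errno]
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    data_copy = os.path.join(tmp, "data_brick_file")
    arbiter_copy = os.path.join(tmp, "arbiter_brick_file")

    # The data brick holds the real content (1 MiB here).
    with open(data_copy, "wb") as f:
        f.write(b"\x01" * (1024 * 1024))
    # The arbiter keeps only a 0-byte file (metadata, no content).
    open(arbiter_copy, "wb").close()

    offset = 512 * 1024  # same SEEK offset sent to both copies
    results = {
        "data": try_seek_data(data_copy, offset),
        "arbiter": try_seek_data(arbiter_copy, offset),
    }
    print(results)
```

Any offset, including 0, is at or beyond the end of the empty file, so the SEEK on the arbiter copy fails with ENXIO while the same request succeeds on the data copy.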

We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support

SEEK for sharding. I suggest you open a bug for getting SEEK in the

arbiter xlator as well.

HTH,

Niels
