Re: Fwd: vm paused unknown storage error one node out of 3 only


 



1. Could you share the output of `gluster volume heal <VOL> info`?
2. The output of `gluster volume info`
3. FUSE mount logs of the affected volume(s)
4. glustershd (self-heal daemon) logs
5. Brick logs
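For items 3 to 5, here is a minimal Python sketch of where these logs usually live. It assumes the default `/var/log/glusterfs` directory (not redirected by the admin) and the usual naming, where the mount point or brick path has its leading `/` dropped and the remaining `/` characters turned into `-`. The mount and brick paths in the usage are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: default Gluster log directory, not redirected.
LOG_DIR = PurePosixPath("/var/log/glusterfs")


def _log_stem(path: str) -> str:
    """Drop leading/trailing '/' and replace the remaining '/' with '-'."""
    return path.strip("/").replace("/", "-")


def fuse_mount_log(mount_point: str) -> PurePosixPath:
    """Client (FUSE mount) log on the hypervisor, named after the mount point."""
    return LOG_DIR / f"{_log_stem(mount_point)}.log"


def brick_log(brick_path: str) -> PurePosixPath:
    """Brick log on a storage node, named after the brick directory."""
    return LOG_DIR / "bricks" / f"{_log_stem(brick_path)}.log"


def shd_log() -> PurePosixPath:
    """Self-heal daemon (glustershd) log on a storage node."""
    return LOG_DIR / "glustershd.log"


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    print(fuse_mount_log("/mnt/gluster1"))  # /var/log/glusterfs/mnt-gluster1.log
    print(brick_log("/gluster/brick1"))     # /var/log/glusterfs/bricks/gluster-brick1.log
    print(shd_log())                        # /var/log/glusterfs/glustershd.log
```

If a file is not where the sketch expects, listing the log directory on that host is the quickest check.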

-Krutika


On Sat, Aug 13, 2016 at 3:10 AM, David Gossage <dgossage@xxxxxxxxxxxxxxxxxx> wrote:
On Fri, Aug 12, 2016 at 4:25 PM, Dan Lavu <dan@xxxxxxxxxx> wrote:
David,

I'm seeing similar behavior in my lab. In my case it has been caused by files healing in the Gluster cluster, though I attribute the underlying problems to the storage fabric. Check whether `gluster volume heal $VOL info` lists files being healed; once that number drops, can the VM start?
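To see whether the heal backlog is shrinking, one could run the heal-info command twice a few minutes apart and compare the totals. A rough Python sketch, assuming the per-brick `Brick ...` / `Number of entries: N` layout printed by 3.7-era releases; the exact output differs between versions, so the parser is an assumption, and the sample text is hypothetical.

```python
from __future__ import annotations

import re

ENTRIES_RE = re.compile(r"Number of entries:\s*(\d+)")


def heal_counts(output: str) -> dict[str, int]:
    """Map each brick in `gluster volume heal <VOL> info` output to its
    pending-heal entry count. Bricks whose count is not a number (for
    example a disconnected brick reporting '-') are left out."""
    counts: dict[str, int] = {}
    brick = None
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("Brick "):
            brick = line[len("Brick "):]
        elif brick is not None:
            m = ENTRIES_RE.match(line)
            if m:
                counts[brick] = int(m.group(1))
    return counts


if __name__ == "__main__":
    # Hypothetical sample output for two bricks of a replica volume.
    sample = """Brick node1:/gluster/brick1
Status: Connected
Number of entries: 2

Brick node2:/gluster/brick1
Status: Connected
Number of entries: 0
"""
    counts = heal_counts(sample)
    print(counts, "total:", sum(counts.values()))
```

A total that drops between runs suggests healing is progressing; a total that stays at zero points away from pending heals as the cause.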


I haven't had any files pending heal according to any of the 3 storage nodes.

A moment ago I shut down one VM that has been around a while, then told it to start on the one oVirt server that complained previously. It ran fine, and I was able to migrate it off and back onto that host with no issues.

I told one of the new VMs to migrate to that node, and within seconds it paused with an unknown storage error. No shards are showing heals, and nothing shows an error on the storage nodes. Same stale file handle issues.

I'll probably put this node in maintenance later and reboot it. Other than that, I may re-clone those 2 recent VMs. Maybe the images just got corrupted, though I'm not sure why a bad image would fail on only one node of 3.


Dan

On Thu, Aug 11, 2016 at 7:52 AM, David Gossage <dgossage@xxxxxxxxxxxxxxxxxx> wrote:
Figured I would repost here as well. One client out of 3 is complaining of stale file handles on a few new VMs I migrated over. There are no errors on the storage nodes, just on the client. Maybe I should just put that one in maintenance and restart the gluster mount?

David Gossage
Carousel Checks Inc. | System Administrator
Office 708.613.2284

---------- Forwarded message ----------
From: David Gossage <dgossage@xxxxxxxxxxxxxxxxxx>
Date: Thu, Aug 11, 2016 at 12:17 AM
Subject: vm paused unknown storage error one node out of 3 only
To: users <users@xxxxxxxxx>


In a 3-node cluster running oVirt 3.6.6.2-1.el7.centos with a replica 3 Gluster 3.7.14 volume, a VM I just copied in gets the following errors when started on one of the 3 nodes. On the other 2 nodes the VM starts fine. All oVirt and Gluster hosts are CentOS 7 based. When the VM starts on the node oVirt picks for it on its own, it is immediately paused for an unknown reason; telling it to start on a different node works. The node with the issue already runs 5 VMs fine from the same Gluster storage, plus the hosted engine on a different volume.

The Gluster storage nodes' logs did not have any errors for the volume.
The affected node's own Gluster log had this:

dfb8777a-7e8c-40ff-8faa-252beabba5f8 couldn't be found in .glusterfs, .shard, or images/

7919f4a0-125c-4b11-b5c9-fb50cc195c43 is the gfid of the VM's bootable drive.
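The lookups above can be repeated on each brick by building the backend paths directly. Each brick keeps a gfid entry (a hard link, for regular files) under `.glusterfs/<first 2 hex chars>/<next 2 hex chars>/<gfid>`, and with sharding enabled the extra shards of a file are stored as `.shard/<base-file gfid>.<block number>`. A small Python sketch that only builds the paths; the brick path is hypothetical.

```python
import uuid
from pathlib import PurePosixPath


def gfid_backend_path(brick: str, gfid: str) -> PurePosixPath:
    """Path of the .glusterfs entry for `gfid` on a brick."""
    g = str(uuid.UUID(gfid))  # validates, normalises to lowercase hyphenated form
    return PurePosixPath(brick) / ".glusterfs" / g[0:2] / g[2:4] / g


def shard_path(brick: str, base_gfid: str, block: int) -> PurePosixPath:
    """Path of shard number `block` of the file whose gfid is `base_gfid`.
    Block 0 is the base file itself, so shard files start at .1."""
    if block < 1:
        raise ValueError("shard files are numbered from 1")
    g = str(uuid.UUID(base_gfid))
    return PurePosixPath(brick) / ".shard" / f"{g}.{block}"


if __name__ == "__main__":
    brick = "/gluster/brick1"  # hypothetical brick path
    print(gfid_backend_path(brick, "dfb8777a-7e8c-40ff-8faa-252beabba5f8"))
    print(shard_path(brick, "7919f4a0-125c-4b11-b5c9-fb50cc195c43", 1))
```

Checking the printed paths with `ls -l` on all three bricks shows whether an entry is missing everywhere or on only some replicas.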

[2016-08-11 04:31:39.982952] W [MSGID: 114031] [client-rpc-fops.c:3050:client3_3_readv_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.983683] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984182] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.984221] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.985941] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:31:39.986633] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.987644] E [MSGID: 109040] [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-11 04:31:39.987751] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 15152930: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 fd=0x7f00a80bdb64 (Stale file handle)
[2016-08-11 04:31:39.986567] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:31:39.986567] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210145] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.210873] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210888] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.210947] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.213270] E [MSGID: 109040] [dht-helper.c:1190:dht_migration_complete_check_task] 0-GLUSTER1-dht: (null): failed to lookup the file on GLUSTER1-dht [Stale file handle]
[2016-08-11 04:35:21.213345] W [fuse-bridge.c:2227:fuse_readv_cbk] 0-glusterfs-fuse: 15156910: READ => -1 gfid=7919f4a0-125c-4b11-b5c9-fb50cc195c43 fd=0x7f00a80bf6d0 (Stale file handle)
[2016-08-11 04:35:21.211516] W [MSGID: 108008] [afr-read-txn.c:244:afr_read_txn] 0-GLUSTER1-replicate-0: Unreadable subvolume -1 found with event generation 3 for gfid dfb8777a-7e8c-40ff-8faa-252beabba5f8. (Possible split-brain)
[2016-08-11 04:35:21.212013] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-0: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212081] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-1: remote operation failed [No such file or directory]
[2016-08-11 04:35:21.212121] W [MSGID: 114031] [client-rpc-fops.c:1572:client3_3_fstat_cbk] 0-GLUSTER1-client-2: remote operation failed [No such file or directory]
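With a longer client log it can help to tally which messages and gfids repeat. A minimal Python sketch, assuming the `[timestamp] LEVEL [MSGID: n] [file.c:line:function] xlator: message` layout seen in the lines above; this is inferred from the excerpt, not a documented, stable format.

```python
import re
import sys
from collections import Counter

# Assumed layout, based on the excerpt:
# [timestamp] LEVEL [MSGID: n] [file.c:line:function] xlator: message
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"(?:\[MSGID: (?P<msgid>\d+)\]\s+)?"
    r"\[(?P<src>[^\]]+)\]\s+"
    r"(?P<xlator>[^\s:]+):\s+(?P<msg>.*)$"
)
# Matches both "gfid <uuid>" and "gfid=<uuid>".
GFID_RE = re.compile(r"gfid[= ]([0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})")


def summarize(lines):
    """Count messages per (level, MSGID, xlator) and mentions per gfid."""
    by_kind = Counter()
    gfids = Counter()
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m is None:
            continue  # not a log line in the expected format
        by_kind[(m["level"], m["msgid"] or "-", m["xlator"])] += 1
        gfids.update(GFID_RE.findall(m["msg"]))
    return by_kind, gfids


if __name__ == "__main__":
    kinds, gfids = summarize(sys.stdin)
    for (level, msgid, xlator), n in kinds.most_common():
        print(f"{n:5d} {level} {msgid:>6} {xlator}")
    for gfid, n in gfids.most_common():
        print(f"{n:5d} gfid {gfid}")
```

Fed the excerpt above on standard input, it would group the `No such file or directory` replies by client xlator (client-0, client-1, client-2) and list the two gfids, which makes it easier to see that all three replicas answer the same way.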

I attached vdsm.log starting from when I spun up the VM on the offending node.



_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users





