Re: [ovirt-users] ovirt 4.1 hosted engine hyper-converged on glusterfs 3.8.10: "engine" storage domain always complains about "unsynced" elements

On Tue, Jul 25, 2017 at 11:12 AM, Kasturi Narra <knarra@xxxxxxxxxx> wrote:
These errors are because glusternw is not assigned to the correct interface. Once you attach it, these errors should go away. This has nothing to do with the problem you are seeing.
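As far as I remember, the gluster role is assigned in the Administration Portal under Cluster > Logical Networks > Manage Networks (tick the "Gluster Network" role for the network that sits on your dedicated 10.10.10.0 storage interface), and the network is then attached to the right NIC via Setup Host Networks on each host. Once that is done, the "Could not associate brick" warnings quoted below should stop appearing; assuming the default engine.log location, a quick check is:

grep 'Could not associate brick' /var/log/ovirt-engine/engine.log | tail -n 5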
 
Sahina, any idea about the engine not showing the correct volume info?

Please provide the vdsm.log (containing the gluster volume info) and the engine.log.
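In case it helps, the usual default locations (they may differ on your installation) are /var/log/ovirt-engine/engine.log on the hosted-engine VM and /var/log/vdsm/vdsm.log on the hosts, so something like this should capture the relevant sections:

grep -i gluster /var/log/vdsm/vdsm.log | tail -n 100
grep -i GlusterVolumesList /var/log/ovirt-engine/engine.log | tail -n 100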


On Mon, Jul 24, 2017 at 7:30 PM, yayo (j) <jaganz@xxxxxxxxx> wrote:
Hi,

I refreshed the UI but the problem still remains...

No specific error; I only see the entries below, and I've read that this kind of message is not a problem:


2017-07-24 15:53:59,823+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterServersListVDSCommand(HostName = node01.localdomain.local, VdsIdVDSCommandParametersBase:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 29a62417
2017-07-24 15:54:01,066+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterServersListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterServersListVDSCommand, return: [10.10.20.80/24:CONNECTED, node02.localdomain.local:CONNECTED, gdnode04:CONNECTED], log id: 29a62417
2017-07-24 15:54:01,076+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] START, GlusterVolumesListVDSCommand(HostName = node01.localdomain.local, GlusterVolumesListVDSParameters:{runAsync='true', hostId='4c89baa5-e8f7-4132-a4b3-af332247570c'}), log id: 7fce25d3
2017-07-24 15:54:02,209+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,212+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,215+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/engine/brick' of volume 'd19c19e3-910d-437b-8ba7-4f2a23d17515' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,218+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode01:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,221+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode02:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,224+02 WARN  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListReturn] (DefaultQuartzScheduler2) [b7590c4] Could not associate brick 'gdnode04:/gluster/data/brick' of volume 'c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d' with correct network as no gluster network found in cluster '00000002-0002-0002-0002-00000000017a'
2017-07-24 15:54:02,224+02 INFO  [org.ovirt.engine.core.vdsbroker.gluster.GlusterVolumesListVDSCommand] (DefaultQuartzScheduler2) [b7590c4] FINISH, GlusterVolumesListVDSCommand, return: {d19c19e3-910d-437b-8ba7-4f2a23d17515=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@fdc91062, c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d=org.ovirt.engine.core.common.businessentities.gluster.GlusterVolumeEntity@999a6f23}, log id: 7fce25d3


Thank you


2017-07-24 8:12 GMT+02:00 Kasturi Narra <knarra@xxxxxxxxxx>:
Hi,

   Regarding the UI showing incorrect information about the engine and data volumes, can you please refresh the UI and see if the issue persists, and check for any errors in the engine.log file?
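If it helps, you can watch the log live while refreshing; assuming the default path on the engine VM:

tail -f /var/log/ovirt-engine/engine.log | grep --line-buffered -iE 'ERROR|WARN'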

Thanks
kasturi

On Sat, Jul 22, 2017 at 11:43 AM, Ravishankar N <ravishankar@xxxxxxxxxx> wrote:

On 07/21/2017 11:41 PM, yayo (j) wrote:
Hi,

Sorry to follow up again, but checking the oVirt interface I've found that oVirt reports the "engine" volume as an "arbiter" configuration and the "data" volume as a fully replicated volume. See these screenshots:

This is probably a refresh bug in the UI; Sahina might be able to tell you.



But the "gluster volume info" command report that all 2 volume are full replicated:


Volume Name: data
Type: Replicate
Volume ID: c7a5dfc9-3e72-4ea1-843e-c8275d4a7c2d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/data/brick
Brick2: gdnode02:/gluster/data/brick
Brick3: gdnode04:/gluster/data/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: enable
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on




Volume Name: engine
Type: Replicate
Volume ID: d19c19e3-910d-437b-8ba7-4f2a23d17515
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: gdnode01:/gluster/engine/brick
Brick2: gdnode02:/gluster/engine/brick
Brick3: gdnode04:/gluster/engine/brick
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet
storage.owner-uid: 36
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
performance.low-prio-threads: 32
network.remote-dio: off
cluster.eager-lock: enable
cluster.quorum-type: auto
cluster.server-quorum-type: server
cluster.data-self-heal-algorithm: full
cluster.locking-scheme: granular
cluster.shd-max-threads: 8
cluster.shd-wait-qlength: 10000
features.shard: on
user.cifs: off
storage.owner-gid: 36
features.shard-block-size: 512MB
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.granular-entry-heal: on
auth.allow: *
server.allow-insecure: on
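A quick way to double-check from the CLI whether a volume is a full replica 3 or a replica 2 + arbiter (as far as I know, an arbiter volume reports "Number of Bricks: 1 x (2 + 1) = 3" instead of "1 x 3 = 3"):

gluster volume info engine | grep -E 'Type|Number of Bricks'
gluster volume info data | grep -E 'Type|Number of Bricks'

Both outputs above show "1 x 3 = 3", i.e. plain replica 3, so the arbiter shown in the UI really does look like a display issue.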


2017-07-21 19:13 GMT+02:00 yayo (j) <jaganz@xxxxxxxxx>:
2017-07-20 14:48 GMT+02:00 Ravishankar N <ravishankar@xxxxxxxxxx>:


But it does say something. All the gfids of completed heals in the log below are for the files that you gave the getfattr output of. So what is likely happening is an intermittent connection problem between your mount and the brick process, leading to pending heals again after a heal completes, which is why the numbers vary each time. You would need to check why that is the case.
Hope this helps,
Ravi



[2017-07-20 09:58:46.573079] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed data selfheal on e6dfd556-340b-4b76-b47b-7b6f5bd74327. sources=[0] 1  sinks=2
[2017-07-20 09:59:22.995003] I [MSGID: 108026] [afr-self-heal-metadata.c:51:__afr_selfheal_metadata_do] 0-engine-replicate-0: performing metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81
[2017-07-20 09:59:22.999372] I [MSGID: 108026] [afr-self-heal-common.c:1254:afr_log_selfheal] 0-engine-replicate-0: Completed metadata selfheal on f05b9742-2771-484a-85fc-5b6974bcef81. sources=[0] 1  sinks=2
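If you want to see whether the pending-heal count really oscillates rather than staying stuck, sampling it over time should show it (a rough sketch, volume name taken from the outputs above):

gluster volume heal engine info
gluster volume heal engine statistics heal-count
watch -n 60 'gluster volume heal engine statistics heal-count'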


Hi,

following your suggestion, I've checked the peer status and found that there are too many names for the hosts; I don't know if this could be the problem or part of it:

gluster peer status on NODE01:
Number of Peers: 2

Hostname: dnode02.localdomain.local
Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
State: Peer in Cluster (Connected)
Other names:
192.168.10.52
dnode02.localdomain.local
10.10.20.90
10.10.10.20




gluster peer status on NODE02:
Number of Peers: 2

Hostname: dnode01.localdomain.local
Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
State: Peer in Cluster (Connected)
Other names:
gdnode01
10.10.10.10

Hostname: gdnode04
Uuid: ce6e0f6b-12cf-4e40-8f01-d1609dfc5828
State: Peer in Cluster (Connected)
Other names:
192.168.10.54
10.10.10.40


gluster peer status on NODE04:
Number of Peers: 2

Hostname: dnode02.neridom.dom
Uuid: 7c0ebfa3-5676-4d3f-9bfa-7fff6afea0dd
State: Peer in Cluster (Connected)
Other names:
10.10.20.90
gdnode02
192.168.10.52
10.10.10.20

Hostname: dnode01.localdomain.local
Uuid: a568bd60-b3e4-4432-a9bc-996c52eaaa12
State: Peer in Cluster (Connected)
Other names:
gdnode01
10.10.10.10


All these IPs are pingable and the hosts are resolvable across all 3 nodes, but only the 10.10.10.0 network is the dedicated network for gluster (resolved using the gdnode* host names)... Do you think that removing the other entries could fix the problem? And if so, sorry, but how can I remove the other entries?
I don't think having extra entries could be a problem. Did you check the fuse mount logs for disconnect messages that I referred to in the other email?
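For example (the client log name is derived from the mount point, so on an oVirt host it is typically something like the path below; adjust it to your actual mount):

grep -i disconnect /var/log/glusterfs/rhev-data-center-mnt-glusterSD-*.log | tail -n 20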

And what about SELinux?
Not sure about this. See if there are disconnect messages in the mount logs first.
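To quickly rule it out you could check the current mode and look for recent AVC denials, e.g.:

getenforce
ausearch -m avc -ts recent    # needs auditd running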
-Ravi

Thank you





--
Linux User: 369739 http://counter.li.org


_______________________________________________
Users mailing list
Users@xxxxxxxxx
http://lists.ovirt.org/mailman/listinfo/users





--
Linux User: 369739 http://counter.li.org


_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://lists.gluster.org/mailman/listinfo/gluster-users

