Hello everyone,
Summary:
After updating GlusterFS from 6.10 to 9.5, posix-acl.c:262:posix_acl_log_permit_denied errors occur on some shards.
We think the issue looks similar to: https://github.com/gluster/glusterfs/issues/876
This is our setup:
oVirt Version: 4.3.10.4-1.el7
GlusterFS Volumes:
Volume Name: hdd
Type: Distributed-Replicate
Number of Bricks: 4 x 3 = 12
Options Reconfigured:
features.acl: enable
storage.owner-gid: 36
storage.owner-uid: 36
server.event-threads: 4
client.event-threads: 4
cluster.choose-local: off
user.cifs: off
features.shard: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
cluster.data-self-heal-algorithm: full
cluster.server-quorum-type: server
cluster.eager-lock: enable
network.remote-dio: enable
performance.low-prio-threads: 32
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
auth.allow: *
network.ping-timeout: 10
cluster.quorum-type: auto
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: on
diagnostics.brick-log-level: INFO
storage.fips-mode-rchecksum: on
--
Volume Name: nvme
Type: Replicate
Status: Started
--
Volume Name: ovirt-hosted-engine
Type: Replicate
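For reference, the option values above come from "gluster volume info". On 9.x the effective value of a single option (including defaults we never changed) can also be queried, e.g.:
gluster volume get hdd features.acl
gluster volume get hdd all | grep acl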
After the GlusterFS update from 6.10 to 9.5 (following the generic upgrade procedure guide) there are "permission denied" errors on oVirt operations (e.g. starting an existing VM, cloning an existing VM) for existing data disks that are stored on the hdd volume.
This is the error message that oVirt is showing: "VM gitrunner has been paused due to unknown storage error."
Interestingly, the nvme volume does not have these issues and its VMs can be started. Creating new VMs with new disks on the hdd volume also works as expected. The issue only seems to affect existing VM disks that are stored on the hdd volume.
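In case it helps with debugging: the ownership of an affected (old) disk can be compared with a newly created one directly on the FUSE mount, roughly like this (the storage-domain and image UUIDs are placeholders for our real ones):
cd /glusterSD/storage1:_hdd/<storage-domain-uuid>/images
# on an oVirt host these should normally show vdsm:kvm (36:36), matching storage.owner-uid/gid above
stat -c '%U:%G %a %n' <old-image-uuid>/* <new-image-uuid>/*
getfacl <old-image-uuid>/*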
Example Error in Brick Log:
[2022-04-06 13:07:29.856665 +0000] I [MSGID: 139001] [posix-acl.c:262:posix_acl_log_permit_denied] 0-hdd-access-control: client: CTX_ID:e15aab39-76c1-4b83-9c17-4dd90e1bd524-GRAPH_ID:0-PID:10405-HOST:host002.PC_NAME:hdd-client-3-RECON_NO:-0, gfid: be318638-e8a0-4c6d-977d-7a937aa84806, req(uid:107,gid:107,perm:1,ngrps:3), ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-) [Permission denied]
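If we read that message correctly, the request comes in as uid/gid 107 (qemu on a stock el7 host, if we are not mistaken) while the cached context shows uid 0/gid 0, perm 000 and updated-fop:INVALID, and the gfid be318638-e8a0-4c6d-977d-7a937aa84806 should be the fixed gfid of the /.shard directory itself. What a brick actually has on disk for that directory can be cross-checked directly (the brick path below is a placeholder for our real one):
# owner, mode, ACLs and xattrs of the .shard directory on the brick
ls -ldn /gluster/bricks/hdd/brick1/.shard
getfacl /gluster/bricks/hdd/brick1/.shard
getfattr -d -m . -e hex /gluster/bricks/hdd/brick1/.shard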
Example Error in Gluster Client Log (on oVirt Host):
[2022-04-06 09:17:56.949699 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-6: remote operation failed. [{path=/.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.949764 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-8: remote operation failed. [{path=/.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.953126 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-6: remote operation failed. [{path=(null)}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.953339 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-8: remote operation failed. [{path=(null)}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.953384 +0000] W [MSGID: 108027] [afr-common.c:2925:afr_attempt_readsubvol_set] 0-hdd-replicate-2: no read subvols for /.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149
[2022-04-06 09:17:56.954845 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-3: remote operation failed. [{path=/.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.954873 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-5: remote operation failed. [{path=/.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.955106 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-6: remote operation failed. [{path=/.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.955287 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-8: remote operation failed. [{path=/.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.958300 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-6: remote operation failed. [{path=(null)}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.958727 +0000] W [MSGID: 114031] [client-rpc-fops_v2.c:2625:client4_0_lookup_cbk] 0-hdd-client-8: remote operation failed. [{path=(null)}, {gfid=00000000-0000-0000-0000-000000000000}, {errno=13}, {error=Permission denied}]
[2022-04-06 09:17:56.958767 +0000] W [MSGID: 108027] [afr-common.c:2925:afr_attempt_readsubvol_set] 0-hdd-replicate-2: no read subvols for /.shard/59643bf9-7a0b-4fc4-9cce-17227c86dbc8.149
[2022-04-06 09:17:56.958807 +0000] E [MSGID: 133010] [shard.c:2416:shard_common_lookup_shards_cbk] 0-hdd-shard: Lookup on shard 149 failed. Base file gfid = 59643bf9-7a0b-4fc4-9cce-17227c86dbc8 [Transport endpoint is not connected]
[2022-04-06 09:17:56.958839 +0000] W [fuse-bridge.c:2977:fuse_readv_cbk] 0-glusterfs-fuse: 3443: READ => -1 gfid=59643bf9-7a0b-4fc4-9cce-17227c86dbc8 fd=0x7fc6e403be38 (Transport endpoint is not connected)
In summary, we mainly see "Permission denied" and "Transport endpoint is not connected" on the client side and posix_acl_log_permit_denied messages in the brick logs.
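In case it is useful, the base file gfid from the client log can be mapped back to the affected disk image by following its hard link under .glusterfs on one of the bricks, along these lines (brick path again a placeholder):
# the gfid entry under .glusterfs is a hard link to the real file
find /gluster/bricks/hdd/brick1 -samefile \
  /gluster/bricks/hdd/brick1/.glusterfs/59/64/59643bf9-7a0b-4fc4-9cce-17227c86dbc8 \
  -not -path '*/.glusterfs/*'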
We tried to set "features.acl: disable", but nothing changed (as far as we can tell).
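For reference, that was done roughly like this (the current value can be read back the same way):
gluster volume set hdd features.acl disable
gluster volume get hdd features.acl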
We see no split-brain entries and heal info on the hdd volume is clean. We also triggered a full heal, but so far this has not changed anything.
gluster volume heal hdd info | grep Number | sort | uniq
Number of entries: 0
All bricks of the hdd volume are online on the Gluster storage servers, and the volume is mounted successfully on our clients.
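The split-brain, brick status and full heal mentioned above were checked/triggered along these lines:
# no split-brain entries reported
gluster volume heal hdd info split-brain
# all bricks show as online
gluster volume status hdd
# the full heal we triggered
gluster volume heal hdd full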
Mount on client:
storage1:/nvme on /glusterSD/storage1:_nvme type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
storage1:/hdd on /glusterSD/storage1:_hdd type fuse.glusterfs (rw,relatime,user_id=0,group_id=0,default_permissions,allow_other,max_read=131072)
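If more detail would help, we can temporarily raise the log levels on this volume, e.g.:
gluster volume set hdd diagnostics.client-log-level DEBUG
gluster volume set hdd diagnostics.brick-log-level DEBUG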
Does anyone know how to fix these issues? Thank you in advance.