Re: missing files on FUSE mount

Hello Martín,

Try disabling "performance.readdir-ahead". We had a similar issue, and turning this option off resolved it:
gluster volume set tapeless performance.readdir-ahead off

On Tue, Oct 27, 2020 at 8:23 PM Martín Lorenzo <mlorenzo@xxxxxxxxx> wrote:
Hi Strahil, today we have the same number of clients on all nodes, but the problem persists. I have the impression that it gets more frequent as the server capacity fills up; we are now having at least one incident per day.
Regards,
Martin

On Mon, Oct 26, 2020 at 8:09 AM Martín Lorenzo <mlorenzo@xxxxxxxxx> wrote:
Hi Strahil, thanks for your reply.
I had one node with 13 clients and the rest with 14. I've just restarted the services on that node and now it has 14 as well, so let's see what happens.
Regarding the Samba repos, I wasn't aware of that; I was using the CentOS main repo. I'll check them out.
Best Regards,
Martin


On Tue, Oct 20, 2020 at 3:19 PM Strahil Nikolov <hunter86_bg@xxxxxxxxx> wrote:
Do you have the same number of clients connected to each brick?

I guess something like this can show it:

gluster volume status VOL clients
gluster volume status VOL client-list

Best Regards,
Strahil Nikolov






On Tuesday, October 20, 2020, 15:41:45 GMT+3, Martín Lorenzo <mlorenzo@xxxxxxxxx> wrote:





Hi, I have the following problem: I have a distributed-replicated cluster set up with Samba and CTDB over FUSE mount points.
I am seeing inconsistencies across the FUSE mounts: users report that files disappear after being copied or moved. When I look at the mount points on each node, they don't display the same data.

#### faulty mount point ####
[root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
ls: cannot access PANEO VUELTA A CLASES CON TAPABOCAS.mpg: No such file or directory
ls: cannot access PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg: No such file or directory
total 633723
drwxr-xr-x. 5 arribagente PN      4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
-rw-r--r--. 1 arribagente PN 648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
-?????????? ? ?           ?          ?            ? PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
-?????????? ? ?           ?          ?            ? PANEO VUELTA A CLASES CON TAPABOCAS.mpg


#### healthy mount point ####
[root@gluster7 ARRIBA GENTE martes 20 de octubre]# ll
total 3435596
drwxr-xr-x. 5 arribagente PN       4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
-rw-r--r--. 1 arribagente PN  648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
-rw-r--r--. 1 arribagente PN 2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
-rw-r--r--. 1 arribagente PN  784701444 Sep  4 07:23 PANEO VUELTA A CLASES CON TAPABOCAS.mpg
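
In the faulty listing above, the file names are still returned by the directory read, but the per-file stat fails, which is what `ls` prints as a row of `?` fields. A minimal Python sketch for spotting such entries on a mount point follows. It assumes nothing about Gluster itself; the function name `find_unstatable_entries` and its injectable `stat` parameter are hypothetical and exist only for illustration and testing.

```python
import os


def find_unstatable_entries(directory, stat=os.lstat):
    """Return (name, error) pairs for entries that are listed but cannot be stat'ed."""
    broken = []
    for name in sorted(os.listdir(directory)):
        try:
            stat(os.path.join(directory, name))
        except OSError as exc:
            # The name came back from the directory read, but the lookup
            # failed; "ls -l" shows this as a row of question marks.
            broken.append((name, exc.strerror or str(exc)))
    return broken
```

Running this against the same directory on each node's FUSE mount would show which mounts have entries in this state, without having to read the `ls` output by eye.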

 - So far the only way to solve this is to create a directory in the healthy mount point, on the same path:
[root@gluster7 ARRIBA GENTE martes 20 de octubre]# mkdir hola

- When you then refresh the other mount point, the issue is resolved:
[root@gluster6 ARRIBA GENTE martes 20 de octubre]# ll
total 3435600
drwxr-xr-x. 5 arribagente PN         4096 Oct 19 10:52 COMERCIAL AG martes 20 de octubre
drwxr-xr-x. 2 root        root       4096 Oct 20 08:45 hola
-rw-r--r--. 1 arribagente PN    648927236 Jun  3 07:16 PANEO FACHADA PALACIO LEGISLATIVO DRONE DIA Y NOCHE.mpg
-rw-r--r--. 1 arribagente PN   2084415492 Aug 18 09:14 PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg
-rw-r--r--. 1 arribagente PN    784701444 Sep  4 07:23 PANEO VUELTA A CLASES CON TAPABOCAS.mpg

Interestingly, the error occurs on the mount point where the files were copied. The files don't show up as pending heal entries. Around 15 people use the mounts over Samba, and I get this issue reported roughly every two days.

I have an older cluster with similar issues. It runs a different Gluster version but has a very similar topology (4 bricks: initially two, then expanded).
Please note that the bricks aren't all the same size (but their replicas are), so my other suspicion is that rebalancing has something to do with it.

I'm trying to reproduce it on a small virtualized cluster, so far without success.

Here are the cluster details:
four nodes, replica 2, plus one arbiter node hosting 2 arbiter bricks.

One pair of bricks has ~20 TB capacity each and the other pair has ~48 TB each.
Volume Name: tapeless
Type: Distributed-Replicate
Volume ID: 53bfa86d-b390-496b-bbd7-c4bba625c956
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x (2 + 1) = 6
Transport-type: tcp
Bricks:
Brick1: gluster6.glustersaeta.net:/data/glusterfs/tapeless/brick_6/brick
Brick2: gluster7.glustersaeta.net:/data/glusterfs/tapeless/brick_7/brick
Brick3: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_1a/brick (arbiter)
Brick4: gluster12.glustersaeta.net:/data/glusterfs/tapeless/brick_12/brick
Brick5: gluster13.glustersaeta.net:/data/glusterfs/tapeless/brick_13/brick
Brick6: kitchen-store.glustersaeta.net:/data/glusterfs/tapeless/brick_2a/brick (arbiter)
Options Reconfigured:
features.quota-deem-statfs: on
performance.client-io-threads: on
nfs.disable: on
transport.address-family: inet
features.quota: on
features.inode-quota: on
features.cache-invalidation: on
features.cache-invalidation-timeout: 600
performance.cache-samba-metadata: on
performance.stat-prefetch: on
performance.cache-invalidation: on
performance.md-cache-timeout: 600
network.inode-lru-limit: 200000
performance.nl-cache: on
performance.nl-cache-timeout: 600
performance.readdir-ahead: on
performance.parallel-readdir: on
performance.cache-size: 1GB
client.event-threads: 4
server.event-threads: 4
performance.normal-prio-threads: 16
performance.io-thread-count: 32
performance.write-behind-window-size: 8MB
storage.batch-fsync-delay-usec: 0
cluster.data-self-heal: on
cluster.metadata-self-heal: on
cluster.entry-self-heal: on
cluster.self-heal-daemon: on
performance.write-behind: on
performance.open-behind: on

Log section from the faulty mount point. I think the [File exists] entries are from people trying to copy the missing files over and over:


[2020-10-20 11:31:03.034220] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:32:06.684329] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:33:02.191863] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:34:05.841608] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:35:20.736633] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644
[2020-10-20 11:35:20.741213] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644. sources=[0] 1  sinks=2  
[2020-10-20 11:35:04.278043] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
The message "I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644" repeated 3 times between [2020-10-20 11:35:20.736633] and [2020-10-20 11:35:26.733298]
The message "I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on 958dbd7a-3cd7-4b66-9038-76e5c5669644. sources=[0] 1  sinks=2 " repeated 3 times between [2020-10-20 11:35:20.741213] and [2020-10-20 11:35:26.737629]
[2020-10-20 11:36:02.548350] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:36:57.365537] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7
[2020-10-20 11:36:57.370824] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7. sources=[0] 1  sinks=2  
[2020-10-20 11:37:01.363925] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-1: performing metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7
[2020-10-20 11:37:01.368069] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-1: Completed metadata selfheal on f4907af2-1775-4c46-89b5-e9776df6d5c7. sources=[0] 1  sinks=2  
The message "I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0" repeated 3 times between [2020-10-20 11:36:02.548350] and [2020-10-20 11:37:36.389208]
[2020-10-20 11:38:07.367113] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:39:01.595981] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:40:04.184899] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:41:07.833470] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:42:01.871621] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:43:04.399194] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:44:04.558647] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:44:15.953600] W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-5: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
[2020-10-20 11:44:15.953819] W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-2: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
[2020-10-20 11:44:15.954072] W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-3: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]
[2020-10-20 11:44:15.954680] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043294: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
[2020-10-20 11:44:15.963175] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043306: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
[2020-10-20 11:44:15.971839] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043318: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
[2020-10-20 11:44:16.010242] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043403: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
[2020-10-20 11:44:16.020291] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043415: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
[2020-10-20 11:44:16.028857] W [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043427: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg => -1 (File exists)
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-5: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]" repeated 5 times between [2020-10-20 11:44:15.953600] and [2020-10-20 11:44:16.027785]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-2: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]" repeated 5 times between [2020-10-20 11:44:15.953819] and [2020-10-20 11:44:16.028331]
The message "W [MSGID: 114031] [client-rpc-fops_v2.c:2114:client4_0_create_cbk] 0-tapeless-client-3: remote operation failed. Path: /PN/arribagente/PLAYER 2020/ARRIBA GENTE martes 20 de octubre/PANEO NIÑOS ESCUELAS CON TAPABOCAS.mpg [File exists]" repeated 5 times between [2020-10-20 11:44:15.954072] and [2020-10-20 11:44:16.028355]
[2020-10-20 11:45:03.572106] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:45:40.080010] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
The message "I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0" repeated 2 times between [2020-10-20 11:45:40.080010] and [2020-10-20 11:47:10.871801]
[2020-10-20 11:48:03.913129] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:49:05.082165] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:50:06.725722] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:51:04.254685] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:52:07.903617] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:53:01.420513] I [MSGID: 108026] [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do] 0-tapeless-replicate-0: performing metadata selfheal on 3c316533-5f47-4267-ac19-58b3be305b94
[2020-10-20 11:53:01.428657] I [MSGID: 108026] [afr-self-heal-common.c:1750:afr_log_selfheal] 0-tapeless-replicate-0: Completed metadata selfheal on 3c316533-5f47-4267-ac19-58b3be305b94. sources=[0]  sinks=1 2  
The message "I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0" repeated 3 times between [2020-10-20 11:52:07.903617] and [2020-10-20 11:53:12.037835]
[2020-10-20 11:54:02.208354] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:55:04.360284] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:56:09.508092] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:57:02.580970] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0
[2020-10-20 11:58:06.230698] I [MSGID: 108031] [afr-common.c:2581:afr_local_discovery_cbk] 0-tapeless-replicate-0: selecting local read_child tapeless-client-0 
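
To see how often users retried copying a missing file, the `fuse_create_cbk` warnings in a log like the one above can be counted per path. This is a minimal Python sketch, assuming the log line layout matches the excerpt; the names `CREATE_EEXIST` and `count_create_eexist` are hypothetical.

```python
import re
from collections import Counter

# Matches the FUSE-level warning for a create() that failed with EEXIST, e.g.
# "... [fuse-bridge.c:2606:fuse_create_cbk] 0-glusterfs-fuse: 31043294: /path => -1 (File exists)"
CREATE_EEXIST = re.compile(
    r"fuse_create_cbk\] \S+ \d+: (?P<path>.+) => -1 \(File exists\)"
)


def count_create_eexist(log_lines):
    """Count failed create() calls per path in a FUSE mount log."""
    counts = Counter()
    for line in log_lines:
        match = CREATE_EEXIST.search(line)
        if match:
            counts[match.group("path")] += 1
    return counts
```

Applied to the excerpt above, this reports six failed creates, all for the same path. Passing an open log file works, since the function only iterates over lines.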


Let me know if you need anything else. Thank you for your support!
Best Regards,
Martin Lorenzo


________



Community Meeting Calendar:

Schedule -
Every 2nd and 4th Tuesday at 14:30 IST / 09:00 UTC
Bridge: https://bluejeans.com/441850968

Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users


--
Respectfully
Mahdi
