Hello,

we started with a Gluster 3.12.11 installation: 3 servers (gluster11, gluster12, gluster13) with 4 bricks per server (each HDD == one brick, JBOD behind the controller): bricksda1, bricksdb1, bricksdc1, bricksdd1. Full information: https://pastebin.com/0ndDSstG

In the beginning everything ran fine. In May one HDD (sdd on gluster13) died and got replaced; I replaced the brick and the self-heal started, taking weeks and worsening performance. One week after that heal had finished, another HDD (sdd on gluster12) died -> same procedure again: weeks of healing, bad performance, etc.

After the replace/heal the performance on most of the bricks was OK, but 2 bricks perform badly. In short:

gluster11: no HDD change; bricksd(a|b|c) OK, bricksdd takes much longer for requests
gluster12: 1 HDD change; all bricks with normal performance
gluster13: 1 HDD change; bricksd(a|b|c) OK, bricksdd takes much longer for requests

We've checked (thanks to Pranith and Xavi) hardware, disk speed, gluster settings etc., but only the 2 bricksdd1 on gluster11 and gluster13 take much longer (>2x) for each request, worsening the overall gluster performance. So something must be wrong, especially with bricksdd1. Does anyone know how to investigate this?

2nd problem: during all these checks and searches we upgraded glusterfs from 3.12.11 -> 3.12.15 and finally to 4.1.6, but the problems didn't disappear. And some additional problems came up: this week I rebooted gluster11 and gluster13 (the ones with the "sick" bricksdd1) for kernel updates, and for these 2 bricks 2 processes are started, making the brick unavailable.

root 2118 0.1 0.0 944596 12452 ?
Ssl 07:25 0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/run/gluster/glustershd/glustershd.pid -l /var/log/glusterfs/glustershd.log -S /var/run/gluster/546621eb24596f4c.socket --xlator-option *replicate*.node-uuid=4fdb11c3-a5af-4e18-af48-182c00b88cc8 --process-name glustershd

root 2197 0.5 0.0 540808 8672 ? Ssl 07:25 0:00 /usr/sbin/glusterfsd -s gluster13 --volfile-id shared.gluster13.gluster-bricksdd1_new-shared -p /var/run/gluster/vols/shared/gluster13-gluster-bricksdd1_new-shared.pid -S /var/run/gluster/23f68b171e2f2c9e.socket --brick-name /gluster/bricksdd1_new/shared -l /var/log/glusterfs/bricks/gluster-bricksdd1_new-shared.log --xlator-option *-posix.glusterd-uuid=4fdb11c3-a5af-4e18-af48-182c00b88cc8 --process-name brick --brick-port 49155 --xlator-option shared-server.listen-port=49155

In the brick log for bricksdd1_new I see:

[2018-12-12 06:20:41.817978] I [rpcsvc.c:2052:rpcsvc_spawn_threads] 0-rpc-service: spawned 1 threads for program 'GlusterFS 3.3'; total count:1
[2018-12-12 06:20:41.818048] I [rpcsvc.c:2052:rpcsvc_spawn_threads] 0-rpc-service: spawned 1 threads for program 'GlusterFS 4.x v1'; total count:1

A simple 'gluster volume start shared force' ended up with 4 processes for that brick. I had to do the following twice:

- kill the 2 brick processes
- gluster volume start shared force

After the 2nd try there was only 1 brick process left, the heal started, etc. Has anyone seen 2 processes being started for one brick? I followed the upgrade guide (https://docs.gluster.org/en/latest/Upgrade-Guide/upgrade_to_4.1/), but is there anything one can do?

3rd problem: I've seen that on the mounted volume some clients can't see certain directories, even though they exist and other clients do see them. Example:

client1: ls /data/repository/shared/public/staticmap/118/
ls: cannot access '/data/repository/shared/public/staticmap/118/408': No such file or directory
238 255 272 289 306 323 340 357 374 391 408 478 [...]
client1: ls /data/repository/shared/public/staticmap/118/408/
ls: cannot access '/data/repository/shared/public/staticmap/118/408/': No such file or directory

client2: ls /data/repository/shared/public/staticmap/118/408/
118408013 118408051 118408260 118408285 118408334 118408399 [...]

Mount options: nothing special. From /etc/fstab:

gluster13:/shared /shared glusterfs defaults,_netdev 0 0

A umount/mount makes the problem disappear:

umount /data/repository/shared ; mount -t glusterfs gluster12:/shared /data/repository/shared

Has anyone had or seen such problems?

Thx
Hubert

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
https://lists.gluster.org/mailman/listinfo/gluster-users
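PS, on the first problem (investigating the two slow bricksdd1): Gluster's built-in io-stats profiler reports per-brick, per-FOP latency, which should show whether the slow bricks are slow at the posix layer on those servers or only end-to-end. A sketch using the volume name from above (the profiler adds some overhead, so stop it after sampling):

```shell
gluster volume profile shared start
# ... let it collect statistics under normal load for a while, then:
gluster volume profile shared info
# compare the avg latency per FOP (LOOKUP, READ, WRITE, ...) across bricks
gluster volume profile shared stop
```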
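PS, on the third problem (directories invisible on some clients): besides the remount workaround, it may help to confirm the directory actually exists on every brick; if it is missing on some bricks, that points at a directory self-heal/DHT issue rather than a pure client-side cache problem. A hedged sketch: the glob and the volume-relative path below follow the naming in the post and are assumptions to adjust, run on each of the three servers:

```shell
# Assumption: bricks live under /gluster/bricksd?1*/shared and the affected
# directory's volume-relative path is public/staticmap/118/408
for b in /gluster/bricksd?1*/shared; do
    ls -d "$b/public/staticmap/118/408"
done
```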
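PS, on the second problem (two glusterfsd processes for one brick): a quick way to spot duplicates is to group the process list by the --brick-name argument. A hypothetical helper, shown here against canned sample lines (the ports and arguments in the sample are assumptions, not from a real system):

```shell
# Canned sample: two glusterfsd entries claiming the same brick path.
ps_sample='/usr/sbin/glusterfsd -s gluster13 --brick-name /gluster/bricksdd1_new/shared --brick-port 49155
/usr/sbin/glusterfsd -s gluster13 --brick-name /gluster/bricksdd1_new/shared --brick-port 49156'

# Count processes per brick path; report any brick with more than one.
dupes=$(printf '%s\n' "$ps_sample" | awk '
  /glusterfsd/ {
    for (i = 1; i <= NF; i++)
      if ($i == "--brick-name") count[$(i + 1)]++
  }
  END {
    for (b in count)
      if (count[b] > 1) print b " has " count[b] " brick processes"
  }')
printf '%s\n' "$dupes"
```

In practice one would replace the canned sample with real process output, e.g. pipe `ps -eo args=` into the same awk filter on each server.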