For completeness: https://bugzilla.redhat.com/show_bug.cgi?id=1378978

2016-09-23 18:06 GMT+02:00 Luca Gervasi <luca.gervasi@xxxxxxxxx>:
> Hi guys,
> I've got a strange problem involving this timeline (it matches the "Log
> fragment 1" excerpt below):
>
> 19:56:50: the disk is detached from my system. This disk is the brick of
> volume V.
> 19:56:50: LVM sees the disk as unreachable and starts its maintenance
> procedures.
> 19:56:50: LVM unmounts my thin-provisioned volumes.
> 19:57:02: the health check on the affected bricks fails, moving them to a
> down state.
> 19:57:32: the XFS filesystem is unmounted.
>
> At this point the brick filesystem is no longer mounted, and the underlying
> directory is empty (the brick directory is missing too). My assumption was
> that gluster would stop itself under these conditions: it does not.
> Gluster slowly fills my entire root partition, recreating its full tree there.
>
> My only warning sign is the root disk's inode usage climbing to 100%.
>
> I've read the release notes of every version after mine (3.7.14, 3.7.15)
> without finding a relevant fix, so at this point I'm fairly sure this is an
> undocumented bug. The servers are configured identically.
>
> Could you please help me understand how to prevent gluster from continuing
> to write to an unmounted filesystem? Thanks.
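The only mitigation I've come up with so far is a sketch, untested on these
servers: the paths are the brick mount points from the configuration below,
and it assumes the root filesystem supports the immutable attribute (ext4 and
XFS both do). The idea is to mark the mount-point directory immutable, so that
anything trying to write below it while the brick filesystem is not mounted
fails with EPERM instead of silently filling the root partition:

    # untested sketch; repeat on each node for each brick mount point,
    # with the brick filesystem unmounted
    umount /bricks/vol-homes
    chattr +i /bricks/vol-homes
    mount /bricks/vol-homes
    # the immutable flag lives on the directory in the root filesystem;
    # the XFS filesystem mounted on top of it is unaffected, so normal
    # brick I/O keeps working

This only blocks stray writes at the VFS level; it does not make an
already-running brick process shut down.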
> I'm running a 3 node replica on 3 Azure VMs. This is the configuration:
>
> MD (yes, I use md to aggregate 4 disks into a single 4 TB volume):
> /dev/md128:
>         Version : 1.2
>   Creation Time : Mon Aug 29 18:10:45 2016
>      Raid Level : raid0
>      Array Size : 4290248704 (4091.50 GiB 4393.21 GB)
>    Raid Devices : 4
>   Total Devices : 4
>     Persistence : Superblock is persistent
>
>     Update Time : Mon Aug 29 18:10:45 2016
>           State : clean
>  Active Devices : 4
> Working Devices : 4
>  Failed Devices : 0
>   Spare Devices : 0
>
>      Chunk Size : 512K
>
>            Name : 128
>            UUID : d5c51214:43e48da9:49086616:c1371514
>          Events : 0
>
>     Number   Major   Minor   RaidDevice   State
>        0       8       80        0        active sync   /dev/sdf
>        1       8       96        1        active sync   /dev/sdg
>        2       8      112        2        active sync   /dev/sdh
>        3       8      128        3        active sync   /dev/sdi
>
> PV, VG, LV status:
>   PV         VG      Fmt  Attr PSize PFree DevSize PV UUID
>   /dev/md127 VGdata  lvm2 a--  2.00t 2.00t 2.00t   Kxb6C0-FLIH-4rB1-DKyf-IQuR-bbPE-jm2mu0
>   /dev/md128 gluster lvm2 a--  4.00t 1.07t 4.00t   lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m
>
>   VG      Attr   Ext   #PV #LV #SN VSize VFree VG UUID                                VProfile
>   VGdata  wz--n- 4.00m   1   0   0 2.00t 2.00t XI2V2X-hdxU-0Jrn-TN7f-GSEk-7aNs-GCdTtn
>   gluster wz--n- 4.00m   1   6   0 4.00t 1.07t ztxX4f-vTgN-IKop-XePU-OwqW-T9k6-A6uDk0
>
>   LV                  VG      #Seg Attr       LSize   Maj Min KMaj KMin Pool     Data%  Meta%  LV UUID
>   apps-data           gluster    1 Vwi-aotz--  50.00g  -1  -1  253   12 thinpool  0.08         znUMbm-ax1N-R7aj-dxLc-gtif-WOvk-9QC8tq
>   feed                gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   14 thinpool  0.08         hZ4Isk-dELG-lgFs-2hJ6-aYid-8VKg-3jJko9
>   homes               gluster    1 Vwi-aotz--   1.46t  -1  -1  253   11 thinpool 58.58         salIPF-XvsA-kMnm-etjf-Uaqy-2vA9-9WHPkH
>   search-data         gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   13 thinpool 16.41         Z5hoa3-yI8D-dk5Q-2jWH-N5R2-ge09-RSjPpQ
>   thinpool            gluster    1 twi-aotz--   2.93t  -1  -1  253    9          29.85  60.00  oHTbgW-tiPh-yDfj-dNOm-vqsF-fBNH-o1izx2
>   video-asset-manager gluster    1 Vwi-aotz-- 100.00g  -1  -1  253   15 thinpool  0.07         4dOXga-96Wa-u3mh-HMmE-iX1I-o7ov-dtJ8lZ
>
> Gluster volume configuration (all volumes use the same configuration, so
> listing them all would be redundant):
>
> Volume Name: vol-homes
> Type: Replicate
> Volume ID: 0c8fa62e-dd7e-429c-a19a-479404b5e9c6
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: glu01.prd.azr:/bricks/vol-homes/brick1
> Brick2: glu02.prd.azr:/bricks/vol-homes/brick1
> Brick3: glu03.prd.azr:/bricks/vol-homes/brick1
> Options Reconfigured:
> performance.readdir-ahead: on
> cluster.server-quorum-type: server
> nfs.disable: disable
> cluster.lookup-unhashed: auto
> performance.nfs.quick-read: on
> performance.nfs.read-ahead: on
> performance.cache-size: 4096MB
> cluster.self-heal-daemon: enable
> diagnostics.brick-log-level: ERROR
> diagnostics.client-log-level: ERROR
> nfs.rpc-auth-unix: off
> nfs.acl: off
> performance.nfs.io-cache: on
> performance.client-io-threads: on
> performance.nfs.stat-prefetch: on
> performance.nfs.io-threads: on
> diagnostics.latency-measurement: on
> diagnostics.count-fop-hits: on
> performance.md-cache-timeout: 1
> performance.cache-refresh-timeout: 1
> performance.io-thread-count: 16
> performance.high-prio-threads: 16
> performance.normal-prio-threads: 16
> performance.low-prio-threads: 16
> performance.least-prio-threads: 1
> cluster.server-quorum-ratio: 60
>
> fstab:
> /dev/gluster/homes  /bricks/vol-homes  xfs  defaults,noatime,nobarrier,nofail  0 2
>
> Software:
> CentOS Linux release 7.1.1503 (Core)
> glusterfs-api-3.7.13-1.el7.x86_64
> glusterfs-libs-3.7.13-1.el7.x86_64
> glusterfs-3.7.13-1.el7.x86_64
> glusterfs-fuse-3.7.13-1.el7.x86_64
> glusterfs-server-3.7.13-1.el7.x86_64
> glusterfs-client-xlators-3.7.13-1.el7.x86_64
> glusterfs-cli-3.7.13-1.el7.x86_64
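As background for the log fragment below: if I read the posix translator
correctly, the "health-check failed, going down" messages come from the
brick's periodic health check, whose frequency is controlled by the
storage.health-check-interval volume option (we have not changed it, so it
should still be the 30-second default). Lowering it should make the brick
processes give up sooner once the filesystem disappears; the 10 seconds
below is only an illustrative value, not something we run:

    # example only; 10 is an illustrative interval in seconds
    gluster volume set vol-homes storage.health-check-interval 10

That only makes the brick go down faster, though; on its own it does not
stop writes from landing on the root filesystem once the mount is gone.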
> Log fragment 1:
>
> Sep 22 19:56:50 glu03 lvm[868]: WARNING: Device for PV lDazuw-zBPf-Duis-ZDg1-3zfg-53Ba-2ZF34m not found or rejected by a filter.
> Sep 22 19:56:50 glu03 lvm[868]: Cannot change VG gluster while PVs are missing.
> Sep 22 19:56:50 glu03 lvm[868]: Consider vgreduce --removemissing.
> Sep 22 19:56:50 glu03 lvm[868]: Failed to extend thin metadata gluster-thinpool-tpool.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-homes.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-search-data.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-apps-data.
> Sep 22 19:56:50 glu03 lvm[868]: Unmounting thin volume gluster-thinpool-tpool from /bricks/vol-video-asset-manager.
> Sep 22 19:57:02 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:02.713428] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: health-check failed, going down
> Sep 22 19:57:05 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:05.186146] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-apps-data-posix: health-check failed, going down
> Sep 22 19:57:18 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:18.674279] M [MSGID: 113075] [posix-helpers.c:1844:posix_health_check_thread_proc] 0-vol-search-data-posix: health-check failed, going down
> Sep 22 19:57:32 glu03 bricks-vol-video-asset-manager-brick1[45162]: [2016-09-22 17:57:32.714461] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-video-asset-manager-posix: still alive! -> SIGTERM
> Sep 22 19:57:32 glu03 kernel: XFS (dm-15): Unmounting Filesystem
> Sep 22 19:57:35 glu03 bricks-vol-apps-data-brick1[44536]: [2016-09-22 17:57:35.186352] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-apps-data-posix: still alive! -> SIGTERM
> Sep 22 19:57:35 glu03 kernel: XFS (dm-12): Unmounting Filesystem
> Sep 22 19:57:48 glu03 bricks-vol-search-data-brick1[40928]: [2016-09-22 17:57:48.674444] M [MSGID: 113075] [posix-helpers.c:1850:posix_health_check_thread_proc] 0-vol-search-data-posix: still alive! -> SIGTERM
> Sep 22 19:57:48 glu03 kernel: XFS (dm-13): Unmounting Filesystem

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users