On Mon, 2015-08-10 at 22:53 +0530, Atin Mukherjee wrote:
[snip]
> stat("/sys/fs/selinux", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
> brk(0) = 0x8db000
> brk(0x8fc000) = 0x8fc000
> mkdir("test", 0777
> Can you also collect the statedump of all the brick processes when the command is hung?
>
> + Ravi, could you check this?

I ran the command but I could not find where it put the output:
[root@gluster1a-1 ~]# gluster volume statedump callrec all
volume statedump: success
[root@gluster1a-1 ~]# gluster volume info callrec

Volume Name: callrec
Type: Replicate
Volume ID: a39830b7-eddb-4061-b381-39411274131a
Status: Started
Number of Bricks: 1 x 4 = 4
Transport-type: tcp
Bricks:
Brick1: gluster1a-1:/data/brick/callrec
Brick2: gluster1b-1:/data/brick/callrec
Brick3: gluster2a-1:/data/brick/callrec
Brick4: gluster2b-1:/data/brick/callrec
Options Reconfigured:
performance.flush-behind: off
[root@gluster1a-1 ~]# gluster volume status callrec
Status of volume: callrec
Gluster process                            Port   Online  Pid
------------------------------------------------------------------------------
Brick gluster1a-1:/data/brick/callrec      49153  Y       29041
Brick gluster1b-1:/data/brick/callrec      49153  Y       31260
Brick gluster2a-1:/data/brick/callrec      49153  Y       31585
Brick gluster2b-1:/data/brick/callrec      49153  Y       12153
NFS Server on localhost                    2049   Y       29733
Self-heal Daemon on localhost              N/A    Y       29741
NFS Server on gluster1b-1                  2049   Y       31872
Self-heal Daemon on gluster1b-1            N/A    Y       31882
NFS Server on gluster2a-1                  2049   Y       32216
Self-heal Daemon on gluster2a-1            N/A    Y       32226
NFS Server on gluster2b-1                  2049   Y       12752
Self-heal Daemon on gluster2b-1            N/A    Y       12762

Task Status of Volume callrec
------------------------------------------------------------------------------
There are no active volume tasks

[root@gluster1a-1 ~]# ls -l /tmp
total 144
drwx------. 3 root root    16 Aug  8 22:20 systemd-private-Dp10Pz
-rw-------. 1 root root  5818 Jul 31 06:39 yum_save_tx.2015-07-31.06-39.JCvHd5.yumtx
-rw-------. 1 root root  5818 Aug  1 06:58 yum_save_tx.2015-08-01.06-58.wBytr2.yumtx
-rw-------. 1 root root  5818 Aug  2 05:18 yum_save_tx.2015-08-02.05-18.AXIFSe.yumtx
-rw-------. 1 root root  5818 Aug  3 07:15 yum_save_tx.2015-08-03.07-15.EDd8rg.yumtx
-rw-------. 1 root root  5818 Aug  4 03:48 yum_save_tx.2015-08-04.03-48.XE513B.yumtx
-rw-------. 1 root root  5818 Aug  5 09:03 yum_save_tx.2015-08-05.09-03.mX8xXF.yumtx
-rw-------. 1 root root 28869 Aug  6 06:39 yum_save_tx.2015-08-06.06-39.166wJX.yumtx
-rw-------. 1 root root 28869 Aug  7 07:20 yum_save_tx.2015-08-07.07-20.rLqJnT.yumtx
-rw-------. 1 root root 28869 Aug  8 08:29 yum_save_tx.2015-08-08.08-29.KKaite.yumtx
[root@gluster1a-1 ~]#

Where should I find the output of the statedump command?
Cheers,
Kingsley.
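
Brick statedumps are not normally written to /tmp. As a rough sketch, and assuming default settings: the dumps usually go to /var/run/gluster (or to wherever the server.statedump-path volume option points), in files named after the brick path and the brick PID. The helper name statedump_glob below is hypothetical, the file naming is an assumption based on the documented convention, and the --print-statedumpdir option should be checked against the installed gluster version before relying on it.

```shell
#!/bin/sh
# Sketch: build the glob that a brick's statedump file would be expected to match.
# Assumption: files are named <brick path with "/" turned into "-">.<pid>.dump.<timestamp>

statedump_glob() {
    brick_path=$1   # e.g. /data/brick/callrec
    brick_pid=$2    # Pid column from "gluster volume status"
    # Drop the leading slash, then turn the remaining slashes into dashes.
    name=$(printf '%s' "${brick_path#/}" | tr '/' '-')
    printf '%s.%s.dump*\n' "$name" "$brick_pid"
}

# Ask the CLI for its statedump directory; fall back to the assumed default
# if the command is missing or prints nothing.
dump_dir=$(gluster --print-statedumpdir 2>/dev/null) || dump_dir=
dump_dir=${dump_dir:-/var/run/gluster}

echo "look for: $dump_dir/$(statedump_glob /data/brick/callrec 29041)"
```

The dump is written on the host that runs each brick process, so the check has to be repeated on all four servers with the PID shown for that brick.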
>> >> >
>> >> > Then ... do I need to run something on one of the bricks while strace is
>> >> > running?
>> >> >
>> >> > Cheers,
>> >> > Kingsley.
>> >> >
>> >> >
>> >> > > >
>> >> > > > [root@gluster1b-1 ~]# gluster volume heal callrec info
>> >> > > > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > <gfid:164f888f-2049-49e6-ad26-c758ee091863>
>> >> > > > /recordings/834723/14391 - Possibly undergoing heal
>> >> > > >
>> >> > > > <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
>> >> > > > <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
>> >> > > > <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
>> >> > > > <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
>> >> > > > <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
>> >> > > > Number of entries: 7
>> >> > > >
>> >> > > > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > Number of entries: 0
>> >> > > >
>> >> > > > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > <gfid:e280b40c-d8b7-43c5-9da7-4737054d7a7f>
>> >> > > > <gfid:164f888f-2049-49e6-ad26-c758ee091863>
>> >> > > > <gfid:650efeca-b45c-413b-acc3-f0a5853ccebd>
>> >> > > > <gfid:b1fbda4a-732f-4f5d-b5a1-8355d786073e>
>> >> > > > /recordings/834723/14391 - Possibly undergoing heal
>> >> > > >
>> >> > > > <gfid:edb74524-b4b7-4190-85e7-4aad002f6e7c>
>> >> > > > <gfid:9b8b8446-1e27-4113-93c2-6727b1f457eb>
>> >> > > > Number of entries: 7
>> >> > > >
>> >> > > > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > Number of entries: 0
>> >> > > >
>> >> > > >
>> >> > > > If I query each brick directly for the number of files/directories
>> >> > > > within that, I get 1731 on gluster1a-1 and gluster2a-1, but 1737 on the
>> >> > > > other two, using this command:
>> >> > > >
>> >> > > > # find /data/brick/callrec/recordings/834723/14391 -print | wc -l
>> >> > > >
>> >> > > > Cheers,
>> >> > > > Kingsley.
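
The counts above only show that the bricks disagree, not which entries are involved. A minimal sketch of one way to find out, assuming a listing has been saved on each brick with something like "cd /data/brick/callrec && find recordings/834723/14391 -print > NAME.list" and the files copied to one host. The function name only_in_second and the listing file names are hypothetical.

```shell
#!/bin/sh
# Sketch: print the paths that appear in the second listing but not in the first.

only_in_second() {
    a_sorted=$(mktemp) && b_sorted=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation order.
    LC_ALL=C sort "$1" > "$a_sorted"
    LC_ALL=C sort "$2" > "$b_sorted"
    # -1 drops lines unique to the first file, -3 drops lines common to both.
    LC_ALL=C comm -13 "$a_sorted" "$b_sorted"
    rm -f "$a_sorted" "$b_sorted"
}

# Example call (hypothetical file names):
#   only_in_second gluster1a-1.list gluster1b-1.list
```

Swapping the two arguments shows entries present only on the brick with the lower count, which is worth checking as well, since equal totals would not prove the listings match.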
>> >> > > >
>> >> > > > On Mon, 2015-08-10 at 11:05 +0100, Kingsley wrote:
>> >> > > > > Sorry for the blind panic - restarting the volume seems to have fixed
>> >> > > > > it.
>> >> > > > >
>> >> > > > > But then my next question - why is this necessary? Surely it undermines
>> >> > > > > the whole point of a high availability system?
>> >> > > > >
>> >> > > > > Cheers,
>> >> > > > > Kingsley.
>> >> > > > >
>> >> > > > > On Mon, 2015-08-10 at 10:53 +0100, Kingsley wrote:
>> >> > > > > > Hi,
>> >> > > > > >
>> >> > > > > > We have a 4 way replicated volume using gluster 3.6.3 on CentOS 7.
>> >> > > > > >
>> >> > > > > > Over the weekend I did a yum update on each of the bricks in turn, but
>> >> > > > > > now when clients (using fuse mounts) try to access the volume, it hangs.
>> >> > > > > > Gluster itself wasn't updated (we've disabled that repo so that we keep
>> >> > > > > > to 3.6.3 for now).
>> >> > > > > >
>> >> > > > > > This was what I did:
>> >> > > > > >
>> >> > > > > > * on first brick, "yum update"
>> >> > > > > > * reboot brick
>> >> > > > > > * watch "gluster volume status" on another brick and wait for it
>> >> > > > > >   to say all 4 bricks are online before proceeding to update the
>> >> > > > > >   next brick
>> >> > > > > >
>> >> > > > > > I was expecting the clients might pause 30 seconds while they notice a
>> >> > > > > > brick is offline, but then recover.
>> >> > > > > >
>> >> > > > > > I've tried re-mounting clients, but that hasn't helped.
>> >> > > > > >
>> >> > > > > > I can't see much data in any of the log files.
>> >> > > > > >
>> >> > > > > > I've tried "gluster volume heal callrec" but it doesn't seem to have
>> >> > > > > > helped.
>> >> > > > > >
>> >> > > > > > What shall I do next?
>> >> > > > > >
>> >> > > > > > I've pasted some stuff below in case any of it helps.
>> >> > > > > >
>> >> > > > > > Cheers,
>> >> > > > > > Kingsley.
>> >> > > > > >
>> >> > > > > > [root@gluster1b-1 ~]# gluster volume info callrec
>> >> > > > > >
>> >> > > > > > Volume Name: callrec
>> >> > > > > > Type: Replicate
>> >> > > > > > Volume ID: a39830b7-eddb-4061-b381-39411274131a
>> >> > > > > > Status: Started
>> >> > > > > > Number of Bricks: 1 x 4 = 4
>> >> > > > > > Transport-type: tcp
>> >> > > > > > Bricks:
>> >> > > > > > Brick1: gluster1a-1:/data/brick/callrec
>> >> > > > > > Brick2: gluster1b-1:/data/brick/callrec
>> >> > > > > > Brick3: gluster2a-1:/data/brick/callrec
>> >> > > > > > Brick4: gluster2b-1:/data/brick/callrec
>> >> > > > > > Options Reconfigured:
>> >> > > > > > performance.flush-behind: off
>> >> > > > > > [root@gluster1b-1 ~]#
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > [root@gluster1b-1 ~]# gluster volume status callrec
>> >> > > > > > Status of volume: callrec
>> >> > > > > > Gluster process                            Port   Online  Pid
>> >> > > > > > ------------------------------------------------------------------------------
>> >> > > > > > Brick gluster1a-1:/data/brick/callrec      49153  Y       6803
>> >> > > > > > Brick gluster1b-1:/data/brick/callrec      49153  Y       2614
>> >> > > > > > Brick gluster2a-1:/data/brick/callrec      49153  Y       2645
>> >> > > > > > Brick gluster2b-1:/data/brick/callrec      49153  Y       4325
>> >> > > > > > NFS Server on localhost                    2049   Y       2769
>> >> > > > > > Self-heal Daemon on localhost              N/A    Y       2789
>> >> > > > > > NFS Server on gluster2a-1                  2049   Y       2857
>> >> > > > > > Self-heal Daemon on gluster2a-1            N/A    Y       2814
>> >> > > > > > NFS Server on 88.151.41.100                2049   Y       6833
>> >> > > > > > Self-heal Daemon on 88.151.41.100          N/A    Y       6824
>> >> > > > > > NFS Server on gluster2b-1                  2049   Y       4428
>> >> > > > > > Self-heal Daemon on gluster2b-1            N/A    Y       4387
>> >> > > > > >
>> >> > > > > > Task Status of Volume callrec
>> >> > > > > > ------------------------------------------------------------------------------
>> >> > > > > > There are no active volume tasks
>> >> > > > > >
>> >> > > > > > [root@gluster1b-1 ~]#
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > [root@gluster1b-1 ~]# gluster volume heal callrec info
>> >> > > > > > Brick gluster1a-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > > > /to_process - Possibly undergoing heal
>> >> > > > > >
>> >> > > > > > Number of entries: 1
>> >> > > > > >
>> >> > > > > > Brick gluster1b-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > > > Number of entries: 0
>> >> > > > > >
>> >> > > > > > Brick gluster2a-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > > > /to_process - Possibly undergoing heal
>> >> > > > > >
>> >> > > > > > Number of entries: 1
>> >> > > > > >
>> >> > > > > > Brick gluster2b-1.dns99.co.uk:/data/brick/callrec/
>> >> > > > > > Number of entries: 0
>> >> > > > > >
>> >> > > > > > [root@gluster1b-1 ~]#
>> >> > > > > >
>> >> > > > > >
>> >> > > > > > _______________________________________________
>> >> > > > > > Gluster-users mailing list
>> >> > > > > > Gluster-users@xxxxxxxxxxx
>> >> > > > > > http://www.gluster.org/mailman/listinfo/gluster-users
>> >> > > > > >