On 05/04/2016 06:18 PM, ABHISHEK PALIWAL wrote:
> I am talking about the time taken by GlusterD to mark the process
> offline, because here GlusterD is responsible for marking the brick
> online/offline.
>
> Is it configurable?

No, there is no such configuration.

> On Wed, May 4, 2016 at 5:53 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>
> Abhishek,
>
> See the response inline.
>
> On 05/04/2016 05:43 PM, ABHISHEK PALIWAL wrote:
> > Hi Atin,
> >
> > Please reply: is there any configurable timeout parameter for the
> > brick process to go offline which we can increase?
> >
> > Regards,
> > Abhishek
> >
> > On Thu, Apr 21, 2016 at 12:34 PM, ABHISHEK PALIWAL
> > <abhishpaliwal@xxxxxxxxx> wrote:
> >
> > Hi Atin,
> >
> > Please answer the following doubts as well:
> >
> > 1. If there is a temporary glitch in the network, will that affect
> > the gluster brick process in any way? Is there any timeout for the
> > brick process to go offline in case of a glitch in the network?
>
> If there is a disconnect, GlusterD will receive it and mark the brick
> as disconnected even if the brick process is online. So the answer to
> this question is both yes and no. From the process perspective the
> bricks are still up, but not to the other components/layers, and that
> may impact operations (both mgmt and I/O, given there is a disconnect
> between client and brick processes too).
>
> > 2. Is there any configurable timeout parameter which we can
> > increase?
>
> I don't get this question. What timeout are you talking about?
>
> > 3. Brick and glusterd are connected by a unix domain socket. It is
> > just a local socket, so why does it disconnect in the logs below?
>
> This is not true, it's over a TCP socket.
>
> > 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
> > [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
> > Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
> > 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
> > [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
> > brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
> >
> > Regards,
> > Abhishek
> >
> > On Tue, Apr 19, 2016 at 1:12 PM, ABHISHEK PALIWAL
> > <abhishpaliwal@xxxxxxxxx> wrote:
> >
> > Hi Atin,
> >
> > Thanks.
> >
> > I have more doubts here.
> >
> > Brick and glusterd are connected by a unix domain socket. It is just
> > a local socket, so why does it disconnect in the logs below?
> >
> > 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
> > [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
> > Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
> > 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
> > [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
> > brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
> >
> > Regards,
> > Abhishek
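A note on checking this yourself: a brick can be alive as a process
while glusterd has already marked it stopped. Below is a minimal,
untested sketch that compares the two views; the pid file path is
copied from the glusterfsd arguments quoted later in this thread, so
adjust names and paths for your own deployment.

    # What glusterd believes about the brick:
    gluster volume status c_glusterfs

    # What the OS says about the brick process; the pid file path is
    # taken from the glusterfsd command line in this thread.
    PIDFILE=/system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "brick process $(cat "$PIDFILE") is alive"
    else
        echo "no running brick process"
    fi

If the process is alive but the status output shows the brick offline,
the disconnect happened at the RPC layer, which is exactly the
yes-and-no situation described above.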
> >
> > On Fri, Apr 15, 2016 at 9:14 AM, Atin Mukherjee
> > <amukherj@xxxxxxxxxx> wrote:
> >
> > On 04/14/2016 04:07 PM, ABHISHEK PALIWAL wrote:
> > >
> > > On Thu, Apr 14, 2016 at 2:33 PM, Atin Mukherjee
> > > <amukherj@xxxxxxxxxx> wrote:
> > >
> > > On 04/05/2016 03:35 PM, ABHISHEK PALIWAL wrote:
> > > >
> > > > On Tue, Apr 5, 2016 at 2:22 PM, Atin Mukherjee
> > > > <amukherj@xxxxxxxxxx> wrote:
> > > >
> > > > On 04/05/2016 01:04 PM, ABHISHEK PALIWAL wrote:
> > > > > Hi Team,
> > > > >
> > > > > We are using Gluster 3.7.6 and facing one problem in which a
> > > > > brick is not coming online after restarting the board.
> > > > >
> > > > > To understand our setup, please look at the following steps:
> > > > > 1. We have two boards, A and B, on which a Gluster volume is
> > > > > running in replicated mode, with one brick on each board.
> > > > > 2. The Gluster mount point is present on Board A and is shared
> > > > > between a number of processes.
> > > > > 3. Until now our volume is in sync and everything is working fine.
> > > > > 4. Now we have a test case in which we stop glusterd, reboot
> > > > > Board B, and when the board comes up, start glusterd on it again.
> > > > > 5. We repeated step 4 multiple times to check the reliability
> > > > > of the system.
> > > > > 6. After step 4, sometimes the system comes up in a working
> > > > > state (i.e. in sync), but sometimes the brick of Board B is
> > > > > present in the "gluster volume status" output yet does not come
> > > > > online even after waiting for more than a minute.
> > > >
> > > > As I mentioned in another email thread, until and unless the log
> > > > shows evidence that there was a reboot, nothing can be concluded.
> > > > The last log you shared with us a few days back didn't give any
> > > > indication that the brick process wasn't running.
> > > >
> > > > How can we identify that the brick process is running from the
> > > > brick logs?
> > > >
> > > > > 7. While step 4 is executing, some processes on Board A start
> > > > > accessing files from the Gluster mount point.
> > > > >
> > > > > As a solution to bring this brick online, we found some
> > > > > existing issues on the gluster mailing list suggesting the use
> > > > > of "gluster volume start <vol_name> force" to take the brick
> > > > > from 'offline' to 'online'.
> > > > >
> > > > > If we use the "gluster volume start <vol_name> force" command,
> > > > > it will kill the existing volume process and start a new
> > > > > process. What will happen to other processes that are accessing
> > > > > the same volume at the time the volume process is killed by
> > > > > this command internally? Will it cause any failure in those
> > > > > processes?
> > > >
> > > > This is not true, volume start force will start the brick
> > > > processes only if they are not running. Running brick processes
> > > > will not be interrupted.
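One way to verify this behaviour on a test setup is to record the brick
pid before and after a force start; if the brick really was running,
the pid should survive. A rough, untested sketch, again reusing the pid
file path from this thread:

    PIDFILE=/system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid

    before=$(cat "$PIDFILE" 2>/dev/null)
    gluster volume start c_glusterfs force
    sleep 2    # give glusterd a moment to (re)spawn the brick if needed
    after=$(cat "$PIDFILE" 2>/dev/null)

    if [ -n "$before" ] && [ "$before" = "$after" ]; then
        echo "brick left untouched (pid $before)"
    else
        echo "brick (re)started: pid ${before:-none} -> ${after:-none}"
    fi

An unchanged pid would support the statement above; a changed pid, as
reported below, would suggest the brick was not actually running (or
not considered running) when the force start was issued.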
> > > >
> > > > We have tried this and checked the pid of the process before the
> > > > force start and after the force start: the pid had changed after
> > > > the force start.
> > > >
> > > > Please find the logs at the time of failure attached once again,
> > > > with log-level=debug.
> > > >
> > > > If you can point to the exact line in the brick log file where
> > > > you can see that the brick process is running, please give me the
> > > > line number in that file.
> > >
> > > Here is the sequence in which glusterd and the respective brick
> > > process are restarted.
> > >
> > > 1. glusterd restart trigger - line number 1014 in glusterd.log:
> > >
> > > [2016-04-03 10:12:29.051735] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > > 0-/usr/sbin/glusterd: Started running /usr/sbin/glusterd version 3.7.6
> > > (args: /usr/sbin/glusterd -p /var/run/glusterd.pid --log-level DEBUG)
> > >
> > > 2. brick start trigger - line number 190 in opt-lvmdir-c2-brick.log:
> > >
> > > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version
> > > 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
> > > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
> > > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
> > > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
> > > --brick-name /opt/lvmdir/c2/brick -l
> > > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
> > > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
> > > --brick-port 49329 --xlator-option
> > > c_glusterfs-server.listen-port=49329)
> > >
> > > 3. The following log indicates that the brick is up and has now
> > > started. Refer to line 16123 in glusterd.log:
> > >
> > > [2016-04-03 10:14:25.336855] D [MSGID: 0]
> > > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
> > > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
> > >
> > > This clearly indicates that the brick is up and running, as after
> > > that I do not see any disconnect event processed by glusterd for
> > > the brick process.
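For anyone trying to reproduce this analysis, the same sequence can be
pulled out of the logs mechanically. A rough sketch, assuming the log
file names used in this thread; the exact glusterd log path varies
between installations, so treat it as a placeholder:

    GLOG=/var/log/glusterfs/glusterd.log                    # placeholder path
    BLOG=/var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log  # from the -l option above

    # glusterd and brick (re)start events both log MSGID 100030
    grep -n 'MSGID: 100030' "$GLOG" "$BLOG"

    # connect/disconnect events for the brick as seen by glusterd
    grep -nE 'Connected to|has disconnected from' "$GLOG"

Comparing the timestamp of the last "Started running" line against the
last disconnect line gives the ordering described above.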
> > >
> > > Thanks for replying descriptively, but please also clear up some
> > > more doubts:
> > >
> > > 1. At this moment, 10:14:25, the brick is available because we have
> > > removed the brick and added it again to bring it online. The
> > > following are the logs from the cmd-history.log file of 000300:
> > >
> > > [2016-04-03 10:14:21.446570]  : volume status : SUCCESS
> > > [2016-04-03 10:14:21.665889]  : volume remove-brick c_glusterfs
> > > replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > > [2016-04-03 10:14:21.764270]  : peer detach 10.32.1.144 : SUCCESS
> > > [2016-04-03 10:14:23.060442]  : peer probe 10.32.1.144 : SUCCESS
> > > [2016-04-03 10:14:25.649525]  : volume add-brick c_glusterfs
> > > replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > >
> > > Also, 10:12:29 was the last reboot time before this failure, so I
> > > totally agree with what you said earlier.
> > >
> > > 2. As you said, glusterd restarted at 10:12:29; then why are we not
> > > getting the 'brick start trigger' related logs (like the ones
> > > below) between the 10:12:29 and 10:14:25 timestamps, which is
> > > roughly a two-minute interval?
> >
> > So here is the culprit:
> >
> > 1667 [2016-04-03 10:12:32.984331] I [MSGID: 106005]
> > [glusterd-handler.c:4908:__glusterd_brick_rpc_notify] 0-management:
> > Brick 10.32.1.144:/opt/lvmdir/c2/brick has disconnected from glusterd.
> > 1668 [2016-04-03 10:12:32.984366] D [MSGID: 0]
> > [glusterd-utils.c:4872:glusterd_set_brick_status] 0-glusterd: Setting
> > brick 10.32.1.144:/opt/lvmdir/c2/brick status to stopped
> >
> > GlusterD received a disconnect event for this brick process and
> > marked it as stopped. This could happen for two reasons: 1. the brick
> > process goes down, or 2. a network issue. In this case it's the
> > latter, I believe, since the brick process was running at that time.
> > I'd request you to check this from the N/W side.
> >
> > > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version
> > > 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
> > > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
> > > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
> > > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
> > > --brick-name /opt/lvmdir/c2/brick -l
> > > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
> > > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
> > > --brick-port 49329 --xlator-option
> > > c_glusterfs-server.listen-port=49329)
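For reference, the workaround recorded in the cmd-history.log entries
above corresponds to the following command sequence on Board A
(000300), reconstructed from those log lines. Note that it temporarily
drops the volume to replica 1 before re-adding the brick:

    gluster volume remove-brick c_glusterfs replica 1 \
            10.32.1.144:/opt/lvmdir/c2/brick force
    gluster peer detach 10.32.1.144
    gluster peer probe 10.32.1.144
    gluster volume add-brick c_glusterfs replica 2 \
            10.32.1.144:/opt/lvmdir/c2/brick force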
> > >
> > > 3. We are continuously checking the brick status during the above
> > > time window using "gluster volume status"; refer to the
> > > cmd-history.log file from 000300.
> > >
> > > In the glusterd.log file we are also getting the logs below:
> > >
> > > [2016-04-03 10:12:31.771051] D [MSGID: 0]
> > > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
> > > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
> > >
> > > [2016-04-03 10:12:32.981152] D [MSGID: 0]
> > > [glusterd-handler.c:4897:__glusterd_brick_rpc_notify] 0-management:
> > > Connected to 10.32.1.144:/opt/lvmdir/c2/brick
> > >
> > > That is two times between 10:12:29 and 10:14:25, and you said such
> > > logs "clearly indicate that the brick is up and running"; then why
> > > is the brick not online in the "gluster volume status" command?
> > >
> > > [2016-04-03 10:12:33.990487]  : volume status : SUCCESS
> > > [2016-04-03 10:12:34.007469]  : volume status : SUCCESS
> > > [2016-04-03 10:12:35.095918]  : volume status : SUCCESS
> > > [2016-04-03 10:12:35.126369]  : volume status : SUCCESS
> > > [2016-04-03 10:12:36.224018]  : volume status : SUCCESS
> > > [2016-04-03 10:12:36.251032]  : volume status : SUCCESS
> > > [2016-04-03 10:12:37.352377]  : volume status : SUCCESS
> > > [2016-04-03 10:12:37.374028]  : volume status : SUCCESS
> > > [2016-04-03 10:12:38.446148]  : volume status : SUCCESS
> > > [2016-04-03 10:12:38.468860]  : volume status : SUCCESS
> > > [2016-04-03 10:12:39.534017]  : volume status : SUCCESS
> > > [2016-04-03 10:12:39.553711]  : volume status : SUCCESS
> > > [2016-04-03 10:12:40.616610]  : volume status : SUCCESS
> > > [2016-04-03 10:12:40.636354]  : volume status : SUCCESS
> > > ......
> > > ......
> > > ......
> > > [2016-04-03 10:14:21.446570]  : volume status : SUCCESS
> > > [2016-04-03 10:14:21.665889]  : volume remove-brick c_glusterfs
> > > replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > > [2016-04-03 10:14:21.764270]  : peer detach 10.32.1.144 : SUCCESS
> > > [2016-04-03 10:14:23.060442]  : peer probe 10.32.1.144 : SUCCESS
> > > [2016-04-03 10:14:25.649525]  : volume add-brick c_glusterfs
> > > replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > >
> > > In the above logs we are continuously checking the brick status,
> > > but when we do not find the brick 'online' even after ~2 minutes,
> > > we remove it and add it again to bring it online:
> > >
> > > [2016-04-03 10:14:21.665889]  : volume remove-brick c_glusterfs
> > > replica 1 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > > [2016-04-03 10:14:21.764270]  : peer detach 10.32.1.144 : SUCCESS
> > > [2016-04-03 10:14:23.060442]  : peer probe 10.32.1.144 : SUCCESS
> > > [2016-04-03 10:14:25.649525]  : volume add-brick c_glusterfs
> > > replica 2 10.32.1.144:/opt/lvmdir/c2/brick force : SUCCESS
> > >
> > > That is why in the logs we are getting the 'brick start trigger'
> > > logs at timestamp 10:14:25:
> > >
> > > [2016-04-03 10:14:25.268833] I [MSGID: 100030] [glusterfsd.c:2318:main]
> > > 0-/usr/sbin/glusterfsd: Started running /usr/sbin/glusterfsd version
> > > 3.7.6 (args: /usr/sbin/glusterfsd -s 10.32.1.144 --volfile-id
> > > c_glusterfs.10.32.1.144.opt-lvmdir-c2-brick -p
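Incidentally, the check-for-about-two-minutes-then-recover cycle
described above can be written as a small polling loop. An
illustrative, untested sketch: it assumes the Online column of
"gluster volume status" prints Y for a running brick, and the column
layout can differ between releases:

    # Poll for up to ~2 minutes for the brick to report online.
    for i in $(seq 1 60); do
        if gluster volume status c_glusterfs 2>/dev/null \
               | grep '/opt/lvmdir/c2/brick' | grep -q ' Y '; then
            echo "brick is online"
            break
        fi
        sleep 2
    done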
> > > /system/glusterd/vols/c_glusterfs/run/10.32.1.144-opt-lvmdir-c2-brick.pid
> > > -S /var/run/gluster/697c0e4a16ebc734cd06fd9150723005.socket
> > > --brick-name /opt/lvmdir/c2/brick -l
> > > /var/log/glusterfs/bricks/opt-lvmdir-c2-brick.log --xlator-option
> > > *-posix.glusterd-uuid=2d576ff8-0cea-4f75-9e34-a5674fbf7256
> > > --brick-port 49329 --xlator-option
> > > c_glusterfs-server.listen-port=49329)
> > >
> > > Regards,
> > > Abhishek
> >
> > Please note that all the logs referred to and pasted above are from
> > 002500.
> >
> > ~Atin
> > >
> > > 002500 - Board B, whose brick is offline
> > > 000300 - Board A logs
> > > > >
> > > > > *Question: What could be contributing to the brick going
> > > > > offline?*
> > > > >
> > > > > --
> > > > > Regards
> > > > > Abhishek Paliwal
> > > > >
> > > > > _______________________________________________
> > > > > Gluster-devel mailing list
> > > > > Gluster-devel@xxxxxxxxxxx
> > > > > http://www.gluster.org/mailman/listinfo/gluster-devel
> >
> > --
> > Regards
> > Abhishek Paliwal
>
> --
> Regards
> Abhishek Paliwal

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users