Hi Tomoaki,

To keep the 'cluster' from going into an undefined state, avoid issuing peer addition/deletion commands (peer probe, peer detach) at the same time as volume operations (create, add-brick, stop, delete, etc.). Part of the problem is that every volume operation is carried out so that all the peers in the cluster are kept up to date on its progress. Adding new members while a volume is being modified leaves the new peer in an awkward position: it does not share the other peers' view of the ongoing volume operation. That, in summary, is the problem.

thanks,
kp

On 08/23/2011 10:28 AM, Tomoaki Sato wrote:
> Hi kp,
>
> I'm looking forward to the next version of gluster.
> Do you have any recommendations for avoiding the issue for now?
> The workaround I mentioned below has turned out not to work.
>
>>> I've noticed that the following commands are stable.
>>>
>>> on baz-2-private through baz-5-private:
>>> # <wait baz-1-private appears on the DNS>
>>> # ssh baz-1-private gluster peer probe <me>
>>> # ssh baz-1-private gluster volume add-brick baz <me>:/mnt/brick
>>> # <register me to the DNS>
>
> Thanks,
> tomo
>
> (2011/08/22 15:26), krish wrote:
>> Hi Tomoaki,
>>
>> Issuing peer-related commands like 'peer probe' and 'peer detach'
>> concurrently with volume operations can cause the 'cluster' to get
>> into an undefined state. We are working on making the glusterd
>> cluster handle concurrent commands robustly. See
>> http://bugs.gluster.com/show_bug.cgi?id=3320 for updates on this issue.
>>
>> thanks,
>> kp
>>
>> On 08/22/2011 10:39 AM, Tomoaki Sato wrote:
>>> Hi kp,
>>>
>>> I've reproduced the issue in my environment.
>>> Please find the attached taz.
>>>
>>> There are 5 VMs, baz-1-private through baz-5-private.
>>> On each VM, the following commands are issued concurrently.
>>>
>>> on baz-1-private:
>>> # gluster volume create baz baz-1-private:/mnt/brick
>>> # gluster volume start baz
>>> # <register baz-1-private to DNS>
>>>
>>> on baz-2-private through baz-5-private:
>>> # <wait baz-1-private appears on the DNS>
>>> # ssh baz-1-private gluster peer probe <me>
>>> # gluster volume add-brick baz <me>:/mnt/brick
>>> # <register me to the DNS>
>>>
>>> <me> = baz-n-private (n: 2,3,4,5)
>>>
>>> I've noticed that the following commands are stable.
>>>
>>> on baz-2-private through baz-5-private:
>>> # <wait baz-1-private appears on the DNS>
>>> # ssh baz-1-private gluster peer probe <me>
>>> # ssh baz-1-private gluster volume add-brick baz <me>:/mnt/brick
>>> # <register me to the DNS>
>>>
>>> thanks,
>>>
>>> tomo
>>>
>>>
>>> (2011/08/20 14:37), krish wrote:
>>>> Hi Tomoaki,
>>>>
>>>> Can you attach the glusterd log files of the peers seeing the problem?
>>>> Restarting glusterd(s) should solve the problem. Send me the log files
>>>> and I'll let you know if anything else can be done to resolve it.
>>>>
>>>> thanks,
>>>> kp
>>>>
>>>>
>>>> On 08/18/2011 07:37 AM, Tomoaki Sato wrote:
>>>>> Hi,
>>>>>
>>>>> baz-X-private and baz-Y-private, two newly probed peers, each issued
>>>>> 'gluster volume add-brick baz baz-{X|Y}-private:/mnt/brick' within a
>>>>> very short time of each other.
>>>>> Neither 'add-brick' returned the "Add Brick successful" message.
>>>>> After that, 'add-brick' returns "Another operation is in progress,
>>>>> please retry after some time" on both peers every time.
>>>>> How should I clear this situation?
>>>>>
>>>>> Best,
>>>>>
>>>>> tomo
>>>>> _______________________________________________
>>>>> Gluster-users mailing list
>>>>> Gluster-users at gluster.org
>>>>> http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
>>>>
>>>
>>
>
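One way to follow kp's advice of never overlapping peer commands with volume operations is to serialize each join. What follows is a minimal sketch, not something proposed in the thread. It assumes every peer probe and add-brick is funneled through baz-1-private (as in the ssh variant Tomoaki describes), that util-linux flock(1) is installed there, and that the gluster CLI exits non-zero when a command fails. The function name join_peer, the lock path /tmp/gluster-join.lock and the GLUSTER override variable are all hypothetical. The sketch only prevents overlapping joins. It does not clear a cluster that is already stuck; for that, kp suggested restarting glusterd.

```shell
# Hypothetical helper, meant to live on baz-1-private so every join goes
# through one node. Assumptions: util-linux flock(1) is installed, and the
# gluster CLI exits non-zero when a command fails.
LOCKFILE=${LOCKFILE:-/tmp/gluster-join.lock}   # hypothetical lock path
GLUSTER=${GLUSTER:-gluster}                    # CLI to call (overridable)

# join_peer HOST VOLUME BRICK_PATH
# Probes HOST, then adds HOST:BRICK_PATH to VOLUME, all while holding an
# exclusive lock, so two joining peers never issue peer and volume
# commands at the same time.
join_peer() {
    host=$1
    volume=$2
    brick=$3
    (
        # Block until no other join_peer call holds the lock on fd 9.
        flock 9 || exit 1
        # Stop here if the probe fails; do not add a brick for an unknown peer.
        "$GLUSTER" peer probe "$host" || exit 1
        "$GLUSTER" volume add-brick "$volume" "$host:$brick"
    ) 9>"$LOCKFILE"   # lock is released when the subshell closes fd 9
}
```

On baz-1-private, `join_peer baz-2-private baz /mnt/brick` would probe one peer and add its brick while holding the lock. A joining peer could trigger this over ssh through a small wrapper script (also hypothetical) that defines join_peer and calls it with its arguments. The lock only helps if every membership change and volume operation goes through baz-1-private. A gluster command run on any other node bypasses it.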