Re: Problem with glusterd locks on gluster 3.6.1

Thanks, Atin. I'm not familiar with pulling patches from the review system, but I will try :)

On Fri, Jun 17, 2016 at 12:35 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:


On 06/16/2016 06:17 PM, Atin Mukherjee wrote:
>
>
> On 06/16/2016 01:32 PM, B.K.Raghuram wrote:
>> Thanks a lot Atin,
>>
>> The problem is that we are using a forked version of 3.6.1, which has
>> been modified to work with ZFS (for snapshots), but we do not have the
>> resources to port that over to later versions of gluster.
>>
>> Would you know of anyone who would be willing to take this on?!
>
> If you can cherry-pick the patches, apply them on your source, and
> rebuild it, I can point you to the patches, but you'd need to give me a
> day's time, as I have some other items on my plate to finish.


Here is the list of patches that need to be applied, in the following order:

http://review.gluster.org/9328
http://review.gluster.org/9393
http://review.gluster.org/10023
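
In case it helps, pulling a change from the review (Gerrit) system and
applying it locally usually looks something like the sketch below. The
anonymous fetch URL and the patchset number ("1") are assumptions on my
part; use the exact "Download" command shown on each review page for the
final patchset, and resolve any cherry-pick conflicts before moving on to
the next change:

    # apply the three changes, in the order listed above, onto the local tree
    cd /path/to/your/glusterfs-3.6.1-source
    for ref in refs/changes/28/9328/1 \
               refs/changes/93/9393/1 \
               refs/changes/23/10023/1
    do
        git fetch http://review.gluster.org/glusterfs "$ref" && \
        git cherry-pick FETCH_HEAD
    done

Once all three apply cleanly, rebuild and reinstall the same way you built
the forked 3.6.1.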

>
> ~Atin
>>
>> Regards,
>> -Ram
>>
>> On Thu, Jun 16, 2016 at 11:02 AM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>>
>>
>>
>>     On 06/16/2016 10:49 AM, B.K.Raghuram wrote:
>>     >
>>     >
>>     > On Wed, Jun 15, 2016 at 5:01 PM, Atin Mukherjee <amukherj@xxxxxxxxxx> wrote:
>>     >
>>     >
>>     >
>>     >     On 06/15/2016 04:24 PM, B.K.Raghuram wrote:
>>     >     > Hi,
>>     >     >
>>     >     > We're using gluster 3.6.1, and we periodically find that gluster commands
>>     >     > fail, saying that it could not get the lock on one of the brick machines.
>>     >     > The logs on that machine then say something like:
>>     >     >
>>     >     > [2016-06-15 08:17:03.076119] E
>>     >     > [glusterd-op-sm.c:3058:glusterd_op_ac_lock] 0-management: Unable to
>>     >     > acquire lock for vol2
>>     >
>>     >     This can happen if concurrent volume operations are run. Do you have
>>     >     any script that checks volume status on an interval from all the
>>     >     nodes? If so, this is expected behavior.
>>     >
>>     >
>>     > Yes, I do have a couple of scripts that check on volume and quota
>>     > status. Given this, I do get an "Another transaction is in progress..."
>>     > message, which is OK. The problem is that sometimes I get the
>>     > volume-lock-held message, which never goes away. This sometimes results
>>     > in glusterd consuming a lot of memory and CPU, and the problem can only
>>     > be fixed with a reboot. The log files are huge, so I'm not sure if it's
>>     > OK to attach them to an email.
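
To illustrate the kind of overlap being described, a periodic check of
that sort boils down to something like the sketch below (the volume name
matches the log above, but the interval and the exact commands are
placeholders):

    # run on every node from cron or a loop; when two nodes happen to fire
    # at the same moment, one of them sees "Another transaction is in
    # progress..." because glusterd holds a per-volume lock across the
    # cluster for each operation
    while true
    do
        gluster volume status vol2
        gluster volume quota vol2 list
        sleep 60
    done

Whether the checks collide is purely a matter of timing, which is why the
failures show up only periodically; a lock that never goes away is the
separate stale-lock problem discussed below.
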
>>
>>     OK, so this is known. We have fixed a lot of stale-lock issues in the 3.7
>>     branch, and some of them, if not all, were also backported to the 3.6
>>     branch. The issue is that you are using 3.6.1, which is quite old. If you
>>     can upgrade to the latest 3.7 release, or at worst the latest 3.6, I am
>>     confident this will go away.
>>
>>     ~Atin
>>     >
>>     >     >
>>     >     > After some time, glusterd then seems to give up and die.
>>     >
>>     >     Do you mean glusterd shuts down or segfaults? If so, I am more
>>     >     interested in analyzing that part. Could you provide us the glusterd
>>     >     log and the cmd_history log file, along with the core (in case of a
>>     >     SEGV), from all the nodes for further analysis?
>>     >
>>     >
>>     > There is no segfault. glusterd just shuts down. As I said above,
>>     > sometimes this happens and sometimes it just continues to hog a lot of
>>     > memory and CPU..
>>     >
>>     >
>>     >     >
>>     >     > Interestingly, I also find the following line at the beginning of
>>     >     > etc-glusterfs-glusterd.vol.log, and I don't know if this has any
>>     >     > significance to the issue:
>>     >     >
>>     >     > [2016-06-14 06:48:57.282290] I
>>     >     > [glusterd-store.c:2063:glusterd_restore_op_version] 0-management:
>>     >     > Detected new install. Setting op-version to maximum : 30600
>>     >     >
>>     >
>>     >
>>     > What does this line signify?
>>
>>
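
As far as I know, the "Detected new install" line just means that glusterd
did not find a previously stored op-version in its working directory when
it started, so it treated the node as a fresh install and recorded the
highest op-version this build supports (30600 corresponds to the 3.6
release line). One way to see what is persisted, assuming the default
glusterd working directory:

    cat /var/lib/glusterd/glusterd.info
    # expected to contain, among other lines, something like:
    #   operating-version=30600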

_______________________________________________
Gluster-users mailing list
Gluster-users@xxxxxxxxxxx
http://www.gluster.org/mailman/listinfo/gluster-users
