Re: question on time-out parameters

Pranith Kumar Karampuri <pkarampu@xxxxxxxxxx> · Wed, 1 Aug 2012 05:13:51 -0400 (EDT)

Jules,
    When a frame hits its time-out, 'rpc/rpc-lib/src/rpc-clnt.c:138:call_bail (void *data)' is triggered.
When the client observes a network disconnection (ping-timer expiry etc.), it triggers 'rpc/rpc-lib/src/rpc-clnt.c:341:saved_frames_unwind (struct saved_frames *saved_frames)'. When a node goes down, the ping timer expires and the pending frames are then unwound, which takes at most ~42 seconds. So in the VM scenario it won't hang for 30 minutes.
To answer your actual question, why such a big frame timeout: Afr takes entry-locks while performing self-heals, and these block other entry fops such as create and delete. The timeout is set large enough that those blocked entry operations can still succeed once the locks are released.
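
To make the difference between the two timers concrete, here is a rough model in Python. It is only an illustration of the behaviour described above, not GlusterFS code: the function name and its arguments are made up for this sketch, and the only values taken from the source are the defaults quoted in this thread (42 and 1800 seconds).

```python
# Illustrative model only; none of these names exist in GlusterFS.
PING_TIMEOUT = 42      # seconds, default "ping-timeout" quoted in this thread
FRAME_TIMEOUT = 1800   # seconds, default "frame-timeout" quoted in this thread


def seconds_until_unwind(peer_reachable,
                         ping_timeout=PING_TIMEOUT,
                         frame_timeout=FRAME_TIMEOUT):
    """Rough upper bound on how long a pending frame waits before it fails.

    peer_reachable=True models a live connection where the reply is merely
    delayed (for example, a fop blocked behind an entry-lock).
    peer_reachable=False models a node that went down.
    """
    if peer_reachable:
        # Pings keep succeeding, so only the frame timeout (call_bail) applies.
        return frame_timeout
    # Node is down: the ping timer expires first, the client disconnects and
    # the saved frames are unwound. The frame timeout is still an upper bound.
    return min(ping_timeout, frame_timeout)


if __name__ == "__main__":
    print("node down:     ", seconds_until_unwind(peer_reachable=False), "s")
    print("reply delayed: ", seconds_until_unwind(peer_reachable=True), "s")
```

In this model a powered-off node costs roughly the ping timeout, while the 30-minute figure only matters when the server is alive but slow to reply.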

Afr used to take a lock on the entire file to perform data-self-heal on a regular file; we managed to remove that. We are working on doing the same for entry-self-heal. Once that happens, we will be in a good position to lower these values.

Pranith.

----- Original Message -----
From: "Jules Wang" <lancelotds@xxxxxxx>
To: "devel" <gluster-devel@xxxxxxxxxx>
Sent: Wednesday, August 1, 2012 1:55:47 PM
Subject: question on time-out parameters

Hi all,
When I was tracking the bug https://bugzilla.redhat.com/show_bug.cgi?id=794699 

I noticed that the default value of "ping-timeout" was 42 and the default value of "frame-timeout" was 1800 (30 minutes) (in xlators/protocol/client/src/client.c).

When a node is down (e.g. powered off), the volume will be out of service for a long time. If a VM is running on the volume, it will probably crash.

So I wonder why we set such large values for these parameters?

Best Regards. 

Jules Wang 

_______________________________________________
Gluster-devel mailing list
Gluster-devel@xxxxxxxxxx
https://lists.nongnu.org/mailman/listinfo/gluster-devel