Re: [PATCH v2] pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done

Boaz Harrosh <bharrosh@xxxxxxxxxxx> · Wed, 15 Jan 2014 01:41:56 +0200

On 01/15/2014 12:47 AM, Trond Myklebust wrote:
> 
> On Jan 14, 2014, at 17:43, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx> wrote:
> 
>>
>> On Jan 14, 2014, at 17:21, Boaz Harrosh <bharrosh@xxxxxxxxxxx> wrote:
>>
>>> On 01/14/2014 09:05 PM, Trond Myklebust wrote:
>>>> On Tue, 2014-01-14 at 17:32 +0200, Boaz Harrosh wrote:
>>>>>
>>>>
>>>> For the default mount option of 'timeo=600', and the default #define
>>>> NFS4_POLL_RETRY_MIN==HZ/10, this means we can end up pounding the server
>>>> with 600 LAYOUTGET requests within the space of 1 minute, before giving
>>>> up. Is that reasonable?
>>>>
>>>
>>> It will never get there; it will always be one or two sends. Usually it is
>>> just so the sequence of layout_get_done is out of the way and the
>>> LAYOUT_RECALL sequence+1 can get through and the layout released. Then
>>> the next time it will all be good and the LAYOUT_GET will succeed.
>>>
>>> Worst case is when the client is very busy with a queue full of IO
>>> on the same busy layout that needs to be released by the recall. Personally
>>> I found that this never exceeds 40 IOPs in flight. Note that this is not
>>> the amount of total dirty memory but only the amount of already submitted
>>> IO. I guess that on a very slow connection these can take time, but at
>>> regular line speeds I never observed more than 2 retries with this patch.
>>>
>>> It is all up to the client. NFS4ERR_RECALLCONFLICT means "the layouts you
>>> have need to be released" (I say released because the forgetful model does
>>> not actually return them). Can you see a critical time when layouts are
>>> held for longer than a second?
>>
>> That will probably depend on the workload and possibly on the layout type.
>>
>> My point was, however, about the potential for mischief due to the mismatch between the number of retries that the resulting code allows, and the fixed period between those retries of 1/10 seconds. Why not rather use something along the lines of "rpc_delay(rpc_task, min(giveup - jiffies, max(jiffies - lgp->args.timestamp, NFS4_POLL_RETRY_MIN)));"? That gives you an initially exponential back off with a minimum period of NFS4_POLL_RETRY_MIN, and with an expiry date of 'timeo' jiffies after the first attempt.
> 
> Whoops. That should probably be
> 
> max(NFS4_POLL_RETRY_MIN, min(giveup - jiffies, jiffies - lgp->args.timestamp))
> 
> so that the time interval is not < NFS4_POLL_RETRY_MIN.

OK I'll try that.
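
For reference, a rough userspace sketch of how the corrected expression
behaves. This is not the patch itself: it assumes HZ=1000, uses signed
longs in place of real jiffies (which are unsigned and would need
time_after()-style handling), and the names layoutget_retry_delay() and
count_retries() are made up for illustration.

```c
#include <assert.h>

#define HZ 1000L                       /* assumed tick rate, for this sketch only */
#define NFS4_POLL_RETRY_MIN (HZ / 10)  /* default quoted earlier in the thread */

static long min_l(long a, long b) { return a < b ? a : b; }
static long max_l(long a, long b) { return a > b ? a : b; }

/*
 * Hypothetical helper mirroring the corrected expression:
 *   max(NFS4_POLL_RETRY_MIN, min(giveup - jiffies, jiffies - timestamp))
 * "now" stands in for jiffies, "first_attempt" for lgp->args.timestamp.
 */
static long layoutget_retry_delay(long now, long first_attempt, long giveup)
{
	return max_l(NFS4_POLL_RETRY_MIN,
		     min_l(giveup - now, now - first_attempt));
}

/*
 * Count how many attempts are made before "giveup" if every attempt
 * fails immediately and is retried after the computed delay.
 */
static int count_retries(long first_attempt, long giveup)
{
	long now = first_attempt;
	int attempts = 0;

	assert(giveup >= first_attempt);
	while (now < giveup) {
		now += layoutget_retry_delay(now, first_attempt, giveup);
		attempts++;
	}
	return attempts;
}
```

Because the delay tracks the time elapsed since the first attempt, each
wait roughly doubles the elapsed time. Under these assumptions,
count_retries(0, 60 * HZ) makes 11 attempts over the 60 second window,
versus the 600 possible with a fixed HZ/10 interval.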

Thanks
Boaz
