Re: [PATCH v2] pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done

Boaz Harrosh <bharrosh@xxxxxxxxxxx> · Wed, 15 Jan 2014 00:21:58 +0200

On 01/14/2014 09:05 PM, Trond Myklebust wrote:
> On Tue, 2014-01-14 at 17:32 +0200, Boaz Harrosh wrote:
>>  
> 
> For the default mount option of 'timeo=600', and the default #define
> NFS4_POLL_RETRY_MIN==HZ/10, this means we can end up pounding the server
> with 600 LAYOUTGET requests within the space of 1 minute, before giving
> up. Is that reasonable?
> 
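
For reference, a minimal sketch of the arithmetic behind the 600 figure
quoted above. It assumes timeo is expressed in tenths of a second (so 600
is one minute) and picks a tick rate of 1000 purely for illustration.
SKETCH_HZ, SKETCH_POLL_RETRY_MIN and worst_case_retries() are hypothetical
stand-ins, not kernel symbols.

```c
#include <assert.h>

/* Illustrative stand-ins for HZ and NFS4_POLL_RETRY_MIN (assumed values). */
#define SKETCH_HZ		1000
#define SKETCH_POLL_RETRY_MIN	(SKETCH_HZ / 10)

/*
 * Upper bound on the number of retries that fit in the timeout window
 * when every retry waits a fixed delay_jiffies.
 * timeo_deciseconds: the timeo mount option, in tenths of a second.
 */
static long worst_case_retries(long timeo_deciseconds, long delay_jiffies)
{
	long window_jiffies;

	assert(delay_jiffies > 0);
	window_jiffies = timeo_deciseconds * SKETCH_HZ / 10;
	return window_jiffies / delay_jiffies;
}
```

With timeo=600 and a delay of SKETCH_POLL_RETRY_MIN this gives the 600
requests per minute mentioned above, but only if the conflict never clears
for the whole window.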

It will never get there; it will always be one or two sends. Usually the
delay is just there so that the sequence of layout_get_done is out of the
way, the LAYOUT_RECALL with sequence+1 can get through, and the layout is
released. The next LAYOUT_GET will then succeed.

The worst case is when the client is very busy, with a queue full of IO
on the same busy layout that needs to be released by the recall. Personally,
I have found that this never exceeds 40 IO operations in flight. Note that
this is not the total amount of dirty memory, only the amount of already
submitted IO. I guess that on a very slow connection these can take time,
but at regular line speeds I never observed more than 2 retries with this
patch.
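
To make that reasoning concrete, here is a toy model (not kernel code) of
how many retries a fixed delay produces. It assumes the recalled layout is
released drain_ms after the first NFS4ERR_RECALLCONFLICT and that a retry
sent at or after that point succeeds. retries_until_released() and both of
its parameters are hypothetical names used only for this sketch.

```c
#include <assert.h>

/*
 * Number of LAYOUT_GET retries sent after the first failed attempt,
 * when the client retries every delay_ms and the layout is released
 * drain_ms after the first failure (both values are assumptions).
 */
static unsigned int retries_until_released(unsigned int drain_ms,
					   unsigned int delay_ms)
{
	unsigned int retries;

	assert(delay_ms > 0);
	/* Round up: a retry sent before the release fails again. */
	retries = (drain_ms + delay_ms - 1) / delay_ms;
	/* The first attempt already failed, so at least one retry is sent. */
	return retries ? retries : 1;
}
```

In this model, in-flight IO that drains in 150 ms with a 100 ms delay costs
two retries; reaching 600 retries would need the layout to stay held for
the full minute.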

It is all up to the client. NFS4ERR_RECALLCONFLICT means "the layouts you
have need to be released" (I say released because the forgetful model does
not actually return them). Can you see a critical case where layouts are
held for longer than a second?

Thanks
Boaz
