On May 4, 2011, at 12:01 PM, Chuck Lever wrote:
>
> On May 4, 2011, at 11:52 AM, Andy Adamson wrote:
>
>>
>> On May 4, 2011, at 11:18 AM, Jeff Layton wrote:
>>
>>> On Wed, 4 May 2011 10:54:57 -0400
>>> Andy Adamson <andros@xxxxxxxxxx> wrote:
>>>
>>>>
>>>> On May 4, 2011, at 7:54 AM, Jeff Layton wrote:
>>>>
>>>>> On Wed, 4 May 2011 12:07:26 +1000
>>>>> NeilBrown <neilb@xxxxxxx> wrote:
>>>>>
>>>>>> On Tue, 03 May 2011 21:46:03 -0400 Trond Myklebust
>>>>>> <Trond.Myklebust@xxxxxxxxxx> wrote:
>>>>>>
>>>>>>> On Wed, 2011-05-04 at 11:18 +1000, NeilBrown wrote:
>>>>>>>> For rpc slots, I doubt you need mempools at all.
>>>>>>>> Mempools are only needed if failure is not an option and you would rather
>>>>>>>> wait, but you cannot wait for regular writeback because you are on the
>>>>>>>> writeback path. So you use a mempool which waits for a previous request to
>>>>>>>> complete. I don't think that describes rpc slots at all.
>>>>>>>> For rpc slots, you can afford to wait when setting up the transport, as you
>>>>>>>> are not on the writeout path yet, and later you can always cope with failure.
>>>>>>>> So just use kmalloc.
>>>>>>>
>>>>>>> Errr.... No. By the time you get to allocating an RPC slot, you may be
>>>>>>> bang smack in the middle of the writeout path.
>>>>>>>
>>>>>>> The scenario would be that something triggers a writeback (kswapd,
>>>>>>> direct reclaim,...) which triggers an RPC call, which requires you to
>>>>>>> allocate at least one rpc slot before you can put the write on the wire.
>>>>>>>
>>>>>>> I agree with your assertion that we only need one successful slot
>>>>>>> allocation in order to make overall forward progress, but we definitely
>>>>>>> do need that one...
>>>>>>>
>>>>>>
>>>>>> Probably I misunderstood the code, but I thought that there was a base set of
>>>>>> slots preallocated. Eventually one of those will become free won't it?
>>>>>>
>>>>>> Looking at the code again, it only ever returns entries to the mempool when
>>>>>> it is about to destroy the xprt. That makes no sense. If you are using a
>>>>>> mempool, then you allocate when you need to use space, and free it as soon as
>>>>>> you have finished with it, so the next request can get a turn.
>>>>>>
>>>>>
>>>>> That was sort of my point, though I didn't articulate it very well.
>>>>>
>>>>> I don't think we want to keep this "static" set of rqsts and the
>>>>> mempool. It seems like that's somewhat redundant and more complex than
>>>>> is warranted. The question is...which should we keep?
>>>>>
>>>>> I think it sort of depends on what the intended model is. If we want to
>>>>> allocate out of the "guaranteed" pool first and only when that's
>>>>> exhausted fall back to doing dynamic allocations, then we probably want
>>>>> to keep the static list of entries.
>>>>>
>>>>> If we want to allocate dynamically most of the time and only dip into
>>>>> the "guaranteed" pool when that can't be satisfied, then basing all of
>>>>> this around mempools might make more sense.
>>>>
>>>> Thanks for the review - I've learned a lot about mempools from this discussion.
>>>>
>>>> I've noticed that a lot of the time, (at least for TCP) rpc_slots are allocated and not used.
>>>> The default slot entries value is a guess. This is why I like the idea of dynamically allocating most
>>>> of the time, and only dipping into the guaranteed pool when the dynamic allocation
>>>> can't be satisfied. e.g. get rid of the static set of slots.
>>>> Does this design require two mempools, one for dynamic allocation and a guaranteed one being small and "in reserve"?
>>>>
>>>
>>> No, a single mempool would give you that. Just declare the mempool with
>>> the amount of "emergency" slots you want, and then allocate directly
>>> from it and then free them back to the mempool. Most of the time (as
>>> Neil points out) you'll get them allocated from the slabcache. It'll
>>> only dip into the reserved set when those can't be allocated.
>>
>> Cool.
>>
>>>
>>> The other thing you have to determine is whether you want this
>>> guaranteed number of slots to be per-xprt or global. Today, it's
>>> per-xprt. Making it global might mean less wasted memory overall, but I
>>> could foresee situations where a hung mount could starve other mounts of
>>> slots.
>>>
>>> Figuring that out might help guide the overall design here...
>>>
>>>> For UDP and RDMA, the default # of slot entries can turn into an upper bound.
>>>> For TCP, the default # of slot entries would have no meaning WRT upper bound, as the upper bound is determined
>>>> by the TCP window size.
>>>>
>>>
>>> Maybe rather than having this dynamic_slot_alloc operation that's only
>>> conditionally called, it would be cleaner to abstract out the slot
>>> allocator altogether.
>>>
>>> UDP and RDMA could use the same routine and just allocate out of a
>>> fixed set of preallocated slots (sort of like how they do today), and
>>> TCP could use this new scheme.
>>
>> We need rpc_slots (UDP/RDMA/TCP/....) to be freed back to the mempool in xprt_free_slot.
>> If we allocate the set # of rpc_slots, place them on the xprt->free list, and they don't get used,
>> then they won't be freed until the xprt is destroyed.
>
> The reason to keep slots around and on a local free list is that full-out memory allocation is heavyweight. I keep thinking of the talk Willy gave at LSF about how much scalability was affected in the SCSI I/O path by simple memory allocation operations. Is using a SLAB or SLUB as fast as what we have today?

That is a good question. We could implement this same slot allocation scheme without a SLAB/SLUB - just use kmalloc and have xprt_free_slot move the allocated slots to the free list as is done today. I don't have any information on a performance hit. Is this worth testing, or should we just stick with kmalloc and friends?

-->Andy

>
>> I was thinking of using the same allocator for UDP/RDMA/TCP. With all transports, start off with a
>> minimal # of pre-allocated slots (6? 4? 2?) and for UDP/RDMA trigger a dynamic allocation in xprt_alloc_slot
>> when there are no slots available and the # slots allocated is less than the default # slot entries for that transport.
>>
>> -->Andy
>>
>>> --
>>> Jeff Layton <jlayton@xxxxxxxxxx>
>
> --
> Chuck Lever
> chuck[dot]lever[at]oracle[dot]com
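
For reference, a minimal sketch of the single-mempool scheme Jeff describes above, assuming a global reserve rather than a per-xprt one. The names rpc_rqst_slabp, rpc_rqst_mempool, RPC_MIN_RESERVED_SLOTS and the alloc/free helpers are placeholders for illustration, not existing xprt.c symbols:

#include <linux/slab.h>
#include <linux/mempool.h>
#include <linux/sunrpc/xprt.h>

/* Slots guaranteed to be available even under memory pressure (placeholder value). */
#define RPC_MIN_RESERVED_SLOTS	2

static struct kmem_cache *rpc_rqst_slabp;	/* backing slab cache */
static mempool_t *rpc_rqst_mempool;		/* holds the reserved slots */

static int rpc_rqst_mempool_init(void)
{
	rpc_rqst_slabp = kmem_cache_create("rpc_rqst_slab",
					   sizeof(struct rpc_rqst),
					   0, SLAB_HWCACHE_ALIGN, NULL);
	if (!rpc_rqst_slabp)
		return -ENOMEM;

	/* Keep RPC_MIN_RESERVED_SLOTS objects in reserve; everything else
	 * is an ordinary slab allocation. */
	rpc_rqst_mempool = mempool_create_slab_pool(RPC_MIN_RESERVED_SLOTS,
						    rpc_rqst_slabp);
	if (!rpc_rqst_mempool) {
		kmem_cache_destroy(rpc_rqst_slabp);
		return -ENOMEM;
	}
	return 0;
}

/* Most of the time this is a plain slab allocation; only when that fails
 * does mempool_alloc() hand out one of the reserved slots. */
static struct rpc_rqst *rpc_alloc_slot(gfp_t gfp_flags)
{
	return mempool_alloc(rpc_rqst_mempool, gfp_flags);
}

/* Return the slot immediately so the reserve refills for the next request,
 * rather than parking it on xprt->free until the xprt is destroyed. */
static void rpc_free_slot(struct rpc_rqst *req)
{
	mempool_free(req, rpc_rqst_mempool);
}

A per-xprt variant would look the same but hang the mempool_t off struct rpc_xprt, avoiding the hung-mount starvation Jeff mentions at the cost of a few reserved slots per transport. Callers on the writeout path would pass GFP_NOFS (or GFP_NOWAIT) so the slab attempt cannot recurse back into NFS writeback.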