Re: [PATCH] target/user: random data ring allocation

Andy Grover <agrover@xxxxxxxxxx> · Tue, 1 Dec 2015 17:20:33 -0800

On 11/30/2015 03:58 PM, fjoe@xxxxxxxxxxxxxx wrote:
On 01 Dec 2015, at 01:24, Andy Grover <agrover@xxxxxxxxxx>
See tcmu_check_expired_cmd. If the command has timed out then we've
called target_complete_cmd already and the fabric may have already
freed the se_cmd. We clear the se_cmd pointer to make sure we don't
use a stale pointer, and therefore need our own copy of data_length
in tcmu_cmd in order to know how much space in the data ring was
being used by the timed-out command.

This might be a problem as I use sg lists to remember which data
chunks need to be free'd (I use dma_address for that).

Using dma_address for our own non-DMA-related purposes strikes me as 
asking for trouble.

The idea was to keep the original data allocator intact at least in
the first version of the patch. This can of course be changed to
always using the new data allocator and a capability returned by the
kernel that it can do ooo completion.

Yes you can replace it, the same allocator should be acceptable for both 
cases.

I can change that but this will complicate things as sg list is
PAGE_SIZE'd. Also, page-size based allocation is convenient if we
want to have a zero-copy TCMU (this is actually a long-term goal, or
2nd stage feature that we would like to have).

Like I said before, we can't use PAGE_SIZE. It would be too wasteful on 
some archs, and we would be limiting what the design could be by trying 
to have things set up for the long-term goal, which may or may not 
actually be achievable.

If you really want to talk about zero copy now, please start a new 
thread and we can try to work through all the obstacles I'm currently 
aware of.

So what I'd like to see in a v2 patch is *just* changing the data
area so it acts more like a heap instead of a ringbuffer -- replace
the head and tail stuff with something that allocates, probably
using a bitmap of used and free chunks. There should even be
existing in-kernel APIs to help manage this. Once that's working,
then we can talk about another change extending the
kernel-to-handler API so userspace knows it can return commands
OoO.

I was not able to find a ready-to-use in-kernel arena allocator (that
can allocate memory within the specified continuous memory chunk) so
I ended up with a free-list implementation. I thought that
full-featured bitmap or slab allocator solely for TCMU is an
overkill.

A full-featured one I agree is too much. But what about a simple one? We 
have a greater tolerance for false negatives -- failing to allocate 
space even if it's available -- than other allocators because this just 
means the cmd has to wait until previous commands complete, instead of 
failure.

If we had a small, fixed-size array in tcmu_cmd to keep track of its 
data area allocation ranges, wouldn't that let us track the ranges it 
was using so we could free them in the bitmap on completion[1]? If the 
allocation required more ranges than the fixed size, we'd just give up 
and sleep. Worst case, once all submitted commands completed we'd know 
we could satisfy the next pending one.

Thoughts on this approach?

Regards -- Andy

[1] basically a copy of the iovec[] userspace sees, but safe from 
userspace clobbering it.
--
To unsubscribe from this list: send the line "unsubscribe target-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html