On 5/9/22 11:35 PM, Hao Xu wrote:
> On 2022/5/9 10:49, Jens Axboe wrote:
>> On 5/9/22 7:20 AM, Hao Xu wrote:
>>> On 2022/5/9 7:49, Jens Axboe wrote:
>>>> Hi,
>>>>
>>>> Currently using direct descriptors with open or accept requires the
>>>> application to manage the descriptor space, picking which slot to use
>>>> for any given file. However, there are cases where it's useful to just
>>>> get a direct descriptor and not care about which value it is, instead
>>>> just return it like a normal open or accept would.
>>>>
>>>> This will also be useful for multishot accept support, where allocated
>>>> direct descriptors are a requirement to make that feature work with
>>>> these kinds of files.
>>>>
>>>> This adds support for allocating a new fixed descriptor. This is chosen
>>>> by passing in UINT_MAX as the fixed slot, which otherwise has a limit
>>>> of INT_MAX like any file descriptor does.
>>>>
>>>>  fs/io_uring.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++---
>>>>  1 file changed, 94 insertions(+), 6 deletions(-)
>>>>
>>> Hi Jens,
>>> I've read this idea of leveraging a bitmap, and it looks great. A small
>>> flaw is that when the file table is very long, the bitmap search seems
>>> to be O({length of table}/BITS_PER_LONG). To make the time complexity
>>> stable, I did a linked list version; could you have a look when you're
>>> available? Totally untested, just to show my idea. Basically I use a
>>> list to link all the free slots; when we need a slot, we just take the
>>> head of it.
>>> https://github.com/HowHsu/linux/commits/for-5.19/io_uring_multishot_accept_v5
>>>
>>> (borrowed some commit message from your patches)
>>
>> While that's certainly true, I'm skeptical that the list management will
>> be faster for most cases. It's worth noting that the regular file
>> allocator is very much the same thing. A full scan is unlikely unless
>> you already got -ENFILE. Any clear in between will reset the hint and
>> it'll be O(1) again.
>> So yes, the pathological case of having no
>
> It's not O(1) actually, and a full bitmap is not the only worst case.
> For instance, the bitmap is like:
>
>                             hint
>                              |
> 1111111111111111111111111110000
>
> then a bit is cleared and the hint is updated:
>
>   hint
>    |
> 1110111111111111111111111110000
>
> then the next time the complexity is high.

Next time it's fine, since the hint is that bit. If you do it twice, then
yes, the second one would be slower.

> So in this kind of scenario (first allocate many in order, then clear a
> low bit and allocation goes on in turn), it would be slow. And I think
> these cases are not rare, since people usually allocate many fds and
> then free the early used fds from time to time.

It's by no means perfect, but if it's good enough for the normal file
allocator, then I don't think it'd be wise to over-engineer this one
until there's a proven need to do so. A single list tracking the free
items is most certainly a LOT slower for the common cases, so I don't
think that's a good approach at all.

My suggestion would be to stick with the proposed approach until there's
evidence that the allocator needs improving. I did write a benchmark that
uses a 500K map and does opens and closes, and I don't see anything to
worry about in terms of overhead. The bitmap handling doesn't even really
register, dwarfed by the rest of the open path.

>> If the case of finding a new descriptor is slow for a mostly full space,
>> in the past I've done something like axmap [1] in fio, where each 64-bit
>> entry is represented by a single bit a layer up. That still has very
>> good space utilization and good cache layout, which the list very much
>> does not. But given the above, I don't think we need to worry about that
>> really.
>>
>> As a side note, I do think we need to just bump the size of the max
>> direct descriptors we can have. With the file table potentially being
>> vmalloc backed, there's no reason to limit it to the current 32K.
>
> Agree.
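To make the hint behavior under discussion concrete, here is a minimal
userspace sketch of that style of allocator. This is not the io_uring or
kernel code itself; the names (slot_table, slot_alloc, etc.) and the table
size are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Toy hint-based bitmap allocator, loosely following the kernel's
 * alloc_fd()/find_next_zero_bit() pattern.  Illustrative only.
 */
#define TABLE_BITS 256
#define BITS_PER_WORD 64

struct slot_table {
	unsigned long long map[TABLE_BITS / BITS_PER_WORD];
	unsigned int hint;	/* next slot to try; pulled back on free */
};

static void table_init(struct slot_table *t)
{
	memset(t->map, 0, sizeof(t->map));
	t->hint = 0;
}

static bool slot_test(const struct slot_table *t, unsigned int bit)
{
	return t->map[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD) & 1ULL;
}

/* Scan from ->hint, wrapping around once; -1 models the -ENFILE case. */
static int slot_alloc(struct slot_table *t)
{
	unsigned int start = t->hint, bit = start;

	do {
		if (!slot_test(t, bit)) {
			t->map[bit / BITS_PER_WORD] |= 1ULL << (bit % BITS_PER_WORD);
			t->hint = (bit + 1) % TABLE_BITS;
			return (int)bit;
		}
		bit = (bit + 1) % TABLE_BITS;
	} while (bit != start);

	return -1;
}

/* Clearing a bit below the hint resets the hint to it. */
static void slot_free(struct slot_table *t, unsigned int bit)
{
	t->map[bit / BITS_PER_WORD] &= ~(1ULL << (bit % BITS_PER_WORD));
	if (bit < t->hint)
		t->hint = bit;
}
```

This toy reproduces both sides of the argument above: after freeing a low
bit, the very next allocation is O(1) because the hint points straight at
it, while the allocation after that one pays for the scan past the run of
set bits.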
>
>> [1] https://git.kernel.dk/cgit/fio/tree/lib/axmap.c
>
> Cool, I'll have a look.

It can be boiled down to something a bit simpler, as the fio
implementation supports a variety of different use cases. For example, I
think it should be implemented as a single indexed array that holds all
the levels, rather than separate ones as is done there. In short, it just
condenses everything down to one qword eventually, and finding a free bit
is always O(log64(N)).

-- 
Jens Axboe
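[Editor's note: for readers following the axmap reference, a minimal
two-level version of that layered-bitmap idea can be sketched as below.
The layout and names are deliberately simplified for illustration and are
not fio's actual axmap code:]

```c
#include <assert.h>
#include <string.h>

/*
 * Two-level bitmap sketch of the axmap idea: each 64-bit word in the
 * bottom level is summarized by one bit in the level above, where a set
 * summary bit means "that word is completely full".  Finding a free bit
 * is one scan per level, O(log64(N)) overall.
 */
#define BOTTOM_WORDS 64		/* 64 words x 64 bits = 4096 slots */

struct axmap2 {
	unsigned long long top;	/* bit i set => bottom[i] is full */
	unsigned long long bottom[BOTTOM_WORDS];
};

static int first_zero(unsigned long long w)
{
	for (int i = 0; i < 64; i++)
		if (!(w >> i & 1ULL))
			return i;
	return -1;		/* all ones */
}

static int axmap2_alloc(struct axmap2 *m)
{
	int word = first_zero(m->top);	/* skip full words in one scan */
	if (word < 0)
		return -1;		/* every bottom word is full */

	int bit = first_zero(m->bottom[word]);
	m->bottom[word] |= 1ULL << bit;
	if (m->bottom[word] == ~0ULL)
		m->top |= 1ULL << word;	/* word just filled: propagate up */
	return word * 64 + bit;
}

static void axmap2_free(struct axmap2 *m, int slot)
{
	int word = slot / 64, bit = slot % 64;

	m->bottom[word] &= ~(1ULL << bit);
	m->top &= ~(1ULL << word);	/* word is no longer full */
}
```

Because each full bottom word collapses into one top-level bit, an
allocation never rescans words that are already full, which is exactly
what defeats the "long run of ones" pathological case discussed earlier.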