On 11/7/2024 10:53 PM, Pavel Begunkov wrote:
> Let's say we have 3 different attributes META_TYPE{1,2,3}.
>
> How are they placed in an SQE?
>
> meta1 = (void *)get_big_sqe(sqe);
> meta2 = meta1 + sizeof(?); // sizeof(struct meta1_struct)
> meta3 = meta2 + sizeof(struct meta2_struct);

It is not necessary to do this kind of addition, or to think in terms of
sequential ordering, for the extra information placed into the
primary/secondary SQE.

Please see v8:
https://lore.kernel.org/io-uring/20241106121842.5004-7-anuj20.g@xxxxxxxxxxx/

It exposes a distinct flag (sqe->ext_cap) for each attribute/cap, and
userspace should place the corresponding information where the kernel has
mandated.
If a particular attribute (for example, write-hint) requires <20b of extra
information, we should just place that in the first SQE. PI requires more,
so we are placing that into the second SQE.

When both the PI and write-hint flags are specified by the user, they can
get processed fine without actually having to care about the above
additions/ordering.

> Structures are likely not fixed size (?). At least the PI looks large
> enough to force everyone to be just aliased to it.
>
> And can the user pass first meta2 in the sqe and then meta1?

Yes. Just set the ext_cap flags without bothering about first/second.
The user can pass either or both, along with the corresponding info.
One just doesn't have to assume any specific placement in the SQE.

> meta2 = (void *)get_big_sqe(sqe);
> meta1 = meta2 + sizeof(?); // sizeof(struct meta2_struct)
>
> If yes, how parsing should look like? Does the kernel need to read each
> chunk's type and look up its size to iterate to the next one?

We don't need to iterate if we are not assuming any ordering.

> If no, what happens if we want to pass meta2 and meta3, do they start
> from the big_sqe?

Whoever adds the support for meta2/meta3 in the kernel decides where to
place them within the first/second SQE, or whether to fetch them via a
pointer from userspace.

> How do we pass how many of such attributes is there for the request?

ext_cap allows passing 16 cap/attribute flags. Maybe all of them can be
passed inline in the SQE, maybe not; I have no real visibility into the
space requirements of future users.

> It should support arbitrary number of attributes in the long run, which
> we can't pass in an SQE, bumping the SQE size is not scalable in
> general, so it'd need to support user pointers or sth similar at some
> point. Placing them in an SQE can serve as an optimisation, and a first
> step, though it might be easier to start with user pointer instead.
>
> Also, when we eventually come to user pointers, we want it to be
> performant as well and e.g. get by just one copy_from_user, and the
> api/struct layouts would need to be able to support it. And once it's
> copied we'll want it to be handled uniformly with the SQE variant, that
> requires a common format. For different formats there will be a question
> of performance, maintainability, duplicating kernel and userspace code.
>
> All that doesn't need to be implemented, but we need a clear direction
> for the API. Maybe we can get a simplified user space pseudo code
> showing how the end API is supposed to look like?

Yes. For a large/arbitrary number, we may have to fetch the entire
attribute list using a user pointer/len combo, and parse it (that's where
all your previous questions fit).

And that can still be added on top of v8.
For example, by adding a flag (in ext_cap) that disables inline-sqe
processing and switches to an external attribute buffer:

/* Second SQE has PI information */
#define EXT_CAP_PI (1U << 0)
/* First SQE has hint information */
#define EXT_CAP_WRITE_HINT (1U << 1)
/* Do not assume CAP presence in SQE, and fetch capability buffer page instead */
#define EXT_CAP_INDIRECT (1U << 2)

The corresponding pointer (and/or len) can be put into the last 16b of the
SQE.
Use the same flags/structures for the given attributes within this buffer.
That will keep things uniform and will reuse the same handling that we add
for inline attributes.
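
To make the direction a bit more concrete, here is a rough sketch of how
the parsing could look with fixed placement plus the indirect flag. It
compiles as plain userspace C. Only the three EXT_CAP_* flags come from the
text above. Everything prefixed demo_ (struct layouts, field names, the
attr_ptr field, the layout of the external buffer) is made up for
illustration and is not what v8 defines, and the memcpy() only stands in
for what would be a single copy_from_user() in the kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EXT_CAP_PI         (1U << 0)  /* second SQE has PI information */
#define EXT_CAP_WRITE_HINT (1U << 1)  /* first SQE has hint information */
#define EXT_CAP_INDIRECT   (1U << 2)  /* fetch attributes from a user buffer */
#define DEMO_KNOWN_CAPS (EXT_CAP_PI | EXT_CAP_WRITE_HINT | EXT_CAP_INDIRECT)

/* Hypothetical payloads; the real layouts belong to the patch series. */
struct demo_hint { uint16_t write_hint; };
struct demo_pi   { uint16_t flags; uint16_t app_tag; uint32_t len;
                   uint64_t addr; uint64_t seed; };

/* Hypothetical external buffer: same structures, fixed slots. */
struct demo_attr_buf { struct demo_hint hint; struct demo_pi pi; };

/* Hypothetical stand-in for a big (two-part) SQE. */
struct demo_sqe {
	uint16_t ext_cap;
	struct demo_hint hint;  /* fixed slot in the first SQE */
	uint64_t attr_ptr;      /* only meaningful with EXT_CAP_INDIRECT */
	struct demo_pi pi;      /* fixed slot in the second SQE */
};

struct demo_attrs {
	int has_hint, has_pi;
	struct demo_hint hint;
	struct demo_pi pi;
};

/* Flag-driven parsing: no iteration, no size lookups, no ordering. */
static int demo_parse(const struct demo_sqe *sqe, struct demo_attrs *out)
{
	const struct demo_hint *hint = &sqe->hint;
	const struct demo_pi *pi = &sqe->pi;
	struct demo_attr_buf buf;

	memset(out, 0, sizeof(*out));
	if (sqe->ext_cap & ~DEMO_KNOWN_CAPS)
		return -1;  /* unknown capability flag */

	if (sqe->ext_cap & EXT_CAP_INDIRECT) {
		if (!sqe->attr_ptr)
			return -1;
		/* One copy for all attributes, then the same handling. */
		memcpy(&buf, (const void *)(uintptr_t)sqe->attr_ptr, sizeof(buf));
		hint = &buf.hint;
		pi = &buf.pi;
	}
	if (sqe->ext_cap & EXT_CAP_WRITE_HINT) {
		out->has_hint = 1;
		out->hint = *hint;
	}
	if (sqe->ext_cap & EXT_CAP_PI) {
		out->has_pi = 1;
		out->pi = *pi;
	}
	return 0;
}

/* Usage: inline PI + hint, in whatever order the user filled them. */
void demo_usage(void)
{
	struct demo_sqe sqe;
	struct demo_attrs attrs;

	memset(&sqe, 0, sizeof(sqe));
	sqe.ext_cap = EXT_CAP_PI | EXT_CAP_WRITE_HINT;
	sqe.pi.seed = 42;
	sqe.hint.write_hint = 3;
	assert(demo_parse(&sqe, &attrs) == 0);
	assert(attrs.has_pi && attrs.has_hint);
}
```

The point of the sketch is only that the inline and indirect paths end up
in the same per-flag handling, so adding the indirect buffer later does not
need a second format.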