On 11/6/2024 10:59 AM, Christoph Hellwig wrote:
> On Tue, Nov 05, 2024 at 09:23:19AM -0700, Keith Busch wrote:
>>>> The SQE128 requirement is only for PI type.
>>>> Another different meta type may just fit into the first SQE. For that we
>>>> don't have to mandate SQE128.
>>>
>>> Ok, I'm really confused now.  The way I understood Anuj was that this
>>> is NOT about block level metadata, but about other uses of the big SQE.
>>>
>>> Which version is right?  Or did I just completely misunderstand Anuj?
>>
>> Let's not call this "meta_type". Can we use something that has a less
>> overloaded meaning, like "sqe_extended_capabilities", or "ecap", or
>> something like that.
>
> So it's just a flag that a 128-byte SQE is used?

No, this flag says that the user decided to send PI in the SQE. The flag
itself is kept in the first half of the SQE (which always exists). It is
just an additional requirement that the PI fields themselves are kept in
the big SQE (SQE128, which is opt-in).

> Don't we know that
> implicitly from the sq?

Yes, we have a separate ring-level flag for that.

#define IORING_SETUP_SQE128		(1U << 10) /* SQEs are 128 byte */

>>>  - a flag that a pointer to metadata is passed.  This can work with
>>>    a 64-bit SQE.
>>>  - another flag that a PI tuple is passed.  This requires a 128-byte
>>>    SQE and also the previous flag.
>>
>> I don't think anything done so far aligns with what Pavel had in mind.
>> Let me try to lay out what I think he's going for. Just bear with me,
>> this is just a hypothetical example.
>>
>>   This patch adds a PI extension.
>>   Later, let's say write streams needs another extension.
>>   Then key per-IO wants another extension.
>>   Then someone else adds wizbang-awesome-feature extension.
>>
>> Let's say you have device that can do all 4, or any combination of them.
>> Pavel wants a solution that is future proof to such a scenario. So not
>> just a single new "meta_type" with its structure, but a list of types in
>> no particular order, and their structures.
>
> But why do we need the type at all?  Each of them obviously needs two
> things:
>
>  1) some space to actually store the extra fields
>  2) a flag that the additional values are passed

Yes, this is exactly how the patch is implemented. 'meta_type' is the
flag that says additional values (representing PI info) are passed.

> any single value is not going to help with supporting arbitrary
> combinations,

It is not a single value. It is a u16 field, so it can represent 16
possible flags. This part of the patch:

+enum io_uring_sqe_meta_type_bits {
+	META_TYPE_PI_BIT,
+	/* not a real meta type; just to make sure that we don't overflow */
+	META_TYPE_LAST_BIT,
+};
+
+/* meta type flags */
+#define META_TYPE_PI		(1U << META_TYPE_PI_BIT)

For future users, one can add things like META_TYPE_KPIO_BIT or
META_TYPE_WRITE_HINT_BIT if they need to send extra information in the
SQE. Note that these users may not require SQE128. It all depends on how
much extra information is required. We still have some free space in the
first SQE.

> because well, you can mix and match, and you need
> space for all them even if you are not using all of them.

Mix-and-match can be detected with the above flags. If two types don't
go well together, that can be detected the same way, and for such types
we can reuse the same space.
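
To make the mix-and-match point a bit more concrete, here is a rough
userspace sketch of the kind of check the flags allow. This is not code
from the patch. META_TYPE_KPIO, META_TYPE_WRITE_HINT, the three masks and
meta_type_validate() are hypothetical names, and the assumption that the
two future types share the same space (and so exclude each other) is made
up purely for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum io_uring_sqe_meta_type_bits {
	META_TYPE_PI_BIT,		/* from the patch */
	META_TYPE_KPIO_BIT,		/* HYPOTHETICAL future type */
	META_TYPE_WRITE_HINT_BIT,	/* HYPOTHETICAL future type */
	/* not a real meta type; just to make sure that we don't overflow */
	META_TYPE_LAST_BIT,
};

/* meta type flags */
#define META_TYPE_PI		(1U << META_TYPE_PI_BIT)
#define META_TYPE_KPIO		(1U << META_TYPE_KPIO_BIT)
#define META_TYPE_WRITE_HINT	(1U << META_TYPE_WRITE_HINT_BIT)

/* HYPOTHETICAL helper masks, not part of the patch */
#define META_TYPE_ALL		((1U << META_TYPE_LAST_BIT) - 1)
/* types whose fields are kept in the big SQE */
#define META_TYPE_NEEDS_SQE128	(META_TYPE_PI)
/* assumed, for illustration only: these two reuse the same space */
#define META_TYPE_EXCL		(META_TYPE_KPIO | META_TYPE_WRITE_HINT)

/* the field is a u16, so at most 16 flag bits can ever be defined */
static_assert(META_TYPE_LAST_BIT <= 16, "meta_type flags overflow u16");

/*
 * Returns 0 if this combination of flags is acceptable for the ring,
 * -EINVAL otherwise. ring_is_sqe128 stands for "the ring was set up
 * with IORING_SETUP_SQE128".
 */
static int meta_type_validate(uint16_t meta_type, bool ring_is_sqe128)
{
	/* unknown bits set */
	if (meta_type & ~META_TYPE_ALL)
		return -EINVAL;
	/* PI was requested, but there is no second half to hold it */
	if ((meta_type & META_TYPE_NEEDS_SQE128) && !ring_is_sqe128)
		return -EINVAL;
	/* two types that share space cannot be sent together */
	if ((meta_type & META_TYPE_EXCL) == META_TYPE_EXCL)
		return -EINVAL;
	return 0;
}
```

With something like this, PI on a 64-byte ring is rejected, a small
type on a 64-byte ring is fine, and PI combined with another type on an
SQE128 ring is also fine. Only the pair that shares space is refused.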