On 11/14/24 10:45, Anuj Gupta wrote:
Add the ability to pass additional attributes along with read/write. Application can populate an array of 'struct io_uring_attr_vec' and pass its address using the SQE field: __u64 attr_vec_addr; Along with number of attributes using: __u8 nr_attr_indirect; Overall 16 attributes are allowed and currently one attribute 'ATTR_TYPE_PI' is supported.
Why only 16? It's possible that might need more, 256 would be a safer choice and fits into u8. I don't think you even need to commit to a number, it should be ok to add more as long as it fits into the given types (u8 above). It can also be u16 as well.
With PI attribute, userspace can pass following information: - flags: integrity check flags IO_INTEGRITY_CHK_{GUARD/APPTAG/REFTAG} - len: length of PI/metadata buffer - addr: address of metadata buffer - seed: seed value for reftag remapping - app_tag: application defined 16b value
In terms of flexibility I like it apart from small nits, but the double indirection could be a bit inefficient, this thing: struct pi_attr pi = {}; attr_array = { &pi, ... }; sqe->attr_addr = attr_array; So maybe we should just flatten it? An attempt to pseudo code it to understand what it entails is below. Certainly buggy and some handling is omitted, but should show the idea. // uapi/.../io_uring.h struct sqe { ... u64 attr_addr; /* the total size of the array pointed by attr_addr */ u16 attr_size; /* max 64KB, more than enough */ } struct io_attr_header { /* bit mask of attributes passed, can be helpful in the future * for optimising processing. */ u64 attr_type_map; }; /* each attribute should start with a preamble */ struct io_uring_attr_preamble { u16 attr_type; }; // user space struct PI_param { struct io_attr_header header; struct io_uring_attr_preamble preamble; struct io_uring_attr_pi pi; }; struct PI_param p = {}; p.header.map = 1 << ATTR_TYPE_PI; p.preamble.type = ATTR_TYPE_PI; p.pi = {...}; sqe->attr_addr = &p; sqe->attr_size = sizeof(p); The holes b/w structures should be packed better. For the same reason I don't like a separate preamble structure much, maybe it should be embedded into the attribute structures, e.g. struct io_uring_attr_pi { u16 attr_type; ... } The user side looks ok to me, should be pretty straightforward if the user can define a structure like PI_param, i.e. knows at compilation time which attributes it wants to use. // kernel space (current patch set, PI only) addr = READ_ONCE(sqe->attr_addr); if (addr) { size = READ_ONCE(sqe->attr_size); process_pi(addr, size); } process_pi(addr, size) { struct PI_param p; if (size != sizeof(PI_attr + struct attr_preamble + struct attr_header)) return -EINVAL; copy_from_user(p, addr, sizeof(p)); if (p.preamble != ATTR_TYPE_PI) return -EINVAL; do_pi_setup(&p->pi); } This one is pretty simple as well. A bit more troublesome if extended with many attributes, but it'd need additional handling regardless: process_pi(addr, size) { if (size < sizeof(header + preamble)) return -EINVAL; attr_array = malloc(size); // +caching by io_uring copy_from_user(attr_array); handle_attributes(attr_array, size); } handle_attributes(attr_array, size) { struct io_attr_header *hdr = attr_array; offset = sizeof(*hdr); while (1) { if (offset + sizeof(struct preamble) > size) break; struct preamble *pr = attr_array + offset; if (pr->type > MAX_TYPES) return -EINVAL; attr_size = attr_sizes[pr->type]; if (offset + sizeof(preamble) + attr_size > size) return -EINVAL; offset += sizeof(preamble) + attr_size; process_attr(pr->type, (void *)(pr + 1)); } } Some checks can probably be optimised by playing with the uapi a bit. attr_type_map is unused here, but I like the idea. I'd love to see all actual attribute handling to move deeper into the stack to those who actually need it, but that's for far away undecided future. E.g. io_uring_rw { p = malloc(); copy_from_user(p, sqe->attr_addr); kiocb->attributes = p; } block_do_read { hdr = kiocb->attributes; type_mask = /* all types block layer recognises */ if (hdr->attr_type_map & type_mask) use_attributes(); } copy_from_user can be optimised, I mentioned before, we can have a pre-mapped area into which the indirection can point. The infra is already in there and even used for passing waiting arguments. process_pi(addr, size) { struct PI_param *p, __p; if (some_flags & USE_REGISTERED_REGION) { // Glorified p = ctx->ptr; with some checks p = io_uring_get_mem(addr, size); } else { copy_from_user(__p, addr, sizeof(__p)); p = &__p; } ... } In this case all reads would need to be READ_ONCE, but that shouldn't be a problem. It might also optimise out the kmalloc in the extended version. -- Pavel Begunkov