Re: [PATCH v6 06/10] io_uring/rw: add support to send metadata along with read/write


 



On 11/10/24 18:36, Kanchan Joshi wrote:
On 11/7/2024 10:53 PM, Pavel Begunkov wrote:

Let's say we have 3 different attributes META_TYPE{1,2,3}.

How are they placed in an SQE?

meta1 = (void *)get_big_sqe(sqe);
meta2 = meta1 + sizeof(?); // sizeof(struct meta1_struct)
meta3 = meta2 + sizeof(struct meta2_struct);

It is not necessary to do this kind of addition or to think in terms of
sequential ordering for the extra information placed into the
primary/secondary SQE.

Please see v8:
https://lore.kernel.org/io-uring/20241106121842.5004-7-anuj20.g@xxxxxxxxxxx/

It exposes a distinct flag (sqe->ext_cap) for each attribute/cap, and
userspace should place the corresponding information where the kernel
has mandated.

If a particular attribute (for example, write-hint) requires <20b of extra
information, we should just place that in the first SQE. PI requires more,
so we are placing that into the second SQE.

When both the PI and write-hint flags are specified by the user, they can
be processed fine without actually having to care about the above
additions/ordering.
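
A minimal sketch of the static-placement idea described above, not the actual v8 layout. Everything here is hypothetical: the struct sketch_sqe stand-in, the field offsets, the sketch_pi fields, and the EXT_CAP_* values are invented for illustration. The only point is that each flag maps to a fixed location, so the parser never computes an offset from another attribute's size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical flag bits; names and values are illustrative only. */
#define EXT_CAP_WRITE_HINT (1u << 0)
#define EXT_CAP_PI         (1u << 1)
#define EXT_CAP_KNOWN      (EXT_CAP_WRITE_HINT | EXT_CAP_PI)

/* Hypothetical PI descriptor, too large for the spare bytes of the first half. */
struct sketch_pi {
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t app_tag;
    uint64_t seed;
};

/* Stand-in for a 128-byte SQE. This is NOT the real io_uring_sqe layout. */
struct sketch_sqe {
    uint8_t  core[44];       /* placeholder for the regular SQE fields */
    uint16_t ext_cap;        /* one bit per attribute */
    uint16_t write_hint;     /* small attribute: lives in the first 64 bytes */
    uint8_t  pad1[16];
    struct sketch_pi pi;     /* large attribute: fixed spot in the second 64 bytes */
    uint8_t  pad2[40];
};

static_assert(sizeof(struct sketch_sqe) == 128, "sketch SQE must be 128 bytes");
static_assert(offsetof(struct sketch_sqe, pi) == 64, "PI starts the second half");

struct parsed_ext {
    int has_hint;
    uint16_t hint;
    int has_pi;
    struct sketch_pi pi;
};

/* Each flag selects a fixed location; the order of the checks is irrelevant. */
static int parse_ext(const struct sketch_sqe *sqe, struct parsed_ext *out)
{
    memset(out, 0, sizeof(*out));
    if (sqe->ext_cap & ~EXT_CAP_KNOWN)
        return -1;                      /* reject unknown attribute bits */
    if (sqe->ext_cap & EXT_CAP_WRITE_HINT) {
        out->has_hint = 1;
        out->hint = sqe->write_hint;
    }
    if (sqe->ext_cap & EXT_CAP_PI) {
        out->has_pi = 1;
        out->pi = sqe->pi;
    }
    return 0;
}
```

Note that in this sketch setting only EXT_CAP_PI still needs the 128-byte form, which is the cost raised in the reply below.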

Ok, this option is to statically define a place in SQE for each
meta type. The problem is that we can't place everything into
an SQE, and the next big meta would need to be a user pointer,
at which point copy_from_user() is expensive again and we need
to invent something new. PI becomes a special case, most likely
handled in a special way, and either becomes one of a few "optimised"
types or needlessly forces its users into SQE128 (with all the additional
costs) when it could have been aligned with other, later meta types.

Structures are likely not fixed size (?). At least the PI looks large
enough to force everyone to be just aliased to it.

And can the user pass first meta2 in the sqe and then meta1?

Yes. Just set the ext_cap flags without bothering about first/second.
The user can pass either or both, along with the corresponding info. You
just don't have to assume a specific placement in the SQE.


meta2 = (void *)get_big_sqe(sqe);
meta1 = meta2 + sizeof(?); // sizeof(struct meta2_struct)

If yes, what should parsing look like? Does the kernel need to read each
chunk's type and look up its size to iterate to the next one?

We don't need to iterate if we are not assuming any ordering.
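
For comparison, here is a rough sketch of the iterating parser that the question describes, where chunks are packed back to back in any order. The one-byte type header, the META_TYPE* values, and the payload sizes are all assumptions made up for this example; nothing in the patch series defines them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk types and payload sizes, for illustration only. */
enum { META_TYPE1 = 1, META_TYPE2 = 2, META_TYPE3 = 3 };

static size_t meta_size(uint8_t type)
{
    switch (type) {
    case META_TYPE1: return 8;
    case META_TYPE2: return 16;
    case META_TYPE3: return 24;
    default:         return 0;   /* unknown type */
    }
}

/*
 * Walk nr chunks packed back to back in buf. Each chunk is assumed to be
 * one type byte followed by a payload whose size depends on the type.
 * On success, *seen_mask has bit (1 << type) set for every chunk found.
 */
static int walk_meta(const uint8_t *buf, size_t len, unsigned nr,
                     uint32_t *seen_mask)
{
    size_t off = 0;

    *seen_mask = 0;
    for (unsigned i = 0; i < nr; i++) {
        if (off >= len)
            return -1;                    /* ran out of space */
        uint8_t type = buf[off];
        size_t sz = meta_size(type);      /* size lookup per chunk */
        if (sz == 0 || sz > len - off - 1)
            return -1;                    /* unknown type or truncated payload */
        if (*seen_mask & (1u << type))
            return -1;                    /* duplicate attribute */
        *seen_mask |= 1u << type;
        off += 1 + sz;                    /* advance to the next chunk */
    }
    return 0;
}
```

The per-chunk type read and size lookup is exactly the work that a fixed, flag-selected placement avoids.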

If no, what happens if we want to pass meta2 and meta3, do they start
from the big_sqe?

Whoever adds support for meta2/meta3 in the kernel decides whether to
place them within the first/second SQE or to fetch them via a pointer
from userspace.

How do we pass how many such attributes there are for the request?

ext_cap allows passing 16 cap/attribute flags. Perhaps not all of them can
be passed inline in the SQE, but I have no real visibility into the space
requirements of future users.
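
As a small sketch of how a 16-bit flag word can answer the "how many" question without a separate count field: the number of attributes is the number of set bits. The helper names are hypothetical, and treating ext_cap as a plain uint16_t is an assumption based on the 16-flag figure above.

```c
#include <stdint.h>

/* Count the attributes present: one set bit per attribute. */
static unsigned ext_cap_count(uint16_t ext_cap)
{
    unsigned n = 0;

    while (ext_cap) {
        ext_cap &= (uint16_t)(ext_cap - 1);   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Test whether a given attribute bit (0..15) is set. */
static int ext_cap_has(uint16_t ext_cap, unsigned bit)
{
    return bit < 16 && ((ext_cap >> bit) & 1u);
}
```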

I like ext_cap, if not in the current form / API, then as a user
hint: a quick map of which meta types are passed.

--
Pavel Begunkov



