Re: [dm-devel] [RFC 0/3] Add support of iopoll for dm device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/7/20 1:45 AM, Mike Snitzer wrote:
On Thu, Nov 05 2020 at  9:51pm -0500,
JeffleXu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:

On 11/4/20 11:08 PM, Mike Snitzer wrote:
I'm doubted if this should be implemented in block layer like:

```

struct bio {

     ...

     struct list_head  cookies;

};

```

After all it's only used by bio-based queue, or more specifically
only dm device currently.
I do think this line of work really should be handled in block core
because I cannot see any reason why MD or bcache or whatever bio-based
device wouldn't want the ability to better support io_uring (with IO
poll).

Another design I can come up with is to maintain a global data
structure for the very beginning
original bio. Currently the blocking point is that now one original
bio to the dm device (@bio of dm_submit()) can correspond to multiple
dm_io and thus we have nowhere to place the @cookies list.
Yes, and that will always be the case.  We need the design to handle an
arbitrary sprawl of splitting from a given bio.  The graph of bios
resulting from that fan-out needs to be walked at various levels -- be
it the top-level original bio's submit_bio() returned cookie or some
intermediate point in the chain of bios.

The problem is the lifetime of the data structure created for a given
split bio versus layering boundaries (that come from block core's
simplistic recursion via bio using submit_bio).

Now we have to maintain one data structure for every original bio,
something like

```

struct dm_poll_instance {

     ...

     struct list_head cookies;

};

```
I do think we need a hybrid where at the point of recursion we're able
to make the associated data structure available across the recursion
boundary so that modeling the association in a chain of split bios is
possible. (e.g. struct dm_poll_data or dm_poll_instance as you named it,
_but_ that struct definition would live in block core, but would be part
of per-bio-data; so 'struct blk_poll_data' is more logical name when
elevated to block core).

It _might_ be worthwhile to see if a new BIO_ flag could be added to
allow augmenting the bio_split + bio_chain pattern to also track this
additional case of carrying additional data per-bio while creating
bio-chains.  I may not be clear yet, said differently: augmenting
bio_chain to not only chain bios, but to _also_ thread/chain together
per-bio-data that lives within those chained bios.  SO you have the
chain of bios _and_ the chain of potentially opaque void * that happens
to point to a list head for a list of 'struct blk_poll_data'.

Does that make sense?

I'm doubted if it really makes sense to maintain a *complete* bio
chain across the recursive

call boundary.


Considering the following device stack:

```

                                   dm0

         dm1                   dm2           dm3

nvme0  nvme1          ....               ....

```


We have the following bio graph (please let me know if it's wrong or
the image can't display)


For example, for dm1 there are three bios containing valid cookie in
the end, that is

bio 9/10/11. We only need to insert the corresponding blk_poll_data
(containing

request_queue, cookie, etc) of these three bios into the very
beginning original

bio (that is bio0). Of course we can track all the sub-bios down
through the device

stack, layer by layer, e.g.,

- get bio 1/2/3 from bio 0

- get bio 4 from bio 3

- finally get bio 9 from bio 4

But I'm doubted if it's overkill to just implement the iopoll.


Another issue still unclear is that, if we should implement the
iopoll in a recursive way.

In a recursive way, to poll dm 0, we should poll all its
sub-devices, that is, bio 4/5/7/8.

Oppositely we can insert only the bottom bio (bio 9/10/11 which have
valid cookie) at

the very beginning (at submit_bio() phase), and to poll dm 0, we
only need to poll bio 9/10/11.
I feel we need the ability to walk the entire graph and call down to
next level.  The lowest level would be what you call a "valid cookie"
that blk-mq returned.  But the preceding cookies need to be made valid
and used when walking the graph (from highest to lowest) and calling
down to the underlying layers.


To implement this non-recursive design, we can add a field in struct bio

```

struct bio {

     ...

     struct bio *orig;

}

```

@orig points to the original bio inputted into submit_bio(), that is, bio 0.


@orig field is transmitted through bio splitting.

```

bio_split()

     split->orig = bio->orig ? : bio


dm.c: __clone_and_map_data_bio

     clone->orig = bio->orig ? : bio

```


Finally bio 9/10/11 can be inserted into bio 0.

```

blk-mq.c: blk_mq_submit_bio

     if (bio->orig)

         init blk_poll_data and insert it into bio->orig's @cookies list

```
If you feel that is doable: certainly give it a shot.

Make sense.

But it is not clear to me how you intend to translate from cookie passed
in to ->blk_poll hook (returned from submit_bio) to the saved off
bio->orig.

What is your cookie strategy in this non-recursive implementation?  What
will you be returning?  Where will you be storing it?

Actually I think it's a common issue to design the cookie returned by submit_bio() whenever

it's implemented in a recursive or non-recursive way. After all you need to translate the

returned cookie to the original bio even if it's implemented in a recursive way as you

described. Or how could you walk through the bio graph when the returned cookie is

only 'unsigned int' type?


How about this:


```

typedef uintptr_t blk_qc_t;

```


or something like union

```

typedef union {

    unsigned int cookie;

    struct bio *orig; // the original bio of submit_bio()

} blk_qc_t;

```


When serving for blk-mq, the integer part of blk_qc_t is used and it stores the valid cookie,

while it stores a pointer to the original bio when serving bio-based device.


By the way, would you mind sharing your plan and progress on this work, I mean, supporting

iopoll for dm device. To be honest, I don't want to re-invent the wheel as you have started on

this work, but I do want to participate in somehow. Please let me know if there's something

I could do here.


--
Thanks,
Jeffle




[Index of Archives]     [Linux RAID]     [Linux SCSI]     [Linux ATA RAID]     [IDE]     [Linux Wireless]     [Linux Kernel]     [ATH6KL]     [Linux Bluetooth]     [Linux Netdev]     [Kernel Newbies]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Device Mapper]

  Powered by Linux