Re: [PATCH v17 44/45] sg: add blk_poll support

Douglas Gilbert <dgilbert@xxxxxxxxxxxx> · Fri, 9 Apr 2021 02:00:52 -0400

On 2021-04-08 12:28 p.m., Douglas Gilbert wrote:
On 2021-04-08 4:14 a.m., Hannes Reinecke wrote:
On 4/8/21 3:45 AM, Douglas Gilbert wrote:
<snip>

+/*
+ * If the sg_request object is not inflight, return -ENODATA. This function
+ * returns 1 if the given object was in inflight state and is in await_rcv
+ * state after blk_poll() returns 1 or more. If blk_poll() fails, then that
+ * (negative) value is returned. Otherwise returns 0. Note that blk_poll()
+ * may complete unrelated requests that share the same q and cookie.
+ */
+static int
+sg_srp_q_blk_poll(struct sg_request *srp, struct request_queue *q, int 
loop_count)
+{
+    int k, n, num;
+
+    num = (loop_count < 1) ? 1 : loop_count;
+    for (k = 0; k < num; ++k) {
+        if (atomic_read(&srp->rq_st) != SG_RS_INFLIGHT)
+            return -ENODATA;
+        n = blk_poll(q, srp->cookie, loop_count < 0 /* spin if negative */);
+        if (n > 0)
+            return atomic_read(&srp->rq_st) == SG_RS_AWAIT_RCV;

That _technically_ is a race condition;
the first atomic_read() call might return a different value than the second one.
And this arguably is a mis-use of atomic _counters_, as here they are just 
used to store fixed values, not counters per se.
Why not use 'READ/WRITE_ONCE' ?

Here is what I found in testing:

         thread 1                     thread 2
         [want sr1p compl.]          [want sr2p compl.]
        ===============================================
         sr1p INFLIGHT                sr2p INFLIGHT
         blk_poll(cookie)
             -> 1 (sr2p -> AWAIT)
         sr1p not in AWAIT
             so return 0
                                      blk_poll(cookie)
                                        ->1 (sr1p -> AWAIT)
                                      sr2p is in AWAIT
                                         so return 1
          [called again]
          sr1p not INFLIGHT
             so returns -ENODATA

Assuming the caller in thread 1 was sg_wait_event_srp() then
an -ENODATA return is interpreted as found one (and a check is
done for the AWAIT state and if not -EPROTO is returned to the
user).

So both threads have found their requests have been completed
so all is well programmatically. But blk_poll(), which becomes
mq_poll() to the LLD, found completions other than what the sg
driver (or other ULD) was looking for since both requests were
on the same (hardware) queue.

Whenever blk_poll() returns > 0, I believe the question of a before
and after race is moot. That is because blk_poll() has done a lot
of work when it returns > 0 including the possibility of changing
the state of the current request (in the current thread). It has:
   - visited the LLD which has completed at least one outstanding
     request on given q/cookie
   - told the block layer it has completed 1 or more requests
   - the block layer completion code:
       - calls the scsi mid-level completion code
          - calls the scsi ULD completion code

 From my testing without that recently added line:
     return atomic_read(&srp->rq_st) == SG_RS_AWAIT_RCV;

then test code with a single thread won't fail, two threads will seldom
fail (by incorrectly believing its srp has completed just because
blk_poll() completed _something_ in the window it was looking in).
And the failure rate will increase with the number of threads
running.

Stepping back from the details, this is all new stuff (i.e. iopoll/
blk_poll in the scsi subsystem). The patches it depends on by Kashyap
Desai haven't hit the mainline yet. My testing is based on a patch
in the scsi_debug driver modelling the way I think it will work. My
megaraid hardware is from the previous century (or close to it) so
I doubt that I can test it on real hardware in the short term.

Plus I believe iopoll/blk_poll will work async (or at least I haven't
found a reason why not) but all the blk_poll() code I have found
in the kernel is only sync.

Somewhat related to this discussion I have found a new issue that
I would like to fix. So I plan to have a version 18 . Plus I have
been changing http to https in the various links to my documentation.

Doug Gilbert