Re: [PATCH 04/14] cxlflash: Avoid command room violation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for catching this Matt. Looking into this. Will send out a V2.

On 11/17/2016 1:36 PM, Matthew R. Ochs wrote:
Hi Uma,

I do see a potential hang issue with this patch. See my comments below.


-matt

On Nov 15, 2016, at 5:14 PM, Uma Krishnan <ukrishn@xxxxxxxxxxxxxxxxxx> wrote:

During test, a command room violation interrupt is occasionally seen
for the master context when the CXL flash devices are stressed.

After studying the code, there could be gaps in the way command room
value is being cached in cxlflash. When the cached command room is zero
the thread attempting to send becomes burdened with updating the cached
value with the actual value from the AFU. Today, this is handled with
an atomic set operation of the raw value read. Following the atomic
update, the thread proceeds to send.

This behavior is incorrect on two counts:

  - The update fails to take into account the current thread and its
    consumption of one of the hardware commands.

  - The update does not take into account other threads also atomically
    updating. Per design, a worker thread updates the cached value when
    a send thread times out. By not performing an atomic compare/exchange,
    the cached value can be incorrectly clobbered.

To correct these issues, the runtime updates of the cached command room
are updated to use atomic64_cmpxchg() and the send routine is updated to
take into account the current thread consuming a hardware command.

Signed-off-by: Uma Krishnan <ukrishn@xxxxxxxxxxxxxxxxxx>
---
drivers/scsi/cxlflash/main.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/drivers/scsi/cxlflash/main.c b/drivers/scsi/cxlflash/main.c
index 6d33d8c..1a32e8b 100644
--- a/drivers/scsi/cxlflash/main.c
+++ b/drivers/scsi/cxlflash/main.c
@@ -322,9 +322,10 @@ static int send_cmd(struct afu *afu, struct afu_cmd *cmd)
	if (!newval) {

When this path is invoked, the current thread is consuming the last entry
available entry before the room must be read again. While the change
below is fine for circumstances where the hardware queue has room for
more than one command, consider a scenario where the queue has room
for only 1 command (the command that you just consumed via the atomic
but are not really consuming with a MMIO due to the revised goto).

In such a scenario this code would loop endlessly, bypassing the timeout
logic completely, until the read room reflected a value greater than 1.

		do {
			room = readq_be(&afu->host_map->cmd_room);
-			atomic64_set(&afu->room, room);
-			if (room)
-				goto write_ioarrin;
+			if (room) {
+				atomic64_cmpxchg(&afu->room, 0, room);
+				goto retry;
+			}

If you instead fully consume the entry (goto write_ioarrin - similar as it was
before) and take into account the consumption when you update the cached
value (i.e.: cmpxchg(..., 0, room - 1) the scenario described above will not occur.



--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]
  Powered by Linux