Re: [PATCH] mpt3sas: Fix for regression caused due to cf6bf9710c patch

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Tue, 03 Jul 2018 08:23:04 -0700

On Tue, 2018-07-03 at 22:49 +0900, David Miller wrote:
> From: Sreekanth Reddy <sreekanth.reddy@xxxxxxxxxxxx>
> Date: Tue, 3 Jul 2018 17:48:49 +0530
> 
> > Any suggestion/update over my previous mail. I am using 4.13
> kernel.
> 
> I think the issue is that if you are reading a 32-bit word and then
> interpreting it as a struct full of individual bytes, you have to
> order the bytes in the structure appropriately for the cpu
> endianness.

This is undoubtedly it.  The point being if you read from a structure
using readX, you have to read every element at its correct length for
the endian swaps to work.  You can't do a readq on 2 32 bit words and
expect the endianness to be correct (you'll find they come out in the
wrong order).

I think you're using a shared (device and cpu) memory mapped structured
data with a doorbell register, which is pretty identical to how the
qla1280 does it.  We went through several iterations of fixing that
driver for big endian but finally settled on putting __le annotations
on all the structures and doing cpu_to_leX() swaps as we wrote them
(and obviously leX_to_cpu() swaps to read them), meaning the structure
in memory is always correct for the device.  Then we used a writeX to
poke the doorbell and the device just picked up the correct
information.

The rule you want to be following is: memory mapped structure, you're
responsible for annotation and swapping; readX/writeX to correctly
sized data, the API will swap for you.

So, can we just revert the original patch which is clearly now a
regression and try to get this fixed in the merge window?  I think the
actual bug is simply you're missing __leX annotations on the shared
memory mapped structure to fix sparse, but otherwise everything is
working.

James