On Tue, Jul 3, 2018 at 8:53 PM, James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> wrote: > On Tue, 2018-07-03 at 22:49 +0900, David Miller wrote: >> From: Sreekanth Reddy <sreekanth.reddy@xxxxxxxxxxxx> >> Date: Tue, 3 Jul 2018 17:48:49 +0530 >> >> > Any suggestion/update over my previous mail. I am using 4.13 >> kernel. >> >> I think the issue is that if you are reading a 32-bit word and then >> interpreting it as a struct full of individual bytes, you have to >> order the bytes in the structure appropriately for the cpu >> endianness. > > This is undoubtedly it. The point being if you read from a structure > using readX, you have to read every element at its correct length for > the endian swaps to work. You can't do a readq on 2 32 bit words and > expect the endianness to be correct (you'll find they come out in the > wrong order). > > I think you're using a shared (device and cpu) memory mapped structured > data with a doorbell register, [Sreekanth] Yes it is correct, we are using a shared memory mapped structured data with a doorbell register. In this particular handshake function, HBA will send only 16 bit data at a time. So driver has to read 16 bit data at a time in a loop until complete reply message is read. > which is pretty identical to how the > qla1280 does it. We went through several iterations of fixing that > driver for big endian but finally settled on putting __le annotations > on all the structures and doing cpu_to_leX() swaps as we wrote them > (and obviously leX_to_cpu() swaps to read them), meaning the structure > in memory is always correct for the device. Then we used a writeX to > poke the doorbell and the device just picked up the correct > information. > > The rule you want to be following is: memory mapped structure, you're > responsible for annotation and swapping; readX/writeX to correctly > sized data, the API will swap for you. > > So, can we just revert the original patch which is clearly now a > regression and try to get this fixed in the merge window? [Sreekanth] Yes we can revert the original patch. > I think the > actual bug is simply you're missing __leX annotations on the shared > memory mapped structure to fix sparse, but otherwise everything is > working. Actually our memory mapped structure variables are declared with __leX annotations as shown below, typedef struct _MPI2_DEFAULT_REPLY { U16 FunctionDependent1; /*0x00 */ U8 MsgLength; /*0x02 */ U8 Function; /*0x03 */ U16 FunctionDependent2; /*0x04 */ U8 FunctionDependent3; /*0x06 */ U8 MsgFlags; /*0x07 */ U8 VP_ID; /*0x08 */ U8 VF_ID; /*0x09 */ U16 Reserved1; /*0x0A */ U16 FunctionDependent5; /*0x0C */ U16 IOCStatus; /*0x0E */ U32 IOCLogInfo; /*0x10 */ } MPI2_DEFAULT_REPLY, *PTR_MPI2_DEFAULT_REPLY, MPI2DefaultReply_t, *pMPI2DefaultReply_t; and typedef u8 U8; typedef __le16 U16; typedef __le32 U32; typedef __le64 U64 __attribute__ ((aligned(4))); Also I tried replacing readl() API with readw()API (as HBA FW will send 16 bit data at a time) as shown below and still I see same issue, MPI2DefaultReply_t *default_reply = (MPI2DefaultReply_t *)reply; u16 reply1; reply1 = readw(&ioc->chip->Doorbell); reply[1] = reply1; printk("LSI debug.. 0x%x, 0x%x, 0x%x \n", reply1, reply[1]); writel(0, &ioc->chip->HostInterruptStatus); printk("LSI MsgLength :%d\n", default_reply->MsgLength); And I got below output where message length is wrong, when I execute above code on SPARC64 machine, LSI debug.. 0x311, 0x311 LSI MsgLength :3 Thanks, Sreekanth > > James >