On 8/3/20 9:02 AM, Arnd Bergmann wrote: > On Mon, Aug 3, 2020 at 5:42 AM Samuel Holland <samuel@xxxxxxxxxxxx> wrote: >> On 7/31/20 2:29 AM, Arnd Bergmann wrote: >>> On Fri, Jul 31, 2020 at 12:07 AM Samuel Holland <samuel@xxxxxxxxxxxx> wrote: >>>> >>>> The main issue observed was at the call to scsi_set_resid, where the >>>> byteswapped parameter would eventually trigger the alignment check at >>>> drivers/scsi/sd.c:2009. At that point, the kernel would continuously >>>> complain about an "Unaligned partial completion", and no further I/O >>>> could occur. >>>> >>>> This gets the controller working on big endian powerpc64. >>>> >>>> Signed-off-by: Samuel Holland <samuel@xxxxxxxxxxxx> >>>> --- >>>> >>>> Changes since v1: >>>> - Include changes to use __le?? types in command structures >>>> - Use an object literal for the intermediate "schedulertime" value >>>> - Use local "error" variable to avoid repeated byte swapping >>>> - Create a local "length" variable to avoid very long lines >>>> - Move byte swapping to TW_REQ_LUN_IN/TW_LUN_OUT to avoid long lines >>>> >>> >>> Looks much better, thanks for the update. I see one more issue here >>>> /* Command Packet */ >>>> typedef struct TW_Command { >>>> - unsigned char opcode__sgloffset; >>>> - unsigned char size; >>>> - unsigned char request_id; >>>> - unsigned char unit__hostid; >>>> + u8 opcode__sgloffset; >>>> + u8 size; >>>> + u8 request_id; >>>> + u8 unit__hostid; >>>> /* Second DWORD */ >>>> - unsigned char status; >>>> - unsigned char flags; >>>> + u8 status; >>>> + u8 flags; >>>> union { >>>> - unsigned short block_count; >>>> - unsigned short parameter_count; >>>> + __le16 block_count; >>>> + __le16 parameter_count; >>>> } byte6_offset; >>>> union { >>>> struct { >>>> - u32 lba; >>>> - TW_SG_Entry sgl[TW_ESCALADE_MAX_SGL_LENGTH]; >>>> - dma_addr_t padding; >>>> + __le32 lba; >>>> + TW_SG_Entry sgl[TW_ESCALADE_MAX_SGL_LENGTH]; >>>> + dma_addr_t padding; >>> >>> >>> The use of dma_addr_t here seems odd, since this is neither endian-safe nor >>> fixed-length. I see you replaced the dma_addr_t in TW_SG_Entry with >>> a variable-length fixed-endian word. I guess there is a chance that this is >>> correct, but it is really confusing. On top of that, it seems that there is >>> implied padding in the structure when built with a 64-bit dma_addr_t >>> on most architectures but not on x86-32 (which uses 32-bit alignment for >>> 64-bit integers). I don't know what the hardware definition is for TW_Command, >>> but ideally this would be expressed using only fixed-endian fixed-length >>> members and explicit padding. >> >> All of the command structures are packed, due to the "#pragma pack(1)" earlier >> in the file. So alignment is not an issue. This dma_addr_t member _is_ the >> explicit padding to make sizeof(TW_Command) - >> sizeof(TW_Command.byte8_offset.{io,param}.sgl) equal TW_COMMAND_SIZE * 4. And >> indeed the structure is expected to be a different size depending on >> sizeof(dma_addr_t). > > Ah, so only the first few members are accessed by hardware and the > last union is only accessed by the OS then? In that case I suppose it is > all fine, but I would also suggest removing the "#pragma packed" > to get somewhat more efficient access on systems that have problems > with misaligned accesses. I don't know what part the hardware accesses; everything I know about the hardware comes from reading the driver. The problem with removing the "#pragma pack(1)" is that the structure is inherently misaligned: byte8_offset.io.sgl starts at offset 12, but it may begin with a __le64. Regards, Samuel