On Fri, 26 Oct 2018 12:01:52 +0200
Arnd Bergmann <arnd@xxxxxxxx> wrote:

> On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon
> <boris.brezillon@xxxxxxxxxxx> wrote:
> > On Fri, 26 Oct 2018 09:43:25 +0200
> > Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >
> > > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon
> > > <boris.brezillon@xxxxxxxxxxx> wrote:
> > > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@xxxxxxxxxxx> wrote:
> > > > > > On Thu, 25 Oct 2018 17:30:26 +0200
> > > Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public
> > > interfaces then, or is this something else?
> >
> > i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so
> > it's part of the master controller interface.
> >
> > >
> > > If you place a buffer on the stack, it is not DMA capable, but
> > > it is guaranteed to be at least 32-bit word aligned, and should
> > > not cause an exception in readsl(), unless it starts with a couple of
> > > (not multiple of four) extra bytes that are not sent to the devices.
> > > Is that what happens here?
> >
> > Here is the report I received from Vitor:
> >
> > "
> > Hi Boris,
> >
> >
> > I'm trying this new patch-set version but I get some issues when use
> > readsl() function.
> >
> > Basically the system complain about memory alignment.
> > >
> > > +static int i3c_master_getpid_locked(struct i3c_master_controller *master,
> > > +				    struct i3c_device_info *info)
> > > +{
> > > +	struct i3c_ccc_getpid getpid;
> >
> > at this point the getpid struct it is already unaligned with
> >
> > i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a
> >
> > > +	struct i3c_ccc_cmd_dest dest = {
> > > +		.addr = info->dyn_addr,
> > > +		.payload.len = sizeof(struct i3c_ccc_getpid),
> > > +		.payload.data = &getpid,
> > > +	};
> > > > +}
> > > +
> >
> > and them when
> >
> > static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master,
> > 				       u8 *bytes, int nbytes)
> > {
> > 	readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4);
> > 	...
> > }
>
> Ok, I spent an hour chasing the ARM implementation and finding
> no way this could go wrong here. I see that 'struct i3c_ccc_getpid'
> may be misaligned on the stack (it normally won't be), and that
> the ARM readsl() has a lot of extra code to handle unaligned
> output.

I didn't have this problem on xtensa either.

> However, the dump that Vitor reports
>
> > [ECR   ]: 0x00230400 => Misaligned r/w from 0x9a249c7a
> > [EFA   ]: 0x9a249c7a
> > [BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master]
>
> Is from an arch/arc kernel that uses asm-generic/io.h, and
> that stores the output using a u32 pointer:
>
> static inline void readsl(const volatile void __iomem *addr, void *buffer,
> 			  unsigned int count)
> {
> 	if (count) {
> 		u32 *buf = buffer;
>
> 		do {
> 			u32 x = __raw_readl(addr);
> 			*buf++ = x;
> 		} while (--count);
> 	}
> }
>
> This is apparently not allowed on ARC when 'buffer' is
> unaligned. I think what we need here is to use
> put_unaligned() instead of the pointer dereference.
> For architectures that can do unaligned accesses,
> the result is the same, but for ARC it will fix the problem.

Okay, so writesl()/readsl() should deal with unaligned pointers, and
default implementations should be fixed. I guess you'll send a patch to
use put/get_unaligned().
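
For reference, a minimal user-space sketch of what a put_unaligned()-based
readsl() could look like. This is not the actual asm-generic patch:
sketch_put_unaligned_u32(), sketch_fifo_read() and sketch_readsl() are
hypothetical stand-ins, and memcpy() is assumed here as the portable way to
express a store that may be unaligned.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the kernel's put_unaligned(): memcpy() makes
 * no alignment assumption about 'dst', so the compiler emits whatever
 * access sequence is legal on the target CPU.
 */
static inline void sketch_put_unaligned_u32(uint32_t val, void *dst)
{
	memcpy(dst, &val, sizeof(val));
}

/* Hypothetical stand-in for __raw_readl() on a FIFO data port. */
static inline uint32_t sketch_fifo_read(const volatile uint32_t *port)
{
	return *port;
}

/*
 * Read 'count' 32-bit words from the same port into 'buffer'.
 * Unlike a plain "*buf++ = x" through a u32 pointer, this does not
 * require 'buffer' to be 4-byte aligned.
 */
static void sketch_readsl(const volatile uint32_t *port, void *buffer,
			  unsigned int count)
{
	uint8_t *buf = buffer;

	assert(buffer != NULL || count == 0);

	while (count--) {
		sketch_put_unaligned_u32(sketch_fifo_read(port), buf);
		buf += sizeof(uint32_t);
	}
}
```

On CPUs that handle unaligned stores natively, a compiler would typically
turn the memcpy() into a plain store, so the aligned case should not get
slower.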
>
> > > One way to address this might be to always bounce any
> > > messages that are less than a cache line through a
> > > (pre-)kmallocated buffer, and require any longer messages
> > > to be cache capable. This could also solve the issue with
> > > readsl(), but it would be a rather confusing user interface.
> > >
> > > Another option might be to have separate interfaces for
> > > "short" and "long" messages at the API level and have
> > > distinct rules for those: short would always be bounced
> > > by the i3c code, and long puts restrictions on the buffer
> > > location.
> >
> > Hm, let's keep the API simple. I'll just mandate that all payload bufs
> > passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.
>
> Ok. What about i2c commands sent to the same i3c controller
> then?

Still not taken care of.

> Do we need to copy those to satisfy the requirements
> of the i3c layer?

I guess we should. The question is, should we do that unconditionally
or should we try to optimize things with something like:

	if (!virt_addr_valid(xfer->buf) ||
	    object_is_on_stack(xfer->buf))
		/* Alloc bounce buf. */
	else
		/* Use provided buf. */
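
To make the conditional variant a bit more concrete, here is a rough
user-space sketch of the bounce logic. Everything in it is an assumption made
for illustration: struct sketch_xfer, the dma_safe() predicate (standing in
for the virt_addr_valid()/object_is_on_stack() test, which only exists in the
kernel), the do_xfer() hook and the two demo helpers are all hypothetical, and
malloc() stands in for kmalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transfer descriptor; field names are illustrative only. */
struct sketch_xfer {
	void *buf;
	size_t len;
	bool read;	/* true: device -> memory */
};

/* Stand-in for the "can the controller use this buffer directly?" test. */
typedef bool (*sketch_dma_safe_fn)(const void *buf);

/* Hypothetical controller hook doing the actual transfer on 'buf'. */
typedef int (*sketch_do_xfer_fn)(void *buf, size_t len, bool read);

static int sketch_xfer_bounced(struct sketch_xfer *xfer,
			       sketch_dma_safe_fn dma_safe,
			       sketch_do_xfer_fn do_xfer)
{
	void *bounce;
	int ret;

	assert(xfer && dma_safe && do_xfer);

	/* Use the provided buffer when it is usable as-is. */
	if (!xfer->len || dma_safe(xfer->buf))
		return do_xfer(xfer->buf, xfer->len, xfer->read);

	/* Otherwise allocate a bounce buffer. */
	bounce = malloc(xfer->len);
	if (!bounce)
		return -1;

	/* Writes: copy the payload in before the transfer. */
	if (!xfer->read)
		memcpy(bounce, xfer->buf, xfer->len);

	ret = do_xfer(bounce, xfer->len, xfer->read);

	/* Reads: copy the received data back on success. */
	if (!ret && xfer->read)
		memcpy(xfer->buf, bounce, xfer->len);

	free(bounce);
	return ret;
}

/* Demo predicate: pretend no caller buffer is usable directly. */
static bool sketch_never_dma_safe(const void *buf)
{
	(void)buf;
	return false;
}

/* Demo hook: a "device" that fills read buffers with 0xa5. */
static int sketch_fill_a5(void *buf, size_t len, bool read)
{
	if (read)
		memset(buf, 0xa5, len);
	return 0;
}
```

The unconditional variant would simply drop the dma_safe() check and always
take the bounce path, at the cost of one allocation and copy per transfer.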