Re: [PATCH v9 6/9] i3c: master: Add driver for Cadence IP

Boris Brezillon <boris.brezillon@xxxxxxxxxxx> · Fri, 26 Oct 2018 14:46:47 +0200

On Fri, 26 Oct 2018 12:01:52 +0200
Arnd Bergmann <arnd@xxxxxxxx> wrote:

> On Fri, Oct 26, 2018 at 9:57 AM Boris Brezillon
> <boris.brezillon@xxxxxxxxxxx> wrote:
> > On Fri, 26 Oct 2018 09:43:25 +0200
> > Arnd Bergmann <arnd@xxxxxxxx> wrote:
> >  
> > > On Thu, Oct 25, 2018 at 6:30 PM Boris Brezillon
> > > <boris.brezillon@xxxxxxxxxxx> wrote:  
> > > > On Thu, 25 Oct 2018 18:13:51 +0200 Arnd Bergmann <arnd@xxxxxxxx> wrote:
> > > > On Thu, Oct 25, 2018 at 6:07 PM Boris Brezillon <boris.brezillon@xxxxxxxxxxx> wrote:  
> > > > > > On Thu, 25 Oct 2018 17:30:26 +0200  
> > > Ok. Is i3c_master_send_ccc_cmd_locked() what implements the public
> > > interfaces then, or is this something else?  
> >
> > i3c_master_send_ccc_cmd_locked() calls master->ops->send_ccc_cmd(), so
> > it's part of the master controller interface.
> >  
> > >
> > > If you place a buffer on the stack, it is not DMA capable, but
> > > it is guaranteed to be at least 32-bit word aligned, and should
> > > not cause an exception in readsl(), unless it starts with a couple of
> > > (not multiple of four)  extra bytes that are not sent to the devices.
> > > Is that what happens here?  
> >
> > Here is the report I received from Vitor:
> >
> > "
> >         Hi Boris,
> >
> >
> >         I'm trying this new patch-set version but I get some issues when I use
> >         the readsl() function.
> >
> >         Basically the system complains about memory alignment.
> >  
> >
> >         > +static int i3c_master_getpid_locked(struct i3c_master_controller *master,
> >         > +                                 struct i3c_device_info *info)
> >         > +{
> >         > +     struct i3c_ccc_getpid getpid;  
> >
> >         at this point the getpid struct is already unaligned, with
> >
> >         i3c_master_getpid_locked:1129 getpid_add=0x9a249c7a
> >  
> >         > +     struct i3c_ccc_cmd_dest dest = {
> >         > +             .addr = info->dyn_addr,
> >         > +             .payload.len = sizeof(struct i3c_ccc_getpid),
> >         > +             .payload.data = &getpid,
> >         > +     };  
> >
> >         > +}
> >         > +  
> >
> >         and then when
> >
> >         static void dw_i3c_master_read_rx_fifo(struct dw_i3c_master *master,
> >                                 u8 *bytes, int nbytes)
> >         {
> >              readsl(master->regs + RX_TX_DATA_PORT, bytes, nbytes / 4);
> >         ...
> >         }  
> 
> Ok, I spent an hour chasing the ARM implementation and finding
> no way this could go wrong here. I see that 'struct i3c_ccc_getpid'
> may be misaligned on the stack (it normally won't be), and that
> the ARM readsl() has a lot of extra code to handle unaligned
> output.

I didn't have this problem on xtensa either.

> However, the dump that Vitor reports
> 
> >         [ECR   ]: 0x00230400 => Misaligned r/w from 0x9a249c7a
> >         [EFA   ]: 0x9a249c7a
> >         [BLINK ]: dw_i3c_master_irq_handler+0x200/0x2fc [dw_i3c_master]  
> 
> Is from an arch/arc kernel that uses asm-generic/io.h, and
> that stores the output using a u32 pointer:
> 
> static inline void readsl(const volatile void __iomem *addr, void *buffer,
>                           unsigned int count)
> {
>         if (count) {
>                 u32 *buf = buffer;
> 
>                 do {
>                         u32 x = __raw_readl(addr);
>                         *buf++ = x;
>                 } while (--count);
>         }
> }
> 
> This is apparently not allowed on ARC when 'buffer' is
> unaligned. I think what we need here is to use
> put_unaligned() instead of the pointer dereference.
> For architectures that can do unaligned accesses,
> the result is the same, but for ARC it will fix the problem.

Okay, so writesl()/readsl() should deal with unaligned pointers, and
default implementations should be fixed. I guess you'll send a patch to
use put/get_unaligned().
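
For the record, here is roughly what I understand the put_unaligned()
variant to look like. This is only a userspace sketch, not the actual
asm-generic/io.h change: fake_raw_readl() and put_unaligned_u32() are
stand-ins I made up for __raw_readl() and put_unaligned(), and the
memcpy() merely models a store that makes no alignment assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the MMIO FIFO: each read returns the next value. */
static uint32_t fake_fifo_next;

static uint32_t fake_raw_readl(void)
{
	return fake_fifo_next++;
}

/*
 * Hypothetical userspace model of put_unaligned(): memcpy() does not
 * assume that the destination is 32-bit aligned.
 */
static void put_unaligned_u32(uint32_t val, void *p)
{
	memcpy(p, &val, sizeof(val));
}

/* Same loop shape as the generic readsl(), minus the u32 pointer store. */
static void readsl_unaligned(void *buffer, unsigned int count)
{
	uint8_t *buf = buffer;

	while (count--) {
		put_unaligned_u32(fake_raw_readl(), buf);
		buf += sizeof(uint32_t);
	}
}
```

With this shape the destination can start at any byte offset, which is
the case Vitor hit with the on-stack getpid struct.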

> 
> > > One way to address this might be to always bounce any
> > > messages that are less than a cache line through a
> > > (pre-)kmallocated buffer, and require any longer messages
> > > to be cache capable. This could also solve the issue with
> > > readsl(), but it would be a rather confusing user interface.
> > >
> > > Another option might be to have separate interfaces for
> > > "short" and "long" messages at the API level and have
> > > distinct rules for those: short would always be bounced
> > > by the i3c code, and long puts restrictions on the buffer
> > > location.  
> >
> > Hm, let's keep the API simple. I'll just mandate that all payload bufs
> > passed to i3c_master_send_ccc_cmd_locked() be dynamically allocated.  
> 
> Ok. What about i2c commands sent to the same i3c controller
> then?

Still not taken care of.

> Do we need to copy those to satisfy the requirements
> of the i3c layer?

I guess we should. The question is, should we do that unconditionally
or should we try to optimize things with something like:

	if (!virt_addr_valid(xfer->buf) ||
	    object_is_on_stack(xfer->buf))
		/* Alloc bounce buf. */
	else
		/* Use provided buf. */
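
To make the idea a bit more concrete, here is a userspace sketch of the
conditional bounce logic. Everything in it is hypothetical: struct
sketch_xfer, the sketch_* helpers and the sketch_unsafe_lo/hi range are
names I made up, and sketch_needs_bounce() only models the
!virt_addr_valid() || object_is_on_stack() test by checking a
caller-declared address range. malloc()/free() stand in for the kernel
allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical transfer descriptor: caller buffer, length, direction. */
struct sketch_xfer {
	void *buf;
	size_t len;
	bool read;
};

/* Hypothetical address range standing in for "stack or otherwise not DMA-safe". */
static uintptr_t sketch_unsafe_lo, sketch_unsafe_hi;

/* Models !virt_addr_valid(buf) || object_is_on_stack(buf). */
static bool sketch_needs_bounce(const void *buf)
{
	uintptr_t p = (uintptr_t)buf;

	return p >= sketch_unsafe_lo && p < sketch_unsafe_hi;
}

/* Returns the buffer to hand to the controller, or NULL on allocation failure. */
static void *sketch_xfer_get_buf(const struct sketch_xfer *xfer, bool *bounced)
{
	void *bounce;

	*bounced = sketch_needs_bounce(xfer->buf);
	if (!*bounced)
		return xfer->buf;	/* Use provided buf. */

	bounce = malloc(xfer->len);	/* Alloc bounce buf. */
	if (!bounce)
		return NULL;

	/* For writes, the payload must be copied in before the transfer. */
	if (!xfer->read)
		memcpy(bounce, xfer->buf, xfer->len);

	return bounce;
}

/* Copies read data back to the caller and releases the bounce buffer. */
static void sketch_xfer_put_buf(const struct sketch_xfer *xfer, void *hw_buf,
				bool bounced)
{
	if (!bounced)
		return;

	if (xfer->read)
		memcpy(xfer->buf, hw_buf, xfer->len);

	free(hw_buf);
}
```

The unconditional variant would simply drop the predicate and always
take the bounce path, which is simpler but costs an extra copy per
transfer.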