On Tuesday, May 17, 2016 04:50:48 PM John Youn wrote: > On 5/14/2016 6:11 AM, Christian Lamparter wrote: > > On Thursday, May 12, 2016 11:40:28 AM John Youn wrote: > >> On 5/12/2016 6:30 AM, Christian Lamparter wrote: > >>> On Thursday, May 12, 2016 01:55:44 PM Arnd Bergmann wrote: > >>>> On Thursday 12 May 2016 11:58:18 Christian Lamparter wrote: > >>>>>>>> Detecting the endianess of the > >>>>>>>> device is probably the best future-proof solution, but it's also > >>>>>>>> considerably more work to do in the driver, and comes with a > >>>>>>>> tiny runtime overhead. > >>>>>>> > >>>>>>> The runtime overhead is probably non-measurable compared with the cost > >>>>>>> of the actual MMIOs. > >>>>>> > >>>>>> Right. The code size increase is probably measurable (but still small), > >>>>>> the runtime overhead is not. > >>>>> > >>>>> Ok, so no rebuts or complains have been posted. > >>>>> > >>>>> I've tested the patch you made in: https://lkml.org/lkml/2016/5/9/354 > >>>>> and it works: > >>>>> > >>>>> Tested-by: Christian Lamparter <chunkeey@xxxxxxxxxxxxxx> > >>>>> > >>>>> So, how do we go from here? There is are two small issues with the > >>>>> original patch (#ifdef DWC2_LOG_WRITES got converted to lower case: > >>>>> #ifdef dwc2_log_writes) and I guess a proper subject would be nice. > >>>>> > >>>>> Arnd, can you please respin and post it (cc'd stable as well)? > >>>>> So this is can be picked up? Or what's your plan? > >>>> > >>>> (I just realized my reply was stuck in my outbox, so the patch > >>>> went out first) > >>>> > >>>> If I recall correctly, the rough consensus was to go with your longer > >>>> patch in the future (fixed up for the comments that BenH and > >>>> I sent), and I'd suggest basing it on top of a fixed version of > >>>> my patch. > >>> Well, but it comes with the "overhead"! So this was just as I said: > >>> "Let's look at it and see if it's any good"... And I think it isn't > >>> since the usb/host/ehci people also opted for #ifdef CONFIG_BIG_ENDIAN > >>> archs etc... > >> > >> I slightly prefer the more general patch for future kernel versions. > >> The overhead will probably be negligible, but we can perform some > >> testing to make sure. > >> > >> Can you resubmit with all gathered feedback? > > > > Yes, here are the changes. > > > > I've tested it on my MyBook Live Duo. The usbotg comes right up: > > [12610.540004] dwc2 4bff80000.usbotg: USB bus 1 deregistered > > [12612.513934] dwc2 4bff80000.usbotg: Specified GNPTXFDEP=1024 > 256 > > [12612.518756] dwc2 4bff80000.usbotg: EPs: 3, shared fifos, 2042 entries in SPRAM > > [12612.530112] dwc2 4bff80000.usbotg: DWC OTG Controller > > [12612.533948] dwc2 4bff80000.usbotg: new USB bus registered, assigned bus number 1 > > [12612.540083] dwc2 4bff80000.usbotg: irq 33, io mem 0x00000000 > > > > John: Can you run some perf test with it? > > > > I've based this on: > > > > commit 6ea2fffc9057a67df1994d85a7c085d899eaa25a > > Author: Arnd Bergmann <arnd@xxxxxxxx> > > Date: Fri May 13 15:52:27 2016 +0200 > > > > usb: dwc2: fix regression on big-endian PowerPC/ARM systems > > > > so naturally, it needs to be applied first. > > Most of the conversion work was done by the attached > > coccinelle semantic patches. > > > > I had to edit the __bic32 and __orr32 helpers by hand. > > As well as some debugfs code and stuff in gadget.c. > > > > Thanks Christian. > > I'll keep this in our internal tree and send it to Felipe later. This > causes a bunch of conflicts that I have to fix up and I should do a > bit of testing as well. > > And since there is a patch that fixes the regression this is can wait. > > Regards, > John --- Hey, that's really nice of you to do that :-D. Please keep me in the loop (Cc) for those then. Yes, this needs definitely testing on all the affected ARCHs. I've attached a diff to a updated version of the patch. It drops the special MIPS case (as requested by Arnd). BTW, I looked into the ioread32_rep and iowrite32_rep again. I'm not entirely convinced that the hardware FIFOs are actually endian neutral. But I can't verify it since my Western Digital My Book Live only supports the host configuration (forces host mode), so I don't know what a device in dual-mode or peripheral do here. The reason why I think it was broken is because there's a PIO copy to and from the HCFIFO(x) in dwc2_hc_write_packet and dwc2_hc_read_packet access in the hcd.c file as well... And there, the code was using the dwc2_readl and dwc2_writel to access the data. I added special accessors for the FIFOS now: dwc2_readl_rep and dwc2_writel_rep. I went all the way and implemented the helpers to do unaligned access if necessary (not sure if adding likely branches is a good idea, as this could be either always true or false for a specific driver the whole time). NB: it also fixes a "regs variable not used in dwc2_hsotg_dump" warning if DEBUG isn't selected. NB2: If it you need a patch against a specific tree, please let me know. --- diff --git a/drivers/usb/dwc2/core.h b/drivers/usb/dwc2/core.h index 2fa57cd..69030bb 100644 --- a/drivers/usb/dwc2/core.h +++ b/drivers/usb/dwc2/core.h @@ -42,6 +42,7 @@ #include <linux/usb/gadget.h> #include <linux/usb/otg.h> #include <linux/usb/phy.h> +#include <asm/unaligned.h> #include "hw.h" /* @@ -958,50 +959,6 @@ enum dwc2_halt_status { DWC2_HC_XFER_URB_DEQUEUE, }; -#ifdef CONFIG_MIPS -/* - * There are some MIPS machines that can run in either big-endian - * or little-endian mode and that use the dwc2 register without - * a byteswap in both ways. - * Unlike other architectures, MIPS apparently does not require a - * barrier before the __raw_writel() to synchronize with DMA but does - * require the barrier after the __raw_writel() to serialize a set of - * writes. This set of operations was added specifically for MIPS and - * should only be used there. - */ -static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg, - ptrdiff_t reg) -{ - const void __iomem *addr = hsotg->regs + reg; - u32 value = __raw_readl(addr); - - /* - * In order to preserve endianness __raw_* operation is used. Therefore - * a barrier is needed to ensure IO access is not re-ordered across - * reads or writes - */ - mb(); - return value; -} - -static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value, - ptrdiff_t reg) -{ - const void __iomem *addr = hsotg->regs + reg; - __raw_writel(value, addr); - - /* - * In order to preserve endianness __raw_* operation is used. Therefore - * a barrier is needed to ensure IO access is not re-ordered across - * reads or writes - */ - mb(); -#ifdef DWC2_LOG_WRITES - pr_info("INFO:: wrote %08x to %p\n", value, addr); -#endif -} -#else -/* Normal architectures just use readl/write_be */ static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg, ptrdiff_t reg) { @@ -1014,7 +971,8 @@ static inline u32 dwc2_readl(struct dwc2_hsotg *hsotg, } -static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value, +static inline void dwc2_writel(struct dwc2_hsotg *hsotg, + const u32 value, ptrdiff_t reg) { void __iomem *addr = hsotg->regs + reg; @@ -1028,7 +986,103 @@ static inline void dwc2_writel(struct dwc2_hsotg *hsotg, u32 value, pr_info("info:: wrote %08x to %p\n", value, addr); #endif } -#endif + +static inline void dwc2_readl_rep(struct dwc2_hsotg *hsotg, + ptrdiff_t reg, + u32 *buf, const size_t len) +{ + void __iomem *addr = hsotg->regs + reg; + size_t i, remaining = len & ~4; + + if (hsotg->is_big_endian) { + if (likely(IS_ALIGNED(*buf, 0x4))) { + for (i = len >> 2; i > 0; i--) + *buf++ = ioread32be(addr); + } else { + /* xfer_buf is not DWORD aligned */ + for (i = len >> 2; i > 0; i--) { + u32 data = ioread32be(addr); + + put_unaligned(data, buf); + buf++; + } + } + } else { + /* little-endian accessors */ + if (likely(IS_ALIGNED(*buf, 0x4))) { + for (i = len >> 2; i > 0; i--) + *buf++ = ioread32(addr); + + } else { + /* xfer_buf is not DWORD aligned */ + for (i = len >> 2; i > 0; i--) { + u32 data = ioread32be(addr); + + put_unaligned(data, buf); + buf++; + } + } + } + + if (unlikely(remaining)) { + u32 data_u32; + u8 *buf_u8 = (u8 *) buf; + u8 *data_u8 = (u8 *) &data_u32; + + data_u32 = dwc2_readl(hsotg, reg); + + while (remaining--) + *buf_u8++ = *data_u8++; + } +} + +static inline void dwc2_writel_rep(struct dwc2_hsotg *hsotg, + const ptrdiff_t reg, + const u32 *buf, const size_t len) +{ + void __iomem *addr = hsotg->regs + reg; + size_t i, remaining = len & ~4; + + if (hsotg->is_big_endian) { + if (likely(IS_ALIGNED(*buf, 0x4))) { + for (i = len >> 2; i > 0; i--) + iowrite32be(*buf++, addr); + } else { + /* xfer_buf is not DWORD aligned */ + for (i = len >> 2; i > 0; i--) { + u32 data = get_unaligned(buf); + + iowrite32be(data, addr); + buf++; + } + } + } else { + /* little-endian accessors */ + if (likely(IS_ALIGNED(*buf, 0x4))) { + for (i = len >> 2; i > 0; i--) + iowrite32(*buf++, addr); + } else { + /* xfer_buf is not DWORD aligned */ + for (i = len >> 2; i > 0; i--) { + u32 data = get_unaligned(buf); + + iowrite32(data, addr); + buf++; + } + } + } + + if (unlikely(remaining)) { + u32 data_u32; + u8 *buf_u8 = (u8 *) buf; + u8 *data_u8 = (u8 *) &data_u32; + + while (remaining--) + *data_u8++ = *buf_u8++; + + dwc2_writel(hsotg, data_u32, reg); + } +} extern int dwc2_detect_endiannes(struct dwc2_hsotg *hsotg); diff --git a/drivers/usb/dwc2/gadget.c b/drivers/usb/dwc2/gadget.c index 2c687d9..531b30f 100644 --- a/drivers/usb/dwc2/gadget.c +++ b/drivers/usb/dwc2/gadget.c @@ -317,7 +317,7 @@ static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg, u32 gnptxsts = dwc2_readl(hsotg, GNPTXSTS); int buf_pos = hs_req->req.actual; int to_write = hs_ep->size_loaded; - void *data; + u32 *data; int can_write; int pkt_round; int max_transfer; @@ -457,10 +457,9 @@ static int dwc2_hsotg_write_fifo(struct dwc2_hsotg *hsotg, if (periodic) hs_ep->fifo_load += to_write; - to_write = DIV_ROUND_UP(to_write, 4); data = hs_req->req.buf + buf_pos; - iowrite32_rep(hsotg->regs + EPFIFO(hs_ep->index), data, to_write); + dwc2_writel_rep(hsotg, EPFIFO(hs_ep->index), data, to_write); return (to_write >= can_write) ? -ENOSPC : 0; } @@ -1439,12 +1438,11 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size) { struct dwc2_hsotg_ep *hs_ep = hsotg->eps_out[ep_idx]; struct dwc2_hsotg_req *hs_req = hs_ep->req; - void __iomem *fifo = hsotg->regs + EPFIFO(ep_idx); + u32 *data; int to_read; int max_req; int read_ptr; - if (!hs_req) { u32 epctl = dwc2_readl(hsotg, DOEPCTL(ep_idx)); int ptr; @@ -1455,7 +1453,7 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size) /* dump the data from the FIFO, we've nothing we can do */ for (ptr = 0; ptr < size; ptr += 4) - (void)dwc2_readl(hsotg, EPFIFO(ep_idx)); + (void)__raw_readl(hsotg->regs + EPFIFO(ep_idx)); return; } @@ -1479,13 +1477,14 @@ static void dwc2_hsotg_rx_data(struct dwc2_hsotg *hsotg, int ep_idx, int size) hs_ep->total_data += to_read; hs_req->req.actual += to_read; - to_read = DIV_ROUND_UP(to_read, 4); /* * note, we might over-write the buffer end by 3 bytes depending on * alignment of the data. */ - ioread32_rep(fifo, hs_req->req.buf + read_ptr, to_read); + data = hs_req->req.buf + read_ptr; + + dwc2_readl_rep(hsotg, EPFIFO(ep_idx), data, to_read); } /** @@ -3411,7 +3416,6 @@ static void dwc2_hsotg_dump(struct dwc2_hsotg *hsotg) { #ifdef DEBUG struct device *dev = hsotg->dev; - void __iomem *regs = hsotg->regs; u32 val; int idx; diff --git a/drivers/usb/dwc2/hcd.c b/drivers/usb/dwc2/hcd.c index dcd6338..8568ff4 100644 --- a/drivers/usb/dwc2/hcd.c +++ b/drivers/usb/dwc2/hcd.c @@ -567,19 +567,10 @@ u32 dwc2_calc_frame_interval(struct dwc2_hsotg *hsotg) void dwc2_read_packet(struct dwc2_hsotg *hsotg, u8 *dest, u16 bytes) { u32 *data_buf = (u32 *)dest; - int word_count = (bytes + 3) / 4; - int i; - - /* - * Todo: Account for the case where dest is not dword aligned. This - * requires reading data from the FIFO into a u32 temp buffer, then - * moving it into the data buffer. - */ dev_vdbg(hsotg->dev, "%s(%p,%p,%d)\n", __func__, hsotg, dest, bytes); - for (i = 0; i < word_count; i++, data_buf++) - *data_buf = dwc2_readl(hsotg, HCFIFO(0)); + dwc2_readl_rep(hsotg, HCFIFO(0), data_buf, bytes); } /** @@ -1236,10 +1227,8 @@ static void dwc2_set_pid_isoc(struct dwc2_host_chan *chan) static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg, struct dwc2_host_chan *chan) { - u32 i; u32 remaining_count; u32 byte_count; - u32 dword_count; u32 *data_buf = (u32 *)chan->xfer_buf; if (dbg_hc(chan)) @@ -1251,20 +1240,7 @@ static void dwc2_hc_write_packet(struct dwc2_hsotg *hsotg, else byte_count = remaining_count; - dword_count = (byte_count + 3) / 4; - - if (((unsigned long)data_buf & 0x3) == 0) { - /* xfer_buf is DWORD aligned */ - for (i = 0; i < dword_count; i++, data_buf++) - dwc2_writel(hsotg, *data_buf, HCFIFO(chan->hc_num)); - } else { - /* xfer_buf is not DWORD aligned */ - for (i = 0; i < dword_count; i++, data_buf++) { - u32 data = data_buf[0] | data_buf[1] << 8 | - data_buf[2] << 16 | data_buf[3] << 24; - dwc2_writel(hsotg, data, HCFIFO(chan->hc_num)); - } - } + dwc2_writel_rep(hsotg, HCFIFO(chan->hc_num), data_buf, byte_count); chan->xfer_count += byte_count; chan->xfer_buf += byte_count;