On Sat, Aug 27, 2011 at 11:33:26PM +0800, Ming Lei wrote: > Hi, > > On Sat, Aug 27, 2011 at 11:13 PM, Greg KH <greg@xxxxxxxxx> wrote: > > On Sat, Aug 27, 2011 at 10:48:35PM +0800, ming.lei@xxxxxxxxxxxxx wrote: > >> From: Ming Lei <ming.lei@xxxxxxxxxxxxx> > >> > >> This patch fixs one performance bug on ARM Cortex A9 dual core platform, > >> which has been reported on quite a few ARM machines(OMAP4, Tegra 2, snowball...), > >> see details from link of https://bugs.launchpad.net/bugs/709245. > >> > >> In fact, one mb() on ARM is enough to flush L2 cache, but > >> 'dummy->hw_token = token;' after mb() is added just for obeying > >> correct mb() usage. > > > > Really? A mb() should not be flushing any caches, it's just a memory > > barrier. Or is ARM somehow "special" in this way? > > As Santosh pointed out, mb on ARM will flush L2 write buffer. The > description here is wrong. Then this can't be accepted as-is :) > I think the below should make the writing reach into memory on all > ARCH after ' token = dummy->hw_token;' is executed. > > dummy->hw_token = token; > mb() > token = dummy->hw_token; > > The above is the idea introduced to fix the problem. Are you sure? Have you read the documentation about memory barriers to confirm this? > >> The patch has been tested ok on OMAP4 panda A1 board, the performance > >> of 'dd' over usb mass storage can be increased from 4~5MB/sec to > >> 14~16MB/sec after applying this patch. > > > > That's impressive, but I don't think this is really the proper way to do > > this... > > > >> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxxxxx> > >> --- > >> drivers/usb/host/ehci-q.c | 14 ++++++++++++++ > >> 1 files changed, 14 insertions(+), 0 deletions(-) > >> > >> diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c > >> index 0917e3a..65b5021 100644 > >> --- a/drivers/usb/host/ehci-q.c > >> +++ b/drivers/usb/host/ehci-q.c > >> @@ -1082,6 +1082,20 @@ static struct ehci_qh *qh_append_tds ( > >> wmb (); > >> dummy->hw_token = token; > >> > >> + /* The mb() below is added to make sure that > >> + * 'token' can be writen into qtd, so that ehci > >> + * HC can see the up-to-date qtd descriptor. On > >> + * some archs(at least on ARM Cortex A9 dual core), > >> + * writing into coherenet memory doesn't mean the > >> + * value written can reach physical memory > >> + * immediately, and the value may be buffered > >> + * inside L2 cache. 'dummy->hw_token = token;' > >> + * after mb() is added for obeying correct mb() > >> + * usage. > >> + * */ > >> + mb(); > >> + token = dummy->hw_token; > > > > Your comment does not match the code, so something is wrong here. > > If you mean "L2 cache flush", I confess to the mistaken description, > and will update it later. If you mean others, could you help to point it out? I mean others, please read the the last 3 lines of the comment and compare that to the code lines you added. greg k-h -- To unsubscribe from this list: send the line "unsubscribe linux-omap" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html