On Monday 22 June 2015 01:38 PM, Alexey Brodkin wrote:
> Hi all,
>
> On Wed, 2015-06-17 at 07:03 +0000, Vineet Gupta wrote:
> +CC linux-arch, linux-mm, Arnd and Marek
>
> On Tuesday 16 June 2015 11:11 PM, Alexey Brodkin wrote:
>
> The current implementation of the descriptor init procedure only takes
> care of the ownership flag, while it is perfectly possible for the
> underlying memory to be filled with garbage on boot or driver
> installation.
>
> Randomly set flags in non-zeroed des0 and des1 fields may lead to
> unpredictable behavior of the GMAC DMA block.
>
> The solution to this problem is as simple as explicitly zeroing both
> the des0 and des1 fields of all buffer descriptors.
>
> Signed-off-by: Alexey Brodkin <abrodkin@xxxxxxxxxxxx>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@xxxxxx>
> Cc: arc-linux-dev@xxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Cc: stable@xxxxxxxxxxxxxxx
>
> FWIW, this was causing sporadic/random networking flakiness on the ARC
> SDP platform (scheduled for upstream inclusion in the next window).
>
> This also leads to an interesting question: should
> arch/*/dma_alloc_coherent() and friends unconditionally zero out memory
> (vs. the current semantics of doing so only based on gfp, as requested
> by the driver)? This is the second instance where we ran into stale
> descriptor memory; the first one was in the dw_mmc driver, which was
> recently fixed upstream as well (although debugged independently by
> Alexey and using the upstream fix).
>
> http://www.spinics.net/lists/linux-mmc/msg31600.html
>
> The pro is a better out-of-box experience (despite buggy drivers),
> while the cons are that the drivers remain broken and boot time perhaps
> increases due to the extra memzero....
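For illustration, the fix described in the quoted patch can be sketched in plain userspace C. This is only a sketch under stated assumptions: the struct layout, the DESC_OWN bit position, and the function name desc_ring_init() are hypothetical and are not the stmmac driver's actual code. The point is simply that des0 and des1 are cleared before the ownership flag is set, so garbage left in the memory cannot be read as flags.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 4-word DMA descriptor. The des0..des3 names follow the
 * patch description; the field meanings here are illustrative only. */
struct dma_desc {
    uint32_t des0;   /* status and ownership flags */
    uint32_t des1;   /* control bits and buffer sizes */
    uint32_t des2;   /* buffer address */
    uint32_t des3;   /* next descriptor or second buffer address */
};

/* Assumed position of the ownership bit, for this sketch only. */
#define DESC_OWN (1u << 31)

/* Initialise every descriptor in the ring. des0 and des1 are zeroed
 * explicitly first, so stale bits from uninitialised memory cannot
 * survive; only then is the ownership flag set if requested.
 * des2 and des3 are deliberately left untouched. */
static void desc_ring_init(struct dma_desc *ring, size_t count, int own_by_dma)
{
    assert(ring != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        ring[i].des0 = 0;
        ring[i].des1 = 0;
        if (own_by_dma)
            ring[i].des0 |= DESC_OWN;
    }
}
```

Calling desc_ring_init(ring, count, 1) on a ring pre-filled with garbage leaves des0 holding only the ownership bit and des1 holding zero, which is the behaviour the patch asks for regardless of whether the allocator zeroed the memory.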
>
> Probably if we already have dma_zalloc_coherent(), which explicitly
> zeroes the returned memory, then there's no need to do implicit zeroing
> in dma_alloc_coherent()?

The question is: when drivers don't use dma_zalloc_coherent(), meaning they don't pass __GFP_ZERO, and that causes these random issues, do we need to be more conservative in arch code (as ARC at least is), or do we need to debug and fix these drivers one by one?

FWIW, ARC needs to fix the __GFP_ZERO case, since we are doing the memzero twice.

-Vineet