On Tue, Mar 05, 2019 at 03:29:26PM +0500, Embedded Engineer wrote:
> On Tue, Mar 5, 2019 at 3:07 PM Russell King - ARM Linux admin
> <linux@xxxxxxxxxxxxxxx> wrote:
> >
> > Please apply this patch so we can see the (ptrval) values. Thanks.
>
> Please find below logs after applying patch:
>
> https://pastebin.com/6TaBxPX5

Hm... so it looks like what you're getting here is the error spew from the
DMA pool debug code in mm/dmapool.c. The way I understand it, that code
will initialize the memory for each page allocated from the pool with
POOL_POISON_FREED (0xa7) (see pool_alloc_page()) and then, upon adding the
page to the pool list, it'll store the offset in the page->offset field.
On allocation it checks the contents of the block, and here the contents
don't match the expected poison.

The dump of the corrupted memory is somewhat confusing because the values
that don't match the poison are actually expected, at least partially.
From my reading of the DMA pool code, the first four bytes store the offset
of the DMA block into the physical memory page. Also, given the size of the
hexdump, it looks like the pool was allocated with a block size of 64 bytes,
which matches the code in drivers/usb/chipidea/udc.c that allocates the
"ci_hw_qh" pool.

What's strange here, though, is that the offset that's stored in the first
four bytes of a block seems to actually be stored twice per block. The
first offset seems to be correct, since it's apparently used to find the
offset of the next block to allocate. If you look at the first corrupted
hexdump:

[ 1.327553] tegra-udc 7d000000.usb: dma_pool_alloc ci_hw_qh, ec056080 (corrupted)
[ 1.335058] 00000000: c0 00 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................
[ 1.343077] 00000010: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................
[ 1.351095] 00000020: 80 00 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................
[ 1.359113] 00000030: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................

This is the entry for the block at offset 0x00000080 and the offset for the
next block is 0x000000c0, which is exactly 64 bytes after the current block.
However, if you then look at the second offset that's stored at offset
0x00000020 in the block, it's 0x00000080, which does match the offset of the
current block, but I think that may just be coincidence. The same
coincidence happens for the second corrupted block:

[ 1.367210] tegra-udc 7d000000.usb: dma_pool_alloc ci_hw_qh, ec056140 (corrupted)
[ 1.374709] 00000000: 80 01 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................
[ 1.382727] 00000010: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................
[ 1.390744] 00000020: 40 01 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 @...............
[ 1.398760] 00000030: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................

But not for the third:

[ 1.406965] tegra-udc 7d000000.usb: dma_pool_alloc ci_hw_qh, ec0561c0 (corrupted)
[ 1.414466] 00000000: 00 02 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................
[ 1.422483] 00000010: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................
[ 1.430502] 00000020: 40 03 00 00 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 @...............
[ 1.438519] 00000030: a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 a7 ................

The fact that we see the offset stored at offset 0x20 in each block makes me
think there's perhaps some sort of aliasing happening here. But I'm not sure
how the system would even boot this far if aliasing was really the problem.
Things should be falling apart much sooner if that's really what's going on
here. However, this sort of aliasing is not something that your typical
memory test will catch, so it could explain why the memory tests aren't
reporting any errors.

Thierry
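
For anyone who wants to re-check the offset arithmetic against the three dumps, a rough sketch in Python follows. It is not taken from the kernel sources. The helper names (block_from_dump, analyse) are made up for illustration, the 4 KiB page size is an assumption, and the words are read as little-endian, which is how the mail interprets them. Only the two non-poison words per block are copied from the log; everything else is filled with the 0xa7 poison.

```python
import struct

POISON = 0xA7      # POOL_POISON_FREED, as quoted in the mail
BLOCK_SIZE = 64    # block size inferred in the mail from the dump length
PAGE_SIZE = 4096   # assumption: 4 KiB pages


def block_from_dump(word_at_0x00, word_at_0x20):
    """Rebuild a 64-byte block from the two non-poison words in the log."""
    block = bytearray([POISON] * BLOCK_SIZE)
    block[0x00:0x04] = bytes.fromhex(word_at_0x00)
    block[0x20:0x24] = bytes.fromhex(word_at_0x20)
    return bytes(block)


def analyse(addr, block):
    """Decode one dumped block the way the mail reads it (illustrative only)."""
    own = addr % PAGE_SIZE                            # block offset within its page
    next_off = struct.unpack_from("<I", block, 0)[0]  # first word: next-block offset
    # Bytes past the first word should all still be poison.
    stray = [i for i in range(4, BLOCK_SIZE) if block[i] != POISON]
    stray_at = stray[0] & ~3 if stray else None       # aligned start of the stray word
    stray_word = (struct.unpack_from("<I", block, stray_at)[0]
                  if stray else None)
    return {
        "own": own,
        "next": next_off,
        "stride": next_off - own,
        "stray_at": stray_at,
        "stray_word": stray_word,
        "stray_matches_own": stray_word == own,
    }


# (address printed by dma_pool_alloc, word at 0x00, word at 0x20), from the log
DUMPS = [
    (0xEC056080, "c0 00 00 00", "80 00 00 00"),
    (0xEC056140, "80 01 00 00", "40 01 00 00"),
    (0xEC0561C0, "00 02 00 00", "40 03 00 00"),
]

RESULTS = [analyse(addr, block_from_dump(w0, w20)) for addr, w0, w20 in DUMPS]

for r in RESULTS:
    print("own=%#x next=%#x stride=%d stray@%#x=%#x matches_own=%s" % (
        r["own"], r["next"], r["stride"],
        r["stray_at"], r["stray_word"], r["stray_matches_own"]))
```

Run as-is, this prints one summary line per dumped block, showing the 64-byte stride of the first word and whether the stray word at 0x20 equals the block's own offset.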