On 28 August 2011 13:11, Russell King - ARM Linux <linux@xxxxxxxxxxxxxxxx> wrote: > On Sun, Aug 28, 2011 at 12:50:54PM +0200, Per Forlin wrote: >> On 26 August 2011 18:28, Santosh <santosh.shilimkar@xxxxxx> wrote: >> > + Balaji, >> > >> > On Friday 26 August 2011 09:49 PM, Russell King - ARM Linux wrote: >> >> >> >> I'm not sure who's responsible for this, but the nonblocking MMC stuff is >> >> broken on OMAPs HSMMC: >> >> >> >> mmcblk0: error -84 transferring data, sector 149201, nr 64, cmd response >> >> 0x900, card status 0xb00 >> >> mmcblk0: retrying using single block read >> >> ------------[ cut here ]------------ >> >> WARNING: at /home/rmk/git/linux-2.6-rmk/lib/dma-debug.c:811 >> >> check_unmap+0x1ac/0x764() >> >> omap_hsmmc omap_hsmmc.0: DMA-API: device driver tries to free DMA memory >> >> it has not allocated [device address=0x0000000080933000] [size=20480 bytes] >> >> Modules linked in: >> >> Backtrace: >> >> [<c0017874>] (dump_backtrace+0x0/0x10c) from [<c02ce8ac>] >> >> (dump_stack+0x18/0x1c) >> >> r7:c1abfcb8 r6:c0186248 r5:c037de51 r4:0000032b >> >> [<c02ce894>] (dump_stack+0x0/0x1c) from [<c0039ed4>] >> >> (warn_slowpath_common+0x58/0x70) >> >> [<c0039e7c>] (warn_slowpath_common+0x0/0x70) from [<c0039f90>] >> >> (warn_slowpath_fmt+0x38/0x40) >> >> r8:c1abfd50 r7:00000000 r6:00005000 r5:00000000 r4:80933000 >> >> [<c0039f58>] (warn_slowpath_fmt+0x0/0x40) from [<c0186248>] >> >> (check_unmap+0x1ac/0x764) >> >> r3:c0367d55 r2:c037e24d >> >> [<c018609c>] (check_unmap+0x0/0x764) from [<c0186978>] >> >> (debug_dma_unmap_sg+0x100/0x134) >> >> [<c0186878>] (debug_dma_unmap_sg+0x0/0x134) from [<c0019770>] >> >> (dma_unmap_sg+0x24/0x7c) >> >> [<c001974c>] (dma_unmap_sg+0x0/0x7c) from [<c0207220>] >> >> (omap_hsmmc_post_req+0x48/0x54) >> >> [<c02071d8>] (omap_hsmmc_post_req+0x0/0x54) from [<c01fb644>] >> >> (mmc_start_req+0x9c/0x128) >> >> r4:c1a76000 >> >> [<c01fb5a8>] (mmc_start_req+0x0/0x128) from [<c02049fc>] >> >> (mmc_blk_issue_rw_rq+0x80/0x460) >> >> r8:c1ab5c00 r7:00000001 r6:c1ab5824 r5:c1ab5824 r4:c1ab5c00 >> >> [<c020497c>] (mmc_blk_issue_rw_rq+0x0/0x460) from [<c02050a8>] >> >> (mmc_blk_issue_rq+0x2cc/0x2fc) >> >> [<c0204ddc>] (mmc_blk_issue_rq+0x0/0x2fc) from [<c02056c4>] >> >> (mmc_queue_thread+0xa0/0x104) >> >> [<c0205624>] (mmc_queue_thread+0x0/0x104) from [<c0054f8c>] >> >> (kthread+0x88/0x90) >> >> [<c0054f04>] (kthread+0x0/0x90) from [<c003d9a4>] (do_exit+0x0/0x618) >> >> r7:00000013 r6:c003d9a4 r5:c0054f04 r4:c1a7bc7c >> >> ---[ end trace 3314ad56daf5d14f ]--- >> >> >> >> Luckily thinks continue to work, but clearly releasing DMA mappings which >> >> drivers don't own is bad news. Unfortunately, I don't have the bandwidth >> >> to be able to investigate this at present - well, I could do but then I'd >> >> have to drop working on the OMAP4 vs generic suspend/resume code and learn >> >> about something I've no current clue about. >> >> >> > Please continue your help on generic suspend. >> > >> >> Can someone please investigate and fix whatever is broken. >> >> >> > Will find somebody to look at this issue. >> > >> I had a look at this and my guess is that the same mapped DMA memory >> is unmapped twice. First the memory is unmapped in >> omap_hsmmc_dma_cleanup() due to the mmc error, then later in >> omap_hsmmc_post_req(). Here is a patch to resolve that. I haven't had >> the chance to test it yet though. >> >> --- >> drivers/mmc/host/omap_hsmmc.c | 7 +++++-- >> 1 files changed, 5 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/mmc/host/omap_hsmmc.c b/drivers/mmc/host/omap_hsmmc.c >> index 21e4a79..7f40767 100644 >> --- a/drivers/mmc/host/omap_hsmmc.c >> +++ b/drivers/mmc/host/omap_hsmmc.c >> @@ -1011,6 +1011,7 @@ static void omap_hsmmc_dma_cleanup(struct >> omap_hsmmc_host *host, int errno) >> host->data->sg_len, >> omap_hsmmc_get_dma_dir(host, host->data)); >> omap_free_dma(dma_ch); >> + data->host_cookie = 0; >> } >> host->data = NULL; >> } >> @@ -1576,8 +1577,10 @@ static void omap_hsmmc_post_req(struct mmc_host >> *mmc, struct mmc_request *mrq, >> struct mmc_data *data = mrq->data; >> >> if (host->use_dma) { >> - dma_unmap_sg(mmc_dev(host->mmc), data->sg, data->sg_len, >> - omap_hsmmc_get_dma_dir(host, data)); >> + if (data->host_cookie) >> + dma_unmap_sg(mmc_dev(host->mmc), data->sg, >> + data->sg_len, >> + omap_hsmmc_get_dma_dir(host, data)); >> data->host_cookie = 0; >> >> >> I also checked the mmci code for this same issue and mmci doesn't have >> the same bug. >> MMCI works fine in 3.1-rc2 even though the code is "wrong". I propose >> this change. >> @@ -529,7 +529,7 @@ static void mmci_post_request(struct mmc_host >> *mmc, struct mmc_request *mrq, >> if (chan) { >> if (err) >> dmaengine_terminate_all(chan); >> - if (err || data->host_cookie) >> + if (data->host_cookie) >> >> I'll post these patches as soon as I have managed to test them or >> sooner if I get a tested-by. > > Looking at MMCI, are you sure all those DMA engine terminate calls are > correct? > I should probably do so together with adding better descriptions on what happens and what the intention is. > For example, why are we terminating all DMA engine transfers in the > post request function? The reason for calling DMA terminate here is to handle the case of cleaning up an MMC request that has only been prepared in mmci_pre_request(), and not been passed to mmci_request(). The if (err) is only true if that scenario has happened. The variable err may need to be better defined. I can think of three states: 1. err == no error 2. err == mmc request failed 3. err == mmc request is prepared but not started Currently "err" is only set if #3 occurs. The parts to clean up in post_request() if #3 happens (err == true): in mmci: DMA mapped memory and prepared DMA descriptors in omap_hsmmc: DMA mapped memory Now when I think more about I need to check the DMA drivers more closely if dma_terminate_all() will free resources for prepared but not submitted descriptors. I will have to add this support to the dma drivers if it's missing. > of by the driver before it told the core that the request was completed. > If the post request function is called for the previous request, meanwhile > the next request is in flight, terminating the DMA engine transfer will > cause invalid state in the driver. > > Basically, this looks completely buggered. I think you need to review > your changes with a thought to what happens logically with DMA and the > MMC host drivers whenever an error occurs. > I agree, I need to look more closely on the error handling. Regards, Per -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html