On 11/17/2017 12:01 PM, Adam Wallis wrote: > On 11/17/2017 10:57 AM, Dan Williams wrote: >> On Fri, Nov 17, 2017 at 7:28 AM, Adam Wallis <awallis@xxxxxxxxxxxxxx> wrote: >>> On 11/17/2017 10:12 AM, Dan Williams wrote: >>>> On Fri, Nov 17, 2017 at 6:11 AM, Adam Wallis <awallis@xxxxxxxxxxxxxx> wrote: >>>>> Commit adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>>> introduced a bug (that is in fact documented by the patch commit text) >>>>> that leaves behind a dangling pointer. Since the done_wait structure is >>>>> allocated on the stack, future invocations to the DMATEST can produce >>>>> undesirable results (e.g., corrupted spinlocks). >>>>> >>>>> Commit a9df21e34b42 ("dmaengine: dmatest: warn user when dma test times >>>>> out") attempted to WARN the user that the stack was likely corrupted but >>>>> did not fix the actual issue. >>>>> >>>>> This patch fixes the issue by pushing the wait queue and callback >>>>> structs into the the thread structure. If a failure occurs due to time, >>>>> dmaengine_terminate_all will force the callback to safely call >>>>> wake_up_all() without possibility of using a freed pointer. >>>>> >>>>> Cc: stable@xxxxxxxxxxxxxxx # 4.13.x: a9df21e: dmatest: Warn User >>>>> Cc: stable@xxxxxxxxxxxxxxx # 4.13.x >>>>> Cc: stable@xxxxxxxxxxxxxxx # 4.14.x >>>> >>> >>> Sure - do you want me to remove them? I was just following the instructions on >>> stable. >> >> It's not broken, just a note for next time. >> >>> >>>> You don't need 3 cc stables, you don't even need the "# >>>> kernel-version". Since you have the "Fixes:" line the target kernel(s) >>>> for the backport can be auto-determined. I should go update >>>> Documentation/process/stable-kernel-rules.rst to mention this. >>>> >>>>> Bug: https://bugzilla.kernel.org/show_bug.cgi?id=197605 >>>>> Fixes: adfa543e7314 ("dmatest: don't use set_freezable_with_signal()") >>>>> Reviewed-by: Sinan Kaya <okaya@xxxxxxxxxxxxxx> >>>>> Suggested-by: Shunyong Yang <shunyong.yang@xxxxxxxxxxxxxxxx> >>>>> Signed-off-by: Adam Wallis <awallis@xxxxxxxxxxxxxx> >>>>> --- >>>>> changes from v2: Added "Fixes" tag >>>>> changes from v1: Added pre-req patches for stable >>>>> >>>>> drivers/dma/dmatest.c | 37 ++++++++++++++++--------------------- >>>>> 1 file changed, 16 insertions(+), 21 deletions(-) >>>>> >>>>> diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c >>>>> index 47edc7f..2573b6c 100644 >>>>> --- a/drivers/dma/dmatest.c >>>>> +++ b/drivers/dma/dmatest.c >>>>> @@ -155,6 +155,12 @@ struct dmatest_params { >>>>> #define PATTERN_COUNT_MASK 0x1f >>>>> #define PATTERN_MEMSET_IDX 0x01 >>>>> >>>>> +/* poor man's completion - we want to use wait_event_freezable() on it */ >>>>> +struct dmatest_done { >>>>> + bool done; >>>>> + wait_queue_head_t *wait; >>>>> +}; >>>>> + >>>>> struct dmatest_thread { >>>>> struct list_head node; >>>>> struct dmatest_info *info; >>>>> @@ -165,6 +171,8 @@ struct dmatest_thread { >>>>> u8 **dsts; >>>>> u8 **udsts; >>>>> enum dma_transaction_type type; >>>>> + wait_queue_head_t done_wait; >>>> >>>> Why are we defining a waitquehead per thread vs defining one globally >>>> for the whole module with "static DECLARE_WAIT_QUEUE_HEAD(x);"? >>> >>> This is how the original dmatest functions. Each thread had a wait queue that it >>> created so that it could go to sleep while the DMA transfer occurred. Each >>> thread is dependent on its own DMA transaction for the wakeup call. Again, this >>> is how the test originally worked. I just moved the wait queue from the stack >>> (which was getting corrupted) to the thread context to allow for safe cleanup. >>> In other words, I haven't really changed how the test works...just fixing a bug >>> with the current implementation. >> >> Ok, always takes me a bit to re-orient myself to this file since I >> only look at it once a year. >> >> This fix seems incomplete. The next test iteration after a timeout >> will now reuse the per-thread 'done' notification. If the engine that >> timed out still completes its dma it will collide with the next >> operation that is using the same 'done' variable. So it seems to me >> that the wait_queue_head should be global, and the 'done' variable >> should be either allocated per-operation or we should call >> dmaengine_terminate_all() after a timeout. Since not all engines >> implement a terminate I think the potential memory leak of a few >> 'done' variables is a better option. >> > Dan > An important part of my patch was severed in this v3 submission. My apologies. > > There is a change that addresses, I believe, your concern that was in v2 > > /* terminate all transfers on specified channels */ > - if (ret) > + if (ret || failed_tests) > dmaengine_terminate_all(chan); > > Will clean up again, retest, and resubmit. Thanks for your patience and instruction. Dan, I thought the patch was truncated, but it's all there in V3. I should have finished my coffee before responding. You are absolutely right that in the timed out case that dmaengine_terminate_all(chan) should be called, and that change is in fact already included in this patch set @@ -789,7 +782,7 @@ static int dmatest_func(void *data) dmatest_KBs(runtime, total_len), ret); /* terminate all transfers on specified channels */ - if (ret) + if (ret || failed_tests) dmaengine_terminate_all(chan); Would you prefer that I add a better description in the commit text to address the fact this was in fact added? > > Adam > -- Adam Wallis Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project. -- To unsubscribe from this list: send the line "unsubscribe dmaengine" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html