Hello! > From: Sergei Shtylyov, Sent: Monday, July 20, 2020 4:20 AM > > Hello! > > Sorry about another late reply, was having h/w issues on the new work... No problem! :) Thank you for your reply! > On 07/06/2020 12:25 PM, Yoshihiro Shimoda wrote: <snip> > >> Maybe we should just fix those blind assumptions? > > > > Maybe I should have described some facts instead of assumptions like below? > > If so, I should modify the code too. > > > > After ravb_stop_dma() was called, the driver assumed any transfers were > > stopped. However, the current ravb_tx_timeout_work() doesn't check whether > > the ravb_stop_dma() is succeeded without any error or not. So, we should > > fix it. > > Yes. Better a stuck TX queue (with a chance to recover) than kernel oops... I got it. <snip> > >> Well, I was thinking of polling TCCR and CSR like the current > >> ravb_stop_dma() does, but if that works... > > > > I'm not sure whether polling TCCR and CSR is enough or not. > > Instead of polling those registers, maybe we should poll whether > > ravb_stop_dma() is succeeded or not? > > Yes, if by polling you mean just checking the result of it. :-) Yes, I intend to check the result of it :) > > Especially, result of ravb_config() is > > a key point whether the hardware is really stopped or not. > > So, I'm thinking that just polling the ravb_stop_dma() in > > ravb_tx_timeout_work() is better than the per-queue tear-down and > > re-init now. But, what do you think? > > I don't think it's better since we're now supposed to handle a per-queue > TX timeout (still not sure it's possible with this h/w). But of course, it's > better as it's simple enough for a bug fix. Thank you for your comment. Yes, I agree it's better to handle a per-queue TX timeout. However, I think we need refactoring for it. So, I'd like to fix a bug by simple code. > >>> < Prepare new descriptors and start again > > >>> 4. Prepare new descriptors. > >> > >> That's where the cause for using the workqueue lies -- the descriptors are > >> allocated with GFP_KERNEL, not GFP_ATOMIC... > > > > IIUC, we can avoid to use the workqueue if re-allocation is not really necessary. > > > >> if you have time/desire to > >> untangle all this, I'd appreciate it; else I'd have to work on this in my > >> copious free time... :-) > > > > If we don't need refactoring, I think I can do it :) > > Let's go forward with the simple fix (assuming it fixes the original oops). Thank you for your comment! I'll do that. Best regards, Yoshihiro Shimoda > [...] > > MBR, Sergei