Greg.

Going back further to 3.16 or 3.18 looks like a lot of work and I have
low confidence of generating correct code. There are changes like
3ef28e83ab15799742e55fd13243a5f678b04242 (from 4.3) which changed the
locking from blk_mq_queue_enter to blk_queue_enter.

I'm going to stand down here. Sorry about this.

Giuliano.

On Tue, 7 Apr 2020 at 17:31, Giuliano Procida <gprocida@xxxxxxxxxx> wrote:
>
> Hi Greg.
>
> On Fri, 3 Apr 2020 at 23:30, Giuliano Procida <gprocida@xxxxxxxxxx> wrote:
> >
> > Hi Greg.
> >
> > I also have 4.14 and 4.9, I'll send them on for comparison.
>
> I've done this.
>
> > I will try 4.4 but, as one call site doesn't exist and the other
> > didn't have any locking to start with, I'd like to try to reproduce
> > the issue first.
>
> I have failed to build a bootable 4.4 kernel, which is surprising /
> embarrassing, as my current toolchain (even after working around
> various known issues) compiles kernels that either panic or
> triple-fault (apparently, as there's no log output, just a reboot) on
> my amd64 hardware. Running an old live distribution with a 4.4 kernel,
> I wasn't able to reproduce the issue apparently resolved by these
> fixes after several hours of running.
>
> I've also spent most of 2 days looking at unfamiliar code.
>
> The code in 4.4 uses a timer instead of a workqueue for timeout
> callbacks. The callbacks also have blk_queue_enter/exit
> protection in 4.9 but not 4.4. I'm guessing, but don't know, that the
> execution contexts are sufficiently similar between timers and
> workqueues that this protection should be back-ported to 4.4. This is
> relatively simple; it's bits of a couple of extra commits.
>
> f5bbbbe4d635 adds to blk_mq_queue_tag_busy_iter an RCU-protected test
> to see if the blk_queue is held before doing any work. It also adds
> RCU synchronisation to code that manipulates the number of hardware
> queues. The follow-up 530ca2c9bd more sensibly just (possibly
> recursively) does try-to-enter/exit instead. 4.4 doesn't have code
> that manipulates the number of hardware queues. However, the
> blk_mq_queue_tag_busy_iter locking may be enough to prevent
> ioctl/procfs concurrency.
>
> To this end, I've put together patches for 4.4. They are completely
> untested. Once I've verified they actually compile I'll send them on.
>
> Giuliano.
>
> > I should have some spare time for this soon.
> >
> > Giuliano.
> >
> > On Fri, 3 Apr 2020 at 10:20, Greg KH <greg@xxxxxxxxx> wrote:
> > >
> > > On Wed, Apr 01, 2020 at 05:47:02PM +0000, Giuliano Procida wrote:
> > > > This issue was found in 4.14 and is present in earlier kernels.
> > > >
> > > > Please backport
> > > >
> > > > f5bbbbe4d635 blk-mq: sync the update nr_hw_queues with
> > > > blk_mq_queue_tag_busy_iter
> > > > 530ca2c9bd69 blk-mq: Allow blocking queue tag iter callbacks
> > > >
> > > > onto the stable branches that don't have these. The second is a fix
> > > > for the first. Thank you.
> > > >
> > > > 4.19.y and later - commits already present
> > > > 4.14.y - f5bbbbe4d635 doesn't patch cleanly but it's still
> > > > straightforward, just drop the comment and code mentioning switching
> > > > to 'none' in the trailing context
> > > > 4.9.y - ditto
> > > > 4.4.y - there was a refactoring of the code in commit
> > > > 0bf6cd5b9531bcc29c0a5e504b6ce2984c6fd8d8 making this non-trivial
> > > > 3.16.y - ditto
> > > >
> > > > I am happy to try to produce clean patches, but it may be a day or so.
> > >
> > > I have done this for 4.14.y and 4.9.y, can you please provide a backport
> > > for 4.4.y that I can queue up?
> > >
> > > thanks,
> > >
> > > greg k-h