On 11/12/2015 06:03 AM, Sagi Grimberg wrote: > >> The bug is caused by this patch: >> >> 659743b02c411075b26601725947b21df0bb29c8 >> >> which allowed the task lists to be manipulated under different locks >> in the xmit and completion path. >> >> To fix the oops this patch just reverts that patch. It also reverts >> these 2 patches for regressions that were also a result of that patch: > > Whoa now Mike, this is a major change. Can we take a step back and think > about this for a second? The issue has been on the open-iscsi list for a week! You are on the list still right? Or is even ccd on the thread. > > My understanding is that the kfifo circular buffer design allows a > writer (e.g. the producer) and a reader (e.g. the consumer) to avoid > extra locking when accessing the kfifo buffer. > For the next feature window I am working on patch that makes the api easier to use (the cleanup_task locking is bad as can be seen from the bnx2i regression the patch also caused) and also uses kfifos for the fast path. > From include/linux/kfifo.h: > /* > * Note about locking : There is no locking required until only * one > reader > * and one writer is using the fifo and no kfifo_reset() will be * called > * kfifo_reset_out() can be safely used, until it will be only called > * in the reader thread. > * For multiple writer and one reader there is only a need to lock the > writer. > * And vice versa for only one writer and multiple reader there is only > a need > * to lock the reader. > */ > > The patch by Shlomo, implements the locking policy documented above: > - multiple readers: multiple threads entering queuecommand reading the > kfifo (kfifo_out) are mutually excluded by the frwd_lock. > - multiple writers: completion contexts writing to the kfifo (kfifo_in) > are mutually excluded by the back_lock. > > Can you provide a more detailed analysis of why is this list corruption > triggered? What scenario was not honoring the locking policy? Basic locking around a linked list bug. iscsi_queuecommand adds it under the frwd lock and iscsi_complete_task was taking it off the back_lock. > This code has been running reliably in our labs for a long time now > (both iser and tcp). The patch has caused multiple regressions, did not even compile when sent to me, and was poorly reviewed and I have not heard from you guys in a week. Given the issues the patch has had and the current time, I do not feel comfortable with it anymore. I want to re-review it and fix it up when there is more time. -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html