> -----Original Message----- > From: Shyam Iyer [mailto:shyam_iyer@xxxxxxxx] > Sent: Friday, July 30, 2010 7:22 PM > To: sekharan@xxxxxxxxxx > Cc: Moger, Babu; device-mapper development; linux-scsi@xxxxxxxxxxxxxxx; > Qi, Yanling; Chauhan, Vijay; Dachepalli, Sudhir; Stankey, Robert > Subject: Re: [PATCH 0/6] scsi_dh : Couple of fixes for scsi device > handlers > > On 07/30/2010 07:56 PM, Chandra Seetharaman wrote: > > On Fri, 2010-07-30 at 12:12 -0600, Moger, Babu wrote: > > > > > >>>> Yes, We can do that. Problem is I am hitting the issue with > BUG_ON > >>>> in get_rdac_data which is there in the beginning of rdac_activate. > >>>> > >>> I do not understand the problem. > >>> > >>> If the BUG_ON on get_rdac_data() is being triggered, that means > >>> sdev->scsi_dh_data is NULL, if that is the case, how can you access > >>> sdev->scsi_dh_data->kref in scsi_dh_activate (in patch 2/6) ? Wouldn't > >>> it trip a oops ? > >>> > >> Test case is deleting both active and passive paths almost together > during > >> the multipath testing. Looks like DM picked up the active path > failure > >> first. Then failed the active path and started scheduling activate_path > to > >> failover to passive path. Passive path is also about to go down pretty > soon. > >> When the control was in scsi_dh_activate, Looks like scsi_dh_data was > still > >> valid because I did not see panic here. But scsi_dh_data became NULL > when > >> control went to rdac_activate. That is when I hit the bug on. > >> > >> kernel BUG at /usr/src/packages/BUILD/lsi-scsi_dh_rdac- > 01.00/obj/default/scsi_dh_rdac.c:232! > >> RIP: 0010: rdac_activate+0x257/0x387 [scsi_dh_rdac] > >> > >> My understanding is someone triggered scsi_dh->detach for passive path > during > >> this small window. Only way I could see problem go away is holding > reference > >> counts between these calls. Did i miss anything here? See the code > snippet below.. > >> > > So, basically, the new patch just reduces the window of race. IOW, if it > > just spins for few seconds (or preempted) just before calling the > > kref_get() in the new patch, it will generate an oops as scsi_dh_data > > would be NULL. That is correct. > > > > Looks like you have to create a lock to protect scsi_dh_data and then > > call kref_get() with the protection of lock. > > > > > Second that.. Sounds like each kref_get/kref_put .. needs a lock > protection not just this scenario. I agree. I will attempt one more set of patches. This time I will try to match get and put sequence in a one file. Please feel free to comment. ��.n��������+%������w��{.n�����{������ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f