On 07/30/2010 07:56 PM, Chandra Seetharaman wrote:
On Fri, 2010-07-30 at 12:12 -0600, Moger, Babu wrote:
Yes, We can do that. Problem is I am hitting the issue with BUG_ON
in get_rdac_data which is there in the beginning of rdac_activate.
I do not understand the problem.
If the BUG_ON on get_rdac_data() is being triggered, that means
sdev->scsi_dh_data is NULL, if that is the case, how can you access
sdev->scsi_dh_data->kref in scsi_dh_activate (in patch 2/6) ? Wouldn't
it trip a oops ?
Test case is deleting both active and passive paths almost together during
the multipath testing. Looks like DM picked up the active path failure
first. Then failed the active path and started scheduling activate_path to
failover to passive path. Passive path is also about to go down pretty soon.
When the control was in scsi_dh_activate, Looks like scsi_dh_data was still
valid because I did not see panic here. But scsi_dh_data became NULL when
control went to rdac_activate. That is when I hit the bug on.
kernel BUG at /usr/src/packages/BUILD/lsi-scsi_dh_rdac-01.00/obj/default/scsi_dh_rdac.c:232!
RIP: 0010: rdac_activate+0x257/0x387 [scsi_dh_rdac]
My understanding is someone triggered scsi_dh->detach for passive path during
this small window. Only way I could see problem go away is holding reference
counts between these calls. Did i miss anything here? See the code snippet below..
So, basically, the new patch just reduces the window of race. IOW, if it
just spins for few seconds (or preempted) just before calling the
kref_get() in the new patch, it will generate an oops as scsi_dh_data
would be NULL.
Looks like you have to create a lock to protect scsi_dh_data and then
call kref_get() with the protection of lock.
Second that.. Sounds like each kref_get/kref_put .. needs a lock protection not just this scenario.
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel