Re: [PATCH 1/1]: missing call to pg_init_done causes I/O to be hung forever

<Menny_Hamburger@xxxxxxxx> · Tue, 14 Dec 2010 08:38:19 +0000

Hi 

 

Attached is the updated patch file.

I added testing instructions for non-iSCSI environments.

 

From looking into the code, I have a question about another path that may cause the same problem:

In rdac_activate, when (h->lun_state == RDAC_LUN_UNOWNED), the callback function is also not called.

Could there be a scenario where this would have the same effect?

 

Thanks,

Menny

 





From: dm-devel-bounces@xxxxxxxxxx [mailto:dm-devel-bounces@xxxxxxxxxx] On Behalf Of Hamburger, Menny
Sent: 14 December, 2010 09:32
To: dm-devel@xxxxxxxxxx
Subject: Re: [PATCH 1/1]: missing call to pg_init_done causes I/O to be hung forever





 

Hi,

 

I tried that as one of my options when I worked on this issue. It works, but it seemed too general to me back then, since it required testing additional areas such as other H/W handlers and perhaps other md modules. I do not have the resources required for testing this, but I would gladly send the other version of the patch.

 

Best Regards,

Menny 

 

 

 





From: dm-devel-bounces@xxxxxxxxxx [mailto:dm-devel-bounces@xxxxxxxxxx] On Behalf Of Moger, Babu
Sent: 13 December, 2010 20:03
To: device-mapper development
Subject: Re: [PATCH 1/1]: missing call to pg_init_done causes I/O to be hung forever





 

Menny,

   Yes, I agree there is a problem. Wouldn't it be simpler if you could handle everything in scsi_dh.c? See my response below.

Thanks

Babu

  











From: dm-devel-bounces@xxxxxxxxxx [mailto:dm-devel-bounces@xxxxxxxxxx] On Behalf Of Menny_Hamburger@xxxxxxxx
Sent: Monday, December 13, 2010 9:35 AM
To: dm-devel@xxxxxxxxxx
Subject: [PATCH 1/1]: missing call to pg_init_done causes I/O to be hung forever



 

When scsi_dh_activate returns SCSI_DH_NOSYS, the H/W handler callback is not called, so pg_init_done is not called in the multipath layer and pending I/O is requeued forever. This causes all userland processes currently performing I/O on the device to hang. A similar situation occurs when the device has transitioned to SDEV_CANCEL/SDEV_DEL and the device handler data has not yet been deleted.

 

The easiest way to reproduce this is in an iSCSI environment:

  dd if=/dev/dm-0 of=/dev/zero bs=8k count=1000000 &
  /etc/init.d/iscsi stop

In this example, dd will hang on I/O forever, and the only way to release it is to reboot the machine.

 

This patch calls pg_init_done directly from the mpath code when scsi_dh_activate returns anything other than SCSI_DH_OK.
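
To make the failure mode concrete, here is a minimal userspace sketch of the caller-side fix. It is a toy model, not kernel code: every name in it (mp_model, model_dh_activate, model_pg_init_done, the MODEL_DH_* codes) is a hypothetical stand-in, and the real pg_init_done does more than clear a flag. The only point is that a non-OK return with no callback leaves the "pg_init in progress" state set unless the caller completes it itself.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the SCSI_DH_* return codes. */
enum model_dh_status { MODEL_DH_OK = 0, MODEL_DH_NOSYS = 1 };

/* Toy model of the multipath state touched by the completion callback. */
struct mp_model {
    bool pg_init_in_progress; /* while set, incoming I/O stays queued */
    int queued_io;            /* requests waiting for pg_init to finish */
    int failed_paths;         /* paths marked failed by the callback */
};

typedef void (*model_activate_cb)(void *data, int err);

/* Model of the completion callback: records the result and unblocks I/O. */
static void model_pg_init_done(void *data, int err)
{
    struct mp_model *m = data;

    if (err != MODEL_DH_OK)
        m->failed_paths++;
    m->pg_init_in_progress = false;
    m->queued_io = 0; /* in this model, queued I/O is simply released */
}

/* Model of an activate call that returns early without invoking fn. */
static int model_dh_activate(bool handler_attached, model_activate_cb fn,
                             void *data)
{
    if (!handler_attached)
        return MODEL_DH_NOSYS; /* fn is never called on this path */
    fn(data, MODEL_DH_OK);
    return MODEL_DH_OK;
}

/* Caller: with 'patched' set, it completes pg_init itself on any error. */
static void model_activate_path(struct mp_model *m, bool handler_attached,
                                bool patched)
{
    int err;

    m->pg_init_in_progress = true;
    err = model_dh_activate(handler_attached, model_pg_init_done, m);
    if (patched && err != MODEL_DH_OK)
        model_pg_init_done(m, err);
}
```

Calling model_activate_path with handler_attached false and patched false leaves pg_init_in_progress set and queued_io untouched, which corresponds to the hang described above. With patched true the flag is cleared and the path is counted as failed.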

 

Note:

The patch is against RHEL5.5.

When running an upstream kernel, the above scenario may not occur because the request queue is aborted in dm-mpath.c:fail_path.

This patch makes sure the problem does not occur at all, rather than handling it when it does. In addition, it seems too risky to apply request queue abort functionality on RHEL5 at this stage.

 

diff -r -U 2 a/drivers/md/dm-mpath.c b/drivers/md/dm-mpath.c
--- a/drivers/md/dm-mpath.c   2010-12-13 09:16:31.358858000 +0200
+++ b/drivers/md/dm-mpath.c   2010-12-13 09:16:31.796998000 +0200
@@ -1190,4 +1190,5 @@
      case SCSI_DH_OK:
           break;
+     case SCSI_DH_DEV_OFFLINED:

 

If you are not doing anything special then I would let default take care of it. No need for this change.

 

      case SCSI_DH_NOSYS:
           if (!m->hw_handler_name) {
@@ -1252,7 +1253,15 @@
 {
      struct pgpath *pgpath = (struct pgpath *) data;
+     int err;
 
-     scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
+     err = scsi_dh_activate(bdev_get_queue(pgpath->path.dev->bdev),
                        pg_init_done, &pgpath->path);
+
+     /*
+     * If error is not SCSI_DH_OK, we have not entered the scsi_dh H/W handler and did not call pg_init_done -
+     * need to call pg_init_done directly.
+     */
+     if (err)
+          pg_init_done(&pgpath->path, err);
 }

You can move this to scsi_dh.c

 

 

diff -r -U 2 a/drivers/scsi/device_handler/scsi_dh.c b/drivers/scsi/device_handler/scsi_dh.c
--- a/drivers/scsi/device_handler/scsi_dh.c     2010-12-13 09:16:31.616554000 +0200
+++ b/drivers/scsi/device_handler/scsi_dh.c     2010-12-13 09:16:31.878170000 +0200
@@ -443,4 +443,9 @@
      spin_unlock_irqrestore(q->queue_lock, flags);
 
+     if (sdev->sdev_state == SDEV_CANCEL ||
+         sdev->sdev_state == SDEV_DEL ||
+         sdev->sdev_state == SDEV_OFFLINE)
+          err = SCSI_DH_DEV_OFFLINED;
+

You can change it to something like below:

      if (err) {
           if (fn)
                fn(data, err);
           return err;
      }
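
For comparison, the handler-side variant suggested here can be sketched in plain userspace C. This is only an illustration under assumptions: alt_dh_activate, alt_dev, alt_result, alt_record and the ALT_DH_* codes are hypothetical names, not the scsi_dh.c interfaces, and the real function takes a request queue rather than a device struct. The sketch shows the one property that matters: on any error the activate routine invokes the callback itself, exactly once, before returning.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the SCSI_DH_* return codes. */
enum alt_dh_status {
    ALT_DH_OK = 0,
    ALT_DH_NOSYS = 1,
    ALT_DH_DEV_OFFLINED = 2
};

typedef void (*alt_activate_cb)(void *data, int err);

/* Toy device: just the two conditions discussed in this thread. */
struct alt_dev {
    int has_handler; /* non-zero when a H/W handler is attached */
    int going_away;  /* non-zero when the device is cancelled/deleted/offline */
};

/* Records callback invocations so a caller can inspect them. */
struct alt_result {
    int calls;
    int last_err;
};

static void alt_record(void *data, int err)
{
    struct alt_result *r = data;

    r->calls++;
    r->last_err = err;
}

/* Activate model: every error path reports through fn before returning. */
static int alt_dh_activate(const struct alt_dev *dev, alt_activate_cb fn,
                           void *data)
{
    int err = ALT_DH_OK;

    if (!dev->has_handler)
        err = ALT_DH_NOSYS;
    if (dev->going_away)
        err = ALT_DH_DEV_OFFLINED;

    if (err) {
        if (fn)
            fn(data, err);
        return err;
    }

    /* Normal path: stands in for the handler completing through fn. */
    if (fn)
        fn(data, ALT_DH_OK);
    return ALT_DH_OK;
}
```

One consequence of this arrangement is that the caller must not also invoke the callback when it sees a non-OK return value, otherwise the completion would run twice. The two approaches are alternatives, not something to combine.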

 

 








Attachment: missing_pg_init_done.patch
Description: missing_pg_init_done.patch
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel