After applying the changes from Fujita, I see that application never
completes when BSG time out happens. Once the BSG request times out,
I see fc_bsg_softirq_done routine destroying the bsg_job but does not
send any response back to the application. The application infinitely
waits for the response with following warning message
Jun 9 17:22:09 elab60 kernel: [ 480.666830]INFO: task sgv4_els:6058
blocked for more than 120 seconds.
Jun 9 17:22:09 elab60 kernel: [ 480.666833] "echo 0 > /proc/sys/
kernel/hung_task_timeout_secs" disables this message.
Jun 9 17:22:09 elab60 kernel: [ 480.666835] sgv4_els D
0000000000000000 0 6058 5993
Jun 9 17:22:09 elab60 kernel: [ 480.666838] ffff88007f173b78
0000000000000082 0000000000000000 ffffffffa003f880
Jun 9 17:22:09 elab60 kernel: [ 480.666842] ffff880001030000
000000000000ff00 000000000000c8b8 ffff88007fbf6990
Jun 9 17:22:09 elab60 kernel: [ 480.666845] ffff88007fbf6c18
00000001a00382cf 00000000ffff3524 ffff88007f93c990
Jun 9 17:22:09 elab60 kernel: [ 480.666848] Call Trace:
Jun 9 17:22:09 elab60 kernel: [ 480.666858] [<ffffffffa00015d2>] ?
fc_bsg_map_buffer+0x2a/0x72 [scsi_transport_fc]
Jun 9 17:22:09 elab60 kernel: [ 480.666864] [<ffffffff8029a2ba>] ?
cache_alloc_debugcheck_after+0x73/0x243
Jun 9 17:22:09 elab60 kernel: [ 480.666868] [<ffffffff80511ebe>]
schedule+0x9/0x1d
Jun 9 17:22:09 elab60 kernel: [ 480.666871] [<ffffffff8051210f>]
schedule_timeout+0x12f/0x164
Jun 9 17:22:09 elab60 kernel: [ 480.666873] [<ffffffff805113f7>]
wait_for_common+0xb8/0x15e
Jun 9 17:22:09 elab60 kernel: [ 480.666878] [<ffffffff80230feb>] ?
default_wake_function+0x0/0xf
Jun 9 17:22:09 elab60 kernel: [ 480.666880] [<ffffffff80511527>]
wait_for_completion+0x18/0x1a
Jun 9 17:22:09 elab60 kernel: [ 480.666884] [<ffffffff80360da6>]
blk_execute_rq+0x7f/0xc9
Jun 9 17:22:09 elab60 kernel: [ 480.666887] [<ffffffff80365c28>]
bsg_ioctl+0x1c0/0x227
Jun 9 17:22:09 elab60 kernel: [ 480.666890] [<ffffffff80514362>] ?
_spin_unlock_irqrestore+0x2b/0x32
Jun 9 17:22:09 elab60 kernel: [ 480.666894] [<ffffffff802adb36>]
vfs_ioctl+0x2a/0x95
Jun 9 17:22:09 elab60 kernel: [ 480.666896] [<ffffffff802adc22>]
do_vfs_ioctl+0x81/0x583
Jun 9 17:22:09 elab60 kernel: [ 480.666898] [<ffffffff80514372>] ?
_spin_unlock+0x9/0xb
Jun 9 17:22:09 elab60 kernel: [ 480.666901] [<ffffffff802ae165>]
sys_ioctl+0x41/0x65
Jun 9 17:22:09 elab60 kernel: [ 480.666904] [<ffffffff8020b26b>]
system_call_fastpath+0x16/0x1b
I see that function blk_end_request_all calls blk_finish_request
routine to complete the response to application. After adding this
call in fc_bsg_softirq_done function, the application gets the
response and completes.
Is this a proper fix? How does block layer request completes when
timeout happens?
/**
* fc_bsg_softirq_done - softirq done routine for destroying the bsg
requests
* @req: BSG request that holds the job to be destroyed
*/
static void fc_bsg_softirq_done(struct request *rq)
{
struct fc_bsg_job *job = rq->special;
unsigned long flags;
spin_lock_irqsave(&job->job_lock, flags);
+ job->state_flags |= FC_RQST_STATE_DONE;
job->ref_cnt--;
spin_unlock_irqrestore(&job->job_lock, flags);
+ blk_end_request_all(rq, rq->errors);
fc_destroy_bsgjob(job);
- Giridhar.M.B
Once the BSG request times
On May 27, 2009, at 11:12 PM, FUJITA Tomonori wrote:
On Thu, 28 May 2009 15:01:21 +0900
FUJITA Tomonori <fujita.tomonori@xxxxxxxxxxxxx> wrote:
CC'ed James Smart,
On Tue, 26 May 2009 11:38:14 -0700
Giridhar Malavali <giridhar.malavali@xxxxxxxxxx> wrote:
Thanks for the pointer. I will check with the post-merge tree.
The crash I am seeing is because of softirq_done_fn not set in the
request queue for BSG request. Even in the post-merge tree I don't
see
FC transport setting this function during the allocation of the
request queue. When BSG request times out, I see that it executes
__blk_complete_request function where check is done for its
existence.
I see this getting set for SCSI request during queue allocation in
scsi_lib.c. Is this required for BSG request?
Yeah, you need to set q->softirq_done_fn if you use the block timeout
infrastructure. The current bsg user, SMP, uses bsg but it doesn't
use
the timeout infrastructure so it doesn't set q->softirq_done_fn.
If q->softirq_done_fn returns BLK_EH_HANDLED, the block layer doesn't
expect that q->softirq_done_fn frees the request (currently,
fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn
for it.
Oops,
If q->rq_timed_out_fn returns BLK_EH_HANDLED, the block layer doesn't
expect that q->rq_timed_out_fn frees the request (currently,
fc_bsg_job_timeout does); The block layer calls q->softirq_done_fn to
clean up the request.
The attached patch works? It just adds q->softirq_done_fn and moves
fc_destroy_bsgjob from fc_bsg_job_timeout to it. fc_bsg_job_timeout
returns BLK_EH_NOT_HANDLED when a job is done since the job will be
finished shortly so we don't want the block layer to do anything for
the job.
It might be better to use q->softirq_done_fn for all the requests not
only for expired requests, as SCSI-ml does, that is, job->job_done
calls blk_complete_request().
diff --git a/drivers/scsi/scsi_transport_fc.c b/drivers/scsi/
scsi_transport_fc.c
index 3f64d93..c58e33a 100644
--- a/drivers/scsi/scsi_transport_fc.c
+++ b/drivers/scsi/scsi_transport_fc.c
@@ -3439,6 +3439,16 @@ fc_bsg_jobdone(struct fc_bsg_job *job)
fc_destroy_bsgjob(job);
}
+static void fc_bsg_softirq_done(struct request *rq)
+{
+ struct fc_bsg_job *job = rq->special;
+ unsigned long flags;
+
+ spin_lock_irqsave(&job->job_lock, flags);
+ job->ref_cnt--;
+ spin_unlock_irqrestore(&job->job_lock, flags);
+ fc_destroy_bsgjob(job);
+}
/**
* fc_bsg_job_timeout - handler for when a bsg request timesout
@@ -3471,19 +3481,12 @@ fc_bsg_job_timeout(struct request *req)
"abort failed with status %d\n", err);
}
- if (!done) {
- spin_lock_irqsave(&job->job_lock, flags);
- job->ref_cnt--;
- spin_unlock_irqrestore(&job->job_lock, flags);
- fc_destroy_bsgjob(job);
- }
-
- /* the blk_end_sync_io() doesn't check the error */
- return BLK_EH_HANDLED;
+ if (done)
+ return BLK_EH_NOT_HANDLED;
+ else
+ return BLK_EH_HANDLED;
}
-
-
static int
fc_bsg_map_buffer(struct fc_bsg_buffer *buf, struct request *req)
{
@@ -3879,6 +3882,7 @@ fc_bsg_hostadd(struct Scsi_Host *shost,
struct fc_host_attrs *fc_host)
q->queuedata = shost;
queue_flag_set_unlocked(QUEUE_FLAG_BIDI, q);
+ blk_queue_softirq_done(q, fc_bsg_softirq_done);
blk_queue_rq_timed_out(q, fc_bsg_job_timeout);
blk_queue_rq_timeout(q, FC_DEFAULT_BSG_TIMEOUT);
--
To unsubscribe from this list: send the line "unsubscribe linux-
scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html