Re: stgt: how to avoid WARNING at block/blk-core.c:1080

Joe Eykholt <jeykholt@xxxxxxxxx> · Tue, 03 Aug 2010 17:23:32 -0700

On 8/3/10 12:40 PM, Joe Eykholt wrote:
On 7/31/10 6:49 PM, FUJITA Tomonori wrote:
On Fri, 30 Jul 2010 10:32:34 -0700
Joe Eykholt<jeykholt@xxxxxxxxx> wrote:

Hi All,

I'm working on a target module for scsi_tgt to work with libfc / fcoe
in Linux 2.6.33.5. I'm missing some step in handling completion of
a request.

After each Inquiry or READ operation, in the transfer_response
callback I send the data and response and then call the (*done)()
routine.

I figure I must be missing something, because I get this WARNING:

[ 271.521650] ------------[ cut here ]------------
[ 271.521658] WARNING: at block/blk-core.c:1080
__blk_put_request+0x4f/0xbe()
[ 271.521660] Hardware name:<snip>
[ 271.521662] Modules linked in: libfc_tgt scsi_tgt nfs lockd nfs_acl
auth_rpcgss autofs4 sunrpc ip6t_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput
fnic libfcoe libfc enic i2c_i801
iTCO_wdt serio_raw scsi_transport_fc iTCO_vendor_support pcspkr
e1000e shpchp radeon ttm drm_kms_helper drm
i2c_algo_bit i2c_core
[ 271.521705] Pid: 2769, comm: scsi_tgtd/2 Tainted: G W 2.6.33.5-ftgt #9
[ 271.521708] Call Trace:
[ 271.521715] [<ffffffff811696ae>] ? __blk_put_request+0x4f/0xbe
[ 271.521720] [<ffffffff81041462>] warn_slowpath_common+0x77/0xa4
[ 271.521724] [<ffffffff8104149e>] warn_slowpath_null+0xf/0x11
[ 271.521728] [<ffffffff811696ae>] __blk_put_request+0x4f/0xbe
[ 271.521735] [<ffffffffa02b14b7>] scsi_host_put_command+0x52/0x80
[scsi_tgt]
[ 271.521740] [<ffffffffa02b14e5>] ? scsi_tgt_cmd_destroy+0x0/0x3f
[scsi_tgt]
[ 271.521745] [<ffffffffa02b151f>] scsi_tgt_cmd_destroy+0x3a/0x3f
[scsi_tgt]
[ 271.521751] [<ffffffff810513a0>] worker_thread+0x131/0x1bd
[ 271.521756] [<ffffffff81054bfc>] ? autoremove_wake_function+0x0/0x38
[ 271.521760] [<ffffffff8105126f>] ? worker_thread+0x0/0x1bd
[ 271.521764] [<ffffffff810547fe>] kthread+0x7d/0x85
[ 271.521770] [<ffffffff810099a4>] kernel_thread_helper+0x4/0x10
[ 271.521774] [<ffffffff81054781>] ? kthread+0x0/0x85
[ 271.521779] [<ffffffff810099a0>] ? kernel_thread_helper+0x0/0x10
[ 271.521782] ---[ end trace 096013af4d0a82a0 ]---

It's this code in block/blk-core.c:

void __blk_put_request(struct request_queue *q, struct request *req)
{
if (unlikely(!q))
return;
if (unlikely(--req->ref_count))
return;

elv_completed_request(q, req);

/* this is a bio leak */
WARN_ON(req->bio != NULL);

...

So, what should I be doing to tell the block layer that the request
is done?
Or, is there something that scsi_tgt should be doing to disassociate
the bio
from the req?

Maybe we should call blk_end_request_all().

But I don't think that bio is leaked in our case. So setting req->bio
to NULL might be fine.

I can't access to my IBM POWER box now but I'll try later.

I tried it with my module (not with ibmvstgt) and it seems to work better.
Here's the patch I used (cut-and-paste from stg so for illustration only).
Feel free to do something different. If setting rq->bio = NULL is OK,
it might be better. I haven't researched this enough to know what's best.

Author: Joe Eykholt <jeykholt@xxxxxxxxx>
Date: Tue Aug 3 12:33:07 2010 -0700

stgt: fix warning from __blk_put_request()

Every ftgt I/O ending was getting a WARNING and stack dump
from block/blk-core:1080 __blk_put_request() because a request
still had a bio associated with it.

Call blk_end_request_all() after unmapping the user pages.

Signed-off-by: Joe Eykholt <jeykholt@xxxxxxxxx>

diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 1030327..a454ed0 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -151,6 +151,7 @@ void scsi_host_put_command(struct Scsi_Host *shost,
struct scsi_cmnd *cmd)
kmem_cache_free(scsi_tgt_cmd_cache, tcmd);

spin_lock_irqsave(q->queue_lock, flags);
+ __blk_end_request_all(rq, 0);
__blk_put_request(q, rq);
spin_unlock_irqrestore(q->queue_lock, flags);


Although that seemed to work on 2.6.35, when I tried it on
the fcoe-next tree, which is 2.6.35-rc3, it caused a crash due to
blk_update_request referencing a freed bio.  So, maybe just setting
rq->bio to NULL is correct.  If we freed the bio, we should do that
at the same place.   Just wanted to let you know that there's a
problem with the patch at this point.

      Joe

--
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html