Re: stgt: how to avoid WARNING at block/blk-core.c:1080

Joe Eykholt <jeykholt@xxxxxxxxx> · Thu, 05 Aug 2010 12:28:09 -0700

On 8/5/10 1:09 AM, FUJITA Tomonori wrote:
On Tue, 03 Aug 2010 17:23:32 -0700
Joe Eykholt<jeykholt@xxxxxxxxx>  wrote:

On 8/3/10 12:40 PM, Joe Eykholt wrote:
On 7/31/10 6:49 PM, FUJITA Tomonori wrote:
On Fri, 30 Jul 2010 10:32:34 -0700
Joe Eykholt<jeykholt@xxxxxxxxx>  wrote:

Hi All,

I'm working on a target module for scsi_tgt to work with libfc / fcoe
in Linux 2.6.33.5. I'm missing some step in handling completion of
a request.

After each Inquiry or READ operation, in the transfer_response
callback I send the data and response and then call the (*done)()
routine.

I figure I must be missing something, because I get this WARNING:

[ 271.521650] ------------[ cut here ]------------
[ 271.521658] WARNING: at block/blk-core.c:1080
__blk_put_request+0x4f/0xbe()
[ 271.521660] Hardware name:<snip>
[ 271.521662] Modules linked in: libfc_tgt scsi_tgt nfs lockd nfs_acl
auth_rpcgss autofs4 sunrpc ip6t_REJECT
nf_conntrack_ipv6 ip6table_filter ip6_tables ipv6 dm_multipath uinput
fnic libfcoe libfc enic i2c_i801
iTCO_wdt serio_raw scsi_transport_fc iTCO_vendor_support pcspkr
e1000e shpchp radeon ttm drm_kms_helper drm
i2c_algo_bit i2c_core
[ 271.521705] Pid: 2769, comm: scsi_tgtd/2 Tainted: G W 2.6.33.5-ftgt #9
[ 271.521708] Call Trace:
[ 271.521715] [<ffffffff811696ae>] ? __blk_put_request+0x4f/0xbe
[ 271.521720] [<ffffffff81041462>] warn_slowpath_common+0x77/0xa4
[ 271.521724] [<ffffffff8104149e>] warn_slowpath_null+0xf/0x11
[ 271.521728] [<ffffffff811696ae>] __blk_put_request+0x4f/0xbe
[ 271.521735] [<ffffffffa02b14b7>] scsi_host_put_command+0x52/0x80
[scsi_tgt]
[ 271.521740] [<ffffffffa02b14e5>] ? scsi_tgt_cmd_destroy+0x0/0x3f
[scsi_tgt]
[ 271.521745] [<ffffffffa02b151f>] scsi_tgt_cmd_destroy+0x3a/0x3f
[scsi_tgt]
[ 271.521751] [<ffffffff810513a0>] worker_thread+0x131/0x1bd
[ 271.521756] [<ffffffff81054bfc>] ? autoremove_wake_function+0x0/0x38
[ 271.521760] [<ffffffff8105126f>] ? worker_thread+0x0/0x1bd
[ 271.521764] [<ffffffff810547fe>] kthread+0x7d/0x85
[ 271.521770] [<ffffffff810099a4>] kernel_thread_helper+0x4/0x10
[ 271.521774] [<ffffffff81054781>] ? kthread+0x0/0x85
[ 271.521779] [<ffffffff810099a0>] ? kernel_thread_helper+0x0/0x10
[ 271.521782] ---[ end trace 096013af4d0a82a0 ]---

It's this code in block/blk-core.c:

void __blk_put_request(struct request_queue *q, struct request *req)
{
if (unlikely(!q))
return;
if (unlikely(--req->ref_count))
return;

elv_completed_request(q, req);

/* this is a bio leak */
WARN_ON(req->bio != NULL);

...

So, what should I be doing to tell the block layer that the request
is done?
Or, is there something that scsi_tgt should be doing to disassociate
the bio
from the req?

Maybe we should call blk_end_request_all().

But I don't think that bio is leaked in our case. So setting req->bio
to NULL might be fine.

I can't access to my IBM POWER box now but I'll try later.

I tried it with my module (not with ibmvstgt) and it seems to work better.
Here's the patch I used (cut-and-paste from stg so for illustration only).
Feel free to do something different. If setting rq->bio = NULL is OK,
it might be better. I haven't researched this enough to know what's best.

Author: Joe Eykholt<jeykholt@xxxxxxxxx>
Date: Tue Aug 3 12:33:07 2010 -0700

stgt: fix warning from __blk_put_request()

Every ftgt I/O ending was getting a WARNING and stack dump
from block/blk-core:1080 __blk_put_request() because a request
still had a bio associated with it.

Call blk_end_request_all() after unmapping the user pages.

Signed-off-by: Joe Eykholt<jeykholt@xxxxxxxxx>

diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 1030327..a454ed0 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -151,6 +151,7 @@ void scsi_host_put_command(struct Scsi_Host *shost,
struct scsi_cmnd *cmd)
kmem_cache_free(scsi_tgt_cmd_cache, tcmd);

spin_lock_irqsave(q->queue_lock, flags);
+ __blk_end_request_all(rq, 0);
__blk_put_request(q, rq);
spin_unlock_irqrestore(q->queue_lock, flags);


Although that seemed to work on 2.6.35, when I tried it on
the fcoe-next tree, which is 2.6.35-rc3, it caused a crash due to
blk_update_request referencing a freed bio.  So, maybe just setting
rq->bio to NULL is correct.  If we freed the bio, we should do that
at the same place.   Just wanted to let you know that there's a
problem with the patch at this point.

I think that we need to call blk_end_request_all before calling
blk_rq_unmap_user.

I've just updated the kernel version to the latest git and confirmed
that this patch works for my IBM POWER box.

=
From: FUJITA Tomonori<fujita.tomonori@xxxxxxxxxxxxx>
Date: Thu, 5 Aug 2010 17:02:42 +0900
Subject: [PATCH] tgt: fix warning

We don't set rq->bio to NULL so we get the following warnings:

Badness at /home/fujita/git/linux-2.6/block/blk-core.c:1108
NIP: c0000000001bfb58 LR: c0000000001bfb48 CTR: c0000000001bfb08

We don't leak bios (blk_rq_unmap_user should free them). We don't need
to call blk_end_request_all but let's finish a request like everyone.

Signed-off-by: FUJITA Tomonori<fujita.tomonori@xxxxxxxxxxxxx>
---
  drivers/scsi/scsi_tgt_lib.c |    1 +
  1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/drivers/scsi/scsi_tgt_lib.c b/drivers/scsi/scsi_tgt_lib.c
index 66241dd..fff1b61 100644
--- a/drivers/scsi/scsi_tgt_lib.c
+++ b/drivers/scsi/scsi_tgt_lib.c
@@ -184,6 +184,7 @@ static void scsi_tgt_cmd_destroy(struct work_struct *work)

  	dprintk("cmd %p %d %u\n", cmd, cmd->sc_data_direction,
  		rq_data_dir(cmd->request));
+	blk_end_request_all(cmd->request, 0);
  	scsi_unmap_user_pages(tcmd);
  	scsi_host_put_command(scsi_tgt_cmd_to_host(cmd), cmd);

Both blk_end_request_all() and scsi_host_put_command() do
a __blk_put_request().

I get a fault in the above call to scsi_tgt_cmd_to_host() where
it derefs the request, presumably because the request has been
freed by blk_end_request_all().

This happens once, early in my LUN discovery sequence (maybe
an Inquiry or Report LUNs) and then it seems OK.  My system
goes on after the fault.

So, is there a case where we might not have started a request?
I don't see that.  I thought maybe a special case like
an I/O with no buffer and dir == DMA_NONE, like a Test Unit Ready.
Or maybe the request refcnt is not correct in some case?

I tried merely getting the host pointer before blk_end_request_all(),
but that just lead to another problem in scsi_host_put_command()
since it also refers to the request to get the tcmd.

	Regards,
	Joe
--
To unsubscribe from this list: send the line "unsubscribe stgt" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html