Re: SQ overflow seen running isert traffic with high block sizes

On Mon, Jan 15, 2018 at 03:12:36AM -0700, Kalderon, Michal wrote:
> > From: linux-rdma-owner@xxxxxxxxxxxxxxx [mailto:linux-rdma-
> > owner@xxxxxxxxxxxxxxx] On Behalf Of Nicholas A. Bellinger
> > Sent: Monday, January 15, 2018 6:57 AM
> > To: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
> > Cc: Amrani, Ram <Ram.Amrani@xxxxxxxxxx>; Sagi Grimberg
> > <sagi@xxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Elior, Ariel
> > <Ariel.Elior@xxxxxxxxxx>; target-devel <target-devel@xxxxxxxxxxxxxxx>;
> > Potnuri Bharat Teja <bharat@xxxxxxxxxxx>
> > Subject: Re: SQ overflow seen running isert traffic with high block sizes
> > 
> > Hi Shiraz, Ram, Ariel, & Potnuri,
> > 
> > Following up on this old thread, as it relates to Potnuri's recent fix for an
> > iser-target queue-full memory leak:
> > 
> > https://www.spinics.net/lists/target-devel/msg16282.html
> > 
> > Just curious how frequently this happens in practice with sustained large block
> > workloads, as it appears to affect at least three different iWARP RNICs (i40iw,
> > qedr and iw_cxgb4)..?
> > 
> > Is there anything else at the iser-target consumer level that should be
> > changed for iWARP to avoid repeated ib_post_send() failures..?
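> > 
> > For anyone following along, the consumer-level pattern in question is
> > roughly this (a minimal sketch with made-up example_* names, not the
> > literal isert code): the fabric callback hands -ENOMEM/-EAGAIN from
> > ib_post_send() back to the target core unmodified, and the core parks
> > the command on its queue-full list and retries it later, rather than
> > dropping or leaking the response.
> > 
> > #include <target/target_core_base.h>	/* struct se_cmd */
> > 
> > /* Assumed helper for the sketch; wraps ib_post_send() internally. */
> > int example_post_rdma_then_response(struct se_cmd *se_cmd);
> > 
> > static int example_queue_data_in(struct se_cmd *se_cmd)
> > {
> > 	int ret = example_post_rdma_then_response(se_cmd);
> > 
> > 	/*
> > 	 * A full SQ surfaces here as -ENOMEM; returning it unmodified
> > 	 * lets the target core park se_cmd on its queue-full list and
> > 	 * retry the response later from a workqueue.
> > 	 */
> > 	return ret;
> > }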
> > 
> Would like to mention that although we are an iWARP RNIC vendor as well, we've hit this
> issue when running RoCE. It's not iWARP related.
> This is easily reproduced within seconds with an IO size of 512K,
> using 5 targets with 2 RAM disks each and 5 targets with FileIO disks each.
> 
> IO Command used:
> maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
> 
> thanks,
> Michal

It's seen with block sizes >= 2M on a single-target, single-RAM-disk config, and similar to Michal's report,
it reproduces rather quickly, in a matter of seconds.

fio --rw=read --bs=2048k --numjobs=1 --iodepth=128 --runtime=30 --size=20g --loops=1 --ioengine=libaio \
    --direct=1 --invalidate=1 --fsync_on_close=1 --norandommap --exitall --filename=/dev/sdb --name=sdb
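
For a rough sense of why only large blocks trigger this, here is some
illustrative arithmetic (a sketch; the max_send_sge, page size and SQ depth
below are assumptions, not statements about any particular RNIC): with only a
few hw SGEs per WR, the rdma-rw API has to fragment one big IO into many send
queue entries, so a modest number of in-flight 2M commands can exhaust the SQ.

#include <stdio.h>

int main(void)
{
	unsigned int io_bytes  = 2048u * 1024u; /* 2M block size from the fio run above */
	unsigned int page_size = 4096u;         /* assumed page size */
	unsigned int max_sge   = 3u;            /* assumed device max_send_sge */
	unsigned int sq_depth  = 512u;          /* assumed send queue depth */

	unsigned int pages = io_bytes / page_size;            /* 512 pages */
	unsigned int wrs   = (pages + max_sge - 1) / max_sge; /* ~171 WRs per IO */

	printf("%u WRs per 2M IO; only %u such IOs fit in a %u-entry SQ\n",
	       wrs, sq_depth / wrs, sq_depth);
	return 0;
}

With the fio iodepth of 128 above, that arithmetic runs out of SQ slots almost
immediately, which matches how quickly this reproduces.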

Shiraz

> 
> > On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote:
> > > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote:
> > > > Hi Nicholas,
> > > >
> > > > > Just to confirm, the following four patches were required to get
> > > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly
> > > > > small number of hw SGEs:
> > > > >
> > > > > 7a56dc8 iser-target: avoid posting a recv buffer twice
> > > > > 555a65f iser-target: Fix queue-full response handling
> > > > > a446701 iscsi-target: Propigate queue_data_in + queue_status errors
> > > > > fa7e25c target: Fix unknown fabric callback queue-full errors
> > > > >
> > > > > So did you test QLogic/Cavium with RoCE using these four
> > > > > patches, or just with commit a4467018..?
> > > > >
> > > > > Note these have not been CC'ed to stable yet, as I was reluctant
> > > > > since they didn't have much mileage on them at the time..
> > > > >
> > > > > Now however, they should be OK to consider for stable, especially
> > > > > if they get you unblocked as well.
> > > >
> > > > The issue is still seen with these four patches.
> > > >
> > > > Thanks,
> > > > Ram
> > >
> > > Hi,
> > >
> > > On X722 iWARP NICs (i40iw) too, we are seeing a similar issue of SQ
> > > overflow being hit on isert for larger block sizes, on a 4.14-rc2 kernel.
> > >
> > > Eventually there is a timeout/conn-error on the iser initiator and the connection is torn down.
> > >
> > > The aforementioned patches don't seem to alleviate the SQ overflow issue.
> > >
> > > Initiator
> > > ------------
> > >
> > > [17007.465524] scsi host11: iSCSI Initiator over iSER
> > > [17007.466295] iscsi: invalid can_queue of 55. can_queue must be a power of 2.
> > > [17007.466924] iscsi: Rounding can_queue to 32.
> > > [17007.471535] scsi 11:0:0:0: Direct-Access     LIO-ORG  ramdisk1_40G     4.0 PQ: 0 ANSI: 5
> > > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit TPGS
> > > [17007.471656] scsi 11:0:0:0: alua: device naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1
> > > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0
> > > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
> > > [17007.472405] sd 11:0:0:0: [sdb] Write Protect is off
> > > [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08
> > > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> > > [17007.473412] sd 11:0:0:0: [sdb] Attached SCSI disk
> > > [17007.478184] sd 11:0:0:0: alua: transition timeout set to 60 seconds
> > > [17007.478186] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
> > > [17031.269821]  sdb:
> > > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
> > > [17049.056155]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311705998, last ping 4311711232, now 4311716352
> > > [17049.057499]  connection2:0: detected conn error (1022)
> > > [17049.057558] modifyQP to CLOSING qp 3 next_iw_state 3
> > > [..]
> > >
> > >
> > > Target
> > > ----------
> > > [....]
> > > [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > > [17066.397183] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > > [17066.397184] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397184] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > > [17066.397192] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > > [17066.397195] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397196] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > > [17066.397200] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > > [17066.397204] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397204] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
> > > [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > > [17066.397211] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > > [17066.397215] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397215] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > > [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > > [17066.397219] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > > [17066.397232] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397233] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > > [17066.397238] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > > [17066.397242] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397242] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3
> > > [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > > [17066.397251] QP 3 flush_issued
> > > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > > [17066.397253] Got unknown fabric queue status: -22
> > > [17066.397254] QP 3 flush_issued
> > > [17066.397254] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397254] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > > [17066.397255] Got unknown fabric queue status: -22
> > > [17066.397258] QP 3 flush_issued
> > > [17066.397258] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > > [17066.397259] Got unknown fabric queue status: -22
> > > [17066.397267] QP 3 flush_issued
> > > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > > [17066.397268] Got unknown fabric queue status: -22
> > > [17066.397287] QP 3 flush_issued
> > > [17066.397287] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397288] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > > [17066.397288] Got unknown fabric queue status: -22
> > > [17066.397291] QP 3 flush_issued
> > > [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > > [17066.397292] Got unknown fabric queue status: -22
> > > [17066.397295] QP 3 flush_issued
> > > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > > [17066.397297] Got unknown fabric queue status: -22
> > > [17066.397307] QP 3 flush_issued
> > > [17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22
> > > [17066.397308] isert: isert_post_response: ib_post_send failed with -22
> > > [17066.397309] i40iw i40iw_qp_disconnect Call close API
> > > [....]
> > >
> > > Shiraz