RE: SQ overflow seen running isert traffic with high block sizes

> From: linux-rdma-owner@xxxxxxxxxxxxxxx
> [mailto:linux-rdma-owner@xxxxxxxxxxxxxxx] On Behalf Of Nicholas A. Bellinger
> Sent: Monday, January 15, 2018 6:57 AM
> To: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
> Cc: Amrani, Ram <Ram.Amrani@xxxxxxxxxx>; Sagi Grimberg
> <sagi@xxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Elior, Ariel
> <Ariel.Elior@xxxxxxxxxx>; target-devel <target-devel@xxxxxxxxxxxxxxx>;
> Potnuri Bharat Teja <bharat@xxxxxxxxxxx>
> Subject: Re: SQ overflow seen running isert traffic with high block sizes
> 
> Hi Shiraz, Ram, Ariel, & Potnuri,
> 
> Following up on this old thread, as it relates to Potnuri's recent fix for an
> iser-target queue-full memory leak:
> 
> https://www.spinics.net/lists/target-devel/msg16282.html
> 
> Just curious how frequently this happens in practice with sustained large block
> workloads, as it appears to affect at least three different iWARP RNICs (i40iw,
> qedr and iw_cxgb4)..?
> 
> Is there anything else from an iser-target consumer level that should be
> changed for iWARP to avoid repeated ib_post_send() failures..?
> 
I would like to mention that, although we are an iWARP RNIC as well, we've hit
this issue when running RoCE. It's not iWARP related.
This is easily reproduced within seconds with an IO size of 512K,
using 5 targets with 2 RAM disks each and 5 targets with FileIO disks each.

IO Command used:
maim -b512k -T32 -t2 -Q8 -M0 -o -u -n -m17 -ftargets.dat -d1
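
For a rough sense of why 512K IOs put pressure on the send queue, here is a back-of-the-envelope sketch. It assumes the worst case of one 4K page per SGE and one response send per command. The SGE limit, the number of in-flight commands and the SQ depth are hypothetical values chosen for illustration, not numbers taken from any of the devices in this thread.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def rdma_wrs_per_io(io_bytes: int, max_sge: int, page_bytes: int = 4096) -> int:
    """Worst-case RDMA work requests for one IO, assuming one page per SGE."""
    pages = ceil_div(io_bytes, page_bytes)
    return ceil_div(pages, max_sge)


def sq_slots_needed(io_bytes: int, max_sge: int, outstanding_cmds: int,
                    page_bytes: int = 4096) -> int:
    """RDMA WRs plus one response send (a simplification) per in-flight command."""
    per_cmd = rdma_wrs_per_io(io_bytes, max_sge, page_bytes) + 1
    return outstanding_cmds * per_cmd


if __name__ == "__main__":
    io_bytes = 512 * 1024   # block size from the command above (-b512k)
    max_sge = 4             # hypothetical per-WR SGE limit
    outstanding = 8         # hypothetical number of in-flight commands
    sq_depth = 128          # hypothetical send queue depth
    need = sq_slots_needed(io_bytes, max_sge, outstanding)
    print(f"estimated SQ slots needed: {need}, assumed SQ depth: {sq_depth}")
```

With these made-up numbers the estimate is 264 slots against an assumed depth of 128, so a queue sized for small IOs could be exhausted quickly once the block size grows.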

thanks,
Michal

> On Fri, 2017-10-06 at 17:40 -0500, Shiraz Saleem wrote:
> > On Mon, Jul 17, 2017 at 03:26:04AM -0600, Amrani, Ram wrote:
> > > Hi Nicholas,
> > >
> > > > Just to confirm, the following four patches were required to get
> > > > Potnuri up and running on iser-target + iw_cxgb4 with a similarly
> > > > small number of hw SGEs:
> > > >
> > > > 7a56dc8 iser-target: avoid posting a recv buffer twice
> > > > 555a65f iser-target: Fix queue-full response handling
> > > > a446701 iscsi-target: Propigate queue_data_in + queue_status errors
> > > > fa7e25c target: Fix unknown fabric callback queue-full errors
> > > >
> > > > So did you test with Q-Logic/Cavium with RoCE using these four
> > > > patches, or just with commit a4467018..?
> > > >
> > > > Note these have not been CC'ed to stable yet, as I was reluctant
> > > > since they didn't have much mileage on them at the time..
> > > >
> > > > Now however, they should be OK to consider for stable, especially
> > > > if they get you unblocked as well.
> > >
> > > The issue is still seen with these four patches.
> > >
> > > Thanks,
> > > Ram
> >
> > Hi,
> >
> > On X722 iWARP NICs (i40iw) too, we are seeing a similar issue of SQ
> > overflow being hit on isert for larger block sizes, on a 4.14-rc2 kernel.
> >
> > Eventually there is a timeout/conn-error on the iser initiator and the
> > connection is torn down.
> >
> > The aforementioned patches don't seem to be alleviating the SQ overflow
> > issue?
> >
> > Initiator
> > ------------
> >
> > [17007.465524] scsi host11: iSCSI Initiator over iSER
> > [17007.466295] iscsi: invalid can_queue of 55. can_queue must be a power of 2.
> > [17007.466924] iscsi: Rounding can_queue to 32.
> > [17007.471535] scsi 11:0:0:0: Direct-Access     LIO-ORG  ramdisk1_40G     4.0 PQ: 0 ANSI: 5
> > [17007.471652] scsi 11:0:0:0: alua: supports implicit and explicit TPGS
> > [17007.471656] scsi 11:0:0:0: alua: device naa.6001405ab790db5e8e94b0998ab4bf0b port group 0 rel port 1
> > [17007.471782] sd 11:0:0:0: Attached scsi generic sg2 type 0
> > [17007.472373] sd 11:0:0:0: [sdb] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)
> > [17007.472405] sd 11:0:0:0: [sdb] Write Protect is off
> > [17007.472406] sd 11:0:0:0: [sdb] Mode Sense: 43 00 00 08
> > [17007.472462] sd 11:0:0:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
> > [17007.473412] sd 11:0:0:0: [sdb] Attached SCSI disk
> > [17007.478184] sd 11:0:0:0: alua: transition timeout set to 60 seconds
> > [17007.478186] sd 11:0:0:0: alua: port group 00 state A non-preferred supports TOlUSNA
> > [17031.269821]  sdb:
> > [17033.359789] EXT4-fs (sdb1): mounted filesystem with ordered data mode. Opts: (null)
> > [17049.056155]  connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4311705998, last ping 4311711232, now 4311716352
> > [17049.057499]  connection2:0: detected conn error (1022)
> > [17049.057558] modifyQP to CLOSING qp 3 next_iw_state 3
> > [..]
> >
> >
> > Target
> > ----------
> > [....]
> > [17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397180] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > [17066.397183] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397183] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > [17066.397184] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397184] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > [17066.397187] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397188] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > [17066.397192] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397192] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > [17066.397195] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397196] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > [17066.397196] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397197] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > [17066.397200] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397200] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > [17066.397204] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397204] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > [17066.397206] i40iw i40iw_process_aeq ae_id = 0x503 bool qp=1 qp_id = 3
> > [17066.397207] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397207] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > [17066.397211] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397211] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > [17066.397215] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397215] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > [17066.397218] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397219] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > [17066.397219] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397220] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > [17066.397232] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397233] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > [17066.397237] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397237] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > [17066.397238] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397238] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > [17066.397242] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397242] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > [17066.397245] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
> > [17066.397247] i40iw i40iw_process_aeq ae_id = 0x501 bool qp=1 qp_id = 3
> > [17066.397247] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > [17066.397251] QP 3 flush_issued
> > [17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397252] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea800 failed to post RDMA res
> > [17066.397253] Got unknown fabric queue status: -22
> > [17066.397254] QP 3 flush_issued
> > [17066.397254] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397254] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ede48 failed to post RDMA res
> > [17066.397255] Got unknown fabric queue status: -22
> > [17066.397258] QP 3 flush_issued
> > [17066.397258] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397259] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ec020 failed to post RDMA res
> > [17066.397259] Got unknown fabric queue status: -22
> > [17066.397267] QP 3 flush_issued
> > [17066.397267] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397268] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ea1f8 failed to post RDMA res
> > [17066.397268] Got unknown fabric queue status: -22
> > [17066.397287] QP 3 flush_issued
> > [17066.397287] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397288] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8e9bf0 failed to post RDMA res
> > [17066.397288] Got unknown fabric queue status: -22
> > [17066.397291] QP 3 flush_issued
> > [17066.397292] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397292] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8ecc30 failed to post RDMA res
> > [17066.397292] Got unknown fabric queue status: -22
> > [17066.397295] QP 3 flush_issued
> > [17066.397296] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
> > [17066.397296] isert: isert_rdma_rw_ctx_post: Cmd: ffff8817fb8f20a0 failed to post RDMA res
> > [17066.397297] Got unknown fabric queue status: -22
> > [17066.397307] QP 3 flush_issued
> > [17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22
> > [17066.397308] isert: isert_post_response: ib_post_send failed with -22
> > [17066.397309] i40iw i40iw_qp_disconnect Call close API
> > [....]
> >
> > Shiraz
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > target-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the
> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
> http://vger.kernel.org/majordomo-info.html
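
For anyone reading the quoted target log: on Linux, ret_err -12 is -ENOMEM and ret_err -22 is -EINVAL. In the log the -12 failures appear before "QP 3 flush_issued" and the -22 failures after it. A minimal sketch for tallying such lines from a saved dmesg capture could look like the following; the regular expression and the function name are my own, and the sample is a short excerpt of the log above.

```python
import errno
import re
from collections import Counter

# Matches driver lines such as:
#   i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
POST_SEND_RE = re.compile(
    r"(\w+_post_send): qp (\d+) wr_opcode (\d+) ret_err (-?\d+)"
)


def count_post_send_errors(log_text: str) -> Counter:
    """Count post-send failures in a log, keyed by symbolic errno name."""
    counts = Counter()
    for match in POST_SEND_RE.finditer(log_text):
        err = -int(match.group(4))  # the log prints negative errno values
        counts[errno.errorcode.get(err, str(err))] += 1
    return counts


SAMPLE = """\
[17066.397179] i40iw_post_send: qp 3 wr_opcode 0 ret_err -12
[17066.397251] QP 3 flush_issued
[17066.397252] i40iw_post_send: qp 3 wr_opcode 0 ret_err -22
[17066.397307] i40iw_post_send: qp 3 wr_opcode 8 ret_err -22
"""

if __name__ == "__main__":
    for name, n in count_post_send_errors(SAMPLE).items():
        print(name, n)
```

Because the pattern is applied with finditer over the whole text, it also works on captures where the mailer has wrapped several log entries onto one line.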



