----- Original Message -----
> From: "Hannes Reinecke" <hare@xxxxxxx>
> To: emilne@xxxxxxxxxx, "vasu dev" <vasu.dev@xxxxxxxxx>, "robert w love" <robert.w.love@xxxxxxxxx>
> Cc: "Laurence Oberman" <loberman@xxxxxxxxxx>, "Linux SCSI Mailinglist" <linux-scsi@xxxxxxxxxxxxxxx>,
> fcoe-devel@xxxxxxxxxxxxx, "Curtis Taylor (cjt@xxxxxxxxxx)" <cjt@xxxxxxxxxx>, "Bud Brown" <bubrown@xxxxxxxxxx>
> Sent: Wednesday, October 12, 2016 11:46:16 AM
> Subject: Re: [Open-FCoE] Issue with fc_exch_alloc failing initiated by fc_queuecommand on NUMA or large
> configurations with Intel ixgbe running FCOE
>
> On 10/12/2016 05:26 PM, Ewan D. Milne wrote:
> > On Tue, 2016-10-11 at 10:51 -0400, Ewan D. Milne wrote:
> >> On Sat, 2016-10-08 at 19:35 +0200, Hannes Reinecke wrote:
> >>> You might actually be hitting a limitation in the exchange manager code.
> >>> The libfc exchange manager tries to be really clever and will assign a
> >>> per-cpu exchange manager (probably to increase locality). However, we
> >>> only have a limited number of exchanges, so on large systems we might
> >>> actually run into a exchange starvation problem, where we have in theory
> >>> enough free exchanges, but none for the submitting cpu.
> >>>
> >>> (Personally, the exchange manager code is in urgent need of reworking.
> >>> It should be replaced by the sbitmap code from Omar).
> >>>
> >>> Do check how many free exchanges are actually present for the stalling
> >>> CPU; it might be that you run into a starvation issue.
> >>
> >> We are still looking into this but one thing that looks bad is that
> >> the exchange manager code rounds up the number of CPUs to the next
> >> power of 2 before dividing up the exchange id space (and uses the lsbs
> >> of the xid to extract the CPU when looking up an xid). We have a machine
> >> with 288 CPUs, this code is just begging for a rewrite as it looks to
> >> be wasting most of the limited xid space on ixgbe FCoE.
> >>
> >> Looks like we get 512 offloaded xids on this adapter and 4096-512
> >> non-offloaded xids. This would give 1 + 7 xids per CPU. However, I'm
> >> not sure that even 4096 / 288 = 14 would be enough to prevent stalling.
> >>
> >> And, of course, potentially most of the CPUs aren't submitting I/O, so
> >> the whole idea of per-CPU xid space is questionable.
> >>
> >
> > fc_exch_alloc() used to try all the available exchange managers in the
> > list for an available exchange id, but this was changed in 2010 so that
> > if the first matched exchange manager couldn't allocate one, it fails
> > and we end up returning host busy. This was due to commit:
> >
> > commit 3e22760d4db6fd89e0be46c3d132390a251da9c6
> > Author: Vasu Dev <vasu.dev@xxxxxxxxx>
> > Date: Fri Mar 12 16:08:39 2010 -0800
> >
> >     [SCSI] libfc: use offload EM instance again instead jumping to next EM
> >
> >     Since use of offloads is more efficient than switching
> >     to non-offload EM. However kept logic same to call em_match
> >     if it is provided in the list of EMs.
> >
> >     Converted fc_exch_alloc to inline being now tiny a function
> >     and already not an exported libfc API any more.
> >
> >     Signed-off-by: Vasu Dev <vasu.dev@xxxxxxxxx>
> >     Signed-off-by: Robert Love <robert.w.love@xxxxxxxxx>
> >     Signed-off-by: James Bottomley <James.Bottomley@xxxxxxx>
> >
> > ---
> >
> > Setting the ddp_min module parameter to fcoe to 128MB prevents the ->match
> > function from permitting the use of the offload exchange manager for the
> > frame,
> > and we no longer see the problem with host busy status, since it uses the
> > larger non-offloaded pool.
> >
> Yes, this is also the impression I got from reading the spec.
> The offload pool is mainly designed for large read or write commands, so
> using it for _every_ frame is probably not a good idea.
> And limiting it by the size of the transfers solves the problem quite
> nicely, as a large size typically is only used by read and writes.
> So please send a patch to revert that.
>
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke zSeries & Storage
> hare@xxxxxxx +49 911 74053 688
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

I will revert the commit and test it here in the lab, and then submit the revert patch.
Ewan can review.

Thanks
Laurence
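
For anyone following the xid arithmetic quoted above, here is a rough sketch of how the per-CPU split works out on the 288-CPU machine. It is only a model of the behaviour Ewan describes (CPU count rounded up to the next power of 2 before the xid space is divided), not the libfc code itself. The function and constant names are made up for illustration; the pool sizes (512 offloaded, 4096-512 non-offloaded) are the figures from the thread.

```python
# Rough model of the per-CPU xid split described in the thread.
# All names here are hypothetical; this is not libfc code.

def next_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def xids_per_cpu(pool_size: int, nr_cpus: int, round_up: bool = True) -> int:
    """Exchange ids available to each CPU when a pool is split evenly.

    With round_up=True the pool is divided by the CPU count rounded up
    to the next power of 2, which is the behaviour described above.
    """
    slots = next_pow2(nr_cpus) if round_up else nr_cpus
    return pool_size // slots


NR_CPUS = 288                                  # machine from the thread
TOTAL_XIDS = 4096                              # figure from the thread
OFFLOAD_XIDS = 512                             # offloaded xids on the adapter
NON_OFFLOAD_XIDS = TOTAL_XIDS - OFFLOAD_XIDS   # 4096-512

cpu_slots = next_pow2(NR_CPUS)
offload_per_cpu = xids_per_cpu(OFFLOAD_XIDS, NR_CPUS)
non_offload_per_cpu = xids_per_cpu(NON_OFFLOAD_XIDS, NR_CPUS)
flat_per_cpu = xids_per_cpu(TOTAL_XIDS, NR_CPUS, round_up=False)

print(f"CPU slots after rounding up: {cpu_slots}")
print(f"offloaded xids per CPU:      {offload_per_cpu}")
print(f"non-offloaded xids per CPU:  {non_offload_per_cpu}")
print(f"4096 / 288 without rounding: {flat_per_cpu}")
```

This reproduces the "1 + 7 xids per CPU" and "4096 / 288 = 14" numbers from the earlier mail: a CPU that is only allowed to use the offload pool gets a single xid, which fits the host busy returns we see.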