On Wed, Aug 10, 2016 at 02:53:45PM +0300, Yishai Hadas wrote:
> On 8/8/2016 9:04 PM, Jarod Wilson wrote:
> >The man page for ibv_post_send says:
> >
> >   RETURN VALUE
> >
> >   ibv_post_send() returns 0 on success, or the value of errno on failure
> >   (which indicates the failure reason).
> >
> >QEMU looks for the return value, and in the ENOMEM case, waits and
> >retries, but with mlx5, it ends up dropping requests and hanging, because
> >of the unexpected -1 return instead of ENOMEM.
> >
> >The fix is simple: set err = E<whatever> instead of -1, and eliminate use
> >of errno = in _mlx5_post_send, just have mlx5_post_send return the err from
> >_mlx5_post_send instead. This fix has been confirmed to resolve the issues
> >seen with QEMU.
> >
> >While we're at it, fix the MW_DEBUG code paths to not muck with errno either.
> >
> >v2: per discussion with Jason Gunthorpe, don't set errno in mlx5_post_send
>
> Your patch missed a few other flows that should be fixed as part of
> mlx5_post_send (e.g. set_data_inl_seg, which returns -1; set_bind_wr,
> which touches errno; etc.).
>
> I just fixed that, along with some other minor typos/changes to the commit
> log, and will send it shortly as V3. I left you as the author to give you
> the credit.
>
> In addition, both mlx5_post_recv and mlx5_post_srq_recv should be fixed in
> the same manner. I prepared another patch as a candidate fix and will send
> it shortly as well.
>
> Once the new candidate patches pass our regression testing, we will
> take them officially into libmlx5.

Works for me, thanks much.

--
Jarod Wilson
jarod@xxxxxxxxxx
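
For readers following the thread, here is a minimal sketch of the return-value convention under discussion and of the caller-side retry that depends on it. It is not the libmlx5 or QEMU code: fake_queue, fake_post_send and post_with_retry are hypothetical names made up for illustration, and the "queue full" condition is simulated with a counter.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a send queue; not a libmlx5 structure. */
struct fake_queue {
    int busy_calls; /* number of upcoming calls that will find the queue full */
    int attempts;   /* total number of post attempts seen so far */
};

/*
 * Hypothetical post function following the documented convention:
 * return 0 on success or an errno value on failure, without touching
 * the global errno and without returning -1.
 */
static int fake_post_send(struct fake_queue *q, const void *wr)
{
    if (q == NULL || wr == NULL)
        return EINVAL;

    q->attempts++;
    if (q->busy_calls > 0) {
        q->busy_calls--;
        return ENOMEM; /* queue full: the caller may wait and retry */
    }
    return 0;
}

/*
 * Caller-side pattern: retry only when the return value is ENOMEM,
 * for at most max_retries additional attempts. A -1 return would not
 * match ENOMEM here, so the request would be dropped instead of retried.
 */
static int post_with_retry(struct fake_queue *q, const void *wr, int max_retries)
{
    assert(max_retries >= 0);

    int ret;
    int tries = 0;

    do {
        ret = fake_post_send(q, wr);
        /* A real caller would wait for completions here before retrying. */
    } while (ret == ENOMEM && tries++ < max_retries);

    return ret;
}
```

With this shape, a caller that compares the return value against ENOMEM only behaves as intended if every failure path in the provider returns a positive errno value, which is the point of the fix discussed above.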