Re: [PATCH for-next 0/2] RDMA/erdma: Introduce custom implementation of drain_sq and drain_rq

On 8/31/22 2:45 AM, Tom Talpey wrote:
> On 8/29/2022 12:01 AM, Cheng Xu wrote:
>>
>>
>> On 8/26/22 9:57 PM, Jason Gunthorpe wrote:
>>> On Fri, Aug 26, 2022 at 09:11:25AM -0400, Tom Talpey wrote:
>>>
>>>> With your change, ERDMA will pre-emptively fail such a newly posted
>>>> request, and generate no new completion. The consumer is left in limbo
>>>> on the status of its prior requests. Providers must not override this.
>>>
>>> Yeah, I tend to agree with Tom.
>>>
>>> And I also want to point out that Linux RDMA verbs do not follow the
>>> SW specifications of either IBTA or the iWarp group. We have our own
>>> expectation for how these APIs work that our own ULPs rely on.
>>>
>>> So pedantically debating what a software spec we don't follow says is
>>> not relevant. The utility is to understand the intention and use cases
>>> and ensure we cover the same. Usually this means we follow the spec :)
>>>
>>
>> Yeah, I totally agree with this.
>>
>> Actually, I thought that ULPs are not concerned with the details of how
>> the flushing and modify_qp are performed in the drivers. The drain flow is
>> handled by a single ib_drain_qp call for ULPs. While the ib_drain_qp API
>> allows vendor-custom implementations, these are invisible to ULPs.
>>
>> ULPs that implement their own drain flow instead of using ib_drain_qp
>> (rare in the kernel, I think) will fail on erdma.
>>
>> Anyway, since our implementation is disputed, we'd like to keep the same
>> behavior as other vendors. Perhaps a firmware update without driver changes,
>> or software flushing in the driver, will fix this.
> 
> To be clear, my concern is about the ordering of CQE flushes with
> respect to the WR posting failures. Draining the CQs in whatever way
> you choose to optimize for your device is not the issue, although
> it seems odd to me that you need such a thing.
> 

Yeah, I understand your concern. I'm sorry that my last reply may have been
ambiguous.

After discussing this internally, we would like to drop this patch (that is,
failing WRs before drain, or failing WRs in the QP Error state), because it
indeed has problems in the cases you mentioned. We are looking for new
solutions that will not fail the WRs in drain cases, so that erdma will have
the same behavior as other vendors.
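
For reference, the drain flow that ULPs rely on is: move the QP to the Error
state, post a marker WR, and wait for the marker's flush completion, which
guarantees that every WR ahead of it has been flushed. Below is a simplified
sketch of the core's generic SQ drain (along the lines of __ib_drain_sq() in
drivers/infiniband/core/verbs.c, with the direct-poll CQ handling trimmed):

struct ib_drain_cqe {
	struct ib_cqe cqe;
	struct completion done;
};

static void ib_drain_qp_done(struct ib_cq *cq, struct ib_wc *wc)
{
	struct ib_drain_cqe *cqe = container_of(wc->wr_cqe,
						struct ib_drain_cqe, cqe);

	complete(&cqe->done);
}

static void __ib_drain_sq_sketch(struct ib_qp *qp)
{
	struct ib_qp_attr attr = { .qp_state = IB_QPS_ERR };
	struct ib_drain_cqe sdrain;
	struct ib_rdma_wr swr = {
		.wr = {
			.wr_cqe	= &sdrain.cqe,
			.opcode	= IB_WR_RDMA_WRITE,
		},
	};

	/* 1. Move the QP to Error; in-flight WRs start flushing. */
	if (ib_modify_qp(qp, &attr, IB_QP_STATE))
		return;

	sdrain.cqe.done = ib_drain_qp_done;
	init_completion(&sdrain.done);

	/* 2. Post a marker WR; the provider must flush it, not fail it. */
	if (ib_post_send(qp, &swr.wr, NULL))
		return;

	/* 3. The marker's flush CQE means the SQ is fully drained. */
	wait_for_completion(&sdrain.done);
}

The marker WR is posted *after* the QP is already in Error, which is exactly
the post that our patch rejected; that is why the drain could never finish.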

Furthermore, the reason we introduced this patch is that our hardware does not
currently flush newly posted WRs in the QP Error state. So the new solution
could be:
 - Let our hardware flush newly posted WRs, or
 - Flush the WRs in our driver if the hardware does not (sketched below).
Either of them will eliminate the odd logic in this patch. For now, we lean
toward the first option.
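
To illustrate the second option, the driver's post path would accept WRs in
the Error state and complete them with a flush status instead of returning
an error. A hypothetical sketch follows; the names in it
(erdma_post_send_sketch, erdma_gen_flush_cqe, erdma_post_send_hw,
ERDMA_QPS_ERROR, qp->sq_lock) are made up for illustration and are not
actual erdma code:

static int erdma_post_send_sketch(struct ib_qp *ibqp,
				  const struct ib_send_wr *wr,
				  const struct ib_send_wr **bad_wr)
{
	struct erdma_qp *qp = to_eqp(ibqp);	/* assumed helper */
	unsigned long flags;
	int ret = 0;

	spin_lock_irqsave(&qp->sq_lock, flags);
	if (qp->state == ERDMA_QPS_ERROR) {	/* assumed state field */
		/*
		 * Do not fail the post: accept each WR and queue a CQE
		 * with IB_WC_WR_FLUSH_ERR so consumers (including
		 * ib_drain_qp) see the flush completions they expect.
		 */
		for (; wr; wr = wr->next)
			erdma_gen_flush_cqe(qp, wr);	/* assumed helper */
	} else {
		ret = erdma_post_send_hw(qp, wr, bad_wr); /* normal path */
	}
	spin_unlock_irqrestore(&qp->sq_lock, flags);

	return ret;
}

The downside is that the driver then has to interleave software-generated
CQEs with hardware completions in the right order, which is why letting the
hardware do the flush looks simpler to us.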

Thanks,
Cheng Xu


> The problem is that your patch started failing the new requests
> _before_ the drain could be used to clean up. This introduced
> two new provider behaviors that consumers would not expect:
> 
> - first error detected in a post call (on the fast path!)
> - inability to determine if prior requests were complete
> 
> I'd really suggest getting a copy of the full IB spec and examining
> the difference between QP "Error" and "SQ Error" states. They are
> subtle but important.
> 
> Tom.


