Re: [PATCH RFC] Introduce verbs API for NVMe-oF target offload

On Tue, May 8, 2018 at 5:05 PM, Radzi, Amit <Amit.Radzi@xxxxxxxxxx> wrote:
> General question, was there already an RFC for the kernel side that I missed or
> is it still pending?

It wasn't submitted. We want to go with userspace first.

>
>> 1) Providing an RNIC the ability not to support non-offloaded operations on an
>> offloaded QP.
>> Once the QP is modified to offloaded, only a single SQ WQE with the
>> connect response will be executed, and after that no additional SQ WQEs
>> will be allowed from the consumer.
>
> You didn't respond to the requirement of having the ability to not support
> non-offloaded operations.
> Was this missed? Do you see an issue with that?

Responded below... I don't see an issue with having a cap bit called
support_non_offload. It should probably apply both ways: when it is not
set, don't try to post work, and don't expect to get RX completions.
I think it would be best to modify the QP to offload AFTER receiving
the first CONNECT message. That makes support_non_offload easy to
explain, as it applies only once the QP is in the offload state.
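
To make the ordering concrete, here is a minimal sketch of the rule as I
understand it from this thread. Every name in it (the sketch_* types and
functions) is hypothetical and exists only for illustration; none of it is
part of libibverbs or of the RFC. It assumes the QP is moved to offload after
the first CONNECT arrives, and that a device without support_non_offload
still allows the single connect-response SQ WQE mentioned above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; none of these exist in libibverbs. */
enum sketch_qp_mode {
    SKETCH_QP_NORMAL,    /* before the QP is modified to offload */
    SKETCH_QP_OFFLOADED  /* after the consumer modified the QP to offload */
};

struct sketch_qp {
    enum sketch_qp_mode mode;
    bool support_non_offload;  /* assumed device capability bit */
    bool connect_rsp_posted;   /* the one SQ WQE allowed after offload */
};

/* The consumer calls this only after it has received the first CONNECT. */
static void sketch_modify_to_offload(struct sketch_qp *qp)
{
    assert(qp->mode == SKETCH_QP_NORMAL);
    qp->mode = SKETCH_QP_OFFLOADED;
    qp->connect_rsp_posted = false;
}

/* Returns true if the consumer may post one more SQ WQE. */
static bool sketch_may_post_send(struct sketch_qp *qp)
{
    /* The cap bit only matters once the QP is in the offload state. */
    if (qp->mode == SKETCH_QP_NORMAL || qp->support_non_offload)
        return true;
    /* Offloaded-only QP: allow the connect response, then nothing else. */
    if (!qp->connect_rsp_posted) {
        qp->connect_rsp_posted = true;
        return true;
    }
    return false;
}

/* Returns true if the consumer should expect RX completions on this QP. */
static bool sketch_expects_rx_completion(const struct sketch_qp *qp)
{
    return qp->mode == SKETCH_QP_NORMAL || qp->support_non_offload;
}
```

With the bit clear, the first call to sketch_may_post_send() after the
transition succeeds (the connect response) and every later call fails, which
matches the "both ways" meaning described above.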

>> Yes, thought about making this a capability bitmask, but then it
>> seemed too many options… besides the regular power-of-two block sizes
>> there are combinations of metadata. So for instance a device could
>> support 4096, 4160 (8x 512B blocks + 8B DIF each), 4104 (1x 4K block
>> with 1 8B DIF), or any other metadata size…
>>
>
> What about two bitmaps, one for the block sizes supported (power of two) and one
> for whether the specific block size with metadata (8B DIF tag) is also supported?

Metadata size can be anything at byte granularity, not necessarily
8B... The spec allocates 16 bits to describe the metadata size of a
namespace.
Think about a storage system built with this capability. I don't think
the system's software is going to query the supported sizes of the RDMA
NIC and compare them with the supported sizes of the NVMe drives, all at
runtime... I believe a system will be designed to support specific
block sizes, based on the designer's (or administrator's) knowledge of
the specific devices in the system.
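
For reference, the sizes quoted above come from simple arithmetic: number of
blocks times (block size plus per-block metadata). The helper below is a
hypothetical sketch of that calculation, not a proposed API. With a 16-bit
metadata size, each block size can pair with up to 65536 metadata sizes, which
is why a bitmap per block size does not cover the space.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: bytes covered by nblocks logical blocks when each
 * block carries md_size bytes of metadata. md_size is 16 bits wide to
 * mirror the namespace metadata-size field mentioned above.
 */
static uint64_t sketch_total_bytes(uint32_t block_size, uint16_t md_size,
                                   uint32_t nblocks)
{
    /* Block sizes are assumed to be non-zero powers of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);
    return (uint64_t)nblocks * ((uint64_t)block_size + md_size);
}
```

For example, sketch_total_bytes(512, 8, 8) gives 4160 and
sketch_total_bytes(4096, 8, 1) gives 4104, the two metadata cases from the
quoted text, while sketch_total_bytes(4096, 0, 1) is the plain 4096.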

>
>> >> diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h
>> >> index 0785c77..3d32cf4 100644
>> >> --- a/libibverbs/verbs.h
>> >> +++ b/libibverbs/verbs.h
>> >> @@ -398,6 +420,8 @@ enum ibv_event_type {
>> >>         IBV_EVENT_CLIENT_REREGISTER,
>> >>         IBV_EVENT_GID_CHANGE,
>> >>         IBV_EVENT_WQ_FATAL,
>> >> +       IBV_EVENT_NVME_PCI_ERR,
>> >> +       IBV_EVENT_NVME_TIMEOUT
>> >>  };
>> >
>> > We need a new event to give in case a non-supported operation
>> > arrives on an offloaded QP that doesn't support co-existence of
>> > non-offloaded operations. E.g. IBV_EVENT_UNSUPPORTED_NVMF_OP
>> >
>>
>> You mean besides the QP going to error?
>>
>
> Yes, that is the way the application will be notified that the QP has moved to error state.

Right.
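
A rough sketch of how a consumer might treat such an event, assuming it is
delivered alongside the QP moving to error. The enum below is a local
stand-in that only mirrors the event names discussed in this thread; it is
not the libibverbs enum ibv_event_type, and the handler and state struct are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the events discussed here; not enum ibv_event_type. */
enum sketch_event {
    SKETCH_EVENT_NVME_PCI_ERR,
    SKETCH_EVENT_NVME_TIMEOUT,
    SKETCH_EVENT_UNSUPPORTED_NVMF_OP
};

/* Hypothetical consumer-side view of one offloaded QP. */
struct sketch_qp_state {
    bool in_error;
};

/* Record what the consumer learns from one async event. */
static void sketch_handle_event(struct sketch_qp_state *qp,
                                enum sketch_event ev)
{
    assert(qp != NULL);
    switch (ev) {
    case SKETCH_EVENT_UNSUPPORTED_NVMF_OP:
        /* Per the discussion: this is how the QP error is reported. */
        qp->in_error = true;
        break;
    case SKETCH_EVENT_NVME_PCI_ERR:
    case SKETCH_EVENT_NVME_TIMEOUT:
        /* The thread does not say what these do to the QP; left alone. */
        break;
    }
}
```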



-- 
Oren