RE: [PATCH RFC] Introduce verbs API for NVMe-oF target offload

> -----Original Message-----
> From: Oren Duer [mailto:oren.duer@xxxxxxxxx]

General question: was there already an RFC for the kernel side that I missed, or
is it still pending?

> 1) Providing an RNIC the ability not to support non-offloaded operations on an
> offloaded QP.
> Once the QP is modified to offloaded, only a single SQ WQE with the
> connect response will
> be executed, and after that no additional SQ WQEs will be allowed from the
> consumer.

You didn't respond to the requirement that a device be able to not support
non-offloaded operations on an offloaded QP.
Was this missed? Do you see an issue with that?

> >> +++ b/libibverbs/man/ibv_map_nvmf_nsid.3
> >> +.B ibv_map_nvmf_nsid()
> >> +adds a new NVMe-oF namespace mapping to a given \fInvme_ctrl\fR.
> >> +The mapping is from the fabric facing frontend namespace ID
> >> +.I fe_nsid
> >> +to namespace
> >> +.I nvme_nsid
> >> +on this NVMe subsystem.
> >> +.I fe_nsid
> >> +must be unique within the SRQ that
> >> +.I nvme_ctrl
> >> +belongs to, all ibv_nvme_ctrl objects attached to the same SRQ share
> >> the same number space.
> >> +.PP
> >> +.I lba_data_size
> >> +defines the block size this namespace is formatted to, in bytes. Only
> >> specific block sizes are supported by the device.
> >> +Mapping several namespaces of the same NVMe subsystem will be done by
> >> calling this function several times with the same
> >> +.I nvme_ctrl
> >> +while assigning different
> >> +.I fe_nsid
> >> +and
> >> +.I nvme_nsid
> >> +with each call.
> >
> > Many NVMe controllers require multiple QPs in order to get high bandwidth.
> Having
> > a single NVMe QP (nvme_ctrl object) for each NVMe-oF namespace might be a
> bottleneck.
> > Would it be worth getting a list of nvme_ctrl to be used by the device when
> accessing this
> > namespace?
> 
> You mean many NVMe controllers require multiple SQs in order to get high
> BW...
> nvme_ctrl should represent a single NVMe device. A single front-facing
> NSID will point to a single backend {nvme_ctrl,nsid}.
> If that is a requirement, we should probably add multiple SQs to a
> single nvme_ctrl, so that hardware can round-robin the requests
> between them.
> 

Yes, that is what I had in mind.
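
To make the round-robin idea concrete, here is a minimal sketch of how a
device (or a software model of it) could spread requests across several SQs
that belong to one nvme_ctrl. The struct, the NVME_CTRL_MAX_SQS limit and the
function name are hypothetical and are not part of the posted patch.

```c
#include <stddef.h>
#include <stdint.h>

#define NVME_CTRL_MAX_SQS 8 /* hypothetical limit, for illustration only */

/* Hypothetical per-nvme_ctrl state: the set of backend SQs and a cursor. */
struct nvme_ctrl_sqs {
	uint16_t sq_ids[NVME_CTRL_MAX_SQS]; /* NVMe submission queue IDs */
	size_t num_sqs;                     /* number of valid entries */
	size_t next;                        /* index of the SQ to use next */
};

/* Return the SQ ID to use for the next request, or -1 if no SQ is set up. */
static int nvme_ctrl_next_sq(struct nvme_ctrl_sqs *c)
{
	uint16_t id;

	if (c->num_sqs == 0)
		return -1;

	id = c->sq_ids[c->next];
	c->next = (c->next + 1) % c->num_sqs; /* wrap around */
	return id;
}
```

With this shape the front-facing NSID still maps to a single backend
{nvme_ctrl,nsid}; only the choice of SQ inside the nvme_ctrl changes per request.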

> >
> >> +.SH "RETURN VALUE"
> >> +.B ibv_map_nvmf_nsid()
> >> +and
> >> +.B ibv_unmap_nvmf_nsid
> >> +returns 0 on success, or the value of errno on failure (which
> >> indicates the failure reason).
> >> +.PP
> >> +failure reasons may be:
> >> +.IP EEXIST
> >> +Trying to map an already existing front-facing
> >> +.I fe_nsid
> >> +.IP ENOENT
> >> +Trying to delete a non existing front-facing
> >> +.I fe_nsid
> >> +.IP ENOTSUP
> >> +Given
> >> +.I lba_data_size
> >> +is not supported on this device. Check device release notes for
> >> supported sizes. Format the NVMe namespace to a LBA Format where the
> >> data + metadata size is supported by the device.
> >
> > Why should this not be a capability instead of checking in release notes of
> device?
> 
> Yes, thought about making this a capability bitmask, but then it
> seemed too many options… besides the regular power-of-two block sizes
> there are combinations of metadata. So for instance a device could
> support 4096, 4160 (8x 512B blocks + 8B DIF each), 4104 (1x 4K block
> with 1 8B DIF), or any other metadata size…
> 

What about two bitmaps: one for the supported block sizes (powers of two) and one
for whether each of those block sizes is also supported with metadata (8B DIF tag)?
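
As a rough sketch of what I mean, assuming bit n stands for a data block of
(1 << n) bytes in both masks. The struct, field and function names below are
hypothetical and do not exist in the posted patch:

```c
#include <stdbool.h>
#include <stdint.h>

#define NVMF_DIF_TAG_BYTES 8u /* assumed 8B DIF tag per block */

/* Hypothetical capability layout:
 * bit n of block_sizes: a plain block of (1 << n) bytes is supported;
 * bit n of dif_sizes:   the same block plus an 8B DIF tag is supported. */
struct nvmf_lba_caps {
	uint32_t block_sizes;
	uint32_t dif_sizes;
};

static bool is_pow2(uint32_t v)
{
	return v != 0 && (v & (v - 1)) == 0;
}

static unsigned int log2_u32(uint32_t v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/* Check an lba_data_size (data + metadata, in bytes) against the two masks. */
static bool nvmf_lba_size_supported(const struct nvmf_lba_caps *caps,
				    uint32_t lba_data_size)
{
	/* Plain power-of-two block, no metadata. */
	if (is_pow2(lba_data_size) &&
	    ((caps->block_sizes >> log2_u32(lba_data_size)) & 1u))
		return true;

	/* Power-of-two block followed by one 8B DIF tag. */
	if (lba_data_size > NVMF_DIF_TAG_BYTES &&
	    is_pow2(lba_data_size - NVMF_DIF_TAG_BYTES))
		return (caps->dif_sizes >>
			log2_u32(lba_data_size - NVMF_DIF_TAG_BYTES)) & 1u;

	return false;
}
```

Note that this only covers "one block + one DIF tag" layouts such as 4104.
A size like 4160 (8x 512B blocks with 8B DIF each) is not expressible as a
single lba_data_size in this scheme and would have to be described as 520.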

> >> diff --git a/libibverbs/verbs.h b/libibverbs/verbs.h
> >> index 0785c77..3d32cf4 100644
> >> --- a/libibverbs/verbs.h
> >> +++ b/libibverbs/verbs.h
> >> @@ -398,6 +420,8 @@ enum ibv_event_type {
> >>         IBV_EVENT_CLIENT_REREGISTER,
> >>         IBV_EVENT_GID_CHANGE,
> >>         IBV_EVENT_WQ_FATAL,
> >> +       IBV_EVENT_NVME_PCI_ERR,
> >> +       IBV_EVENT_NVME_TIMEOUT
> >>  };
> >
> > We need a new event to report the case where a non-supported operation
> > arrives on an offloaded QP that doesn't support co-existence of
> > non-offloaded operations. E.g. IBV_EVENT_UNSUPPORTED_NVMF_OP
> >
> 
> You mean besides the QP going to error?
> 

Yes, that is the way the application will be notified that the QP has moved to error state.
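
To illustrate why a dedicated event helps, here is a small sketch of how an
application might tell the cases apart in its async event handler. The enums
and the chosen actions are hypothetical stand-ins (they deliberately do not
use the real ibv_event_type values), and the mapping from event to action is
only my assumption of a reasonable policy:

```c
/* Hypothetical stand-ins for the async events discussed in this thread. */
enum nvmf_async_event {
	NVMF_EVENT_NVME_PCI_ERR,
	NVMF_EVENT_NVME_TIMEOUT,
	NVMF_EVENT_UNSUPPORTED_NVMF_OP, /* proposed: op not allowed on offloaded QP */
	NVMF_EVENT_OTHER
};

/* Hypothetical application-level reactions. */
enum nvmf_action {
	NVMF_ACTION_IGNORE,
	NVMF_ACTION_TEARDOWN_QP,   /* QP is in error state, disconnect the host */
	NVMF_ACTION_RESET_BACKEND  /* problem is on the NVMe device side */
};

/* Map an event to the action the application takes (assumed policy). */
static enum nvmf_action nvmf_handle_event(enum nvmf_async_event ev)
{
	switch (ev) {
	case NVMF_EVENT_UNSUPPORTED_NVMF_OP:
		/* The host sent something the offloaded QP cannot handle. */
		return NVMF_ACTION_TEARDOWN_QP;
	case NVMF_EVENT_NVME_PCI_ERR:
	case NVMF_EVENT_NVME_TIMEOUT:
		return NVMF_ACTION_RESET_BACKEND;
	default:
		return NVMF_ACTION_IGNORE;
	}
}
```

Without the separate event the application would only see a QP in error state
and could not distinguish a misbehaving host from a backend failure.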




