Re: [RFC PATCH 00/28] INFINIBAND NETWORK BLOCK DEVICE (IBNBD)

Hi Sagi,

On Mon, Mar 27, 2017 at 4:20 AM, Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
>> This series introduces IBNBD/IBTRS kernel modules.
>>
>> IBNBD (InfiniBand network block device) allows for an RDMA transfer of
>> block IO over an InfiniBand network. The driver presents itself as a
>> block device on the client side and transmits the block requests in a
>> zero-copy fashion to the server side via InfiniBand. The server part of
>> the driver converts the incoming buffers back into BIOs and hands them
>> down to the underlying block device. As soon as IO responses come back
>> from the drive, they are transmitted back to the client.
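
Just to make the above a bit more concrete: on the server side we wrap
the incoming buffer into a bio and submit it to the backing device. A
very rough sketch of that step (the function name ibnbd_srv_submit_io is
made up here for illustration and is not the actual IBNBD code; the
calls are the standard bio API):

#include <linux/bio.h>
#include <linux/blkdev.h>

/* Simplified illustration only: submit one received data page to the
 * backing block device as a single-segment bio. */
static int ibnbd_srv_submit_io(struct block_device *bdev, struct page *page,
                               unsigned int len, sector_t sector, bool write)
{
        struct bio *bio;

        bio = bio_alloc(GFP_KERNEL, 1);
        if (!bio)
                return -ENOMEM;

        bio->bi_bdev = bdev;
        bio->bi_iter.bi_sector = sector;
        bio_set_op_attrs(bio, write ? REQ_OP_WRITE : REQ_OP_READ, 0);
        if (bio_add_page(bio, page, len, 0) != len) {
                bio_put(bio);
                return -EIO;
        }

        /* a real implementation sets bio->bi_end_io here so the IO
         * response can be sent back to the client on completion */
        submit_bio(bio);
        return 0;
}
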
>
>
> Hi Jack, Danil and Roman,
>
> I met Danil and Roman last week at Vault, and I think you guys
> are awesome, thanks a lot for open-sourcing your work! However,
> I have a couple of issues here, some are related to the code and
> some are fundamental actually.

Thanks for the comments and suggestions; my replies are inline.

>
> - Is there room for this ibnbd? If we were to take every block driver
>   that was submitted without sufficient justification, it'd be very
>   hard to maintain. What advantage (if any) does this buy anyone over
>   existing rdma based protocols (srp, iser, nvmf)? I'm really (*really*)
>   not sold on this one...
>
> - To me, the fundamental design that the client side owns a pool of
>   buffers that it issues writes to seems inferior to the one taken in
>   iser/nvmf (immediate data). IMO, the ibnbd design has scalability
>   issues in terms of server-side resources, client-side contention and
>   network congestion (on infiniband the latter is less severe).
>
> - I suggest that for your next post, you provide a real-life use-case
>   where none of the existing drivers suffices, and by "can't suffice"
>   I mean that it has a fundamental issue, not something that merely
>   requires a fix. With that, our feedback can be much more concrete
>   and (at least on my part) more open to accepting it.
>
> - I'm not exactly sure why you suggest that your implementation
>   supports only infiniband when you use rdma_cm for address resolution,
>   nor do I understand why you emphasize feature (2) below, nor why,
>   even in the presence of rdma_cm, you have ibtrs_ib_path (confused...).
>   iWARP needs a bit more attention if you don't use the new generic
>   interfaces though...
That reminds me: we also tested with the rxe driver in the past, but not
with iWARP. It might work.
ibtrs_ib_path is a leftover from the APM feature we used in-house; we
will remove it in the next round.

>
> - I honestly do not understand why you need *19057* LOC to implement
>   an rdma based block driver. That's almost as large as all of our
>   existing block drivers combined... A first glance at the code provides
>   some explanations: (1) you have some strange code that has no business
>   in a block driver, like ibtrs_malloc/ibtrs_zalloc (yikes), or
>   open-coding various existing logging routines, (2) you are for some
>   reason adding a second tag allocation scheme (why?), (3) you are
>   open-coding a lot of stuff that we added to the stack in the past
>   months... (4) you seem to over-layer your code for reasons that I do
>   not really understand. And I didn't look deeply into the code at all,
>   just enough to get a feel for it, and it seems like it needs a lot
>   of work before it can even be considered upstream ready.
Agreed, we will clean up the code further; that's why I sent it as an
RFC, to get early feedback.
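
Concretely, wrappers like ibtrs_malloc/ibtrs_zalloc add nothing over the
standard allocators, so that part of the cleanup is mechanical. Roughly
the kind of change (assuming ibtrs_zalloc(len) simply zero-allocates len
bytes, which may not match the RFC exactly):

        /* before (RFC): custom wrapper */
        buf = ibtrs_zalloc(len);
        if (!buf)
                return -ENOMEM;

        /* after: plain kernel allocator, no wrapper needed */
        buf = kzalloc(len, GFP_KERNEL);
        if (!buf)
                return -ENOMEM;

The open-coded logging helpers will be converted to the standard
pr_*/dev_* macros in the same way.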


>
>> We designed and implemented this solution based on our needs in Cloud
>> Computing; the key features are:
>> - High throughput and low latency due to:
>> 1) Only two rdma messages per IO
>
>
> Where exactly did you witness latency that was meaningfully affected by
> having another rdma message on the wire? That's only for writes anyway,
> and we have first data bursts for that...
Clearly, we need to benchmark this on the latest kernel.

>
>> 2) Simplified client side server memory management
>> 3) Eliminated SCSI sublayer
>
>
> That's hardly an advantage given all we are losing without it...
>
> ...
>
> Cheers,
> Sagi.

Thanks,
-- 
Jack Wang
Linux Kernel Developer

ProfitBricks GmbH
Greifswalder Str. 207
D - 10405 Berlin

Tel:       +49 30 577 008  042
Fax:      +49 30 577 008 299
Email:    jinpu.wang@xxxxxxxxxxxxxxxx
URL:      https://www.profitbricks.de

Sitz der Gesellschaft: Berlin
Registergericht: Amtsgericht Charlottenburg, HRB 125506 B
Geschäftsführer: Achim Weiss



