Jens Axboe <axboe@xxxxxxxxx> writes:

> On 5/4/23 1:59 PM, Andreas Hindborg wrote:
>>
>> Jens Axboe <axboe@xxxxxxxxx> writes:
>>
>>> On 5/4/23 12:52 PM, Keith Busch wrote:
>>>> On Thu, May 04, 2023 at 11:36:01AM -0700, Bart Van Assche wrote:
>>>>> On 5/4/23 11:15, Andreas Hindborg wrote:
>>>>>> If it is still unclear to you why this effort was started,
>>>>>> please do let me know and I shall try to clarify further :)
>>>>>
>>>>> It seems like I was too polite in my previous email. What I meant
>>>>> is that rewriting code is useful if it provides a clear advantage
>>>>> to the users of a driver. For null_blk, the users are kernel
>>>>> developers. The code that has been posted is the start of a
>>>>> rewrite of the null_blk driver. The benefits of this rewrite
>>>>> (making low-level memory errors less likely) do not outweigh the
>>>>> risks that this effort will introduce functional or performance
>>>>> regressions.
>>>>
>>>> Instead of replacing, would co-existing be okay? Of course as long
>>>> as there's no requirement to maintain feature parity between the
>>>> two. Actually, just call it "rust_blk" and declare it has no
>>>> relationship to null_blk, despite their functional similarities:
>>>> it's a developer reference implementation for a rust block driver.
>>>
>>> To me, the big discussion point isn't really whether we're doing
>>> null_blk or not, it's more if we want to go down this path of
>>> maintaining rust bindings for the block code in general. If the
>>> answer to that is yes, then doing null_blk seems like a great
>>> choice as it's not a critical piece of infrastructure. It might
>>> even be a good idea to be able to run both, for performance
>>> purposes, as the bindings or core changes.
>>>
>>> But back to the real question... This is obviously extra burden on
>>> maintainers, and that needs to be sorted out first. Block drivers
>>> in general are not super security sensitive, as it's mostly
>>> privileged code and there's not a whole lot of user visible API.
>>> And the stuff we do have is reasonably basic. So what's the long
>>> term win of having rust bindings? This is a legitimate question. I
>>> can see a lot of other more user exposed subsystems being of higher
>>> interest here.
>>
>> Even though the block layer is not usually exposed in the same way
>> that something like the USB stack is, absence of memory safety bugs
>> is a very useful property. If this is attainable without sacrificing
>> performance, it seems like a nice option to offer future block
>> device driver developers. Some would argue that it is worth offering
>> even in the face of performance regression.
>>
>> While memory safety is the primary feature that Rust brings to the
>> table, it does come with other nice features as well. The type
>> system, language support for stackless coroutines, and language
>> support for error handling are all very useful.
>
> We're in violent agreement on this part, I don't think anyone sane
> would argue that memory safety with the same performance [1] isn't
> something you'd want. And the error handling with rust is so much
> better than the C stuff drivers do now that I can't see anyone
> disagreeing on that being a great thing as well.
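Just to make the error handling point concrete, here is a minimal
sketch of what the Result type and the ? operator give a driver's
setup path. All names below are invented for illustration; none of
this is taken from the proposed bindings.

use core::fmt;

#[derive(Debug)]
enum SetupError {
    QueueAlloc,
    DiskAlloc,
}

impl fmt::Display for SetupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SetupError::QueueAlloc => write!(f, "failed to allocate queue"),
            SetupError::DiskAlloc => write!(f, "failed to allocate disk"),
        }
    }
}

struct Queue;
struct Disk;

// Stand-ins for fallible setup steps; invented for this example.
fn alloc_queue() -> Result<Queue, SetupError> {
    Ok(Queue)
}

fn alloc_disk(_q: &Queue) -> Result<Disk, SetupError> {
    Ok(Disk)
}

// Each `?` propagates failure to the caller, and anything already
// constructed is released through Drop, so there is no hand-written
// goto/unwind ladder as in the usual C setup path.
fn setup() -> Result<Disk, SetupError> {
    let q = alloc_queue()?;
    let disk = alloc_disk(&q)?;
    Ok(disk)
}

fn main() {
    match setup() {
        Ok(_disk) => println!("device set up"),
        Err(e) => eprintln!("setup failed: {e}"),
    }
}

The equivalent C setup path typically needs one cleanup label per
failure point, which is exactly the kind of code that rots as the
function grows.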
> The discussion point here is the price being paid in terms of people
> time.
>
>> Regarding maintenance of the bindings, it _is_ an amount of extra
>> work. But there is more than one way to structure that work. If Rust
>> is accepted into the block layer at some point, maintenance could be
>> structured in such a way that it does not get in the way of existing
>> C maintenance work. A "rust keeps up or it breaks" model. That could
>> work for a while.
>
> That potentially works for null_blk, but it would not work for
> anything that people actually depend on. In other words, anything
> that isn't null_blk. And I don't believe we'd be actively discussing
> these bindings if just doing null_blk is the end goal, because that
> isn't useful by itself, and at that point we'd all just be wasting
> our time. In the real world, once we have just one actual driver
> using it, then we'd be looking at "this driver regressed because of
> change X/Y/Z and that needs to get sorted before the next release".
> And THAT is the real issue for me. So a "rust keeps up or it breaks"
> model is a bit naive in my opinion, it's just not a viable approach.
> In fact, even for null_blk, this doesn't really fly as we rely on
> blktests to continually vet the sanity of the IO stack, and null_blk
> is an integral part of that.

Sure, once there are actual users, this model would not work. But
during an introduction period it might be a useful model. Having Rust
around without having to take care of it might give maintainers,
reviewers, and contributors a no-strings-attached opportunity to
dabble with the language in a domain they are familiar with.

> So I really don't think there's much to debate between "rust people
> vs jens" here, as we agree on the benefits, but my end of the table
> has to stomach the cons. And like I mentioned in an earlier email,
> that's not just on me, there are other regular contributors and
> reviewers that are relevant to this discussion. This is something we
> need to discuss.
>
> [1] We obviously need to do real numbers here, the ones posted I
> don't consider stable enough to be useful in saying "yeah it's fully
> on par". If you have an updated rust nvme driver that uses these
> bindings I'd be happy to run some testing that will definitively
> tell us if there's a performance win, loss, or parity, and how much.

I do plan to rebase the NVMe driver sometime in the next few months.
I'll let you know when that work is done.

Best regards,
Andreas