Benno Lossin <benno.lossin@xxxxxxxxx> writes: > On 04.04.24 11:30, Andreas Hindborg wrote: >> Benno Lossin <benno.lossin@xxxxxxxxx> writes: >> >>> On 04.04.24 07:44, Andreas Hindborg wrote: >>>> Benno Lossin <benno.lossin@xxxxxxxxx> writes: >>>> >>>>> On 03.04.24 10:46, Andreas Hindborg wrote: >>>>>> Benno Lossin <benno.lossin@xxxxxxxxx> writes: >>>>>> >>>>>>> On 23.03.24 07:32, Andreas Hindborg wrote: >>>>>>>> Benno Lossin <benno.lossin@xxxxxxxxx> writes: >>>>>>>>> On 3/13/24 12:05, Andreas Hindborg wrote: >>>>>>>>>> +//! implementations of the `Operations` trait. >>>>>>>>>> +//! >>>>>>>>>> +//! IO requests are passed to the driver as [`Request`] references. The >>>>>>>>>> +//! `Request` type is a wrapper around the C `struct request`. The driver must >>>>>>>>>> +//! mark start of request processing by calling [`Request::start`] and end of >>>>>>>>>> +//! processing by calling one of the [`Request::end`], methods. Failure to do so >>>>>>>>>> +//! can lead to IO failures. >>>>>>>>> >>>>>>>>> I am unfamiliar with this, what are "IO failures"? >>>>>>>>> Do you think that it might be better to change the API to use a >>>>>>>>> callback? So instead of calling start and end, you would do >>>>>>>>> >>>>>>>>> request.handle(|req| { >>>>>>>>> // do the stuff that would be done between start and end >>>>>>>>> }); >>>>>>>>> >>>>>>>>> I took a quick look at the rnull driver and there you are calling >>>>>>>>> `Request::end_ok` from a different function. So my suggestion might not >>>>>>>>> be possible, since you really need the freedom. >>>>>>>>> >>>>>>>>> Do you think that a guard approach might work better? ie `start` returns >>>>>>>>> a guard that when dropped will call `end` and you need the guard to >>>>>>>>> operate on the request. >>>>>>>> >>>>>>>> I don't think that would fit, since the driver might not complete the >>>>>>>> request immediately. We might be able to call `start` on behalf of the >>>>>>>> driver. >>>>>>>> >>>>>>>> At any rate, since the request is reference counted now, we can >>>>>>>> automatically fail a request when the last reference is dropped and it >>>>>>>> was not marked successfully completed. I would need to measure the >>>>>>>> performance implications of such a feature. >>>>>>> >>>>>>> Are there cases where you still need access to the request after you >>>>>>> have called `end`? >>>>>> >>>>>> In general no, there is no need to handle the request after calling end. >>>>>> C drivers are not allowed to, because this transfers ownership of the >>>>>> request back to the block layer. This patch series defer the transfer of >>>>>> ownership to the point when the ARef<Request> refcount goes to zero, so >>>>>> there should be no danger associated with touching the `Request` after >>>>>> end. >>>>>> >>>>>>> If no, I think it would be better for the request to >>>>>>> be consumed by the `end` function. >>>>>>> This is a bit difficult with `ARef`, since the user can just clone it >>>>>>> though... Do you think that it might be necessary to clone requests? >>>>>> >>>>>> Looking into the details now I see that calling `Request::end` more than >>>>>> once will trigger UAF, because C code decrements the refcount on the >>>>>> request. When we have `ARef<Request>` around, that is a problem. It >>>>>> probably also messes with other things in C land. Good catch. >>>>>> >>>>>> I did implement `Request::end` to consume the request at one point >>>>>> before I fell back on reference counting. It works fine for simple >>>>>> drivers. However, most drivers will need to use the block layer tag set >>>>>> service, that allows conversion of an integer id to a request pointer. >>>>>> The abstraction for this feature is not part of this patch set. But the >>>>>> block layer manages a mapping of integer to request mapping, and drivers >>>>>> typically use this to identify the request that corresponds to >>>>>> completion messages that arrive from hardware. When drivers are able to >>>>>> turn integers into requests like this, consuming the request in the call >>>>>> to `end` makes little sense (because we can just construct more). >>>>> >>>>> How do you ensure that this is fine?: >>>>> >>>>> let r1 = tagset.get(0); >>>>> let r2 = tagset.get(0); >>>>> r1.end_ok(); >>>>> r2.do_something_that_would_only_be_done_while_active(); >>>>> >>>>> One thing that comes to my mind would be to only give out `&Request` >>>>> from the tag set. And to destroy, you could have a separate operation >>>>> that also removes the request from the tag set. (I am thinking of a tag >>>>> set as a `HashMap<u64, Request>`. >>>> >>>> This would be similar to >>>> >>>> let r1 = tagset.get(0)?; >>>> ler r2 = r1.clone(); >>>> r1.end_ok(); >>>> r2.do_something_requires_active(); >>>> >>>> but it is not a problem because we do not implement any actions that are >>>> illegal in that position (outside of `end` - that _is_ a problem). >>> >>> Makes sense, but I think it's a bit weird to still be able to access it >>> after `end`ing. >> >> Yes, that is true. >> >>> >>>> >>>> >>>>>> >>>>>> What I do now is issue the an `Option<ARef<Request>>` with >>>>>> `bindings::req_ref_inc_not_zero(rq_ptr)`, to make sure that the request >>>>>> is currently owned by the driver. >>>>>> >>>>>> I guess we can check the absolute value of the refcount, and only issue >>>>>> a request handle if the count matches what we expect. Then we can be certain >>>>>> that the handle is unique, and we can require transfer of ownership of >>>>>> the handle to `Request::end` to make sure it can never be called more >>>>>> than once. >>>>>> >>>>>> Another option is to error out in `Request::end` if the >>>>>> refcount is not what we expect. >>>>> >>>>> I am a bit confused, why does the refcount matter in this case? Can't >>>>> the user just have multiple `ARef`s? >>>> >>>> Because we want to assert that we are consuming the last handle to the >>>> request. After we do that, the user cannot call `Request::end` again. >>>> `TagSet::get` will not issue a request reference if the request is not >>>> in flight. Although there might be a race condition to watch out for. >>>> >>>> When the block layer hands over ownership to Rust, the reference count >>>> is 1. The first `ARef<Request>` we create increments the count to 2. To >>>> complete the request, we must have ownership of all reference counts >>>> above 1. The block layer takes the last reference count when it takes >>>> back ownership of the request. >>>> >>>>> I think it would be weird to use `ARef<Request>` if you expect the >>>>> refcount to be 1. >>>> >>>> Yes, that would require a custom smart pointer with a `try_into_unique` >>>> method that succeeds when the refcount is exactly 2. It would consume >>>> the instance and decrement the refcount to 1. But as I said, there is a >>>> potential race with `TagSet::get` when the refcount is 1 that needs to >>>> be handled. >>>> >>>>> Maybe the API should be different? >>>> >>>> I needs to change a little, yes. >>>> >>>>> As I understand it, a request has the following life cycle (please >>>>> correct me if I am wrong): >>>>> 1. A new request is created, it is given to the driver via `queue_rq`. >>>>> 2. The driver can now decide what to do with it (theoretically it can >>>>> store it somewhere and later do something with it), but it should at >>>>> some point call `Request::start`. >>>>> 3. Work happens and eventually the driver calls `Request::end`. >>>>> >>>>> To me this does not seem like something where we need a refcount (we >>>>> still might need one for safety, but it does not need to be exposed to >>>>> the user). >>>> >>>> It would not need to be exposed to the user, other than a) ending a request >>>> can fail OR b) `TagSet::get` can fail. >>>> >>>> a) would require that ending a request must be done with a unique >>>> reference. This could be done by the user by the user calling >>>> `try_into_unique` or by making the `end` method fallible. >>>> >>>> b) would make the reference handle `!Clone` and add a failure mode to >>>> `TagSet::get`, so it fails to construct a `Request` handle if there are >>>> already one in existence. >>>> >>>> I gravitate towards a) because it allows the user to clone the Request >>>> reference without adding an additional `Arc`. >>> >>> This confuses me a little, since I thought that `TagSet::get` returns >>> `Option<ARef<Request>>`. >> >> It does, but in the current implementation the failure mode returning >> `None` is triggered when the refcount is zero, meaning that the request >> corresponding to that tag is not currently owned by the driver. For >> solution b) we would change the type to be >> `Option<CustomSmartPointerHandleThing<Request>>`. >> >>> (I tried to find the abstractions in your >>> github, but I did not find them) >> >> It's here [1]. It was introduced in the `rnvme-v6.8` branch. > > Thanks for the pointer. > >>> I think that this could work: `queue_rq` takes a `OwnedRequest`, which >>> the user can store in a `TagSet`, transferring ownership. `TagSet::get` >>> returns `Option<&Request>` and you can call `TagSet::remove` to get >>> `Option<OwnedRequest>`. `OwnedRequest::end` consumes `self`. >>> With this pattern we also do not need to take an additional refcount. >> >> It would, but the `TagSet` is just a wrapper for the C block layer >> `strugt blk_mq_tag_set`. This is a highly optimized data structure and >> tag mapping is done before the driver sees the request. I would like to >> reuse that logic. >> >> We could implement what you suggest anyhow, but I would not want to that >> additional logic to the hot path. > > I overlooked an important detail: the `TagSet` is always stored in an > `Arc` (IIRC since you want to be able to share it between different > `Gendisk`s). This probably makes my suggestion impossible, since you > can't mutably borrow the `TagSet` for removal of `Request`s. > Depending on how `Request`s are associated to a `TagSet`, there might be > a way around this: I saw the `qid` parameter to the `tag_to_rq` > function, is that a unique identifier for a queue? A tag set services a number of request queues. Each queue has a number used to identify it within the tag set. It is unique within the tag set. > Because in that case > we might be able to have a unique `QueueTagSetRef` with > > fn remove(&mut self, tag: u32) -> OwnedRequest; We would not need exclusive access. The tag set remove are synchronized internally with some fancy atomic bit flipping [1]. > > fn get(&self, tag: u32) -> Option<&Request>; > > The `TagSet` would still be shared, only the ability to "remove" (I > don't know if you do that manually in C, if not, then this would just > remove it in the abstraction, but keep it on the C side) is unique to > the `QueueTagSetRef` struct. I would not advice removing tag->request associations from the driver. I understand your point and from the perspective of these patches it makes sense. But it would be a major layer violation of the current block layer architecture, as far as I can tell. I am having trouble enough trying to justify deferred free of the request structure as it is. > But feel free to use your proposed option a), it is simpler and we can > try to make this work when you send the `TagSet` abstractions. > I just think that we should try a bit harder to make it even better. I'll code it up a) and see how it looks (and what it costs in performance) 👍 BR Andreas [1] https://github.com/metaspace/linux/blob/bf25426ad5652319528c6f87b74dd026fbedb9e8/lib/sbitmap.c#L638