Ming Lei <ming.lei@xxxxxxxxxx> writes: > On Fri, Mar 15, 2024 at 06:49:39PM +0100, Andreas Hindborg wrote: >> Ming Lei <ming.lei@xxxxxxxxxx> writes: >> >> > On Fri, Mar 15, 2024 at 01:46:30PM +0100, Andreas Hindborg wrote: >> >> Ming Lei <ming.lei@xxxxxxxxxx> writes: >> >> > On Fri, Mar 15, 2024 at 08:52:46AM +0100, Andreas Hindborg wrote: >> >> >> Miguel Ojeda <miguel.ojeda.sandonis@xxxxxxxxx> writes: >> >> >> >> >> >> > On Thu, Mar 14, 2024 at 8:23 PM Andreas Hindborg <nmi@xxxxxxxxxxxx> wrote: >> >> >> >> >> >> >> >> The way the current code compiles, <kernel::block::mq::Request as >> >> >> >> kernel::types::AlwaysRefCounted>::dec_ref` is inlined into the `rnull` >> >> >> >> module. A relocation for `rust_helper_blk_mq_free_request_internal` >> >> >> >> appears in `rnull_mod.ko`. I didn't test it yet, but if >> >> >> >> `__blk_mq_free_request` (or the helper) is not exported, I don't think >> >> >> >> this would be possible? >> >> >> > >> >> >> > Yeah, something needs to be exported since there is a generic >> >> >> > involved, but even if you want to go the route of exporting only a >> >> >> > different symbol, you would still want to put it in the C header so >> >> >> > that you don't get the C missing declaration warning and so that we >> >> >> > don't have to write the declaration manually in the helper. >> >> >> >> >> >> That is what I did: >> >> >> >> >> >> @@ -703,6 +703,7 @@ int blk_mq_alloc_sq_tag_set(struct blk_mq_tag_set *set, >> >> >> unsigned int set_flags); >> >> >> void blk_mq_free_tag_set(struct blk_mq_tag_set *set); >> >> >> >> >> >> +void __blk_mq_free_request(struct request *rq); >> >> >> void blk_mq_free_request(struct request *rq); >> >> >> int blk_rq_poll(struct request *rq, struct io_comp_batch *iob, >> >> >> unsigned int poll_flags); >> >> > >> >> > Can you explain in detail why one block layer internal helper is >> >> > called into rnull driver directly? It never happens in C driver code. >> >> >> >> It is not the rust null block driver that calls this symbol directly. It >> >> is called by the Rust block device driver API. But because of inlining, >> >> the symbol is referenced from the loadable object. >> > >> > What is the exact Rust block device driver API? The key point is that how >> > the body of one exported kernel C API(EXPORT_SYMBOL) becomes inlined >> > with Rust driver. >> >> This happens when `ARef<Request<_>>` is dropped. The drop method >> (destructor) of this smart pointer decrements the refcount and >> potentially calls `__blk_mq_free_request`. >> >> >> >> >> The reason we have to call this symbol directly is to ensure proper >> >> lifetime of the `struct request`. For example in C, when a driver >> > >> > Sounds Rust API still calls into __blk_mq_free_request() directly, right? >> >> Yes, the Rust block device driver API will call this request if an >> `ARef<Request<_>>` is dropped and the refcount goes to 0. >> >> > If that is the case, the usecase need to be justified, and you need >> > to write one standalone patch with the exact story for exporting >> > __blk_mq_free_request(). >> >> Ok, I can do that. >> >> > >> >> converts a tag to a request, the developer makes sure to only ask for >> >> requests which are outstanding in the driver. In Rust, for the API to be >> >> sound, we must ensure that the developer cannot write safe code that >> >> obtains a reference to a request that is not owned by the driver. >> >> >> >> A similar issue exists in the null block driver when timer completions >> >> are enabled. If the request is cancelled and the timer fires after the >> >> request has been recycled, there is a problem because the timer holds a >> >> reference to the request private data area. >> >> >> >> To that end, I use the `atomic_t ref` field of the C `struct request` >> >> and implement the `AlwaysRefCounted` Rust trait for the request type. >> >> This is a smart pointer that owns a reference to the pointee. In this >> >> way, the request is not freed and recycled until the smart pointer is >> >> dropped. But if the smart pointer holds the last reference when it is >> >> dropped, it must be able to free the request, and hence it has to call >> >> `__blk_mq_free_request`. >> > >> > For callbacks(queue_rq, timeout, complete) implemented by driver, block >> > layer core guaranteed that the passed request reference is live. >> > >> > So driver needn't to worry about request lifetime, same with Rust >> > driver, I think smart pointer isn't necessary for using request in >> > Rust driver. >> >> Using the C API, there is nothing preventing a driver from using the >> request after the lifetime ends. > > Yes, it is true for C, so will Rust-for-linux need to add refcount for > most exported kernel C structure? such as by implementing AlwaysRefCounted > traits? Not for all types and not all the time. For instance the Rust block device driver API does not always use refcounting for `struct request`. Only when it cannot determine the lifetime of a request reference at compile time. > >> With Rust, we have to make it >> impossible.Without the refcount and associated call to >> `__blk_mq_free_request`, it would be possible to write Rust code that >> access the request after the lifetime ends. This is not sound, and it is >> something we need to avoid in the Rust abstractions. >> >> One concrete way to do write unsound code with a Rust API where lifetime >> is not tracked with refcount, is if the null block timer completion >> callback fires after the request is completed. Perhaps the driver >> cancels the request but forgets to cancel the timer. When the timer >> fires, it will access the request via the context pointer, but the >> request will be invalid. > > The issue is less serious for blk-mq request, which is pre-allocated, > and one freed request just means it can be re-allocated for other IO > in same queue, and the pointed memory won't be really freed. The issue here is not so much use after free as it is aliasing of mutable and shared references. This is illegal in Rust, and programs that allow this may exhibit undefined behavior. > Also as I mentioned, inside driver's ->timeout(), the request is > guaranteed to be live by block layer core(won't be re-allocated to other IO), > the passed-in request is referenced already, please see bt_iter() which > is called from blk_mq_timeout_work(). Here, block layer core just > borrows request, then passed the reference to ->timeout(), when > request is owned by driver actually. I understand. I am not referring to `blk_mq_opw.timeout`. The null block driver has a feature where requests complete after a delay. This is implemented via the `hrtimer` subsystem. > I understand Rust block driver still need to implement ->queue_rq(), > ->timeout(), ..., just like C driver, but maybe I am wrong? Or Rust block driver > will re-implement part of block layer core code? such as, get one extra > reference of request no matter block core has done that. The Rust block driver API implements the `blk_mq_ops` vtable on behalf of the driver. There is a little bit of glue code inserted between the C symbols called by the kernel and the Rust functions that the block device driver provides. >> In C we have to write the driver code so this >> cannot happen. In Rust, the API must prevent this from happening. So any >> driver written in the safe subset of Rust using this API can never >> trigger this behavior. >> >> By using the refcount, we ensure that the request is alive until all >> users who hold a reference to it are dropped. > > block layer has provided such guarantee if Rust driver follows current > block driver model. I understand that the driver has exclusive ownership of the request between `queue_rq()` and `complete()`. What we want to guarantee is that the author of a block device driver is not able to access a request after block layer has regained ownership of the request (after complete). The way we achieve this is by preventing block layer from calling `__blk_mq_free_request` if the Rust driver did not destroy all references to the request. I understand that this is not necessary for correct operation of a block device driver. However, it is necessary to uphold invariants of the Rust language for references. >> >> Another concrete example is when a driver calls `blk_mq_tag_to_rq` with >> an invalid tag. This can return a reference to an invalid tag, if the >> driver is not implemented correctly. By using `req_ref_inc_not_zero` we >> can assert that the request is live before we create a Rust reference to >> it, and even if the driver code has bugs, it can never access an invalid >> request, and thus it can be memory safe. >> >> We move the responsibility of correctness, in relation to memory safety, >> from the driver implementation to the API implementation. > > After queue_rq(req) is called, request ownership is actually transferred to > driver like Rust's move, then driver is free to call blk_mq_tag_to_rq(), and > finally return request to block core after the request is completed by driver. As I said, I understand and appreciate this design. But there is a possibility for a buggy driver to not obey. The Rust abstractions must prevent this buggy code from compiling at all. In Rust, it must be impossible to call `blk_mq_tag_to_rq()` for a tag that is not owned by the driver. `blk_mq_tag_to_rq` takes an integer, so the driver developer can call this function with an invalid tag like `blk_mq_tag_to_rq(tagset, 5)`. If 5 is not owned by the driver, this will return a request pointer that will alias with mutable access. When this pointer is converted to a Rust reference, this will give undefined behavior. For Rust, we must prevent code like this from compiling in the first place. > The biggest question should be how Rust block driver will be designed & > implemented? Will rust block driver follow current C driver's model, such > as implementing ->queue_rq(), ->timeout(), ->complete()...? The Rust driver API does follow the design of the C driver model. It does implement the `blk_mq_ops` vtable. Rust block drivers must implement a set of Rust functions that very closely follow this vtable. But when it is not possible to statically determine the lifetime of an object, to the point where the compiler will refuse to compile the code that violates the lifetime durations of the object, we must revert to reference counting. I understand that a correctly implemented block device driver will not violate lifetime of the request structure. Incorrect driver code might violate the lifetime though. With Rust, we must prevent incorrectly implemented drivers from compiling at all. We need the refcount for that. By only allowing access to the request through the refcounted pointer in certain cases, we prevent the safe driver code from misbehaving. It might seem cumbersome to do this dance at first. After all, we can just write the device drivers correctly without memory safety bugs. But the reward for all this work is that if a device driver is implemented in the safe subset of Rust, it _cannot_ exhibit memory safety related bugs. We don't have to review a device driver implemented in safe Rust for memory safety issues at all. Best regards, Andreas