On 18.07.24 14:20, Alice Ryhl wrote: > On Thu, Jul 18, 2024 at 12:12 AM Benno Lossin <benno.lossin@xxxxxxxxx> wrote: >> +Soundness >> +========= >> + >> +``unsafe`` operations (e.g. ``unsafe`` functions, dereferencing raw pointers etc.) have certain >> +conditions that need to be fulfilled in order for the operation to not be UB. >> +To evaluate if the ``unsafe`` code usage is correct, one needs to consider the API that wraps said >> +``unsafe`` code. If under all possible safe uses of the API, the conditions for the ``unsafe`` >> +operation are fulfilled, the API is *sound*. Otherwise it is *unsound*. Here is a simple example:: >> + >> + pub struct Data { >> + a: usize, >> + } >> + >> + pub fn access_a(data: *mut Data) -> usize { >> + unsafe { (*data).a } >> + } >> + >> + fn main() { >> + let mut d = Data { a: 42 }; >> + println!("{}", access_a(&mut d)); >> + } >> + >> +While this example has no UB, the function ``access_a`` is unsound. This is because one could just >> +write the following safe usage:: >> + >> + println!("{}", access_a(core::ptr::null_mut())); >> + >> +And this would result in a dereference of a null pointer. >> + >> +In its essence, a sound API means that if someone only writes safe code, they can never encounter UB >> +even if they call safe code that calls ``unsafe`` code behind the scenes. > > I think this section on soundness could be more clear. You never > really give a definition for what soundness is; you just jump into an > example immediately. I would argue that the sentence "If under all possible safe uses of the API, the conditions for the ``unsafe`` operation are fulfilled, the API is *sound*. Otherwise it is *unsound*." is a definition. Is this lacking anything, or do you have an idea to improve the wording? I am not really understanding what kind of additional definition you would like to see. > It may be nice to talk about how a sound API must prevent memory > safety issues, even if the safe code required to do so is silly or > unrealistic. Doing so is necessary to maintain the "safe code never > has memory safety bugs" guarantee. That is a good way to phrase things, I will add that. >> +Because unsoundness issues have the potential for allowing safe code to experience UB, they are >> +treated similarly to actual bugs with UB. Their fixes should also be included in the stable tree. >> + >> +Safety Documentation >> +==================== >> + >> +After trying to minimize and remove as much ``unsafe`` code as possible, there still is some left. >> +This is because some things are just not possible in only safe code. This last part of ``unsafe`` >> +code must still be correct. Helping with that is the safety documentation: it meticulously documents >> +the various requirements and justifications for every line of ``unsafe`` code. That way it can be >> +ensured that all ``unsafe`` code is sound without anyone needing to know the whole kernel at once. >> +The gist of the idea is this: every ``unsafe`` operation documents its requirements and every >> +location that uses an ``unsafe`` operation documents for every requirement a justification why they >> +are fulfilled. If now all requirements and justifications are correct, then there can only be sound >> +``unsafe`` code. >> + >> +The ``unsafe`` keywords has two different meanings depending on the context it is used in: >> + >> +* granting access to an unchecked operation, >> +* declaring that something is an unchecked operation. >> + >> +In both cases we have to add safety documentation. In the first case, we have to justify why we can >> +always guarantee that the requirements of the unchecked operation are fulfilled. In the second case, >> +we have to list the requirements that have to be fulfilled for the operation to be sound. >> + >> +In the following sections we will go over each location where ``unsafe`` can be used. >> + >> +.. _unsafe-Functions: >> + >> +``unsafe`` Functions >> +-------------------- >> + >> +``unsafe`` on function declarations is used to state that this function has special requirements >> +that callers have to ensure when calling the function:: >> + >> + unsafe fn foo() { >> + // ... >> + } >> + >> +These requirements are called the safety requirements of the function. These requirements can take >> +any shape and range from simple requirements like "``ptr_arg`` is valid" (``ptr_arg`` refers to some >> +argument with the type matching ``*mut T`` or ``*const T``) to more complex requirements like >> +"``ptr`` must be valid, point to a ``NUL``-terminated C string, and it must be valid for at least >> +``'a``. While the returned value is alive, the memory at ``ptr`` must not be mutated.". >> + >> +The safety requirements have to be documented in the so called safety section:: >> + >> + /// <oneline description of the function> >> + /// >> + /// <full description of the function> >> + /// >> + /// # Safety >> + /// >> + /// <safety requirements> >> + unsafe fn foo() { >> + // ... >> + } > > I would love to see a proper example here. Depending on how the discussion goes with this series, I plan to go through the tree and try to improve the safety comments. That way I will have a good source for examples, since otherwise they are extremely hard to come up with. (I am also open to any suggestions :) >> +.. _unsafe-Blocks: >> + >> +``unsafe`` Blocks >> +----------------- >> + >> +``unsafe`` code blocks are used to call ``unsafe`` functions and perform built-in ``unsafe`` >> +operations such as dereferencing a raw pointer:: >> + >> + unsafe { foo() }; >> + >> +In order to ensure that all safety requirements of ``unsafe`` operations are upheld, a safety >> +comment is mandatory for all ``unsafe`` blocks. This safety comment needs to provide a correct >> +justification for every safety requirements of every operation within the block:: >> + >> + // SAFETY: <justifications> >> + unsafe { foo() }; > > I think it is worth explicitly pointing out that the safety comment > must explain why the preconditions are satisfied, *not* what the > preconditions are. It's a really really common mistake to mix up > these, and it probably even makes sense to include two examples > showing the difference. Good idea, but I will put that into justifications.rst. >> +For transparency it is best practice to have only a single ``unsafe`` operation per ``unsafe`` >> +block, since then it is more clear what the justifications are trying to justify. Safe operations >> +should not be included in the block, since it adds confusion as to which operation is the ``unsafe`` >> +one. In certain cases however it makes it easier to understand if there is only a single ``unsafe`` >> +block. For example:: >> + >> + // SAFETY: `ptr` is valid for writes. >> + unsafe { >> + (*ptr).field1 = 42; >> + (*ptr).field2 = 24; >> + (*ptr).field3 = 2442; >> + } >> + >> +In this case it is more readable to not split the block into multiple parts. > > It would be nice with an example of a completely normal safety comment > example. Also would be nice to call out that the semicolon goes > outside the unsafe block to improve formatting, when it's a single > operation. Will add. > It would also be nice with an example that shows how to reference the > safety requirements of the current function. (That is, an example that > combines unsafe fn with unsafe block.) See above. --- Cheers, Benno