On Tue, Feb 11, 2025 at 2:34 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote: > > On 2/8/25 2:32 AM, Jason Xing wrote: > > The "is_locked_tcp_sock" flag is added to indicate that the callback > > site has a tcp_sock locked. > > It should mention that the later TX timestamping callbacks will not own the > lock. This is what this patch is primarily for. We know the background, but > future code readers may not. We will eventually become the readers of this patch > in a few years' time. > > > > > Apply the new member is_locked_tcp_sock in the existing callbacks > > It is hard to read "Apply the new member....". "Apply" could mean a few things. > "Set to 1" is clearer. > > > > where is_fullsock is set to 1 can stop UDP socket accessing struct > > The UDP part is future proof. This set does not support UDP which has to be > clear in the commit message. This has been brought up before also. > > > tcp_sock and stop TCP socket without sk lock protecting does the > > similar thing, or else it could be catastrophe leading to panic. > > > > To keep it simple, instead of distinguishing between read and write > > access, users aren't allowed all read/write access to the tcp_sock > > through the older bpf_sock_ops ctx. The new timestamping callbacks > > can use newer helpers to read everything from a sk (e.g. bpf_core_cast), > > so nothing is lost. > > (Subject): > bpf: Prevent unsafe access to the sock fields in the BPF timestamping callback > > (Why): > The subsequent patch will implement BPF TX timestamping. It will call the > sockops BPF program without holding the sock lock. > > This breaks the current assumption that all sock ops programs will hold the sock > lock. The sock's fields of the uapi's bpf_sock_ops requires this assumption. > > (What and How): > To address this, > a new "u8 is_locked_tcp_sock;" field is added. This patch sets it in the current > sock_ops callbacks. The "is_fullsock" test is then replaced by the > "is_locked_tcp_sock" test during sock_ops_convert_ctx_access(). > > The new TX timestamping callbacks added in the subsequent patch will not have > this set. This will prevent unsafe access from the new timestamping callbacks. > > Potentially, we could allow read-only access. However, this would require > identifying which callback is read-safe-only and also requires additional BPF > instruction rewrites in the covert_ctx. Since the BPF program can always read > everything from a socket (e.g., by using bpf_core_cast), this patch keeps it > simple and disables all read and write access to any socket fields through the > bpf_sock_ops UAPI from the new TX timestamping callback. > > Moreover, note that some of the fields in bpf_sock_ops are specific to tcp_sock, > and sock_ops currently only supports tcp_sock. In the future, UDP timestamping > will be added, which will also break this assumption. The same idea used in this > patch will be reused. Considering that the current sock_ops only supports > tcp_sock, the variable is named is_locked_"tcp"_sock. Thanks so much for polishing the commit message. I appreciate it! Will adjust it. Thanks, Jason