On Sun, Apr 28, 2024 at 6:47 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > On Sat, Apr 27, 2024 at 12:51 AM Andrii Nakryiko > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > On Thu, Apr 25, 2024 at 10:05 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > > > On Fri, Apr 26, 2024 at 2:15 AM Andrii Nakryiko > > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > > > On Wed, Apr 24, 2024 at 10:37 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > > > > > > > On Thu, Apr 25, 2024 at 8:34 AM Andrii Nakryiko > > > > > <andrii.nakryiko@xxxxxxxxx> wrote: > > > > > > > > > > > > On Thu, Apr 11, 2024 at 6:51 AM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > > > > > > > > > > > On Thu, Apr 11, 2024 at 9:11 PM Yafang Shao <laoar.shao@xxxxxxxxx> wrote: > > > > > > > > > > > > > > > > Three new kfuncs, namely bpf_iter_bits_{new,next,destroy}, have been > > > > > > > > added for the new bpf_iter_bits functionality. These kfuncs enable the > > > > > > > > iteration of the bits from a given address and a given number of bits. > > > > > > > > > > > > > > > > - bpf_iter_bits_new > > > > > > > > Initialize a new bits iterator for a given memory area. Due to the > > > > > > > > limitation of bpf memalloc, the max number of bits to be iterated > > > > > > > > over is (4096 * 8). > > > > > > > > - bpf_iter_bits_next > > > > > > > > Get the next bit in a bpf_iter_bits > > > > > > > > - bpf_iter_bits_destroy > > > > > > > > Destroy a bpf_iter_bits > > > > > > > > > > > > > > > > The bits iterator can be used in any context and on any address. > > > > > > > > > > > > > > > > Changes: > > > > > > > > - v5->v6: > > > > > > > > - Add positive tests (Andrii) > > > > > > > > - v4->v5: > > > > > > > > - Simplify test cases (Andrii) > > > > > > > > - v3->v4: > > > > > > > > - Fix endianness error on s390x (Andrii) > > > > > > > > - zero-initialize kit->bits_copy and zero out nr_bits (Andrii) > > > > > > > > - v2->v3: > > > > > > > > - Optimization for u64/u32 mask (Andrii) > > > > > > > > - v1->v2: > > > > > > > > - Simplify the CPU number verification code to avoid the failure on s390x > > > > > > > > (Eduard) > > > > > > > > - bpf: Add bpf_iter_cpumask > > > > > > > > https://lwn.net/Articles/961104/ > > > > > > > > - bpf: Add new bpf helper bpf_for_each_cpu > > > > > > > > https://lwn.net/Articles/939939/ > > > > > > > > > > > > > > > > Yafang Shao (2): > > > > > > > > bpf: Add bits iterator > > > > > > > > selftests/bpf: Add selftest for bits iter > > > > > > > > > > > > > > > > kernel/bpf/helpers.c | 120 +++++++++++++++++ > > > > > > > > .../selftests/bpf/prog_tests/verifier.c | 2 + > > > > > > > > .../selftests/bpf/progs/verifier_bits_iter.c | 127 ++++++++++++++++++ > > > > > > > > 3 files changed, 249 insertions(+) > > > > > > > > create mode 100644 tools/testing/selftests/bpf/progs/verifier_bits_iter.c > > > > > > > > > > > > > > > > -- > > > > > > > > 2.39.1 > > > > > > > > > > > > > > > > > > > > > > It appears that the test case failed on s390x when the data is > > > > > > > a u32 value because we need to set the higher 32 bits. > > > > > > > will analyze it. > > > > > > > > > > > > > > > > > > > Hey Yafang, did you get a chance to debug and fix the issue? > > > > > > > > > > > > > > > > Hi Andrii, > > > > > > > > > > Apologies for the delay; I recently returned from an extended holiday. > > > > > > > > > > The issue stems from the limitations of bpf_probe_read_kernel() on > > > > > s390 architecture. The attachment provides a straightforward example > > > > > to illustrate this issue. The observed results are as follows: > > > > > > > > > > Error: #463/1 verifier_probe_read/probe read 4 bytes > > > > > 8897 run_subtest:PASS:obj_open_mem 0 nsec > > > > > 8898 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > 8899 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > 8900 run_subtest:FAIL:659 Unexpected retval: 2817064 != 512 > > > > > > > > > > Error: #463/2 verifier_probe_read/probe read 8 bytes > > > > > 8903 run_subtest:PASS:obj_open_mem 0 nsec > > > > > 8904 run_subtest:PASS:unexpected_load_failure 0 nsec > > > > > 8905 do_prog_test_run:PASS:bpf_prog_test_run 0 nsec > > > > > 8906 run_subtest:FAIL:659 Unexpected retval: 0 != 512 > > > > > > > > > > More details can be found at: https://github.com/kernel-patches/bpf/pull/6872 > > > > > > > > > > Should we consider this behavior of bpf_probe_read_kernel() as > > > > > expected, or is it something that requires fixing? > > > > > > > > > > > > > I might be missing something, but there is nothing wrong with > > > > bpf_probe_read_kernel() behavior. In "read 4" case you are overwriting > > > > only upper 4 bytes of u64, so lower 4 bytes are garbage. In "read 8" > > > > you are reading (upper) 4 bytes of garbage from uninitialized > > > > data_dst. > > > > > > The issue doesn't lie with the dst but rather with the src. Even after > > > initializing the destination, the operation still fails. You can find > > > > Are you sure the operation "fails"? If it would fail, you'd get a > > negative error code, but you are getting zero. Which actually makes > > sense. > > > > I think you are just getting confused by big endianness of s390x, and > > there is nothing wrong with bpf_probe_read_kernel(). > > > > In both of your tests (I pasted your code below, it would be better if > > you did it in your initial emails) you end up with 0x200 in *upper* 32 > > bits (on big endian) and lower bits are zeros. And __retval thing is > > 32-bit (despite BPF program returning long), so this return value is > > truncated to *lower* 32-bits, which are, expectedly, zeroes. > > Thank you for clarifying. The presence of the 32-bit __retval led to > my misunderstanding :( > > > > > So I think everything works as expected, but your tests (at least) > > don't handle the big-endian arch well. > > The issue arises when the dst and src have different sizes, causing > bpf_probe_read_kernel_common() to handle them poorly on big-endian > machines. To address this, we need to calculate the offset for > copying, as demonstrated by the following > > bpf_probe_read_kernel_common(&kit->bits_copy + offset, size, > unsafe_ptr__ign); > > One might wonder why this calculation is not incorporated directly > into the implementation of bpf_probe_read_kernel_common() ? Let's stop talking about bpf_probe_read_kernel/bpf_probe_read_kernel_common doing anything wrong or not handling anything wrong. It's not, it's correct. Your code is *using* it wrong, and that's what we'll have to fix. The contract of bpf_probe_read_kernel is in terms of memory addresses of source/destination *bytes* and the *common* size of both source and destination. bpf_probe_read_kernel() cannot know that you are passing a pointer to int or long or whatever, it's just a void * pointer. So if you are writing bytes over long, you need to take care of adjusting offsets to accommodate big-endian. It's a distraction to talk about bpf_probe_read_kernel() here (but it's useful to understand its contract). > > > > > __description("probe read 4 bytes") > > __success __retval(0x200) > > long probe_read_4(void) > > { > > int data = 0x200; > > long data_dst = 0; > > int err; > > > > err = bpf_probe_read_kernel(&data_dst, 4, &data); > > if (err) > > return err; > > > > return data_dst; > > } > > > > SEC("syscall") > > __description("probe read 8 bytes") > > __success __retval(0x200) > > long probe_read_8(void) > > { > > int data = 0x200; > > long data_dst = 0; > > int err; > > > > err = bpf_probe_read_kernel(&data_dst, 8, &data); > > if (err) > > return err; > > > > return data_dst; > > > > } > > > > > more details in the following link: > > > https://github.com/kernel-patches/bpf/pull/6882. It appears that > > > bpf_probe_read_kernel() encounters difficulties when dealing with > > > non-long-aligned source addresses. > > > > > > > > > > > So getting back to iter implementation. Make sure you are > > > > zero-initializing that u64 value you are reading into? > > > > > > > > > > It has been zero-initialized: > > > > > > + kit->nr_bits = 0; > > > + kit->bits_copy = 0; > > > > > > > ok, then the problem is somewhere else, but it doesn't seem to be in > > bpf_probe_read_kernel(). I'm forgetting what was the original test > > failure for your patch set, but please double check again, taking into > > account the big endianness of s390x. > > > > If we aim to make it compatible with s390, we need to introduce some > constraints regarding the bits iteration. > > 1. We must replace nr_bits with size: > > bpf_iter_bits_new(struct bpf_iter_bits *it, const void > *unsafe_ptr__ign, u32 size) > > 2. The size must adhere to alignment requirements: > > if (size <= sizeof(u64)) { > int offset = IS_ENABLED(CONFIG_S390) ? sizeof(u64) - size : 0; > > switch (size) { > case 1: > case 2: > case 4: > case 8: > break; > default: > return -EINVAL; > } > > err = bpf_probe_read_kernel_common(((char > *)&kit->bits_copy) + offset, size, unsafe_ptr__ign); > if (err) > return -EFAULT; > > kit->size = size; > kit->bit = -1; > return 0; > } > > /* Not long-aligned */ > if (size & (sizeof(unsigned long) - 1)) > return -EINVAL; > > .... > > Does this meet your expectations? > I don't think you need to add any restrictions or change anything to be byte-sized. You just need to calculate byte offset and do a bit shift to place N bits from the mask into upper N bits of long on big-endian. You might need to do some masking even for little endian as well. Which is why I suggested investing in simple but thorough tests. Write down a few bit mask patterns of variable length (not just multiple-of-8 sizes) and think about the sequence of bits that should be returned. And then codify that. Then check that both little- and big-endian arches work correctly. This is all a bit tricky because kernel's find_next_bit() makes the assumption that bit masks are long-based, while this bits iterator protocol is based on bit sizes, which are not necessarily multiples of 8 bits. So after you copy memory as bytes, you might need to clear (and shift, for big endian) some bits. Use simple but non-symmetrical bit masks to test everything. > -- > Regards > > > > Yafang