Re: [PATCH bpf-next v1 03/19] bpf: add bpf_map iterator

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 4/29/20 8:06 PM, Alexei Starovoitov wrote:
On 4/29/20 1:15 PM, Yonghong Song wrote:

Even without lseek stop() will be called multiple times.
If I read seq_file.c correctly it will be called before
every copy_to_user(). Which means that for a lot of text
(or if read() is done with small buffer) there will be
plenty of start,show,show,stop sequences.


Right start/stop can be called multiple times, but seems like there
are clear indicators of beginning of iteration and end of iteration:
- start() with seq_num == 0 is start of iteration (can be called
multiple times, if first element overflows buffer);
- stop() with p == NULL is end of iteration (seems like can be called
multiple times as well, if user keeps read()'ing after iteration
completed).

There is another problem with stop(), though. If BPF program will
attempt to output anything during stop(), that output will be just
discarded. Not great. Especially if that output overflows and we need

The stop() output will not be discarded in the following cases:
    - regular show() objects overflow and stop() BPF program not called
    - regular show() objects not overflow, which means iteration is done,
      and stop() BPF program does not overflow.

The stop() seq_file output will be discarded if
    - regular show() objects not overflow and stop() BPF program output
      overflows.
    - no objects to iterate, BPF program got called, but its seq_file
      write/printf will be discarded.

Two options here:
   - implement Alexei suggestion to look ahead two elements to
     always having valid object and indicating the last element
     with a special flag.
   - Per Andrii's suggestion below to implement new way or to
     tweak seq_file() a little bit to resolve the above cases
     where stop() seq_file outputs being discarded.

Will try to experiment with both above options...


to re-allocate buffer.

We are trying to use seq_file just to reuse 140 lines of code in
seq_read(), which is no magic, just a simple double buffer and retry
piece of logic. We don't need lseek and traverse, we don't need all
the escaping stuff. I think bpf_iter implementation would be much
simpler if bpf_iter had better control over iteration. Then this whole
"end of iteration" behavior would be crystal clear. Should we maybe
reconsider again?

That's what I was advocating for some time now.

I think seq_file is barely usable as a /proc extension and completely
unusable for iterating.
All the discussions in the last few weeks are pointing out that
majority of use cases are in the iterating space instead of dumping.
Dumping human readable strings as unstable /proc extension is
a small subset. So I think we shouldn't use fs/seq_file.c.
The dance around double op->next() or introducing op->finish()
into seq_ops looks like fifth wheel to the car.
I think bpf_iter semantics and bpf prog logic would be much simpler
and easier to understand if op->read method was re-implemented
for the purpose of iterating the objects.
I mean seq_op->start/next/stop can be reused as-is to iterate
existing kernel objects like sockets, but seq_read() will not be
used. We should explicitly disable lseek and write on our
cat-able files and use new bpf_seq_read() as .read op.
This specialized bpf_seq_read() will still do a sequences of
start/show/show/stop for every copy_to_user, but we don't need to
add finish() to seq_op and hack existing seq_read().
We also will be able to provide precise seq_num into the program
instead of approximation.
bpf_seq_read wouldn't need to deal with ppos and traverse.
It wouldn't need fancy m->size<<=1 retries.
It can allocate fixed PAGE_SIZE and be done with it.
It's fine to restrict bpf progs to not dump more than 4k
chracters per object.
And we can call bpf_iter prog exactly once per element.
Plenty of pros and no real cons.

This may indeed be simpler and scalarable since it is specific for our use case compared to ouble next with tweaking the
existing target (ipv6_route, netlink, etc.). Will explore
this way.



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux