On 7/13/20 4:25 PM, Alexei Starovoitov wrote:
On Mon, Jul 13, 2020 at 09:17:42AM -0700, Yonghong Song wrote:
Two new readonly buffer PTR_TO_RDONLY_BUF or
PTR_TO_RDONLY_BUF_OR_NULL register states
are introduced. These new register states will be used
by later bpf map element iterator.
New register states share some similarity to
PTR_TO_TP_BUFFER as it will calculate accessed buffer
size during verification time. The accessed buffer
size will be later compared to other metrics during
later attach/link_create time.
Two differences between PTR_TO_TP_BUFFER and
PTR_TO_RDONLY_BUF[_OR_NULL].
PTR_TO_TP_BUFFER is for write only
and PTR_TO_RDONLY_BUF[_OR_NULL] is for read only.
In addition, a rdonly_buf_seq_id is also added to the
register state since it is possible for the same program
there could be two PTR_TO_RDONLY_BUF[_OR_NULL] ctx arguments.
For example, for bpf later map element iterator,
both key and value may be PTR_TO_TP_BUFFER_OR_NULL.
Similar to reg_state PTR_TO_BTF_ID_OR_NULL in bpf
iterator programs, PTR_TO_RDONLY_BUF_OR_NULL reg_type and
its rdonly_buf_seq_id can be set at
prog->aux->bpf_ctx_arg_aux, and bpf verifier will
retrieve the values during btf_ctx_access().
Later bpf map element iterator implementation
will show how such information will be assigned
during target registeration time.
...
struct bpf_ctx_arg_aux {
u32 offset;
enum bpf_reg_type reg_type;
+ u32 rdonly_buf_seq_id;
};
+#define BPF_MAX_RDONLY_BUF 2
+
struct bpf_prog_aux {
atomic64_t refcnt;
u32 used_map_cnt;
@@ -693,6 +699,7 @@ struct bpf_prog_aux {
u32 attach_btf_id; /* in-kernel BTF type id to attach to */
u32 ctx_arg_info_size;
const struct bpf_ctx_arg_aux *ctx_arg_info;
+ u32 max_rdonly_access[BPF_MAX_RDONLY_BUF];
I think PTR_TO_RDONLY_BUF approach is too limiting.
I think the map value should probably be writable from the beginning,
but I don't see how this RDONLY_BUF support can be naturally extended.
Agreed. Let me try to make map value read/write-able.
One thing we discussed earlier is whether and how we could make
map element deletable during iterator traversal. I will explore
this as well.
Also key and value can be large, so just load/store is going to be
limiting pretty quickly. People would want to use helpers to access
key/value areas. I think any existing helper that accepts ARG_PTR_TO_MEM
should be usable with data from this key/value.
This is a useful suggestion. I actually indeed hacked trying to
allow
bpf_seq_write(seq, buf, buf_size) accepts rdonly_buf register state
so bpf iterator can also copy key/value to user space through seq_file.
The bpf_seq_write 2nd arg is ARG_PTR_TO_MEM. This actually works.
I originally planned to have this as a followup. Since you mentioned
this, I will incorporate it in the next revision.
PTR_TO_TP_BUFFER was a quick hack for tiny scratch area.
Here I think the verifier should be smart from the start. >
The next patch populates bpf_ctx_arg_aux with hardcoded 0 and 1.
imo that's too hacky. Helper definitions shouldn't be in business
of poking into such verifier internals.
The reason I am using 0/1 so later on I can easily correlate
which rdonly_buf access size corresponds to key or value. I guess
I can have a verifier callback to given an ctx argument index to
get the access size.