> On Jul 27, 2019, at 12:09 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: > > On Sat, Jul 27, 2019 at 11:59 AM Song Liu <songliubraving@xxxxxx> wrote: >> >> >> >>> On Jul 26, 2019, at 11:11 PM, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote: >>> >>> On Thu, Jul 25, 2019 at 12:32 PM Song Liu <songliubraving@xxxxxx> wrote: >>>> >>>> >>>> >>>>> On Jul 24, 2019, at 12:27 PM, Andrii Nakryiko <andriin@xxxxxx> wrote: >>>>> >>>>> This patch implements the core logic for BPF CO-RE offsets relocations. >>>>> All the details are described in code comments. >>>> >>>> Some description in the change log is still useful. Please at least >>>> copy-paste key comments here. >>> >>> OK, will add some more. >>> >>>> >>>> And, this is looooong. I think it is totally possible to split it into >>>> multiple smaller patches. >>> >>> I don't really know how to split it further without hurting reviewing >>> by artificially splitting related code into separate patches. Remove >>> any single function and algorithm will be incomplete. >>> >>> Let me give you some high-level overview of how pieces are put >>> together. There are 9 non-trivial functions, let's go over their >>> purpose in the orderd in which they are defined in file: >>> >>> 1. bpf_core_spec_parse() >>> >>> This one take bpf_offset_reloc's type_id and accessor string >>> ("0:1:2:3") and parses it into more convenient bpf_core_spec >>> datastructure, which has calculated offset and high-level spec >>> "steps": either named field or array access. >>> >>> 2. bpf_core_find_cands() >>> >>> Given local type name, finds all possible target BTF types with same >>> name (modulo "flavor" differences, ___flavor suffix is just ignored). >>> >>> 3. bpf_core_fields_are_compat() >>> >>> Given local and target field match, checks that their types are >>> compatible (so that we don't accidentally match, e.g., int against >>> struct). >>> >>> 4. bpf_core_match_member() >>> >>> Given named local field, find corresponding field in target struct. To >>> understand why it's not trivial, here's an example: >>> >>> Local type: >>> >>> struct s___local { >>> int a; >>> }; >>> >>> Target type: >>> >>> struct s___target { >>> struct { >>> union { >>> int a; >>> }; >>> }; >>> }; >>> >>> For both cases you can access a as s.a, but in local case, field a is >>> immediately inside s___local, while for s___target, you have to >>> traverse two levels deeper into anonymous fields to get to an `a` >>> inside anonymous union. >>> >>> So this function find that `a` by doing exhaustive search across all >>> named field and anonymous struct/unions. But otherwise it's pretty >>> straightforward recursive function. >>> >>> bpf_core_spec_match() >>> >>> Just goes over high-level spec steps in local spec and tries to figure >>> out both high-level and low-level steps for targe type. Consider the >>> above example. For both structs accessing s.a is one high-level step, >>> but for s___local it's single low-level step (just another :0 in spec >>> string), while for s___target it's three low-level steps: ":0:0:0", >>> one step for each BTF type we need to traverse. >>> >>> Array access is simpler, it's always one high-level and one low-level step. >>> >>> bpf_core_reloc_insn() >>> >>> Once we match local and target specs and have local and target >>> offsets, do the relocations - check that instruction has expected >>> local offset and replace it with target offset. >>> >>> bpf_core_find_kernel_btf() >>> >>> This is the only function that can be moved into separate patch, but >>> it's also very simple. It just iterates over few known possible >>> locations for vmlinux image and once found, tries to parse .BTF out of >>> it, to be used as target BTF. >>> >>> bpf_core_reloc_offset() >>> >>> It combines all the above functions to perform single relocation. >>> Parse spec, get candidates, for each candidate try to find matching >>> target spec. All candidates that matched are cached for given local >>> root type. >> >> Thanks for these explanation. They are really helpful. >> >> I think some example explaining each step of bpf_core_reloc_offset() >> will be very helpful. Something like: >> >> Example: >> >> struct s { >> int a; >> struct { >> int b; >> bool c; >> }; >> }; >> >> To get offset for c, we do: >> >> bpf_core_reloc_offset() { >> >> /* input data: xxx */ >> >> /* first step: bpf_core_spec_parse() */ >> >> /* data after first step */ >> >> /* second step: bpf_core_find_cands() */ >> >> /* candidate A and B after second step */ >> >> ... >> } >> >> Well, it requires quite some work to document this way. Please let me >> know if you feel this is an overkill. > > Yeah :) And it's not just work, but I think it's bad if comments > become too specific and document very low-level steps, because code > might evolve and comments can quickly get out of sync and just add to > confusion. Which is why I tried to document high-level ideas, leaving > it up to the source code to be the ultimate reference of minutia > details. Fair enough. > >> >>> >>> bpf_core_reloc_offsets() >>> >>> High-level coordination. Iterate over all per-program .BTF.ext offset >>> reloc sections, each relocation within them. Find corresponding >>> program and try to apply relocations one by one. >>> >>> >>> I think the only non-obvious part here is to understand that >>> relocation records local raw spec with every single anonymous type >>> traversal, which is not that useful when we try to match it against >>> target type, which can have very different composition, but still the >>> same field access pattern, from C language standpoint (which hides all >>> those anonymous type traversals from programmer). >>> >>> But it should be pretty clear now, plus also check tests, they have >>> lots of cases showing what's compatible and what's not. >> >> I see. I will review the tests. >> >>>>> >>>>> static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, >>>>> - __u32 id) >>>>> + __u32 id, >>>>> + __u32 *res_id) >>>>> { >>>>> const struct btf_type *t = btf__type_by_id(btf, id); >>>> >>>> Maybe have a local "__u32 rid;" >>>> >>>>> >>>>> + if (res_id) >>>>> + *res_id = id; >>>>> + >>>> >>>> and do "rid = id;" here >>>> >>>>> while (true) { >>>>> switch (BTF_INFO_KIND(t->info)) { >>>>> case BTF_KIND_VOLATILE: >>>>> case BTF_KIND_CONST: >>>>> case BTF_KIND_RESTRICT: >>>>> case BTF_KIND_TYPEDEF: >>>>> + if (res_id) >>>>> + *res_id = t->type; >>>> and here >>>> >>>>> t = btf__type_by_id(btf, t->type); >>>>> break; >>>>> default: >>>> and "*res_id = rid;" right before return? >>> >>> Sure, but why? >> >> I think it is cleaner that way. But feel free to ignore if you >> think otherwise. >> >>> >>>> >>>>> @@ -1041,7 +1049,7 @@ static const struct btf_type *skip_mods_and_typedefs(const struct btf *btf, >>>>> static bool get_map_field_int(const char *map_name, const struct btf *btf, >>>>> const struct btf_type *def, >>>>> const struct btf_member *m, __u32 *res) { >>> >>> [...] >>> >>>>> +struct bpf_core_spec { >>>>> + const struct btf *btf; >>>>> + /* high-level spec: named fields and array indicies only */ >>>> >>>> typo: indices >>> >>> thanks! >>> >>>> >>>>> + struct bpf_core_accessor spec[BPF_CORE_SPEC_MAX_LEN]; >>>>> + /* high-level spec length */ >>>>> + int len; >>>>> + /* raw, low-level spec: 1-to-1 with accessor spec string */ >>>>> + int raw_spec[BPF_CORE_SPEC_MAX_LEN]; >>>>> + /* raw spec length */ >>>>> + int raw_len; >>>>> + /* field byte offset represented by spec */ >>>>> + __u32 offset; >>>>> +}; >>> >>> [...] >>> >>>>> + * >>>>> + * int x = &s->a[3]; // access string = '0:1:2:3' >>>>> + * >>>>> + * Low-level spec has 1:1 mapping with each element of access string (it's >>>>> + * just a parsed access string representation): [0, 1, 2, 3]. >>>>> + * >>>>> + * High-level spec will capture only 3 points: >>>>> + * - intial zero-index access by pointer (&s->... is the same as &s[0]...); >>>>> + * - field 'a' access (corresponds to '2' in low-level spec); >>>>> + * - array element #3 access (corresponds to '3' in low-level spec). >>>>> + * >>>>> + */ >>>> >>>> IIUC, high-level points are subset of low-level points. How about we introduce >>>> "anonymous" high-level points, so that high-level points and low-level points >>>> are 1:1 mapping? >>> >>> No, that will just hurt and complicate things. See above explanation >>> about why we need high-level points (it's what you as C programmer try >>> to achieve vs low-level spec is what C-language does in reality, with >>> all the anonymous struct/union traversal). >>> >>> What's wrong with this separation? Think about it as recording >>> "intent" (high-level spec) vs "mechanics" (low-level spec, how exactly >>> to achieve that intent, in excruciating details). >> >> There is nothing wrong with separation. I just personally think it is >> cleaner the other way. That's why I raised the question. >> >> I will go with your assessment, as you looked into this much more than >> I did. :-) > > For me it's a machine view of the problem (raw spec) vs human view of > the problem (high-level spec, which resembles how you think about this > in C code). I'll keep it separate unless it proves to be problematic > going forward. [...] >>> >>> No. spec->offset is carefully maintained across multiple low-level >>> steps, as we traverse down embedded structs/unions. >>> >>> Think about, e.g.: >>> >>> struct s { >>> int a; >>> struct { >>> int b; >>> }; >>> }; >>> >>> Imagine you are trying to match s.b access. With what you propose >>> you'll end up with offset 0, but it should be 4. >> >> Hmm... this is just for i == 0, right? Which line updated spec->offset >> after "memset(spec, 0, sizeof(*spec));"? > > Ah, I missed that you are referring to the special i == 0 case. I can > do assignment, yes, you are right. I'll probably also extract it out > of the loop to make it less confusing. Yes, please. > >>>>> + continue; >>>>> + } >>>> >>>> Maybe pull i == 0 case out of the for loop? As I mentioned earlier. ;-) >>>> >>>>> + >>>>> + if (btf_is_composite(t)) { >>> >>> [...] >>> >>>>> + >>>>> + if (spec->len == 0) >>>>> + return -EINVAL; >>>> >>>> Can this ever happen? >>> >>> Not really, because I already check raw_len == 0 and exit with error. >>> I'll remove. >>> >>>> >>>>> + >>>>> + return 0; >>>>> +} >>>>> + >>> [...]