Alan Maguire <alan.maguire@xxxxxxxxxx> writes: > On 12/02/2025 00:49, Stephen Brennan wrote: >> [...] >> to feedback or suggestions regarding this. I implemented it in pahole >> because I'm most familiar with that code, and because it seems to me like >> it's reasonable for libbpf to expect that the input variable information is >> already deduplicated. >> > > I've been thinking about whether core BTF deduplication should handle > this case - I'll try and lay out the process and maybe we can think > about whether it's better to solve in core dedup or within pahole. > > The deduplication of VARs should be straightforward - they are > considered reference types and since in cases like this they share a > name and refer to the same type, the reference type deduplication could > be extended to cover them. I wonder if there's a hidden catch here... DECL_TAGs can refer to VARs, but as far as I can tell, the semantics are that the DECL_TAG actually *modifies* the VAR - it says "this variable declaration had this annotation / tag / whatever". This is fundamentally because the var_secinfo doesn't point at a chain of DECL_TAGs, it points directly at the VAR. (Compare that to, e.g. const/volatile/restrict qualifiers) So, in a hypothetical situation where there are two variables of the same name & type, but one of them has a DECL_TAG referring to it, would the deduplication keep these separate? It seems to me that would be the correct behavior. > However, the DATASEC references to such > variables are a bit trickier. > > As seen above, the kernel currently disallows DATASEC btf_var_secinfo > references that are overlapping, i.e. if I have > > [123520] DATASEC '.data..percpu' size=229632 vlen=392 > type_id=7708 offset=0 size=48 (VAR 'fixed_percpu_data') > type_id=5559 offset=4096 size=4096 (VAR 'cpu_debug_store') > type_id=7060 offset=8192 size=16384 (VAR 'irq_stack_backing_store') > > > ..the kernel enforces that irq_stack_backing_store starts at >= the > offset of the previous var (cpu_debug_store) + its size (4096+4096 in > this case). This is why the kernel rejects BTF with multiple instances > of the same variable, since they overlap. > > So if we consider the case of deduplicating variables; before dedup we > would have something like this in the DATASEC > > type_id=8188 offset=256 size=48 (VAR 'foo') > type_id=9190 offset=256 size=48 (VAR 'foo') > > > ...and after dedup + remapping it would be: > > type_id=8188 offset=256 size=48 (VAR 'foo') > type_id=8188 offset=256 size=48 (VAR 'foo') > > This still violates the overlap check. So there are a few options here: > > - change the kernel to relax overlap check when multiple references have > same type id/offset/size; i.e. they refer to the same variable > - have pahole weed out such occurrences > - something else? > > Ideally we don't want to have to resize DATASECs with such duplicate > entries as that would add more complexity to dedup. Do you think you could share a bit more about the added complexity & cost? I admittedly don't have any knowledge about the deduplicator, but naively it seems like the ideal solution would be to delete the duplicated var_secinfo and resize the DATASEC. > Anyway I'd be interested to hear what others think about whether solving > this in BTF dedup itself or pahole makes most sense. Thanks! Thanks for taking a look into this! Hoping to hear from other experienced voices on this question as well. Thanks, Stephen