Re: [PATCH v2 dwarves 5/5] btf_encoder: delay function addition to check for function prototype inconsistencies

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Em Mon, Jan 30, 2023 at 09:20:37AM -0800, Alexei Starovoitov escreveu:
> On Mon, Jan 30, 2023 at 02:29:45PM +0000, Alan Maguire wrote:
> > There are multiple sources of inconsistency that can result in
> > functions of the same name having multiple prototypes:
> > 
> > - multiple static functions in different CUs share the same name
> > - static and external functions share the same name
> > 
> > Here we attempt to catch such cases by finding inconsistencies
> > across CUs using the save/compare/merge mechanisms that were
> > previously introduced to handle optimized-out parameters,
> > using it for all functions.
> > 
> > For two instances of a function to be considered consistent:
> > 
> > - number of parameters must match
> > - parameter names must match
> > 
> > The latter is a less strong method than a full type
> > comparison but suffices to match functions.
> > 
> > With these changes, we see 278 functions removed due to
> > protoype inconsistency.  For example, wakeup_show()
> > has two distinct prototypes:
> > 
> > static ssize_t wakeup_show(struct kobject *kobj,
> >                            struct kobj_attribute *attr, char *buf)
> > (from kernel/irq/irqdesc.c)
> > 
> > static ssize_t wakeup_show(struct device *dev, struct device_attribute *attr,
> >                            char *buf)
> > (from drivers/base/power/sysfs.c)
> > 
> > In some other cases, the parameter comparisons weed out additional
> > inconsistencies in "."-suffixed functions across CUs.
> > 
> > We also see a large number of functions eliminated due to
> > optimized-out parameters; 2542 functions are eliminated for this
> > reason, both "."-suffixed (1007) and otherwise (1535).
> 
> imo it's a good thing.
> 
> > Because the save/compare/merge process occurs for all functions
> > it is important to assess performance effects.  In addition,
> > prior to these changes the number of functions ultimately
> > represented in BTF was non-deterministic when pahole was
> > run with multiple threads.  This was due to the fact that
> > functions were marked as generated on a per-encoder basis
> > when first added, and as such the same function could
> > be added multiple times for different encoders, and if they
> > encountered inconsistent function prototypes, deduplication
> > could leave multiple entries in place for the same name.
> > When run in a single thread, the "generated" state associated
> > with the name would prevent this.
> > 
> > Here we assess both BTF encoding performance and determinism
> > of the function representation in baseline compared to with
> > these changes.  Determinism is assessed by counting the
> > number of functions in BTF.  Comparisons are done for 1,
> > 4 and 8 threads.
> > 
> > Baseline
> > 
> > $ time LLVM_OBJCOPY=objcopy pahole -J vmlinux
> > 
> > real	0m18.160s
> > user	0m17.179s
> > sys	0m0.757s
> > 
> > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|wc -l
> > 51150
> > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|uniq|wc -l
> > 51150
> > 
> > $ time LLVM_OBJCOPY=objcopy pahole -J -j4 vmlinux
> > 
> > real	0m8.078s
> > user	0m17.978s
> > sys	0m0.732s
> > 
> > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|wc -l
> > 51592
> > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|uniq|wc -l
> > 51150
> > 
> > $ time LLVM_OBJCOPY=objcopy pahole -J -j8 vmlinux
> > 
> > real	0m7.075s
> > user	0m19.010s
> > sys	0m0.587s
> > 
> > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|wc -l
> > 51683
> > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|uniq|wc -l
> > 51150
> 
> Ouch. I didn't realize it is so random currently.
> 
> > Test:
> > 
> > $ time LLVM_OBJCOPY=objcopy pahole -J  vmlinux
> > 
> > real	0m19.039s
> > user	0m17.617s
> > sys	0m1.419s
> > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|wc -l
> > 49871
> > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|uniq|wc -l
> > 49871
> > 
> > $ time LLVM_OBJCOPY=objcopy pahole -J -j4 vmlinux
> > 
> > real	0m8.482s
> > user	0m18.233s
> > sys	0m2.412s
> > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|wc -l
> > 49871
> > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|uniq|wc -l
> > 49871
> > 
> > $ time LLVM_OBJCOPY=objcopy pahole -J -j8 vmlinux
> > 
> > real	0m7.614s
> > user	0m19.384s
> > sys	0m3.739s
> > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|wc -l
> > 49871
> > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|uniq|wc -l
> > 
> > So there is a small cost in performance, but we improve determinism
> > and the consistency of representation.
> 
> This is a great fix.
> 
> I'm not an expert in this code base, but patches look good to me.
> Thank you for fixing it.

And all the description of the problem and of the solution, limitations,
together with a summary of the review comments, its a pleasure to
process a patch series like this one :-)

Doing that now and performing the usual tests,

Thanks,

- Arnaldo
 
> > For now it is better to have an incomplete representation
> > that more accurately reflects the actual function parameters
> > used, removing inconsistencies that could otherwise do harm.
> 
> +1

-- 

- Arnaldo



[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux