On Mon, Jan 30, 2023 at 02:29:45PM +0000, Alan Maguire wrote: > There are multiple sources of inconsistency that can result in > functions of the same name having multiple prototypes: > > - multiple static functions in different CUs share the same name > - static and external functions share the same name > > Here we attempt to catch such cases by finding inconsistencies > across CUs using the save/compare/merge mechanisms that were > previously introduced to handle optimized-out parameters, > using it for all functions. > > For two instances of a function to be considered consistent: > > - number of parameters must match > - parameter names must match > > The latter is a less strong method than a full type > comparison but suffices to match functions. > > With these changes, we see 278 functions removed due to > protoype inconsistency. For example, wakeup_show() > has two distinct prototypes: > > static ssize_t wakeup_show(struct kobject *kobj, > struct kobj_attribute *attr, char *buf) > (from kernel/irq/irqdesc.c) > > static ssize_t wakeup_show(struct device *dev, struct device_attribute *attr, > char *buf) > (from drivers/base/power/sysfs.c) > > In some other cases, the parameter comparisons weed out additional > inconsistencies in "."-suffixed functions across CUs. > > We also see a large number of functions eliminated due to > optimized-out parameters; 2542 functions are eliminated for this > reason, both "."-suffixed (1007) and otherwise (1535). imo it's a good thing. > Because the save/compare/merge process occurs for all functions > it is important to assess performance effects. In addition, > prior to these changes the number of functions ultimately > represented in BTF was non-deterministic when pahole was > run with multiple threads. This was due to the fact that > functions were marked as generated on a per-encoder basis > when first added, and as such the same function could > be added multiple times for different encoders, and if they > encountered inconsistent function prototypes, deduplication > could leave multiple entries in place for the same name. > When run in a single thread, the "generated" state associated > with the name would prevent this. > > Here we assess both BTF encoding performance and determinism > of the function representation in baseline compared to with > these changes. Determinism is assessed by counting the > number of functions in BTF. Comparisons are done for 1, > 4 and 8 threads. > > Baseline > > $ time LLVM_OBJCOPY=objcopy pahole -J vmlinux > > real 0m18.160s > user 0m17.179s > sys 0m0.757s > > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|wc -l > 51150 > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|uniq|wc -l > 51150 > > $ time LLVM_OBJCOPY=objcopy pahole -J -j4 vmlinux > > real 0m8.078s > user 0m17.978s > sys 0m0.732s > > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|wc -l > 51592 > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|uniq|wc -l > 51150 > > $ time LLVM_OBJCOPY=objcopy pahole -J -j8 vmlinux > > real 0m7.075s > user 0m19.010s > sys 0m0.587s > > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|wc -l > 51683 > $ grep " FUNC " /tmp/vmlinux.btf.base |awk '{print $3}'|sort|uniq|wc -l > 51150 Ouch. I didn't realize it is so random currently. > Test: > > $ time LLVM_OBJCOPY=objcopy pahole -J vmlinux > > real 0m19.039s > user 0m17.617s > sys 0m1.419s > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|wc -l > 49871 > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|uniq|wc -l > 49871 > > $ time LLVM_OBJCOPY=objcopy pahole -J -j4 vmlinux > > real 0m8.482s > user 0m18.233s > sys 0m2.412s > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|wc -l > 49871 > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|uniq|wc -l > 49871 > > $ time LLVM_OBJCOPY=objcopy pahole -J -j8 vmlinux > > real 0m7.614s > user 0m19.384s > sys 0m3.739s > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|wc -l > 49871 > $ bpftool btf dump file vmlinux | grep ' FUNC ' |sort|uniq|wc -l > > So there is a small cost in performance, but we improve determinism > and the consistency of representation. This is a great fix. I'm not an expert in this code base, but patches look good to me. Thank you for fixing it. > For now it is better to have an incomplete representation > that more accurately reflects the actual function parameters > used, removing inconsistencies that could otherwise do harm. +1