On Fri, Mar 17, 2023 at 11:08:44PM -0700, Andrii Nakryiko wrote: SNIP > > > That does depend upon the load, obviously, but it's not hard to collect - > > > you already have more than enough hooks inserted in the relevant places. > > > That might give a better appreciation of the reactions... > > > > One possibility would be a bit stolen from inode flags + hash keyed by > > struct inode address (middle bits make for a decent hash function); > > inode eviction would check that bit and kick the corresponding thing > > from hash if the bit is set. > > > > Associating that thing with inode => hash lookup/insert + set the bit. > > This is an interesting idea, but now we are running into a few > unnecessary problems. We need to have a global dynamically sized hash > map in the system. If we fix the number of buckets, we risk either > wasting memory on an underutilized system (if we oversize), or > performance problems due to collisions (if we undersize) if we have a > busy system with lots of executables mapped in memory. If we don't > pre-size, then we are talking about reallocations, rehashing, and > doing that under global lock or something like that. Further, we'd > have to take locks on buckets, which causes further problems for > looking up build ID from this hashmap in NMI context for perf events > and BPF programs, as locks can't be safely taken under those > conditions, and thus fetching build ID would still be unreliable > (though less so than it is today, of course). > > All of this is solvable to some degree (but not perfectly and not with > simple and elegant approaches), but seems like an unnecessarily > overcomplication compared to the amount of memory that we hope to > save. It still feels like a Kconfig-guarded 8 byte field per struct > file is a reasonable price for gaining reliable build ID information > for profiling/tracing tools. > > > [0] https://drgn.readthedocs.io/en/latest/index.html > > [1] Script I used: > > from drgn.helpers.linux.pid import for_each_task > from drgn.helpers.linux.fs import for_each_file > > task_cnt = 0 > file_set = set() > > for task in for_each_task(prog): > task_cnt += 1 > try: > for (fd, file) in for_each_file(task): > file_set.add(file.value_()) > except: > pass > > uniq_file_cnt = len(file_set) > print(f"task_cnt={task_cnt} uniq_file_cnt={uniq_file_cnt}") great you beat me to this, I wouldn't have thought of using drgn for this ;-) I'll see if I can install it to some of our test servers thanks, jirka