On Fri, 14 Jul 2023, Reinette Chatre wrote:
> On 7/14/2023 3:35 AM, Ilpo Järvinen wrote:
> > On Thu, 13 Jul 2023, Reinette Chatre wrote:
> >> On 7/13/2023 6:19 AM, Ilpo Järvinen wrote:
> >>> Perf event fd (fd_lm) is not closed on some error paths.
> >>>
> >>> Always close fd_lm in get_llc_perf() and add close into an error
> >>> handling block in cat_val().
> >>>
> >>> Fixes: 790bf585b0ee ("selftests/resctrl: Add Cache Allocation Technology (CAT) selftest")
> >>> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> >>> ---
> >>>  tools/testing/selftests/resctrl/cache.c | 10 +++++-----
> >>>  1 file changed, 5 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/tools/testing/selftests/resctrl/cache.c b/tools/testing/selftests/resctrl/cache.c
> >>> index 8a4fe8693be6..ced47b445d1e 100644
> >>> --- a/tools/testing/selftests/resctrl/cache.c
> >>> +++ b/tools/testing/selftests/resctrl/cache.c
> >>> @@ -87,21 +87,20 @@ static int reset_enable_llc_perf(pid_t pid, int cpu_no)
> >>>  static int get_llc_perf(unsigned long *llc_perf_miss)
> >>>  {
> >>>  	__u64 total_misses;
> >>> +	int ret;
> >>>
> >>>  	/* Stop counters after one span to get miss rate */
> >>>
> >>>  	ioctl(fd_lm, PERF_EVENT_IOC_DISABLE, 0);
> >>>
> >>> -	if (read(fd_lm, &rf_cqm, sizeof(struct read_format)) == -1) {
> >>> +	ret = read(fd_lm, &rf_cqm, sizeof(struct read_format));
> >>> +	close(fd_lm);
> >>> +	if (ret == -1) {
> >>>  		perror("Could not get llc misses through perf");
> >>> -
> >>>  		return -1;
> >>>  	}
> >>>
> >>>  	total_misses = rf_cqm.values[0].value;
> >>> -
> >>> -	close(fd_lm);
> >>> -
> >>>  	*llc_perf_miss = total_misses;
> >>>
> >>>  	return 0;
> >>> @@ -253,6 +252,7 @@ int cat_val(struct resctrl_val_param *param)
> >>>  					 memflush, operation, resctrl_val)) {
> >>>  			fprintf(stderr, "Error-running fill buffer\n");
> >>>  			ret = -1;
> >>> +			close(fd_lm);
> >>>  			break;
> >>>  		}
> >>>
> >>
> >> Instead of fixing these existing patterns I think it would make the code
> >> easier to understand and maintain
> >> if it is made symmetrical.
> >> Having the perf event fd opened in one place but its close()
> >> scattered elsewhere has the potential for confusion and making later
> >> mistakes easy to miss.
> >>
> >> What if perf event fd is closed in a new "disable_llc_perf()" that
> >> is matched with "reset_enable_llc_perf()" and called
> >> from cat_val()?
> >>
> >> I think this raises another issue with the test trickery where
> >> measure_cache_vals() has some assumptions about state based on the
> >> test name.
> >
> > I very much agree on the principle here, and thus I already have created
> > patches which will do a major cleanup on this area. The cleaned-up code
> > has pe_fd local var to cat_val() and handles closing it in cat_val() with
> > the usual patterns.
> >
> > However, the patch is currently resides post L3 CAT test rewrite.
> > Backporting the cleanups/refactors into this series would require
> > considerable effort due to how convoluted all those n-step cleanup patches
> > and L3 CAT test rewrite are in this area. There's just very much to
> > cleanup here and L3 rewrite will touch the same areas so its a net
> > full of conflicts.
> >
> > Do you want me to spend the effort to backport them into this series
> > (I expect will take some time)?
>
> Considering the "Fixes" tag, having a smaller fix that can easily
> be backported would be ideal so I am ok with deferring a bigger
> rework.
>
> I do think this fix can be made more robust with a couple of small
> changes that should not introduce significant conflicts:
> * initialize fd_lm to -1
> * do not close() fd_lm in get_llc_perf() but instead move its
>   close() to at exit of cat_val().

I changed the test to only close the fd in cat_val(), which is the direction
the later refactor/cleanup changes (not in this series) were moving anyway.
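
As a rough illustration of closing the fd only in the function that owns it,
here is a standalone sketch. It is not the resctrl code: run_test() stands in
for cat_val(), read_counter() for get_llc_perf(), and a plain open() on a file
path replaces the perf event fd. All of these names are hypothetical, and the
real test opens its fd differently.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for get_llc_perf(): it only reads from the fd
 * and never closes it, so ownership stays with the caller.
 */
static int read_counter(int fd, unsigned long *value)
{
	unsigned char buf[8];
	ssize_t ret;

	/* Refuse to operate on an fd that was never opened. */
	if (fd < 0)
		return -1;

	ret = read(fd, buf, sizeof(buf));
	if (ret == -1) {
		perror("Could not read counter");
		return -1;
	}

	*value = (unsigned long)ret;
	return 0;
}

/*
 * Hypothetical stand-in for cat_val(): opens the fd, uses it in a loop,
 * and closes it at a single exit point regardless of how the loop ended.
 */
static int run_test(const char *path, int iterations)
{
	unsigned long value;
	int fd = -1;	/* -1 marks "not open" */
	int ret = 0;
	int i;

	fd = open(path, O_RDONLY);
	if (fd == -1) {
		perror("Could not open file");
		return -1;
	}

	for (i = 0; i < iterations; i++) {
		if (read_counter(fd, &value)) {
			ret = -1;
			break;	/* no close() here, the exit path handles it */
		}
	}

	close(fd);
	return ret;
}
```

With this shape an error inside the loop needs no close() of its own, because
every path that gets past open() reaches the single close() at the end.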

> * add check in get_llc_perf() that it does not attempt ioctl()
>   on "fd_lm == -1" (later addition would be error checking of
>   the ioctl())

The other two suggestions seem unnecessary and I've not implemented them;
I don't think fd_lm can be -1 at ioctl(). Given this code is going to be
replaced soonish, putting any extra "safety" effort into it now seems like
a waste of time.

-- 
 i.