On Thu, Aug 20, 2020 at 10:39:19PM +0200, Martin Wilck wrote: > Hi Ben, > > I need to get back to this old discussion. I didn't resend this patch > last year, because I tried to figure out how to solve the memory leaks > you mentioned. > > On Thu, 2019-01-17 at 10:59 +0100, Martin Wilck wrote: > > On Wed, 2019-01-16 at 17:40 -0600, Benjamin Marzinski wrote: > > > On Fri, Jan 04, 2019 at 06:59:12PM +0100, Martin Wilck wrote: > > > > Cancel the other threads before taking vecs->lock. This avoids > > > > delays during shutdown caused e.g. by the checker thread holding > > > > the vecs lock. > > > > > > Before this change, multipathd was guaranteed that once a thread > > > had > > > locked the vecs lock, and checked if it had been cancelled, it > > > could > > > not > > > be cancelled after that until it unlocked the vecs lock. Undoing > > > this > > > guarantee will likely make it possible for multipathd to leak > > > memory > > > where it wasn't possible before. > > > > Thanks for pointing that out. > > > > I wasn't aware of this guarantee. In my latest valgrind tests, > > valgrind > > reported no leaks, but multipathd was also not "clean" in the sense > > that every chunk of memory malloc()d had been explicitly free()d at > > exit. IIRC that hadn't been caused by any patches added recently. > > I haven't had time to look at that further, I was satisfied with no > > real leaks being reported. > > > > We do free the global data structures in "vecs" when we exit. So any > > possible leaks caused by this patch must be cases where temporary > > memory is allocated without proper pthread_cleanup_push(), or where a > > thread was cancelled between allocation of memory and setting a > > reference to it in the global data structures - e.g. between > > allocation > > of a path and adding it to vecs->pathvec. > > > > I haven't audited either class of leaks. I believe the first class > > should have been eliminated by earlier phthread_cancel audits. Fixing > > the second class would require designing some really clever helpers, > > I > > guess. But as argued above, I really don't think it matters much if > > it > > concerns only leaks-at-shutdown. > > > > I'll put this on my todo list, but not at the highest prio. > > For this RFC, we need to decide whether it's more important to be > > leak- > > free on shutdown, or to react quickly on shutdown requests. > > Last year, I had started looking into this, and produced a first patch, > bca3729 ("libmultipath: alias.c: prepare for cancel-safe allocation"). > I've just revisited this. I think what I did back then was broken. > > out: > + pthread_cleanup_push(free, alias); > fclose(f); > + pthread_cleanup_pop(0); > > > is confusing and hard to read. When I just saw it, several months after > having written it myself, I thought it was a bug. Actually, it *is* > broken, as you pointed out already in > https://www.redhat.com/archives/dm-devel/2019-October/msg00211.html, > because we need to avoid leaking the fd, too. > > How would code look like that would protect both from the memory and fd > leak due to cancellation? Maybe like this: > > char *get_user_friendly_alias(...) > { > char *alias = NULL; > ... > > /* simplified, we need a wrapper for fclose */ > pthread_cleanup_push(fclose, f); > > id = lookup_binding(f, wwid, &alias, prefix); > if (id < 0) > goto out_fclose; > > pthread_cleanup_push(free, alias); > if (fflush(f) != 0) { > condlog(0, "cannot fflush bindings file stream : %s", > strerror(errno)); > free(alias); > alias = NULL; > goto out_free; > } else if (can_write && !bindings_read_only && !alias) > alias = allocate_binding(fd, wwid, id, prefix); > > out_free: > /* This is necessary to preserve nesting */ > pthread_cleanup_pop(0); /* free */ > out_fclose: > pthread_cleanup_pop(0); /* fclose */ > /* This is necessary because fclose() is a cancellation point */ > pthread_cleanup_push(free, alias); > fclose(f); > pthread_cleanup_pop(0); /* free */ > > return alias; > } > > I hope you concur that this is awfully ugly. Everyone is invited to > find a solution that doesn't require 3x pthread_cleanup_push()/pop(), > without completely rewriting the code. I assume you would do it something like: static void free_ptr(void *arg) { void *ptr; if (!arg) return ptr = *((void **) arg); if (ptr) free(ptr); } char *get_user_friendly_alias(...) { char *alias = NULL; pthread_cleanup_push(free_ptr, &alias); ... /* simplified, we need a wrapper for fclose */ pthread_cleanup_push(fclose, f); ... out: pthread_cleanup_pop(1); /* fclose */ pthread_cleanup_pop(0); /* free_ptr */ return alias; } > IMO avoiding the fd leak is more important than avoiding the memory > leak of "alias". I'm going to submit a patch that does exactly that. > > In general: I think completely avoiding memory leaks in multithreaded > code that allows almost arbitrary cancellation is not a worthwhile > goal. After all, except for the waiter and checker threads, > all other threads are only cancelled when multipathd exits. Yes, this > makes it harder to assess potential memory leaks with valgrind, because > we can't easily distinguish real memory leaks from leaks caused by > cancellation. But that's about it, and I think the distinction is > actually not that hard, because the leaks caused by cancellation would > be sporadic, and wouldn't pile up during longer runs. > > So, I propose not to go further in this direction. IOW, we shouldn't > write code like bca3729 any more. We don't have to avoid it at all cost > (for example, it's always good to link allocated memory to some global > data structure as soon as it makes sense to do so). But I think that > "pthread_cleanup_push(free, xyz)" is often not worth the code > uglification it causes. If it conflicts with other cleanup actions, and > can't be cleanly nested like above, we should definitely not do it. Sure. For threads that are only cancelled for shutdown, I'm fine with reasonable effort, balanced against other code considerations. > Moreover, I believe that reacting quickly on cancellation / exit > requests is more important than avoiding cancellation-caused memory > leaks. Therefore I plan to resend the "cancel threads early" patch, > unless you come up with more strong reasons not to do so. go ahead. > Another possibility to "fix" cancellation issues and get rid of ugly > pthread_cleanup_push() calls would be changing our cancellation policy > to PTHREAD_CANCEL_DISABLE, and check for cancellation only at certain > points during runtime (basically, before and after blocking / waiting > on something). But I don't think that's going to work well, for the > same reason - we'd run high risk to get those multipathd shutdown > issues back which we've overcome only recently. I'm not a fan either. I'd rather deal with occasionally seeing fake memory leaks. -Ben > Thoughts? > > Martin > > > -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel