Hi Andrew,

On Thu, Jun 04, 2015 at 12:29:36AM +0100, Andrew Morton wrote:
> On Wed, 3 Jun 2015 16:42:56 +0100 Catalin Marinas <catalin.marinas@xxxxxxx> wrote:
>
> > With the current implementation, if kmemleak is disabled because of an
> > error condition (e.g. fails to allocate metadata), alloc/free calls are
> > no longer tracked. Usually this is not a problem since the kmemleak
> > metadata is being removed via kmemleak_do_cleanup(). However, if the
> > scanning thread is running at the time of disabling, kmemleak would no
> > longer notice a potential vfree() call and the freed/unmapped object may
> > still be accessed, causing a fault.
> >
> > This patch separates the kmemleak_free() enabling/disabling from the
> > overall kmemleak_enabled knob so that we can defer the disabling of the
> > object freeing tracking until the scanning thread has completed. The
> > kmemleak_free_part() is deliberately ignored by this patch since this is
> > only called during boot, before the scanning thread is started.
>
> I'm having trouble with this. afaict, kmemleak_free() can still be
> called while kmemleak_scan() is running on another CPU.
> kmemleak_free_enabled hasn't been cleared yet so the races remain.

It's not about kmemleak_free() racing with kmemleak_scan().
kmemleak_free() (and __delete_object()) is meant to race with the
scanning thread (which can run for minutes at a time); the locking is
done on individual kmemleak_object items.

> However your statement "if the scanning thread is running at the time
> of disabling" implies that the race is between kmemleak_scan() and
> kmemleak_disable(). Yet the race avoidance code is placed in
> kmemleak_free().

The race is indeed between kmemleak_disable() and kmemleak_scan().
Since the former may be called in atomic contexts, we cannot issue a
kthread_stop() and wait for the scanning thread to finish. This is
deferred to the kmemleak_do_cleanup() work queue.

> All confused. A more detailed description of the race would help.

I'll try to improve it and re-post.

> Also, the words "kmemleak would no longer notice a potential vfree()
> call" aren't sufficiently specific. kmemleak is a big place - what
> *part* of kmemleak are you referring to here?

What I meant is that, without any patch, kmemleak_free() simply returns
on !kmemleak_enabled and kmemleak_scan() does not notice that the
objects it is scanning are being freed (__delete_object() is no longer
called). This is worse with vfree() as the object is no longer mapped.
There are other ways of fixing this, like adding heavier locking, but I
found that simply allowing kmemleak_free() to get through to
__delete_object() until the scanning thread has stopped is the
simplest.

> Finally, I'm concerned that a bare
>
> 	kmemleak_free_enabled = 0;
>
> lacks sufficient synchronization with respect to the
> kmemleak_free_enabled readers from a locking/reordering point of view.

I thought about this as well and I didn't see an issue initially. From
an atomicity perspective, I'm not sure using atomic_t has any more
benefit (we used to have atomics here until commit 8910ae896c8c
"kmemleak: change some global variables to int").

As for the ordering, we need to ensure the visibility of the
kmemleak_free_enabled = 0 update to other CPUs in two cases:

1. After kmemleak_scan() is stopped. This is done by calling
   kthread_stop() -> put_task_struct() -> atomic_dec_and_test(). The
   latter implies barriers on each side of the operation, so I think
   this case is safe.

2. Before __kmemleak_do_cleanup(). The risk here is that a
   delete_object_full() call from __kmemleak_do_cleanup() races with
   the same call from kmemleak_free(). The object_tree_root look-up
   (via find_and_get_object()) is covered by the kmemleak_lock.
   However, it looks to me like two delete_object_full() calls on the
   same object can race to __delete_object() and call rb_erase() and
   list_del_rcu() twice on the same object.
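To make the intended sequencing concrete, the disable path is meant to
look roughly like this (a simplified sketch based on the description
above, not the exact mm/kmemleak.c code; the scan_thread pointer and
the exact shape of the cleanup work handler are only illustrative):

void __ref kmemleak_free(const void *ptr)
{
	/* gate freeing on its own flag rather than on kmemleak_enabled */
	if (kmemleak_free_enabled && ptr && !IS_ERR(ptr))
		delete_object_full((unsigned long)ptr);
}

static void kmemleak_do_cleanup(struct work_struct *work)
{
	/* stop the scanning thread first; it may still access objects */
	if (scan_thread) {
		kthread_stop(scan_thread);	/* implies the barriers in case 1 */
		scan_thread = NULL;
	}
	/* only now is it safe to stop tracking frees */
	kmemleak_free_enabled = 0;
	__kmemleak_do_cleanup();	/* case 2: ordering vs the line above */
}

That is, kmemleak_free() keeps feeding __delete_object() until the
scanning thread has been stopped, and only afterwards are further frees
ignored.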
The second case is trickier. A barrier after kmemleak_free_enabled = 0
does not help; we need locking from the object look-up all the way to
removing the object from the object list and tree. Alternatively, I
could take the object->lock for the whole __delete_object() function
and use the OBJECT_ALLOCATED flag to decide whether to call rb_erase()
and list_del_rcu(). Something like the diff below, but untested yet
(I'm off most of the day); I would need to pass it through lock proving
as well:

diff --git a/mm/kmemleak.c b/mm/kmemleak.c
index 11d6f8015896..27e2e0b688a9 100644
--- a/mm/kmemleak.c
+++ b/mm/kmemleak.c
@@ -607,20 +607,25 @@ static void __delete_object(struct kmemleak_object *object)
 {
 	unsigned long flags;
 
+	/*
+	 * Locking here ensures that the corresponding memory block cannot be
+	 * freed when it is being scanned. It also avoids __delete_object()
+	 * race when being called from __kmemleak_do_cleanup().
+	 */
+	spin_lock_irqsave(&object->lock, flags);
+	if (!(object->flags & OBJECT_ALLOCATED))
+		goto out;
+
 	write_lock_irqsave(&kmemleak_lock, flags);
 	rb_erase(&object->rb_node, &object_tree_root);
 	list_del_rcu(&object->object_list);
 	write_unlock_irqrestore(&kmemleak_lock, flags);
 
-	WARN_ON(!(object->flags & OBJECT_ALLOCATED));
 	WARN_ON(atomic_read(&object->use_count) < 2);
 
-	/*
-	 * Locking here also ensures that the corresponding memory block
-	 * cannot be freed when it is being scanned.
-	 */
-	spin_lock_irqsave(&object->lock, flags);
 	object->flags &= ~OBJECT_ALLOCATED;
+
+out:
 	spin_unlock_irqrestore(&object->lock, flags);
 	put_object(object);
 }

-- 
Catalin