On Thu, Feb 29, 2024 at 10:55:38AM -0500, Waiman Long wrote:
> On 2/29/24 10:25, Catalin Marinas wrote:
> > On Wed, Feb 28, 2024 at 02:14:44PM -0500, Waiman Long wrote:
> > > When some error conditions happen (like OOM), some kmemleak functions
> > > call printk() to dump out some useful debugging information while holding
> > > the kmemleak_lock. This may cause deadlock as the printk() function
> > > may need to allocate additional memory leading to a create_object()
> > > call acquiring kmemleak_lock again.
> > >
> > > Fix this deadlock issue by making sure that printk() is only called
> > > after releasing the kmemleak_lock.
> > I can't say I'm familiar with the printk() code but I always thought it
> > uses some ring buffers as it can be called from all kind of contexts and
> > allocation is not guaranteed.
> >
> > If printk() ends up taking kmemleak_lock through the slab allocator, I
> > wonder whether we have bigger problems. The lock order is always
> > kmemleak_lock -> object->lock but if printk() triggers a callback into
> > kmemleak, we can also get object->lock -> kmemleak_lock ordering, so
> > another potential deadlock.
>
> object->lock is per object whereas kmemleak_lock is global. When taking
> object->lock and doing a data dump leading to a call that takes the
> kmemlock, it is highly unlikely the it will need to take that particular
> object->lock again. I do agree that lockdep may still warn about it if that
> happens as all the object->lock's are likely to be treated to be in the same
> class.

Yeah, it's unlikely. I think it can only happen if there's a bug in
kmemleak (or slab) and the insertion fails because of the same object we
try to dump. But I suspect lockdep will complain either way.

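For reference, the shape the patch description is aiming for (snapshot
the data under the lock, print only after dropping it) looks roughly
like the userspace sketch below. It is only an analogue, not kmemleak
code: demo_lock stands in for kmemleak_lock, fprintf() for printk(), and
struct demo_object, demo_table and demo_insert() are hypothetical names
made up for illustration.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define DEMO_MAX 4

/* Hypothetical stand-in for a tracked object; not the kernel's struct. */
struct demo_object {
	unsigned long pointer;
	size_t size;
};

/* Plays the role of the global kmemleak_lock in this analogue. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static struct demo_object demo_table[DEMO_MAX];
static int demo_count;

/* Returns 0 on success, -1 if the pointer is already tracked, -2 if full. */
static int demo_insert(const struct demo_object *obj)
{
	struct demo_object conflict = { 0, 0 };
	int ret = 0;
	int i;

	pthread_mutex_lock(&demo_lock);
	for (i = 0; i < demo_count; i++) {
		if (demo_table[i].pointer == obj->pointer) {
			/* Snapshot the entry; do not print while locked. */
			conflict = demo_table[i];
			ret = -1;
			break;
		}
	}
	if (ret == 0) {
		if (demo_count < DEMO_MAX)
			demo_table[demo_count++] = *obj;
		else
			ret = -2;
	}
	pthread_mutex_unlock(&demo_lock);

	/* The lock is dropped, so the reporting path may allocate or lock. */
	if (ret == -1)
		fprintf(stderr,
			"cannot insert %#lx: already tracked as %#lx (size %zu)\n",
			obj->pointer, conflict.pointer, conflict.size);
	else if (ret == -2)
		fprintf(stderr, "cannot insert %#lx: table full\n",
			obj->pointer);
	return ret;
}
```

The cost of this shape is that the snapshot may be stale by the time it
is printed, which seems acceptable for an error path.
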
> I should probably clarify in the change log that the lockdep splat is
> actually,
>
> [ 3991.452558] Chain exists of:
> [ 3991.452559]   console_owner -> &port->lock --> kmemleak_lock
>
> So if kmemleak calls printk() acquiring either console_owner or port->lock.
> It may cause deadlock.

Could you please share the whole lockdep warning? IIUC, it's not the
printk() code allocating memory but somewhere down the line in the tty
layer.

Anyway, I had a look again at the kmemleak locking (I've been meaning to
simplify it for some time, drop the object->lock altogether). The only
time we nest object->lock within kmemleak_lock is during scan_block().
If we are unlucky enough to get some error on another CPU and dump that
exact object with printk(), it could lead to deadlock. There's the
dump_str_object_info() case as well, triggered by a sysfs write, but
luckily this takes the scan_mutex (same as during scan_block()), so it
solves the nesting problem.

I think in those error cases we can even ignore the object->lock when
dumping the info. Yeah, it can race, maybe not showing exactly the
precise data in some rare cases, but in those OOM scenarios it's
probably the least of our problems.

-- 
Catalin