Re: [PATCH] kernfs: support kernfs notify in memory recliam context

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/14/23 12:24 PM, Tejun Heo wrote:

Hello,

On Tue, Nov 14, 2023 at 12:09:19PM -0800, junxiao.bi@xxxxxxxxxx wrote:
On 11/14/23 11:06 AM, Tejun Heo wrote:
On Tue, Nov 14, 2023 at 10:59:47AM -0800, Junxiao Bi wrote:
kernfs notify is used in write path of md (md_write_start) to wake up
userspace daemon, like "mdmon" for updating md superblock of imsm raid,
md write will wait for that update done before issuing the write, if this
How is forward progress guarnateed for that userspace daemon? This sounds
like a really fragile setup.
For imsm raid, userspace daemon "mdmon" is responsible for updating raid
metadata, kernel will use kernfs_notify to wake up the daemon anywhere
metadata update is required. If the daemon can't move forward, write may
hung, but that will be a bug in the daemon?
I see. That sounds very fragile and I'm not quite sure that can ever be made
to work reliably. While memlocking everything needed and being really
judicious about which syscalls to make will probably get you pretty far,
there are things like task_work which gets scheduled and executed when a
task is about to return to userspace. Those things are allowed to grab e.g.
mutexes in the kernel and allocate memory and so on, and can deadlock under
memory pressure.

Even just looking at kernfs_notify, it down_reads()
root->kernfs_supers_rwsem which might already be write-locked by somebody
who's waiting on memory allocation.

The patch you're proposing removes one link in the dependency chain but
there are many on there. I'm not sure this is fixable. Nobody writes kernel
code thinking that userspace code can be on the memory reclaim dependency
chain.

Understood, thanks. Sound like depending on Userspace on memory reclaim path is really bad idea and the only option for fixing it is to remove that dependency, but i am not sure that is possible without breaking the consistency of metadata.

Thanks,

Junxiao.


Thanks.




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux