On Tue, May 28, 2024 at 05:13:04PM +0800, Gao Xiang wrote:
> Hi Christian,
>
> On 2024/5/28 16:43, Christian Brauner wrote:
> > On Tue, May 28, 2024 at 12:02:46PM +0800, Gao Xiang wrote:
> > >
> > >
> > > On 2024/5/28 11:08, Jingbo Xu wrote:
> > > >
> > > >
> > > > On 5/28/24 10:45 AM, Jingbo Xu wrote:
> > > > >
> > > > >
> > > > > On 5/27/24 11:16 PM, Miklos Szeredi wrote:
> > > > > > On Fri, 24 May 2024 at 08:40, Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx> wrote:
> > > > > >
> > > > > > > 3. I don't know if a kernel based recovery mechanism is welcome on the
> > > > > > > community side. Any comment is welcome. Thanks!
> > > > > >
> > > > > > I'd prefer something external to fuse.
> > > > >
> > > > > Okay, understood.
> > > > >
> > > > > >
> > > > > > Maybe a kernel based fdstore (lifetime connected to that of the
> > > > > > container) would a useful service more generally?
> > > > >
> > > > > Yeah I indeed had considered this, but I'm afraid VFS guys would be
> > > > > concerned about why we do this on kernel side rather than in user space.
> > >
> > > Just from my own perspective, even if it's in FUSE, the concern is
> > > almost the same.
> > >
> > > I wonder if on-demand cachefiles can keep fds too in the future
> > > (thus e.g. daemonless feature could even be implemented entirely
> > > with kernel fdstore) but it still has the same concern or it's
> > > a source of duplication.
> > >
> > > Thanks,
> > > Gao Xiang
> > >
> > > >
> > > >
> > > > I'm not sure what the VFS guys think about this and if the kernel side
> > > > shall care about this.
> >
> > Fwiw, I'm not convinced and I think that's a big can of worms security
> > wise and semantics wise. I have discussed whether a kernel-side fdstore
> > would be something that systemd would use if available multiple times
> > and they wouldn't use it because it provides them with no benefits over
> > having it in userspace.
>
> As far as I know, currently there are approximately two ways to do
> failover mechanisms in kernel.
>
> The first model much like a fuse-like model: in this mode, we should
> keep and pass fd to maintain the active state. And currently,
> userspace should be responsible for the permission/security issues
> when doing something like passing fds.
>
> The second model is like one device-one instance model, for example
> ublk (If I understand correctly): each active instance (/dev/ublkbX)
> has their own unique control device (/dev/ublkcX). Users could
> assign/change DAC/MAC for each control device. And failover
> recovery just needs to reopen the control device with proper
> permission and do recovery.
>
> So just my own thought, kernel-side fdstore pseudo filesystem may
> provide a DAC/MAC mechanism for the first model. That is a much
> cleaner way than doing some similar thing independently in each
> subsystem which may need DAC/MAC-like mechanism. But that is
> just my own thought.

The failover mechanism for /dev/ublkcX could easily be implemented using
the fdstore. The fact that they rolled their own thing is orthogonal to
this imho. Implementing retrieval policies like this in the kernel is
slowly advancing into /proc/$pid/fd/ levels of complexity. That's all
better handled with appropriate policies in userspace. And cachefilesd
can similarly just stash their fds in the fdstore.
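
For illustration only, here is a minimal sketch of what "stash the fds
in userspace" can look like, assuming a toy fd store that receives and
returns descriptors over an AF_UNIX socket with SCM_RIGHTS. The names
FdStore, demo and "backing-file" are hypothetical, both ends run in one
process to keep the example short, and this is not systemd's fdstore
interface. It requires Python 3.9+ for socket.send_fds()/recv_fds().

```python
import os
import socket
import tempfile


class FdStore:
    """Toy userspace fd store (hypothetical): keeps named fds alive."""

    def __init__(self):
        self._fds = {}

    def receive(self, sock):
        # recv_fds() returns (data, fds, msg_flags, address).
        data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
        self._fds[data.decode()] = fds[0]

    def hand_back(self, sock, name):
        # Send a duplicate reference; the store keeps its own copy.
        socket.send_fds(sock, [name.encode()], [self._fds[name]])

    def close(self):
        for fd in self._fds.values():
            os.close(fd)
        self._fds.clear()


def demo():
    daemon_end, store_end = socket.socketpair()
    store = FdStore()

    # The "daemon" opens an unlinked file, so only open fds keep it alive.
    fd, path = tempfile.mkstemp()
    os.unlink(path)
    os.write(fd, b"state-before-crash")

    # Stash the fd in the store, then drop the daemon's copy ("crash").
    socket.send_fds(daemon_end, [b"backing-file"], [fd])
    store.receive(store_end)
    os.close(fd)

    # The "restarted daemon" asks for the fd back and keeps using it.
    store.hand_back(store_end, "backing-file")
    _name, fds, _flags, _addr = socket.recv_fds(daemon_end, 1024, 1)
    recovered = fds[0]
    os.lseek(recovered, 0, os.SEEK_SET)
    content = os.read(recovered, 64)

    os.close(recovered)
    store.close()
    daemon_end.close()
    store_end.close()
    return content


if __name__ == "__main__":
    print(demo())
```

Who may connect to the store's socket and which names a client may
retrieve are policy decisions that live entirely in the userspace
store in this sketch; nothing here needs kernel-side support beyond
ordinary fd passing.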