Re: Is it reasonable to support quota in fuse?

Jan Kara <jack@xxxxxxx> · Thu, 6 Jun 2024 18:10:16 +0200

On Thu 06-06-24 11:14:48, JunChao Sun wrote:
> On Wed, 5 Jun 2024 at 18:29, Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Tue 04-06-24 21:49:20, JunChao Sun wrote:
> > > On Tue, 4 Jun 2024 at 17:27, Jan Kara <jack@xxxxxxx> wrote:
> > > > On Tue 04-06-24 14:54:01, JunChao Sun wrote:
> > > > > On Tue, 4 Jun 2024 at 14:40, Miklos Szeredi <miklos@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Mon, 3 Jun 2024 at 13:37, JunChao Sun <sunjunchao2870@xxxxxxxxx> wrote:
> > > > > >
> > > > > > > Given these challenges, I would like to inquire about the community's
> > > > > > > perspective on implementing quota functionality at the FUSE kernel
> > > > > > > part. Is it feasible to implement quota functionality in the FUSE
> > > > > > > kernel module, allowing users to set quotas for FUSE just as they
> > > > > > > would for ext4 (e.g., using commands like quotaon /mnt/fusefs or
> > > > > > > setquota /mnt/fusefs)?  Would the community consider accepting patches
> > > > > > > for this feature?
> > > > > >
> > > > > >
> > > > > > I would say yes, but I have no experience with quota in any way, so
> > > > > > cannot help with the details.
> > > > >
> > > > > Thanks for your reply. I'd like to try to implement this feature.
> > > >
> > > > Nice idea! But before you go and spend a lot of time trying to implement
> > > > something, I suggest that you write down a design how you imagine all this
> > > > to work and we can talk about it. Questions like: Do you have particular
> > > > usecases in mind? Where do you plan to perform the accounting /
> > > > enforcement? Where do you want to store quota information? How do you want
> > > > to recover from unclean shutdowns? Etc...
> > >
> > > Thanks a lot for your suggestions.
> > >
> > > I am reviewing the quota code of ext4 and the fuse code to determine
> > > if the implementation method used in ext4 can be ported to fuse. Based
> > > on my current understanding, the key issue is that ext4 reserves
> > > several inodes for quotas and can manage the disk itself, allowing it
> > > to directly flush quota data to the disk blocks corresponding to the
> > > quota inodes within the kernel.
> >
> > Yes.
> >
> > > However, fuse does not seem to manage
> > > the disk itself; it sends all read and write requests to user space
> > > for completion. Therefore, it may not be possible to directly flush
> > > the data in the quota inode to the disk in fuse.
> >
> > Yes, ext4 uses journalling to keep filesystem state consistent with quota
> > information. Doing this within FUSE would be rather difficult (essentially
> > you would have to implement a journal within FUSE, which will have rather
> > high performance overhead).
> >
> >
> > But that's why I'm asking for usecases. For some usecases it may be fine
> > that in case of unclean shutdown you run quotacheck program to update quota
> > information based on current usage - non-journalling filesystems use this
> > method. So where do you want to use quotas on a FUSE filesystem?
> 
> Please allow me to ask a silly question. I'm not sure if I correctly
> understand what you mean by 'unclean shutdown'. Do you mean an
> inconsistent state that requires using fsck to repair, like in ext4
> after a sudden power loss, or is it something else only about quota?

No, I mean cases like sudden power loss or kernel crash or similar. However
note that journalling filesystems (such as ext4 or xfs or many others) do
not require fsck after such an event. The journal allows them to recover
automatically.

> In my scenario, FUSE (both the kernel and user space parts) acts
> merely as a proxy. FUSE is based on multiple file systems, and a
> user's file and directory exist in only one of these file systems. It
> does not even have its own superblock or inode metadata. When a user
> performs read or write operations on a specific file, FUSE checks the
> directory corresponding to this file on each file system to see if the
> user's file is there; if it is not, it continues to check the next
> file system.

I see. So your usecase is kind of a filesystem unioning solution and you
want to add quotas on top of that?

> > > I am considering whether it would be feasible to implement the quota
> > > inode in user space in a similar manner. For example, users could
> > > reserve a few regular files that are invisible to actual file system
> > > users to store the contents of quota. When updating the quota, the
> > > user would be notified to flush the quota data to the disk. The
> > > benefit of this approach is that it can directly reuse the quota
> > > metadata format from the kernel, so users do not need to redesign
> > > metadata. However, performance might be an issue with this approach.
> >
> > Yes, storing quota data in some files inside the filesystem is probably the
> > easiest way to go. I'd just not bother with flushing because as you say
> > the performance would suck in that case.
> 
> What about using caching and asynchronous updates? For example, in
> FUSE, allocate some pages to cache the quota data. When updating quota
> data, write to the cache first and then place the task in a work
> queue. The work queue will then send the request to user space to
> complete the actual disk write operation. When there are read
> requests, the content is read directly from the cache.

So how quota works for filesystems without journaling is that we keep quota
information for cached inodes in memory (struct dquot - this is per ID
(uid/gid/projid) structure). The quota information is written back to the
quota file on events like sync(2) (which also handles unmount) or when the
last inode referencing a particular dquot structure is reclaimed from memory.
There is
no periodic background writeback for quota structures.
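
To make that lifecycle concrete, here is a minimal userspace sketch. All
names (struct my_dquot, my_dquot_put(), my_sync(), ...) are made up for
illustration and the "write" is just a counter; the real code in
fs/quota/dquot.c is considerably more involved (locking, hashing, etc.).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct dquot. */
struct my_dquot {
	unsigned int id;	/* uid, gid or projid */
	long long cur_space;	/* bytes currently charged to this ID */
	int refcount;		/* cached inodes referencing this ID */
	bool dirty;		/* in-memory copy is newer than the quota file */
	int writes;		/* number of writebacks, for illustration only */
};

/* Stand-in for writing one entry to the quota file. */
static void my_dquot_write(struct my_dquot *dq)
{
	if (!dq->dirty)
		return;
	dq->writes++;
	dq->dirty = false;
}

/* An inode with this ID got cached. */
static void my_dquot_get(struct my_dquot *dq)
{
	dq->refcount++;
}

/* Accounting only touches memory; no I/O happens here. */
static void my_dquot_charge(struct my_dquot *dq, long long bytes)
{
	dq->cur_space += bytes;
	dq->dirty = true;
}

/* An inode was reclaimed; write back when the last reference goes away. */
static void my_dquot_put(struct my_dquot *dq)
{
	assert(dq->refcount > 0);
	if (--dq->refcount == 0)
		my_dquot_write(dq);
}

/* sync(2) / unmount: write back everything that is dirty. */
static void my_sync(struct my_dquot *dqs, int n)
{
	for (int i = 0; i < n; i++)
		my_dquot_write(&dqs[i]);
}
```

Note that nothing in this sketch runs on a timer: a dirty entry stays only
in memory until either my_sync() or the final my_dquot_put() happens.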

> The problem with this approach is that asynchronous updates might lead
> to loss of quota data in the event of a sudden power failure. This
> seems acceptable to me, but I am not sure if it aligns with the
> definition of quota. Additionally, this assumes that the quota file
> will not be very large, which I believe is a reasonable
> assumption. Perhaps there are some drawbacks I haven't considered?

Yes, quota files are pretty small (by today's standards) as they scale
with the number of filesystem users which isn't generally too big. As you
observe, quota information will not be up to date in the event of a power
failure or similar. That is why the administrator (or init scripts) is
responsible for calling quotacheck(8) for filesystems when unclean shutdown
is detected. Quotacheck(8) scans the whole filesystem, summarizes disk
usage for each user, group, etc. and updates the information in the quota
file.
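
The scanning part can be sketched in a few lines of C. This is not how
quotacheck(8) is actually implemented, just an illustration of the idea
using nftw(3); the names (usage_table, scan_usage(), ...) are hypothetical,
only user IDs are summarized, and hard links are counted once per name
instead of once per inode.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_IDS 1024

struct usage {
	uid_t uid;
	long long blocks;	/* st_blocks units (512 bytes on Linux) */
	long long inodes;
};

static struct usage usage_table[MAX_IDS];
static int n_ids;

/* Find or create the summary slot for one user ID. */
static struct usage *usage_for(uid_t uid)
{
	for (int i = 0; i < n_ids; i++)
		if (usage_table[i].uid == uid)
			return &usage_table[i];
	if (n_ids == MAX_IDS)
		return NULL;
	usage_table[n_ids].uid = uid;
	usage_table[n_ids].blocks = 0;
	usage_table[n_ids].inodes = 0;
	return &usage_table[n_ids++];
}

/* Called by nftw() for every object found under the root. */
static int visit(const char *path, const struct stat *sb, int type,
		 struct FTW *ftw)
{
	(void)path;
	(void)ftw;
	if (type == FTW_NS)	/* stat() failed, nothing to account */
		return 0;
	struct usage *u = usage_for(sb->st_uid);
	if (!u)
		return -1;	/* table full, abort the walk */
	u->blocks += sb->st_blocks;
	u->inodes++;
	return 0;
}

/* Rebuild the per-user usage summary from scratch; returns 0 on success. */
static int scan_usage(const char *root)
{
	assert(root != NULL);
	n_ids = 0;
	/* FTW_PHYS: do not follow symlinks; FTW_MOUNT: stay on one fs. */
	return nftw(root, visit, 64, FTW_PHYS | FTW_MOUNT);
}
```

After scan_usage("/mnt/fusefs") returns 0, usage_table[0..n_ids-1] holds what
would then be written into the quota file as current usage.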

> Regarding the enforcement of quota limits, I plan to perform this in
> the kernel. For project quotas, the kernel can know how much space and
> how many inodes are being used by the corresponding project ID. For
> now, I only want to implement project quota because I believe that
> user and group quotas can be simulated using project quotas.

This is not true. First and foremost, the owner of a file can arbitrarily
change its projid while an unprivileged user cannot set a file's owner. So
there is no way for a user to escape user quota accounting while project
quota accounting is more or less a cooperative space tracking feature (this
is different with user namespaces but your usecase does not sound like it
depends on them). Similarly a file's group can be set only to groups the
user is a member of.
Finally you can have smaller user limits and bigger group limits which
constrain a group of users which is not possible to do just with project
quotas.
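
To illustrate the last point, a small sketch of a charge that has to pass
both the user limit and the group limit. The names (struct my_limit,
my_charge()) are hypothetical and only a single hard limit per ID is
modelled.

```c
#include <assert.h>
#include <stdbool.h>

struct my_limit {
	long long used;
	long long hard;		/* 0 means "no limit" */
};

static bool fits(const struct my_limit *l, long long bytes)
{
	return l->hard == 0 || l->used + bytes <= l->hard;
}

/* Charge only if every applicable limit allows it (else think EDQUOT). */
static bool my_charge(struct my_limit *user, struct my_limit *group,
		      long long bytes)
{
	assert(bytes >= 0);
	if (!fits(user, bytes) || !fits(group, bytes))
		return false;
	user->used += bytes;
	group->used += bytes;
	return true;
}
```

With two users limited to 60 units each and their common group limited to
100, the second user gets refused at 60 units once the first one has used
60, although his own limit would still allow it. A single project ID per
file cannot express such two-level constraint.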

> Additionally, users' definitions of file system users and groups might
> differ from file UID and GID. Users can freely use project IDs to
> define file system users and groups.

Well, if UIDs in the filesystem do not match the current system view of
users, you have a larger problem with permission checking etc. So I'm not
sure I understand your comment here. But anyway if you are convinced project
quotas are the right solution for your usecase then I don't object. From
kernel POV there's no fundamental difference.

								Honza
-- 
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR