Jan Kara <jack@xxxxxxx> wrote on Fri, 7 Jun 2024 at 00:10:
>
> On Thu 06-06-24 11:14:48, JunChao Sun wrote:
> > Jan Kara <jack@xxxxxxx> wrote on Wed, 5 Jun 2024 at 18:29:
> > >
> > > On Tue 04-06-24 21:49:20, JunChao Sun wrote:
> > > > Jan Kara <jack@xxxxxxx> wrote on Tue, 4 Jun 2024 at 17:27:
> > > > > On Tue 04-06-24 14:54:01, JunChao Sun wrote:
> > > > > > Miklos Szeredi <miklos@xxxxxxxxxx> wrote on Tue, 4 Jun 2024 at 14:40:
> > > > > > >
> > > > > > > On Mon, 3 Jun 2024 at 13:37, JunChao Sun <sunjunchao2870@xxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > Given these challenges, I would like to inquire about the community's
> > > > > > > > perspective on implementing quota functionality in the FUSE kernel
> > > > > > > > part. Is it feasible to implement quota functionality in the FUSE
> > > > > > > > kernel module, allowing users to set quotas for FUSE just as they
> > > > > > > > would for ext4 (e.g., using commands like quotaon /mnt/fusefs or
> > > > > > > > setquota /mnt/fusefs)? Would the community consider accepting patches
> > > > > > > > for this feature?
> > > > > > > >
> > > > > > >
> > > > > > > I would say yes, but I have no experience with quota in any way, so
> > > > > > > cannot help with the details.
> > > > > >
> > > > > > Thanks for your reply. I'd like to try to implement this feature.
> > > > >
> > > > > Nice idea! But before you go and spend a lot of time trying to implement
> > > > > something, I suggest that you write down a design of how you imagine all
> > > > > this to work and we can talk about it. Questions like: Do you have
> > > > > particular usecases in mind? Where do you plan to perform the accounting /
> > > > > enforcement? Where do you want to store quota information? How do you
> > > > > want to recover from unclean shutdowns? Etc...
> > > >
> > > > Thanks a lot for your suggestions.
> > > >
> > > > I am reviewing the quota code of ext4 and the fuse code to determine
> > > > if the implementation method used in ext4 can be ported to fuse.
> > > > Based on my current understanding, the key issue is that ext4 reserves
> > > > several inodes for quotas and can manage the disk itself, allowing it
> > > > to directly flush quota data to the disk blocks corresponding to the
> > > > quota inodes within the kernel.
> > >
> > > Yes.
> > >
> > > > However, fuse does not seem to manage the disk itself; it sends all
> > > > read and write requests to user space for completion. Therefore, it
> > > > may not be possible to directly flush the data in the quota inode to
> > > > the disk in fuse.
> > >
> > > Yes, ext4 uses journalling to keep filesystem state consistent with quota
> > > information. Doing this within FUSE would be rather difficult (essentially
> > > you would have to implement a journal within FUSE, which would have rather
> > > high performance overhead).
> > >
> > > But that's why I'm asking for usecases. For some usecases it may be fine
> > > that in case of an unclean shutdown you run the quotacheck program to
> > > update quota information based on current usage - non-journalling
> > > filesystems use this method. So where do you want to use quotas on a FUSE
> > > filesystem?
> >
> > Please allow me to ask a silly question. I'm not sure if I correctly
> > understand what you mean by 'unclean shutdown'. Do you mean an
> > inconsistent state that requires using fsck to repair, like in ext4
> > after a sudden power loss, or is it something else only about quota?
>
> No, I mean cases like sudden power loss or kernel crash or similar. However,
> note that journalling filesystems (such as ext4 or xfs or many others) do
> not require fsck after such an event. The journal allows them to recover
> automatically.

Thanks for your clarification. I understand.

> > In my scenario, FUSE (both the kernel and user space parts) acts
> > merely as a proxy. FUSE sits on top of multiple file systems, and each
> > of a user's files and directories exists in only one of these file
> > systems.
> > It does not even have its own superblock or inode metadata. When a
> > user performs read or write operations on a specific file, FUSE checks
> > the directory corresponding to this file on each file system to see if
> > the user's file is there; if it is not, it continues to check the next
> > file system.
>
> I see. So your usecase is kind of a filesystem unioning solution and you
> want to add quotas on top of that?

Exactly. Also, all files were written by root, and the underlying
filesystem (btrfs) doesn't support project quotas either.

> > > > I am considering whether it would be feasible to implement the quota
> > > > inode in user space in a similar manner. For example, users could
> > > > reserve a few regular files that are invisible to actual file system
> > > > users to store the contents of quota. When updating the quota, the
> > > > user would be notified to flush the quota data to the disk. The
> > > > benefit of this approach is that it can directly reuse the quota
> > > > metadata format from the kernel, so users do not need to design new
> > > > metadata. However, performance might be an issue with this approach.
> > >
> > > Yes, storing quota data in some files inside the filesystem is probably
> > > the easiest way to go. I'd just not bother with flushing because, as you
> > > say, the performance would suck in that case.
> >
> > What about using caching and asynchronous updates? For example, in
> > FUSE, allocate some pages to cache the quota data. When updating quota
> > data, write to the cache first and then place the task in a work
> > queue. The work queue will then send the request to user space to
> > complete the actual disk write operation. When there are read
> > requests, the content is read directly from the cache.
>
> So how quota works for filesystems without journaling is that we keep quota
> information for cached inodes in memory (struct dquot - this is a per ID
> (uid/gid/projid) structure).
> The quota information is written back to the quota file on events like
> sync(2) (which also handles unmount) or when the last inode referencing a
> particular dquot structure is reclaimed from memory. There is no periodic
> background writeback for quota structures.

Thanks a lot for your explanation. Got it. I saw that the f2fs_quota_write()
function in f2fs does exactly what you described; it just writes the data
into the page cache. And the quotactl Q_SYNC command or umount syncs all
quota data to disk. Maybe this method is also appropriate for FUSE.

> > The problem with this approach is that asynchronous updates might lead
> > to loss of quota data in the event of a sudden power failure. This
> > seems acceptable to me, but I am not sure if it aligns with the
> > definition of quota. Additionally, this assumes that the quota file
> > will not be very large, which I believe is a reasonable
> > assumption. Perhaps there are some drawbacks I haven't considered?
>
> Yes, quota files are pretty small (by today's standards) as they scale
> with the number of filesystem users, which generally isn't too big. As you
> observe, quota information will not be up to date in the event of a power
> failure or similar. That is the reason why the administrator (or init
> scripts) is responsible for running quotacheck(8) on filesystems when an
> unclean shutdown is detected. Quotacheck(8) scans the whole filesystem,
> summarizes disk usage for each user, group, etc. and updates the
> information in the quota file.

So the time it takes to run quotacheck is also directly proportional to the
size of the file system, right? The larger the file system, the longer
quotacheck takes to run, regardless of the number of users or groups,
because quotacheck needs to scan the entire file system.

> > Regarding the enforcement of quota limits, I plan to perform this in
> > the kernel.
> > For project quotas, the kernel can know how much space and how many
> > inodes are being used by the corresponding project ID. For now, I only
> > want to implement project quota because I believe that user and group
> > quotas can be simulated using project quotas.
>
> This is not true. First and foremost, the owner of a file can arbitrarily
> change its projid, while an unprivileged user cannot set a file's owner. So
> there is no way for a user to escape user quota accounting, while project
> quota accounting is more or less a cooperative space tracking feature (this
> is different with user namespaces, but your usecase does not sound like it
> depends on them). Similarly, a file's group can be set only to groups the
> user is a member of. Finally, you can have smaller user limits and bigger
> group limits that constrain a whole group of users; this is not possible
> to do with project quotas alone.

Yes, you're right. They work in conjunction; user quotas cannot be replaced
by project quotas.

> > Additionally, users' definitions of file system users and groups might
> > differ from file UIDs and GIDs. Users can freely use project IDs to
> > define file system users and groups.
>
> Well, if UIDs in the filesystem do not match the current system view of
> users, you have a larger problem with permission checking etc. So I'm not
> sure I understand your comment here. But anyway, if you are convinced
> project quotas are the right solution for your usecase then I don't
> object. From the kernel POV there's no fundamental difference.

Yes, the permission checking is done by the upper-layer applications.

Thanks again for your comments! They are really helpful.

>
> Honza
> --
> Jan Kara <jack@xxxxxxxx>
> SUSE Labs, CR

Best regards,
--
Junchao Sun <sunjunchao2870@xxxxxxxxx>
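To make the non-journalling scheme from the thread concrete, here is a minimal C sketch of the idea. In that scheme, per-ID quota records are kept in memory and written back on sync(2)/unmount or when the last inode referencing a record is reclaimed, and there is no periodic writeback. This is not the kernel's dquot code. `struct dq_entry`, `dq_get()`, `dq_charge()`, `dq_put()`, `dq_sync_all()` and `write_quota_record()` are hypothetical names. `write_quota_record()` only counts calls, at the point where a FUSE implementation would presumably send a write request to the user-space server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-ID quota record, loosely modelled on the kernel's
 * struct dquot (this is not the real layout). */
struct dq_entry {
    uint32_t id;         /* uid, gid or project ID */
    int      refcount;   /* cached inodes currently referencing this ID */
    bool     dirty;      /* usage changed since the last writeback */
    int64_t  curspace;   /* bytes in use */
    int64_t  curinodes;  /* inodes in use */
};

#define MAX_IDS 64
static struct dq_entry table[MAX_IDS];   /* toy cache indexed by ID */

/* Hypothetical stand-in for asking the FUSE server to store the record in
 * its quota file; here it only counts the writes. */
static int writes_issued;
static void write_quota_record(const struct dq_entry *dq)
{
    (void)dq;
    writes_issued++;
}

/* An inode owned by `id` enters the inode cache: take a reference. */
static struct dq_entry *dq_get(uint32_t id)
{
    assert(id < MAX_IDS);
    table[id].id = id;
    table[id].refcount++;
    return &table[id];
}

/* Update usage in memory only; no I/O happens on this path. */
static void dq_charge(struct dq_entry *dq, int64_t bytes, int64_t inodes)
{
    dq->curspace += bytes;
    dq->curinodes += inodes;
    dq->dirty = true;
}

/* Write a record back only if it changed since the last writeback. */
static void dq_writeback(struct dq_entry *dq)
{
    if (dq->dirty) {
        write_quota_record(dq);
        dq->dirty = false;
    }
}

/* The inode is reclaimed: dropping the last reference writes the record. */
static void dq_put(struct dq_entry *dq)
{
    assert(dq->refcount > 0);
    if (--dq->refcount == 0)
        dq_writeback(dq);
}

/* sync(2) / unmount path: write back every dirty record. */
static void dq_sync_all(void)
{
    for (int i = 0; i < MAX_IDS; i++)
        dq_writeback(&table[i]);
}
```

For example, two cached inodes owned by the same ID share one record. Charging space only marks it dirty, and exactly one write is issued when the second reference is dropped. A later `dq_sync_all()` then writes only the records that are still dirty. Anything charged after the last writeback is lost on a crash, which is why quotacheck(8) is needed after an unclean shutdown.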
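The quotacheck(8) step can be sketched in the same spirit as a tree walk that charges every file and directory to its owner. This is a simplified illustration, not quotacheck's actual implementation. It uses POSIX `nftw()`, summarizes per UID only, and assumes Linux's 512-byte `st_blocks` unit. Unlike a real quotacheck, it also charges a hard-linked file once per path. The names `account()`, `usage_for()` and `quotacheck_tree()` are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

#define MAX_USERS 256

struct usage {
    uid_t   uid;
    int64_t bytes;    /* st_blocks * 512 (Linux block units) */
    int64_t inodes;   /* files, directories, symlinks, ... */
};

/* nftw() callbacks take no user pointer, so results live in globals. */
static struct usage usage_tab[MAX_USERS];
static int nusers;

/* Find (or create) the usage record for one owner. */
static struct usage *usage_for(uid_t uid)
{
    for (int i = 0; i < nusers; i++)
        if (usage_tab[i].uid == uid)
            return &usage_tab[i];
    assert(nusers < MAX_USERS);
    usage_tab[nusers].uid = uid;
    return &usage_tab[nusers++];
}

/* Called once per path in the tree; charges the object to its owner. */
static int account(const char *path, const struct stat *st, int type,
                   struct FTW *ftw)
{
    (void)path;
    (void)ftw;
    if (type == FTW_NS)          /* stat failed: nothing to charge */
        return 0;
    struct usage *u = usage_for(st->st_uid);
    u->bytes += (int64_t)st->st_blocks * 512;
    u->inodes += 1;
    return 0;                    /* keep walking */
}

/* Rebuild usage from scratch by visiting every object under `root`
 * (FTW_PHYS: do not follow symlinks). Returns nftw()'s result. */
static int quotacheck_tree(const char *root)
{
    memset(usage_tab, 0, sizeof usage_tab);
    nusers = 0;
    return nftw(root, account, 16, FTW_PHYS);
}
```

The walk does work for every file and directory in the tree and only a small lookup per owner. Its cost therefore grows with the number of files and directories rather than with the number of users or groups. This matches the intuition that quotacheck time depends on file system size.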
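User, group and project quotas complement rather than replace each other. One way to illustrate this is to charge each allocation against every applicable quota at once. The following is a minimal sketch that assumes hard limits only, with soft limits and grace periods omitted. It uses hypothetical names (`struct qlimit`, `charge_space()`, the `QT_*` constants) and is not the kernel's quota API. The charge is all-or-nothing and fails with `EDQUOT` if any configured limit would be exceeded.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quota types; one allocation is charged to all of them. */
enum { QT_USER, QT_GROUP, QT_PROJECT, NR_QTYPES };

struct qlimit {
    int64_t used;        /* bytes currently charged to this ID */
    int64_t hard_limit;  /* 0 means "no limit" */
};

/* Charge `bytes` to every configured quota of one inode (a NULL entry means
 * no quota of that type). All-or-nothing: if any hard limit would be
 * exceeded, nothing is charged and -EDQUOT is returned, mirroring the
 * EDQUOT error a write(2) gets on Linux. */
static int charge_space(struct qlimit *q[NR_QTYPES], int64_t bytes)
{
    assert(bytes >= 0);
    for (int t = 0; t < NR_QTYPES; t++)
        if (q[t] && q[t]->hard_limit &&
            q[t]->used + bytes > q[t]->hard_limit)
            return -EDQUOT;
    for (int t = 0; t < NR_QTYPES; t++)
        if (q[t])
            q[t]->used += bytes;
    return 0;
}
```

Consider two users with 6000-byte user limits who share a group with a 10000-byte limit. Once the group total reaches 10000 bytes, the second user's next allocation is refused even though that user is still under their own limit. Expressing a combined limit like this needs a group quota; per-user project IDs alone cannot do it.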