On Sun, Dec 15, 2024 at 06:31:35AM +0100, Christoph Hellwig wrote:
> On Fri, Dec 13, 2024 at 01:01:40PM -0800, Darrick J. Wong wrote:
> > > +#define XFS_ZR_GREEDY		(1U << 0)
> > > +#define XFS_ZR_NOWAIT		(1U << 1)
> > > +#define XFS_ZR_RESERVED	(1U << 2)
> >
> > What do these flag values mean?  Can we put that into comments?
>
> Sure.
>
> > > + * For XC_FREE_RTAVAILABLE only the smaller reservation required for GC and
> > > + * block zeroing is excluded from the user capacity, while XC_FREE_RTEXTENTS
> > > + * is further restricted by at least one zone as well as the optional
> > > + * persistently reserved blocks.  This allows the allocator to run more
> > > + * smoothly by not always triggering GC.
> >
> > Hmm, so _RTAVAILABLE really means _RTNOGC?  That makes sense.
>
> Yes, it means block available without doing further work.
> I can't say _RTNOGC is very descriptive either, but I would not mind
> a better name if someone came up with a good one :)

Hrmm, they're rt extents that are available "now", or "for cheap"...

XC_FREE_NOW_RTEXTENTS
XC_FREE_RTEXTENTS_IMMED
XC_FREE_RTEXTENTS_CHEAP

Eh, I'm not enthusiastic about any of those.  The best I can think of is:

	XC_FREE_RTEXTENTS_NOGC,	/* space available without gc */

> > > +		spin_unlock(&zi->zi_reservation_lock);
> > > +		schedule();
> > > +		spin_lock(&zi->zi_reservation_lock);
> > > +	}
> > > +	list_del(&reservation.entry);
> > > +	spin_unlock(&zi->zi_reservation_lock);
> >
> > Hmm.  So if I'm understanding correctly, threads wanting to write to a
> > file try to locklessly reserve space from RTAVAILABLE.
>
> At least if there are no waiters yet, yes.
>
> > If they can't
> > get space because the zone is nearly full / needs gc / etc then everyone
> > gets to wait FIFO style in the reclaim_reservations list.
>
> Yes (In a way modelled after the log grant waits).
>
> > They can be
> > woken up from the wait if either (a) someone gives back reserved space
> > or (b) the copygc empties out this zone.
> >
> > Or if the thread isn't willing to wait, we skip the fifo and either fail
> > up to userspace
>
> Yes.
>
> > or just move on to the next zone?
>
> No other zone to move to.

<nod>

> > I think I understand the general idea, but I don't quite know when we're
> > going to use the greedy algorithm.  Later I see XFS_ZR_GREEDY gets used
> > from the buffered write path, but there doesn't seem to be an obvious
> > reason why?
>
> Posix/Linux semantics for buffered writes require us to implement
> short writes.  That is if a single (p)write(v) syscall for say 10MB
> only find 512k of space it should write those instead of failing
> with ENOSPC.  The XFS_ZR_GREEDY implements that by backing down to
> what we can allocate (and the current implementation for that is
> a little ugly, I plan to find some time for changes to the core
> percpu_counters to improve this after the code is merged).

Ah, ok.  Can you put that in the comments defining XFS_ZR_GREEDY?

--D
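
For illustration only, the short-write back-off described above could be
modelled in userspace C roughly as follows.  This is a toy sketch, not the
patch's code: struct toy_pool and toy_reserve() are hypothetical names, a
plain integer stands in for the percpu counter, and there is no locking or
waiting.  The comments on XFS_ZR_GREEDY and XFS_ZR_NOWAIT paraphrase this
thread; XFS_ZR_RESERVED is not explained here, so its comment says so.

```c
#include <errno.h>
#include <stdint.h>

#define XFS_ZR_GREEDY	(1U << 0)	/* accept fewer blocks than requested (short write) */
#define XFS_ZR_NOWAIT	(1U << 1)	/* fail instead of waiting for space (unused in this toy) */
#define XFS_ZR_RESERVED	(1U << 2)	/* meaning not discussed in this thread */

/* Hypothetical stand-in for the free space counter (not a percpu counter). */
struct toy_pool {
	uint64_t	avail;		/* blocks available without further work */
};

/*
 * Try to reserve @want blocks from @p.  On success return 0 and store the
 * number of blocks actually reserved in @got.  Without XFS_ZR_GREEDY the
 * request is all or nothing; with it, the request backs down to whatever
 * is left.  Return -ENOSPC if nothing could be reserved.
 */
static int
toy_reserve(
	struct toy_pool	*p,
	uint64_t	want,
	unsigned int	flags,
	uint64_t	*got)
{
	if (want <= p->avail) {
		/* Enough space for the full request. */
		p->avail -= want;
		*got = want;
		return 0;
	}

	if ((flags & XFS_ZR_GREEDY) && p->avail > 0) {
		/* Back down to what is available: caller does a short write. */
		*got = p->avail;
		p->avail = 0;
		return 0;
	}

	*got = 0;
	return -ENOSPC;
}
```

Assuming 4k blocks, the 10MB write from the example above asks for 2560
blocks; with only 512k (128 blocks) left, a call passing XFS_ZR_GREEDY
would return 0 with *got == 128, while the same call without the flag
would return -ENOSPC and leave the pool untouched.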