On Fri, Dec 13, 2024 at 01:01:40PM -0800, Darrick J. Wong wrote:
> > +#define XFS_ZR_GREEDY		(1U << 0)
> > +#define XFS_ZR_NOWAIT		(1U << 1)
> > +#define XFS_ZR_RESERVED	(1U << 2)
>
> What do these flag values mean?  Can we put that into comments?

Sure.

> > + * For XC_FREE_RTAVAILABLE only the smaller reservation required for GC and
> > + * block zeroing is excluded from the user capacity, while XC_FREE_RTEXTENTS
> > + * is further restricted by at least one zone as well as the optional
> > + * persistently reserved blocks.  This allows the allocator to run more
> > + * smoothly by not always triggering GC.
>
> Hmm, so _RTAVAILABLE really means _RTNOGC?  That makes sense.

Yes, it means blocks available without doing further work.
I can't say _RTNOGC is very descriptive either, but I would not mind
a better name if someone came up with a good one :)

> > +		spin_unlock(&zi->zi_reservation_lock);
> > +		schedule();
> > +		spin_lock(&zi->zi_reservation_lock);
> > +	}
> > +	list_del(&reservation.entry);
> > +	spin_unlock(&zi->zi_reservation_lock);
>
> Hmm.  So if I'm understanding correctly, threads wanting to write to a
> file try to locklessly reserve space from RTAVAILABLE.

At least if there are no waiters yet, yes.

> If they can't
> get space because the zone is nearly full / needs gc / etc then everyone
> gets to wait FIFO style in the reclaim_reservations list.

Yes (in a way modelled after the log grant waits).

> They can be
> woken up from the wait if either (a) someone gives back reserved space
> or (b) the copygc empties out this zone.
>
> Or if the thread isn't willing to wait, we skip the fifo and either fail
> up to userspace

Yes.

> or just move on to the next zone?

No other zone to move to.

> I think I understand the general idea, but I don't quite know when we're
> going to use the greedy algorithm.  Later I see XFS_ZR_GREEDY gets used
> from the buffered write path, but there doesn't seem to be an obvious
> reason why?

POSIX/Linux semantics for buffered writes require us to implement short
writes.  That is, if a single (p)write(v) syscall for, say, 10MB only
finds 512k of space, it should write that much instead of failing with
ENOSPC.  XFS_ZR_GREEDY implements that by backing down to what we can
allocate (the current implementation of that is a little ugly; I plan
to find some time for changes to the core percpu_counters to improve
this after the code is merged).
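
To make the "backing down" idea concrete, here is a minimal userspace
sketch.  It is not the code from the patch: it assumes a single C11
atomic counter (free_blocks) in place of the kernel's percpu counters,
and reserve_blocks() and its greedy argument are hypothetical names
used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the free space counter (not a percpu counter). */
static _Atomic int64_t free_blocks;

/*
 * Try to reserve "want" blocks.  Without greedy, the request is all or
 * nothing.  With greedy, back down to whatever is still available so
 * that the caller can do a short write instead of failing with ENOSPC.
 *
 * Returns the number of blocks actually reserved, or 0 if none.
 */
static int64_t reserve_blocks(int64_t want, int greedy)
{
	int64_t avail = atomic_load(&free_blocks);

	assert(want > 0);

	while (avail > 0) {
		int64_t take = want;

		if (take > avail) {
			if (!greedy)
				return 0;
			take = avail;	/* back down to what is left */
		}

		/* On failure the CAS reloads avail and we retry. */
		if (atomic_compare_exchange_weak(&free_blocks, &avail,
						 avail - take))
			return take;
	}

	return 0;
}
```

With 128 blocks free (512k at an assumed 4k block size), a request for
2560 blocks (10MB) returns 0 without the greedy flag and 128 with it;
in the greedy case the caller would then write only those 128 blocks
and report a short write.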