On Thu, Oct 18, 2018 at 06:44:54PM +0300, Avi Kivity wrote:
>
> On 18/10/2018 14.00, Avi Kivity wrote:
> >
> >This can happen, and indeed I see our default hint is 1MB, so our
> >small files use a 1MB hint. Looks like we should remove that 1MB
> >hint since it's reducing allocation flexibility for XFS without a
> >good return.
>
> I convinced myself that this is the root cause, it fits perfectly
> with your explanation. I still think that XFS should allocate
> *something* rather than ENOSPC, but I can also understand someone
> wanting a guarantee.

Yup, it's a classic catch 22.

> >On the other hand, I worry that because we bypass the page cache,
> >XFS doesn't get to see the entire file at one time and so it will
> >get fragmented.
>
> That's what happens. I write 1000 4k writes to 400 files, in
> parallel, AIO+DIO. I got 400 perfectly-fragmented files, each had
> 1000 extents.

Yup, you wrote them all in the one directory, didn't you? :)

> So I'll remove the default hint for small files, and replace it with
> larger buffer sizes so we batch more and don't get 8k-sized extents
> (which is our default buffer size).

Or you could just mount with the "noalign" mount option to turn off
stripe alignment. After all, you don't need stripe alignment for a
single spindle....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx