On Sun, Oct 21, 2018 at 12:00:16PM +0300, Avi Kivity wrote:
>
> On 19/10/2018 04.24, Dave Chinner wrote:
> >On Thu, Oct 18, 2018 at 06:44:54PM +0300, Avi Kivity wrote:
> >>On 18/10/2018 14.00, Avi Kivity wrote:
> >>>
> >>>This can happen, and indeed I see our default hint is 1MB, so our
> >>>small files use a 1MB hint. Looks like we should remove that 1MB
> >>>hint since it's reducing allocation flexibility for XFS without a
> >>>good return.
> >>
> >>I convinced myself that this is the root cause, it fits perfectly
> >>with your explanation. I still think that XFS should allocate
> >>*something* rather than ENOSPC, but I can also understand someone
> >>wanting a guarantee.
> >Yup, it's a classic catch 22.
> >
> >>>On the other hand, I worry that because we bypass the page cache,
> >>>XFS doesn't get to see the entire file at one time and so it will
> >>>get fragmented.
> >>
> >>That's what happens. I write 1000 4k writes to 400 files, in
> >>parallel, AIO+DIO. I got 400 perfectly-fragmented files, each had
> >>1000 extents.
> >Yup, you wrote them all in the one directory, didn't you? :)
>
>
> Yes :(
>
> But if I have more concurrently-written files than AGs, I'd get the
> same behavior with multiple directories, no?

Up to a point. At which point, I'd say you're doing it wrong and
tell you to use extent size hints or buffered IO so the filesystem
can turn the small random writes into nicely formed large IOs via
delayed allocation. :)

Remember the first rule of storage: Garbage In, Garbage Out.

With direct IO, it's the responsibility of the application to give
the filesystem and storage layers well formed IOs. If the app
doesn't play nice, there's nothing the filesystem or storage layers
can do to make it better....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
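
As a rough illustration of what "well formed IOs" could mean on the
application side, here is a minimal sketch that gathers small
sequential appends in memory and issues them as 1 MiB writes. The
class name WriteCoalescer, the 1 MiB chunk size and the demo()
helper are assumptions made for this example and do not come from
the thread. It uses an ordinary buffered file descriptor; real
direct IO would also need O_DIRECT and suitably aligned buffers,
which are left out here.

```python
import os
import tempfile


class WriteCoalescer:
    """Hypothetical helper: batch small sequential appends into large writes."""

    def __init__(self, fd, chunk_size=1 << 20):
        self.fd = fd
        self.chunk_size = chunk_size  # assumed target IO size (1 MiB)
        self.buf = bytearray()
        self.io_count = 0  # number of large chunks actually issued

    def write(self, data):
        self.buf += data
        # Issue whole chunks; keep any tail for the next call.
        while len(self.buf) >= self.chunk_size:
            self._issue(self.buf[:self.chunk_size])
            del self.buf[:self.chunk_size]

    def flush(self):
        # Issue whatever is left, even if it is smaller than a chunk.
        if self.buf:
            self._issue(self.buf)
            self.buf = bytearray()

    def _issue(self, chunk):
        # os.write may write fewer bytes than requested; loop until done.
        view = memoryview(chunk)
        while view:
            n = os.write(self.fd, view)
            view = view[n:]
        self.io_count += 1


def demo(n_writes=1000, write_size=4096):
    """Feed n_writes small writes through the coalescer into a temp file."""
    with tempfile.TemporaryFile() as f:
        w = WriteCoalescer(f.fileno())
        for _ in range(n_writes):
            w.write(b"\0" * write_size)
        w.flush()
        return w.io_count, os.fstat(f.fileno()).st_size
```

With the defaults, demo() pushes the 1000 4k writes from the example
above through the coalescer and returns (4, 4096000): three full
1 MiB chunks plus one final partial chunk. Whether that is enough to
avoid fragmentation on a given filesystem is a separate question;
the sketch only shows the batching.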