On Tue, Feb 06, 2018 at 04:10:12PM +0200, Avi Kivity wrote:
>
> On 01/29/2018 11:56 PM, Dave Chinner wrote:
> > On Mon, Jan 29, 2018 at 01:44:14PM +0200, Avi Kivity wrote:
> > > > There's many reasons this can happen, but the most common is the
> > > > working files in a directory (or subset of directories in the same
> > > > AG) have a combined space usage of larger than an AG ....
> > > That's certainly possible, even likely (one huge directory with all of the
> > > files).
> > >
> > > This layout is imposed on us by the compatibility gods. Is there a way to
> > > tell XFS to change its policy of on-ag-per-directory?
> > mount with inode32. That rotors files around all AGs in a round
> > robin fashion instead of trying to keep directory locality for a
> > working set. i.e. it distributes the files evenly across the
> > filesystem.
>
> http://xfs.org/docs/xfsdocs-xml-dev/XFS_User_Guide/tmp/en-US/html/ch06s09.html
> says:
>
> "When 32 bit inode numbers are used on a volume larger than 1TB in size,
> several changes occur.
>
> A 100TB volume using 256 byte inodes mounted in the default inode32 mode has
> just one percent of its space available for allocating inodes.
>
> XFS will reserve the first 1TB of disk space exclusively for inodes to
> ensure that the imbalance is no worse than this due to file data
> allocations."

s/exclusively//

> Does this mean that a 1.1TB disk has 1TB reserved for inodes and 0.1TB left
> over for data?

No, that would be silly.

> Or is it driven by the "one percent" which is mentioned
> above, so it would be 0.011TB?

No, you're inferring behavioural rules that don't exist from a simple
example.

Maximum number of inodes is controlled by min(imaxpct, free space).
For inode32, "free space" is what's in the first 32 bits of the inode
address space. For inode64, it's global free space.

To enable this, inode32 sets the AGs wholly within the first 32 bits
of the inode address space to be "metadata preferred" and "inode
capable".
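
As a rough illustration of what "the first 32 bits of the inode
address space" amounts to in bytes, the sketch below multiplies the
2^32 possible 32 bit inode numbers by the on-disk inode size. It is a
simplification that assumes inodes could be packed back to back from
the start of the device; the helper name is made up for this example
and is not an XFS interface.

```python
# Back-of-the-envelope sketch, not XFS code: how much disk the
# 32 bit inode number range spans for a given inode size.

INODE32_NUMBERS = 2 ** 32  # count of distinct 32 bit inode numbers
TIB = 2 ** 40              # bytes in one TiB


def inode32_span_bytes(inode_size: int) -> int:
    """Bytes covered by 32 bit inode numbers (hypothetical helper)."""
    return INODE32_NUMBERS * inode_size


if __name__ == "__main__":
    for size in (256, 512, 1024, 2048):
        span_tib = inode32_span_bytes(size) // TIB
        print(f"{size:5d} byte inodes -> {span_tib} TiB")
```

Under that assumption, larger inodes push the boundary of the
inode32 region further into the device.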

Important things to note:

	- "first 32 bits of inode address space" means the range of
	  space that inode32 reserves for inodes changes according
	  to inode size. 256 byte inodes = 1TB, 2kB inodes = 8TB. If
	  the filesystem is smaller than this threshold, then it
	  will silently use the inode64 allocation policy until the
	  filesystem is grown beyond the 32 bit inode address space
	  size.

	- "inode capable" means inodes can be allocated in the AG

	- "metadata preferred" means user data will not get
	  allocated in this AG unless all non-preferred AGs are full.

So, assuming 256 byte inodes, your 1.1TB fs will have an imaxpct of
~25%, allowing a maximum of 256GB of inodes or about a billion
inodes. But once you put more than 0.1TB of data into the
filesystem, data will start filling up the inode capable AGs as
well, and then your limit for inodes looks just like inode64 (i.e.
dependent on free space).

IOWs, inode32 limits where and how many inodes you can create, not
how much user data you can write into the filesystem.

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
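
The 1.1TB example above can be worked through numerically. The
following is only a sketch using the figures from the message (256
byte inodes, imaxpct of about 25%). It reads min(imaxpct, free space)
as a cap in bytes and applies the percentage to the whole filesystem
size, so the result lands slightly above the rounded 256GB quoted in
the message. That reading, and every function and parameter name
here, are assumptions for illustration rather than XFS internals.

```python
# Illustrative arithmetic only; none of these names exist in XFS.

TIB = 2 ** 40
GIB = 2 ** 30


def max_inodes(fs_bytes: int, imaxpct: int, inode_size: int,
               inode_free_bytes: int) -> int:
    """Inode count allowed by min(imaxpct cap, free space).

    inode_free_bytes is the free space inodes may still use: for
    inode32 that is free space in the inode capable AGs, for
    inode64 it is free space anywhere in the filesystem.
    """
    imaxpct_cap = fs_bytes * imaxpct // 100
    return min(imaxpct_cap, inode_free_bytes) // inode_size


if __name__ == "__main__":
    fs = 11 * TIB // 10  # the 1.1TB filesystem from the question

    # Nearly empty filesystem: the imaxpct cap is the limit.
    print(max_inodes(fs, 25, 256, inode_free_bytes=1 * TIB))

    # Data has spilled into the inode capable AGs and only 64GiB
    # is left there: free space is now the limit.
    print(max_inodes(fs, 25, 256, inode_free_bytes=64 * GIB))
```

The second call shows the point made at the end of the message: once
data fills the inode capable AGs, the inode limit tracks free space
rather than imaxpct.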