Hi Dave!

>> In the standards space, the allocation concept was mainly aimed at
>> protecting filesystem internals against out-of-space conditions on
>> devices that dedup identical blocks and where simply zeroing the blocks
>> therefore is ineffective.

> Um, so we're supposed to use space allocation before overwriting
> existing metadata in the filesystem?

Not before overwriting, no. Once you have allocated an LBA it remains
allocated until you discard it.

> So that the underlying storage can reserve space for it before we
> write it? Which would mean we have to issue a space allocation before
> we dirty the metadata, which means before we dirty any metadata in a
> transaction. Which means we'll basically have to redesign the
> filesystems from the ground up, yes?

My understanding is that this facility was aimed at filesystems that do
not dynamically allocate metadata. The intent was that mkfs would
preallocate the metadata LBA ranges, not the filesystem. For filesystems
that allocate metadata dynamically, then yes, an additional step is
required if you want to pin the LBAs.

> You might be talking about filesystem metadata and block devices,
> but this patchset ends up connecting ext4's user data fallocate() to
> the block device, thereby allowing users to reserve space directly
> in the underlying block device and directly exposing this issue to
> userspace.

I missed that Chaitanya's repost of this series included the ext4 patch.
Sorry!

>> How XFS decides to enforce space allocation policy and potentially
>> leverage this plumbing is entirely up to you.
>
> Do I understand this correctly? i.e. that it is the filesystem's
> responsibility to prevent users from preallocating more space than
> exists in an underlying storage pool that has been intentionally
> hidden from the filesystem so it can be underprovisioned?

No. But as an administrative policy it is useful to prevent runaway
applications from writing a petabyte of random garbage to media.
My point was that it is up to you and the other filesystem developers to
decide how you want to leverage the low-level allocation capability and
how you want to provide it to processes. And whether CAP_SYS_ADMIN,
ulimit, or something else is the appropriate policy interface for this.

In terms of thin provisioning and space management there are various
thresholds that may be reported by the device. In past discussions there
hasn't been much interest in getting these exposed. It is also unclear
to me whether it is actually beneficial to send low space warnings to
hundreds or thousands of hosts attached to an array. In many cases the
individual server admins are not even the right audience. The most
common notification mechanism is a message to the storage array admin
saying "click here to buy more disk".

If you feel there is merit in having the kernel emit the threshold
warnings you could use as a feedback mechanism, I can absolutely look
into that.

-- 
Martin K. Petersen	Oracle Linux Engineering