On Thu, Jan 20, 2011 at 08:45:03AM -0600, Geoffrey Wehrman wrote:
> On Thu, Jan 20, 2011 at 12:33:46PM +1100, Dave Chinner wrote:
> | Given that XFS is aimed towards optimising for the large file/large
> | IO/high throughput type of application, I'm comfortable with saying
> | that avoiding sub-page writes for optimal throughput IO is an
> | application problem and going from there. Especially considering
> | that stuff like rsync and untarring kernel tarballs are all
> | appending writes so won't take any performance hit at all...
>
> I agree.  I do not expect systems with 64K pages to be used for single
> bit manipulations.  However, I do see a couple of potential problems.
>
> The one place where page size I/O may not work though is for a DMAPI/HSM
> (DMF) managed filesystem where some blocks are managed on near-line media.
> The DMAPI needs to be able to remove and restore extents on a filesystem
> block boundary, not a page boundary.

DMAPI is still free to remove extents at whatever boundary it wants.
The only difference is that it would be asked to restore extents to the
page boundary the write covers rather than a block boundary.  The
allocation and direct IO boundaries do not change, so the only thing
that needs to change is the range that DMAPI is told the read/write is
going to cover....

> The other downside is that for sparse files, we could end up allocating
> space for zero filled blocks.  There may be some workloads where
> significant quantities of space are wasted.

Yes, that is possible, though on the other hand it will reduce worst
case fragmentation caused by pathological sparse file filling
applications, e.g. out-of-core solvers that do strided writes across
the file to write the first column of the result matrix as it is
calculated, then the second column, then the third ... until all
columns are written.

----

Realistically, for every disadvantage or advantage we can enumerate for
specific workloads, I think one of us will be able to come up with a
counter example that shows the opposite of the original point.  I don't
think this sort of argument is particularly productive. :/

Instead, I look at it from the point of view that a 64k IO is barely
slower than a 4k IO, so such a change would not make much difference to
performance.  And given that terabytes of storage capacity are _cheap_
these days (and getting cheaper all the time), the extra space used by
allocating 64k instead of 4k for sparse blocks isn't a big deal.  When
I combine that with my experience from SGI, where we always recommended
using filesystem block size == page size for best IO performance on HPC
setups, there's a fair argument that using page size extents for small
sparse writes isn't a problem we really need to care about.

I'd prefer to design for where we expect storage to be in the next few
years, e.g. 10TB spindles.  Minimising space usage is not a big
priority when we consider that in 2-3 years 100TB of storage will cost
less than $5000 (it's about $15-20k right now).  Even on desktops we're
going to have more capacity than we know what to do with, so trading
off storage space for lower memory overhead, lower metadata IO overhead
and lower potential fragmentation seems like the right way to move
forward to me.

Does that seem like a reasonable position to take, or are there other
factors that you think I should be considering?

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
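
To illustrate the DMAPI boundary change discussed above, here is a
minimal userspace sketch of the rounding involved.  It is not actual
XFS or DMAPI code; the macro names, the write offsets and the 64k/4k
sizes are all made up for this example:

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define PAGE_SIZE       65536ULL        /* e.g. 64k pages */
#define FS_BLOCKSIZE    4096ULL         /* e.g. 4k filesystem blocks */

/* the usual power-of-two rounding helpers */
#define round_down(x, y)        ((x) & ~((y) - 1))
#define round_up(x, y)          round_down((x) + (y) - 1, (y))

int main(void)
{
        uint64_t write_off = 130000, write_len = 100;   /* a small write */

        /* current behaviour: HSM asked to restore to block boundaries */
        printf("block-aligned restore range: [%" PRIu64 ", %" PRIu64 ")\n",
               (uint64_t)round_down(write_off, FS_BLOCKSIZE),
               (uint64_t)round_up(write_off + write_len, FS_BLOCKSIZE));

        /* proposed behaviour: restore the page range the write covers */
        printf("page-aligned restore range:  [%" PRIu64 ", %" PRIu64 ")\n",
               (uint64_t)round_down(write_off, PAGE_SIZE),
               (uint64_t)round_up(write_off + write_len, PAGE_SIZE));

        return 0;
}

The removal side is untouched; only the range reported to DMAPI for the
restore gets wider.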
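
Similarly, a toy version of the out-of-core solver write pattern
mentioned above (the file name, matrix size and write_column() helper
are invented for illustration; a real solver is obviously more
involved):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define N       4096            /* N x N row-major matrix of doubles */

/* write one column: N small writes, each one row length (N * 8 bytes) apart */
static void write_column(int fd, int col, const double *column)
{
        int row;

        for (row = 0; row < N; row++) {
                off_t off = ((off_t)row * N + col) * (off_t)sizeof(double);

                if (pwrite(fd, &column[row], sizeof(double), off) < 0) {
                        perror("pwrite");
                        exit(1);
                }
        }
}

int main(void)
{
        static double column[N];        /* results for the current column */
        int fd = open("result.matrix", O_CREAT | O_TRUNC | O_WRONLY, 0644);
        int col;

        if (fd < 0) {
                perror("open");
                exit(1);
        }

        /* columns are produced and written in order: 0, 1, 2, ... */
        for (col = 0; col < N; col++)
                write_column(fd, col, column);

        close(fd);
        return 0;
}

With 4k filesystem blocks, each of those small strided writes can
trigger a separate allocation far away from the previous one, so the
worst case is a badly fragmented file; with 64k (page sized)
allocations far fewer, larger extents are needed, which is the
fragmentation reduction being traded against the extra space consumed
by zero-filled blocks.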