Theodore Tso wrote:
On Fri, Nov 07, 2008 at 09:26:49AM -0500, Ric Wheeler wrote:
One more consideration that I should have mentioned is that we can also
make our file system allocation policies "thin provisioned LUN" friendly.
Basically, we need to try to re-allocate previously freed blocks instead
of letting the allocations happily progress across the entire block
range. This might be the inverse of an SSD-friendly allocation policy,
but it would seem to be fairly trivial to implement :-)
I would think that most non-log-structured filesystems do this by
default.
I am not sure; it would be interesting to use blktrace to build a
visual map of how we allocate and free blocks as a file system ages.
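A very rough version of that visual map can be sketched in a few lines. This assumes the starting sector of each write request has already been pulled out of blkparse output (the parsing is not shown, since the text format depends on the options used); `usage_map` and its parameters are hypothetical helpers, not part of blktrace. Printing one line per snapshot as the file system ages would show whether writes creep across the whole device or stay in a compact region.

```python
def usage_map(write_sectors, device_sectors, width=64):
    """Render a one-line map of a device: '#' for regions that saw writes.

    write_sectors  -- starting sector numbers of write requests (assumed to be
                      extracted from blkparse output already)
    device_sectors -- total sectors on the device
    width          -- number of columns; each covers an equal slice of the device

    Only the starting sector of each request is marked; request length is ignored.
    """
    sectors_per_col = -(-device_sectors // width)  # ceiling division
    hit = [False] * width
    for sector in write_sectors:
        hit[sector // sectors_per_col] = True
    return "".join("#" if h else "." for h in hit)


print(usage_map([0, 1, 500, 999], device_sectors=1000, width=10))  # #....#...#
```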
The one thing we might need for SSD-friendly allocation policies is to
tell the allocators to not try so hard to make sure allocations are
contiguous, but there are other reasons why you want contiguous
extents anyway (such as reducing the size of your extent tree and
reducing the number of block allocation data structures that need to
be updated). And I think SSDs do care about contiguous extents to some
degree, if only to reduce scatter-gather operations, right?
- Ted
I think that contiguous allocations are still important (especially
since the big arrays really like to have contiguous, large chunks of
space freed up at once so their unmap/TRIM support works better :-)). For
SSDs, streaming writes are still faster than scattered small-block
writes, so I think contiguous allocation would help them as well.
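To make the extent point concrete: in an extent-based file system, each physically contiguous run of a file's blocks can be described by a single (start, length) record, so scattered allocation directly inflates the extent metadata. The hypothetical helper below just counts those runs; it does not model any particular on-disk extent format.

```python
def to_extents(blocks):
    """Collapse a file's physical block numbers (in file order) into extents.

    Returns a list of (start, length) pairs; a new extent begins whenever
    the next block is not physically adjacent to the previous one.
    """
    extents = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            start, length = extents[-1]
            extents[-1] = (start, length + 1)  # extend the current run
        else:
            extents.append((b, 1))  # start a new run
    return extents


print(to_extents([100, 101, 102, 103]))  # [(100, 4)]
print(to_extents([100, 7, 55, 56]))      # [(100, 1), (7, 1), (55, 2)]
```

The same four blocks cost one record when allocated contiguously and three when scattered.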
The type of allocation that would help most is one that tries to keep
the lower block ranges "hot" for allocation; the second-best policy
would simply keep the allocated blocks in each block group hot and
re-allocate them.
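The idea of re-allocating freed blocks instead of sweeping across the whole device, and the two policies ranked above, can be compared in a toy model. It assumes the array never sees unmap/TRIM, so the cost to the thin pool is simply the number of distinct blocks ever written. `ToyAllocator`, the "rotor" (next-fit) baseline, and the write-then-delete workload are hypothetical illustrations, not code from any real file system.

```python
# Toy model: blocks a thin LUN must back if it never receives unmap/TRIM.
# Every block ever written stays provisioned. Policies are hypothetical.


class ToyAllocator:
    POLICIES = ("rotor", "lowest", "group")

    def __init__(self, policy, nblocks, group_size):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy {policy!r}")
        self.policy = policy
        self.free = [True] * nblocks
        self.group_size = group_size
        self.cursor = 0  # next-fit position, only consulted by "rotor"

    def _search_order(self, goal_group):
        n = len(self.free)
        if self.policy == "rotor":
            # Keep moving forward from the last allocation, wrapping at the end.
            return [(self.cursor + i) % n for i in range(n)]
        if self.policy == "lowest":
            # Keep the low block range hot.
            return list(range(n))
        # "group": goal group first, then later groups (wrapping), each low-to-high.
        ngroups = -(-n // self.group_size)  # ceiling division
        order = []
        for step in range(ngroups):
            g = (goal_group + step) % ngroups
            order.extend(range(g * self.group_size,
                               min((g + 1) * self.group_size, n)))
        return order

    def alloc(self, goal_group=0):
        for b in self._search_order(goal_group):
            if self.free[b]:
                self.free[b] = False
                self.cursor = (b + 1) % len(self.free)
                return b
        raise MemoryError("file system full")

    def release(self, block):
        self.free[block] = True


def provisioned_footprint(policy, nblocks=1024, group_size=128,
                          rounds=40, batch=16):
    """Distinct blocks written over `rounds` write-then-delete cycles."""
    alloc = ToyAllocator(policy, nblocks, group_size)
    ngroups = nblocks // group_size
    touched = set()
    for r in range(rounds):
        goal = r % ngroups  # successive files prefer different groups
        blocks = [alloc.alloc(goal) for _ in range(batch)]
        touched.update(blocks)
        for b in blocks:
            alloc.release(b)
    return len(touched)


for policy in ToyAllocator.POLICIES:
    print(policy, provisioned_footprint(policy))
```

With the defaults this prints footprints of 640, 16, and 128 blocks for rotor, lowest, and group. The rotor footprint keeps growing with every cycle until it covers the whole device, while the other two stay bounded (by the batch size, or by the batch size times the number of groups touched).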
One other interesting feature is that thin LUNs have a high-water
mark, which can be used to send an out-of-band notification (i.e., to
some user-space app) when you hit a specified percentage of your
physically allocated blocks. The key is to set this so that a human has
time to react by expanding the physical pool (throwing in another
disk).
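The high-water-mark behaviour can be modelled as a one-shot alarm that re-arms once usage drops back below the mark. This is a hypothetical sketch of the idea, not any array's actual interface: `notify` stands in for whatever out-of-band channel the array uses, and usage is counted in blocks of the physical pool.

```python
class HighWaterMark:
    """One-shot alarm on thin-pool usage (hypothetical model, not an array API)."""

    def __init__(self, physical_blocks, percent, notify):
        self.limit = physical_blocks * percent / 100
        self.notify = notify  # stands in for the out-of-band channel
        self.armed = True

    def update(self, used_blocks):
        if used_blocks < self.limit:
            self.armed = True  # back below the mark: re-arm
        elif self.armed:
            self.armed = False  # fire only once per crossing
            self.notify(used_blocks)


alerts = []
hwm = HighWaterMark(physical_blocks=1000, percent=80, notify=alerts.append)
for used in (500, 790, 800, 850, 700, 810):
    hwm.update(used)
print(alerts)  # [800, 810]
```

With a 1000-block pool and an 80% mark, the alarm fires at 800 blocks, leaving 200 blocks of headroom for someone to add capacity; re-arming only below the mark keeps a pool hovering at the threshold from flooding the administrator.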
We could also trigger some file system cleanup at this point if we were
able to repack our allocated blocks and then update the array. Of course,
this would only help when the array's concept of used data is wildly out
of sync with our concept of allocated blocks, which happens when the array
drops our unmap commands or when we don't send them at all.
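The repack-then-update step could be planned roughly as follows: move every allocated block that lives above the packed region into a hole below it, after which everything from the packed boundary to the end of the device is free and could be reported to the array (for example through an unmap/discard request). This is only a planning sketch under simplifying assumptions: `plan_repack` is a hypothetical helper that treats blocks as interchangeable and ignores the data copying, metadata updates, and locking a real file system would need.

```python
def plan_repack(allocated, nblocks):
    """Plan moves that pack allocated blocks into the range [0, len(allocated)).

    allocated -- set of block numbers currently in use by the file system
    nblocks   -- total blocks on the device
    Returns (moves, unmap_range): moves is a list of (src, dst) block pairs;
    unmap_range is the half-open (start, end) range that is free afterwards.
    """
    target = len(allocated)
    # Free slots inside the packed region...
    holes = [b for b in range(target) if b not in allocated]
    # ...and in-use blocks outside it. The two lists always have equal length.
    strays = sorted(b for b in allocated if b >= target)
    moves = list(zip(strays, holes))
    return moves, (target, nblocks)


moves, unmap_range = plan_repack({0, 3, 7, 9}, nblocks=10)
print(moves)        # [(7, 1), (9, 2)]
print(unmap_range)  # (4, 10)
```

Filling the lowest holes with the lowest strays keeps the plan simple; a real implementation would also try to preserve each file's contiguity.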
Ric
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html