Re: thin provisioned LUN support & file system allocation policy

Ric Wheeler <rwheeler@xxxxxxxxxx> · Fri, 07 Nov 2008 09:54:02 -0500

Theodore Tso wrote:
> On Fri, Nov 07, 2008 at 09:26:49AM -0500, Ric Wheeler wrote:
>
>> One more consideration that I should have mentioned is that we can also
>> make our file system allocation policies "thin provisioned LUN" friendly.
>>
>> Basically, we need to try to re-allocate blocks instead of letting the
>> allocations happily progress across the entire block range. This might
>> be the inverse of an SSD-friendly allocation policy, but would seem to
>> be fairly trivial to implement :-)
>
> I would think that most non-log-structured filesystems do this by
> default.

I am not sure; it would be interesting to use blktrace to build a
visual map of how we allocate and free blocks as a file system ages.

> The one thing we might need for SSD-friendly allocation policies is to
> tell the allocators to not try so hard to make sure allocations are
> contiguous, but there are other reasons why you want contiguous
> extents anyway (such as reducing the size of your extent tree and
> reducing the number of block allocation data structures that need to
> be updated).  And I think SSDs do care to some extent about
> contiguous extents, from the point of view of reducing scatter-gather
> operations if nothing else, right?
>
> 					- Ted

I think that contiguous allocations are still important (especially
since the big arrays really like to have large, contiguous chunks of
space freed up at once so their unmap/TRIM support works better :-)).
For SSDs, streaming writes are still faster than scattered small-block
writes, so I think contiguous allocation would help them as well.

The type of allocation that would help most is one that tries to keep
the lower block ranges "hot" for allocation; the second-best policy
would simply keep the allocated blocks in each block group hot and
reallocate them.
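A toy model may make the two policies concrete. The C sketch below is an illustration under stated assumptions, not ext3/ext4 allocator code: the file system is a hypothetical 64-block bitmap, and `struct toy_fs`, `low_first_alloc()`, `next_fit_alloc()` and `toy_free()` are all invented here. The low-first allocator always takes the lowest free run, so freed low blocks are reused before new regions of the LUN are touched; the next-fit allocator resumes where it last stopped, which is the "progress across the entire block range" behaviour a thin LUN dislikes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy file system: one flag per block, true = in use. */
#define NBLOCKS 64

struct toy_fs {
    bool used[NBLOCKS];
    size_t cursor; /* resume point, only used by next_fit_alloc() */
};

/* True if blocks [start, start + len) exist and are all free. */
static bool range_free(const struct toy_fs *fs, size_t start, size_t len)
{
    if (start + len > NBLOCKS)
        return false;
    for (size_t i = start; i < start + len; i++)
        if (fs->used[i])
            return false;
    return true;
}

static void mark(struct toy_fs *fs, size_t start, size_t len, bool val)
{
    for (size_t i = start; i < start + len; i++)
        fs->used[i] = val;
}

/* Low-first policy: always search from block 0, so previously freed
 * low blocks are reused before any higher block is touched. */
static long low_first_alloc(struct toy_fs *fs, size_t len)
{
    assert(len > 0 && len <= NBLOCKS);
    for (size_t start = 0; start + len <= NBLOCKS; start++) {
        if (range_free(fs, start, len)) {
            mark(fs, start, len, true);
            return (long)start;
        }
    }
    return -1; /* no free run of that length */
}

/* Next-fit policy, for contrast: start searching where the previous
 * allocation ended, wrapping around, so allocations march across
 * the whole block range even when lower blocks have been freed. */
static long next_fit_alloc(struct toy_fs *fs, size_t len)
{
    assert(len > 0 && len <= NBLOCKS);
    for (size_t n = 0; n < NBLOCKS; n++) {
        size_t start = (fs->cursor + n) % NBLOCKS;
        if (range_free(fs, start, len)) {
            mark(fs, start, len, true);
            fs->cursor = (start + len) % NBLOCKS;
            return (long)start;
        }
    }
    return -1;
}

static void toy_free(struct toy_fs *fs, size_t start, size_t len)
{
    mark(fs, start, len, false);
}
```

With either allocator, allocating two 8-block extents and then freeing the first leaves blocks 0-7 free: `low_first_alloc()` hands them out again, while `next_fit_alloc()` moves on to block 16. A per-block-group variant would presumably run the same lowest-first search inside each group.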

One other interesting feature is that thin LUNs have a high-water mark,
which can be used to send an out-of-band notification (i.e., to some
user-space app) when allocation reaches a specified percentage of the
physical capacity backing the pool. The key is to set this so that a
human has time to react by expanding the physical pool (throwing in
another disk).
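The high-water mark check reduces to a threshold comparison, with a little hysteresis so that user space is told once per crossing rather than on every allocation. This is a hypothetical sketch: `struct thin_pool` and `hwm_check()` are invented names, and real arrays expose their thresholds through their own management interfaces. The comparison is done in integer arithmetic (`allocated * 100 >= percent * physical`) to avoid floating point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a thin pool and its high-water mark. */
struct thin_pool {
    uint64_t physical_blocks;  /* blocks actually backing the pool */
    uint64_t allocated_blocks; /* blocks the array has mapped */
    unsigned hwm_percent;      /* e.g. 80: alert at 80% of physical */
    bool alerted;              /* already notified for this crossing */
};

/* Returns true exactly when the pool crosses the mark from below,
 * i.e. when an out-of-band notification should be sent. Dropping
 * back below the mark (say, after a disk is added) re-arms it. */
static bool hwm_check(struct thin_pool *p)
{
    assert(p->hwm_percent <= 100);
    bool above = p->allocated_blocks * 100 >=
                 (uint64_t)p->hwm_percent * p->physical_blocks;
    bool fire = above && !p->alerted;
    p->alerted = above;
    return fire;
}
```

Adding capacity to the pool (raising `physical_blocks`) pushes the ratio back below the mark, so the next crossing produces a fresh notification.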

We could also trigger some file system cleanup at this point, if we
could repack our allocated blocks and then update the array. Of course,
this would only help when the array's concept of used data is wildly
out of sync with our concept of allocated blocks, which happens when
the array drops the unmap commands or we don't send them.
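One way to picture the repack step is a two-pointer compaction over an allocation bitmap: repeatedly move the highest in-use block into the lowest free slot, then hand the now-empty tail to the array as a single unmap range. This is a hypothetical sketch (`repack()` and `struct move` are invented here); a real file system would also have to copy the data and update every extent or block pointer that referenced a moved block, which this toy ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One relocation: copy block `from` to block `to`. */
struct move {
    size_t from, to;
};

/* Compact the in-use blocks of `used` (true = in use) toward block 0.
 * Each move is recorded in `moves` (capacity `nblocks` is enough) and
 * the move count is returned. On return, blocks [*unmap_start,
 * nblocks) are free and could be unmapped with one command. */
static size_t repack(bool *used, size_t nblocks, struct move *moves,
                     size_t *unmap_start)
{
    size_t lo = 0, hi = nblocks, n = 0;

    for (;;) {
        while (lo < hi && used[lo])       /* lo: lowest free block */
            lo++;
        while (hi > lo && !used[hi - 1])  /* hi - 1: highest used block */
            hi--;
        if (lo == hi)
            break;                        /* everything is packed */
        assert(n < nblocks);
        moves[n].from = hi - 1;
        moves[n].to = lo;
        n++;
        used[lo] = true;
        used[hi - 1] = false;
    }
    *unmap_start = lo;
    return n;
}
```

For the bitmap `1 0 1 0 0 1 1 0`, two moves (block 6 to 1, block 5 to 3) leave blocks 0-3 in use, and blocks 4-7 could then be unmapped with a single command.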

Ric
