On Wed, 2010-04-21 at 17:47 -0400, Greg Freemyer wrote:
> Adding James Bottomley because high-end scsi is entering the
> discussion. James, I have a couple scsi questions for you at the end.
>
> On Wed, Apr 21, 2010 at 5:03 PM, Ric Wheeler <rwheeler@xxxxxxxxxx> wrote:
> > On 04/21/2010 05:01 PM, Eric Sandeen wrote:
> >>
> >> On 04/21/2010 03:44 PM, Greg Freemyer wrote:
> >>
> >>
> >>>
> >>> Mark's benchmarks showed this as doable in seconds which seems like a
> >>> reasonable amount of time for a mount time operation.
> >>>
> >>
> >> All the other things aside, mount-time is interesting, but it's an
> >> infrequent operation, at least in my world. I think we need something
> >> that can be done runtime.
> >>
> >> For anything with uptime, I don't think it's acceptable to wait until
> >> the next mount to trim unused blocks.

So what's wrong with using wiper.sh? It can do online discard of
filesystems that support delayed allocation (ext4, xfs etc.)?

> >> But as long as the mechanism can be called either at mount time and/or
> >> kicked off runtime somehow, I'm happy.
> >>
> >> -Eric
> >>
> >
> > That makes sense to me. Most enterprise servers will go without remounting
> > a file system for (hopefully!) a very long time.
> >
> > It is really important to keep in mind that this is not just a laptop
> > feature for laptop SSD's, this is also used by high end arrays and *could*
> > be useful for virt IO, etc as well :-)
> >
> > ric
>
> I'm not arguing that a runtime solution is not needed.
>
> I'm arguing that at least for SSD backed filesystems Mark's userspace
> implementation shows how the mount time initialization of the runtime
> bitmap can be accomplished in a few seconds by leveraging the hardware
> and using vector'ed trims as opposed to having to build an additional
> on-disk structure.
>
> At least for SSDs, the primary purpose of the proposed on-disk
> structure seems to be to overcome the current lack of a vector'ed
> discard implementation.
>
> If it is too difficult to implement a fully functional vector'ed
> discard in the block layer due to locking issues, possibly a special
> purpose version could be written that is only used at mount time when
> one can be assured no other i/o is occurring to the filesystem.
>
> James,
>
> The ATA-8 spec. supports vectored trims and requires a minimum of 255
> sectors worth of range payload be supported. That equates to a single
> trim being able to trim thousands of ranges in one command.
>
> Mark Lord has benchmarked and found a vectored trim to be drastically
> faster than calling trim individually for each of those ranges.
>
> Does scsi support vector'ed discard? (ie. write-same commands)

only with UNMAP. WRITE SAME is effectively single range.

> Or are high-end scsi arrays so fast they can process tens of thousands
> of discard commands in a reasonable amount of time, unlike the SSDs
> have so far proven to do.

No ... they actually have two problems: firstly they can only use
discard ranges which align with their internal block size (usually
something huge like 3/4MB) and then a trim operation tends to be O(1)
and slow, so they'd actually like discard accumulation.

> It would be interesting to find out that a SSD can discard thousands
> of ranges drastically faster than a high-end scsi device can. But if
> true, that might argue for the on-disk bitmap to track previously
> discarded blocks/extents.

I think SSDs and Arrays both have discard problems, arrays more to do
with the time and expense of the operation, SSDs because the TRIM
command isn't queued.

James

--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
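
For anyone wanting to sanity-check the "thousands of ranges" figure and
the array alignment constraint discussed above, here is a minimal Python
sketch. It is an illustration under stated assumptions, not code from
wiper.sh or the kernel: it assumes the ATA TRIM payload is a list of
8-byte little-endian entries (48-bit LBA plus 16-bit sector count) in
512-byte blocks, it reads "3/4MB" as 768 KiB (1536 sectors), and every
function name is hypothetical.

```python
import struct

SECTOR_BYTES = 512   # assumed size of one TRIM payload block
ENTRY_BYTES = 8      # assumed entry layout: 48-bit LBA + 16-bit count
MAX_COUNT = 0xFFFF   # largest sector count a 16-bit field can hold


def ranges_per_trim(payload_sectors=255):
    """Upper bound on range entries carried by one vectored TRIM."""
    return payload_sectors * (SECTOR_BYTES // ENTRY_BYTES)


def coalesce(ranges):
    """Accumulate (start, length) sector ranges, merging neighbours."""
    merged = []
    for start, length in sorted(ranges):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_len = merged[-1]
            end = max(prev_start + prev_len, start + length)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, length))
    return merged


def align_inward(ranges, granule):
    """Shrink each range to whole granules; drop ranges that are too small."""
    out = []
    for start, length in ranges:
        first = -(-start // granule) * granule          # round start up
        last = ((start + length) // granule) * granule  # round end down
        if last > first:
            out.append((first, last - first))
    return out


def pack_trim_payload(ranges):
    """Encode ranges as 8-byte entries, splitting ranges over 65535 sectors.

    Padding the result out to a whole payload block is left to the caller.
    """
    entries = []
    for start, length in ranges:
        while length > 0:
            n = min(length, MAX_COUNT)
            entries.append(struct.pack("<Q", (n << 48) | start))
            start += n
            length -= n
    return b"".join(entries)
```

Under those assumptions ranges_per_trim() works out to 255 * 64 = 16320
entries for a single command. align_inward(coalesce(freed), 1536) shows
why accumulation matters for arrays: small freed ranges are dropped
entirely unless they first merge into something covering a whole
internal block.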