On Tue, Nov 10, 2009 at 5:56 PM, Martin K. Petersen
<martin.petersen@xxxxxxxxxx> wrote:
>>>>>> "Greg" == Greg Freemyer <greg.freemyer@xxxxxxxxx> writes:
>
> Greg> I'm not sure where it ended up, but the big SSD / discard
> Greg> discussion of a few months ago talked about 3 kinds of solutions,
> Greg> and I thought the plan was to support all 3.
>
> We don't design for the past.
>
>
> Greg> 1) optimization 1 - A white-listed instant discard feature. In
> Greg> this methodology, the filesystems would immediately send
> Greg> discard calls down to the block layer would send them on down
> Greg> the block stack to the physical devices with very minimal
> Greg> buffering.
>
> There's no whitelist. That's just how it works.
>
> Yes, there were a few crappy devices out there. Windows 7 issuing TRIM
> commands in realtime made them instantly obsolete. If future devices
> suck with Windows 7 nobody will buy them.
>
>
> Greg> 2) optimization 2 - The block layer would accept those small
> Greg> discards, but accumulate them for a short period. (less than a
> Greg> second was my impression). Then coalesce them into larger
> Greg> discards and send them down the block stack and eventually to
> Greg> the physical device.
>
> SSDs are special in that they actually track map state on a per-logical
> block basis. Other thinly provisioned devices track space in units
> ranging from 16-32-64KB up to megabytes.
>
> It's up to each block device to track the map space. The way most
> arrays work is that they'll ignore the portions of the request that are
> not aligned to and a multiple of their internal allocation unit.
>
> The same applies to MD. IOW, MD would only unmap the portions of the
> discard request that constitute entire stripes. No keeping state
> required.
>
> Jens just queued my patch which allows block devices to communicate
> their unmap granularity and alignment to the filesystems. This means we
> can potentially use this to influence filesystem allocators. For SCSI
> arrays these values are queried and passed up the stack. MD can choose
> to manually set the granularity to its stripe size.
>
>
> Greg> 3) optimization 3 - a background freespace scanner would run from
> Greg> time to time that scanned a filesystem for free blocks and send a
> Greg> discard / trim command down to the device. This is what Mark Lord
> Greg> was working on. His solution was primarily in user space and was
> Greg> controlled by cron.
>
> I think that's a fine approach for legacy devices. But as I said I
> think Windows 7 will root out all devices with poor TRIM performance
> pretty quickly.
>
> --
> Martin K. Petersen      Oracle Linux Engineering
>

Martin,

So for a workload mostly composed of small files residing on an MD
raid 4/5/6 setup, how is this supposed to work? (e.g. TIFFs, small Word
docs, PDFs, individual emails, etc.)

Most of the individual files will be less than one stripe wide, so
when they are deleted I gather the discard range will be less than a
stripe, and therefore MD would ignore it in the simplest of
implementations.

I.e., without coalescence at some point, MD will never forward those
discards to the hardware.

Thus I would think that for that workload, the nightly full freespace
scan and discard would be the best solution.

Thanks
Greg
--
Greg Freemyer
Head of EDD Tape Extraction and Processing team
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
Preservation and Forensic processing of Exchange Repositories White Paper -
<http://www.norcrossgroup.com/forms/whitepapers/tng_whitepaper_fpe.html>

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
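
A rough sketch of the small-file case discussed above, in Python. It is only an illustration and not the MD or block-layer code: the function names, the 256 KiB stripe size and the 64 KiB file size are all assumptions made up for the example. The first function keeps only the part of a discard that is aligned to, and a multiple of, the allocation unit; the second merges adjacent discards before that trimming is applied.

```python
def aligned_subrange(start, length, granularity, alignment=0):
    """Return (start, length) of the largest sub-range made of whole
    allocation units, or None if the request covers no complete unit."""
    end = start + length
    # Round the start up and the end down to unit boundaries.
    first = -(-(start - alignment) // granularity) * granularity + alignment
    last = (end - alignment) // granularity * granularity + alignment
    if last <= first:
        return None
    return first, last - first


def coalesce(ranges):
    """Merge overlapping or adjacent (start, length) ranges."""
    merged = []
    for start, length in sorted(ranges):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, max(prev_len, start + length - prev_start))
        else:
            merged.append((start, length))
    return merged


STRIPE = 256 * 1024  # hypothetical full-stripe size in bytes

# Four deleted 64 KiB files that happen to sit back to back in one stripe.
small = [(STRIPE + i * 64 * 1024, 64 * 1024) for i in range(4)]

# Sent one by one, none covers a full stripe, so nothing is unmapped.
individually = [aligned_subrange(s, l, STRIPE) for s, l in small]

# Coalesced first, they add up to exactly one stripe.
together = [aligned_subrange(s, l, STRIPE) for s, l in coalesce(small)]
```

With these made-up numbers every individual discard is dropped, while the coalesced request yields one full stripe that could be passed down. Whether the freed ranges actually line up like this depends on the filesystem allocator, which is why a periodic free-space scan may still find more to discard.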