On Tue, Apr 20, 2010 at 10:45 PM, Eric Sandeen <sandeen@xxxxxxxxxx> wrote:
> Mark Lord wrote:
>> On 20/04/10 05:21 PM, Greg Freemyer wrote:
>>> Mark,
>>>
>>> This is the patch implementing the new discard logic.
>> ..
>>> Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
>> ..
>>>> +void ext4_trim_extent(struct super_block *sb, int start, int count,
>>>> +                ext4_group_t group, struct ext4_buddy *e4b)
>>>> +{
>>>> +        ext4_fsblk_t discard_block;
>>>> +        struct ext4_super_block *es = EXT4_SB(sb)->s_es;
>>>> +        struct ext4_free_extent ex;
>>>> +
>>>> +        assert_spin_locked(ext4_group_lock_ptr(sb, group));
>>>> +
>>>> +        ex.fe_start = start;
>>>> +        ex.fe_group = group;
>>>> +        ex.fe_len = count;
>>>> +
>>>> +        mb_mark_used(e4b,&ex);
>>>> +        ext4_unlock_group(sb, group);
>>>> +
>>>> +        discard_block = (ext4_fsblk_t)group *
>>>> +                        EXT4_BLOCKS_PER_GROUP(sb)
>>>> +                        + start
>>>> +                        + le32_to_cpu(es->s_first_data_block);
>>>> +        trace_ext4_discard_blocks(sb,
>>>> +                        (unsigned long long)discard_block,
>>>> +                        count);
>>>> +        sb_issue_discard(sb, discard_block, count);
>>>> +
>>>> +        ext4_lock_group(sb, group);
>>>> +        mb_free_blocks(NULL, e4b, start, ex.fe_len);
>>>> +}
>>>
>>> Mark, unless I'm missing something, sb_issue_discard() above is going
>>> to trigger a trim command for just the one range. I thought the
>>> benchmarks you did showed that a collection of ranges needed to be
>>> built, then a single trim command invoked that trimmed that group of
>>> ranges.
>> ..
>>
>> Mmm.. If that's what it is doing, then this patch set would be a
>> complete disaster.
>> It would take *hours* to do the initial TRIM.
>>
>> Lukas ?
>
> I'm confused; do we have an interface to send a trim command for multiple ranges?
>
> I didn't think so ... Lukas' patch is finding free ranges (above a size threshold)
> to discard; it's not doing it a block at a time, if that's the concern.
>
> -Eric

Eric,

I don't know what kernel APIs have been created to support discard, but
the ATA8 draft spec.
allows for specifying multiple ranges in one trim command. See sections
7.10.3.1 and 7.10.3.2 of the latest draft spec. Both talk about multiple
trim ranges per trim command (think thousands of ranges per command).

Recent hdparm versions accept a trim command argument that causes
multiple ranges to be trimmed per command:

  --trim-sector-ranges        Tell SSD firmware to discard unneeded
                              data sectors: lba:count ..
  --trim-sector-ranges-stdin  Same as above, but reads lba:count pairs
                              from stdin

As I understand it, this is critical from a performance perspective for
the SSDs Mark tested with, i.e. he found that a single trim command with
1000 ranges takes much less time than 1000 discrete trim commands.

Per Mark's comments in wiper.sh, a trim command can have a minimum of
128KB of associated range information, so thousands of ranges can be
discarded in a single command. That is, hdparm can accept extremely
large lists of ranges on stdin, but it parses the list into discrete
trim commands with thousands of ranges per command.

A kernel implementation that tries to do after-the-fact discards, as
this patch does, also needs to somehow craft trim commands with a large
payload of ranges if it is going to be efficient.

If the block layer cannot do this yet, then in my opinion this type of
batched discarding needs to stay in user space, as done with Mark's
wiper.sh script and the enhanced hdparm, until the block layer grows
that ability.

Greg
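
For illustration, the batching discussed above can be sketched in
user-space C. This is only a rough sketch under stated assumptions, not
kernel code and not how hdparm or wiper.sh are actually implemented. It
assumes the trim payload is a sequence of 8-byte little-endian range
entries (48-bit starting LBA plus 16-bit sector count), sent in 512-byte
blocks with unused entries zeroed. The names struct trim_range,
trim_put_entry() and trim_pack() are hypothetical, and nothing here
issues a command to a device; it only builds the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed payload layout: 8-byte entries packed into 512-byte blocks. */
#define TRIM_ENTRY_BYTES 8u
#define TRIM_BLOCK_BYTES 512u
#define TRIM_MAX_COUNT   0xFFFFu   /* largest count a 16-bit field holds */

/* Hypothetical description of one free extent, in sectors. */
struct trim_range {
	uint64_t lba;
	uint64_t nsectors;
};

/* Store one entry little-endian: bits 47:0 = LBA, bits 63:48 = count. */
static void trim_put_entry(uint8_t *dst, uint64_t lba, uint16_t count)
{
	uint64_t v = (lba & 0xFFFFFFFFFFFFULL) | ((uint64_t)count << 48);

	for (int i = 0; i < 8; i++)
		dst[i] = (uint8_t)(v >> (8 * i));
}

/*
 * Pack n ranges into buf, splitting any range longer than 65535 sectors
 * into several entries. The tail of the last 512-byte block is zeroed.
 * Returns the payload size in bytes (a multiple of 512) and stores the
 * number of entries written in *entries_out. Returns 0 if buf is too
 * small (or if there was nothing to pack).
 */
static size_t trim_pack(const struct trim_range *r, size_t n,
			uint8_t *buf, size_t buflen, size_t *entries_out)
{
	size_t entries = 0;

	assert(r != NULL && buf != NULL && entries_out != NULL);
	*entries_out = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t lba = r[i].lba;
		uint64_t left = r[i].nsectors;

		while (left > 0) {
			uint16_t chunk = left > TRIM_MAX_COUNT ?
					 TRIM_MAX_COUNT : (uint16_t)left;

			if ((entries + 1) * TRIM_ENTRY_BYTES > buflen)
				return 0;
			trim_put_entry(buf + entries * TRIM_ENTRY_BYTES,
				       lba, chunk);
			entries++;
			lba += chunk;
			left -= chunk;
		}
	}

	size_t used = entries * TRIM_ENTRY_BYTES;
	size_t padded = (used + TRIM_BLOCK_BYTES - 1) /
			TRIM_BLOCK_BYTES * TRIM_BLOCK_BYTES;

	if (padded > buflen)
		return 0;
	memset(buf + used, 0, padded - used);
	*entries_out = entries;
	return padded;
}
```

Under the assumed 8-byte entry size, one 512-byte block holds 64 ranges,
and a 128KB payload holds 16384 of them, which is where "thousands of
ranges per command" comes from. A caller would collect free extents into
an array of struct trim_range, call trim_pack() once, and hand the
resulting buffer to whatever mechanism actually issues the command.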