Re: SSD - TRIM command


 



> it's just a discussion, right? No implementation yet, right?

Of course...
 
> What I think: if the device accepts TRIM, we can use TRIM.
> If not, we must translate TRIM into something similar (maybe many
> WRITEs?), so that when we READ from the disk we get the same information.

TRIM is not about writing at all. TRIM tells the
device that the addressed block is no longer in use,
so it (the SSD) can do whatever it wants with it.

The only software layer with that knowledge is the
filesystem; the other layers have no say in block
allocation, except for their own metadata, of course.

So, IMHO, a software TRIM can only live in the FS.
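As a toy illustration of that layering (a sketch with made-up names, not kernel code): only the filesystem knows when a block stops being referenced, so only it can originate a meaningful TRIM.

```python
class SSD:
    """Toy device: a trimmed LBA has no mapping and reads back as
    indeterminate data (modeled here as zeros)."""

    def __init__(self):
        self.data = {}  # lba -> 512-byte sector; absent means trimmed/erased

    def write(self, lba, buf):
        self.data[lba] = buf

    def trim(self, lba):
        # The device is free to drop the mapping entirely.
        self.data.pop(lba, None)

    def read(self, lba):
        return self.data.get(lba, b"\x00" * 512)


class Filesystem:
    """Only this layer knows which blocks are still referenced."""

    def __init__(self, dev):
        self.dev = dev
        self.allocated = set()

    def write_file(self, lba, buf):
        self.allocated.add(lba)
        self.dev.write(lba, buf)

    def delete_file(self, lbas):
        for lba in lbas:
            self.allocated.discard(lba)
            self.dev.trim(lba)  # propagate "this block is no longer used"
```

An intermediate layer (md, nbd, ...) sitting between `Filesystem` and `SSD` could forward the hint downward, but it could not have generated it on its own.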

bye,

pg

> The translation could be done by the kernel (not md), maybe as options
> in libata, the nbd device...
> Another option is to do it in md, with an internal (md) TRIM
> translation function.
> 
> Who sends the TRIM?
> From internal md information: md can generate it (if necessary; maybe
> it isn't...) for parity disks (not data disks).
> From the filesystem or another upper-layer program (a database with
> direct device access): we could accept TRIM from the
> filesystem/database and send it to the disks/mirrors, translating it
> when necessary (internal or kernel translation function).
> 
> 
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
> >> Nice =)
> >> But note that a parity block is RAID information, not filesystem
> >> information. For RAID we could issue TRIM where possible (like swap
> >> does), and also accept a TRIM we receive from the filesystem and
> >> forward it to all disks (if it's a RAID-1 with mirrors, we should
> >> send it to all mirrors).
> >
> > To all disks also in the case of RAID-5?
> >
> > What if the TRIM covers only a single SSD block
> > belonging to a single chunk of a stripe?
> > That is, a *single* SSD of the RAID-5.
> >
> > Should md re-read the block and re-write (not TRIM)
> > the parity?
> >
> > I think anything that has to do with checking &
> > repairing must be carefully considered...
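A small numeric sketch of that hazard (illustrative only, not md's actual code): if one chunk of a stripe is trimmed and later reads back as zeros, the stored parity goes stale until it is recomputed and rewritten.

```python
def xor(*bufs):
    """XOR byte strings of equal length, as RAID-5 parity does."""
    out = bytearray(len(bufs[0]))
    for b in bufs:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

# A 3-disk RAID-5 stripe: two data chunks plus parity.
d0 = b"\x11" * 4
d1 = b"\x22" * 4
parity = xor(d0, d1)

# TRIM d1; the SSD may now return zeros (or anything) for that chunk.
d1_after_trim = b"\x00" * 4

# Parity is stale: reconstructing d0 from d1 + parity gives garbage.
reconstructed_d0 = xor(d1_after_trim, parity)
assert reconstructed_d0 != d0

# The conservative fix: re-read the stripe and rewrite (not TRIM) parity.
parity = xor(d0, d1_after_trim)
assert xor(d1_after_trim, parity) == d0
```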
> >
> > bye,
> >
> > pg
> >
> >> I don't know exactly what TRIM does, but I think of it as a very big
> >> write of only some bits. For example:
> >> set sector1='00000000000000000000000000000000000000000000000000'
> >> could be replaced by:
> >> trim sector1
> >> It's faster on the SATA link, and it's useful information for the
> >> drive: it can mark the sector as all-zero and, on a read, answer from
> >> internal memory without touching the media; on a write it fills in
> >> the zeros and then performs the write. But that's an internal
> >> function of the hard disk/SSD, not md RAID's problem... md RAID just
> >> needs to know how to use and optimize around it =] )
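That bandwidth intuition can be sketched roughly. In this simplified model, the 8-byte range entry with a 16-bit sector count follows the ATA DATA SET MANAGEMENT (TRIM) payload format; everything else is illustrative.

```python
SECTOR = 512  # bytes per logical sector

def bytes_on_wire_write_zeros(nsectors):
    """Writing explicit zeros moves the whole payload over the link."""
    return nsectors * SECTOR

def bytes_on_wire_trim(nsectors):
    """A TRIM moves only small LBA-range entries: 8 bytes each,
    covering up to 65535 sectors per entry (simplified)."""
    ranges = -(-nsectors // 65535)  # ceiling division
    return max(ranges, 1) * 8
```

For a single sector the ratio is already 512:8, and it only grows with the size of the trimmed region.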
> >>
> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
> >> >> ext4 sends TRIM commands to the device (disk/md RAID/nbd), and
> >> >> the kernel swap code sends these commands (when possible) too.
> >> >> For the internal RAID-5 parity disk this could be done by md; for
> >> >> the data disks it should be done by ext4.
> >> >
> >> > That's an interesting point.
> >> >
> >> > On which basis should a parity "block" get a TRIM?
> >> >
> >> > If you ask me, I think the complete TRIM story is, at
> >> > best, a temporary patch.
> >> >
> >> > IMHO the wear levelling should be handled by the filesystem
> >> > and, with awareness of this, by the underlying device drivers.
> >> > Reason is that the FS knows better what's going on with the
> >> > blocks and what will happen.
> >> >
> >> > bye,
> >> >
> >> > pg
> >> >
> >> >>
> >> >> The other question, about resync writing only what is different:
> >> >> this is very good, since write and read speeds can differ on an
> >> >> SSD (HDs don't have this 'problem').
> >> >> I'm sure that writing only what differs is better than writing
> >> >> everything (SSD life will be longer; for HDs, maybe... I think it
> >> >> will be longer too).
> >> >>
> >> >>
> >> >> 2011/2/9 Eric D. Mudama <edmudama@xxxxxxxxxxxxxxxx>:
> >> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
> >> >> >>
> >> >> >> Who sends this command? If md can assume that determinate mode is
> >> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
> >> >> >> consistency of the parity information depends on the determinate
> >> >> >> pattern used and the number of disks. If you used determinate
> >> >> >> all-zero, then parity information would always be consistent, but this
> >> >> >> is probably not preferable since every TRIM command would incur an
> >> >> >> extra write for each bit in each page of the block.
> >> >> >
> >> >> > True, and there are several solutions.  Maybe track space used via
> >> >> > some mechanism, such that when you trim you're only trimming the
> >> >> > entire stripe width so no parity is required for the trimmed regions.
> >> >> > Or, trust the drive's wear leveling and endurance rating, combined
> >> >> > with SMART data, to indicate when you need to replace the device
> >> >> > preemptively, ahead of eventual failure.
> >> >> >
> >> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
> >> >> > you could expect wear leveling to wear all the devices evenly, since
> >> >> > on average, the # of writes to all devices will be the same.  Only a
> >> >> > RAID4 setup would see a lopsided amount of writes to a single device.
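Eric's first suggestion, passing down only discards that cover whole stripes so no parity update is ever needed, might look like this check (illustrative sketch with assumed geometry, not md code):

```python
CHUNK_SECTORS = 128      # assumed chunk size in sectors
DATA_DISKS = 3           # e.g. a 4-disk RAID-5
STRIPE = CHUNK_SECTORS * DATA_DISKS  # data sectors per full stripe

def full_stripes(start, length):
    """Return the stripe-aligned sub-range of [start, start+length)
    as (start, length), or None if no whole stripe is covered.
    Only this sub-range would be trimmed; the rest is left alone."""
    first = -(-start // STRIPE) * STRIPE           # round start up
    last = (start + length) // STRIPE * STRIPE     # round end down
    if last <= first:
        return None
    return first, last - first
```

Anything outside the returned sub-range would simply not be trimmed, trading some unreclaimed space for never having to touch parity.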
> >> >> >
> >> >> > --eric
> >> >> >
> >> >> > --
> >> >> > Eric D. Mudama
> >> >> > edmudama@xxxxxxxxxxxxxxxx
> >> >> >
> >> >> > --
> >> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> >> >> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> >> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> Roberto Spadim
> >> >> Spadim Technology / SPAEmpresarial
> >> >
> >> > --
> >> >
> >> > piergiorgio
> >> >
> >>
> >>
> >>
> >> --
> >> Roberto Spadim
> >> Spadim Technology / SPAEmpresarial
> >
> > --
> >
> > piergiorgio
> >
> 
> 
> 
> -- 
> Roberto Spadim
> Spadim Technology / SPAEmpresarial

-- 

piergiorgio

