Re: SSD - TRIM command

On the other question, check and repair:
I don't know how resync is implemented today (I need to read the source code),
but reading both copies, comparing them, and writing only where a difference
is found should be better than writing without comparing.
Why better?
For SSD: it should last longer, since it sees fewer writes.
For HDD: I think it will last longer too (I THINK).
The problem: more operations.
Without check:
READ from source, WRITE to mirror
With check:
READ from source, READ from mirror, compare, WRITE to mirror only if different

Maybe an mdadm option could set the md device to RESYNC WITH CHECK or
RESYNC WITHOUT CHECK.
It's a user choice, not an md decision, right? A user who wants a fast resync
can run without check, and a user who wants fewer writes can run with check.
Giving the user the option is very nice (for the user). The default? I think
WITHOUT CHECK should be the default; it would just be a setting with a
default, like the default chunk size...
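
To make the trade-off concrete, here is a minimal sketch in Python. It is only a toy model, not md code: the `resync` function, its `check` flag, and the in-memory lists standing in for disks are all my own assumptions for illustration. It counts the operations for the two modes described above.

```python
def resync(source, mirror, check=False):
    """Copy source blocks onto mirror; return (reads, writes).

    check=False models RESYNC WITHOUT CHECK (the proposed default),
    check=True models RESYNC WITH CHECK. Both names are hypothetical.
    """
    reads = writes = 0
    for i, block in enumerate(source):
        reads += 1                      # READ from source
        if check:
            reads += 1                  # READ from mirror
            if mirror[i] == block:
                continue                # identical: skip the write
        mirror[i] = block               # WRITE to mirror
        writes += 1
    return reads, writes


source = [b"A", b"B", b"C", b"D"]
stale = [b"A", b"x", b"C", b"D"]        # one block out of sync

print(resync(source, list(stale), check=False))  # (4, 4)
print(resync(source, list(stale), check=True))   # (8, 1)
```

With one stale block out of four, the checked resync doubles the reads but does a single write instead of four. Whether that is a win depends on how read and write cost compare on the device.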


2011/2/9 Roberto Spadim <roberto@xxxxxxxxxxxxx>:
> It's just a discussion, right? No implementation yet, right?
>
> What I think...
> If the device accepts TRIM, we can use TRIM.
> If not, we must translate TRIM into something similar (maybe many WRITEs?),
> so that when we READ from the disk we get the same information.
> The translation could be done by the kernel (not md), maybe as options in
> libata, the nbd device...
> The other option is to do it in md, with an internal (md) TRIM translate function.
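
A rough sketch of that translation idea, in Python. Everything here is hypothetical: the `Disk` class, its `supports_trim` flag, and the `discard` helper are invented for illustration and are not kernel or md interfaces. It also assumes a trimmed block reads back as zeros, which a real drive does not necessarily guarantee.

```python
class Disk:
    """Toy block device; supports_trim is a hypothetical capability flag."""

    def __init__(self, n_blocks, block_size=4, supports_trim=False):
        self.block_size = block_size
        self.supports_trim = supports_trim
        self.blocks = [bytes(block_size)] * n_blocks
        self.trimmed = set()            # blocks the device knows are unused
        self.writes = 0                 # number of block writes issued

    def write(self, lba, data):
        self.blocks[lba] = data
        self.trimmed.discard(lba)
        self.writes += 1

    def read(self, lba):
        # Assumption: a trimmed block reads back as all zeros.
        if lba in self.trimmed:
            return bytes(self.block_size)
        return self.blocks[lba]


def discard(disk, lba, count):
    """Send TRIM if the device accepts it, else translate to zero writes."""
    if disk.supports_trim:
        disk.trimmed.update(range(lba, lba + count))
    else:
        zero = bytes(disk.block_size)
        for i in range(lba, lba + count):
            disk.write(i, zero)


ssd = Disk(8, supports_trim=True)
hdd = Disk(8, supports_trim=False)
for d in (ssd, hdd):
    d.write(2, b"data")
    discard(d, 2, 3)

same = all(ssd.read(i) == hdd.read(i) for i in range(8))
print(same, ssd.writes, hdd.writes)     # True 1 4
```

Both toy devices return the same data afterwards, which is what a mirror needs, but the device without TRIM pays for it with three extra writes.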
>
> Who sends TRIM?
> Internal md information: md can generate it (if necessary, maybe it's
> not...) for parity disks (not data disks).
> Filesystem or another upper-layer program (a database with direct device
> access): we could accept TRIM from the filesystem/database and send it to
> the disks/mirrors, translating it when necessary (internal or kernel
> translate function).
>
>
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
>> On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>>> nice =)
>>> But note that a parity block is RAID information, not filesystem information.
>>> For RAID we could implement TRIM when possible (like swap does),
>>> and also take the TRIM that we receive from the filesystem and send it to all
>>> disks (if it's a RAID1 with mirrors, we should send it to all mirrors).
>>
>> To all disks, also in the case of RAID-5?
>>
>> What if the TRIM belongs only to a single SSD block
>> belonging to a single chunk of a stripe?
>> That is a *single* SSD of the RAID-5.
>>
>> Should md re-read the block and re-write (not TRIM)
>> the parity?
>>
>> I think anything that has to do with checking &
>> repairing must be carefully considered...
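
A small Python sketch of the parity question above, under stated assumptions: parity is plain bytewise XOR, a trimmed chunk reads back as all zeros, and the helper names `xor_blocks` and `parity_ok` are made up for this example. It is not how md handles TRIM.

```python
from functools import reduce


def xor_blocks(*blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity_ok(data_chunks, parity):
    """True if parity equals the XOR of all data chunks in the stripe."""
    return xor_blocks(*data_chunks) == parity


ZERO = bytes(4)
d0, d1, d2 = b"\x11\x22\x33\x44", b"\x0f\x0f\x0f\x0f", b"\xa0\xb0\xc0\xd0"
parity = xor_blocks(d0, d1, d2)

# TRIM only d1 (assumed to read back as zeros) and leave parity untouched.
stale = parity_ok([d0, ZERO, d2], parity)

# Repair: new_parity = old_parity XOR old_data XOR new_data (new data is zero).
new_parity = xor_blocks(parity, d1, ZERO)
fixed = parity_ok([d0, ZERO, d2], new_parity)

# TRIM the whole stripe, parity included: all-zero data has all-zero parity.
full = parity_ok([ZERO, ZERO, ZERO], ZERO)

print(stale, fixed, full)               # False True True
```

So under these assumptions a single-chunk TRIM leaves the stripe inconsistent until the parity is rewritten, while a full-stripe TRIM stays consistent with no extra parity write.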
>>
>> bye,
>>
>> pg
>>
>>> I don't know exactly what TRIM does, but I think it stands in for a very
>>> big write that carries only a few bits of information. For example:
>>> set sector1='00000000000000000000000000000000000000000000000000'
>>> could be replaced by:
>>> trim sector1
>>> It's faster for SATA communication, and it's useful information for the
>>> hard disk (it can put a single '0' at the start of the sector and know
>>> that the whole sector is 0; if something tries to read it, the disk can
>>> answer from internal memory without reading the media; if a write is done,
>>> it should write 0000 to the bits and after that do the write operation.
>>> But that's an internal function of the hard disk/SSD, not a problem for
>>> md raid... md raid only needs to know how to optimize and use it =] )
>>>
>>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
>>> >> ext4 sends TRIM commands to the device (disk/md raid/nbd)
>>> >> kernel swap sends these commands (when possible) to the device too
>>> >> for the internal RAID5 parity disk this could be done by md; for data
>>> >> disks this should be done by ext4
>>> >
>>> > That's an interesting point.
>>> >
>>> > On which basis should a parity "block" get a TRIM?
>>> >
>>> > If you ask me, I think the complete TRIM story is, at
>>> > best, a temporary patch.
>>> >
>>> > IMHO the wear levelling should be handled by the filesystem
>>> > and, with awareness of this, by the underlying device drivers.
>>> > The reason is that the FS knows better what's going on with the
>>> > blocks and what will happen.
>>> >
>>> > bye,
>>> >
>>> > pg
>>> >
>>> >>
>>> >> The other question... about a resync that writes only what is different:
>>> >> this is very good, since write and read speeds can be different for SSD
>>> >> (HDs don't have this 'problem').
>>> >> But I'm sure that writing only what differs is better than writing everything
>>> >> (SSD life will be longer; HD maybe... I think it will be longer too).
>>> >>
>>> >>
>>> >> 2011/2/9 Eric D. Mudama <edmudama@xxxxxxxxxxxxxxxx>:
>>> >> > On Wed, Feb  9 at 11:28, Scott E. Armitage wrote:
>>> >> >>
>>> >> >> Who sends this command? If md can assume that determinate mode is
>>> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
>>> >> >> consistency of the parity information depends on the determinate
>>> >> >> pattern used and the number of disks. If you used determinate
>>> >> >> all-zero, then parity information would always be consistent, but this
>>> >> >> is probably not preferable since every TRIM command would incur an
>>> >> >> extra write for each bit in each page of the block.
>>> >> >
>>> >> > True, and there are several solutions.  Maybe track space used via
>>> >> > some mechanism, such that when you trim you're only trimming the
>>> >> > entire stripe width so no parity is required for the trimmed regions.
>>> >> > Or, trust the drive's wear leveling and endurance rating, combined
>>> >> > with SMART data, to indicate when you need to replace the device
>>> >> > preemptive to eventual failure.
>>> >> >
>>> >> > It's not an unsolvable issue.  If the RAID5 used distributed parity,
>>> >> > you could expect wear leveling to wear all the devices evenly, since
>>> >> > on average, the # of writes to all devices will be the same.  Only a
>>> >> > RAID4 setup would see a lopsided amount of writes to a single device.
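
The point about distributed parity can be illustrated with a toy count in Python. The layout is an assumption: parity simply rotates as `stripe % n_disks`, which is not necessarily the layout md uses, and `small_write_counts` is a hypothetical helper. Each stripe receives one single-chunk update, which costs one data write plus one parity write.

```python
from collections import Counter


def small_write_counts(n_disks, n_stripes, rotate):
    """Count device writes when every stripe gets one single-chunk update.

    rotate=True  : parity moves to a different disk per stripe (RAID5-like).
    rotate=False : parity always lives on the last disk (RAID4-like).
    """
    writes = Counter()
    for stripe in range(n_stripes):
        parity_disk = (stripe % n_disks) if rotate else (n_disks - 1)
        data_disks = [d for d in range(n_disks) if d != parity_disk]
        data_disk = data_disks[stripe % len(data_disks)]
        writes[data_disk] += 1          # the updated data chunk
        writes[parity_disk] += 1        # parity is rewritten every time
    return [writes[d] for d in range(n_disks)]


print(small_write_counts(4, 12, rotate=True))    # [6, 6, 6, 6]
print(small_write_counts(4, 12, rotate=False))   # [4, 4, 4, 12]
```

In this toy workload the rotating layout spreads writes evenly, while the fixed parity disk takes three times the writes of each data disk.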
>>> >> >
>>> >> > --eric
>>> >> >
>>> >> > --
>>> >> > Eric D. Mudama
>>> >> > edmudama@xxxxxxxxxxxxxxxx
>>> >> >
>>> >> > --
>>> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> >> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> >> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>> >> >
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Roberto Spadim
>>> >> Spadim Technology / SPAEmpresarial
>>> >
>>> > --
>>> >
>>> > piergiorgio
>>> >
>>>
>>>
>>>
>>> --
>>> Roberto Spadim
>>> Spadim Technology / SPAEmpresarial
>>
>> --
>>
>> piergiorgio
>>
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
>



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

