The other question... checking and repairing.

I don't know the current resync implementation (I need to read the source code), but reading both sides, checking for differences, and writing only when a difference is found is better than writing without checking for differences.

Why better?
to SSD: it will have a longer life
to HDD: I think it will have a longer life too (I THINK)

The problem: more operations
without check: READ from source, WRITE to mirror
with check: READ from source, READ from mirror, check diff, WRITE to mirror if diff

Maybe an mdadm option could set the md device to RESYNC WITH CHECK or RESYNC WITHOUT CHECK.
It's a user option, not an md option, right?
If the user wants a fast resync they can pick without check or with check, but we can give the user options... that's very nice (for the user).
The default option? I think WITHOUT CHECK should be the default; it's a setting like the default chunk size...

2011/2/9 Roberto Spadim <roberto@xxxxxxxxxxxxx>:
> it's just a discussion, right? no implementation yet, right?
>
> what i think....
> if device accept TRIM, we can use TRIM.
> if not, we must translate TRIM to something similar (maybe many WRITES
> ?), and when we READ from disk we get the same information
> the translation could be done by kernel (not md) maybe options on
> libata, nbd device....
> other option is do it with md, internal (md) TRIM translate function
>
> who send trim?
> internal md information: md can generate it (if necessary, maybe it's
> not...)
> for parity disks (not data disks)
> filesystem/or another upper layer program (database with direct device
> access), we could accept TRIM from filesystem/database, and send it to
> disks/mirrors, when necessary translate it (internal or kernel
> translate function)
>
>
> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
>> On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>>> nice =)
>>> but check that parity block is a raid information, not a filesystem information
>>> for raid we could implement trim when possible (like swap)
>>> and implement a trim that we receive from filesystem, and send to all
>>> disks (if it's a raid1 with mirrors, we should send to all mirrors)
>>
>> To all disk also in case of RAID-5?
>>
>> What if the TRIM belongs only to a single SSD block
>> belonging to a single chunk of a stripe?
>> That is a *single* SSD of the RAID-5.
>>
>> Should md re-read the block and re-write (not TRIM)
>> the parity?
>>
>> I think anything that has to do with checking &
>> repairing must be carefully considered...
>>
>> bye,
>>
>> pg
>>
>>> i don't know what trim do very well, but i think it's a very big write
>>> with only some bits for example:
>>> set sector1='00000000000000000000000000000000000000000000000000'
>>> could be replace by:
>>> trim sector1
>>> it's faster for sata communication, and it's a good information for
>>> hard disk (it can put a single '0' at the start of the sector and know
>>> that all sector is 0, if it try to read any information it can use
>>> internal memory (don't read hard disk), if a write is done it should
>>> write 0000 to bits, and after the write operation, but it's
>>> internal function of hard disk/ssd, not a problem of md raid... md
>>> raid should need know how to optimize and use it =] )
>>>
>>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
>>> >> ext4 send trim commands to device (disk/md raid/nbd)
>>> >> kernel swap send this commands (when possible) to device too
>>> >> for internal raid5 parity disk this could be done by md, for data
>>> >> disks this should be done by ext4
>>> >
>>> > That's an interesting point.
>>> >
>>> > On which basis should a parity "block" get a TRIM?
>>> >
>>> > If you ask me, I think the complete TRIM story is, at
>>> > best, a temporary patch.
>>> >
>>> > IMHO the wear levelling should be handled by the filesystem
>>> > and, with awareness of this, by the underlying device drivers.
>>> > Reason is that the FS knows better what's going on with the
>>> > blocks and what will happen.
>>> >
>>> > bye,
>>> >
>>> > pg
>>> >
>>> >>
>>> >> the other question... about resync with only write what is different
>>> >> this is very good since write and read speed can be different for ssd
>>> >> (hd don't have this 'problem')
>>> >> but i'm sure that just write what is diff is better than write all
>>> >> (ssd life will be bigger, hd maybe... i think that will be bigger too)
>>> >>
>>> >>
>>> >> 2011/2/9 Eric D. Mudama <edmudama@xxxxxxxxxxxxxxxx>:
>>> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote:
>>> >> >>
>>> >> >> Who sends this command? If md can assume that determinate mode is
>>> >> >> always set, then RAID 1 at least would remain consistent. For RAID 5,
>>> >> >> consistency of the parity information depends on the determinate
>>> >> >> pattern used and the number of disks. If you used determinate
>>> >> >> all-zero, then parity information would always be consistent, but this
>>> >> >> is probably not preferable since every TRIM command would incur an
>>> >> >> extra write for each bit in each page of the block.
>>> >> >
>>> >> > True, and there are several solutions.
>>> >> > Maybe track space used via
>>> >> > some mechanism, such that when you trim you're only trimming the
>>> >> > entire stripe width so no parity is required for the trimmed regions.
>>> >> > Or, trust the drive's wear leveling and endurance rating, combined
>>> >> > with SMART data, to indicate when you need to replace the device
>>> >> > preemptive to eventual failure.
>>> >> >
>>> >> > It's not an unsolvable issue. If the RAID5 used distributed parity,
>>> >> > you could expect wear leveling to wear all the devices evenly, since
>>> >> > on average, the # of writes to all devices will be the same. Only a
>>> >> > RAID4 setup would see a lopsided amount of writes to a single device.
>>> >> >
>>> >> > --eric
>>> >> >
>>> >> > --
>>> >> > Eric D. Mudama
>>> >> > edmudama@xxxxxxxxxxxxxxxx
>>> >> >
>>> >> > --
>>> >> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> >> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> >> > More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Roberto Spadim
Spadim Technology / SPAEmpresarial
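
PS: to make the comparison at the top of this message concrete, here is a rough sketch of the two resync strategies (without check: READ from source, WRITE to mirror; with check: READ from source, READ from mirror, check diff, WRITE to mirror if diff). It is only a toy model in Python, not md code: the block lists, the Stats counters, and both function names are hypothetical, and I have not read the real resync implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stats:
    """I/O counters for one resync pass (hypothetical bookkeeping)."""
    reads: int = 0
    writes: int = 0


def resync_without_check(source: List[bytes], mirror: List[bytes]) -> Stats:
    """READ from source, WRITE to mirror, for every block."""
    stats = Stats()
    for i, block in enumerate(source):
        stats.reads += 1          # read the source block
        mirror[i] = block         # always write, even if identical
        stats.writes += 1
    return stats


def resync_with_check(source: List[bytes], mirror: List[bytes]) -> Stats:
    """READ source, READ mirror, compare, WRITE only if they differ."""
    stats = Stats()
    for i, block in enumerate(source):
        stats.reads += 2          # one read from source, one from mirror
        if mirror[i] != block:
            mirror[i] = block     # write only the blocks that differ
            stats.writes += 1
    return stats


# Toy data: four blocks, one of which is stale on the mirror.
source = [b"A", b"B", b"C", b"D"]
stale = [b"A", b"X", b"C", b"D"]

m1 = list(stale)
s1 = resync_without_check(source, m1)

m2 = list(stale)
s2 = resync_with_check(source, m2)

print("without check:", s1)
print("with check:   ", s2)
```

With one stale block out of four, the toy run counts 4 reads and 4 writes without check, against 8 reads and 1 write with check. That is the trade-off: with check saves writes (good for SSD life) at the price of extra reads, and which one finishes faster depends on how many blocks differ and on the read/write speeds of the devices.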