Yeah =)
A question: if I send a TRIM to a sector and then READ from it, what do I get?
All zeros (0x0000...0000)?
If yes, we could translate TRIM into a zero-fill WRITE on devices without TRIM
(hard disks), just so a READ returns the same information.

2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
>> It's just a discussion, right? No implementation yet, right?
>
> Of course...
>
>> What I think:
>> if the device accepts TRIM, we can use TRIM;
>> if not, we must translate TRIM into something similar (maybe many WRITEs?),
>> so that when we READ from the disk we get the same information.
>
> TRIM is not about writing at all. TRIM tells the
> device that the addressed block is no longer used,
> so it (the SSD) can do whatever it wants with it.
>
> The only software layer having the same "knowledge"
> is the filesystem; the other layers do not have
> any decisional power over block allocation.
> Except for metadata, of course.
>
> So, IMHO, a software TRIM can only be in the FS.
>
> bye,
>
> pg
>
>> The translation could be done by the kernel (not md), maybe as options in
>> libata, the nbd device...
>> The other option is to do it in md, with an internal (md) TRIM-translate function.
>>
>> Who sends the TRIM?
>> Internal md information: md can generate it (if necessary; maybe it's not...)
>> for parity disks (not data disks).
>> The filesystem, or another upper-layer program (a database with direct device
>> access): we could accept TRIM from the filesystem/database and send it to
>> the disks/mirrors, translating it when necessary (with an internal or kernel
>> translate function).
>>
>> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
>> > On Wed, Feb 09, 2011 at 04:30:15PM -0200, Roberto Spadim wrote:
>> >> Nice =)
>> >> But note that the parity block is RAID information, not filesystem information.
>> >> For RAID we could implement TRIM where possible (like swap does),
>> >> and implement a TRIM that we receive from the filesystem and send to all
>> >> disks (if it's a RAID-1 with mirrors, we should send it to all mirrors).
>> >
>> > To all disks also in the case of RAID-5?
>> >
>> > What if the TRIM belongs only to a single SSD block
>> > belonging to a single chunk of a stripe?
>> > That is, a *single* SSD of the RAID-5.
>> >
>> > Should md re-read the block and re-write (not TRIM)
>> > the parity?
>> >
>> > I think anything that has to do with checking &
>> > repairing must be carefully considered...
>> >
>> > bye,
>> >
>> > pg
>> >
>> >> I don't know exactly what TRIM does, but I think of it as a very big write
>> >> of a fixed pattern. For example:
>> >> set sector1='00000000000000000000000000000000000000000000000000'
>> >> could be replaced by:
>> >> trim sector1
>> >> It's faster over the SATA link, and it's useful information for the
>> >> drive: it can record a single flag meaning "this whole sector is zero";
>> >> on a read it can answer from internal memory (without reading the media);
>> >> on a write it first has to materialize the zeros and then apply the write.
>> >> But that's an internal function of the hard disk/SSD, not a problem for md RAID...
>> >> md RAID just needs to know how to use and optimize for it =] )
>> >>
>> >> 2011/2/9 Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx>:
>> >> >> ext4 sends TRIM commands to the device (disk/md RAID/nbd);
>> >> >> the kernel swap code sends these commands (when possible) to the device too.
>> >> >> For an internal RAID-5 parity disk this could be done by md; for the data
>> >> >> disks it should be done by ext4.
>> >> >
>> >> > That's an interesting point.
>> >> >
>> >> > On what basis should a parity "block" get a TRIM?
>> >> >
>> >> > If you ask me, I think the complete TRIM story is, at
>> >> > best, a temporary patch.
>> >> >
>> >> > IMHO the wear levelling should be handled by the filesystem
>> >> > and, with awareness of this, by the underlying device drivers.
>> >> > The reason is that the FS knows best what's going on with the
>> >> > blocks and what will happen.
>> >> >
>> >> > bye,
>> >> >
>> >> > pg
>> >> >
>> >> >> The other question, about resync writing only what differs:
>> >> >> this is very good, since write and read speeds can differ on an SSD
>> >> >> (HDs don't have this 'problem').
>> >> >> But I'm sure that writing only the diff is better than writing everything
>> >> >> (SSD lifetime will be longer; for HDs, maybe... I think it will be longer too).
>> >> >>
>> >> >> 2011/2/9 Eric D. Mudama <edmudama@xxxxxxxxxxxxxxxx>:
>> >> >> > On Wed, Feb 9 at 11:28, Scott E. Armitage wrote:
>> >> >> >>
>> >> >> >> Who sends this command? If md can assume that determinate mode is
>> >> >> >> always set, then RAID-1 at least would remain consistent. For RAID-5,
>> >> >> >> consistency of the parity information depends on the determinate
>> >> >> >> pattern used and the number of disks. If you used determinate
>> >> >> >> all-zero, then parity information would always be consistent, but this
>> >> >> >> is probably not preferable since every TRIM command would incur an
>> >> >> >> extra write for each bit in each page of the block.
>> >> >> >
>> >> >> > True, and there are several solutions. Maybe track space used via
>> >> >> > some mechanism, such that when you trim you're only trimming the
>> >> >> > entire stripe width, so no parity is required for the trimmed regions.
>> >> >> > Or trust the drive's wear leveling and endurance rating, combined
>> >> >> > with SMART data, to indicate when you need to replace the device
>> >> >> > preemptively, before eventual failure.
>> >> >> >
>> >> >> > It's not an unsolvable issue. If the RAID-5 uses distributed parity,
>> >> >> > you could expect wear leveling to wear all the devices evenly, since
>> >> >> > on average the number of writes to all devices will be the same. Only a
>> >> >> > RAID-4 setup would see a lopsided amount of writes to a single device.
>> >> >> >
>> >> >> > --eric
>> >> >> >
>> >> >> > --
>> >> >> > Eric D. Mudama
>> >> >> > edmudama@xxxxxxxxxxxxxxxx
>> >> >>
>> >> >> --
>> >> >> Roberto Spadim
>> >> >> Spadim Technology / SPAEmpresarial
>> >> >
>> >> > --
>> >> >
>> >> > piergiorgio
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
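[Editor's note] Scott's point about "determinate all-zero" TRIM and RAID-5 parity can be made concrete with a toy model: if (and only if) the device guarantees that a trimmed block reads back as zeros, md could treat a TRIM on one chunk as a zero-write in the parity read-modify-write, and the stripe stays consistent. This is a minimal pure-Python sketch, not md code; the stripe layout, chunk size, and function names are made up for illustration:

```python
# Toy RAID-5 stripe: N data chunks plus one XOR parity chunk.
# Shows that a TRIM on a single chunk keeps parity consistent
# *only* under the deterministic read-zero-after-TRIM assumption.

CHUNK = 8  # bytes per chunk (tiny, for illustration only)

def xor_parity(chunks):
    """XOR all data chunks byte-by-byte to produce the parity chunk."""
    parity = bytearray(CHUNK)
    for c in chunks:
        for i in range(CHUNK):
            parity[i] ^= c[i]
    return bytes(parity)

def trim_chunk(stripe, idx):
    """Model a TRIM on data chunk `idx` of a drive that deterministically
    reads zeros after TRIM: update parity as if zeros were written
    (the usual read-modify-write: parity' = parity ^ old ^ new)."""
    old = stripe["data"][idx]
    new = bytes(CHUNK)  # post-TRIM reads return all zeros
    stripe["parity"] = bytes(p ^ o ^ n for p, o, n in
                             zip(stripe["parity"], old, new))
    stripe["data"][idx] = new

stripe = {"data": [b"\x11" * CHUNK, b"\x22" * CHUNK, b"\x33" * CHUNK]}
stripe["parity"] = xor_parity(stripe["data"])

trim_chunk(stripe, 1)  # TRIM the middle chunk

# Parity recomputed from scratch must match the incrementally updated one.
assert stripe["parity"] == xor_parity(stripe["data"])
print("parity consistent after TRIM:", stripe["parity"].hex())
```

If the drive gives no such guarantee (non-deterministic data after TRIM), the incremental parity update above is simply wrong, which is exactly why md cannot blindly pass a chunk-sized TRIM through a RAID-5 stripe, and why the thread keeps coming back to either whole-stripe TRIMs or rewriting (not trimming) the parity.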