On Tue, Mar 27, 2012 at 03:48:25PM -0500, Ben Myers wrote:
> On Thu, Mar 22, 2012 at 04:15:12PM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> >
> > xfs_ioc_fstrim() doesn't treat the incoming offset and length
> > correctly. It treats them as a filesystem block address, rather than
> > a disk address. This is wrong because the range passed in is a
> > linear representation, while the filesystem block address notation
> > is a sparse representation. Hence we cannot convert the range direct
> > to filesystem block units and then use that for calculating the
> > range to trim.
> >
> > While this sounds dangerous, the problem is limited to calculating
> > what AGs need to be trimmed. The code that calculates the actual
> > ranges to trim gets the right result (i.e. only ever discards free
> > space), even though it uses the wrong ranges to limit what is
> > trimmed. Hence this is not a bug that endangers user data.
>
> Yep, I can see that the calculation of what we pass to blkdev_issue_discard()
> is correct and always a free extent. I am having a hard time seeing the
> problem related to calculating which AGs to trim. Can you give an example?

I don't have the debug traces anymore, but the problem is this sort
of thing. Take an 80MB filesystem with 4 AGs. Each AG is 20MB, which
is ~5000 filesystem blocks. That means we need 13 bits to store the
block count per AG, i.e. agblklog = 13.

Now, the FSB addressing format is sparse, and the calculation is
this:

	FSBNO = (AGNO << agblklog) | AGBNO

Note the terminology? FSBNO != FSB. FSB is just a range converted to
filesystem block units. FSBNO is the filesystem block number, an
address.
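To make the FSB versus FSBNO distinction concrete, here is a minimal
sketch in Python (not kernel code) using the example geometry above.
The 4k block size, the constant names and the helper names are all
assumptions made for illustration; they are not the XFS macros.

```python
# Illustrative sketch only: names are hypothetical, not the kernel's helpers.
AGCOUNT = 4
AGBLOCKS = 5120   # 20MB per AG, assuming 4k filesystem blocks
AGBLKLOG = 13     # bits needed to hold an AGBNO: 2**13 = 8192 >= 5120


def fsbno_encode(agno, agbno):
    """Sparse address: FSBNO = (AGNO << agblklog) | AGBNO."""
    return (agno << AGBLKLOG) | agbno


def fsbno_decode(fsbno):
    """Split an FSBNO into (AGNO, AGBNO) with a shift and a mask."""
    return fsbno >> AGBLKLOG, fsbno & ((1 << AGBLKLOG) - 1)


def fsb_to_fsbno(fsb):
    """Convert a linear block offset (FSB) into the sparse FSBNO address."""
    agno, agbno = divmod(fsb, AGBLOCKS)
    return fsbno_encode(agno, agbno)


def agno_of_fsb(fsb):
    """AG that really contains linear block offset fsb."""
    return fsb // AGBLOCKS


def agno_if_misread_as_fsbno(fsb):
    """AG obtained when a linear offset is wrongly decoded as an FSBNO."""
    return fsb >> AGBLKLOG


if __name__ == "__main__":
    for agno in range(AGCOUNT):
        first = fsbno_encode(agno, 0)
        last = fsbno_encode(agno, AGBLOCKS - 1)
        print(f"AG {agno}: FSBNO {first}..{last}")

    # Last block of the 80MB range, expressed as a linear block offset.
    last_fsb = AGCOUNT * AGBLOCKS - 1
    print("real AG:", agno_of_fsb(last_fsb))
    print("AG if misread as FSBNO:", agno_if_misread_as_fsbno(last_fsb))
```

Under these assumptions the first block of AG 1 is linear block 5120
but FSBNO 8192, and decoding the last linear block of the filesystem
as if it were an FSBNO lands in AG 2 rather than AG 3.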

        offset                               offset + length
        +-------------------------------------------+
range:  0                                        80MB
daddr:  0                                        160k
FSB:    0                                         20k
AG:     +----------+----------+----------+----------+
             0          1          2          3
AGBNO:  0       5k 0       5k 0       5k 0       5k
FSBNO:  0       5k 8k     13k 16k    21k 24k    29k

IOWs, the FSBNO range looks like this:

        +----------+     +----------+     +----------+     +----------+
        0         5k     8k       13k     16k      21k     24k      29k

And there are regions that are simply invalid (the empty, sparse
bits). This is done to make all the mathematics easy within each AG,
as you can convert from the FSBNO straight to the AGBNO (and vice
versa) without needing to know the address of the first block of the
AG. It means it is easy for AGs to manage their own space without
needing to care about where they exist in the larger disk address
space - that is completely abstracted away from the internal
freespace and inode management as they all use AGBNO notation to
reference blocks within the AG.

As a result, the FSBNO range of the filesystem is quite a bit larger
than the FSB range of the filesystem. So, if we trim a byte range of
0 to 80MB, but treat that as an FSBNO and then convert it to an AGNO,
80MB = 20k FSBs = AG 2 (20480 >> 13 = 2). Hence rather than trimming
the entire range of AGs (0-3), we trim 0-2.

Hence we need to convert the byte range to a daddr range, and from
there extract the AGNO according to FSBNO encoding.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs