> Il giorno 3 ott 2022, alle ore 00:33, Damien Le Moal <Damien.LeMoal@xxxxxxx> ha scritto: > > On 2022/10/03 1:20, Paolo Valente wrote: >> >> >>> Il giorno 1 ott 2022, alle ore 02:50, Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> ha scritto: >>> >>> On 10/1/22 00:59, Paolo Valente wrote: >>>> Hi Jens, Damien, all other possibly interested people, this is to raise >>>> attention on a mistake that has emerged in a thread on a bfq extension >>>> for multi-actuary drives [1]. >>>> >>>> The mistake is apparently in the macro bio_end_sector (defined in >>>> include/linux/bio.h), which seems to be translated (incorrectly) as >>>> sector+size, and not as sector+size-1. >>> >>> This has been like this for a long time, I think. >>> >>>> >>>> For your convenience, I'm pasting a detailed description of the >>>> problem, by Tyler (description taken from the above thread [1]). >>>> >>>> The drive reports the actuator ranges as a starting LBA and a count of >>>> LBAs for the range. If the code reading the reported values simply does >>>> startingLBA + range, this is an incorrect ending LBA for that actuator. >>> >>> Well, yes. LBA 0 + drive capacity is also an incorrect LBA. If the code >>> assumes that it is, you have a classic off-by-one bug. >>> >>>> This is because LBAs are zero indexed and this simple addition is not >>>> taking that into account. The proper way to get the endingLBA is >>>> startingLBA + range - 1 to get the last LBA value for where to issue a >>>> final IO read/write to account for LBA values starting at zero rather >>>> than one. >>> >>> Yes. And ? Where is the issue ? >>> >>>> >>>> Here is an example from the output in SeaChest/openSeaChest: >>>> ====Concurrent Positioning Ranges==== >>>> >>>> Range# #Elements Lowest LBA # of LBAs 0 >>>> 1 0 >>>> 17578328064 1 1 17578328064 >>>> 17578328064 >>>> >>>> If using the incorrect formula to get the final LBA for actuator 0, you >>>> would get 17578328064, but this is the starting LBA reported by the >>>> drive for actuator 1. So to be consistent for all ranges, the final LBA >>>> for a given actuator should be calculated as starting LBA + range - 1. >>>> >>>> I had reached out to Seagate's T10 and T13 representatives for >>>> clarification and verification and this is most likely what is causing >>>> the error is a missing - 1 somewhere after getting the information >>>> reported by the device. They agreed that the reporting from the drive >>>> and the SCSI to ATA translation is correct. >>>> >>>> I'm not sure where this is being read and calculated, but it is not an >>>> error in the low-level libata or sd level of the kernel. It may be in >>>> bfq, or it may be in some other place after the sd layer. I know there >>>> were some additions to read this and report it up the stack, but I did >>>> not think those were wrong as they seemed to pass the drive reported >>>> information up the stack. >>>> >>>> Jens, Damien, can you shed a light on this? >>> >>> I am not clear on what the problem is exactly. This all sound like a >>> simple off-by-one issue if bfq support code. No ? >> >> Yes, it's (also) in bfq code now; no, it's not only in bfq code. >> >> The involved bfq code is a replica of the following original function >> (living in block/blk-iaranges.c): >> >> static struct blk_independent_access_range * >> disk_find_ia_range(struct blk_independent_access_ranges *iars, >> sector_t sector) >> { >> struct blk_independent_access_range *iar; >> int i; >> >> for (i = 0; i < iars->nr_ia_ranges; i++) { >> iar = &iars->ia_range[i]; >> if (sector >= iar->sector && >> sector < iar->sector + iar->nr_sectors) >> return iar; >> } >> >> return NULL; >> } >> >> bfq's replica simply contains also this warning, right before the return NULL: >> >> WARN_ONCE(true, >> "bfq_actuator_index: bio sector out of ranges: end=%llu\n", >> end); >> >> So, if this warning is triggered, then the sector does not belong to >> any range. >> >> That warning gets actually triggered [1], for a sector number that is >> larger than the largest extreme (iar->sector + iar->nr_sectors) in the >> iar data structure. The offending value resulted to be simply equal >> to the largest possible value accepted by the iar data structure, plus >> one. >> >> So, yes, this is an off-by-one error. More precisely, there is a >> mismatch between the above code (or the values stored the iar data >> structure) and the value of bio_end_sector (the latter is passed as >> input to the above function). bio_end_sector does not seem to give >> the end sector of a request (equal to begin+size-1), as apparently >> expected by the above code, but the sector after the last one (namely, >> begin+size). >> >> Should I express an opinion on where the error is, I would say that >> the mistake is in bio_end_sector. But I could be totally wrong, as >> I'm not a expert of either of the involved parts. In addition, >> bio_end_sector is already in use, with its current formula, by other >> parts of the block layer. If you guys (Damien, Jens, Tyler?, ...) >> give us some guidance on what to fix exactly, we will be happy to make >> a fix. > > I do not think there is any error/problem anywhere with bio_end_sector(). But > indeed, the name is slightly misleading as it doe not return the last sector > processed by the bio but the next one. The reason is that with this > implementation, bio_end_sector() always gives you the first sector for the next > bio with a multi-bio request. > > When you need the last sector processed by the bio, you need to use > bio_end_sector(bio) - 1. > > So to find the access range that served a completed bio, you simply need to call: > > disk_find_ia_range(iars, bio_end_sector(bio) - 1); > Thank you very much, I got it now. > You probably could add a couple of helper functions for getting an access range > from a bio sector or end sector. Something like: > > static inline struct blk_independent_access_range * > __bio_sector_access_range(struct bio *bio, sector_t sector) > { > struct gendisk *disk = bio->bi_bdev->bd_disk; > > if (!disk->ia_ranges) > return NULL; > > iar = disk_find_ia_range(disk->ia_ranges, sector); > WARN_ONCE(!iar, > "bfq_actuator_index: bio sector out of ranges: end=%llu\n", > end); > > return iar; > } > > static inline struct blk_independent_access_range * > bio_sector_access_range(struct bio *bio) > { > return __bio_sector_access_range(bio, bio_sector(bio)); > } > > static inline struct blk_independent_access_range * > bio_end_sector_access_range(struct bio *bio) > { > return __bio_sector_access_range(bio, bio_end_sector(bio) - 1); > } > > or similar. Then your code will be simpler and much less worries about of-by-one > bugs. > For the moment, bio_end_sector is invoked in only one place (for the range-computation stuff). So, for simplicity, I'd go for just appending a -1 after such invocation. To give you credits, I'm adding you in a reviewed-by tag. If anything of this is wrong, I'm willing to change it. I'll send a V2 soon. Thanks, Paolo >> >> Thanks, >> Paolo >> >> >> >>> >>>> >>>> Thanks, Paolo >>>> >>>> [1] https://www.spinics.net/lists/kernel/msg4507408.html >>> >>> -- >>> Damien Le Moal >>> Western Digital Research >> >> > > -- > Damien Le Moal > Western Digital Research