On Wed, Apr 03, 2024 at 01:58:38PM -0400, Stefan Hajnoczi wrote: > > Yes, we want to ensure that: > > > > off1 = lseek(fd, -1, SEEK_END); > > off2 = off1 + 1; // == lseek(fd, 0, SEEK_END) > > > > if off1 belongs to a data extent: > > - lseek(fd, off1, SEEK_DATA) == off1 > > - lseek(fd, off1, SEEK_HOLE) == off2 > > - lseek(fd, off2, SEEK_DATA) == -ENXIO > > - lseek(fd, off2, SEEK_HOLE) == -ENXIO > > Agreed. > > > if off1 belongs to a hole: > > - lseek(fd, off1, SEEK_DATA) == -ENXIO > > - lseek(fd, off1, SEEK_HOLE) == off1 > > - lseek(fd, off2, SEEK_DATA) == -ENXIO > > - lseek(fd, off2, SEEK_HOLE) == -ENXIO > > Agreed. > ... > > > > That is, we can never pass in off2 (beyond the bounds of the table), > > and when passing in off1, I think this interface may be easier to work > > with in the intermediate layers, even though it differs from the > > lseek() interface above. For off1 in data: > > - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off1 > > - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off2 > > and for a hole: > > - dm_blk_do_seek_hole_data(dm, off1, SEEK_DATA) == off2 > > - dm_blk_do_seek_hole_data(dm, off1, SEEK_HOLE) == off1 > In the caller, we already need to know both off1 and off2 (that is, the offset we are asking about, AND the offset at which the underlying component ends its map at, since that is where we are then in charge of whether to concatenate that with the next component or give the overall result). If we already guarantee that we never call into a lower layer with off2 (because it was beyond bounds), then the only difference between the two semantics is whether SEEK_DATA in a final hole returns -ENXIO or off2; and it looks like we can do a conversion from either style into the other. So designing the caller logic, it looks like I would want: if off1 >= EOF return -ENXIO (out of bounds) while off1 < EOF: determine off2 of current lower region at this point, we are guaranteed off1 < off2 we also know that off2 < EOF unless we are on last lower region call result=lower_layer(off1, SEEK_X) it is a bug if result is non-negative but < off1, or if result > off2 if result == off1, return result (we are already in the right HOLE or DATA) if result > off1 and < off2, return result (we had to advance to get to the right region, but the right region was within the lower layer, and we don't need to inspect further layers) Note - the above two paragraphs can be collapsed into one: if result < off2, return result (the current subregion gave us an adequate answer) if result == off2, continue to the next region with off1=result (in first semantics, this is only possible for SEEK_HOLE for a lower region ending in data; in the second semantics, it is possible for either SEEK_HOLE or SEEK_DATA for a lower region ending in the type opposite of the request) if result == -ENXIO, continue to the next region by using off1=off2 (only possible in the first semantics for SEEK_DATA on a lower region ending in a hole) any other result is necessarily a negative errno like -EIO; return it if the loop completes without an early return, we are out of lower regions, and we should be left with off1 == EOF. Because we advanced, we know now to: return whence == SEEK_HOLE ? off1 : -ENXIO > I'll take a look again starting from block/fops.c, through dm.c, and > into dm-linear.c to see how to make things clearest. Although I would > like to have the same semantics for every seek function, maybe in the > end your suggestion will make the code clearer. Let's see. Keeping lseek semantics may require a couple more lines of code duplicated at each layer, but less maintainer headaches in the long run of converting between slightly different semantics depending on which layer you are at. -- Eric Blake, Principal Software Engineer Red Hat, Inc. Virtualization: qemu.org | libguestfs.org