On 04/22/2011 10:28 AM, Sunil Mushran wrote:
> On 04/22/2011 04:50 AM, Eric Blake wrote:
>> That blog also mentioned the useful idea of adding FIND_HOLE and
>> FIND_DATA, not implemented in Solaris, but which could easily be
>> provided as additional lseek constants in Linux to locate the start of
>> the next chunk without repositioning and which could ease application
>> programmer's life a bit. After all, cp wants to know where data ends
>> without repositioning (FIND_HOLE), read() that much data which
>> repositions in the process, then skip to the next chunk of data
>> (SEEK_DATA) - two lseek() calls per iteration if we have 4 constants,
>> but 3 per iteration if we only have SEEK_HOLE and have to manually
>> rewind.
>
> while(1) {
>     read(block);
>     if (block_all_zeroes)
>         lseek(SEEK_DATA);
> }
>
> What's wrong with the above? If this is the case, even SEEK_HOLE
> is not needed but should be added as it is already in Solaris.

Because you don't know if the block is the same size as the minimum
hole, and because some systems require rather large holes (my Solaris
testing on a zfs system didn't have holes until 128k), that's a rather
large amount of reading just to prove that the block has all zeros to
know that it is even worth trying the lseek(SEEK_DATA). My gut feel is
that doing the lseek(SEEK_HOLE) up front coupled with seeking back to
the same position is more efficient than manually checking for a run of
zeros (less cache pollution, works with 4k read buffers without having
to know filesystem hole size).

>
> My problem with FIND_* is that we are messing with the well understood
> semantics of lseek().

You'll notice I didn't propose any FIND_* constants for POSIX.

> And if generic_file_llseek_unlocked() treats SEEK_DATA as SEEK_CUR and

You meant SEEK_SET not SEEK_CUR, but...

> SEEK_HOLE as SEEK_END (both with zero offset) then we don't even
> have to bother with the finding the "correct" error code.

...that's still not compatible with Solaris.
On a file with size of 0 bytes, lseek(fd, 1, SEEK_SET) and
lseek(fd, 0, SEEK_END) will both succeed, but lseek(fd, 1, SEEK_DATA)
and lseek(fd, 0, SEEK_HOLE) must fail with ENXIO (the offset was at or
beyond the size of the file).

For a file with no holes, Solaris semantics behave as if:

off_t lseek(int fildes, off_t offset, int whence)
{
    off_t cur, end;
    switch (whence)
    {
    case SEEK_HOLE:
    case SEEK_DATA:
        cur = lseek(fildes, 0, SEEK_CUR);
        if (cur < 0)
            return cur;
        end = lseek(fildes, 0, SEEK_END);
        if (end < 0)
            return end;
        if (offset < end)
            return whence == SEEK_HOLE ? end
                : lseek(fildes, offset, SEEK_SET);
        lseek(fildes, cur, SEEK_SET);
        errno = ENXIO;
        return -1;
    default:
        ... /* Existing implementation */
    }
}

-- 
Eric Blake   eblake@xxxxxxxxxx    +1-801-349-2682
Libvirt virtualization library http://libvirt.org
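
For illustration, the copy loop discussed in the reply above (find the end of the current data with SEEK_HOLE, seek back, read that much, then skip ahead with SEEK_DATA) might look roughly like the sketch below. The function name sparse_copy, the 4k buffer, and the final ftruncate() to preserve a trailing hole are assumptions made for this example, not anything proposed in the thread; the fallback values for the two constants are the Linux ones, and error handling is kept minimal.

```c
#define _GNU_SOURCE            /* asks glibc to expose SEEK_DATA / SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Fallback for Linux only, in case the feature macro above came too late. */
#if defined(__linux__) && !defined(SEEK_DATA)
# define SEEK_DATA 3
# define SEEK_HOLE 4
#endif

/* Hypothetical helper: copy only the data extents of `in` to `out`, leaving
 * holes unwritten.  Returns 0 on success, -1 with errno set on failure. */
int sparse_copy(int in, int out)
{
    char buf[4096];            /* small buffer; no need to know the hole size */
    off_t data = 0;

    assert(in >= 0 && out >= 0);
    for (;;) {
        /* lseek #1: skip to the next data at or after `data`. */
        data = lseek(in, data, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                break;         /* no more data before end of file */
            return -1;
        }
        /* lseek #2: find where this run of data ends. */
        off_t hole = lseek(in, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        /* lseek #3: SEEK_HOLE moved the offset, so rewind manually. */
        if (lseek(in, data, SEEK_SET) < 0 || lseek(out, data, SEEK_SET) < 0)
            return -1;

        while (data < hole) {
            size_t want = (size_t)(hole - data) < sizeof buf
                        ? (size_t)(hole - data) : sizeof buf;
            ssize_t n = read(in, buf, want);
            if (n < 0)
                return -1;
            if (n == 0)
                break;         /* file shrank underneath us */
            for (ssize_t done = 0; done < n; ) {
                ssize_t w = write(out, buf + done, (size_t)(n - done));
                if (w < 0)
                    return -1;
                done += w;
            }
            data += n;
        }
    }

    /* Extend `out` to the source size so a trailing hole is preserved. */
    off_t end = lseek(in, 0, SEEK_END);
    if (end < 0 || ftruncate(out, end) < 0)
        return -1;
    return 0;
}
```

Each pass through the outer loop issues the three lseek() calls counted in the quoted text: SEEK_DATA, SEEK_HOLE, and the manual rewind with SEEK_SET. On a filesystem that does not report holes, the whole file is expected to look like a single data extent, so the loop degrades to a plain copy.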