Has something changed in the RFC, the knfsd code, or the client with respect to seeking a hole at the end of file, particularly a legacy file that has no actual holes? Thanks Frank > -----Original Message----- > From: Kinglong Mee [mailto:kinglongmee@xxxxxxxxx] > Sent: Wednesday, September 12, 2018 5:04 PM > To: Frank Filz <ffilzlnx@xxxxxxxxxxxxxx>; 'Anna Schumaker' > <schumaker.anna@xxxxxxxxx>; 'Trond Myklebust' > <trondmy@xxxxxxxxxxxxxxx>; devel@xxxxxxxxxxxxxxxxxxxxx; linux- > nfs@xxxxxxxxxxxxxxx > Subject: [NFS-Ganesha-Devel] Re: lseek gets bad offset from nfs client with > ganesha/gluster which supports SEEK > > On 2018/9/12 19:58, Frank Filz wrote: > >> On 2018/9/12 7:20, Frank Filz wrote: > >>>> On Tue, 2018-09-11 at 22:47 +0800, Kinglong Mee wrote: > >>>>> On 2018/9/11 20:57, Trond Myklebust wrote: > >>>>>> On Tue, 2018-09-11 at 20:29 +0800, Kinglong Mee wrote: > >>>>>>> The latest ganesha/gluster supports seek according to, > >>>>>>> > >>>>>>> > >>>>>> > >>>>>> https://tools.ietf.org/html/draft-ietf-nfsv4-minorversion2-41#sec > >>>>>> ti > >>>>>> o > >>>>>> n-15.11 > >>>>>>> > >>>>>>> From the given sa_offset, find the next data_content4 of type > >>>>>>> sa_what > >>>>>>> in the file. If the server can not find a corresponding sa_what, > >>>>>>> then the status will still be NFS4_OK, but sr_eof would be > >>>>>>> TRUE. If > >>>>>>> the server can find the sa_what, then the sr_offset is the > >>>>>>> start of > >>>>>>> that content. If the sa_offset is beyond the end of the > >>>>>>> file, then > >>>>>>> SEEK MUST return NFS4ERR_NXIO. > >>>>>>> > >>>>>>> For a file's filemap as, > >>>>>>> > >>>>>>> Part 1: HOLE 0x0000000000000000 ---> 0x0000000000600000 > >>>>>>> Part 2: DATA 0x0000000000600000 ---> 0x0000000000700000 > >>>>>>> Part 3: HOLE 0x0000000000700000 ---> 0x0000000001000000>> > >>>>>>> SEEK(0x700000, SEEK_DATA) gets result (sr_eof:1, > >>>>>>> sr_offset:0x70000) from ganesha/gluster; SEEK(0x700000, > >>>>>>> SEEK_HOLE) gets result (sr_eof:0, sr_offset:0x70000) from > ganesha/gluster. > >>>>>>> > >>>>>>> If an application depends the lseek result for data searching, > >>>>>>> it may enter infinite loop. > >>>>>>> > >>>>>>> while (1) { > >>>>>>> next_pos = lseek(fd, cur_pos, seek_type); > >>>>>>> if (seek_type == SEEK_DATA) { > >>>>>>> seek_type = SEEK_HOLE; > >>>>>>> } else { > >>>>>>> seek_type = SEEK_DATA; > >>>>>>> } > >>>>>>> > >>>>>>> if (next_pos == -1) { > >>>>>>> return ; > >>>>>>> > >>>>>>> cur_pos = next_pos; > >>>>>>> } > >>>>>>> > >>>>>>> The lseek syscall always gets 0x70000 from nfs client for those > >>>>>>> two cases, but, if underlying filesystem is ext4/f2fs, or the > >>>>>>> nfs server is knfsd, the lseek(0x700000, SEEK_DATA) gets ENXIO. > >>>>>>> > >>>>>>> I wanna to know, > >>>>>>> should I fix the ganesha/gluster as knfsd return ENXIO for the > >>>>>>> first case? > >>>>>>> or should I fix the nfs client to return ENXIO for the first case? > >>>>>>> > >>>>>> > >>>>>> It that test correct? The fallback implementation of SEEK_DATA > >>>>>> assumes that the entire file is data, so lseek(SEEK_DATA) on any > >>>>>> offset that is <= eof will be a no-op. The fallback > >>>>>> implementation of SEEK_HOLE assumes that the first hole is at eof. > >>>>> > >>>>> I think that's non-nfsv4.2's logical. > >>>>> > >>>>>> > >>>>>> IOW: unless the initial value for cur_pos is > eof, it looks to > >>>>>> me as if the above test will loop infinitely given any filesystem > >>>>>> that doesn't implement native support for SEEK_DATA/SEEK_HOLE. > >>>>>> > >>>>> > >>>>> No, if underlying filesystem is ext4/f2fs, or the nfs server is > >>>>> knfsd, the last lseek syscall always return ENXIO no matter the > >>>>> cur_pos is > eof or not. > >>>>> > >>>>> A file ends with a hole as, > >>>>> Part 22: DATA 0x0000000006a00000 ---> 0x0000000006afffff > >>>>> Part 23: HOLE 0x0000000006b00000 ---> 0x000000000c7fffff > >>>>> > >>>>> # stat testfile > >>>>> File: testfile > >>>>> Size: 209715200 Blocks: 22640 IO Block: 4096 regular file > >>>>> Device: 807h/2055d Inode: 843122 Links: 2 > >>>>> Access: (0600/-rw-------) Uid: ( 0/ root) Gid: ( 0/ root) > >>>>> Access: 2018-09-11 20:01:41.881227061 +0800 > >>>>> Modify: 2018-09-11 20:01:41.976478311 +0800 > >>>>> Change: 2018-09-11 20:01:41.976478311 +0800 > >>>>> Birth: - > >>>>> > >>>>> # strace filemap testfile > >>>>> ... ... > >>>>> lseek(3, 111149056, SEEK_HOLE) = 112197632 > >>>>> lseek(3, 112197632, SEEK_DATA) = -1 ENXIO (No such device or > >> address) > >>>>> > >>>>> Right now, when knfsd gets the ENXIO error, it returns the error > >>>>> to client directly, and return to syscall. > >>>>> But, ganesha set the sr_eof to true and return NFS4_OK to client > >>>>> as RFC description, nfs client skips the sr_eof and return a bad > >>>>> offset to syscall. > >>>> > >>>> Would it make more sense to change Knfsd instead of the client? I > >>>> think I was trying to keep things simple when I wrote the code, so > >>>> I just passed the result of the lseek system call back to the client. > >>> > >>> Looking at the lseek(2) man page, it's not clear to me what should > >>> be returned > >> if as in this example, there is a HOLE at the end of the file (i.e. > >> filesize is larger than the range of the last DATA in the file). It > >> sounds like ext4 returns ENXIO if a SEEK_DATA is done past the last data in > the file. > >>> > >>> SEEK_DATA > >>> Adjust the file offset to the next location in the > >>> file greater than or > >> equal to offset containing data. If offset points to data, then the > >> file offset is set > >>> to offset. > >>> > >>> SEEK_HOLE > >>> Adjust the file offset to the next hole in the file > >>> greater than or equal > >> to offset. If offset points into the middle of a hole, then the file > >> offset is set to > >>> offset. If there is no hole past offset, then the > >>> file offset is adjusted to > >> the end of the file (i.e., there is an implicit hole at the end of any file). > >>> > >>> In both of the above cases, lseek() fails if offset points > >>> past the end of the > >> file. > >>> > >>> These operations allow applications to map holes in a > >>> sparsely allocated > >> file. This can be useful for applications such as file backup tools, > >> which can save space when > >>> creating backups and preserve holes, if they have a mechanism > >>> for > >> discovering holes. > >>> > >>> For the purposes of these operations, a hole is a sequence of > >>> zeros that > >> (normally) has not been allocated in the underlying file storage. > >> However, a filesystem is not > >>> obliged to report holes, so these operations are not a > >>> guaranteed > >> mechanism for mapping the storage space actually allocated to a file. > >> (Furthermore, a sequence of > >>> zeros that actually has been written to the underlying > >>> storage may not be > >> reported as a hole.) In the simplest implementation, a filesystem > >> can support the operations > >>> by making SEEK_HOLE always return the offset of the end of > >>> the file, and > >> making SEEK_DATA always return offset (i.e., even if the location > >> referred to by offset is a > >>> hole, it can be considered to consist of data that is a sequence of zeros). > >>> > >>> The RFC text is pretty clear: > >>> > >>> SEEK is an operation that allows a client to determine the location > >>> of the next data_content4 in a file. It allows an implementation of > >>> the emerging extension to lseek(2) to allow clients to determine the > >>> next hole whilst in data or the next data whilst in a hole. > >>> > >>> From the given sa_offset, find the next data_content4 of type sa_what > >>> in the file. If the server can not find a corresponding sa_what, > >>> then the status will still be NFS4_OK, but sr_eof would be TRUE. If > >>> the server can find the sa_what, then the sr_offset is the start of > >>> that content. If the sa_offset is beyond the end of the file, then > >>> SEEK MUST return NFS4ERR_NXIO. > >>> > >>> All files MUST have a virtual hole at the end of the file. I.e., if > >>> a filesystem does not support sparse files, then a compound with > >>> {SEEK 0 NFS4_CONTENT_HOLE;} would return a result of {SEEK 1 X;} > >>> where 'X' was the size of the file. > >>> > >>> Sa_offset is not past the end of the file, but there is no more > >>> DATA, so a seek > >> DATA from 0x70000 (original file) should return sr_eof TRUE. > >>> > >>> In either RFC or lseek(2), a seek HOLE for 0x70000 will return 0x70000. > >>> > >>> It certainly makes sense that you should be able to have a hole at > >>> the end of a > >> file (pre-allocated disk blocks but no data written yet), and is in > >> fact what > >> fallocate(2) will do. > >>> > >>> An NFS server could check the filesize and if sa_offset is < > >>> filesize and a > >> SEEK_DATA returns ENXIO, it could translate that to NFS4_OK and set > >> sr_eof to TRUE. > >>> > >>> The Ganesha code in FSAL_GLUSTER I believe is wrong. It changes any > >>> ENXIO > >> result to NFS4_OK with sr_eof TRUE. It would be better for it to do > >> the simple thing knfsd does of always passing along the ENXIO (this > >> may be best if it is not possible to safely verify sa_offset really is < filesize). > >> > >> Do you mean modifying ganesha/gluster as knfsd does? > >> > >> seek->seek_pos = vfs_llseek(file, seek->seek_offset, whence); > >> if (seek->seek_pos < 0) > >> status = nfserrno(seek->seek_pos); > >> else if (seek->seek_pos >= i_size_read(file_inode(file))) > >> seek->seek_eof = true; > >> > >> It is a working implementation, but not according to RFC description, > >> > >> If the server can not find a corresponding sa_what, > >> then the status will still be NFS4_OK, but sr_eof would be TRUE. > >> > >> As in this example, there is a HOLE at the end of the file, SEEK(in > >> hole, > >> SEEK_DATA) should return NFS4_OK and sr_eof is TRUE, but knfsd return > >> NFS4ERR_NXIO. > > > > FSAL_GLUSTER always translates lseek return of ENXIO to NFS4_Ok with > sr_eod TRUE. > > > > It should at least ONLY do that if sa_offset is < filesize (which would then be > correct per RFC). > > > > Knfsd, to my understanding, looks like it always just returns ENXIO (which isn't > exactly per RFC, but at least doesn't confuse the client and application as badly). > > > > Copy that. > I will push a fix as knfsd. > > thanks, > Kinglong Mee > _______________________________________________ > Devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxx To unsubscribe send an email to > devel-leave@xxxxxxxxxxxxxxxxxxxxx