> On Sep 4, 2020, at 10:49 AM, J. Bruce Fields <bfields@xxxxxxxxxx> wrote:
>
> On Fri, Sep 04, 2020 at 10:36:36AM -0400, Chuck Lever wrote:
>>> On Sep 4, 2020, at 10:29 AM, Bruce Fields <bfields@xxxxxxxxxxxx> wrote:
>>> It also doesn't guarantee that the results tell you
>>> anything about how the file is actually stored--a returned "hole" could
>>> represent an unallocated segment, or a fully allocated segment that's
>>> filled with zeroes, or some combination.
>>
>> Understood, but the resulting copied file should look the same whether
>> it was read from the server using READ_PLUS or SEEK_DATA/HOLE.
>
> I'm uncomfortable about promising that.

The server should be able to represent a file with holes in exactly
the same way with both mechanisms, unless there is a significant flaw
in the protocols or implementation.

The client's behavior is also important here, so the guarantee would
have to be about how the server presents the holes. A quality client
implementation would be able to use this guarantee to reconstruct
the holes exactly.

> What do you think might go wrong otherwise?

I don't see a data corruption issue here, if that's what you mean.

Suppose the server has a large file with a lot of holes, and these
holes are all unallocated. This might be typical of a container
image.

Suppose further the client is able to punch holes in a destination
file as a thin provisioning mechanism.

Now, suppose we copy the file via TCP/READ_PLUS, and that preserves
the holes. Copy with RDMA/SEEK_HOLE and maybe it doesn't preserve
holes. The destination file is now significantly larger and less
efficiently stored. Or maybe it's the other way around. Either way,
one mechanism is hole-preserving and one isn't.

A quality implementation would try to preserve holes as much as
possible so that the server can make smart storage provisioning
decisions.

--
Chuck Lever
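
For illustration, the hole-preserving copy described above can be sketched on the client side with lseek(2) SEEK_DATA/SEEK_HOLE. This is a minimal sketch under stated assumptions, not the NFS client's code: the helper names data_segments and sparse_copy are hypothetical, it goes through the local system call interface rather than the NFS operations themselves, and it assumes a Linux client. On a filesystem that does not report holes, the whole file should appear as a single data segment, so the copy stays correct but is no longer sparse.

```python
import errno
import os


def data_segments(fd, size):
    """Yield (start, end) byte ranges that the filesystem reports as data."""
    offset = 0
    while offset < size:
        try:
            start = os.lseek(fd, offset, os.SEEK_DATA)
        except OSError as exc:
            if exc.errno == errno.ENXIO:
                # No data at or after offset: the rest of the file is a hole.
                return
            raise
        # End of file counts as an implicit hole, so this always succeeds
        # for a start offset that lies inside the file.
        end = os.lseek(fd, start, os.SEEK_HOLE)
        yield start, end
        offset = end


def sparse_copy(src_path, dst_path, chunk=1 << 20):
    """Copy src to dst, writing only the reported data ranges.

    Ranges reported as holes are never written, so the destination
    filesystem is free to leave them unallocated.
    """
    src = os.open(src_path, os.O_RDONLY)
    dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(src).st_size
        for start, end in data_segments(src, size):
            pos = start
            while pos < end:
                buf = os.pread(src, min(chunk, end - pos), pos)
                if not buf:
                    break  # source shrank underneath us; stop this segment
                # pwrite may write fewer bytes than requested; advance by
                # the amount actually written and re-read the remainder.
                pos += os.pwrite(dst, buf, pos)
        # Extend the destination to full length; unwritten ranges read
        # back as zeroes, matching what a reader of the source sees.
        os.ftruncate(dst, size)
    finally:
        os.close(src)
        os.close(dst)
```

Note that this only preserves the holes the source side chooses to report. As pointed out in the quoted text, a reported hole may be an unallocated segment or an allocated run of zeroes, so the copy matches the source in content but not necessarily in allocation.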