On Mon, 2013-09-30 at 22:00 +0200, Bernd Schubert wrote:
> On 09/30/2013 09:34 PM, Myklebust, Trond wrote:
> > On Mon, 2013-09-30 at 20:49 +0200, Bernd Schubert wrote:
> >> On 09/30/2013 08:02 PM, Myklebust, Trond wrote:
> >>> On Mon, 2013-09-30 at 19:48 +0200, Bernd Schubert wrote:
> >>>> On 09/30/2013 07:44 PM, Myklebust, Trond wrote:
> >>>>> On Mon, 2013-09-30 at 19:17 +0200, Bernd Schubert wrote:
> >>>>>> It would be nice if there were a way for the file system to get a
> >>>>>> hint that the target file is supposed to be a copy of another file. That
> >>>>>> way distributed file systems could also create the target-file with the
> >>>>>> correct meta-information (same storage targets as in-file has).
> >>>>>> Well, if we cannot agree on that, a file system with a custom protocol at
> >>>>>> least can detect from 0 to SSIZE_MAX and then reset metadata. I'm not
> >>>>>> sure if this would work for pNFS, though.
> >>>>>
> >>>>> splice() does not create new files. What you appear to be asking for
> >>>>> lies way outside the scope of that system call interface.
> >>>>>
> >>>>
> >>>> Sorry, I know, definitely outside the scope of splice, but in the context
> >>>> of offloaded file copies. So the question is, what is the best way to
> >>>> address/discuss that?
> >>>
> >>> Why does it need to be addressed in the first place?
> >>
> >> An offloaded copy is still not efficient if different storage
> >> servers/targets are used by from-file and to-file.
> >
> > So?
>
> mds1: orig-file
> oss1/target1: orig-chunk1
>
> mds1: target-file
> ossN/targetN: target-chunk1
>
> clientN: Performs the copy
>
> Ideally, orig-chunk1 and target-chunk1 are on the same server and same
> target. Copy offload then could even be done from the underlying fs,
> similar to a local splice.
> If different ossN servers are used, copies still have to be done over the
> network by these storage servers, although the client would only need to
> initiate the copy. Still faster, but also not ideal.
>
> >
> >>>
> >>> What is preventing an application from retrieving and setting this
> >>> information using standard libc functions such as fstat()+open(), and
> >>> supplemented with libattr attr_setf/getf(), and libacl acl_get_fd/set_fd
> >>> where appropriate?
> >>>
> >>
> >> At a minimum this requires network and metadata overhead. And while I'm
> >> working on FhGFS now, I still wonder what other file systems need to do -
> >> for example Lustre pre-allocates storage-target files on creating a
> >> file, so file layout changes mean even more overhead there.
> >
> > The problem you are describing is limited to a narrow set of storage
> > architectures. If copy offload using splice() doesn't make sense for
> > those architectures, then don't implement it for them.
>
> But it _does_ make sense. The file system just needs a hint that a
> splice copy is going to come up.

Just wait for the splice() system call. How is this any different from
write()?

> > You might be able to provide ioctls() to do these special hinted file
> > creations for those filesystems that need it, but the vast majority
> > don't, and you shouldn't enforce it on them.
>
> And exactly for that we need a standard - it does not make sense if each
> and every distributed file system implements its own
> ioctl/libattr/libacl interface for that.
>
> >
> >> Anyway, if we could agree to use libattr or libacl to teach the file
> >> system about the upcoming splice call I would be fine.
> >
> > libattr and libacl are generic libraries that exist to manipulate xattrs
> > and acls. They do not need to contain Lustre-specific code.
> >
>
> pNFS, FhGFS, Lustre, Ceph, etc., all of them shall implement their own
> interface? And userspace needs to address all of them differently?
>
> I'm just asking for something like a vfs ioctl SPLICE_META_COPY (sorry,
> didn't find a better name yet), which would take in-file-path and
> out-file-path and allow the file system to create out-file-path with the
> same meta-layout as in-file-path. And it would need some flags, such as
> AUTO (file system decides if it makes sense to do a local copy) and
> FORCE (always try a local copy).

splice() is not a whole-file copy operation; it's a byte range copy. How
does the above help other than in the whole-file case?

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com
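
For illustration only, the userspace approach mentioned in the thread (fstat()+open(), plus xattrs where appropriate) followed by a byte-range copy might be sketched roughly as below. This is a sketch under assumptions, not code from any of the file systems discussed: the helper names create_like, copy_xattrs and copy_range are hypothetical, Python's os module stands in for the libc calls, ACL handling is left out, and a plain pread()/pwrite() loop stands in for the offloaded splice() copy. It only shows that the destination is created and given its metadata in a separate step, and that the copy itself is addressed by offset and length rather than by whole file.

```python
import os
import stat


def copy_xattrs(src_fd, dst_fd):
    """Best-effort copy of extended attributes (Linux only in Python)."""
    if not hasattr(os, "listxattr"):
        return
    try:
        names = os.listxattr(src_fd)
    except OSError:
        return  # file system without xattr support
    for name in names:
        try:
            os.setxattr(dst_fd, name, os.getxattr(src_fd, name))
        except OSError:
            pass  # e.g. attribute namespace not permitted for this user


def create_like(src_path, dst_path):
    """Create dst_path with the permission bits of src_path.

    Hypothetical helper: returns (src_fd, dst_fd); the caller closes both.
    """
    src_fd = os.open(src_path, os.O_RDONLY)
    try:
        st = os.fstat(src_fd)  # read the source metadata
        mode = stat.S_IMODE(st.st_mode)
        dst_fd = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, mode)
    except OSError:
        os.close(src_fd)
        raise
    os.fchmod(dst_fd, mode)  # open() applied the umask; restore the exact bits
    copy_xattrs(src_fd, dst_fd)
    return src_fd, dst_fd


def copy_range(src_fd, dst_fd, offset, length, chunk=1 << 16):
    """Copy `length` bytes at `offset` from src_fd to the same offset in dst_fd.

    Stand-in for an offloaded byte-range copy; returns the bytes copied.
    """
    copied = 0
    while copied < length:
        buf = os.pread(src_fd, min(chunk, length - copied), offset + copied)
        if not buf:
            break  # hit end of the source file
        written = 0
        while written < len(buf):
            written += os.pwrite(dst_fd, buf[written:], offset + copied + written)
        copied += len(buf)
    return copied
```

A caller would use create_like() once and then issue any number of copy_range() calls. Nothing in an individual range call says whether the whole file will eventually be copied.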