> -----Original Message-----
> From: Ric Wheeler [mailto:rwheeler@xxxxxxxxxx]
> Sent: Monday, September 30, 2013 10:29 AM
> To: Miklos Szeredi
> Cc: J. Bruce Fields; Myklebust, Trond; Zach Brown; Anna Schumaker; Kernel
> Mailing List; Linux-Fsdevel; linux-nfs@xxxxxxxxxxxxxxx; Schumaker, Bryan;
> Martin K. Petersen; Jens Axboe; Mark Fasheh; Joel Becker; Eric Wong
> Subject: Re: [RFC] extending splice for copy offloading
>
> On 09/30/2013 10:24 AM, Miklos Szeredi wrote:
> > On Mon, Sep 30, 2013 at 4:52 PM, Ric Wheeler <rwheeler@xxxxxxxxxx>
> wrote:
> >> On 09/30/2013 10:51 AM, Miklos Szeredi wrote:
> >>> On Mon, Sep 30, 2013 at 4:34 PM, J. Bruce Fields
> >>> <bfields@xxxxxxxxxxxx>
> >>> wrote:
> >>>>> My other worry is about interruptibility/restartability. Ideas?
> >>>>>
> >>>>> What happens on splice(from, to, 4G) and it's a non-reflink copy?
> >>>>> Can the page cache copy be made restartable? Or should splice() be
> >>>>> allowed to return a short count? What happens on (non-reflink)
> >>>>> remote copies and huge request sizes?
> >>>> If I were writing an application that required copies to be
> >>>> restartable, I'd probably use the largest possible range in the
> >>>> reflink case but break the copy into smaller chunks in the splice case.
> >>>>
> >>> The app really doesn't want to care about that. And it doesn't want
> >>> to care about restartability, etc. It's something the *kernel* has
> >>> to care about. You just can't have uninterruptible syscalls that
> >>> sleep for a "long" time, otherwise first you'll just have annoyed
> >>> users pressing ^C in vain; then, if the sleep is even longer,
> >>> warnings about task sleeping too long.
> >>>
> >>> One idea is letting splice() return a short count, and so the app
> >>> can safely issue SIZE_MAX requests and the kernel can decide if it
> >>> can copy the whole file in one go or if it wants to do it in smaller
> >>> chunks.
> >>>
> >> You cannot rely on a short count.
> >> That implies that an offloaded copy
> >> starts at byte 0 and the short count first bytes are all valid.
> > Huh?
> >
> > - app calls splice(from, 0, to, 0, SIZE_MAX)
> > 1) VFS calls ->direct_splice(from, 0, to, 0, SIZE_MAX)
> > 1.a) fs reflinks the whole file in a jiffy and returns the size of the file
> > 1.b) fs does copy offload of, say, 64MB and returns 64MB
> > 2) VFS does page copy of, say, 1MB and returns 1MB
> > - app calls splice(from, X, to, X, SIZE_MAX) where X is the new offset
> > ...
> >
> > The point is: the app is always doing the same (incrementing offset
> > with the return value from splice) and the kernel can decide what is
> > the best size it can service within a single uninterruptible syscall.
> >
> > Wouldn't that work?
> >
> > Thanks,
> > Miklos
>
> No.
>
> Keep in mind that the offload operation in (1) might fail partially. The target
> file (the copy) is allocated, the question is what ranges have valid data.
>
> I don't see that (2) is interesting or really needed to be done in the kernel.
> If nothing else, it tends to confuse the discussion....
>

Anna's figures, which were presented at Plumbers, show that (2) is still
worth doing on the _server_ for the case of NFS.

Cheers
  Trond
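A minimal sketch of the application-side loop Miklos describes (always request SIZE_MAX, advance the offset by the return value), written in C against in-memory stand-ins. It does not call the real splice(2); `copy_range_fn`, `fake_whole_copy`, `fake_chunked_copy`, `copy_all` and `FAKE_CHUNK` are hypothetical names used only to model the proposed short-count contract, with `fake_whole_copy` playing case 1.a (whole range at once, like a reflink) and `fake_chunked_copy` playing cases 1.b/2 (the "kernel" caps each call and returns a short count).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for the proposed splice(from, off, to, off, len):
 * copies up to len bytes from src[off..] to dst[off..] and returns the
 * number of bytes copied, 0 at end of source, or -1 on error. */
typedef ssize_t (*copy_range_fn)(const char *src, size_t src_len,
                                 char *dst, size_t off, size_t len);

/* Models case 1.a: the whole remaining range is serviced in one call,
 * as a reflink would be. */
static ssize_t fake_whole_copy(const char *src, size_t src_len,
                               char *dst, size_t off, size_t len)
{
    assert(src != NULL && dst != NULL);
    if (off >= src_len)
        return 0;
    size_t n = src_len - off;
    if (n > len)
        n = len;
    memcpy(dst + off, src + off, n);
    return (ssize_t)n;
}

/* Models cases 1.b and 2: the "kernel" caps each call at a size it can
 * service without sleeping too long, and returns a short count. */
#define FAKE_CHUNK 4 /* hypothetical per-call limit */

static ssize_t fake_chunked_copy(const char *src, size_t src_len,
                                 char *dst, size_t off, size_t len)
{
    if (len > FAKE_CHUNK)
        len = FAKE_CHUNK;
    return fake_whole_copy(src, src_len, dst, off, len);
}

/* The application loop: always ask for SIZE_MAX and advance the offset
 * by whatever the call returned. Returns total bytes copied, or -1. */
static ssize_t copy_all(copy_range_fn copy, const char *src, size_t src_len,
                        char *dst)
{
    size_t off = 0;
    for (;;) {
        ssize_t n = copy(src, src_len, dst, off, SIZE_MAX);
        if (n < 0)
            return -1; /* only [0, off) is known to be valid here */
        if (n == 0)
            break;     /* end of source reached */
        off += (size_t)n;
    }
    return (ssize_t)off;
}
```

The same `copy_all` loop works unchanged with either stand-in, which is Miklos's point. It also relies on every positive return meaning that bytes [off, off + n) of the target are valid; Ric's objection is that a partially failed offload in (1) may leave valid data in ranges that are not such a prefix, which a single count cannot describe.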