Re: [PATCH 1/4] fs/splice: enhance direct pipe & splice for moving pages in kernel

Hi Miklos,

On Tue, Feb 14, 2023 at 12:03:44PM +0100, Miklos Szeredi wrote:
> On Mon, 13 Feb 2023 at 21:04, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Sat, Feb 11, 2023 at 5:39 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> > >
> > > >
> > > >  (a) what's the point of MAY_READ? A non-readable page sounds insane
> > > > and wrong. All sinks expect to be able to read.
> > >
> > > For example, it is a page that the sink end needs to fill with data, so
> > > we don't need to zero it at the source end every time just to avoid
> > > leaking kernel data if the sink end (unexpectedly) tried to read from
> > > the spliced page instead of writing data to it.
> >
> > I still don't understand.
> >
> > A sink *reads* the data. It doesn't write the data.
> 
> I think Ming is trying to generalize splice to allow flowing data in
> the opposite direction.

I don't think it is the opposite direction; it is just that the sink may
WRITE to the buffer. The model is:

device(produce buffer in ->splice_read()) -> direct pipe ->
	file(consume buffer via READ or WRITE)

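The kernel-side implementation looks like this:

	splice_direct_to_actor(file, sd, direct_actor)
		/* the source's ->splice_read() fills the internal direct pipe */

	direct_actor(pipe, sd):
		__splice_from_pipe(pipe, sd, sink_actor)

	sink_actor(pipe, buf, sd):
		get_page(buf->page)
		/* then read from the file/socket into the page */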
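To make the flow concrete, here is a userspace sketch of the same call chain. Everything in it is hypothetical: `mock_page`, `mock_pipe`, `mock_file`, and `source_splice_read()` stand in for `struct page`, the internal direct pipe, the sink's file/socket, and the device's `->splice_read()`; no real kernel API is called. It only shows the shape: the source hands its pages to the pipe without filling them, and the sink takes a page reference and writes into each page.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MOCK_PAGE_SIZE  16  /* tiny "page" so the example stays readable */
#define MOCK_PIPE_SLOTS 4

/* Hypothetical stand-ins for struct page / pipe_buffer / pipe_inode_info. */
struct mock_page {
	unsigned char data[MOCK_PAGE_SIZE];
	int refcount;
};

struct mock_pipe_buf {
	struct mock_page *page;
	size_t offset;
	size_t len;
};

struct mock_pipe {
	struct mock_pipe_buf bufs[MOCK_PIPE_SLOTS];
	size_t nr_bufs;
};

/* Hypothetical stand-in for the file/socket the sink reads from. */
struct mock_file {
	const unsigned char *bytes;
	size_t size;
	size_t pos;
};

/* Source side (like the device's ->splice_read()): hand out its pages,
 * without writing or zeroing them. */
void source_splice_read(struct mock_pipe *pipe, struct mock_page *pages,
			size_t nr_pages)
{
	assert(nr_pages <= MOCK_PIPE_SLOTS);
	pipe->nr_bufs = 0;
	for (size_t i = 0; i < nr_pages; i++) {
		pipe->bufs[i].page = &pages[i];
		pipe->bufs[i].offset = 0;
		pipe->bufs[i].len = MOCK_PAGE_SIZE;
		pipe->nr_bufs++;
	}
}

/* sink_actor(): take a page reference, then read from the file into it. */
size_t sink_actor(struct mock_pipe_buf *buf, struct mock_file *file)
{
	size_t avail = file->size - file->pos;
	size_t n = buf->len < avail ? buf->len : avail;

	buf->page->refcount++;                  /* models get_page() */
	memcpy(buf->page->data + buf->offset, file->bytes + file->pos, n);
	file->pos += n;
	return n;
}

/* direct_actor(): walk every pipe buffer, as __splice_from_pipe() would. */
size_t direct_actor(struct mock_pipe *pipe, struct mock_file *file)
{
	size_t total = 0;

	for (size_t i = 0; i < pipe->nr_bufs; i++)
		total += sink_actor(&pipe->bufs[i], file);
	return total;
}
```

For example, splicing two pages and then draining a 20-byte file fills the first page completely and only the first 4 bytes of the second.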

Userspace currently owns the whole buffer, so as I understand it,
ownership of the buffer can be transferred to the consumer (sink) side.

> So yes, sink would be writing to the buffer.
> And it MUST NOT be reading the data since the buffer may be
> uninitialized.

The added SPLICE_F_PRIV_FOR_READ/SPLICE_F_PRIV_FOR_WRITE flags are enough
to prevent an unexpected READ, but the source end also needs to confirm
that ownership of the buffer can be transferred to the consumer;
PIPE_BUF_FLAG_GIFT could probably be used for that purpose.
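A minimal sketch of that check, assuming the sink is told which direction it may use and that the source marks an ownership transfer with a gift flag on the pipe buffer. The flag values, `BUF_FLAG_GIFT` (standing in for `PIPE_BUF_FLAG_GIFT`), and `sink_access_ok()` are hypothetical, not definitions taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical values; the patch's real definitions may differ. */
#define SPLICE_F_PRIV_FOR_READ   (1u << 0)
#define SPLICE_F_PRIV_FOR_WRITE  (1u << 1)
#define BUF_FLAG_GIFT            (1u << 2)  /* stands in for PIPE_BUF_FLAG_GIFT */

enum sink_op { SINK_READ, SINK_WRITE };

/* May the sink perform 'op' on this pipe buffer?
 * - READ needs SPLICE_F_PRIV_FOR_READ, so an unexpected read of an
 *   uninitialized page is refused.
 * - WRITE needs SPLICE_F_PRIV_FOR_WRITE *and* the source having gifted
 *   the buffer, i.e. confirmed that ownership may move to the sink. */
bool sink_access_ok(unsigned int sd_flags, unsigned int buf_flags,
		    enum sink_op op)
{
	assert(op == SINK_READ || op == SINK_WRITE);
	if (op == SINK_READ)
		return (sd_flags & SPLICE_F_PRIV_FOR_READ) != 0;
	return (sd_flags & SPLICE_F_PRIV_FOR_WRITE) != 0 &&
	       (buf_flags & BUF_FLAG_GIFT) != 0;
}
```

Keeping the two conditions separate mirrors the split above: the direction flag guards against a stray READ, while the gift flag is the source's consent to hand the buffer over.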

> 
> The problem is how to tell the original source that the buffer is
> ready?  PG_uptodate comes to mind, but pipe buffers allow partial
> pages to be passed around, and there's no mechanism to describe a
> partially uptodate buffer.

As I understand it, this isn't an issue, at least from the block device
driver's point of view: a WRITE to the buffer at the sink end can be
thought of as DMA into the buffer from the device, and updating the page
flags is the responsibility of the upper layer (the FS). The block driver
doesn't need to handle page status updates.

So it seems to be a FUSE-specific issue?
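The division of labor can be sketched as follows, assuming the driver-side sink only reports how many bytes it wrote (as a DMA completion would) and the upper layer alone decides the page state. `fs_page`, `mock_io`, and `fs_end_io()` are hypothetical, and treating a short fill as "neither uptodate nor error" is just one possible policy, not something settled in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_PAGE_SIZE 4096

/* Hypothetical page state; only the FS-level code below touches it. */
struct fs_page {
	bool uptodate;
	bool error;
};

/* What the driver-side sink reports back, like a DMA completion:
 * how much it filled and a status code. It never touches page flags. */
struct mock_io {
	struct fs_page *page;
	size_t requested;
	size_t done;
	int status;        /* 0 on success, negative errno-style value on failure */
};

/* Upper layer (FS) completion: the only place page flags are updated. */
void fs_end_io(struct mock_io *io)
{
	assert(io->done <= io->requested);
	if (io->status != 0)
		io->page->error = true;
	else if (io->done == io->requested)
		io->page->uptodate = true;
	/* short fill without error: leave the page untouched (hypothetical policy) */
}
```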


Thanks,
Ming



