Re: RFC: [PATCH] staging/lustre/llite: fix O_TMPFILE/O_LOV_DELAY_CREATE conflict

On Tue, Feb 11, 2014 at 06:55:59AM +0000, Xiong, Jinshan wrote:
>> 4) cl_io_init() which calls
>> 
>> 5) cl_io_init0() which possibly calls a bunch of instances of
>> ->coo_io_init(),
>> which may or may not return 0; I _hope_ that in this case that's what'll
>> happen and we get back to ll_file_io_generic(), where
> 
>> Umm...  So am I right about the thing returning 0 in this case and in
>> case of ->aio_read()?
> 
> Yes, cio_init() can return one of the following integers:
>  = 0: IO can go forward
>  > 0: IO is not necessary
>  < 0: error occurs
> 
> So it will call aio_read() if cl_io_init() returns zero.

Er...  The question is the other way round, actually - will it return
zero when we arrive there from ll_file_aio_read()?  Or could it happen
that it returns something positive in that case?
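The tri-state convention quoted above can be condensed into a small sketch (names are illustrative, not the actual Lustre API; this just models how a caller would classify the return value, with only the zero case proceeding to generic_file_aio_read()):

```c
#include <assert.h>

/* Hypothetical model of the cl_io_init() return convention:
 *   == 0 -> IO can go forward
 *   >  0 -> IO is not necessary (short-circuit as success)
 *   <  0 -> error occurred
 */
enum io_disposition { IO_PROCEED, IO_SHORT_CIRCUIT, IO_ERROR };

static enum io_disposition classify_io_init(int rc)
{
	if (rc < 0)
		return IO_ERROR;
	if (rc > 0)
		return IO_SHORT_CIRCUIT;
	return IO_PROCEED;	/* only this case reaches ->aio_read() */
}
```

So the open question above is really whether the read path can ever hit the IO_SHORT_CIRCUIT leg, or is guaranteed the IO_PROCEED one.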

> Let’s take an example here, say a file has two stripes on OST0 and OST1, and stripe size is 1M, so when 2M data is to be written to this file starting from zero, then the above code will loop twice to finish this IO.
> The first iteration will write [0, 1M) to OST0, and the 2nd one will write [1M, 2M) to OST1. As a result, generic_file_aio_write() will be called twice specifically.

_Twice_?  Oh, I see... ccc_io_advance(), right?  With iov_iter the whole
->cio_advance() thing would die, AFAICS.
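The stripe-by-stripe split described above can be sketched as follows (a toy model, not Lustre code; helper names are invented). Each iteration writes at most up to the current stripe boundary, and the target OST is simply determined by the stripe index:

```c
#include <assert.h>

typedef unsigned long long u64;

/* Which OST a given file offset lands on, for round-robin striping. */
static unsigned ost_of(u64 pos, u64 stripe_size, unsigned stripe_count)
{
	return (unsigned)((pos / stripe_size) % stripe_count);
}

/* Bytes written by one loop iteration: capped at the stripe boundary. */
static u64 stripe_chunk(u64 pos, u64 count, u64 stripe_size)
{
	u64 room = stripe_size - pos % stripe_size;

	return count < room ? count : room;
}

/* Simulate the loop: how many iterations does a write take? */
static unsigned demo_iterations(u64 pos, u64 count, u64 stripe_size)
{
	unsigned iters = 0;

	while (count > 0) {
		u64 chunk = stripe_chunk(pos, count, stripe_size);

		pos += chunk;
		count -= chunk;
		iters++;
	}
	return iters;
}
```

With stripe size 1M and two stripes, a 2M write at offset 0 takes two iterations, [0, 1M) to OST0 and [1M, 2M) to OST1, matching the example quoted above.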

> Yes. But as you have already noticed, the ppos in ll_file_io_generic() is NOT always &iocb->ki_pos. So it isn’t totally wrong to have
> 
> “*ppos = io->u.ci_wr.wr.crw_pos;”
> 
> in ll_file_io_generic(), take a look at ll_file_splice_read().

->splice_read/->splice_write is another part of that story; eventually,
I hope to get rid of those guys, when we get polymorphic iov_iter sorted
out.  Short version of the story: turn iov_iter into a tagged structure with
variants; primitives working with it would check the tag to decide what to
do.  Think of *BSD struct uio on steroids; AFAIK, Solaris has kept it as
well.  One of the cases would be iovec-based, with or without splitting the
kernel-space iovec into a separate case.  That's pretty much what the *BSD
one provides and what iov_iter does right now.  Another case: an array of
<struct page *page, size_t offset, size_t size> triples.  ->splice_write()
would simply set up such an array for all non-empty pipe_buffers and pass
it to ->write_iter().  That gives a usable generic implementation, suitable
for pretty much all filesystems.  _Maybe_ it can even be taught to deal
with page stealing, in a way that would allow killing ->splice_write() as
a method - I didn't get to the point where it would be easy to investigate.
->splice_read() ought to use yet another iov_iter case, where copy_page_to_iter()
would grab a reference to the page and plain copy_to_iter() would allocate
a page if needed and copy over there.
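The tagged-structure idea above can be sketched roughly like this (every name here is invented for illustration; the interface that eventually lands may look quite different). The point is that one structure carries either an iovec array or an array of <page, offset, size> triples, and each primitive branches on the tag:

```c
#include <assert.h>
#include <stddef.h>

struct page;	/* opaque here; stands in for the kernel's struct page */

struct sketch_iovec { void *base; size_t len; };
struct sketch_pvec  { struct page *page; size_t offset; size_t size; };

enum iter_tag { ITER_IOVEC_TAG, ITER_PAGES_TAG };

/* Tagged iterator: primitives check ->tag to decide what to do. */
struct sketch_iter {
	enum iter_tag tag;
	size_t count;		/* total bytes remaining */
	unsigned nr_segs;
	union {
		const struct sketch_iovec *iov;
		const struct sketch_pvec *pvec;
	};
};

/* One primitive as an example: total bytes across all segments. */
static size_t sketch_iter_bytes(const struct sketch_iter *it)
{
	size_t total = 0;
	unsigned i;

	for (i = 0; i < it->nr_segs; i++)
		total += (it->tag == ITER_IOVEC_TAG) ? it->iov[i].len
						     : it->pvec[i].size;
	return total;
}

/* iovec-based variant: what iov_iter provides today. */
static size_t demo_iovec_total(void)
{
	char a[256], b[512];
	struct sketch_iovec v[2] = { { a, sizeof(a) }, { b, sizeof(b) } };
	struct sketch_iter it = { ITER_IOVEC_TAG, 768, 2, { .iov = v } };

	return sketch_iter_bytes(&it);
}

/* page-triple variant: what ->splice_write() would build from
 * non-empty pipe_buffers and hand to ->write_iter(). */
static size_t demo_pvec_total(void)
{
	struct sketch_pvec p[3] = {
		{ NULL, 0, 4096 }, { NULL, 128, 100 }, { NULL, 0, 1 }
	};
	struct sketch_iter it = { ITER_PAGES_TAG, 4197, 3, { .pvec = p } };

	return sketch_iter_bytes(&it);
}
```

A filesystem's ->write_iter() then wouldn't care which variant it got, which is what makes the generic splice implementation possible.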

Hell knows; I'm still dealing with preliminary messes and with the
splice_write side of things.  We'll see what set of primitives it will
shake down to.  So far it's been spread between several fsdevel threads
(and some bits of off-list mail when potentially nasty bugs got caught).
I hope to get enough of the queue stable enough and large enough to make
it worth discussing.
Hopefully today or tomorrow there'll be enough meat on it...  In any case
I'm going to bring it up on LSF/MM.