Re: Integration of SCST in the mainstream Linux kernel

James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx> · Mon, 04 Feb 2008 12:22:02 -0600

On Mon, 2008-02-04 at 20:56 +0300, Vladislav Bolkhovitin wrote:
> James Bottomley wrote:
> > On Mon, 2008-02-04 at 20:16 +0300, Vladislav Bolkhovitin wrote:
> > 
> >>James Bottomley wrote:
> >>
> >>>>>>So, James, what is your opinion on the above? Or the overall SCSI target 
> >>>>>>project simplicity doesn't matter much for you and you think it's fine 
> >>>>>>to duplicate Linux page cache in the user space to keep the in-kernel 
> >>>>>>part of the project as small as possible?
> >>>>>
> >>>>>
> >>>>>The answers were pretty much contained here
> >>>>>
> >>>>>http://marc.info/?l=linux-scsi&m=120164008302435
> >>>>>
> >>>>>and here:
> >>>>>
> >>>>>http://marc.info/?l=linux-scsi&m=120171067107293
> >>>>>
> >>>>>Weren't they?
> >>>>
> >>>>No, sorry, it doesn't look so for me. They are about performance, but 
> >>>>I'm asking about the overall project's architecture, namely about one 
> >>>>part of it: simplicity. Particularly, what do you think about 
> >>>>duplicating Linux page cache in the user space to have zero-copy cached 
> >>>>I/O? Or can you suggest another architectural solution for that problem 
> >>>>in the STGT's approach?
> >>>
> >>>
> >>>Isn't that an advantage of a user space solution?  It simply uses the
> >>>backing store of whatever device supplies the data.  That means it takes
> >>>advantage of the existing mechanisms for caching.
> >>
> >>No, please reread this thread, especially this message: 
> >>http://marc.info/?l=linux-kernel&m=120169189504361&w=2. This is one of 
> >>the advantages of the kernel space implementation. The user space 
> >>implementation has to have data copied between the cache and user space 
> >>buffer, but the kernel space one can use pages in the cache directly, 
> >>without extra copy.
> > 
> > 
> > Well, you've said it thrice (the bellman cried) but that doesn't make it
> > true.
> > 
> > The way a user space solution should work is to schedule mmapped I/O
> > from the backing store and then send this mmapped region off for target
> > I/O.  For reads, the page gather will ensure that the pages are up to
> > date from the backing store to the cache before sending the I/O out.
> > For writes, You actually have to do a msync on the region to get the
> > data secured to the backing store. 
> 
> James, have you checked how fast is mmaped I/O if work size > size of 
> RAM? It's several times slower comparing to buffered I/O. It was many 
> times discussed in LKML and, seems, VM people consider it unavoidable. 

Erm, but if you're using the case of work size > size of RAM, you'll
find buffered I/O won't help because you don't have the memory for
buffers either.

> So, using mmaped IO isn't an option for high performance. Plus, mmaped 
> IO isn't an option for high reliability requirements, since it doesn't 
> provide a practical way to handle I/O errors.

I think you'll find it does ... the page gather returns -EFAULT if
there's an I/O error in the gathered region.  msync does something
similar if there's a write failure.

> > You also have to pull tricks with
> > the mmap region in the case of writes to prevent useless data being read
> > in from the backing store.
> 
> Can you be more exact and specify what kind of tricks should be done for 
> that?

Actually, just avoid touching it seems to do the trick with a recent
kernel.

James

-
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html