Re: [PATCH] direct-io: allow file systems to do their own waiting for io V2

On Wed, Dec 19, 2012 at 07:38:41AM -0700, Christoph Hellwig wrote:
> I have to say I still hate the flag magic in here.  Spent some time to
> look over things to be a bit more constructive in getting what you
> guys want in a nicer way:
> 
> >  static void dio_bio_end_io(struct bio *bio, int error)
> >  {
> >  	struct dio *dio = bio->bi_private;
> >  	unsigned long flags;
> > +	unsigned long remaining;
> > +	bool own_waiting = ((dio->rw & WRITE) &&
> > +			    (dio->flags & DIO_OWN_WAITING));
> > +
> > +	if (own_waiting)
> > +		dio_bio_complete(dio, bio);
> >  
> >  	spin_lock_irqsave(&dio->bio_lock, flags);
> > +	if (!own_waiting) {
> > +		bio->bi_private = dio->bio_list;
> > +		dio->bio_list = bio;
> > +	}
> > +	remaining = --dio->refcount;
> > +	if (remaining == 1 && dio->waiter)
> >  		wake_up_process(dio->waiter);
> >  	spin_unlock_irqrestore(&dio->bio_lock, flags);
> > +
> > +	if (remaining == 0) {
> > +		BUG_ON(!(dio->flags & DIO_OWN_WAITING));
> > +		dio_complete(dio, dio->iocb->ki_pos, 0, false);
> > +		kmem_cache_free(dio_cache, dio);
> > +	}
> 
> The own_waiting case of this is now identical to dio_bio_end_aio
> except for the inverted is_async argument of dio_complete.
> 
> So even if we allow for the flag I think we should test it in dio_end_io
> and use common code for the case where we don't use the linked list of
> bios to complete.  In that case you could also just call the current
> aio version from btrfs as it already calls dio_end_io directly and
> remove the flag given that dio_await_completion would become a no-op.
> 
> That being said I would much, much prefer to consolidate code here
> rather than adding more special cases.
> 
> What I would really like to understand is what the point of the
> bio_list batching is to start with, given that it also requires nasty
> workarounds like dio_bio_reap() to work around the amount of memory it
> might have to use.

Just to clarify a little, we didn't send this with my last pull request.

I mentioned before how we want to reduce the number of waits in the DIO
chain, especially for btrfs, which has to do metadata updates along with
data IO for O_DIRECT | O_SYNC.  If the FS has control over the waiting,
we can turn three waits (data, log-metadata, super) into
two (data + log-metadata, super).
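
To make the wait counting concrete, here's a rough sketch.  The helper
names below are made up and stubbed out purely for illustration; they
are not the real btrfs or dio entry points:

/* Stubbed stand-ins for the real submit/wait steps (illustration only). */
static void submit_data_bios(void) { }
static void submit_log_and_metadata(void) { }
static void submit_super(void) { }
static void wait_for(const char *what) { (void)what; }

/*
 * Today: the generic dio code waits for the data IO before the fs gets
 * control back, so O_DIRECT | O_SYNC ends up blocking three times.
 */
static void osync_write_generic_waiting(void)
{
	submit_data_bios();
	wait_for("data");			/* wait #1 */
	submit_log_and_metadata();
	wait_for("log-metadata");		/* wait #2 */
	submit_super();
	wait_for("super");			/* wait #3 */
}

/*
 * With the fs doing its own waiting, the log/metadata IO can be kicked
 * off without a separate blocking wait for the data first, and both
 * are waited on together.
 */
static void osync_write_own_waiting(void)
{
	submit_data_bios();
	submit_log_and_metadata();
	wait_for("data + log-metadata");	/* wait #1 */
	submit_super();
	wait_for("super");			/* wait #2 */
}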

That's nice, but the flash vendors are coming out with APIs for atomic
IOs.  They basically want a full set of IO all at once, instead of the
model where you get a token, do some IO, and commit the token.

So, this code allows us to create that batch of atomic IO.  I'm hoping
for an API where we hand a list of bios over to the block layer and
it is completed as a single unit (data + log-metadata + super).
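
As a strawman only (nothing like this exists in the block layer today,
and every name below is invented for illustration), the submission side
might look roughly like:

/*
 * Hypothetical batch submission interface, made up here to illustrate
 * the idea: the caller builds the whole set of bios up front, hands it
 * to the block layer, and gets one completion for the unit instead of
 * waiting on the pieces individually.
 */
struct dio_bio_batch {
	struct bio_list bios;		/* data + log-metadata + super */
	void (*end_batch)(struct dio_bio_batch *batch, int error);
	void *private;
};

/*
 * Also hypothetical: whether the device has to complete the set
 * atomically, or merely in list order (ordered would be enough for
 * btrfs, and devices may handle larger ordered sets than atomic ones).
 */
#define DIO_BATCH_ATOMIC	(1 << 0)
#define DIO_BATCH_ORDERED	(1 << 1)

int blk_submit_bio_batch(struct dio_bio_batch *batch, unsigned int flags);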

The truth is that btrfs doesn't really need atomic IO; we just need
ordered IO (do the super last, please), and if that ends up useful in
general the fusionio cards may provide it.  <insert barrier discussion
here, hopefully having learned from the past>

The atomic vs ordered difference is important because cards may be able
to do a larger set of IO in an ordered fashion than they can atomically.

Of course, I'm hoping everyone is able to make use of whatever is
included.  There's nothing btrfs specific here.

> 
> The only thing I could think of is to allow ->end_io callbacks from user
> context, but that is a bigger problem as we can't do that for AIO.  I'd
> much prefer a unified approach with my generic user context callbacks
> from a few weeks ago to actually simplify this code.  (and yeah, it's
> probably up to me to demonstrate at least a prototype of this)

end_io callbacks from user context are definitely interesting, but
that's not the kind of performance tuning we're targeting right now.

-chris
