On Wed, Dec 19, 2012 at 07:38:41AM -0700, Christoph Hellwig wrote:
> I have to say I still hate the flag magic in here.  Spent some time to
> look over things to be a bit more constructive in getting what you
> guys want in a nicer way:
>
> > static void dio_bio_end_io(struct bio *bio, int error)
> > {
> >         struct dio *dio = bio->bi_private;
> >         unsigned long flags;
> > +       unsigned long remaining;
> > +       bool own_waiting = ((dio->rw & WRITE) &&
> > +                           (dio->flags & DIO_OWN_WAITING));
> > +
> > +       if (own_waiting)
> > +               dio_bio_complete(dio, bio);
> >
> >         spin_lock_irqsave(&dio->bio_lock, flags);
> > +       if (!own_waiting) {
> > +               bio->bi_private = dio->bio_list;
> > +               dio->bio_list = bio;
> > +       }
> > +       remaining = --dio->refcount;
> > +       if (remaining == 1 && dio->waiter)
> >                 wake_up_process(dio->waiter);
> >         spin_unlock_irqrestore(&dio->bio_lock, flags);
> > +
> > +       if (remaining == 0) {
> > +               BUG_ON(!(dio->flags & DIO_OWN_WAITING));
> > +               dio_complete(dio, dio->iocb->ki_pos, 0, false);
> > +               kmem_cache_free(dio_cache, dio);
> > +       }
>
> The own_waiting case of this is now identical to dio_bio_end_aio
> except for the inverted is_async argument of dio_complete.
>
> So even if we allow for the flag, I think we should test it in dio_end_io
> and use common code for the case where we don't use the linked list of
> bios to complete.  In that case you could also just call the current
> aio version from btrfs, as it already calls dio_end_io directly, and
> remove the flag, given that dio_await_completion would become a no-op.
>
> That being said, I would much, much prefer to consolidate code here
> rather than adding more special cases.
>
> What I would really like to understand is what the point of the
> bio_list batching is to start with, given that it also requires nasty
> workarounds like dio_bio_reap() to work around the amount of memory it
> might have to use.

Just to clarify a little, we didn't send this with my last pull request.

I mentioned before how we want to reduce the number of waits in the DIO
chain, especially for btrfs, which has to do metadata updates along with
data IO for O_DIRECT | O_SYNC.  If the FS has control over the waiting,
we can turn three waits (data, log-metadata, super) into two
(data + log-metadata, super).

That's nice, but the flash vendors are coming out with APIs for atomic
IOs.  They basically want a full set of IO all at once, instead of the
model where you get a token, do some IO and commit the token.  So, this
code allows us to create that batch of atomic IO.

I'm hoping for an API where we hand a list of bios over to the block
layer and it is completed as a single unit (data + log-metadata +
super).

The truth is that btrfs doesn't really need atomic IO; we just need
ordered IO (do the super last, please), and if that ends up being
useful in general, the fusionio cards may provide it.

<insert barrier discussion here, hopefully having learned from the past>

The atomic vs ordered difference is important because cards may be able
to do a larger set of IO in an ordered fashion than they can atomically.
Of course, I'm hoping everyone is able to make use of whatever is
included.  There's nothing btrfs specific here.
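To make that a little more concrete, here is a rough sketch of the kind
of submission API I'm hoping for.  Nothing in it exists today: struct
bio_batch, the BIO_BATCH_* flags and bio_batch_submit() are all invented
for illustration, and only struct bio_list and the bio_list_* helpers
are real kernel interfaces.

#include <linux/bio.h>

/*
 * Hypothetical: a batch of bios the block layer completes as one unit,
 * either all-or-nothing (atomic) or strictly in list order (ordered).
 */
struct bio_batch {
        struct bio_list bios;           /* every bio in the transaction */
        unsigned int flags;             /* BIO_BATCH_ORDERED or _ATOMIC */
        void (*end_io)(struct bio_batch *batch, int error);
        void *private;
};

#define BIO_BATCH_ORDERED       (1 << 0)        /* retire bios in list order */
#define BIO_BATCH_ATOMIC        (1 << 1)        /* all-or-nothing on media */

/*
 * Hypothetical: submit the whole list at once.  ->end_io runs exactly
 * once, after the device has retired every bio, so the fs waits a
 * single time for the full data + log-metadata + super transaction.
 */
void bio_batch_submit(struct bio_batch *batch);

/*
 * What the O_DIRECT | O_SYNC write path could then look like: queue
 * the super last and wait once instead of three times.
 */
static void example_sync_write(struct bio_batch *batch, struct bio *data,
                               struct bio *log, struct bio *super)
{
        bio_list_init(&batch->bios);
        bio_list_add(&batch->bios, data);
        bio_list_add(&batch->bios, log);
        bio_list_add(&batch->bios, super);      /* ordered: super goes last */
        batch->flags = BIO_BATCH_ORDERED;       /* atomic not required */
        bio_batch_submit(batch);
}

Ordered is the interesting mode for us: the card only has to honor the
ordering, so it should be able to take a much bigger batch than it could
ever commit atomically.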
> The only thing I could think of is to allow ->end_io callbacks from user
> context, but that is a bigger problem as we can't do that for AIO.  I'd
> much prefer a unified approach with my generic user context callbacks
> from a few weeks ago to actually simplify this code.  (and yeah, it's
> probably up to me to demonstrate at least a prototype of this)

->end_io callbacks from user context are definitely interesting, but
that's not the kind of performance tuning we're targeting right now.

-chris