On Thu, Aug 20 2009, Christoph Hellwig wrote:
> Btw, something semi-related I've been looking at recently:
>
> Currently O_DIRECT writes bypass all kernel caches, but they do
> use the disk caches. We currently don't have any barrier support
> for them at all, which is really bad for data integrity in
> virtualized environments. I've started thinking about how to
> implement this.
>
> The simplest scheme would be to mark the last request of each
> O_DIRECT write as a barrier request. This works nicely from the FS
> perspective and works with all hardware supporting barriers. It's
> massive overkill though - we really only need to flush the cache
> after our request, and not before. And for SCSI we would be much
> better off just setting the FUA bit on the commands and not
> requiring a full cache flush at all.
>
> The next scheme would be to simply always do a cache flush after
> the direct I/O write has completed, but given that
> blkdev_issue_flush blocks until the command is done, that would
> a) require everyone to use the end_io callback and b) spend a lot
> of time in that workqueue. This only requires one full cache
> flush, but it's still suboptimal.
>
> I have prototyped this for XFS, but I don't really like it.
>
> The best scheme would be to get some high-level FUA request in the
> block layer which gets emulated by a post-command cache flush.

I've talked to Chris about this in the past too, but I never got
around to benchmarking FUA for O_DIRECT. It should be pretty easy to
wire up without making too many changes, and we do have FUA support
on most SATA drives too. Basically just a check in the driver for
whether the request is O_DIRECT and a WRITE, a la:

        if (rq_data_dir(rq) == WRITE && rq_is_sync(rq))
                WRITE_FUA;

I know that FUA is used by that other OS, so I think we should be
golden on the hw support side.

-- 
Jens Axboe
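
To make Jens's pseudo-code concrete: a minimal sketch of what the
driver-side check might look like, assuming a 2.6.31-era block layer.
rq_data_dir() and rq_is_sync() are real helpers from that kernel; the
prep-style function and the blanket "every sync write gets FUA" policy
are assumptions for illustration, not code from any tree. For SCSI
WRITE(10)/WRITE(16), FUA is bit 3 of CDB byte 1.

        #include <linux/blkdev.h>

        /*
         * Sketch only: set the FUA bit in the CDB of a synchronous
         * write.  O_DIRECT writes are submitted as sync requests, so
         * rq_is_sync() is the closest existing signal a driver has
         * for "this came from an O_DIRECT writer".
         */
        static void sketch_set_fua(struct request *rq,
                                   unsigned char *cdb)
        {
                if (rq_data_dir(rq) == WRITE && rq_is_sync(rq))
                        cdb[1] |= 0x08; /* FUA bit, WRITE(10)/(16) */
        }

With FUA set, the drive commits that one write through its cache,
which is exactly the "flush after our request, and not before"
behaviour Christoph wants, without a full cache flush.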
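
And a userspace illustration of the integrity gap Christoph opens
with: O_DIRECT bypasses the kernel caches, but the data can still sit
in the drive's volatile write cache, so until the block layer tags the
write itself the application has to request the flush, e.g. with
fdatasync(). The file path and the 4096-byte alignment below are
assumptions for the example; everything else is standard POSIX/Linux
API.

        #define _GNU_SOURCE     /* for O_DIRECT */
        #include <fcntl.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <unistd.h>

        int main(void)
        {
                const size_t len = 4096; /* assume 4K block size */
                void *buf;
                int fd;

                fd = open("testfile", O_WRONLY | O_CREAT | O_DIRECT,
                          0644);
                if (fd < 0) {
                        perror("open");
                        return 1;
                }

                /* O_DIRECT requires block-aligned buffers */
                if (posix_memalign(&buf, len, len)) {
                        perror("posix_memalign");
                        return 1;
                }
                memset(buf, 'x', len);

                if (write(fd, buf, len) != (ssize_t)len) {
                        perror("write");
                        return 1;
                }

                /*
                 * The write bypassed the page cache, but may still
                 * be sitting in the disk's write cache.  fdatasync()
                 * makes the kernel issue the cache flush this thread
                 * is about.
                 */
                if (fdatasync(fd)) {
                        perror("fdatasync");
                        return 1;
                }

                free(buf);
                close(fd);
                return 0;
        }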