Dan Williams <dan.j.williams@xxxxxxxxx> writes:

> On Fri, Nov 22, 2019 at 8:09 AM Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>>
>> Dan Williams <dan.j.williams@xxxxxxxxx> writes:
>>
>> > On Wed, Nov 20, 2019 at 9:26 AM Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
>> >>
>> >> Pankaj Gupta <pagupta@xxxxxxxxxx> writes:
>> >>
>> >> > Remove logic to create child bio in the async flush function which
>> >> > causes child bio to get executed after parent bio 'pmem_make_request'
>> >> > completes. This resulted in wrong ordering of REQ_PREFLUSH with the
>> >> > data write request.
>> >> >
>> >> > Instead we are performing flush from the parent bio to maintain the
>> >> > correct order. Also, returning from function 'pmem_make_request' if
>> >> > REQ_PREFLUSH returns an error.
>> >> >
>> >> > Reported-by: Jeff Moyer <jmoyer@xxxxxxxxxx>
>> >> > Signed-off-by: Pankaj Gupta <pagupta@xxxxxxxxxx>
>> >>
>> >> There's a slight change in behavior for the error path in the
>> >> virtio_pmem driver. Previously, all errors from virtio_pmem_flush were
>> >> converted to -EIO. Now, they are reported as-is. I think this is
>> >> actually an improvement.
>> >>
>> >> I'll also note that the current behavior can result in data corruption,
>> >> so this should be tagged for stable.
>> >
>> > I added that and was about to push this out, but what about the fact
>> > that now the guest will synchronously wait for flushing to occur. The
>> > goal of the child bio was to allow that to be an I/O wait with
>> > overlapping I/O, or at least not blocking the submission thread. Does
>> > the block layer synchronously wait for PREFLUSH requests?
>>
>> You *have* to wait for the preflush to complete before issuing the data
>> write. See the "Explicit cache flushes" section in
>> Documentation/block/writeback_cache_control.rst.
>
> I'm not debating the ordering, or that the current implementation is
> obviously broken. I'm questioning whether the bio tagged with PREFLUSH
> is a barrier for future I/Os. My reading is that it is only a gate for
> past writes, and it can be queued. I.e. along the lines of
> md_flush_request().

Sorry, I misunderstood your question.

For a write bio with REQ_PREFLUSH set, the PREFLUSH has to be done
before the data attached to the bio is written. That preflush is not
an I/O barrier. In other words, for unrelated I/O (any other bio in
the system), it does not impart any specific ordering requirements.
Upper layers are expected to wait for any related I/O completions
before issuing a flush request.

So yes, you can queue the bio to a worker thread and return to the
caller. In fact, this is what I had originally suggested to Pankaj.

Cheers,
Jeff
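A minimal user-space sketch of the "queue it to a worker" pattern discussed in this thread may help make the ordering concrete. It assumes hypothetical names (sim_bio, SIM_REQ_PREFLUSH, sim_submit_bio, sim_worker_run, sim_reset) that only model struct bio, REQ_PREFLUSH and a driver's worker thread; it is not the kernel API and not the virtio_pmem code. The worker is modeled as a function the caller runs later, which keeps the example deterministic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: these model struct bio and REQ_PREFLUSH,
 * they are NOT the kernel's types or flags. */
#define SIM_REQ_PREFLUSH 0x1u
#define MAX_QUEUED 8
#define MAX_EVENTS 32

struct sim_bio {
    int id;
    unsigned int flags;
    bool completed; /* set where a real driver would call bio_endio() */
};

/* Ordered record of device activity: 'F' = cache flush, 'W' = data write. */
static char event_log[MAX_EVENTS + 1];
static size_t n_events;

/* Simple FIFO handed from the submission path to the worker. */
static struct sim_bio *work_queue[MAX_QUEUED];
static size_t q_head, q_tail;

static void sim_reset(void)
{
    memset(event_log, 0, sizeof(event_log));
    n_events = 0;
    q_head = q_tail = 0;
}

static void log_event(char ev)
{
    if (n_events < MAX_EVENTS)
        event_log[n_events++] = ev; /* last slot stays '\0' */
}

/* Submission path: hand the bio to the worker and return immediately,
 * so the submitter is not blocked waiting for the flush. */
static bool sim_submit_bio(struct sim_bio *bio)
{
    if (q_tail == MAX_QUEUED)
        return false; /* queue full; a real driver would need back-pressure */
    bio->completed = false;
    work_queue[q_tail++] = bio;
    return true;
}

/* Worker: for each bio, finish the preflush before writing that
 * bio's own data, then signal completion. */
static void sim_worker_run(void)
{
    while (q_head < q_tail) {
        struct sim_bio *bio = work_queue[q_head++];
        assert(bio != NULL);
        if (bio->flags & SIM_REQ_PREFLUSH)
            log_event('F'); /* flush already-completed writes first */
        log_event('W');     /* then write this bio's data */
        bio->completed = true;
    }
}
```

Submitting a plain bio followed by a PREFLUSH bio leaves the log empty until sim_worker_run() is called, after which it reads "WFW": the flush precedes only the flagged bio's own data write. The single FIFO worker is a simplification; since the preflush imposes no ordering on unrelated bios, a real implementation would not need to serialize them.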