David Howells <dhowells@xxxxxxxxxx> wrote: > + /* Wait for writeback to complete. The writeback engine owns > + * the info in folio->private and may change it until it > + * removes the WB mark. > + */ > + if (folio_wait_writeback_killable(folio)) { > + ret = written ? -EINTR : -ERESTARTSYS; > + goto error_folio_unlock; > + } > + It turns out that this really kills performance with fio with as many jobs as cpus. It's taking up to around 8x longer to complete a pwrite() on average and perf shows a 30% of the CPU cycles are being spent in contention on the i_rwsem. The reason this was added here is that writeback cannot take the folio lock in order to clean up folio->private without risking deadlock vs the truncation routines (IIRC). I can mitigate this by skipping the wait if folio->private is not set and if we're not going to attach anything there (see attached). Note that if writeout is ongoing and there is nothing attached to ->private, then we should not be engaging write-streaming mode and attaching a new netfs_folio (and if we did, we'd flush the page and wait for it anyway). The other possibility is if we have a writeback group to set. This only applies to ceph for the moment and is something that will need dealing with if/when ceph is made to use this code. David --- diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c index 1eff9413eb1b..279b296f8014 100644 --- a/fs/netfs/buffered_write.c +++ b/fs/netfs/buffered_write.c @@ -255,7 +255,8 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, * the info in folio->private and may change it until it * removes the WB mark. */ - if (folio_wait_writeback_killable(folio)) { + if (folio_get_private(folio) && + folio_wait_writeback_killable(folio)) { ret = written ? -EINTR : -ERESTARTSYS; goto error_folio_unlock; } To unsubscribe from this group and stop receiving emails from it, send an email to linux-cachefs+unsubscribe@xxxxxxxxxx.