On Mon, May 16, 2022 at 11:13 PM Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> wrote: > > On Mon, May 16, 2022 at 05:08:05PM -0700, T.J. Mercier wrote: > > On Mon, May 16, 2022 at 12:21 PM Christian König > > <christian.koenig@xxxxxxx> wrote: > > > > > > Am 16.05.22 um 20:08 schrieb T.J. Mercier: > > > > On Mon, May 16, 2022 at 10:20 AM Christian König > > > > <christian.koenig@xxxxxxx> wrote: > > > >> Am 16.05.22 um 19:13 schrieb T.J. Mercier: > > > >>> Recently, we noticed an issue where a process went into direct reclaim > > > >>> while holding the kernfs rw semaphore for sysfs in write (exclusive) > > > >>> mode. This caused processes who were doing DMA-BUF exports and releases > > > >>> to go into uninterruptible sleep since they needed to acquire the same > > > >>> semaphore for the DMA-BUF sysfs entry creation/deletion. In order to avoid > > > >>> blocking DMA-BUF export for an indeterminate amount of time while > > > >>> another process is holding the sysfs rw semaphore in exclusive mode, > > > >>> this patch moves the per-buffer sysfs file creation to the default work > > > >>> queue. Note that this can lead to a short-term inaccuracy in the dmabuf > > > >>> sysfs statistics, but this is a tradeoff to prevent the hot path from > > > >>> being blocked. A work_struct is added to dma_buf to achieve this, but as > > > >>> it is unioned with the kobject in the sysfs_entry, dma_buf does not > > > >>> increase in size. > > > >> I'm still not very keen of this approach as it strongly feels like we > > > >> are working around shortcoming somewhere else. > > > >> > > > > My read of the thread for the last version is that we're running into > > > > a situation where sysfs is getting used for something it wasn't > > > > originally intended for, but we're also stuck with this sysfs > > > > functionality for dmabufs. > > > > > > > >>> Fixes: bdb8d06dfefd ("dmabuf: Add the capability to expose DMA-BUF stats in sysfs") > > > >>> Originally-by: Hridya Valsaraju <hridya@xxxxxxxxxx> > > > >>> Signed-off-by: T.J. Mercier <tjmercier@xxxxxxxxxx> > > > >>> > > > >>> --- > > > >>> See the originally submitted patch by Hridya Valsaraju here: > > > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flkml.org%2Flkml%2F2022%2F1%2F4%2F1066&data=05%7C01%7Cchristian.koenig%40amd.com%7C794614324d114880a25508da37672e4b%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637883213566903705%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=bGlA2FeubfSeL5XDHYyWMZqUXfScoCphZjjK4jrqQJs%3D&reserved=0 > > > >>> > > > >>> v2 changes: > > > >>> - Defer only sysfs creation instead of creation and teardown per > > > >>> Christian König > > > >>> > > > >>> - Use a work queue instead of a kthread for deferred work per > > > >>> Christian König > > > >>> --- > > > >>> drivers/dma-buf/dma-buf-sysfs-stats.c | 56 ++++++++++++++++++++------- > > > >>> include/linux/dma-buf.h | 14 ++++++- > > > >>> 2 files changed, 54 insertions(+), 16 deletions(-) > > > >>> > > > >>> diff --git a/drivers/dma-buf/dma-buf-sysfs-stats.c b/drivers/dma-buf/dma-buf-sysfs-stats.c > > > >>> index 2bba0babcb62..67b0a298291c 100644 > > > >>> --- a/drivers/dma-buf/dma-buf-sysfs-stats.c > > > >>> +++ b/drivers/dma-buf/dma-buf-sysfs-stats.c > > > >>> @@ -11,6 +11,7 @@ > > > >>> #include <linux/printk.h> > > > >>> #include <linux/slab.h> > > > >>> #include <linux/sysfs.h> > > > >>> +#include <linux/workqueue.h> > > > >>> > > > >>> #include "dma-buf-sysfs-stats.h" > > > >>> > > > >>> @@ -168,10 +169,46 @@ void dma_buf_uninit_sysfs_statistics(void) > > > >>> kset_unregister(dma_buf_stats_kset); > > > >>> } > > > >>> > > > >>> +static void sysfs_add_workfn(struct work_struct *work) > > > >>> +{ > > > >>> + struct dma_buf_sysfs_entry *sysfs_entry = > > > >>> + container_of(work, struct dma_buf_sysfs_entry, sysfs_add_work); > > > >>> + struct dma_buf *dmabuf = sysfs_entry->dmabuf; > > > >>> + > > > >>> + /* > > > >>> + * A dmabuf is ref-counted via its file member. If this handler holds the only > > > >>> + * reference to the dmabuf, there is no need for sysfs kobject creation. This is an > > > >>> + * optimization and a race; when the reference count drops to 1 immediately after > > > >>> + * this check it is not harmful as the sysfs entry will still get cleaned up in > > > >>> + * dma_buf_stats_teardown, which won't get called until the final dmabuf reference > > > >>> + * is released, and that can't happen until the end of this function. > > > >>> + */ > > > >>> + if (file_count(dmabuf->file) > 1) { > > > >> Please completely drop that. I see absolutely no justification for this > > > >> additional complexity. > > > >> > > > > This case gets hit around 5% of the time in my testing so the else is > > > > not a completely unused branch. > > > > > > Well I can only repeat myself: This means that your userspace is > > > severely broken! > > > > > > DMA-buf are meant to be long living objects > > This patch addresses export *latency* regardless of how long-lived the > > object is. Even a single, long-lived export will benefit from this > > change if it would otherwise be blocked on adding an object to sysfs. > > I think attempting to improve this latency still has merit. > > Fixing the latency is nice, but as it's just pushing the needed work off > to another code path, it will take longer overall for the sysfs stuff to > be ready for userspace to see. > > Perhaps we need to step back and understand what this code is supposed > to be doing. As I recall, it was created because some systems do not > allow debugfs anymore, and they wanted the debugging information that > the dmabuf code was exposing to debugfs on a "normal" system. Moving > that logic to sysfs made sense, but now I am wondering why we didn't see > these issues in the debugfs code previously? > The debugfs stuff doesn't happen on every export, right? > Perhaps we should go just one step further and make a misc device node > for dmabug debugging information to be in and just have userspace > poll/read on the device node and we spit the info that used to be in > debugfs out through that? That way this only affects systems when they > want to read the information and not normal code paths? Yeah that's a > hack, but this whole thing feels overly complex now. > And deprecate sysfs support? I'm happy to try out anything you think might be a better way. As far as complexity of this patch, this revision is a much simpler version of the one from Hridya you already reviewed. > thanks, > > greg k-h