Hey,

On Tue, May 19, 2015 at 05:54:54AM -0400, Frediano Ziglio wrote:
> This problem happens using KMS surfaces and QXL driver.
> To easy reproduce use KDE Plasma (which use surfaces a lot) and assure
> you are using KMS surfaces (QXL driver on Fedora/RedHat has a patch to
> stop using them). Open some complex application like LibreOffice and
> after a while your machine get stuck using 100% CPU on Xorg.
> The problem occurs as creating new surfaces not interruptible wait
> are used however instead of returning ERESTARTSYS back to userspace
> you try to loop but wait routines always keep returning ERESTARTSYS
> once the signal is marked.
> On out of memory conditions TTM module try to move objects to system
> memory and QXL assure surface is updated before the move.
> The fix handle differently this case using no interruptible wait so
> wait functions will wait instead of returning ERESTARTSYS.
> Note the when the loop occurs driver will send a lot of update requests
> causing more CPU usage on Qemu side too.
>
> Signed-off-by: Frediano Ziglio <fziglio@xxxxxxxxxx>
> ---
>  qxl/qxl_cmd.c   | 12 +++---------
>  qxl/qxl_drv.h   |  2 +-
>  qxl/qxl_ioctl.c |  2 +-
>  3 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/drivers/gpu/drm/qxl/qxl_cmd.c b/qxl/qxl_cmd.c
> index 9782364..bd5404e 100644
> --- a/drivers/gpu/drm/qxl/qxl_cmd.c
> +++ b/drivers/gpu/drm/qxl/qxl_cmd.c
> @@ -317,14 +317,11 @@ static void wait_for_io_cmd(struct qxl_device *qdev, uint8_t val, long port)
>  {
>  	int ret;
>
> -restart:
>  	ret = wait_for_io_cmd_user(qdev, val, port, false);
> -	if (ret == -ERESTARTSYS)
> -		goto restart;

I think this one is not directly related to the fix, but can be removed
because wait_for_io_cmd_user(qdev, val, port, false); will call
wait_event_timeout() which cannot return ERESTARTSYS? Or was this loop
causing issues too?

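For illustration, here is a rough userspace model of the return-value contract behind that question. It is only a sketch under stated assumptions, not kernel code: `struct model_dev`, `model_wait()` and the locally defined `ERESTARTSYS` value are made-up stand-ins that mirror how wait_event_timeout() and wait_event_interruptible_timeout() report their result.

```c
#include <assert.h>
#include <stdbool.h>

/* Kernel-internal error code; defined here because userspace errno.h lacks it. */
#define ERESTARTSYS 512

/* Hypothetical stand-in for the state a real wait would poll. */
struct model_dev {
	bool cmd_done;        /* the I/O command has completed */
	bool signal_pending;  /* a signal is queued for the calling task */
};

/*
 * Models the result of a single wait:
 *   intr == false -> like wait_event_timeout():
 *                    0 on timeout, remaining time (>= 1) on success
 *   intr == true  -> like wait_event_interruptible_timeout():
 *                    same, plus -ERESTARTSYS when a signal is pending
 */
static long model_wait(const struct model_dev *dev, bool intr, long timeout)
{
	assert(timeout > 0);
	if (dev->cmd_done)
		return timeout;       /* condition already true */
	if (intr && dev->signal_pending)
		return -ERESTARTSYS;  /* only reachable when intr is set */
	return 0;                     /* timed out */
}
```

With intr == false the -ERESTARTSYS branch is never taken in this model, so a `goto restart` guarded by that value would be dead code rather than a source of spinning.
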
>  }
>
>  int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
> -		   const struct qxl_rect *area)
> +		   const struct qxl_rect *area, bool intr)
>  {
>  	int surface_id;
>  	uint32_t surface_width, surface_height;
> @@ -350,7 +347,7 @@ int qxl_io_update_area(struct qxl_device *qdev, struct qxl_bo *surf,
>  	mutex_lock(&qdev->update_area_mutex);
>  	qdev->ram_header->update_area = *area;
>  	qdev->ram_header->update_surface = surface_id;
> -	ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, true);
> +	ret = wait_for_io_cmd_user(qdev, 0, QXL_IO_UPDATE_AREA_ASYNC, intr);
>  	mutex_unlock(&qdev->update_area_mutex);
>  	return ret;
>  }
> @@ -588,10 +585,7 @@ int qxl_update_surface(struct qxl_device *qdev, struct qxl_bo *surf)
>  	rect.right = surf->surf.width;
>  	rect.top = 0;
>  	rect.bottom = surf->surf.height;
> -retry:
> -	ret = qxl_io_update_area(qdev, surf, &rect);
> -	if (ret == -ERESTARTSYS)
> -		goto retry;
> +	ret = qxl_io_update_area(qdev, surf, &rect, false);

My understanding is that the fix is this hunk? If so, this could be made
more obvious with an intermediate commit adding the 'bool intr' arg to
qxl_io_update_area and only calling it with 'true' in the appropriate
places.

This code path is only triggered from qxl_surface_evict() which I assume
is not necessarily easily interruptible, so this change makes sense to
me. However it would be much better to get a review from Dave Airlie ;)

Christophe
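
To illustrate the last hunk, a minimal userspace sketch of the old retry loop next to the patched call. Everything here is a hypothetical model, not the driver: `struct model_dev` and the `model_*` functions are invented names, `ERESTARTSYS` is defined locally, and the model simplifies by letting the interruptible path bail out as soon as a signal is pending.

```c
#include <assert.h>
#include <stdbool.h>

#define ERESTARTSYS 512  /* kernel-internal value, defined locally for the model */

/* Hypothetical model of the device; not the real struct qxl_device. */
struct model_dev {
	bool signal_pending;  /* stays set until the task returns to userspace */
	int  requests_sent;   /* update requests issued to the device */
};

/* Models qxl_io_update_area(): issue one request, then wait for it. */
static int model_update_area(struct model_dev *dev, bool intr)
{
	dev->requests_sent++;
	if (intr && dev->signal_pending)
		return -ERESTARTSYS;  /* interruptible wait bails out at once */
	return 0;                     /* uninterruptible wait runs to completion */
}

/* Old behaviour: retry in-kernel on -ERESTARTSYS; capped so the model ends. */
static int model_update_surface_old(struct model_dev *dev, int max_tries)
{
	int ret = -ERESTARTSYS;

	assert(max_tries > 0);
	for (int i = 0; i < max_tries && ret == -ERESTARTSYS; i++)
		ret = model_update_area(dev, true);
	return ret;
}

/* Patched behaviour: one uninterruptible call, no retry loop. */
static int model_update_surface_new(struct model_dev *dev)
{
	return model_update_area(dev, false);
}
```

In this model the pending signal is never cleared while the loop stays in the kernel, so the old variant burns through every allowed try and sends one request per iteration, while the patched variant issues a single request and returns 0.
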
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel