Re: [PATCH 00/10] drm/vkms: rework crc worker

Rodrigo Siqueira <rodrigosiqueiramelo@xxxxxxxxx> · Tue, 25 Jun 2019 22:44:26 -0300

On 06/19, Daniel Vetter wrote:
> On Tue, Jun 18, 2019 at 11:54 PM Rodrigo Siqueira
> <rodrigosiqueiramelo@xxxxxxxxx> wrote:
> >
> > On Tue, Jun 18, 2019 at 5:56 AM Daniel Vetter <daniel@xxxxxxxx> wrote:
> > >
> > > On Mon, Jun 17, 2019 at 11:49:04PM -0300, Rodrigo Siqueira wrote:
> > > > On 06/12, Daniel Vetter wrote:
> > > > > On Wed, Jun 12, 2019 at 10:28:41AM -0300, Rodrigo Siqueira wrote:
> > > > > > Hi Daniel,
> > > > > >
> > > > > > First of all, thank you very much for your patchset.
> > > > > >
> > > > > > I tried to make a detailed review of your series, and you can see my
> > > > > > comments in each patch. You’ll notice that I asked many things related
> > > > > > to the DRM subsystem with the hope that I could learn a little bit
> > > > > > more about DRM from your comments.
> > > > > >
> > > > > > Before you go through the review, I would like to start a discussion
> > > > > > about the vkms race conditions. First, I have to admit that I did not
> > > > > > understand the race conditions that you described before because I
> > > > > > couldn’t reproduce them. Now, I'm suspecting that I could not
> > > > > > experience the problem because I'm using QEMU with KVM; with this idea
> > > > > > in mind, I suppose that we have two scenarios for using vkms in a
> > > > > > virtual machine:
> > > > > >
> > > > > > * Scenario 1: The user has hardware virtualization support; in this
> > > > > > case, it is a little bit harder to experience race conditions with
> > > > > > vkms.
> > > > > >
> > > > > > * Scenario 2: Without hardware virtualization support, it is much
> > > > > > easier to experience race conditions.
> > > > > >
> > > > > > With these two scenarios in mind, I conducted a bunch of experiments
> > > > > > for trying to shed light on this issue. I did:
> > > > > >
> > > > > > 1. Enabled lockdep
> > > > > >
> > > > > > 2. Defined two different environments for testing both using QEMU with
> > > > > > and without kvm. See below the QEMU hardware options:
> > > > > >
> > > > > > * env_kvm: -enable-kvm -daemonize -m 1G -smp cores=2,cpus=2
> > > > > > * env_no_kvm: -daemonize -m 2G -smp cores=4,cpus=4
> > > > > >
> > > > > > 3. My test protocol: I) turn on the vm, II) clean /proc/lock_stat,
> > > > > > III) execute kms_cursor_crc, III) save the lock_stat file, and IV)
> > > > > > turn off the vm.
> > > > > >
> > > > > > 4. From the lockdep_stat, I just highlighted the row related to vkms
> > > > > > and the columns holdtime-total and holdtime-avg
> > > > > >
> > > > > > I would like to highlight that the following data does not have any
> > > > > > statistical approach, and its intention is solely to assist our
> > > > > > discussion. See below the summary of the collected data:
> > > > > >
> > > > > > Summary of the experiment results:
> > > > > >
> > > > > > +----------------+----------------+----------------+
> > > > > > |                |     env_kvm    |   env_no_kvm   |
> > > > > > +                +----------------+----------------+
> > > > > > | Test           | Before | After | Before | After |
> > > > > > +----------------+--------+-------+--------+-------+
> > > > > > | kms_cursor_crc |   S1   |   S2  |   M1   |   M2  |
> > > > > > +----------------+--------+-------+--------+-------+
> > > > > >
> > > > > > * Before: before apply this patchset
> > > > > > * After: after apply this patchset
> > > > > >
> > > > > > -----------------------------------------+------------------+-----------
> > > > > > S1: Without this patchset and with kvm   | holdtime-total | holdtime-avg
> > > > > > -----------------------------------------+----------------+-------------
> > > > > > &(&vkms_out->lock)->rlock:               |  21983.52      |  6.21
> > > > > > (work_completion)(&vkms_state->crc_wo:   |  20.47         |  20.47
> > > > > > (wq_completion)vkms_crc_workq:           |  3999507.87    |  3352.48
> > > > > > &(&vkms_out->state_lock)->rlock:         |  378.47        |  0.30
> > > > > > (work_completion)(&vkms_state->crc_#2:   |  3999066.30    |  2848.34
> > > > > > -----------------------------------------+----------------+-------------
> > > > > > S2: With this patchset and with kvm      |                |
> > > > > > -----------------------------------------+----------------+-------------
> > > > > > &(&vkms_out->lock)->rlock:               |  23262.83      |  6.34
> > > > > > (work_completion)(&vkms_state->crc_wo:   |  8.98          |  8.98
> > > > > > &(&vkms_out->crc_lock)->rlock:           |  307.28        |  0.32
> > > > > > (wq_completion)vkms_crc_workq:           |  6567727.05    |  12345.35
> > > > > > (work_completion)(&vkms_state->crc_#2:   |  6567135.15    |  4488.81
> > > > > > -----------------------------------------+----------------+-------------
> > > > > > M1: Without this patchset and without kvm|                |
> > > > > > -----------------------------------------+----------------+-------------
> > > > > > &(&vkms_out->state_lock)->rlock:         |  4994.72       |  1.61
> > > > > > &(&vkms_out->lock)->rlock:               |  247190.04     |  39.39
> > > > > > (work_completion)(&vkms_state->crc_wo:   |  31.32         |  31.32
> > > > > > (wq_completion)vkms_crc_workq:           |  20991073.78   |  13525.18
> > > > > > (work_completion)(&vkms_state->crc_#2:   |  20988347.18   |  11904.90
> > > > > > -----------------------------------------+----------------+-------------
> > > > > > M2: With this patchset and without kvm   |                |
> > > > > > -----------------------------------------+----------------+-------------
> > > > > > (wq_completion)vkms_crc_workq:           |  42819161.68   |  36597.57
> > > > > > &(&vkms_out->lock)->rlock:               |  251257.06     |  35.80
> > > > > > (work_completion)(&vkms_state->crc_wo:   |  69.37         |  69.37
> > > > > > &(&vkms_out->crc_lock)->rlock:           |  3620.92       |  1.54
> > > > > > (work_completion)(&vkms_state->crc_#2:   |  42803419.59   |  24306.31
> > > > > >
> > > > > > First, I analyzed the scenarios with KVM (S1 and S2); more
> > > > > > specifically, I focused on these two classes:
> > > > > >
> > > > > > 1. (work_completion)(&vkms_state->crc_wo
> > > > > > 2. (work_completion)(&vkms_state->crc_#2
> > > > > >
> > > > > > After taking a look at the data, it looks like that this patchset
> > > > > > greatly reduces the hold time average for crc_wo. On the other hand,
> > > > > > it increases the hold time average for crc_#2. I didn’t understand
> > > > > > well the reason for the difference. Could you help me here?
> > > > >
> > > > > So there's two real locks here from our code, the ones you can see as
> > > > > spinlocks:
> > > > >
> > > > > &(&vkms_out->state_lock)->rlock:         |  4994.72       |  1.61
> > > > > &(&vkms_out->lock)->rlock:               |  247190.04     |  39.39
> > > > >
> > > > > All the others are fake locks that the workqueue adds, which only exist in
> > > > > lockdep. They are used to catch special kinds of deadlocks like the below:
> > > > >
> > > > > thread A:
> > > > > 1. mutex_lock(mutex_A)
> > > > > 2. flush_work(work_A)
> > > > >
> > > > > thread B
> > > > > 1. running work_A:
> > > > > 2. mutex_lock(mutex_A)
> > > > >
> > > > > thread B can't continue becuase mutex_A is already held by thread A.
> > > > > thread A can't continue because thread B is blocked and the work never
> > > > > finishes.
> > > > > -> deadlock
> > > > >
> > > > > I haven't checked which is which, but essentially what you measure with
> > > > > the hold times of these fake locks is how long a work execution takes on
> > > > > average.
> > > > >
> > > > > Since my patches are supposed to fix races where the worker can't keep up
> > > > > with the vblank hrtimer, then the average worker will (probably) do more,
> > > > > so that going up is expected. I think.
> > > > >
> > > > > I'm honestly not sure what's going on here, never looking into this in
> > > > > detail.
> > > > >
> > > > > > When I looked for the second set of scenarios (M1 and M2, both without
> > > > > > KVM), the results look much more distant; basically, this patchset
> > > > > > increased the hold time average. Again, could you help me understand a
> > > > > > little bit better this issue?
> > > > > >
> > > > > > Finally, after these tests, I have some questions:
> > > > > >
> > > > > > 1. VKMS is a software-only driver; because of this, how about defining
> > > > > > a minimal system resource for using it?
> > > > >
> > > > > No idea, in reality it's probably "if it fails too often, buy faster cpu".
> > > > > I do think we should make the code robust against a slow cpu, since atm
> > > > > that's needed even on pretty fast machines (because our blending code is
> > > > > really, really slow and inefficient).
> > > > >
> > > > > > 2. During my experiments, I noticed that running tests with a VM that
> > > > > > uses KVM had consistent results. For example, kms_flip never fails
> > > > > > with QEMU+KVM; however, without KVM, two or three tests failed (one of
> > > > > > them looks random). If we use vkms for test DRM stuff, should we
> > > > > > recommend the use of KVM?
> > > > >
> > > > > What do you mean without kvm? In general running without kvm shouldn't be
> > > > > slower, so I'm a bit confused ... I'm running vgem directly on the
> > > > > machine, by booting into new kernels (and controlling the machine over the
> > > > > network).
> > > >
> > > > Sorry, I should have detailed my point.
> > > >
> > > > Basically, I do all my testing with vkms in QEMU VM. In that sense, I
> > > > did some experiments which I enabled and disabled the KVM (i.e., flag
> > > > '-enable-kvm') to check the vkms behaviour in these two scenarios. I
> > > > noticed that the tests are consistent when I use the '-enable-kvm' flag,
> > > > in that context it seemed a good idea to recommend the use of KVM for
> > > > those users who want to test vkms with igt. Anyway, don't worry about
> > > > that I'll try to add more documentation for vkms in the future and in
> > > > that time we discuss about this again.
> > >
> > > Ah, qemu without kvm is going to use software emulation for a lot of the
> > > kernel stuff. That's going to be terribly slow indeed.
> > >
> > > > Anyway, from my side, I think we should merge this series. Do you want
> > > > to prepare a V2 with the fixes and maybe update the commit messages by
> > > > using some of your explanations? Or, if you want, I can fix the tiny
> > > > details and merge it.
> > >
> > > I'm deeply burried in my prime cleanup/doc series right now, so will take
> > > a few days until I get around to this again. If you want, please go ahead
> > > with merging.
> > >
> > > btw if you edit a patch when merging, please add a comment about that to
> > > the commit message. And ime it's best to only augment the commit message
> > > and maybe delete an unused variable or so that got forgotten. For
> > > everything more better to do the edits and resubmit.
> >
> > First of all, thank you very much for all your reviews and
> > explanation. I’ll try the exercise that you proposed on Patch 1.
> >
> > I’ll merge patch [4] and [7] since they’re not related to these
> > series. For the other patches, I’ll merge them after I finish the new
> > version of writeback series. I’ll ping you later.
> >

Hi,

I already sent the new version of the writeback patchset. So, how do you
want to proceed with this series? Do you prefer to send a V2 or should I
apply the patchset and make the tiny fixes?

Thanks

> > 4. https://patchwork.freedesktop.org/patch/309031/?series=61737&rev=1
> > 7. https://patchwork.freedesktop.org/patch/309029/?series=61737&rev=1$
> 
> Can you merge them quicker? I have another 3 vkms patches here
> touching that area with some follow-up stuff ...
> 
> > Finally, not related with this patchset, can I apply the patch
> > “drm/drm_vblank: Change EINVAL by the correct errno” [1] or do I need
> > more SoB? I’ll also apply Oleg patch (drm/vkms: add crc sources list).
> >
> > 1. https://patchwork.freedesktop.org/patch/310006/?series=50697&rev=4
> 
> If you want get some acks from igt maintainers (those patches landed
> now, right), but this is good enough.
> -Daniel
> 
> 
> > Thanks
> >
> > > Thanks, Daniel
> > >
> > > >
> > > > > -Daniel
> > > > >
> > > > > > Best Regards
> > > > > >
> > > > > > On Thu, Jun 6, 2019 at 7:28 PM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > This here is the first part of a rework for the vkms crc worker. I think
> > > > > > > this should fix all the locking/races/use-after-free issues I spotted in
> > > > > > > the code. There's more work we can do in the future as a follow-up:
> > > > > > >
> > > > > > > - directly access vkms_plane_state->base in the crc worker, with this
> > > > > > >   approach in this series here that should be safe now. Follow-up patches
> > > > > > >   could switch and remove all the separate crc_data infrastructure.
> > > > > > >
> > > > > > > - I think some kerneldoc for vkms structures would be nice. Documentation
> > > > > > >   the various functions is probably overkill.
> > > > > > >
> > > > > > > - Implementing a more generic blending engine, as prep for adding more
> > > > > > >   pixel formats, more planes, and more features in general.
> > > > > > >
> > > > > > > Tested with kms_pipe_crc, kms_flip and kms_cursor_crc. Seems to not make
> > > > > > > things worse, but I didn't do a full run.
> > > > > > >
> > > > > > > Cheers, Daniel
> > > > > > >
> > > > > > > Daniel Vetter (10):
> > > > > > >   drm/vkms: Fix crc worker races
> > > > > > >   drm/vkms: Use spin_lock_irq in process context
> > > > > > >   drm/vkms: Rename vkms_output.state_lock to crc_lock
> > > > > > >   drm/vkms: Move format arrays to vkms_plane.c
> > > > > > >   drm/vkms: Add our own commit_tail
> > > > > > >   drm/vkms: flush crc workers earlier in commit flow
> > > > > > >   drm/vkms: Dont flush crc worker when we change crc status
> > > > > > >   drm/vkms: No _irqsave within spin_lock_irq needed
> > > > > > >   drm/vkms: totally reworked crc data tracking
> > > > > > >   drm/vkms: No need for ->pages_lock in crc work anymore
> > > > > > >
> > > > > > >  drivers/gpu/drm/vkms/vkms_crc.c   | 74 +++++++++-------------------
> > > > > > >  drivers/gpu/drm/vkms/vkms_crtc.c  | 80 ++++++++++++++++++++++++++-----
> > > > > > >  drivers/gpu/drm/vkms/vkms_drv.c   | 35 ++++++++++++++
> > > > > > >  drivers/gpu/drm/vkms/vkms_drv.h   | 24 +++++-----
> > > > > > >  drivers/gpu/drm/vkms/vkms_plane.c |  8 ++++
> > > > > > >  5 files changed, 146 insertions(+), 75 deletions(-)
> > > > > > >
> > > > > > > --
> > > > > > > 2.20.1
> > > > > > >
> > > > > > > _______________________________________________
> > > > > > > dri-devel mailing list
> > > > > > > dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > > > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > >
> > > > > > Rodrigo Siqueira
> > > > > > https://siqueira.tech
> > > > >
> > > > > --
> > > > > Daniel Vetter
> > > > > Software Engineer, Intel Corporation
> > > > > http://blog.ffwll.ch
> > > >
> > > > --
> > > > Rodrigo Siqueira
> > > > https://siqueira.tech
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> >
> >
> >
> > --
> >
> > Rodrigo Siqueira
> > https://siqueira.tech
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch

-- 
Rodrigo Siqueira
https://siqueira.tech
Attachment:
signature.asc

Description: PGP signature
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel