On Tue, Nov 26, 2013 at 11:35:38AM -0800, Daniel Vetter wrote: > Hi Brad, > > On Tue, Nov 26, 2013 at 08:51:17AM -0800, bradley.d.volkin@xxxxxxxxx wrote: > > From: Brad Volkin <bradley.d.volkin@xxxxxxxxx> > > > > Certain OpenGL features (e.g. transform feedback, performance monitoring) > > require userspace code to submit batches containing commands such as > > MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some > > generations of the hardware will noop these commands in "unsecure" batches > > (which includes all userspace batches submitted via i915) even though the > > commands may be safe and represent the intended programming model of the device. > > > > This series introduces a software command parser similar in operation to the > > command parsing done in hardware for unsecure batches. However, the software > > parser allows some operations that would be noop'd by hardware, if the parser > > determines the operation is safe, and submits the batch as "secure" to prevent > > hardware parsing. Currently the series implements this on IVB and HSW. > > > > The series is divided into several phases: > > > > patches 01-09: These implement infrastructure and the command parsing algorithm, > > all behind a module parameter. I expect some discussion and > > rework, but hopefully there's nothing too controversial. > > patches 10-17: These define the checks performed by the parser. > > I expect much discussion :) > > patches 18-20: In a final pass over the command checks, I found some issues with > > the definitions. They looked painful to rebase in, so I've added > > them here. > > patches 21-22: These enable the parser by default. It runs on all batches except > > those that set the I915_EXEC_SECURE flag in the execbuffer2 call. > > I think long-term we should even scan secure batches. We'd need to allow > some registers which only the drm master (i.e. owner of the display > hardware) is allowed to do, e.g. for scanline waits. But once we have that > we should be able to port all current users of secure batches over to > scanned batches and so enforce this everywhere by default. > > The other issue is that igt tests assume to be able to run some evil > tests, so maybe we don't actually want this. Agreed. I thought we could handle this as a follow-up task once the basic stuff is in place, particularly given that we'd want to modify at least some users to test. I also wasn't sure if we would want the check to be root && master, as in the current secure flag, or just master. W.r.t. the tests, I suppose we can just turn checking on for secure batches and see what happens. > > > There are follow-up patches to libdrm and to i-g-t. The i-g-t tests are very > > basic and do not test all of the commands used by the parser on the assumption > > that I'm likely to make the same mistakes in both the parser and the test. > > Yeah, I agree that just checking whether commands all go through (or not) > as expected adds very little value on top of the few tests you have done. > I think we should take a look at some corner cases which might trip up > your checker a bit though: > - I think we should check batchbuffer chaining and make sure it works on > the vcs ring and not anywhere else (we can't ever break shipping libva > which uses this). > - Some tests to trip up your parser should be done, like 3D commands that > fall off the end of the batch bo. Or commands that span page boundaries. > The later isn't an issue atm since you use vmap, but we should switch to > per-page kmap since the vmap overhead is fairly horrible. Good suggestions. I'll look into these. > > > I've run the i-g-t gem_* tests, the piglit quick tests (w/Mesa git from a few > > days ago), and generally used an Ubuntu 13.10 IVB system with the parser > > running. Aside from a failure described below, I don't think there are any > > regressions. That is, piglit claims some regressions, but from manually running > > the tests I think these are false positives. However, I could use help in > > getting broader testing, particularly around performance. In general, I see less > > than 3% performance impact on HSW, with more like 10% impact for pathological > > batch sizes. But we'll certainly want to run relevant benchmarks beyond what > > I've done. > > Yeah, a microbenchmark that just shovels MI_NOP batches of various sizes > through the checker and bypassing it (with EXEC_SECURE) would be really > good. Maybe even some variable-sized commands (all the state setup stuff > should be useful for that) to keep things interesting. Some variation is > also important to have some good cache thrasing going on (since your check > tables are fairly large I think). Ok. I'd be interested in some comment from the mesa and libva guys here for real world workloads, but a microbenchmark would be a good start. Which "state setup stuff" are you referring to? Something specific in i-g-t or something more general? > > > At this point there are a couple of known issues and potential improvements. > > > > 1) VLV. The parser is currently disabled for VLV. One type of check performed by > > the parser is that commands which access memory do so via PPGTT. VLV does not > > have PPGTT enabled at this time. I chose to implement the PPGTT checks via > > generic bit checking infrastructure in the parser, so they are not easily > > disabled for VLV. For now, I'm disabling parsing altogether in the hope that > > PPGTT can be enabled for VLV in the near future. > > We need ppgtt for the parser anyway, since otherwise userspace can submit > a self-modifying batch. Checking for that is impossible, so allowing sw > checked batches without the ppgtt/ggtt split would be a decent security > hole. > > > 2) Coherency. I've found two types of coherency issues when reading the batch > > buffer from the CPU during execbuffer2. Looking for help with both issues. > > i. First, the i-g-t test gem_cpu_reloc blits to a batch buffer and the > > parser isn't properly waiting for the operation to complete before > > parsing. I tried adding i915_gem_object_sync(batch_obj, [ring|NULL]) > > but that actually caused more failures. > > This synchronization should happen when processing the relocations. The > batch itself isn't modified by the gpu, we simply upload it using the > blitter. So this going wrong indicates there's some issue somewhere ... Ok. I didn't debug too far. Putting a gem_sync() in the test between the upload and exec fixed the issue. Since I wasn't doing any explicit synchronization I assumed it was my issue. > > > > ii. Second, on VLV, I've seen cache coherency issues when userspace writes > > the batch via pwrite fast path before calling execbuffer2. The parser > > reads stale data. This works fine on IVB and HSW, so I believe it's an > > LLC vs. non-LLC issue. I'm just unclear on what the correct flushing or > > synchronization is for this scenario. > > Imo we take a good look at the optimized buffer read/write code from > i915_gem_shmem_pread (for reading the userspace batch) and > i915_gem_shmem_pwrite (for writing to the checked buffer). If we do the > checking in-line with the reading this should also bring down the overhead > massively compared to the current solution (those shmem read/write > functions are fairly optimized). I'll take a look. So far I'm not seeing the vmap as a bottleneck and I'm a bit concerned about the complexity of trying to do mapping/parsing per-page, but I agree it's worth a more detailed analysis. > > > 3) 2nd-level batches. The parser currently allows MI_BATCH_BUFFER_START commands > > in userspace batches without parsing them. The media driver uses 2nd-level > > batches, so a solution is required. I have some ideas but don't want to delay > > the review process for what I have so far. It may be that the 2nd-level > > parsing is only needed for VCS and the current code (plus rejecting BBS) > > would be sufficient for RCS. > > Afaik only libva uses second-level batches, and only on the vcs. So I hope I would be very happy if that was the case. I couldn't easily tell from reading the libva code which ring they were submitting to. I definitely did not find uses in mesa or the ddx. > we can just submit those as unpriviledged batches if possible. If that's > not possible it'll get fairly ugly I fear :( > > > 4) Command buffer copy. To avoid CPU modifications to buffers after parsing, and > > to avoid GPU modifications to buffers via EUs or commands in the batch, we > > should copy the userspace batch buffer to memory that userspace does not > > have access to, map it into GGTT, and execute that batch buffer. I have a > > sense of how to do this for 1st-level batches, but it would need changes to > > tie in with the 2nd-level batch parsing I think, so I've again held off. > > Yeah, we need the copying for otherwise the parsing is fairly pointless. > I've stumbled over some of your internally patches and had a quick look at > them. Two high-level comments: > > - Using the existing active buffer lru instead of manual pinning would > better integrate with the eviction code. For an example of in-kernel > objects (and not userspace objects) using this look at the hw context > code. > - Imo we should tag all buffers as purgeable while they're in the cache. > That way the shrinker will automatically drop the backing storage if > memory runs low (and thanks to the active list lru only when the gpu has > stopped processing the batch). That way we can just keep on allocating > buffers if they're all busy without any concern for running out of > memory. Ok, I'll look at the hw context code for buffer mgmt. For "purgeable", just via the madv field in the i915 gem object? Also, there are a couple iterations of the work-in-progress patches. Do you prefer a cache per ring or a single cache shared by all rings? Thanks, Brad > > I'll try to read through your patch pile in the next few days, this is > just the very-very high-level stuff that came to mind immediately. > > Cheers, Daniel > > > > Brad Volkin (22): > > drm/i915: Add data structures for command parser > > drm/i915: Initial command parser table definitions > > drm/i915: Hook command parser tables up to rings > > drm/i915: Add per-ring command length decode functions > > drm/i915: Implement command parsing > > drm/i915: Add a HAS_CMD_PARSER getparam > > drm/i915: Add support for rejecting commands during parsing > > drm/i915: Add support for checking register accesses > > drm/i915: Add support for rejecting commands via bitmasks > > drm/i915: Reject unsafe commands > > drm/i915: Add register whitelists for mesa > > drm/i915: Enable register whitelist checks > > drm/i915: Enable bit checking for some commands > > drm/i915: Enable PPGTT command parser checks > > drm/i915: Reject commands that would store to global HWS page > > drm/i915: Reject additional commands > > drm/i915: Add parser data for perf monitoring GL extensions > > drm/i915: Reject MI_ARB_ON_OFF on VECS > > drm/i915: Fix length handling for MFX_WAIT > > drm/i915: Fix MI_STORE_DWORD_IMM parser defintion > > drm/i915: Clean up command parser enable decision > > drm/i915: Enable command parsing by default > > > > drivers/gpu/drm/i915/Makefile | 3 +- > > drivers/gpu/drm/i915/i915_cmd_parser.c | 712 +++++++++++++++++++++++++++++ > > drivers/gpu/drm/i915/i915_dma.c | 3 + > > drivers/gpu/drm/i915/i915_drv.c | 5 + > > drivers/gpu/drm/i915/i915_drv.h | 96 ++++ > > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 15 + > > drivers/gpu/drm/i915/i915_reg.h | 66 +++ > > drivers/gpu/drm/i915/intel_ringbuffer.c | 2 + > > drivers/gpu/drm/i915/intel_ringbuffer.h | 25 + > > include/uapi/drm/i915_drm.h | 1 + > > 10 files changed, 927 insertions(+), 1 deletion(-) > > create mode 100644 drivers/gpu/drm/i915/i915_cmd_parser.c > > > > -- > > 1.8.4.4 > > > > _______________________________________________ > > Intel-gfx mailing list > > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > > http://lists.freedesktop.org/mailman/listinfo/intel-gfx > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx