On Wed, Jun 28, 2017 at 11:19:21AM +0100, Chris Wilson wrote: > Quoting Francisco Jerez (2017-05-04 21:59:44) > > Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes: > > > > > On Thu, May 04, 2017 at 10:56:54AM -0700, Francisco Jerez wrote: > > >> David Weinehall <david.weinehall@xxxxxxxxxxxxxxx> writes: > > >> > > >> > On Thu, May 04, 2017 at 10:51:29AM +0100, Chris Wilson wrote: > > >> >> A good default for garbage entries from the user is to follow the > > >> >> default setting of the object (i.e. the PTE). Currently they use the > > >> >> uncached entry, and now the only way to accidentally hit uncached > > >> >> performance is via explicit use of the uncached MOCS or setting the > > >> >> object to uncached. Note that these entries are currently undefined in > > >> >> the ABI and we reserve the right to change them. We originally chose > > >> >> uncached to eliminate any problem with reducing the caching level in > > >> >> future, but the object is a much better definition of the minimum > > >> >> caching level. > > >> >> > > >> > > >> NAK. The reason for the default being UC is that it's the only setting > > >> that guarantees full forwards compatibility with any other entry that > > >> might be added in the future. If you default to PTE on (e)LLC and WB on > > >> L3, userspace will no longer be able to use any newly introduced entry > > >> with stricter coherency guarantees than that (e.g. any L3-uncached > > >> entry) in a backwards-compatible way. Attempting to do so may break > > >> memory coherency assumptions of the application and lead to misrendering > > >> when run on older kernel versions (which to my judgment is a scarier > > >> failure mode than reduced performance). Tbh, if you use a MOCS entry that's not set that's a userspace bug and needs to be fixed there. I don't think we should frob the settings of the unused entries. The rough plan is to have a GETPARAM to give you the version of the MOCS table, with the idea we'll only add new entries (on any giving pci id) at the end (and hope it'll never get full, but that seems to be the case looking at the bsepc recommended mocs entries). If userspace wants to use more than what's there at power on (or when we declare the i915.ko support stable and enable it by default), then it must consult the mocs table. Imo mocs entries shouldn't change once we've made that slot part of the uapi, that's just too much headaches. media-sdk userspace is in the process of getting fixed to work like that. Still a reason not to move forward with this patch, but a different one :-) > > > You can't use a weaker coherency model in mocs than that specified for > > > the object as you can't control other uses of the object (even just > > > memory pressure will break your assumptions). > > > > Exactly, but you can use a stronger coherency model than the application > > requested, which is why falling back to UC should generally work for > > unknown entries but falling back to PTE+WB isn't guaranteed to. > > Still wrong. GEM will write into the CPU cache believing the object is > coherent. The GPU will read from memory bypassing the CPU cache > following the UC mocs. The only safe option is for it to follow PTE. Userspace is already allowed to shoot it's feet off with corrupted cachelines when e.g. not calling set_domain when it should. Same applies if it plays games on the gpu side. The only thing we need to ensure from the kernel pov is that gpu caches are fully flushed (either to memory or cpu coherent caches) when the batch is over, to make sure the gpu doesn't scribble over memory after we've swapped it out. We might swap out garbage, but that's not the kernel's problem. Or do I miss something, and userspace can (ab)use this to pull the kernel over the table? -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch