From: Dan Williams
> Sent: 17 January 2018 06:50
...
> > Anything that open-codes copy_from_user() that way is *ALREADY* fucked if
> > it cares about the overhead - recent x86 boxen will have slowdown from
> > hell on stac()/clac() pairs. Anything like that on a hot path is already
> > deep in trouble and needs to be found and fixed. What drivers would those
> > be?
>
> So I took a closer look and the pattern is not copy_from_user it's
> more like __get_user + write-to-hardware loops. If the performance is
> already expected to be bad for those then perhaps an lfence each loop
> iteration won't be much worse. It's still a waste because the lfence
> is only needed once after the access_ok.

Performance of PCIe writes isn't that bad (since they are posted), so
adding a synchronising instruction to __get_user() could easily be
noticeable.

Unfortunately you can't use copy_from_user() with a PCIe-mapped target
memory address (or even memcpy_toio()) because on x86 the copy is aliased
to memcpy(), and that uses 'rep movsb', which has to do single-byte
transfers because the address is uncached.
(This is really bad for PCIe reads.)

	David
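A minimal user-space sketch of the "fence once after the check" pattern Dan describes, as opposed to a fence in every loop iteration. This is not kernel code: `range_ok()` is a hypothetical stand-in for access_ok(), `copy_words_to_dev()` is a hypothetical stand-in for a __get_user() + write-to-hardware loop, and a plain array plays the role of the posted MMIO window. On x86-64 the barrier is LFENCE via `_mm_lfence()`; on other architectures it is only a placeholder full fence, which is assumed here for portability and is not a speculation barrier.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__)
#include <emmintrin.h>
/* LFENCE: later instructions do not execute (even speculatively) until it completes. */
static inline void spec_barrier(void) { _mm_lfence(); }
#else
/* Placeholder only: a full memory fence is NOT a speculation barrier. */
static inline void spec_barrier(void) { __atomic_thread_fence(__ATOMIC_SEQ_CST); }
#endif

/* Hypothetical access_ok() stand-in: is [off, off + len) inside a buffer of buf_words? */
static int range_ok(size_t buf_words, size_t off, size_t len)
{
    /* Written to avoid overflow in off + len. */
    return off <= buf_words && len <= buf_words - off;
}

/*
 * Hypothetical __get_user() + write-to-hardware loop.
 * The range is validated once and a single barrier follows the check;
 * the loop body itself carries no per-iteration fence.
 * Returns the number of words written, or 0 if the range is rejected.
 */
static size_t copy_words_to_dev(volatile uint32_t *dev, const uint32_t *src,
                                size_t src_words, size_t off, size_t len)
{
    assert(dev != NULL && src != NULL);
    if (!range_ok(src_words, off, len))
        return 0;
    spec_barrier();             /* once, after the bounds check */
    for (size_t i = 0; i < len; i++)
        dev[i] = src[off + i];  /* stands in for a posted MMIO write */
    return len;
}
```

The design point is only where the barrier sits: the cost is paid once per call rather than once per word, which matters if each word is a cheap posted write rather than a slow read.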