>From: Michael S. Tsirkin [mailto:mst@xxxxxxxxxx]
>Sent: Wednesday, September 15, 2010 7:28 PM
>To: Xin, Xiaohui
>Cc: netdev@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>mingo@xxxxxxx; davem@xxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxxx;
>jdike@xxxxxxxxxxxxxxx
>Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>
>On Wed, Sep 15, 2010 at 11:13:44AM +0800, Xin, Xiaohui wrote:
>> >From: Michael S. Tsirkin [mailto:mst@xxxxxxxxxx]
>> >Sent: Sunday, September 12, 2010 9:37 PM
>> >To: Xin, Xiaohui
>> >Cc: netdev@xxxxxxxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
>> >mingo@xxxxxxx; davem@xxxxxxxxxxxxx; herbert@xxxxxxxxxxxxxxxxxxxx;
>> >jdike@xxxxxxxxxxxxxxx
>> >Subject: Re: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>> >
>> >On Sat, Sep 11, 2010 at 03:41:14PM +0800, Xin, Xiaohui wrote:
>> >> >>Playing with rlimit on data path, transparently to the application in this way
>> >> >>looks strange to me, I suspect this has unexpected security implications.
>> >> >>Further, applications may have other uses for locked memory
>> >> >>besides mpassthru - you should not just take it because it's there.
>> >> >>
>> >> >>Can we have an ioctl that lets userspace configure how much
>> >> >>memory to lock? This ioctl will decrement the rlimit and store
>> >> >>the data in the device structure so we can do accounting
>> >> >>internally. Put it back on close or on another ioctl.
>> >> >Yes, we can decrement the rlimit in ioctl in one time to avoid
>> >> >data path.
>> >> >
>> >> >>Need to be careful for when this operation gets called
>> >> >>again with 0 or another small value while we have locked memory -
>> >> >>maybe just fail with EBUSY? or wait until it gets unlocked?
>> >> >>Maybe 0 can be special-cased and deactivate zero-copy?.
>> >> >>
>> >>
>> >> How about we don't use a new ioctl, but just check the rlimit
>> >> in one MPASSTHRU_BINDDEV ioctl?
>> >> If we find mp device break the rlimit, then we fail the bind ioctl,
>> >> and thus can't do zero copy any more.
>> >
>> >Yes, and not just check, but decrement as well.
>> >I think we should give userspace control over
>> >how much memory we can lock and subtract from the rlimit.
>> >It's OK to add this as a parameter to MPASSTHRU_BINDDEV.
>> >Then increment the rlimit back on unbind and on close?
>> >
>> >This opens up an interesting condition: process 1
>> >calls bind, process 2 calls unbind or close.
>> >This will increment rlimit for process 2.
>> >Not sure how to fix this properly.
>> >
>> I can't too, can we do any synchronous operations on rlimit stuff?
>> I quite suspect in it.
>>
>> >--
>> >MST
>
>Here's what infiniband does: simply pass the amount of memory userspace
>wants you to lock on an ioctl, and verify that either you have
>CAP_IPC_LOCK or this number does not exceed the current rlimit. (must
>be on ioctl, not on open, as we likely want the fd passed around between
>processes), but do not decrement rlimit. Use this on following
>operations. Be careful if this can be changed while operations are in
>progress.
>
>This does mean that the effective amount of memory that userspace can
>lock is doubled, but at least it is not unlimited, and we sidestep all
>other issues such as userspace running out of lockable memory simply by
>virtue of using the driver.
>
What I have done in the mp device is almost the same. The differences
are that I do not check the capability, and that I use my own parameter
ctor->pages instead of mm->locked_vm. So the current plan is to:
1) add the capability check
2) use mm->locked_vm
3) add an ioctl for userspace to configure how much memory can be locked.

>--
>MST
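
For reference, the scheme discussed in this thread (validate the requested amount once on an ioctl, require either CAP_IPC_LOCK or a request within the rlimit, never decrement the rlimit, then account against the stored limit) could be modelled roughly as below. This is only a userspace sketch to make the logic concrete, not the mp driver code: struct mp_dev and the mp_* functions are hypothetical names, the capability and rlimit are passed in as plain parameters, and the -ENOMEM return value is an assumption. The -EBUSY case follows the suggestion quoted earlier for a new limit smaller than what is already locked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-device accounting state (not the real mp structures). */
struct mp_dev {
    unsigned long lock_limit; /* pages userspace allowed us to lock */
    unsigned long locked;     /* pages currently locked by the device */
};

/*
 * Model of the configuration ioctl. The request is accepted if the
 * caller has CAP_IPC_LOCK or the request fits within the memlock rlimit
 * (both expressed in pages here). The rlimit itself is never modified.
 */
int mp_set_lock_limit(struct mp_dev *dev, unsigned long req_pages,
                      unsigned long rlimit_pages, bool cap_ipc_lock)
{
    assert(dev != NULL);

    /* Refuse to shrink the limit below what is already locked. */
    if (dev->locked > req_pages)
        return -EBUSY;

    /* Assumed error code for an over-limit, unprivileged request. */
    if (!cap_ipc_lock && req_pages > rlimit_pages)
        return -ENOMEM;

    dev->lock_limit = req_pages;
    return 0;
}

/* Data path: account pages against the stored limit only. */
int mp_lock_pages(struct mp_dev *dev, unsigned long npages)
{
    assert(dev != NULL);

    /* locked <= lock_limit always holds, so this cannot underflow. */
    if (npages > dev->lock_limit - dev->locked)
        return -ENOMEM;

    dev->locked += npages;
    return 0;
}

/* Release previously accounted pages. */
void mp_unlock_pages(struct mp_dev *dev, unsigned long npages)
{
    assert(dev != NULL && npages <= dev->locked);
    dev->locked -= npages;
}
```

In a driver, the two extra parameters would presumably come from capable(CAP_IPC_LOCK) and the caller's RLIMIT_MEMLOCK converted to pages at ioctl time. Because the rlimit is only compared and never decremented, the process can still lock up to its full rlimit elsewhere, which is the "doubled" effect mentioned above.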