On Tue, Feb 14, 2023 at 08:51:09AM +0100, Juergen Gross wrote: > On 13.02.23 22:01, Demi Marie Obenour wrote: > > On Mon, Feb 13, 2023 at 10:26:11AM +0100, Juergen Gross wrote: > > > On 07.02.23 03:10, Demi Marie Obenour wrote: > > > > When a grant entry is still in use by the remote domain, Linux must put > > > > it on a deferred list. Normally, this list is very short, because > > > > the PV network and block protocols expect the backend to unmap the grant > > > > first. However, Qubes OS's GUI protocol is subject to the constraints > > > > of the X Window System, and as such winds up with the frontend unmapping > > > > the window first. As a result, the list can grow very large, resulting > > > > in a massive memory leak and eventual VM freeze. > > > > > > > > Fix this problem by bumping the number of entries that the VM will > > > > attempt to free at each iteration to 10000. This is an ugly hack that > > > > may well make a denial of service easier, but for Qubes OS that is less > > > > bad than the problem Qubes OS users are facing today. > > > > > > > There really > > > > needs to be a way for a frontend to be notified when the backend has > > > > unmapped the grants. > > > > > > Please remove this sentence from the commit message, or move it below the > > > "---" marker. > > > > Will fix in v2. > > > > > There are still some flag bits unallocated in struct grant_entry_v1 or > > > struct grant_entry_header. You could suggest some patches for Xen to use > > > one of the bits as a marker to get an event from the hypervisor if a > > > grant with such a bit set has been unmapped. > > > > That is indeed a good idea. There are other problems with the grant > > interface as well, but those can be dealt with later. > > > > > I have no idea, whether such an interface would be accepted by the > > > maintainers, though. > > > > > > > Additionally, a module parameter is provided to > > > > allow tuning the reclaim speed. > > > > > > > > The code previously used printk(KERN_DEBUG) whenever it had to defer > > > > reclaiming a page because the grant was still mapped. This resulted in > > > > a large volume of log messages that bothered users. Use pr_debug > > > > instead, which suppresses the messages by default. Developers can > > > > enable them using the dynamic debug mechanism. > > > > > > > > Fixes: QubesOS/qubes-issues#7410 (memory leak) > > > > Fixes: QubesOS/qubes-issues#7359 (excessive logging) > > > > Fixes: 569ca5b3f94c ("xen/gnttab: add deferred freeing logic") > > > > Cc: stable@xxxxxxxxxxxxxxx > > > > Signed-off-by: Demi Marie Obenour <demi@xxxxxxxxxxxxxxxxxxxxxx> > > > > --- > > > > Anyone have suggestions for improving the grant mechanism? Argo isn't > > > > a good option, as in the GUI protocol there are substantial performance > > > > wins to be had by using true shared memory. Resending as I forgot the > > > > Signed-off-by on the first submission. Sorry about that. > > > > > > > > drivers/xen/grant-table.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c > > > > index 5c83d41..2c2faa7 100644 > > > > --- a/drivers/xen/grant-table.c > > > > +++ b/drivers/xen/grant-table.c > > > > @@ -355,14 +355,20 @@ > > > > static void gnttab_handle_deferred(struct timer_list *); > > > > static DEFINE_TIMER(deferred_timer, gnttab_handle_deferred); > > > > +static atomic64_t deferred_count; > > > > +static atomic64_t leaked_count; > > > > +static unsigned int free_per_iteration = 10000; > > > > > > As you are adding a kernel parameter to change this value, please set the > > > default to a value not potentially causing any DoS problems. Qubes OS can > > > still use a higher value then. > > > > Do you have any suggestions? I don’t know if this is actually a DoS > > concern anymore. Shrinking the interval between iterations would be. > > Why don't you use today's value of 10 for the default? Will do. I now remember that the DoS concern is that the kernel could be made to use excess CPU trying and failing to reclaim memory. > > > > + > > > > static void gnttab_handle_deferred(struct timer_list *unused) > > > > { > > > > - unsigned int nr = 10; > > > > + unsigned int nr = READ_ONCE(free_per_iteration); > > > > > > I don't see why you are needing READ_ONCE() here. > > > > free_per_iteration can be concurrently modified via sysfs. > > My remark was based on the wrong assumption that ignore_limit could be > dropped. Even if ignore_limit could not be dropped, READ_ONCE is still necessary to avoid a data race. -- Sincerely, Demi Marie Obenour (she/her/hers) Invisible Things Lab
Attachment:
signature.asc
Description: PGP signature