Hi Andrei

On Thu, Dec 12, 2024 at 10:33 PM Andrei Vagin <avagin@xxxxxxxxx> wrote:
>
> On Wed, Dec 11, 2024 at 2:47 PM Jeff Xu <jeffxu@xxxxxxxxxxxx> wrote:
> >
> > Hi Andrei
> >
> > Thanks for your email.
> > I was hoping to get some feedback from CRIU devs, and happy to see you
> > reaching out..
> >
> ...
> > I have been thinking of other alternatives, but those would require
> > more understanding on CRIU use cases.
> > One of my questions is: Would CRIU target an individual process? or
> > entire systems?
>
> It targets individual processes that have been forked from the main
> CRIU process.
>
> >
> > If it is an individual process, we could use prctl to opt-in/opt-out
> > certain processes. There could be two alternatives.
> > 1> Opt-in solution: process must set prctl.seal_criu_mapping, this
> > needs to be set before execve() because sealing is applied at execve()
> > call.
> > 2> opt-out solution: The system will by default seal all of the system
> > mappings, but individual processes can opt-out by setting
> > prctl.not_seal_criu_mappings. This also needs to be set before
> > execve() call.
>
> I like the idea and I think the opt-out solution should work for CRIU.
> CRIU will be able to call this prctl and re-execute itself.
>
Great! Let's iterate on the opt-out solution then.

> Let me give you a bit of context on how CRIU works. When CRIU restores
> processes, it recreates a process tree by forking itself. Afterwards, it
> restores all mappings in each process but doesn't put them to proper
> addresses. After that, each process unmaps CRIU mappings from its address
> space and remaps its restored mappings to the proper addresses. So CRIU should
> be able to move system mappings and seal them if they have been sealed before
> dump.
Thanks for the context.

> BTW, It isn't just about CRIU. gVisor and maybe some other sandbox solutions
> will be affected by this change too. gVisor uses stub-processes to represent
> guest address spaces. In a stub process, it unmaps all system mappings.
>
> >
> > For both cases, we will want to identify what type of mapping CRIU
> > cares about, i.e. maybe CRIU doesn't care about uprobe and vsyscall ?
> > and only care about vdso/vvar/sigpage ?
>
> As for now, it handles only vdso/vvar/sigpage mappings. It doesn't care
> about vsyscall because it is always mapped to the fixed address.
>
Given this understanding that CRIU intends to replace the current
process's vdso/vvar with that of the restored process, and therefore
doesn't want the parent CRIU process to seal the vdso/vvar, a prctl
opt-out for vdso/vvar is a reasonable path forward.

The sigpage mapping should also be included in this opt-out, for the
same reason as vdso/vvar: it is created by the
arch_setup_additional_pages() call during execve().

However, the uprobe mapping shouldn't be included in this opt-out, as
it is not created by arch_setup_additional_pages() during execveat().
CRIU should simply restore it from the restored process, if present.

vsyscall, which is created when the system boots and maps to a fixed
virtual address and page, shouldn't be included in this opt-out either.

So I'm proposing to opt out vdso/vvar/sigpage with a new prctl:
disable_mseal_criu_system_mappings = true/false

What do you think?

> gVisor should be able to unmap all system mappings from a process
> address space.
>
Do you think this opt-out solution will work for gVisor too?

Thanks
-Jeff

> Thanks,
> Andrei
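
P.S. To make the proposed scope concrete, here is a rough userspace model
of which mappings the opt-out would cover. It is only a sketch of the
policy described above, not kernel code: the enum, the function names, and
the assumption that every system mapping is sealed by default unless the
opt-out applies are all hypothetical and introduced for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for illustration only; not kernel identifiers. */
enum sys_mapping {
	SYS_MAP_VDSO,
	SYS_MAP_VVAR,
	SYS_MAP_SIGPAGE,
	SYS_MAP_UPROBE,
	SYS_MAP_VSYSCALL,
	SYS_MAP_COUNT
};

/* True if the proposed opt-out would apply to this mapping type. */
bool opt_out_covers(enum sys_mapping m)
{
	assert(m < SYS_MAP_COUNT);

	switch (m) {
	case SYS_MAP_VDSO:
	case SYS_MAP_VVAR:
	case SYS_MAP_SIGPAGE:
		/* Set up by arch_setup_additional_pages() during execve(). */
		return true;
	default:
		/* uprobe and vsyscall stay outside the opt-out. */
		return false;
	}
}

/*
 * True if the mapping would end up sealed, assuming (hypothetically)
 * that system mappings are sealed by default and the opt-out flag was
 * set before execve().
 */
bool would_seal(enum sys_mapping m, bool opt_out_set)
{
	return !(opt_out_set && opt_out_covers(m));
}
```

With the flag set, would_seal() returns false only for vdso/vvar/sigpage;
uprobe and vsyscall behave the same whether or not the flag is set.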