Re: [PATCH v4] /dev/mem: Revoke mappings when a driver claims the region

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, May 28, 2021 at 1:58 AM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 27.05.21 23:30, Dan Williams wrote:
> > On Thu, May 27, 2021 at 1:58 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >>
> >> [+cc Daniel, Krzysztof, Jason, Christoph, linux-pci]
> >>
> >> On Thu, May 21, 2020 at 02:06:17PM -0700, Dan Williams wrote:
> >>> Close the hole of holding a mapping over kernel driver takeover event of
> >>> a given address range.
> >>>
> >>> Commit 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
> >>> introduced CONFIG_IO_STRICT_DEVMEM with the goal of protecting the
> >>> kernel against scenarios where a /dev/mem user tramples memory that a
> >>> kernel driver owns. However, this protection only prevents *new* read(),
> >>> write() and mmap() requests. Established mappings prior to the driver
> >>> calling request_mem_region() are left alone.
> >>>
> >>> Especially with persistent memory, and the core kernel metadata that is
> >>> stored there, there are plentiful scenarios for a /dev/mem user to
> >>> violate the expectations of the driver and cause amplified damage.
> >>>
> >>> Teach request_mem_region() to find and shoot down active /dev/mem
> >>> mappings that it believes it has successfully claimed for the exclusive
> >>> use of the driver. Effectively a driver call to request_mem_region()
> >>> becomes a hole-punch on the /dev/mem device.
> >>
> >> This idea of hole-punching /dev/mem has since been extended to PCI
> >> BARs via [1].
> >>
> >> Correct me if I'm wrong: I think this means that if a user process has
> >> mmapped a PCI BAR via sysfs, and a kernel driver subsequently requests
> >> that region via pci_request_region() or similar, we punch holes in the
> >> the user process mmap.  The driver might be happy, but my guess is the
> >> user starts seeing segmentation violations for no obvious reason and
> >> is not happy.
> >>
> >> Apart from the user process issue, the implementation of [1] is
> >> problematic for PCI because the mmappable sysfs attributes now depend
> >> on iomem_init_inode(), an fs_initcall, which means they can't be
> >> static attributes, which ultimately leads to races in creating them.
> >
> > See the comments in iomem_get_mapping(), and revoke_iomem():
> >
> >          /*
> >           * Check that the initialization has completed. Losing the race
> >           * is ok because it means drivers are claiming resources before
> >           * the fs_initcall level of init and prevent iomem_get_mapping users
> >           * from establishing mappings.
> >           */
> >
> > ...the observation being that it is ok for the revocation inode to
> > come on later in the boot process because userspace won't be able to
> > use the fs yet. So any missed calls to revoke_iomem() would fall back
> > to userspace just seeing the resource busy in the first instance. I.e.
> > through the normal devmem_is_allowed() exclusion.
> >
> >>
> >> So I'm raising the question of whether this hole-punch is the right
> >> strategy.
> >>
> >>    - Prior to revoke_iomem(), __request_region() was very
> >>      self-contained and really only depended on the resource tree.  Now
> >>      it depends on a lot of higher-level MM machinery to shoot down
> >>      mappings of other tasks.  This adds quite a bit of complexity and
> >>      some new ordering constraints.
> >>
> >>    - Punching holes in the address space of an existing process seems
> >>      unfriendly.  Maybe the driver's __request_region() should fail
> >>      instead, since the driver should be prepared to handle failure
> >>      there anyway.
> >
> > It's prepared to handle failure, but in this case it is dealing with a
> > root user of 2 minds.
> >
> >>
> >>    - [2] suggests that the hole punch protects drivers from /dev/mem
> >>      writers, especially with persistent memory.  I'm not really
> >>      convinced.  The hole punch does nothing to prevent a user process
> >>      from mmapping and corrupting something before the driver loads.
> >
> > The motivation for this was a case that was swapping between /dev/mem
> > access and /dev/pmem0 access and they forgot to stop using /dev/mem
> > when they switched to /dev/pmem0. If root wants to use /dev/mem it can
> > use it, if root wants to stop the driver from loading it can set
> > mopdrobe policy or manually unbind, and if root asks the kernel to
> > load the driver while it is actively using /dev/mem something has to
> > give. Given root has other options to stop a driver the decision to
> > revoke userspace access when root messes up and causes a collision
> > seems prudent to me.
> >
>
> Is there a real use case for mapping pmem via /dev/mem or could we just
> prohibit the access to these areas completely?

The kernel offers conflicting access to iomem resources and a
long-standing mechanism to enforce mutual exclusion
(CONFIG_IO_STRICT_DEVMEM) between those interfaces. That mechanism was
found to be incomplete for the case where a /dev/mem mapping is
maintained after a kernel driver is attached, and incomplete for other
mechanisms to map iomem like pci-sysfs. This was found with PMEM, but
the issue is larger and applies to userspace drivers / debug in
general.

> What's the use case for "swapping between /dev/mem access and /dev/pmem0
> access" ?

"Who knows". I mean, I know in this case it was a platform validation
test using /dev/mem for "reasons", but I am not sure that is relevant
to the wider concern. If CONFIG_IO_STRICT_DEVMEM=n exclusion is
enforced when drivers pass the IORESOURCE_EXCLUSIVE flag, if
CONFIG_IO_STRICT_DEVMEM=y exclusion is enforced whenever the kernel
marks a resource IORESOURCE_BUSY, and if kernel lockdown is enabled
the driver state is moot as LOCKDOWN_DEV_MEM and LOCKDOWN_PCI_ACCESS
policy is in effect.



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux