Re: [PATCH 1/5] xfs_scrub: allow auxiliary pathnames for sandboxing

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Tue, 7 Nov 2023 10:35:11 -0800

On Tue, Nov 07, 2023 at 12:48:39AM -0800, Christoph Hellwig wrote:
> On Thu, May 25, 2023 at 06:55:02PM -0700, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > In the next patch, we'll tighten up the security on the xfs_scrub
> > service so that it can't escape.  However, sandboxing the service
> > involves making the host filesystem as inaccessible as possible, with
> > the filesystem to scrub bind mounted onto a known location within the
> > sandbox.  Hence we need one path for reporting and a new -A argument to
> > tell scrub what it should actually be trying to open.
> 
> This confuses me a bit.  Let me try to see if I understood it correctly:
> 
>  - currently xfs_scrub is called on the mount point, where the
>    mount-point is the first non-optional argument
> 
> With this patch there is a new environment variable that tells it what
> mount point to use, and only uses the one passed as the argument for
> reporting messages.
> 
> If I understand this correctly I find the decision odd.  I can see
> why you want to separate the two.  But I'd still expect the mount point
> to operate on to be passed as the argument, with an override for the
> reported messages.  And I'd expect the override passed as a normal
> command line option and not an environment variable. 

The reason why I bolted on the SERVICE_MOUNTPOINT= environment variable
is to preserve procfs discoverability.  The bash translation of these
systemd unit definitions for a scrub of /home is:

  mount /home /tmp/scrub --bind
  SERVICE_MODE=1 SERVICE_MOUNTPOINT=/tmp/scrub xfs_scrub -b /home

And the top listing for that will look like:

    PID USER      PR  NI %CPU  %MEM     TIME+ COMMAND
  11804 xfs_scru+ 20  19 10.3   0.1   1:26.94 xfs_scrub -b /home

(I omitted a few columns to narrow the top output.)

Notice how the path that the program is allegedly scrubbing (despite all
the private bind mount security mania) shows up in the process listing,
so it's easier to figure out what each process is doing.  The actual
horrible details of the sandboxing are hidden in /proc/11804/environ.
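
If you do want to see those details for a running scrub, something like
the following should work.  This is only a sketch: show_service_env is a
made-up helper name, it relies on Linux procfs, and you need to be root
or the same user as the target process to read its environ file.

```shell
# Hypothetical helper: print the SERVICE_* variables a process was
# started with.  /proc/PID/environ is NUL-separated, so turn the NULs
# into newlines before filtering.
show_service_env() {
    tr '\0' '\n' < "/proc/$1/environ" | grep '^SERVICE_'
}

# Example, using the PID from the listing above:
#   show_service_env 11804
```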

So that's the reasoning behind the somewhat backwards phrasing.  As they
say, "Permits many, money more!"
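
To make the split between "path to report" and "path to open" concrete,
here is the selection logic as a shell sketch.  It is not the xfs_scrub
source; scrub_open_path is a hypothetical name, and I'm assuming the
override applies whenever SERVICE_MOUNTPOINT is set, without modelling
any interaction with SERVICE_MODE.

```shell
# Hypothetical sketch, not the actual xfs_scrub code.
# $1 is the path from the command line; it is what gets reported.
scrub_open_path() {
    report_path=$1
    # Open the sandbox path if one was provided, otherwise fall back
    # to the path that was given on the command line.
    printf '%s\n' "${SERVICE_MOUNTPOINT:-$report_path}"
}
```

With SERVICE_MOUNTPOINT=/tmp/scrub set, "scrub_open_path /home" prints
/tmp/scrub; with it unset, the same call prints /home.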

For everyone else following at home -- the reason for bind mounting the
actual mountpoint into a private mount tree at /tmp/scrub is (a) to
make it so that the scrub process can only see a ro version of a subset
of the filesystem tree; and (b) to separate the mountpoint in the scrub
process so that the sysadmin typing "umount /home" will see it disappear
out of most processes' mount trees without that affecting scrub.
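
Outside of systemd, roughly the same arrangement can be approximated
with unshare(1).  Treat this as an illustration under assumptions, not
what the unit files do: scrub_in_private_ns and DRY_RUN are names I made
up, the quoting does not cope with paths containing single quotes, and
actually running it (DRY_RUN=0) needs root.  By default it only prints
the command it would run.

```shell
# Hypothetical helper: scrub $1 through a private, read-only bind mount
# at $2.  Prints the command unless DRY_RUN=0 is set.
scrub_in_private_ns() {
    real=$1       # path users know, e.g. /home
    sandbox=$2    # private location, e.g. /tmp/scrub
    run=echo
    [ "${DRY_RUN:-1}" = 0 ] && run=
    # New mount namespace with private propagation, so a later
    # "umount /home" on the host does not reach into this tree.
    $run unshare --mount --propagation private sh -c "
        mkdir -p '$sandbox' &&
        mount --bind '$real' '$sandbox' &&
        mount -o remount,bind,ro '$sandbox' &&
        SERVICE_MODE=1 SERVICE_MOUNTPOINT='$sandbox' xfs_scrub -b '$real'"
}

# Example (dry run): scrub_in_private_ns /home /tmp/scrub
```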

(I don't think xfs_scrub is going to go rogue and start reading users'
credit card numbers out of /home, but why give it an easy opportunity?)

--D