On Tue, Apr 26, 2022 at 08:14:45PM -0700, Darrick J. Wong wrote: > On Wed, Apr 27, 2022 at 12:22:52PM +1000, Dave Chinner wrote: > > From: Dave Chinner <dchinner@xxxxxxxxxx> > > > > Callers currently have to round out the size of buffers to match the > > aligment constraints of log iovecs and xlog_write(). They should not > > need to know this detail, so introduce a new function to calculate > > the iovec length (for use in ->iop_size implementations). Also > > modify xlog_finish_iovec() to round up the length to the correct > > alignment so the callers don't need to do this, either. > > > > Convert the only user - inode forks - of this alignment rounding to > > use the new interface. > > Hmm. So currently, we require that the inode fork buffer be rounded up > to the next 4 bytes, and then I guess the log will copy that into the > log iovec? IOWs, if we have a 37-byte data fork, we'll allocate a 40 > byte buffer for the xfs_ifork, and the log will copy all 40 bytes into a > 40 byte iovec. Yes, that's how the current code works. It ends up leaking whatever was in those 3 bytes into the shadow buffer that we then copy into the log region. i.e. the existing code "leaks" non-zeroed allocated memory to the journal. > Now it looks like we'd allocate a 37-byte buffer for the xfs_ifork, but > the log iovec will still be 40 bytes. So ... do we copy 37 bytes out of > the ifork buffer and zero the last 3 bytes in the iovec? Yes, we copy 37 bytes out of the ifork buffer now into the shadow buffer so we do not overrun the inode fork buffer. > Does we leak > kernel memory in those last 3 bytes? We does indeed still leak the remaining 3 bytes as they are not zeroed. > Or do we copy 40 bytes and > overrun? No, we definitely don't do that - KASAN gets very unhappy when you do that... > It sorta looks like (at least for the local format case) xlog_copy_iovec > will copy 37 bytes and leave the last 3 bytes of the iovec in whatever > state it was in previously. Is that zeroed? Because it then looks like > xlog_finish_iovec will round that 37 up to 40. The shadow buffer is only partially zeroed - the part that makes io the header and iovec pointer array is zeroed, but the region that the journal data is written to is not zeroed. > (FWIW I'm just checking for kernel memory exposure vectors here.) Yup, I hadn't even considered that aspect of the code because we aren't actually leaking anything to userspace. If an unprivileged user can read 3 bytes of uninitialised data out of the journal we've got much, much bigger security problems to deal with. It should be trivial to fix, though. I'll do the initial fix as a standalone patch, though, and then roll it into this one because the problem has been around for a long while and fixing this patch doesn't produce an easily backportable fix... Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx