On Tue, Dec 6, 2016 at 4:38 PM, Johannes Thumshirn <jthumshirn@xxxxxxx> wrote:
> On Tue, Dec 06, 2016 at 10:43:57AM +0100, Dmitry Vyukov wrote:
>> On Tue, Dec 6, 2016 at 10:32 AM, Johannes Thumshirn <jthumshirn@xxxxxxx> wrote:
>> > On Mon, Dec 05, 2016 at 07:03:39PM +0000, Al Viro wrote:
>> >> On Mon, Dec 05, 2016 at 04:17:53PM +0100, Johannes Thumshirn wrote:
>> >> > 633 hp = &srp->header;
>> >> > [...]
>> >> > 646 hp->dxferp = (char __user *)buf + cmd_size;
>> >>
>> >> > So the memory for hp->dxferp comes from:
>> >> > 633 hp = &srp->header;
>> >>
>> >> ????
>> >>
>> >> > From my debug instrumentation I see that the dxferp ends up in the
>> >> > iov_iter's kvec->iov_base and the faulting address is always dxferp + n *
>> >> > 4k with n in [1, 16] (and we're copying 16 4k pages from the iovec into the
>> >> > bio).
>> >>
>> >> _Address_ of hp->dxferp comes from that assignment; the value is 'buf'
>> >> argument of sg_write() + small offset. In this case, it should point
>> >> inside a pipe buffer, which is, indeed, at a kernel address. Who'd
>> >> allocated srp is irrelevant.
>> >
>> > Yes, I realized that as well when I had enough distance between me and the
>> > code...
>> >
>> >>
>> >> And if you end up dereferencing more than one page worth there, you do have
>> >> a problem - pipe buffers are not going to be that large. Could you slap
>> >> WARN_ON((size_t)input_size > count);
>> >> right after the calculation of input_size in sg_write() and see if it triggers
>> >> on your reproducer?
>> >
>> > I did and it didn't trigger. What triggers is (as expected) a
>> > WARN_ON((size_t)mxsize > count);
>> > We have count at 80 and mxsize (which ends up in hp->dxfer_len) at 65499. But
>> > the 65499 bytes are the length of the data we're supposed to be copying in via
>> > the iov. I'm still rather confused about what's happening here, sorry.
>>
>>
>> I think the critical piece here is some kind of race or timing
>> condition.
>> Note that the test program executes all of
>> memfd_create/write/open/sendfile twice. The second time, the calls race
>> with each other, but they can also race with the first execution of
>> the calls.
>
> FWIW I've just run the reproducer once instead of looping it to check how it
> would normally behave, and it bails out at:
>
> 604 if (count < (SZ_SG_HEADER + 6))
> 605         return -EIO; /* The minimum scsi command length is 6 bytes. */
>
> That means we usually aren't going down the copy_from_iter() road at all, but
> sometimes we do. And then we try to copy 16 pages from the pipe buffer (is
> this correct?).
>
> The reproducer does: sendfile("/dev/sg0", memfd, offset_in_memfd, 0x10000);
>
> I don't see how we get there. Could it be random data from the mmap() we point
> the memfd to?
>
> This bug is confusing, to be honest.

Where does this count come from? What address in the user program? Is it
0x20012fxx?

One possibility for non-deterministically changing inputs is that this part:

case 2:
        NONFAILING(*(uint32_t*)0x20012fd8 = (uint32_t)0x28);
        NONFAILING(*(uint32_t*)0x20012fdc = (uint32_t)0xffff);
        NONFAILING(*(uint64_t*)0x20012fe0 = (uint64_t)0x0);
        NONFAILING(*(uint64_t*)0x20012fe8 = (uint64_t)0xffffffffffff993f);
        NONFAILING(*(uint64_t*)0x20012ff0 = (uint64_t)0xa8b);
        NONFAILING(*(uint32_t*)0x20012ff8 = (uint32_t)0xff);
        r[9] = syscall(__NR_write, r[2], 0x20012fd8ul, 0x28ul, 0, 0, 0, 0, 0, 0);

runs concurrently with this part:

case 0:
        /* prot 0x3 = PROT_READ|PROT_WRITE,
         * flags 0x32 = MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS (x86-64) */
        r[0] = syscall(__NR_mmap, 0x20000000ul, 0x13000ul, 0x3ul, 0x32ul,
                       0xfffffffffffffffful, 0x0ul, 0, 0, 0);

So, depending on timing, the write can see all of its input data, or just a
subset of it, as zeros.
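To see why the mxsize WARN_ON fires while the input_size one does not, here is a minimal C model of the old-interface size arithmetic in sg_write(), written from a reading of the 4.9-era drivers/scsi/sg.c. It is a sketch under assumptions, not kernel code: SZ_SG_HEADER is taken as 36 (sizeof(struct sg_header)), cmd_size as 6 (what COMMAND_SIZE() gives for a group-0 opcode), and reply_len as 0xffff, i.e. the second 32-bit word the reproducer stores at 0x20012fdc, assuming those bytes are what sg_write() parses as struct sg_header. model_sg_write_sizes() and MODEL_SZ_SG_HEADER are hypothetical names.

```c
#include <stddef.h>

/* Assumption: equals sizeof(struct sg_header), i.e. SZ_SG_HEADER. */
#define MODEL_SZ_SG_HEADER 36

struct sg_write_sizes {
    int input_size; /* data bytes after the header and the CDB */
    int mxsize;     /* the value that ends up in hp->dxfer_len */
};

/* Hypothetical, simplified model of the old-interface size computation:
 * mxsize takes the larger of the bytes actually passed in and the
 * reply_len field from the header, minus the header size. */
static struct sg_write_sizes model_sg_write_sizes(int count, int cmd_size,
                                                  int reply_len)
{
    struct sg_write_sizes s;
    int input_size = count - cmd_size;

    s.mxsize = (input_size > reply_len ? input_size : reply_len)
               - MODEL_SZ_SG_HEADER;
    s.input_size = input_size - MODEL_SZ_SG_HEADER;
    return s;
}
```

With count = 80 this gives input_size = 80 - 6 - 36 = 38 and mxsize = 65535 - 36 = 65499, the value Johannes reported. Under these assumptions only the mxsize check can trigger: the dxfer length comes from a header field rather than from count.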