Li Zefan <lizefan@xxxxxxxxxx> writes: > The commit has a stable tag, and it has been backported to 3.10, and I think > it should be applied to older kernels too, so here it is. > > I change bio_for_each_segment_all() to __bio_for_each_segment(), otherwise > it won't compile. Thanks Li, I've queued this for the 3.5 kernel as well. Cheers, -- Luis > > =========== > > From: Roland Dreier <roland@xxxxxxxxxxxxxxx> > > commit 35dc248383bbab0a7203fca4d722875bc81ef091 upstream. > > There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances > leads to one process writing data into the address space of some other > random unrelated process if the ioctl is interrupted by a signal. > What happens is the following: > > - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (ie the > underlying SCSI command will transfer data from the SCSI device to > the buffer provided in the ioctl) > > - Before the command finishes, a signal is sent to the process waiting > in the ioctl. This will end up waking up the sg_ioctl() code: > > result = wait_event_interruptible(sfp->read_wait, > (srp_done(sfp, srp) || sdp->detached)); > > but neither srp_done() nor sdp->detached is true, so we end up just > setting srp->orphan and returning to userspace: > > srp->orphan = 1; > write_unlock_irq(&sfp->rq_list_lock); > return result; /* -ERESTARTSYS because signal hit process */ > > At this point the original process is done with the ioctl and > blithely goes ahead handling the signal, reissuing the ioctl, etc. > > - Eventually, the SCSI command issued by the first ioctl finishes and > ends up in sg_rq_end_io(). At the end of that function, we run through: > > write_lock_irqsave(&sfp->rq_list_lock, iflags); > if (unlikely(srp->orphan)) { > if (sfp->keep_orphan) > srp->sg_io_owned = 0; > else > done = 0; > } > srp->done = done; > write_unlock_irqrestore(&sfp->rq_list_lock, iflags); > > if (likely(done)) { > /* Now wake up any sg_read() that is waiting for this > * packet. > */ > wake_up_interruptible(&sfp->read_wait); > kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN); > kref_put(&sfp->f_ref, sg_remove_sfp); > } else { > INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext); > schedule_work(&srp->ew.work); > } > > Since srp->orphan *is* set, we set done to 0 (assuming the > userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN > ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext() > to run in a workqueue. > > - In workqueue context we go through sg_rq_end_io_usercontext() -> > sg_finish_rem_req() -> blk_rq_unmap_user() -> ... -> > bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user(). > > The key point here is that we are doing copy_to_user() on a > workqueue -- that is, we're on a kernel thread with current->mm > equal to whatever random previous user process was scheduled before > this kernel thread. So we end up copying whatever data the SCSI > command returned to the virtual address of the buffer passed into > the original ioctl, but it's quite likely we do this copying into a > different address space! > > As suggested by James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>, > add a check for current->mm (which is NULL if we're on a kernel thread > without a real userspace address space) in bio_uncopy_user(), and skip > the copy if we're on a kernel thread. > > There's no reason that I can think of for any caller of bio_uncopy_user() > to want to do copying on a kernel thread with a random active userspace > address space. > > Huge thanks to Costa Sapuntzakis <costa@xxxxxxxxxxxxxxx> for the > original pointer to this bug in the sg code. > > Signed-off-by: Roland Dreier <roland@xxxxxxxxxxxxxxx> > Tested-by: David Milburn <dmilburn@xxxxxxxxxx> > Cc: Jens Axboe <axboe@xxxxxxxxx> > Signed-off-by: James Bottomley <JBottomley@xxxxxxxxxxxxx> > [lizf: backported to 3.4: > - Use __bio_for_each_segment() instead of bio_for_each_segment_all()] > Signed-off-by: Li Zefan <lizefan@xxxxxxxxxx> > --- > fs/bio.c | 20 +++++++++++++++----- > 1 file changed, 15 insertions(+), 5 deletions(-) > > diff --git a/fs/bio.c b/fs/bio.c > index 84da885..c0e5a4e 100644 > --- a/fs/bio.c > +++ b/fs/bio.c > @@ -787,12 +787,22 @@ static int __bio_copy_iov(struct bio *bio, struct bio_vec *iovecs, > int bio_uncopy_user(struct bio *bio) > { > struct bio_map_data *bmd = bio->bi_private; > - int ret = 0; > + struct bio_vec *bvec; > + int ret = 0, i; > > - if (!bio_flagged(bio, BIO_NULL_MAPPED)) > - ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, > - bmd->nr_sgvecs, bio_data_dir(bio) == READ, > - 0, bmd->is_our_pages); > + if (!bio_flagged(bio, BIO_NULL_MAPPED)) { > + /* > + * if we're in a workqueue, the request is orphaned, so > + * don't copy into a random user address space, just free. > + */ > + if (current->mm) > + ret = __bio_copy_iov(bio, bmd->iovecs, bmd->sgvecs, > + bmd->nr_sgvecs, bio_data_dir(bio) == READ, > + 0, bmd->is_our_pages); > + else if (bmd->is_our_pages) > + __bio_for_each_segment(bvec, bio, i, 0) > + __free_page(bvec->bv_page); > + } > bio_free_map_data(bmd); > bio_put(bio); > return ret; -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html