Re: out-of-bounds write in the function ata_pio_sector

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Jens, Christoph, Martin,

On Wed, Jan 01, 2025 at 09:17:02PM -0500, reveliofuzzing wrote:
> Hi there,
>
> We found an out-of-bounds write in the function ata_pio_sector, which can cause
> the kernel to crash. We would like to report it for your reference.
>
> ## Problem in ata_pio_sector
> ata_pio_sector uses the following code to decide which page to use for the I/O:
> page = sg_page(qc->cursg);
> offset = qc->cursg->offset + qc->cursg_ofs;
>
> /* get the current page and offset */
> page = nth_page(page, (offset >> PAGE_SHIFT));
> offset %= PAGE_SIZE;
> but we found that `offset` could be as high as 0x5000---qc->cursg_ofs==0x5000,
> qc->cursg->offset == 0x0, making `page` point to a higher-position page that
> belongs to other threads.

I've managed to reproduce this on v6.13 using reveliofuzzing syzkaller
reproducer (which issues SCSI_IOCTL_SEND_COMMAND to the /dev/sg devices).

The problem is that the final if-statement in function ata_pio_sector()
(which lives in drivers/ata/libata-sff.c) looks like this:

	if (qc->cursg_ofs == qc->cursg->length) {
		qc->cursg = sg_next(qc->cursg);
		if (!qc->cursg)
			ap->hsm_task_state = HSM_ST_LAST;
		qc->cursg_ofs = 0;
	}

When the memory corruption occurs, it is because this if-statement never
evaluated to true, so ata_pio_sector() just continued writing to more and
more pages, overwriting pages owned by other tasks.


Here is some trace_printks that I added that shows the problem:

     kworker/1:3-116     [001] d.h1.   121.445073: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x0 cursg->offset: 0x0 offset: 0x0 offset_mod: 0x0 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   121.445079: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x0
     kworker/1:3-116     [001] d.h1.   121.445098: ata_pio_sector: 3/3 qc->cursg_ofs: 0x200 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1
     kworker/1:3-116     [001] d.h1.   121.445164: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x200 cursg->offset: 0x0 offset: 0x200 offset_mod: 0x200 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   121.445168: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x200
     kworker/1:3-116     [001] d.h1.   121.445185: ata_pio_sector: 3/3 qc->cursg_ofs: 0x400 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1
     kworker/1:3-116     [001] d.h1.   123.492925: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x400 cursg->offset: 0x0 offset: 0x400 offset_mod: 0x400 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   123.492929: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x400
     kworker/1:3-116     [001] d.h1.   123.492943: ata_pio_sector: 3/3 qc->cursg_ofs: 0x600 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1
     kworker/1:3-116     [001] d.h1.   123.492990: ata_pio_sector: 1/3 qc: ffff888008928130 cursg_ofs: 0x600 cursg->offset: 0x0 offset: 0x600 offset_mod: 0x600 sect_size: 0x200
     kworker/1:3-116     [001] d.h1.   123.492993: ata_pio_sector: 2/3 orig_page: ffffea0000263180 page: ffffea0000263180 page_diff: 0x0 nbytes: 0xd42 curbytes: 0x600
     kworker/1:3-116     [001] d.h1.   123.493005: ata_pio_sector: 3/3 qc->cursg_ofs: 0x800 qc->cursg->length: 0xd42 equal ? 0 n_elem: 1 orig_n_elem: 1

We can see that qc->cursg_ofs gets incremented by qc->sect_size (0x200),
but since qc->cursg_ofs is never equal to qc->cursg->length (0xd42),
it doesn't stop, and ends up writing way more than qc->cursg->length.



If I compare the function:
/* Transfer qc->sect_size bytes of data from/to the ATA device. */
ata_pio_sector()

with function:
/* __atapi_pio_bytes - Transfer data from/to the ATAPI device. */
__atapi_pio_bytes()


I can see that __atapi_pio_bytes() actually has this code:
/* don't overrun current sg */
count = min(sg->length - qc->cursg_ofs, bytes);


which I guess makes sense, since the function has _bytes_ in the name.
ata_pio_sector() has _sector_ in the name, so it makes sense that it
would have to transfer one or more sectors (even if there currently
doesn't seem to be anything that guarantees this...).


So, should I add a similar:
count = min(sg->length - qc->cursg_ofs, bytes);
to ata_pio_sector(), or should I add code to
ata_pio_sector() that simply rejects a qc->nbytes that is not a multiple
of qc->sect_size ?



I was kind of expecting some upper layer, SCSI or block, to have rejected
an operation that is not a multiple of the sector size.

Is that a silly assumption?


Looking at drivers/scsi/scsi_ioctl.c:sg_scsi_ioctl() I can see that it
rejects:
if (in_len > PAGE_SIZE || out_len > PAGE_SIZE)


If I dump the request in gdb, it seems that:
cmd_flags is: REQ_OP_DRV_IN
__data_len is: 0xd42

(gdb) p /x *((struct request *) ((void*)qc->scsicmd - sizeof(struct request)))
$21 = {q = 0xffff888008d50000, mq_ctx = 0xffff88806d242980, mq_hctx = 0xffff88800781e000, 
  cmd_flags = 0x22, rq_flags = 0x18, tag = 0x0, internal_tag = 0x1, timeout = 0xea60, 
  __data_len = 0xd42, __sector = 0xffffffffffffffff, bio = 0xffff8880088fc700, 
  biotail = 0xffff8880088fc700, {queuelist = {next = 0xffff8880088f12c8, 
      prev = 0xffff8880088f12c8}, rq_next = 0xffff8880088f12c8}, part = 0x0, 
  alloc_time_ns = 0x0, start_time_ns = 0xa837cc389, io_start_time_ns = 0x0, 
  stats_sectors = 0x0, nr_phys_segments = 0x1, nr_integrity_segments = 0x0, state = 0x1, 
  ref = {counter = 0x1}, deadline = 0xfffd2516, {hash = {next = 0x0, pprev = 0x0}, 
    ipi_list = {next = 0x0}}, {rb_node = {__rb_parent_color = 0xffff8880088f1320, 
      rb_right = 0x0, rb_left = 0x0}, special_vec = {bv_page = 0xffff8880088f1320, 
      bv_len = 0x0, bv_offset = 0x0}}, elv = {icq = 0x0, priv = {0xffff88800781f860, 
      0x0}}, flush = {seq = 0x7, saved_end_io = 0x0}, fifo_time = 0xfffc2f88, 
  end_io = 0xffffffff81f55550, end_io_data = 0xffff88800de2fb80}

p /x qc->scsicmd->cmnd[0]
$25 = 0x85
include/scsi/scsi_proto.h:#define       ATA_16                0x85      /* 16-byte pass-thru */


which must have meant that:
in_len == 0, out_len = 0xd42 (3394).

The page size on this system is 4k (4096), so I suppose that out_len is not
greater than PAGE_SIZE.

So I guess that SCSI layer does not reject a request that is not a multiple of
the sector size.

So what is the best solution here?

Should we add such a check in SCSI? libata? or should each (libata) driver
themselves reject such a thing?

Advice is welcome :)


Kind regards,
Niklas




[Index of Archives]     [Linux Filesystems]     [Linux SCSI]     [Linux RAID]     [Git]     [Kernel Newbies]     [Linux Newbie]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Samba]     [Device Mapper]

  Powered by Linux