RE: [PATCH v5 10/10] Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs

From: Michael Kelley <mhklinux@xxxxxxxxxxx> Sent: Tuesday, March 18, 2025 7:10 PM
> 
> From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Tuesday, March
> 18, 2025 5:34 PM
> >
> > On 3/17/2025 4:51 PM, Michael Kelley wrote:
> > > From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Wednesday, February 26, 2025 3:08 PM

[snip]

> > >> +
> > >> +	region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
> > >> +	if (!region)
> > >> +		return -EINVAL;
> > <snip>
> > >> +	case MSHV_GPAP_ACCESS_TYPE_ACCESSED:
> > >> +		hv_type_mask = 1;
> > >> +		if (args.access_op == MSHV_GPAP_ACCESS_OP_CLEAR) {
> > >> +			hv_flags.clear_accessed = 1;
> > >> +			/* not accessed implies not dirty */
> > >> +			hv_flags.clear_dirty = 1;
> > >> +		} else { // MSHV_GPAP_ACCESS_OP_SET
> > >
> > > Avoid C++ style comments.
> > >
> > Ack
> >
> > >> +			hv_flags.set_accessed = 1;
> > >> +		}
> > >> +		break;
> > >> +	case MSHV_GPAP_ACCESS_TYPE_DIRTY:
> > >> +		hv_type_mask = 2;
> > >> +		if (args.access_op == MSHV_GPAP_ACCESS_OP_CLEAR) {
> > >> +			hv_flags.clear_dirty = 1;
> > >> +		} else { // MSHV_GPAP_ACCESS_OP_SET
> > >
> > > Same here.
> > >
> > Ack
> >
> > >> +			hv_flags.set_dirty = 1;
> > >> +			/* dirty implies accessed */
> > >> +			hv_flags.set_accessed = 1;
> > >> +		}
> > >> +		break;
> > >> +	}
> > >> +
> > >> +	states = vzalloc(states_buf_sz);
> > >> +	if (!states)
> > >> +		return -ENOMEM;
> > >> +
> > >> +	ret = hv_call_get_gpa_access_states(partition->pt_id, args.page_count,
> > >> +					    args.gpap_base, hv_flags, &written,
> > >> +					    states);
> > >> +	if (ret)
> > >> +		goto free_return;
> > >> +
> > >> +	/*
> > >> +	 * Overwrite states buffer with bitmap - the bits in hv_type_mask
> > >> +	 * correspond to bitfields in hv_gpa_page_access_state
> > >> +	 */
> > >> +	for (i = 0; i < written; ++i)
> > >> +		assign_bit(i, (ulong *)states,
> > >
> > > Why the cast to ulong *?  I think this argument to assign_bit() is void *, in
> > > which case the cast wouldn't be needed.
> > >
> > It looks like assign_bit() and friends resolve to a set of functions which do
> > take an unsigned long pointer, e.g.:
> >
> > __set_bit() -> generic___set_bit(unsigned long nr, volatile unsigned long *addr)
> > set_bit() -> arch_set_bit(unsigned int nr, volatile unsigned long *p)
> > etc...
> >
> > So a cast is necessary.
> 
> Indeed, you are right.  Seems like set_bit() and friends should take a void *.
> But that's a different kettle of fish.
> 
> >
> > > Also, assign_bit() does atomic bit operations. Doing such in a loop like
> > > here will really hammer the hardware memory bus with atomic
> > > read-modify-write cycles. Use __assign_bit() instead, which does
> > > non-atomic operations. You don't need atomic here as no other
> > > threads are modifying the bit array.
> > >
> > I didn't realize it was atomic. I'll change it to __assign_bit().
> >
> > >> +			   states[i].as_uint8 & hv_type_mask);
> > >
> > > OK, so the starting contents of "states" is an array of bytes. The ending
> > > contents is an array of bits. This works because every bit in the ending
> > > bit array is set to either 0 or 1. Overlap occurs on the first iteration
> > > where the code reads the 0th byte, and writes the 0th bit, which is part of
> > > the 0th byte. The second iteration reads the 1st byte, and writes the 1st bit,
> > > which doesn't overlap, and there's no overlap from then on.
> > >
> > > Suppose "written" is not a multiple of 8. The last byte of "states" as an
> > > array of bits will have some bits that have not been set to either 0 or 1 and
> > > might be leftover garbage from when "states" was an array of bytes. That
> > > garbage will get copied to user space. Is that OK? Even if user space knows
> > > enough to ignore those bits, it seems a little dubious to be copying even
> > > a few bits of garbage to user space.
> > >
> > > Some comments might help here.
> > >
> > This is a good point. The expectation is indeed that userspace knows which
> > bits are valid from the returned "written" value, but I agree it's a bit
> > odd to have some garbage bits in the last byte. How does this look (to be
> > inserted here directly after the loop):
> >
> > +       /* zero the unused bits in the last byte of the returned bitmap */
> > +       if (written > 0) {
> > +               u8 last_bits_mask;
> > +               int last_byte_idx;
> > +               int bits_rem = written % 8;
> > +
> > +               /* bits_rem == 0 when all bits in the last byte were assigned */
> > +               if (bits_rem > 0) {
> > +                       /* written > 0 ensures last_byte_idx >= 0 */
> > +                       last_byte_idx = ((written + 7) / 8) - 1;
> > +                       /* bits_rem > 0 ensures this masks 1 to 7 bits */
> > +                       last_bits_mask = (1 << bits_rem) - 1;
> > +                       states[last_byte_idx].as_uint8 &= last_bits_mask;
> > +               }
> > +       }
> 
> A simpler approach is to "continue" the previous loop.  And if "written"
> is zero, this additional loop won't do anything either:
> 
> 	for (i = written; i < ALIGN(written, 8); ++i)
> 		__clear_bit(i, (ulong *)states);
> 

One further thought here: Could "written" be less than
args.page_count at this point? That would require
hv_call_get_gpa_access_states() to not fail, but still return
a value for written that is less than args.page_count. If that
could happen, then the above loop should be:

	for (i = written; i < bitmap_buf_sz * 8; ++i)
		__clear_bit(i, (ulong *)states);

so that all the uninitialized bits and bytes that will be written
back to user space are cleared.
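
To make the idea concrete, here is a rough userspace sketch of the
in-place packing followed by the tail clearing. It is only an
illustration, not the proposed kernel code: pack_states() and
assign_bit_u8() are hypothetical names, the byte-wise bit layout
matches the unsigned long bitmap layout only on little-endian
machines, and bitmap_buf_sz is assumed to be the number of bytes
later copied back to user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical userspace stand-in for the kernel's non-atomic
 * __assign_bit(), operating on a byte array (little-endian layout).
 */
static void assign_bit_u8(size_t nr, uint8_t *map, int val)
{
	uint8_t mask = (uint8_t)(1u << (nr % 8));

	if (val)
		map[nr / 8] |= mask;
	else
		map[nr / 8] &= (uint8_t)~mask;
}

/*
 * Convert "written" per-page state bytes into a bitmap in place, then
 * clear every remaining bit in the first bitmap_buf_sz bytes so that
 * no leftover state bytes survive in the part that gets copied out.
 */
static void pack_states(uint8_t *states, size_t written,
			size_t bitmap_buf_sz, uint8_t type_mask)
{
	size_t i;

	/* Caller must size the bitmap to hold at least "written" bits. */
	assert(written <= bitmap_buf_sz * 8);

	/*
	 * Iteration i reads byte i and writes bit i, which lives in
	 * byte i / 8. That byte has already been read, so nothing
	 * unread is ever overwritten.
	 */
	for (i = 0; i < written; ++i)
		assign_bit_u8(i, states, states[i] & type_mask);

	/* Zero the tail, including whole bytes if written is short. */
	for (i = written; i < bitmap_buf_sz * 8; ++i)
		assign_bit_u8(i, states, 0);
}
```

With page_count = 16 (so a 2-byte bitmap) and written = 3, the second
loop clears bits 3 through 15, which covers both the partial first
byte and the entire second byte.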

> >
> > The remaining bytes could be memset() to zero but I think it's fine to leave
> > them.
> 
> I agree.  The remaining bytes aren't written back to user space anyway
> since the copy_to_user() uses bitmap_buf_sz.

Maybe I misunderstood what you meant by "remaining bytes".  I think
all bits and bytes that are written back to user space should have
valid data or zeros so that no garbage is written back.

Michael

> 
> >
> > >> +
> > >> +	args.page_count = written;
> > >> +
> > >> +	if (copy_to_user(user_args, &args, sizeof(args))) {
> > >> +		ret = -EFAULT;
> > >> +		goto free_return;
> > >> +	}
> > >> +	if (copy_to_user((void __user *)args.bitmap_ptr, states, bitmap_buf_sz))
> > >> +		ret = -EFAULT;
> > >> +
> > >> +free_return:
> > >> +	vfree(states);
> > >> +	return ret;
> > >> +}




