From: Michael Kelley <mhklinux@xxxxxxxxxxx> Sent: Tuesday, March 18, 2025 7:10 PM
>
> From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Tuesday, March
> 18, 2025 5:34 PM
> >
> > On 3/17/2025 4:51 PM, Michael Kelley wrote:
> > > From: Nuno Das Neves <nunodasneves@xxxxxxxxxxxxxxxxxxx> Sent: Wednesday, February 26, 2025 3:08 PM

[snip]

> > >> +
> > >> +	region = mshv_partition_region_by_gfn(partition, mem.guest_pfn);
> > >> +	if (!region)
> > >> +		return -EINVAL;
> > <snip>
> >> +	case MSHV_GPAP_ACCESS_TYPE_ACCESSED:
> > >> +		hv_type_mask = 1;
> > >> +		if (args.access_op == MSHV_GPAP_ACCESS_OP_CLEAR) {
> > >> +			hv_flags.clear_accessed = 1;
> > >> +			/* not accessed implies not dirty */
> > >> +			hv_flags.clear_dirty = 1;
> > >> +		} else { // MSHV_GPAP_ACCESS_OP_SET
> > >
> > > Avoid C++ style comments.
> > >
> > Ack
> >
> > >> +			hv_flags.set_accessed = 1;
> > >> +		}
> > >> +		break;
> > >> +	case MSHV_GPAP_ACCESS_TYPE_DIRTY:
> > >> +		hv_type_mask = 2;
> > >> +		if (args.access_op == MSHV_GPAP_ACCESS_OP_CLEAR) {
> > >> +			hv_flags.clear_dirty = 1;
> > >> +		} else { // MSHV_GPAP_ACCESS_OP_SET
> > >
> > > Same here.
> > >
> > Ack
> >
> > >> +			hv_flags.set_dirty = 1;
> > >> +			/* dirty implies accessed */
> > >> +			hv_flags.set_accessed = 1;
> > >> +		}
> > >> +		break;
> > >> +	}
> > >> +
> > >> +	states = vzalloc(states_buf_sz);
> > >> +	if (!states)
> > >> +		return -ENOMEM;
> > >> +
> > >> +	ret = hv_call_get_gpa_access_states(partition->pt_id, args.page_count,
> > >> +					    args.gpap_base, hv_flags, &written,
> > >> +					    states);
> > >> +	if (ret)
> > >> +		goto free_return;
> > >> +
> > >> +	/*
> > >> +	 * Overwrite states buffer with bitmap - the bits in hv_type_mask
> > >> +	 * correspond to bitfields in hv_gpa_page_access_state
> > >> +	 */
> > >> +	for (i = 0; i < written; ++i)
> > >> +		assign_bit(i, (ulong *)states,
> > >
> > > Why the cast to ulong *? I think this argument to assign_bit() is void *, in
> > > which case the cast wouldn't be needed.
> > >
> > It looks like assign_bit() and friends resolve to a set of functions which do
> > take an unsigned long pointer, e.g.:
> >
> > __set_bit() -> generic___set_bit(unsigned long nr, volatile unsigned long *addr)
> > set_bit() -> arch_set_bit(unsigned int nr, volatile unsigned long *p)
> > etc...
> >
> > So a cast is necessary.
>
> Indeed, you are right. Seems like set_bit() and friends should take a void *.
> But that's a different kettle of fish.
>
> >
> > > Also, assign_bit() does atomic bit operations. Doing such in a loop like
> > > here will really hammer the hardware memory bus with atomic
> > > read-modify-write cycles. Use __assign_bit() instead, which does
> > > non-atomic operations. You don't need atomic here as no other
> > > threads are modifying the bit array.
> > >
> > I didn't realize it was atomic. I'll change it to __assign_bit().
> >
> > >> +			     states[i].as_uint8 & hv_type_mask);
> > >
> > > OK, so the starting contents of "states" is an array of bytes. The ending
> > > contents is an array of bits. This works because every bit in the ending
> > > bit array is set to either 0 or 1. Overlap occurs on the first iteration
> > > where the code reads the 0th byte, and writes the 0th bit, which is part of
> > > the 0th byte. The second iteration reads the 1st byte, and writes the 1st bit,
> > > which doesn't overlap, and there's no overlap from then on.
> > >
> > > Suppose "written" is not a multiple of 8. The last byte of "states" as an
> > > array of bits will have some bits that have not been set to either 0 or 1 and
> > > might be leftover garbage from when "states" was an array of bytes. That
> > > garbage will get copied to user space. Is that OK? Even if user space knows
> > > enough to ignore those bits, it seems a little dubious to be copying even
> > > a few bits of garbage to user space.
> > >
> > > Some comments might help here.
> > >
> > This is a good point. The expectation is indeed that userspace knows which
> > bits are valid from the returned "written" value, but I agree it's a bit
> > odd to have some garbage bits in the last byte. How does this look (to be
> > inserted here directly after the loop):
> >
> > +	/* zero the unused bits in the last byte of the returned bitmap */
> > +	if (written > 0) {
> > +		u8 last_bits_mask;
> > +		int last_byte_idx;
> > +		int bits_rem = written % 8;
> > +
> > +		/* bits_rem == 0 when all bits in the last byte were assigned */
> > +		if (bits_rem > 0) {
> > +			/* written > 0 ensures last_byte_idx >= 0 */
> > +			last_byte_idx = ((written + 7) / 8) - 1;
> > +			/* bits_rem > 0 ensures this masks 1 to 7 bits */
> > +			last_bits_mask = (1 << bits_rem) - 1;
> > +			states[last_byte_idx].as_uint8 &= last_bits_mask;
> > +		}
> > +	}
>
> A simpler approach is to "continue" the previous loop. And if "written"
> is zero, this additional loop won't do anything either:
>
> 	for (i = written; i < ALIGN(written, 8); ++i)
> 		__clear_bit(i, (ulong *)states);
>

One further thought here: Could "written" be less than args.page_count at
this point? That would require hv_call_get_gpa_access_states() to not fail,
but still return a value for written that is less than args.page_count. If
that could happen, then the above loop should be:

	for (i = written; i < bitmap_buf_sz * 8; ++i)
		__clear_bit(i, (ulong *)states);

so that all the uninitialized bits and bytes that will be written back to
user space are cleared.

> >
> > The remaining bytes could be memset() to zero but I think it's fine to leave
> > them.
>
> I agree. The remaining bytes aren't written back to user space anyway
> since the copy_to_user() uses bitmap_buf_sz.

Maybe I misunderstood what you meant by "remaining bytes". I think all
bits and bytes that are written back to user space should have valid
data or zeros so that no garbage is written back.

Michael

>
> >
> > >> +
> > >> +	args.page_count = written;
> > >> +
> > >> +	if (copy_to_user(user_args, &args, sizeof(args))) {
> > >> +		ret = -EFAULT;
> > >> +		goto free_return;
> > >> +	}
> > >> +	if (copy_to_user((void __user *)args.bitmap_ptr, states, bitmap_buf_sz))
> > >> +		ret = -EFAULT;
> > >> +
> > >> +free_return:
> > >> +	vfree(states);
> > >> +	return ret;
> > >> +}
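
For illustration only, the in-place byte-to-bit conversion and the tail
clearing discussed in this thread can be sketched in user-space C. This is
not the code from the patch. The names sketch_assign_bit() and
sketch_pack_states() are hypothetical stand-ins for the kernel's non-atomic
bit helpers and for the loop in the ioctl handler. The sketch also assumes
that bit i of the bitmap lives in byte i / 8, which matches an unsigned long
bitmap only on little-endian machines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a non-atomic assign-bit helper. */
static void sketch_assign_bit(size_t nr, uint8_t *map, int value)
{
	uint8_t mask = (uint8_t)(1u << (nr % 8));

	if (value)
		map[nr / 8] |= mask;
	else
		map[nr / 8] &= (uint8_t)~mask;
}

/*
 * Convert buf[0..written-1] (one state byte per page) into a bitmap in
 * place, then clear every remaining bit up to bitmap_bytes * 8 so that no
 * leftover state bytes remain in the part that would be copied out.
 *
 * Assumption: buf holds at least max(written, bitmap_bytes) bytes.
 */
static void sketch_pack_states(uint8_t *buf, size_t written,
			       size_t bitmap_bytes, uint8_t type_mask)
{
	size_t i;

	assert(bitmap_bytes * 8 >= written);

	/*
	 * Byte i is read before bit i is written, and bit i lives in byte
	 * i / 8 <= i, so a byte is never overwritten before it is read.
	 */
	for (i = 0; i < written; ++i)
		sketch_assign_bit(i, buf, (buf[i] & type_mask) != 0);

	/* Clear the tail: covers both a partial last byte and whole bytes. */
	for (i = written; i < bitmap_bytes * 8; ++i)
		sketch_assign_bit(i, buf, 0);
}
```

Running the second loop up to bitmap_bytes * 8 rather than to the next
multiple of 8 also covers the case where fewer states were written than
pages were requested, which is the open question raised above.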