On 10/14/2015 05:40 PM, Stefan Hajnoczi wrote:
On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
static void dsm_write(void *opaque, hwaddr addr,
uint64_t val, unsigned size)
{
+ NVDIMMState *state = opaque;
+ MemoryRegion *dsm_ram_mr;
+ dsm_in *in;
+ dsm_out *out;
+ uint32_t revision, function, handle;
+
if (val != NOTIFY_VALUE) {
fprintf(stderr, "BUG: unexepected notify value 0x%" PRIx64, val);
}
+
+ dsm_ram_mr = memory_region_find(&state->mr, state->page_size,
+ state->page_size).mr;
+ memory_region_unref(dsm_ram_mr);
+ in = memory_region_get_ram_ptr(dsm_ram_mr);
This looks suspicious. Shouldn't the memory_region_unref(dsm_ram_mr)
happen after we're done using it?
This region is keep-alive during QEMU's running, it is okay. The
same style is applied to other codes, for example:
line 208 in hw/s390x/sclp.c.
+ out = (dsm_out *)in;
+
+ revision = in->arg1;
+ function = in->arg2;
+ handle = in->handle;
+ le32_to_cpus(&revision);
+ le32_to_cpus(&function);
+ le32_to_cpus(&handle);
+
+ nvdebug("UUID " UUID_FMT ".\n", in->arg0[0], in->arg0[1], in->arg0[2],
+ in->arg0[3], in->arg0[4], in->arg0[5], in->arg0[6],
+ in->arg0[7], in->arg0[8], in->arg0[9], in->arg0[10],
+ in->arg0[11], in->arg0[12], in->arg0[13], in->arg0[14],
+ in->arg0[15]);
+ nvdebug("Revision %#x Function %#x Handler %#x.\n", revision, function,
+ handle);
+
+ if (revision != DSM_REVISION) {
+ nvdebug("Revision %#x is not supported, expect %#x.\n",
+ revision, DSM_REVISION);
+ goto exit;
+ }
+
+ if (!handle) {
+ if (!dsm_is_root_uuid(in->arg0)) {
Please don't dereference 'in' or pass it to other functions. Avoid race
conditions with guest vcpus by coping in the entire dsm_in struct.
This is like a system call - the kernel cannot trust userspace memory
and must copy in before accessing data. The same rules apply.
It's little different for QEMU:
- the memory address is always valid to QEMU, it's not always true for Kernel
due to context-switch
- we have checked the header before use it's data, for example, when we get
data from GET_NAMESPACE_DATA, we have got the @offset and @length from the
memory, then copy memory based on these values, that means the userspace
has no chance to cause buffer overflow by increasing these values at runtime.
The scenario for our case is simple but Kernel is difficult to do
check_all_before_use as many paths may be involved.
- guest changes some data is okay, the worst case is that the label data is
corrupted. This is caused by guest itself. Kernel also supports this kind
of behaviour, e,g. network TX zero copy, the userspace page is being
transferred while userspace can still access it.
- it's 4K size on x86, full copy wastes CPU time too much.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html