Re: [PATCH v3 27/32] nvdimm: support DSM_CMD_IMPLEMENTED function

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 10/14/2015 05:40 PM, Stefan Hajnoczi wrote:
On Sun, Oct 11, 2015 at 11:52:59AM +0800, Xiao Guangrong wrote:
  static void dsm_write(void *opaque, hwaddr addr,
                        uint64_t val, unsigned size)
  {
+    NVDIMMState *state = opaque;
+    MemoryRegion *dsm_ram_mr;
+    dsm_in *in;
+    dsm_out *out;
+    uint32_t revision, function, handle;
+
      if (val != NOTIFY_VALUE) {
          fprintf(stderr, "BUG: unexepected notify value 0x%" PRIx64, val);
      }
+
+    dsm_ram_mr = memory_region_find(&state->mr, state->page_size,
+                                    state->page_size).mr;
+    memory_region_unref(dsm_ram_mr);
+    in = memory_region_get_ram_ptr(dsm_ram_mr);

This looks suspicious.  Shouldn't the memory_region_unref(dsm_ram_mr)
happen after we're done using it?

This region is keep-alive during QEMU's running, it is okay. The
same style is applied to other codes, for example:
line 208 in hw/s390x/sclp.c.


+    out = (dsm_out *)in;
+
+    revision = in->arg1;
+    function = in->arg2;
+    handle = in->handle;
+    le32_to_cpus(&revision);
+    le32_to_cpus(&function);
+    le32_to_cpus(&handle);
+
+    nvdebug("UUID " UUID_FMT ".\n", in->arg0[0], in->arg0[1], in->arg0[2],
+            in->arg0[3], in->arg0[4], in->arg0[5], in->arg0[6],
+            in->arg0[7], in->arg0[8], in->arg0[9], in->arg0[10],
+            in->arg0[11], in->arg0[12], in->arg0[13], in->arg0[14],
+            in->arg0[15]);
+    nvdebug("Revision %#x Function %#x Handler %#x.\n", revision, function,
+            handle);
+
+    if (revision != DSM_REVISION) {
+        nvdebug("Revision %#x is not supported, expect %#x.\n",
+                revision, DSM_REVISION);
+        goto exit;
+    }
+
+    if (!handle) {
+        if (!dsm_is_root_uuid(in->arg0)) {

Please don't dereference 'in' or pass it to other functions.  Avoid race
conditions with guest vcpus by coping in the entire dsm_in struct.

This is like a system call - the kernel cannot trust userspace memory
and must copy in before accessing data.  The same rules apply.


It's little different for QEMU:
- the memory address is always valid to QEMU, it's not always true for Kernel
  due to context-switch

- we have checked the header before use it's data, for example, when we get
  data from GET_NAMESPACE_DATA, we have got the @offset and @length from the
  memory, then copy memory based on these values, that means the userspace
  has no chance to cause buffer overflow by increasing these values at runtime.

  The scenario for our case is simple but Kernel is difficult to do
  check_all_before_use as many paths may be involved.

- guest changes some data is okay, the worst case is that the label data is
  corrupted. This is caused by guest itself. Kernel also supports this kind
  of behaviour, e,g. network TX zero copy, the userspace page is being
  transferred while userspace can still access it.

- it's 4K size on x86, full copy wastes CPU time too much.






--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux