On Thu, Jun 14, 2012 at 11:34:09PM +0200, Juan Quintela wrote: > Isaku Yamahata <yamahata@xxxxxxxxxxxxx> wrote: > > +//#define DEBUG_UMEM > > +#ifdef DEBUG_UMEM > > +#include <sys/syscall.h> > > +#define DPRINTF(format, ...) \ > > + do { \ > > + printf("%d:%ld %s:%d "format, getpid(), syscall(SYS_gettid), \ > > + __func__, __LINE__, ## __VA_ARGS__); \ > > + } while (0) > > This should be in a header file that is linux specific? And (at least > on my systems) gettid is already defined on glibc. I'll remove getpid/gettid. It was just for debugging in early phase. They are not necessary any more. > > +#else > > +#define DPRINTF(format, ...) do { } while (0) > > +#endif > > > > + > > +#define DEV_UMEM "/dev/umem" > > + > > +UMem *umem_new(void *hostp, size_t size) > > +{ > > + struct umem_init uinit = { > > + .size = size, > > + }; > > + UMem *umem; > > + > > + assert((size % getpagesize()) == 0); > > + umem = g_new(UMem, 1); > > + umem->fd = open(DEV_UMEM, O_RDWR); > > + if (umem->fd < 0) { > > + perror("can't open "DEV_UMEM); > > + abort(); > > Can we return one error insntead of abort? the same for the rest of the > file aborts. Ok. > > +size_t umem_pages_size(uint64_t nr) > > +{ > > + return sizeof(struct umem_pages) + nr * sizeof(uint64_t); > > Can we make sure that the pgoffs field is aligned? I know that as it is > now it is aligned, but better to be sure? It is already done by gcc extension, zero length array. > > +} > > + > > +static void umem_write_cmd(int fd, uint8_t cmd) > > +{ > > + DPRINTF("write cmd %c\n", cmd); > > + > > + for (;;) { > > + ssize_t ret = write(fd, &cmd, 1); > > + if (ret == -1) { > > + if (errno == EINTR) { > > + continue; > > + } else if (errno == EPIPE) { > > + perror("pipe"); > > + DPRINTF("write cmd %c %zd %d: pipe is closed\n", > > + cmd, ret, errno); > > + break; > > + } > > > Grr, we don't have a function that writes does a "safe_write". The most > similar thing in qemu looks to be send_all(). So we should introduce something like qemu_safe_write/read? > > + > > + perror("pipe"); > > Can we make a different perror() message than previous error? > > > + DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno); > > + abort(); > > + } > > + > > + break; > > + } > > +} > > + > > +static void umem_read_cmd(int fd, uint8_t expect) > > +{ > > + uint8_t cmd; > > + for (;;) { > > + ssize_t ret = read(fd, &cmd, 1); > > + if (ret == -1) { > > + if (errno == EINTR) { > > + continue; > > + } > > + perror("pipe"); > > + DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno); > > + abort(); > > + } > > + > > + if (ret == 0) { > > + DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret); > > + abort(); > > + } > > + > > + break; > > + } > > + > > + DPRINTF("read cmd %c\n", cmd); > > + if (cmd != expect) { > > + DPRINTF("cmd %c expect %d\n", cmd, expect); > > + abort(); > > Ouch. If we receive garbage, we just exit? > > I really think that we should implement error handling. > > > + } > > +} > > + > > +struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset) > > +{ > > + int ret; > > + uint64_t nr; > > + size_t size; > > + struct umem_pages *pages; > > + > > + ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset); > > + *offset += sizeof(nr); > > + DPRINTF("ret %d nr %ld\n", ret, nr); > > + if (ret != sizeof(nr) || nr == 0) { > > + return NULL; > > + } > > + > > + size = umem_pages_size(nr); > > + pages = g_malloc(size); > > Just thinking about this. Couldn't we just decide on a "big enough" > buffer, and never send anything bigger than that? That would remove the > need to have to malloc()/free() a buffer for each reception? Will try to address it. > > +/* qemu side handler */ > > +struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd, > > + int *offset) > > +{ > > + uint64_t i; > > + int page_shift = ffs(getpagesize()) - 1; > > + struct umem_pages *pages = umem_recv_pages(from_umemd, offset); > > + if (pages == NULL) { > > + return NULL; > > + } > > + > > + for (i = 0; i < pages->nr; i++) { > > + ram_addr_t addr = pages->pgoffs[i] << page_shift; > > + > > + /* make pages present by forcibly triggering page fault. */ > > + volatile uint8_t *ram = qemu_get_ram_ptr(addr); > > + uint8_t dummy_read = ram[0]; > > + (void)dummy_read; /* suppress unused variable warning */ > > + } > > + > > + /* > > + * Very Linux implementation specific. > > + * Make it sure that other thread doesn't fault on the above virtual > > + * address. (More exactly other thread doesn't call fault handler with > > + * the offset.) > > + * the fault handler is called with mmap_sem read locked. > > + * madvise() does down/up_write(mmap_sem) > > + */ > > + qemu_madvise(NULL, 0, MADV_NORMAL); > > If it is linux specific, should be inside CONFIG_LINUX ifdef, or a > function hided on some header. Good idea. > Talking about looking, what protects that no other thread enters this > function before this one calls madvise? Or I am losing something obvious? It is assumed that only main thread calls this function via iohandler. > > + > > +struct umem_pages { > > + uint64_t nr; > > + uint64_t pgoffs[0]; > > +}; > > + > > QEMU really likes typedefs for structs. > > Later, Juan. > -- yamahata -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html