Re: [PATCH v2 35/41] postcopy: introduce helper functions for postcopy

Isaku Yamahata <yamahata@xxxxxxxxxxxxx> · Sat, 16 Jun 2012 18:48:27 +0900

On Thu, Jun 14, 2012 at 11:34:09PM +0200, Juan Quintela wrote:
> Isaku Yamahata <yamahata@xxxxxxxxxxxxx> wrote:
> > +//#define DEBUG_UMEM
> > +#ifdef DEBUG_UMEM
> > +#include <sys/syscall.h>
> > +#define DPRINTF(format, ...)                                            \
> > +    do {                                                                \
> > +        printf("%d:%ld %s:%d "format, getpid(), syscall(SYS_gettid),    \
> > +               __func__, __LINE__, ## __VA_ARGS__);                     \
> > +    } while (0)
> 
> This should be in a header file that is linux specific?  And (at least
> on my systems) gettid is already defined on glibc.

I'll remove getpid/gettid. It was just for debugging in early phase.
They are not necessary any more.

> > +#else
> > +#define DPRINTF(format, ...)    do { } while (0)
> > +#endif
> 
> 
> > +
> > +#define DEV_UMEM        "/dev/umem"
> > +
> > +UMem *umem_new(void *hostp, size_t size)
> > +{
> > +    struct umem_init uinit = {
> > +        .size = size,
> > +    };
> > +    UMem *umem;
> > +
> > +    assert((size % getpagesize()) == 0);
> > +    umem = g_new(UMem, 1);
> > +    umem->fd = open(DEV_UMEM, O_RDWR);
> > +    if (umem->fd < 0) {
> > +        perror("can't open "DEV_UMEM);
> > +        abort();
> 
> Can we return one error insntead of abort?  the same for the rest of the
> file aborts.

Ok.

> > +size_t umem_pages_size(uint64_t nr)
> > +{
> > +    return sizeof(struct umem_pages) + nr * sizeof(uint64_t);
> 
> Can we make sure that the pgoffs field is aligned?  I know that as it is
> now it is aligned, but better to be sure?

It is already done by gcc extension, zero length array.

> > +}
> > +
> > +static void umem_write_cmd(int fd, uint8_t cmd)
> > +{
> > +    DPRINTF("write cmd %c\n", cmd);
> > +
> > +    for (;;) {
> > +        ssize_t ret = write(fd, &cmd, 1);
> > +        if (ret == -1) {
> > +            if (errno == EINTR) {
> > +                continue;
> > +            } else if (errno == EPIPE) {
> > +                perror("pipe");
> > +                DPRINTF("write cmd %c %zd %d: pipe is closed\n",
> > +                        cmd, ret, errno);
> > +                break;
> > +            }
> 
> 
> Grr, we don't have a function that writes does a "safe_write".  The most
> similar thing in qemu looks to be send_all().

So we should introduce something like qemu_safe_write/read?

> > +
> > +            perror("pipe");
> 
> Can we make a different perror() message than previous error?
> 
> > +            DPRINTF("write cmd %c %zd %d\n", cmd, ret, errno);
> > +            abort();
> > +        }
> > +
> > +        break;
> > +    }
> > +}
> > +
> > +static void umem_read_cmd(int fd, uint8_t expect)
> > +{
> > +    uint8_t cmd;
> > +    for (;;) {
> > +        ssize_t ret = read(fd, &cmd, 1);
> > +        if (ret == -1) {
> > +            if (errno == EINTR) {
> > +                continue;
> > +            }
> > +            perror("pipe");
> > +            DPRINTF("read error cmd %c %zd %d\n", cmd, ret, errno);
> > +            abort();
> > +        }
> > +
> > +        if (ret == 0) {
> > +            DPRINTF("read cmd %c %zd: pipe is closed\n", cmd, ret);
> > +            abort();
> > +        }
> > +
> > +        break;
> > +    }
> > +
> > +    DPRINTF("read cmd %c\n", cmd);
> > +    if (cmd != expect) {
> > +        DPRINTF("cmd %c expect %d\n", cmd, expect);
> > +        abort();
> 
> Ouch.  If we receive garbage, we just exit?
> 
> I really think that we should implement error handling.
> 
> > +    }
> > +}
> > +
> > +struct umem_pages *umem_recv_pages(QEMUFile *f, int *offset)
> > +{
> > +    int ret;
> > +    uint64_t nr;
> > +    size_t size;
> > +    struct umem_pages *pages;
> > +
> > +    ret = qemu_peek_buffer(f, (uint8_t*)&nr, sizeof(nr), *offset);
> > +    *offset += sizeof(nr);
> > +    DPRINTF("ret %d nr %ld\n", ret, nr);
> > +    if (ret != sizeof(nr) || nr == 0) {
> > +        return NULL;
> > +    }
> > +
> > +    size = umem_pages_size(nr);
> > +    pages = g_malloc(size);
> 
> Just thinking about this.  Couldn't we just decide on a "big enough"
> buffer, and never send anything bigger than that?  That would remove the
> need to have to malloc()/free() a buffer for each reception?

Will try to address it.

> > +/* qemu side handler */
> > +struct umem_pages *umem_qemu_trigger_page_fault(QEMUFile *from_umemd,
> > +                                                int *offset)
> > +{
> > +    uint64_t i;
> > +    int page_shift = ffs(getpagesize()) - 1;
> > +    struct umem_pages *pages = umem_recv_pages(from_umemd, offset);
> > +    if (pages == NULL) {
> > +        return NULL;
> > +    }
> > +
> > +    for (i = 0; i < pages->nr; i++) {
> > +        ram_addr_t addr = pages->pgoffs[i] << page_shift;
> > +
> > +        /* make pages present by forcibly triggering page fault. */
> > +        volatile uint8_t *ram = qemu_get_ram_ptr(addr);
> > +        uint8_t dummy_read = ram[0];
> > +        (void)dummy_read;   /* suppress unused variable warning */
> > +    }
> > +
> > +    /*
> > +     * Very Linux implementation specific.
> > +     * Make it sure that other thread doesn't fault on the above virtual
> > +     * address. (More exactly other thread doesn't call fault handler with
> > +     * the offset.)
> > +     * the fault handler is called with mmap_sem read locked.
> > +     * madvise() does down/up_write(mmap_sem)
> > +     */
> > +    qemu_madvise(NULL, 0, MADV_NORMAL);
> 
> If it is linux specific, should be inside CONFIG_LINUX ifdef, or a
> function hided on some header.

Good idea.

> Talking about looking, what protects that no other thread enters this
> function before this one calls madvise?   Or I am losing something obvious?

It is assumed that only main thread calls this function via iohandler.

> > +
> > +struct umem_pages {
> > +    uint64_t nr;
> > +    uint64_t pgoffs[0];
> > +};
> > +
> 
> QEMU really likes typedefs for structs.
> 
> Later, Juan.
> 

-- 
yamahata
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html