Re: [Qemu-devel] [PATCH] ceph/rbd block driver for qemu-kvm (v4)

On Thu, Oct 7, 2010 at 7:12 AM, Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote:
> On 08/03/2010 03:14 PM, Christian Brunner wrote:
>>
>> +#include "qemu-common.h"
>> +#include "qemu-error.h"
>> +#include <sys/types.h>
>> +#include <stdbool.h>
>> +
>> +#include <qemu-common.h>
>>
>
> This looks to be unnecessary.  Generally, system includes shouldn't be
> required, so all of these should go away except rados/librados.h.
Removed.

>
>> +
>> +#include "rbd_types.h"
>> +#include "module.h"
>> +#include "block_int.h"
>> +
>> +#include <stdio.h>
>> +#include <stdlib.h>
>> +#include <rados/librados.h>
>> +
>> +#include <signal.h>
>> +
>> +
>> +int eventfd(unsigned int initval, int flags);
>>
>
> This is not quite right.  Depending on eventfd is curious, but at the very
> least you need to detect the presence of eventfd in configure and provide a
> wrapper that redefines it as necessary.

Can fix that, though please see my later remarks.
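A minimal sketch of the configure-plus-wrapper approach the review asks for, assuming configure compiles a tiny probe (include <sys/eventfd.h>, call eventfd(0, 0)) and defines a hypothetical CONFIG_EVENTFD macro when it succeeds. The Notifier type and the notifier_* helpers are illustrative names, not existing qemu functions. Without CONFIG_EVENTFD the wrapper falls back to a pipe, so callers always get a pollable read fd and a writable fd.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>
#ifdef CONFIG_EVENTFD           /* hypothetical macro set by configure */
#include <sys/eventfd.h>
#endif

/* Hypothetical notifier: rfd is polled by the main loop, wfd is written to signal. */
typedef struct {
    int rfd;
    int wfd;
} Notifier;

static int notifier_open(Notifier *n)
{
#ifdef CONFIG_EVENTFD
    int fd = eventfd(0, 0);
    if (fd < 0) {
        return -errno;
    }
    n->rfd = n->wfd = fd;       /* an eventfd serves as both ends */
#else
    int fds[2];
    if (pipe(fds) < 0) {
        return -errno;
    }
    n->rfd = fds[0];
    n->wfd = fds[1];
#endif
    fcntl(n->rfd, F_SETFL, O_NONBLOCK);
    fcntl(n->wfd, F_SETFL, O_NONBLOCK);
    return 0;
}

/* eventfd requires 8-byte writes; a pipe simply carries the same 8 bytes. */
static void notifier_signal(Notifier *n)
{
    uint64_t one = 1;
    if (write(n->wfd, &one, sizeof(one)) < 0) {
        /* EAGAIN: a wakeup is already pending, which is good enough */
    }
}

/* Consume all pending signals; returns 1 if at least one was pending. */
static int notifier_drain(Notifier *n)
{
    uint64_t buf;
    int seen = 0;
    while (read(n->rfd, &buf, sizeof(buf)) > 0) {
        seen = 1;
    }
    return seen;
}

static void notifier_close(Notifier *n)
{
    close(n->rfd);
    if (n->wfd != n->rfd) {
        close(n->wfd);
    }
}
```

Callers only ever see notifier_open/signal/drain, so the eventfd dependency stays confined to one #ifdef.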
>> +static int create_tmap_op(uint8_t op, const char *name, char **tmap_desc)
>> +{
>> +    uint32_t len = strlen(name);
>> +    /* total_len = encoding op + name + empty buffer */
>> +    uint32_t total_len = 1 + (sizeof(uint32_t) + len) + sizeof(uint32_t);
>> +    char *desc = NULL;
>>
>
> char is the wrong type to use here as it may be signed or unsigned.  That
> can have weird effects with binary data when you're directly manipulating
> it.
Well, I can change it to uint8_t so that it matches the op type, but
that'll require adding some casts elsewhere. In any case, you usually
get that kind of weird behavior when you convert to a type of a
different size and the sign bit gets extended, which is not the case here.
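To make the signedness concern concrete, here is a small sketch (the helper names are illustrative only) that decodes a little-endian 32-bit value from a byte buffer. Indexing a plain char buffer widens each byte from char, so on targets where char is signed, a byte such as 0x80 is sign-extended before the shift, while a uint8_t buffer never is. The size-changing conversion mentioned above happens implicitly here, which is why such code tends to bite once the encoded data is decoded again.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Safe: each byte is 0..255 before it is widened to 32 bits. */
static uint32_t le32_from_u8(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Fragile: if char is signed, (char)0x80 widens to 0xFFFFFF80. */
static uint32_t le32_from_char(const char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

On x86, where char is signed, the two functions disagree for any low byte of 0x80 or above; on ARM Linux, where char is unsigned by default, they agree, which is how such bugs slip through testing.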

>
>> +
>> +    desc = qemu_malloc(total_len);
>> +
>> +    *tmap_desc = desc;
>> +
>> +    *desc = op;
>> +    desc++;
>> +    memcpy(desc, &len, sizeof(len));
>> +    desc += sizeof(len);
>> +    memcpy(desc, name, len);
>> +    desc += len;
>> +    len = 0;
>> +    memcpy(desc, &len, sizeof(len));
>> +    desc += sizeof(len);
>>
>
> Shouldn't endianness be a concern?
Right. Fixed that.
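A sketch of an endian-safe variant of create_tmap_op(), assuming the on-wire integers are little-endian (as Ceph's encoding generally is) and using the layout from the patch's own comment: one op byte, a 32-bit name length, the name bytes, and a 32-bit length for an empty value buffer. It writes into a uint8_t buffer, which also sidesteps the char signedness question. build_tmap_op() and put_le32() are hypothetical names, and plain malloc() stands in for qemu_malloc().

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Store v at p in little-endian order regardless of host byte order. */
static uint8_t *put_le32(uint8_t *p, uint32_t v)
{
    p[0] = v & 0xff;
    p[1] = (v >> 8) & 0xff;
    p[2] = (v >> 16) & 0xff;
    p[3] = (v >> 24) & 0xff;
    return p + 4;
}

/* Encode: op byte, le32 name length, name, le32 length of an empty value.
 * Returns the encoded size, or -1 if allocation fails. */
static int build_tmap_op(uint8_t op, const char *name, uint8_t **out)
{
    uint32_t len = (uint32_t)strlen(name);
    size_t total = 1 + 4 + len + 4;
    uint8_t *desc = malloc(total);
    uint8_t *p;

    if (desc == NULL) {
        return -1;
    }
    p = desc;
    *p++ = op;
    p = put_le32(p, len);
    memcpy(p, name, len);
    p += len;
    p = put_le32(p, 0);         /* empty value buffer */
    *out = desc;
    return (int)(p - desc);
}
```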

>
>> +
>> +    return desc - *tmap_desc;
>> +}
>> +
>> +static void free_tmap_op(char *tmap_desc)
>> +{
>> +    qemu_free(tmap_desc);
>> +}
>> +
>> +static int rbd_register_image(rados_pool_t pool, const char *name)
>> +{
>> +    char *tmap_desc;
>> +    const char *dir = RBD_DIRECTORY;
>> +    int ret;
>> +
>> +    ret = create_tmap_op(CEPH_OSD_TMAP_SET, name, &tmap_desc);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = rados_tmap_update(pool, dir, tmap_desc, ret);
>> +    free_tmap_op(tmap_desc);
>> +
>> +    return ret;
>> +}
>>
>
> These ops are all synchronous?  IOW, the rados_tmap_update() call blocks
> until the operation has completed?

Yeah. And this is only called from the rbd_create() callback.

>> +            header_snap += strlen(header_snap) + 1;
>> +            if (header_snap > end)
>> +                error_report("bad header, snapshot list broken");
>>
>
> Missing curly braces here.
Fixed.
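For reference, a braced and bounds-checked version of the snapshot-name walk might look like the sketch below. It swaps strlen() for memchr() so that a missing terminator is detected before reading past the end of the header buffer, and it returns an error instead of carrying on after reporting one. walk_snap_names() is a hypothetical helper, and fprintf() stands in for error_report().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Walk `count` NUL-terminated names packed in [buf, buf + len).
 * Returns 0 on success, -1 if the list runs past the buffer. */
static int walk_snap_names(const char *buf, size_t len, unsigned count)
{
    const char *p = buf;
    const char *end = buf + len;
    unsigned i;

    for (i = 0; i < count; i++) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));
        if (nul == NULL) {
            fprintf(stderr, "bad header, snapshot list broken\n");
            return -1;
        }
        p = nul + 1;            /* next name starts after the terminator */
    }
    return 0;
}
```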

>> +    if (strncmp(hbuf + 68, RBD_HEADER_VERSION, 8)) {
>> +        error_report("Unknown image version %s", hbuf + 68);
>> +        r = -EMEDIUMTYPE;
>> +        goto failed;
>> +    }
>> +
>> +    RbdHeader1 *header;
>>
>>
>
> Don't mix variable definitions with code.

Fixed.

>> +    s->efd = eventfd(0, 0);
>> +    if (s->efd < 0) {
>> +        error_report("error opening eventfd");
>> +        goto failed;
>> +    }
>> +    fcntl(s->efd, F_SETFL, O_NONBLOCK);
>> +    qemu_aio_set_fd_handler(s->efd, rbd_aio_completion_cb, NULL,
>> +        rbd_aio_flush_cb, NULL, s);
>>
>
> It looks like you just use the eventfd to signal aio completion callbacks.
> A better way to do this would be to schedule a bottom half.  eventfds are
> Linux-specific and only available on recent kernels.

Digging back into why we introduced the eventfd: it was due to
do_savevm() hanging in qemu_aio_flush(). The reason seemed to be
that we had no fd associated with the block device, which didn't
work well with the qemu aio model. If that assumption is wrong,
we'd be happy to change it. In any case, there are other, more portable
ways to generate fds, so if one is needed we can do that.
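The cross-thread handoff behind this discussion can be sketched with a plain pipe instead of an eventfd (the classic self-pipe pattern). The assumption is that librados runs completion callbacks on its own thread, so the callback only queues the finished request and writes a byte; the main loop polls the read end, as an fd registered through qemu_aio_set_fd_handler() would be, and then runs the completions. All names here (CompletionQueue, cq_init(), cq_push(), cq_drain(), fake_callback_thread()) are hypothetical, and a real driver would invoke the guest-visible AIO callback where the comment indicates.

```c
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct Req {            /* hypothetical completed-request record */
    int ret;
    struct Req *next;
} Req;

typedef struct {
    pthread_mutex_t lock;
    Req *done;                  /* pushed by the callback thread */
    int fds[2];                 /* fds[0] polled by main loop, fds[1] written */
} CompletionQueue;

static int cq_init(CompletionQueue *q)
{
    if (pipe(q->fds) < 0) {
        return -1;
    }
    fcntl(q->fds[0], F_SETFL, O_NONBLOCK);
    fcntl(q->fds[1], F_SETFL, O_NONBLOCK);
    pthread_mutex_init(&q->lock, NULL);
    q->done = NULL;
    return 0;
}

/* Runs on the library's thread: queue the request and wake the main loop. */
static void cq_push(CompletionQueue *q, Req *r)
{
    char c = 0;
    pthread_mutex_lock(&q->lock);
    r->next = q->done;
    q->done = r;
    pthread_mutex_unlock(&q->lock);
    if (write(q->fds[1], &c, 1) < 0) {
        /* pipe full: a wakeup is already pending */
    }
}

/* Runs in the main loop when fds[0] is readable; returns completions run. */
static int cq_drain(CompletionQueue *q)
{
    char buf[64];
    Req *list;
    int n = 0;

    while (read(q->fds[0], buf, sizeof(buf)) > 0) {
        /* discard wakeup bytes */
    }
    pthread_mutex_lock(&q->lock);
    list = q->done;
    q->done = NULL;
    pthread_mutex_unlock(&q->lock);
    while (list != NULL) {
        Req *next = list->next;
        /* a real driver would call the AIO completion callback here */
        free(list);
        list = next;
        n++;
    }
    return n;
}

/* Stand-in for the library's callback thread. */
static void *fake_callback_thread(void *arg)
{
    CompletionQueue *q = arg;
    int i;
    for (i = 0; i < 3; i++) {
        Req *r = malloc(sizeof(*r));
        if (r == NULL) {
            break;
        }
        r->ret = 0;
        cq_push(q, r);
    }
    return NULL;
}
```

Because the fd exists for the lifetime of the device, a flush can wait on it just like any other aio fd, which is the property the eventfd was originally introduced for.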

>> +static int rbd_rw(BlockDriverState *bs, int64_t sector_num,
>> +                  uint8_t *buf, int nb_sectors, int write)
>> +{
>> +    BDRVRBDState *s = bs->opaque;
>> +    char n[RBD_MAX_SEG_NAME_SIZE];
>> +
>
> You don't need to implement synchronous functions as long as you have the
> async interfaces implemented.
Snipped.

>> +     */
>> +    if (sn_info->id_str[0] != '\0' &&
>> +        strcmp(sn_info->id_str, sn_info->name) != 0)
>> +        return -EINVAL;
>>
>
> I don't fully understand.  Does this mean that snapshots are stored in a
> shared namespace?  IOW, if user root creates a snapshot in one VM, does
> another VM running as root see it too?
>

Snapshots are stored in a separate namespace for each block device. If you
share a block device between different VMs, you also share its
snapshots.
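One reading of the quoted check, taken in isolation: because an rbd snapshot is identified only by its name within the image's namespace, a caller-supplied snapshot ID is accepted only if it is empty or identical to the name. A standalone restatement (snap_id_ok() is a hypothetical name), written with braces:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Accept an empty ID, or an ID equal to the snapshot name; reject the rest. */
static int snap_id_ok(const char *id_str, const char *name)
{
    if (id_str[0] != '\0' && strcmp(id_str, name) != 0) {
        return -EINVAL;
    }
    return 0;
}
```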


Thanks,
Yehuda
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

