On Thu, Jun 16, 2011 at 5:48 PM, Corey Bryant <bryntcor@xxxxxxxxxx> wrote: > > > On 06/15/2011 03:12 PM, Blue Swirl wrote: >> >> On Tue, Jun 14, 2011 at 4:31 PM, Corey Bryant<bryntcor@xxxxxxxxxx> wrote: >>> >>> > sVirt provides SELinux MAC isolation for Qemu guest processes and >>> > their >>> > corresponding resources (image files). sVirt provides this support >>> > by labeling guests and resources with security labels that are stored >>> > in file system extended attributes. Some file systems, such as NFS, do >>> > not support the extended attribute security namespace, which is needed >>> > for image file isolation when using the sVirt SELinux security driver >>> > in libvirt. >>> > >>> > The proposed solution entails a combination of Qemu, libvirt, and >>> > SELinux patches that work together to isolate multiple guests' images >>> > when they're stored in the same NFS mount. This results in an >>> > environment where sVirt isolation and NFS image file isolation can >>> > both >>> > be provided. >>> > >>> > This patch contains the Qemu code to support this solution. I would >>> > like to solicit input from the libvirt community prior to starting >>> > the libvirt patch. >>> > >>> > Currently, Qemu opens an image file in addition to performing the >>> > necessary read and write operations. The proposed solution will move >>> > the open out of Qemu and into libvirt. Once libvirt opens an image >>> > file for the guest, it will pass the file descriptor to Qemu via a >>> > new fd: protocol. >>> > >>> > If the image file resides in an NFS mount, the following SELinux >>> > policy >>> > changes will provide image isolation: >>> > >>> > - A new SELinux boolean is created (e.g. virt_read_write_nfs) to >>> > allow Qemu (svirt_t) to only have SELinux read and write >>> > permissions on nfs_t files >>> > >>> > - Qemu (svirt_t) also gets SELinux use permissions on libvirt >>> > (virtd_t) file descriptors >>> > >>> > Following is a sample invocation of Qemu using the fd: protocol on >>> > the command line: >>> > >>> > qemu -drive file=fd:4,format=qcow2 >>> > >>> > The fd: protocol is also supported by the drive_add monitor command. >>> > This requires that the specified file descriptor is passed to the >>> > monitor alongside a prior getfd monitor command. >>> > >>> > There are some additional features provided by certain image types >>> > where Qemu reopens the image file. All of these scenarios will be >>> > unsupported for the fd: protocol, at least for this patch: >>> > >>> > - The -snapshot command line option >>> > - The savevm monitor command >>> > - The snapshot_blkdev monitor command >>> > - Starting Qemu with a backing file >> >> There's also native CDROM device. Did you consider adding an explicit >> reopen method to block layer? > > Thanks. Yes it looks like I overlooked CDROM reopens. > > I'm not sure that I'm clear on the purpose of the reopen function. Would the > goal be to funnel all block layer reopens through a single function, > enabling potential future support where a privileged layer of Qemu, or > libvirt, performs the open? Eventually yes, but I think it would help also now by moving the checks to a single place. It's a bit orthogonal to this patch though. >>> > The thought is that this support can be added in the future, but is >>> > not required for the initial fd: support. >>> > >>> > This patch was tested with the following formats: raw, cow, qcow, >>> > qcow2, qed, and vmdk, using the fd: protocol from the command line >>> > and the monitor. Tests were also run to verify existing file name >>> > support and qemu-img were not regressed. Non-valid file descriptors, >>> > fd: without format, snapshot and backing files were also tested. >>> > >>> > Signed-off-by: Corey Bryant<coreyb@xxxxxxxxxxxxxxxxxx> >>> > --- >>> > block.c | 16 ++++++++++ >>> > block.h | 1 + >>> > block/cow.c | 5 +++ >>> > block/qcow.c | 5 +++ >>> > block/qcow2.c | 5 +++ >>> > block/qed.c | 4 ++ >>> > block/raw-posix.c | 81 >>> > +++++++++++++++++++++++++++++++++++++++++++++++------ >>> > block/vmdk.c | 5 +++ >>> > block_int.h | 1 + >>> > blockdev.c | 10 ++++++ >>> > monitor.c | 5 +++ >>> > monitor.h | 1 + >>> > qemu-options.hx | 8 +++-- >>> > qemu-tool.c | 5 +++ >>> > 14 files changed, 140 insertions(+), 12 deletions(-) >>> > >>> > diff --git a/block.c b/block.c >>> > index 24a25d5..500db84 100644 >>> > --- a/block.c >>> > +++ b/block.c >>> > @@ -536,6 +536,10 @@ int bdrv_open(BlockDriverState *bs, const char >>> > *filename, int flags, >>> > char tmp_filename[PATH_MAX]; >>> > char backing_filename[PATH_MAX]; >>> > >>> > + if (bdrv_is_fd_protocol(bs)) { >>> > + return -ENOTSUP; >>> > + } >>> > + >>> > /* if snapshot, we create a temporary backing file and open >>> > it >>> > instead of opening 'filename' directly */ >>> > >>> > @@ -585,6 +589,10 @@ int bdrv_open(BlockDriverState *bs, const char >>> > *filename, int flags, >>> > >>> > /* Find the right image format driver */ >>> > if (!drv) { >>> > + /* format must be specified for fd: protocol */ >>> > + if (bdrv_is_fd_protocol(bs)) { >>> > + return -ENOTSUP; >>> > + } >>> > ret = find_image_format(filename,&drv); >>> > } >>> > >>> > @@ -1460,6 +1468,11 @@ int bdrv_enable_write_cache(BlockDriverState >>> > *bs) >>> > return bs->enable_write_cache; >>> > } >>> > >>> > +int bdrv_is_fd_protocol(BlockDriverState *bs) >>> > +{ >>> > + return bs->fd_protocol; >>> > +} >>> > + >>> > /* XXX: no longer used */ >>> > void bdrv_set_change_cb(BlockDriverState *bs, >>> > void (*change_cb)(void *opaque, int reason), >>> > @@ -1964,6 +1977,9 @@ int bdrv_snapshot_create(BlockDriverState *bs, >>> > BlockDriver *drv = bs->drv; >>> > if (!drv) >>> > return -ENOMEDIUM; >>> > + if (bdrv_is_fd_protocol(bs)) { >>> > + return -ENOTSUP; >>> > + } >>> > if (drv->bdrv_snapshot_create) >>> > return drv->bdrv_snapshot_create(bs, sn_info); >>> > if (bs->file) >>> > diff --git a/block.h b/block.h >>> > index da7d39c..dd46d52 100644 >>> > --- a/block.h >>> > +++ b/block.h >>> > @@ -182,6 +182,7 @@ int bdrv_is_removable(BlockDriverState *bs); >>> > int bdrv_is_read_only(BlockDriverState *bs); >>> > int bdrv_is_sg(BlockDriverState *bs); >>> > int bdrv_enable_write_cache(BlockDriverState *bs); >>> > +int bdrv_is_fd_protocol(BlockDriverState *bs); >>> > int bdrv_is_inserted(BlockDriverState *bs); >>> > int bdrv_media_changed(BlockDriverState *bs); >>> > int bdrv_is_locked(BlockDriverState *bs); >>> > diff --git a/block/cow.c b/block/cow.c >>> > index 4cf543c..e17f8e7 100644 >>> > --- a/block/cow.c >>> > +++ b/block/cow.c >>> > @@ -82,6 +82,11 @@ static int cow_open(BlockDriverState *bs, int >>> > flags) >>> > pstrcpy(bs->backing_file, sizeof(bs->backing_file), >>> > cow_header.backing_file); >>> > >>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >>> > + /* backing file currently not supported by fd: protocol */ >>> > + goto fail; >>> > + } >>> > + >>> > bitmap_size = ((bs->total_sectors + 7)>> 3) + >>> > sizeof(cow_header); >>> > s->cow_sectors_offset = (bitmap_size + 511)& ~511; >>> > return 0; >>> > diff --git a/block/qcow.c b/block/qcow.c >>> > index a26c886..1e46bdb 100644 >>> > --- a/block/qcow.c >>> > +++ b/block/qcow.c >>> > @@ -157,6 +157,11 @@ static int qcow_open(BlockDriverState *bs, int >>> > flags) >>> > if (bdrv_pread(bs->file, header.backing_file_offset, >>> > bs->backing_file, len) != len) >>> > goto fail; >>> > bs->backing_file[len] = '\0'; >>> > + >>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >>> > + /* backing file currently not supported by fd: protocol >>> > */ >>> > + goto fail; >>> > + } >>> > } >>> > return 0; >>> > >>> > diff --git a/block/qcow2.c b/block/qcow2.c >>> > index 8451ded..8b9c160 100644 >>> > --- a/block/qcow2.c >>> > +++ b/block/qcow2.c >>> > @@ -270,6 +270,11 @@ static int qcow2_open(BlockDriverState *bs, int >>> > flags) >>> > goto fail; >>> > } >>> > bs->backing_file[len] = '\0'; >>> > + >>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >>> > + ret = -ENOTSUP; >>> > + goto fail; >>> > + } >>> > } >>> > if (qcow2_read_snapshots(bs)< 0) { >>> > ret = -EINVAL; >>> > diff --git a/block/qed.c b/block/qed.c >>> > index 3970379..5028897 100644 >>> > --- a/block/qed.c >>> > +++ b/block/qed.c >>> > @@ -446,6 +446,10 @@ static int bdrv_qed_open(BlockDriverState *bs, >>> > int flags) >>> > return ret; >>> > } >>> > >>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >>> > + return -ENOTSUP; >>> > + } >>> > + >>> > if (s->header.features& QED_F_BACKING_FORMAT_NO_PROBE) { >>> > pstrcpy(bs->backing_format, sizeof(bs->backing_format), >>> > "raw"); >>> > } >>> > diff --git a/block/raw-posix.c b/block/raw-posix.c >>> > index 4cd7d7a..c72de3d 100644 >>> > --- a/block/raw-posix.c >>> > +++ b/block/raw-posix.c >>> > @@ -28,6 +28,7 @@ >>> > #include "block_int.h" >>> > #include "module.h" >>> > #include "block/raw-posix-aio.h" >>> > +#include "monitor.h" >>> > >>> > #ifdef CONFIG_COCOA >>> > #include<paths.h> >>> > @@ -183,7 +184,8 @@ static int raw_open_common(BlockDriverState *bs, >>> > const char *filename, >>> > int bdrv_flags, int open_flags) >>> > { >>> > BDRVRawState *s = bs->opaque; >>> > - int fd, ret; >>> > + int fd = -1; >>> > + int ret; >>> > >>> > ret = raw_normalize_devicepath(&filename); >>> > if (ret != 0) { >>> > @@ -205,15 +207,17 @@ static int raw_open_common(BlockDriverState *bs, >>> > const char *filename, >>> > if (!(bdrv_flags& BDRV_O_CACHE_WB)) >>> > s->open_flags |= O_DSYNC; >>> > >>> > - s->fd = -1; >>> > - fd = qemu_open(filename, s->open_flags, 0644); >>> > - if (fd< 0) { >>> > - ret = -errno; >>> > - if (ret == -EROFS) >>> > - ret = -EACCES; >>> > - return ret; >>> > + if (s->fd == -1) { >>> > + fd = qemu_open(filename, s->open_flags, 0644); >>> > + if (fd< 0) { >>> > + ret = -errno; >>> > + if (ret == -EROFS) { >>> > + ret = -EACCES; >>> > + } >>> > + return ret; >>> > + } >>> > + s->fd = fd; >>> > } >>> > - s->fd = fd; >>> > s->aligned_buf = NULL; >>> > >>> > if ((bdrv_flags& BDRV_O_NOCACHE)) { >>> > @@ -270,6 +274,7 @@ static int raw_open(BlockDriverState *bs, const >>> > char *filename, int flags) >>> > { >>> > BDRVRawState *s = bs->opaque; >>> > >>> > + s->fd = -1; >>> > s->type = FTYPE_FILE; >>> > return raw_open_common(bs, filename, flags, 0); >>> > } >>> > @@ -890,6 +895,60 @@ static BlockDriver bdrv_file = { >>> > .create_options = raw_create_options, >>> > }; >>> > >>> > +static int raw_open_fd(BlockDriverState *bs, const char *filename, >>> > int flags) >>> > +{ >>> > + BDRVRawState *s = bs->opaque; >>> > + const char *fd_str; >>> > + int fd; >>> > + >>> > + /* extract the file descriptor - fail if it's not fd: */ >>> > + if (!strstart(filename, "fd:",&fd_str)) { >>> > + return -EINVAL; >>> > + } >>> > + >>> > + if (!qemu_isdigit(fd_str[0])) { >>> > + /* get fd from monitor */ >>> > + fd = qemu_get_fd(fd_str); >>> > + if (fd == -1) { >>> > + return -EBADF; >>> > + } >>> > + } else { >>> > + char *endptr = NULL; >>> > + >>> > + fd = strtol(fd_str,&endptr, 10); >>> > + if (*endptr || (fd == 0&& fd_str == endptr)) { >>> > + return -EBADF; >>> > + } >>> > + } >>> > + >>> > + s->fd = fd; >>> > + s->type = FTYPE_FILE; >>> > + >>> > + return raw_open_common(bs, filename, flags, 0); >>> > +} >>> > + >>> > +static BlockDriver bdrv_file_fd = { >>> > + .format_name = "file", >>> > + .protocol_name = "fd", >>> > + .instance_size = sizeof(BDRVRawState), >>> > + .bdrv_probe = NULL, /* no probe for protocols */ >>> > + .bdrv_file_open = raw_open_fd, >>> > + .bdrv_read = raw_read, >>> > + .bdrv_write = raw_write, >>> > + .bdrv_close = raw_close, >>> > + .bdrv_flush = raw_flush, >>> > + .bdrv_discard = raw_discard, >>> > + >>> > + .bdrv_aio_readv = raw_aio_readv, >>> > + .bdrv_aio_writev = raw_aio_writev, >>> > + .bdrv_aio_flush = raw_aio_flush, >>> > + >>> > + .bdrv_truncate = raw_truncate, >>> > + .bdrv_getlength = raw_getlength, >>> > + >>> > + .create_options = raw_create_options, >>> > +}; >>> > + >>> > /***********************************************/ >>> > /* host device */ >>> > >>> > @@ -998,6 +1057,7 @@ static int hdev_open(BlockDriverState *bs, const >>> > char *filename, int flags) >>> > } >>> > #endif >>> > >>> > + s->fd = -1; >>> > s->type = FTYPE_FILE; >>> > #if defined(__linux__) >>> > { >>> > @@ -1168,6 +1228,7 @@ static int floppy_open(BlockDriverState *bs, >>> > const char *filename, int flags) >>> > BDRVRawState *s = bs->opaque; >>> > int ret; >>> > >>> > + s->fd = -1; >>> > s->type = FTYPE_FD; >>> > >>> > /* open will not fail even if no floppy is inserted, so add >>> > O_NONBLOCK */ >>> > @@ -1280,6 +1341,7 @@ static int cdrom_open(BlockDriverState *bs, >>> > const char *filename, int flags) >>> > { >>> > BDRVRawState *s = bs->opaque; >>> > >>> > + s->fd = -1; >>> > s->type = FTYPE_CD; >>> > >>> > /* open will not fail even if no CD is inserted, so add >>> > O_NONBLOCK */ >>> > @@ -1503,6 +1565,7 @@ static void bdrv_file_init(void) >>> > * Register all the drivers. Note that order is important, the >>> > driver >>> > * registered last will get probed first. >>> > */ >>> > + bdrv_register(&bdrv_file_fd); >>> > bdrv_register(&bdrv_file); >>> > bdrv_register(&bdrv_host_device); >>> > #ifdef __linux__ >>> > diff --git a/block/vmdk.c b/block/vmdk.c >>> > index 922b23d..2ea808e 100644 >>> > --- a/block/vmdk.c >>> > +++ b/block/vmdk.c >>> > @@ -353,6 +353,11 @@ static int vmdk_parent_open(BlockDriverState *bs) >>> > return -1; >>> > >>> > pstrcpy(bs->backing_file, end_name - p_name + 1, p_name); >>> > + >>> > + if (bs->backing_file[0] != '\0'&& bdrv_is_fd_protocol(bs)) { >>> > + /* backing file currently not supported by fd: protocol >>> > */ >>> > + return -1; >>> > + } >>> > } >>> > >>> > return 0; >>> > diff --git a/block_int.h b/block_int.h >>> > index fa91337..a305ee2 100644 >>> > --- a/block_int.h >>> > +++ b/block_int.h >>> > @@ -152,6 +152,7 @@ struct BlockDriverState { >>> > int encrypted; /* if true, the media is encrypted */ >>> > int valid_key; /* if true, a valid encryption key has been set */ >>> > int sg; /* if true, the device is a /dev/sg* */ >>> > + int fd_protocol; /* if true, the fd: protocol was specified */ >> >> bool? >> > I was following suit here, but I agree that bool would be better. Or > better, these could all be reduced to bit flags. My thought is that I'll > stick with following suit here though. OK, better convert all at once with a separate patch. >>> > /* event callback when inserting/removing */ >>> > void (*change_cb)(void *opaque, int reason); >>> > void *change_opaque; >>> > diff --git a/blockdev.c b/blockdev.c >>> > index e81e0ab..a536c20 100644 >>> > --- a/blockdev.c >>> > +++ b/blockdev.c >>> > @@ -546,6 +546,10 @@ DriveInfo *drive_init(QemuOpts *opts, int >>> > default_to_scsi) >>> > >>> > bdrv_flags |= ro ? 0 : BDRV_O_RDWR; >>> > >>> > + if (strncmp(file, "fd:", 3) == 0) { >>> > + dinfo->bdrv->fd_protocol = 1; >>> > + } >>> > + >>> > ret = bdrv_open(dinfo->bdrv, file, bdrv_flags, drv); >>> > if (ret< 0) { >>> > error_report("could not open disk image %s: %s", >>> > @@ -606,6 +610,12 @@ int do_snapshot_blkdev(Monitor *mon, const QDict >>> > *qdict, QObject **ret_data) >>> > goto out; >>> > } >>> > >>> > + if (bdrv_is_fd_protocol(bs)) { >>> > + qerror_report(QERR_UNSUPPORTED); >>> > + ret = -1; >>> > + goto out; >>> > + } >>> > + >>> > pstrcpy(old_filename, sizeof(old_filename), bs->filename); >>> > >>> > old_drv = bs->drv; >>> > diff --git a/monitor.c b/monitor.c >>> > index 59a3e76..ea60be2 100644 >>> > --- a/monitor.c >>> > +++ b/monitor.c >>> > @@ -2832,6 +2832,11 @@ int monitor_get_fd(Monitor *mon, const char >>> > *fdname) >>> > return -1; >>> > } >>> > >>> > +int qemu_get_fd(const char *fdname) >>> > +{ >>> > + return cur_mon ? monitor_get_fd(cur_mon, fdname) : -1; >>> > +} >>> > + >>> > static const mon_cmd_t mon_cmds[] = { >>> > #include "hmp-commands.h" >>> > { NULL, NULL, }, >>> > diff --git a/monitor.h b/monitor.h >>> > index 4f2d328..de5b987 100644 >>> > --- a/monitor.h >>> > +++ b/monitor.h >>> > @@ -51,6 +51,7 @@ int monitor_read_bdrv_key_start(Monitor *mon, >>> > BlockDriverState *bs, >>> > void *opaque); >>> > >>> > int monitor_get_fd(Monitor *mon, const char *fdname); >>> > +int qemu_get_fd(const char *fdname); >>> > >>> > void monitor_vprintf(Monitor *mon, const char *fmt, va_list ap) >>> > GCC_FMT_ATTR(2, 0); >>> > diff --git a/qemu-options.hx b/qemu-options.hx >>> > index 1d5ad8b..f9b66f4 100644 >>> > --- a/qemu-options.hx >>> > +++ b/qemu-options.hx >>> > @@ -116,7 +116,7 @@ using @file{/dev/cdrom} as filename >>> > (@pxref{host_drives}). >>> > ETEXI >>> > >>> > DEF("drive", HAS_ARG, QEMU_OPTION_drive, >>> > - "-drive >>> > [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" >>> > + "-drive >>> > [file=[fd:]file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" >>> > " [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n" >>> > " [,cache=writethrough|writeback|none|unsafe][,format=f]\n" >>> > " [,serial=s][,addr=A][,id=name][,aio=threads|native]\n" >>> > @@ -129,10 +129,12 @@ STEXI >>> > Define a new drive. Valid options are: >>> > >>> > @table @option >>> > -@item file=@var{file} >>> > +@item file=[fd:]@var{file} >>> > This option defines which disk image (@pxref{disk_images}) to use >>> > with >>> > this drive. If the filename contains comma, you must double it >>> > -(for instance, "file=my,,file" to use file "my,file"). >>> > +(for instance, "file=my,,file" to use file "my,file"). >>> > @option{fd:}@var{file} >>> > +specifies the file descriptor of an already open disk image. >>> > +@option{format=}@var{format} is required by @option{fd:}@var{file}. >>> > @item if=@var{interface} >>> > This option defines on which type on interface the drive is >>> > connected. >>> > Available types are: ide, scsi, sd, mtd, floppy, pflash, virtio. >>> > diff --git a/qemu-tool.c b/qemu-tool.c >>> > index 41e5c41..8fe6b8c 100644 >>> > --- a/qemu-tool.c >>> > +++ b/qemu-tool.c >>> > @@ -96,3 +96,8 @@ int64_t qemu_get_clock_ns(QEMUClock *clock) >>> > { >>> > return 0; >>> > } >>> > + >>> > +int qemu_get_fd(const char *fdname) >>> > +{ >>> > + return -1; >>> > +} >>> > -- >>> > 1.7.1 >>> > >>> > >>> > > > > Regards, > Corey Bryant > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list