Wrong mail-id used earlier, please refer below.

-------- Original Message --------
Subject: Re: [RFC] Block Device Xlator Design
Date: Wed, 11 Jul 2012 16:24:24 +0530
From: Amar Tumballi <atumball@xxxxxxxxxx>
To: M. Mohan Kumar <mohan@xxxxxxxxxx>
CC: Shishir Gowda <sgowda@xxxxxxxxxx>, gluster-devel@xxxxxxxxxx
I posted GlusterFS server xlator patches to enable exporting block devices (currently only logical volumes) as regular files on the client side a couple of weeks ago. Here is the link to the patches: http://review.gluster.com/3551. I would like to discuss the design of this xlator. The current code uses the lvm2-devel library to find the list of logical volumes for the given volume group (in the BD xlator each volume file exports one volume group; in the future we may extend this to export multiple volume groups if needed). The init routine of the BD xlator constructs an internal data structure holding the list of all logical volumes in the VG.
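Roughly, the init path does something like the following (a simplified, untested sketch using liblvm2app; the VG name, error handling and layout here are illustrative, not the actual patch code):

#include <stdio.h>
#include <inttypes.h>
#include <lvm2app.h>

int
main (int argc, char *argv[])
{
        const char         *vg_name = (argc > 1) ? argv[1] : "vg0"; /* placeholder VG */
        lvm_t               handle  = NULL;
        vg_t                vg      = NULL;
        struct dm_list     *lvs     = NULL;
        struct lvm_lv_list *lvl     = NULL;

        handle = lvm_init (NULL);                   /* default LVM system directory */
        if (!handle)
                return 1;

        vg = lvm_vg_open (handle, vg_name, "r", 0); /* open the VG read-only */
        if (!vg) {
                lvm_quit (handle);
                return 1;
        }

        lvs = lvm_vg_list_lvs (vg);                 /* all LVs in this VG */
        if (lvs) {
                dm_list_iterate_items (lvl, lvs)
                        printf ("%s %" PRIu64 "\n",
                                lvm_lv_get_name (lvl->lv),
                                lvm_lv_get_size (lvl->lv));
        }

        lvm_vg_close (vg);
        lvm_quit (handle);
        return 0;
}

The real xlator keeps these names/sizes in its private structure instead of printing them.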
Went through the patchset, and it looks fine. One of the major things to take care of is that the build should not fail, and it should not assume that the lvm2-devel library is always present. Hence it should have corresponding checks in configure.ac to handle that situation. (ref: you can look at how the libibverbs-devel dependency is handled)
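Something along these lines in configure.ac, for example (a rough sketch only; the option and conditional names are illustrative, not copied from the existing ibverbs check):

dnl Make the BD xlator optional and detect liblvm2app at configure time.
AC_ARG_ENABLE([bd-xlator],
              AS_HELP_STRING([--enable-bd-xlator], [build the block device xlator]))

BUILD_BD_XLATOR=no
if test "x$enable_bd_xlator" != "xno"; then
   AC_CHECK_LIB([lvm2app], [lvm_init], [HAVE_LVM2APP=yes], [HAVE_LVM2APP=no])
   AC_CHECK_HEADERS([lvm2app.h], [], [HAVE_LVM2APP=no])
   if test "x$HAVE_LVM2APP" = "xyes"; then
      BUILD_BD_XLATOR=yes
   fi
fi

dnl Skip the xlator instead of breaking the build when lvm2-devel is absent.
AM_CONDITIONAL([ENABLE_BD_XLATOR], [test "x$BUILD_BD_XLATOR" = "xyes"])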
When an open request comes in, the corresponding open interface in the BD xlator opens the intended LV using the path /dev/<vg-name>/<lv-name>. This path is actually a symbolic link to /dev/dm-<x>. Is my assumption about having this /dev/<vg-name>/<lv-name> path right? Will it always work?
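In other words, the open path is constructed roughly like this (a stripped-down sketch, not the actual xlator code; the VG/LV names are placeholders):

#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int
main (void)
{
        const char *vg_name = "vg0";   /* placeholder VG name */
        const char *lv_name = "lv0";   /* placeholder LV name */
        char        devpath[PATH_MAX];
        int         fd = -1;

        snprintf (devpath, sizeof (devpath), "/dev/%s/%s", vg_name, lv_name);

        /* /dev/<vg>/<lv> is a symlink to /dev/dm-<N>; open() simply follows it. */
        fd = open (devpath, O_RDWR);
        if (fd < 0) {
                perror (devpath);
                return 1;
        }

        close (fd);
        return 0;
}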
This should be fine. One concern here is how we keep track of gfid-to-path mappings; with proper resolution in place there, we can guarantee the behavior.
Also, if there is a request to create a file (which in turn has to create an LV on the server side), the lvm2 API is used to create a logical volume in the given VG, but with a pre-determined size of one logical extent, because the create interface does not take size as a parameter, while a size is required to create a logical volume. In a typical VM disk image scenario, qemu-img first creates a file and then uses truncate to set the required file size, so this should not be an issue with that kind of usage.
I think creat() followed by ftruncate() should just work fine too.
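i.e. something like the following from the application's side (the path and size are just placeholders):

#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

int
main (void)
{
        const char *path = "/mnt/bd-volume/vm-disk.img"; /* placeholder mount/file */
        off_t       size = 1024 * 1024 * 1024;           /* 1 GiB, just for illustration */
        int         fd   = -1;

        fd = creat (path, 0644);          /* creates the file (a one-extent LV on the brick) */
        if (fd < 0) {
                perror ("creat");
                return 1;
        }

        if (ftruncate (fd, size) < 0) {   /* grows it to the requested size */
                perror ("ftruncate");
                close (fd);
                return 1;
        }

        close (fd);
        return 0;
}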
But there are other issues in the BD xlator code as of now. The lvm2 API does not support resizing an LV or creating a snapshot of an LV, but there are tools available to do the same, so the BD xlator code forks and executes the required binary to achieve that functionality; i.e. when truncate is called on a BD xlator volume, it results in running the lvresize binary with the required parameters. I have checked with the lvm2-devel mailing list about their plans to support LV resizing and snapshot creation and am waiting for responses. Is it okay to rely on external binaries to create a snapshot of an LV and resize it?
It is OK to call external binaries (security issues are present, but that is a different topic of discussion). Two things to take care of here:
1. As Avati rightly mentioned, utilize the runner ('libglusterfs/src/run.h') interface (see the sketch below).
2. If you are expecting/waiting on the return value of these binaries, then we have to make sure we have a mechanism to handle hang situations.
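For example, something along these lines with run.h (an untested sketch; the lvresize path, helper name and arguments are only illustrative):

#include "run.h"

/* Hypothetical helper: resize /dev/<vg>/<lv> to new_size (e.g. "10G"). */
static int
bd_resize_lv (const char *vg_name, const char *lv_name, const char *new_size)
{
        runner_t runner = {0, };
        int      ret    = -1;

        runinit (&runner);
        runner_add_args (&runner, "/sbin/lvresize", "-L", new_size, NULL);
        runner_argprintf (&runner, "/dev/%s/%s", vg_name, lv_name);

        /* runner_run() forks, execs and waits; non-zero means the child failed. */
        ret = runner_run (&runner);

        return ret;
}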
Also, when an LV is created out of band, for example using the gluster CLI to create an LV (I am working on the gluster CLI patches to create LVs and copy/snapshot LVs), the BD xlator will not be aware of these changes. I am looking into whether the 'notify' feature of xlators can be used to notify the BD xlator to create an LV or snapshot, instead of doing it from the gluster management xlators. I have sent a mail to gluster-devel asking for some more information about this.
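If the notify route works out, the hook would look roughly like this (a purely hypothetical sketch; there is no existing event for out-of-band LV creation, so this only logs and forwards whatever event arrives):

#include "xlator.h"
#include "defaults.h"

int32_t
notify (xlator_t *this, int32_t event, void *data, ...)
{
        /* A real implementation would refresh the cached LV list (or create
         * the LV/snapshot) when an agreed-upon event arrives; no such event
         * exists yet, so this just logs and passes everything along. */
        gf_log (this->name, GF_LOG_DEBUG, "received event %d", event);

        return default_notify (this, event, data);
}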
Refer to Shishir's response for this.

Hope this serves as an initial review comment.

Regards,
Amar