On Tue, Nov 15, 2016 at 3:07 PM, Peter Krempa <pkrempa@xxxxxxxxxx> wrote: > On Mon, Nov 14, 2016 at 20:04:16 +0530, Prasanna Kumar Kalever wrote: >> This patch offer, >> 1. Optimizing the calls to glfs_init() and friends >> 2. Temporarily reduce the memory leak appear in libvirt process account, >> even if the root cause for the leaks are glfs_fini() (7 - 10MB per object) > > That is not a valid use case as I've said a few times. > >> [Hopefully gluster should address this in its future releases, not very near] >> >> Currently, a start of a VM will call 2 glfs_new/glfs_init (which will create >> glfs object, once for stat, read headers and next to chown) and then will fork >> qemu process which will call once again (for actual read write IO). >> >> Not that all, in case if we are have 4 extra attached disks, then the total >> calls to glfs_init() and friends will be (4+1)*2 in libvirt and (4+1)*1 in >> qemu space i.e 15 calls. Since we don't have control over qemu process as that >> executes in a different process environment, lets do not bother much about it. >> >> This patch shrinks these 10 calls (i.e objects from above example) to just >> one, by maintaining a cache of glfs objects. >> >> The glfs object is shared across other only if volume name and all the >> volfile servers match. In case of hit glfs object takes a ref and >> only on close unref happens. >> >> Thanks to 'Peter Krempa' for the discussion. >> >> Signed-off-by: Prasanna Kumar Kalever <prasanna.kalever@xxxxxxxxxx> >> --- >> >> WORK IN PROGRESS: (WIP) >> ---------------- >> While initially caching the glfs object, i.e. in >> virStorageBackendGlusterSetPreopened() I have took a ref=2, so on the following >> virStorageBackendGlusterClosePreopened() --ref will make ref=1, which will >> help not cleaning up the object which has to be shared with next coming disks >> (if conditions meat). >> >> Given some context, the idea is that on time-out (or after all disks are >> initiallized) some one should call virStorageBackendGlusterClosePreopened() >> which will ideally make ref=0, hence cached object will be cleaned/deleted >> calling glfs_fini() Apologies for delay w.r.t this patch, was on PTO last week. > > Second option is to add storage driver APIs that will allow to register > and unregister the storage driver volumes when the VM driver will need > to access them. > > This is basically equivalent to calling virStorageFileInit(As) right when > starting the VM and avoiding closing it until cleaning up from the > start. > > This will immediately remove the two openings of the gluster volume when > chmoding and checking that the volume exists. > > Additionally it gives you an convenient point to do reference counting > on similar connections to the gluster node. Wonderful. Was not able to figure out a point for ref counting. This approch unblocks me :-) > > Thankfully gluster's security handling is so terrible that we don't > really need to pass auth objects to it, since it would complicate things > further. > >> >> I had a thought of doing the time-out cleanup call in >> virSecurityManagerSetAllLabel() or similar, but that looks too odd for me? > > No, your endeavour needs to be split into two separate things: > > 1) Unifying of calls to virStorageFileInit and virStorageFileDeinit so > that it's called just once during VM start > > 2) adding connection caching to the gluster backend driver so that > multiple similar connections don't need to open the connection. Yeah, makes perfect sense. > > Adding timed closing of the cached connections would be complicated and > prone to breaking. >> >> Some help ? >> Thanks in advance. >> >> --- >> src/storage/storage_backend_gluster.c | 136 ++++++++++++++++++++++++++++++++-- >> src/storage/storage_backend_gluster.h | 33 ++++++++- >> 2 files changed, 160 insertions(+), 9 deletions(-) >> >> diff --git a/src/storage/storage_backend_gluster.c b/src/storage/storage_backend_gluster.c >> index 8e86704..4f53ebc 100644 >> --- a/src/storage/storage_backend_gluster.c >> +++ b/src/storage/storage_backend_gluster.c >> @@ -47,19 +47,132 @@ struct _virStorageBackendGlusterState { >> char *dir; /* dir from URI, or "/"; always starts and ends in '/' */ >> }; >> >> +virGlusterDefPtr ConnCache = {0,}; > > variables in libvirt don't start with a capital letter. Also the type > name is not accurately chosen. Agree. > >> + >> typedef struct _virStorageBackendGlusterState virStorageBackendGlusterState; >> typedef virStorageBackendGlusterState *virStorageBackendGlusterStatePtr; >> >> +void >> +virStorageBackendGlusterSetPreopened(virStorageSourcePtr src, glfs_t *fs) >> +{ >> + size_t i; >> + virStorageBackendGlusterStatePtrPreopened entry = NULL; >> + >> + if (ConnCache == NULL && (VIR_ALLOC(ConnCache) < 0)) > > This needs to be serialized, otherwise it can create races between > multiple threads. > > Please use the virOnce infrastructure for this. Okay! > >> + return; >> + >> + for (i = 0; i < ConnCache->nConn; i++) { >> + if (STREQ(ConnCache->Conn[i]->volname, src->volume)) > > As I've pointed out, just matching the volume name is not enough. All > hosts and ports need to be checked as well, otherwise you can't be sure > that the volume is the same. > > I've reiterated this point multiple times during our chat. I remember this, sorry for not mentioning this in WIP list, will address this in next spin. Are there any util functions to compare IP vs FQDN Hostname ? Else I need to write one. > >> + return; >> + } >> + >> + if (VIR_ALLOC(entry) < 0) >> + goto L1; >> + >> + if (VIR_STRDUP(entry->volname, src->volume) < 0) >> + goto L1; >> + >> + entry->nhosts = src->nhosts; >> + for (i = 0; i < src->nhosts; i++) { >> + if (VIR_ALLOC_N(entry->hosts[i], strlen(src->hosts[i].name) + 1) < 0) >> + goto L2; >> + strcpy(entry->hosts[i], src->hosts[i].name); > > > VIR_STRDUP instead of all the above. Uhhhh... My bad. > >> + } >> + >> + entry->fs = fs; >> + entry->ref = 2; /* persist glfs obj per volume until a final timeout >> + virStorageBackendGlusterClosePreopened() is called */ >> + >> + if (VIR_INSERT_ELEMENT(ConnCache->Conn, -1, ConnCache->nConn, entry) < 0) >> + goto L2; >> + >> + return; >> + >> +L2: >> + for (i = 0; i < entry->nhosts; i++) >> + VIR_FREE(entry->hosts[i]); >> +L1: > > These won't pass syntax check. Please follow the libvirt coding rules > and choose sensible names for the labels. sure, Looks like I have edited some portions post to make syntax-check execution. sorry, will try not to repeat this. > >> + if(ConnCache->nConn == 0) >> + VIR_FREE(ConnCache); >> + VIR_FREE(entry->volname); >> + VIR_FREE(entry); >> +} >> + >> +glfs_t * >> +virStorageBackendGlusterFindPreopened(virStorageSourcePtr src) >> +{ >> + size_t i, j, k, ret = 0; >> + size_t min, max; >> + >> + if (ConnCache == NULL) >> + return NULL; >> + >> + virStorageBackendGlusterStatePtrPreopened entry; >> + >> + for (i = 0; i < ConnCache->nConn; i++) { >> + entry = ConnCache->Conn[i]; >> + if (STREQ(entry->volname, src->volume)) { >> + min = entry->nhosts < src->nhosts ? entry->nhosts : src->nhosts; >> + max = entry->nhosts >= src->nhosts ? entry->nhosts : src->nhosts; > > If the number of hosts does not match, there's no point in checking that > they are equal ... since they certainly are not. Yeah, I was also thinking if I can convince you with the subset servers match ? Example: Disk1: IPa, IPc, IPe, IPg Disk2: IPa, IPe Since all addresses in Disk2 matches with Disk1, therefore ... ? > >> + for (j = 0; j< min; j++) { >> + if (entry->nhosts == min) { >> + for (k = 0; k < max; k++) { >> + if (STREQ(entry->hosts[j], src->hosts[k].name)) { > > You are not checking port numbers. Should also take care about transport type as well. > >> + ret = 1; >> + break; >> + } >> + } >> + if (!ret) >> + return NULL; >> + } else { >> + for (k = 0; k < max; k++) { >> + if (STREQ(src->hosts[j].name, entry->hosts[k])) { >> + ret = 1; >> + break; >> + } >> + } >> + if (!ret) >> + return NULL; >> + } >> + } >> + entry->ref++; >> + return entry->fs; >> + } >> + } >> + return NULL; >> +} >> + >> +int >> +virStorageBackendGlusterClosePreopened(glfs_t *fs) >> +{ >> + size_t i; >> + int ret = 0; >> + >> + if (fs == NULL) >> + return ret; >> + >> + for (i = 0; i < ConnCache->nConn; i++) { >> + if (ConnCache->Conn[i]->fs == fs) { >> + if (--ConnCache->Conn[i]->ref) >> + return ret; > > Please use libvirt's reference counted objects instead of hand-rolled > reference counting stuff. > >> + >> + ret = glfs_fini(ConnCache->Conn[i]->fs); >> + VIR_FREE(ConnCache->Conn[i]->volname); >> + VIR_FREE(ConnCache->Conn[i]); >> + >> + VIR_DELETE_ELEMENT(ConnCache->Conn, i, ConnCache->nConn); > > This needs locked access to the cache. This would introduce a rather > nasty race possibility. Yeah, something like virMutexLock(connCache->lock); > >> + } >> + } >> + return ret; >> +} >> + >> static void >> virStorageBackendGlusterClose(virStorageBackendGlusterStatePtr state) >> { >> if (!state) >> return; >> >> - /* Yuck - glusterfs-api-3.4.1 appears to always return -1 for >> - * glfs_fini, with errno containing random data, so there's no way >> - * to tell if it succeeded. 3.4.2 is supposed to fix this.*/ >> - if (state->vol && glfs_fini(state->vol) < 0) >> + if (state->vol && virStorageBackendGlusterClosePreopened(state->vol) < 0) >> VIR_DEBUG("shutdown of gluster volume %s failed with errno %d", >> state->volname, errno); >> >> @@ -556,8 +669,7 @@ virStorageFileBackendGlusterDeinit(virStorageSourcePtr src) >> src, src->hosts->name, src->hosts->port ? src->hosts->port : "0", >> src->volume, src->path); >> >> - if (priv->vol) >> - glfs_fini(priv->vol); >> + virStorageBackendGlusterClosePreopened(priv->vol); >> VIR_FREE(priv->canonpath); >> >> VIR_FREE(priv); >> @@ -630,11 +742,20 @@ virStorageFileBackendGlusterInit(virStorageSourcePtr src) >> src, priv, src->volume, src->path, >> (unsigned int)src->drv->uid, (unsigned int)src->drv->gid); >> >> + >> + priv->vol = virStorageBackendGlusterFindPreopened(src); >> + if (priv->vol) { >> + src->drv->priv = priv; >> + return 0; >> + } >> + >> if (!(priv->vol = glfs_new(src->volume))) { >> virReportOOMError(); >> goto error; >> } >> >> + virStorageBackendGlusterSetPreopened(src, priv->vol); >> + >> for (i = 0; i < src->nhosts; i++) { >> if (virStorageFileBackendGlusterInitServer(priv, src->hosts + i) < 0) >> goto error; >> @@ -652,8 +773,7 @@ virStorageFileBackendGlusterInit(virStorageSourcePtr src) >> return 0; >> >> error: >> - if (priv->vol) >> - glfs_fini(priv->vol); >> + virStorageBackendGlusterClosePreopened(priv->vol); >> VIR_FREE(priv); >> >> return -1; >> diff --git a/src/storage/storage_backend_gluster.h b/src/storage/storage_backend_gluster.h >> index 6796016..a0326aa 100644 >> --- a/src/storage/storage_backend_gluster.h >> +++ b/src/storage/storage_backend_gluster.h >> @@ -22,9 +22,40 @@ >> #ifndef __VIR_STORAGE_BACKEND_GLUSTER_H__ >> # define __VIR_STORAGE_BACKEND_GLUSTER_H__ >> >> -# include "storage_backend.h" >> +#include "storage_backend.h" >> +#include <glusterfs/api/glfs.h> > > NACK to this hunk. The gluster stuff needs to stay contained in the > backend driver and should not leak to other parts of libvirt. Will remove. > >> >> extern virStorageBackend virStorageBackendGluster; >> extern virStorageFileBackend virStorageFileBackendGluster; >> >> + >> +struct _virStorageBackendGlusterStatePreopened { >> + char *volname; >> + size_t nhosts; >> + char *hosts[1024]; /* FIXME: 1024 ? */ > > You need to allocate this dynamically. That's the idea, since this was initial patch, just gotneglected. > >> + glfs_t *fs; > > This needs to be internal. > >> + int ref; >> +}; >> + >> +typedef struct _virStorageBackendGlusterStatePreopened virStorageBackendGlusterStatePreopened; >> +typedef virStorageBackendGlusterStatePreopened *virStorageBackendGlusterStatePtrPreopened; >> + >> +struct _virGlusterDef { >> + size_t nConn; >> + virStorageBackendGlusterStatePtrPreopened *Conn; >> +}; >> + >> +typedef struct _virGlusterDef virGlusterDef; >> +typedef virGlusterDef *virGlusterDefPtr; >> + >> +extern virGlusterDefPtr ConnCache; >> + >> +void >> +virStorageBackendGlusterSetPreopened(virStorageSourcePtr src, glfs_t *fs); >> + >> +glfs_t* >> +virStorageBackendGlusterFindPreopened(virStorageSourcePtr src); >> + >> +int >> +virStorageBackendGlusterClosePreopened(glfs_t *fs); >> #endif /* __VIR_STORAGE_BACKEND_GLUSTER_H__ */ > > > Again these can't be exported since they take gluster data types. Also > it looks that they are internal to the gluster impl so there's no need > to export them at all. Make Sense. Thanks a lot Peter! Will post a modified version sometime tomorrow. -- Prasanna > > Peter -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list