BG> I've been playing with a program which uses the
BG> libdmu/libdevmapper interface to map a block device through
BG> dm-userspace. (I haven't been using cowd; I'm looking to
BG> integrate dmu support into an existing program.)

Very cool!

BG> I noticed that after I wrote 1 GB of data to a dmu device with a
BG> 4 KB blocksize, the dm-userspace-remaps slab cache consumed about
BG> 39 MB of memory.

Ah, right.

BG> Looking at alloc_remap_atomic(), dmu makes no attempt to reuse
BG> dmu_maps until a memory allocation fails, so dmu could
BG> potentially force a large amount of data out of the page cache to
BG> make room for its maps.

That's true. Good point.

BG> 1. Periodically invalidate the entire table. When cowd does this
BG> right now (on SIGHUP), it invalidates each page individually,
BG> which is not very pleasant. I suppose this could be done by
BG> loading a new dm table.

Right, invalidating the entire table one remap at a time would be a
bad thing. The current SIGHUP behavior was only ever intended as a
mechanism for me to test the invalidation process.

BG> 2. Periodically trigger block invalidations from userspace, fired
BG> by either the completion notification mechanism or a periodic
BG> timer. Userspace couldn't do this in an LRU fashion, since it
BG> doesn't see remap cache hits.

Right. We could push statistics back to cowd when there was nothing
else to do. That might be interesting, but it's probably not the best
way to solve this particular issue.

BG> (As an aside, I haven't been able to figure out the semantics of
BG> the completion notification mechanism. Could you provide an
BG> example of how you expect it to be used from the userspace side?)

Recent versions of cowd use this to prevent the completion (endio)
from firing until cowd has flushed its internal metadata mapping to
disk. Otherwise the data could be written and the completion event
sent when the data isn't really on the disk (well, it's on the disk,
but if we crash before we write our metadata, we can't tell during
recovery that it's really there).

BG> 3. Map in dm-linear when there are large consecutive ranges, to
BG> try to keep the table size down. Some of the early dm-cow design
BG> notes mentioned this approach*, but I notice that the current
BG> cowd doesn't use it. Is this still a recommended procedure?

I don't think this is the best approach, because to invalidate a
mapping you'd have to split the dm-linear target back up,
suspend/resume the device, and so on. Initially, I was planning to
take a cow-centric approach, where dm-linear could be used to map the
sections that were already mapped. Now that I'm focusing on a more
generic approach, we want it to be more flexible, which is why I
implemented a hash table for the remaps (my initial plan was to remap
with dm-linear for performance reasons).

BG> From the kernel side -- if the remap cache in the kernel is
BG> expected to be a subset of the mapping information maintained by
BG> userspace, it seems as though it should be possible to more
BG> aggressively reuse the LRU dmu_maps.

Yes.

BG> That would impose a performance penalty for the extra map
BG> requests to userspace, but I wonder how that balances against
BG> having a larger page cache.

Correct.

BG> Thoughts?

So, my preference would be to put a limit on the number of remaps
that we keep "cached" in the kernel. The existing MRU list (which is
an LRU list if you traverse it backwards) would let us re-use remaps
more aggressively as we approach the limit.
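To make that concrete, here's a rough, untested sketch of the
allocation path I have in mind. All of the names below (the
dmu_device fields, remap_mru, remap_limit, remap_cache) are made up
for illustration and don't match the actual dm-userspace internals:

#include <linux/list.h>
#include <linux/slab.h>

/* Illustrative only: these do not match the real dm-userspace
 * structures. */
struct dmu_map {
	struct list_head list;	/* links into the device's MRU list */
	/* ... org/new block numbers, flags, etc. ... */
};

struct dmu_device {
	struct list_head remap_mru;	/* head = most recent, tail = least */
	unsigned int remap_count;
	unsigned int remap_limit;	/* set at device creation time */
};

static struct kmem_cache *remap_cache;

/* Caller is assumed to hold whatever lock protects the MRU list. */
static struct dmu_map *alloc_remap(struct dmu_device *dev)
{
	struct dmu_map *remap;

	if (dev->remap_count >= dev->remap_limit &&
	    !list_empty(&dev->remap_mru)) {
		/* At the limit: instead of growing the slab cache
		 * (and pushing data out of the page cache), recycle
		 * the least-recently-used remap from the tail of the
		 * MRU list.  A later request for the evicted mapping
		 * just goes back to userspace to be refreshed. */
		remap = list_entry(dev->remap_mru.prev,
				   struct dmu_map, list);
		list_del_init(&remap->list);
		return remap;
	}

	remap = kmem_cache_alloc(remap_cache, GFP_NOIO);
	if (remap)
		dev->remap_count++;

	return remap;
}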
Setting a higher limit at device creation time would trade more
memory usage for better performance. My testing shows that
communication with userspace (i.e. to refresh a mapping that we
expired to make room for another) is not as much of a performance hit
as I would have initially imagined. Thus, I think the above would be
a good way to limit a full-scale memory takeover by dm-userspace :)

Now that I know at least someone is paying attention, I'll try to get
my latest dm-userspace and cowd versions out on this list. A small
fix has been made to dm-userspace, and several improvements and fixes
have been made to cowd. After I post my current code, I'll implement
the memory limit/aggressive reuse functionality and post that as
well.

Thanks!

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@xxxxxxxxxx