BG> I've been playing with a program which uses the
BG> libdmu/libdevmapper interface to map a block device through
BG> dm-userspace. (I haven't been using cowd; I'm looking to
BG> integrate dmu support into an existing program.)

Very cool!

BG> I noticed that after I wrote 1 GB of data to a dmu device with a
BG> 4 KB blocksize, the dm-userspace-remaps slab cache consumed about
BG> 39 MB of memory.

Ah, right.

BG> Looking at alloc_remap_atomic(), dmu makes no attempt to reuse
BG> dmu_maps until a memory allocation fails, so dmu could
BG> potentially force a large amount of data out of the page cache to
BG> make room for its maps.

That's true. Good point.

BG> 1. Periodically invalidate the entire table. When cowd does this
BG> right now (on SIGHUP), it invalidates each page individually,
BG> which is not very pleasant. I suppose this could be done by
BG> loading a new dm table.

Right, invalidating the entire table one remap at a time would be a
bad thing. The current SIGHUP behavior was only ever intended as a
mechanism for me to test the invalidation process.

BG> 2. Periodically trigger block invalidations from userspace, fired
BG> by either the completion notification mechanism or a periodic
BG> timer. Userspace couldn't do this in an LRU fashion, since it
BG> doesn't see remap cache hits.

Right. We could push statistics back to cowd when there was nothing
else to do. That might be interesting, but it's probably not the best
way to solve this particular issue.

BG> (As an aside, I haven't been able to figure out the semantics of
BG> the completion notification mechanism. Could you provide an
BG> example of how you expect it to be used from the userspace side?)

Recent versions of cowd use this to prevent the completion (endio)
from firing until cowd has flushed its internal metadata mapping to
disk. Otherwise the data could be written and the completion event
sent when the data isn't really on the disk (well, it's on the disk,
but if we crash before we write our metadata, we can't tell during
recovery that it's really there).

BG> 3. Map in dm-linear when there are large consecutive ranges, to
BG> try to keep the table size down. Some of the early dm-cow design
BG> notes mentioned this approach*, but I notice that the current
BG> cowd doesn't use it. Is this still a recommended procedure?

I don't think this is the best approach, because to invalidate a
mapping you'd have to split the dm-linear target back up,
suspend/resume the device, and so on. Initially, I was planning to
take a cow-centric approach, where dm-linear could be used to map the
sections that were already mapped. Now that I'm focusing on a more
generic approach, we want it to be more flexible, which is why I
implemented a hash table for the remaps (my initial plan was to remap
with dm-linear for performance reasons).

BG> From the kernel side -- if the remap cache in the kernel is
BG> expected to be a subset of the mapping information maintained by
BG> userspace, it seems as though it should be possible to more
BG> aggressively reuse the LRU dmu_maps.

Yes.

BG> That would impose a performance penalty for the extra map
BG> requests to userspace, but I wonder how that balances against
BG> having a larger page cache.

Correct.

BG> Thoughts?

So, my preference would be to put a limit on the number of remaps
that we keep "cached" in the kernel. The existing MRU list (which is
an LRU list if you traverse it backwards) would let us re-use remaps
more aggressively as we approach the limit.
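To make that concrete, here's a rough, untested sketch of the
allocation path I have in mind. All of the names below (the
dmu_device fields, remap_mru, remap_limit, remap_cache) are made up
for illustration and don't match the actual dm-userspace internals:

#include <linux/list.h>
#include <linux/slab.h>

/* Illustrative only: these do not match the real dm-userspace
 * structures. */
struct dmu_map {
	struct list_head list;	/* links into the device's MRU list */
	/* ... org/new block numbers, flags, etc. ... */
};

struct dmu_device {
	struct list_head remap_mru;	/* head = most recent, tail = least */
	unsigned int remap_count;
	unsigned int remap_limit;	/* set at device creation time */
};

static struct kmem_cache *remap_cache;

/* Caller is assumed to hold whatever lock protects the MRU list. */
static struct dmu_map *alloc_remap(struct dmu_device *dev)
{
	struct dmu_map *remap;

	if (dev->remap_count >= dev->remap_limit &&
	    !list_empty(&dev->remap_mru)) {
		/* At the limit: instead of growing the slab cache
		 * (and pushing data out of the page cache), recycle
		 * the least-recently-used remap from the tail of the
		 * MRU list.  A later request for the evicted mapping
		 * just goes back to userspace to be refreshed. */
		remap = list_entry(dev->remap_mru.prev,
				   struct dmu_map, list);
		list_del_init(&remap->list);
		return remap;
	}

	remap = kmem_cache_alloc(remap_cache, GFP_NOIO);
	if (remap)
		dev->remap_count++;

	return remap;
}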
Setting a higher limit at device creation time would trade more
memory usage for better performance. My testing shows that
communication with userspace (i.e. to refresh a mapping that we
expired to make room for another) is not as much of a performance hit
as I would have initially imagined. Thus, I think the above would be
a good way to limit a full-scale memory takeover by dm-userspace :)

Now that I know at least someone is paying attention, I'll try to get
my latest dm-userspace and cowd versions out on this list. A small
fix has been made to dm-userspace, and several improvements and fixes
have been made to cowd. After I post my current code, I'll implement
the memory limit/aggressive reuse functionality and post that as
well.

Thanks!

-- 
Dan Smith
IBM Linux Technology Center
Open Hypervisor Team
email: danms@xxxxxxxxxx