On 06/13/2012 12:43 AM, Emmanuel Dreyfus wrote:
> Hi
>
> I have a concern with how NetBSD FUSE handles nodes TTL and FORGET.
>
> When we create/mkdir/mknod/lookup a node, we get a TTL for it (this is
> the entry_valid field in struct fuse_entry_out). The kernel should not
> lookup the node again until the TTL expires.
>
> This means that once the TTL is expired, the kernel must do a lookup,
> again, and therefore that the previously obtained node should not be
> used. That suggests a FUSE FORGET can be sent for it as soon as (the
> kernel does not reference it AND TTL is expired).
>
> For now, NetBSD is lazy about the FUSE FORGET. It is sent when it
> reaches the vnode limit and needs to make room. This means there are a
> lot of stale nodes that remain in the kernel and in glusterfs, consuming
> memory. I can switch to a behavior where the FUSE FORGET are sent
> aggressively as soon as (kernel reference drops to 0 and TTL is
> expired), but this will cause a lot of useless FUSE messages, with an
> impact on the performance front.
>
> To make it clear, here is what we have with lazy FORGET policy:
> LOOKUP a
> INACTIVE a
> (ttl expires)
> LOOKUP a
> INACTIVE a
> (ttl expires)
> LOOKUP a
> ...
>
> And here is what I get with aggressive FORGET policy
> LOOKUP a
> INACTIVE a
> (ttl expires)
> FORGET a <- extra useless FORGET
> LOOKUP a
> INACTIVE a
> (ttl expires)
> FORGET a <- extra useless FORGET
> LOOKUP a
> ...
>
> What do you think is the best approach? How does the Linux kernel
> handle the situation?
>

I don't know all the details of the VFS, but Linux FUSE appears to do
some batching of forget requests for performance reasons. The relevant
commit in Linux FUSE is:

    07e77dca fuse: separate queue for FORGET requests

... and it describes an ominous situation where high cached inode counts
on large-memory machines lead to stalls of up to 30 minutes (!) just
doing cache eviction.
It implements some fairness between processing forget and non-forget
requests to avoid that situation. I don't know how susceptible NetBSD
might be to such a problem (the description seems to imply a lazy model
prior to this change, but I don't know the history), but I'd be
concerned about the context switching if we hooked every kernel-level
forget to an independent FUSE request.

Do you have anything that allows you to test the performance of such
behavior?

Brian
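
P.S. To put a rough number on the "extra useless FORGET" overhead, here
is a minimal sketch that replays the two traces from the quoted message
for a single node. It is only an illustration, not NetBSD or Linux code:
the function names and the "policy" strings are hypothetical, and it
assumes the lazy policy sends no FORGET at all during the replayed
cycles (in reality it would send one later, when the vnode limit is
reached).

```python
def trace(policy, cycles):
    """Replay `cycles` lookup/expiry rounds on node 'a' and return the events.

    `policy` is a hypothetical label: "lazy" defers FORGET until vnode
    reclamation (not modelled here), "aggressive" sends FORGET as soon as
    the kernel reference has dropped to 0 and the TTL has expired.
    """
    if policy not in ("lazy", "aggressive"):
        raise ValueError("unknown policy: %r" % (policy,))
    events = []
    for _ in range(cycles):
        events.append("LOOKUP a")
        events.append("INACTIVE a")  # kernel reference drops to 0
        # (ttl expires)
        if policy == "aggressive":
            events.append("FORGET a")  # the extra message per cycle
    return events


def forget_count(policy, cycles):
    """Count the FORGET messages sent during the replayed cycles."""
    return trace(policy, cycles).count("FORGET a")


if __name__ == "__main__":
    for policy in ("lazy", "aggressive"):
        print(policy, forget_count(policy, 1000))
```

Under these assumptions the aggressive policy costs one additional
message per lookup/expiry cycle per node, which is the traffic that
batching would try to amortize.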