I am trying to figure out how the crash happens. We know local->fd is
valid at the beginning of dht_migration_complete_check_task(), since it
is dereferenced there without a hitch. It then becomes NULL before the
function exits, which leads to a crash. That suggests a race condition.

I checked the locking around local->fd and it seems fine. I therefore
conclude that dht_migration_complete_check_task() fails to hold a
reference on local->fd.

I am now running tests with the change below. Does it make sense? Is it
possible that local->fd gets unreferenced and freed by some other thread
between the time dht_migration_complete_check_task() is entered and the
time fd_ref() is called?

--- xlators/cluster/dht/src/dht-helper.c.orig
+++ xlators/cluster/dht/src/dht-helper.c
         src_node = local->cached_subvol;
         if (!local->loc.inode && !local->fd)
-                goto out;
+                return -1;
+
+        if (!local->loc.inode)
+                fd_ref(local->fd);
         /* getxattr on cached_subvol for 'linkto' value */
         if (!local->loc.inode)
                 ret = syncop_fgetxattr (src_node, local->fd, &dict,
@@ -836,8 +839,10 @@
         }
         ret = 0;
 out:
+        if (!local->loc.inode)
+                fd_unref(local->fd);
         return ret;
 }

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@xxxxxxxxxx