On 05/25/2010 11:30 PM, Pete Zaitcev wrote:
If a chunkserver goes down, tabled sometimes returns a phantom "object not found". This happens because we keep hitting the same down node until the retries are exhausted. The existing code calls rand() on every attempt and hopes for the best, but this fails too often.

The fix is to randomize only once, before the retry loop, and then cycle through all available nodes deterministically. The same fix would apply even if we used a better technique than random choice to select an available chunkserver.

We also refactor the code a little, so that the enormous function object_get_body becomes somewhat easier to follow.

Signed-off-by: Pete Zaitcev <zaitcev@xxxxxxxxxx>
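The retry strategy described in the patch (one random starting point, then a deterministic walk over every node) can be sketched in C as follows. This is a minimal illustration, not tabled's actual code. The names fetch_with_failover, fetch_fn, demo_cluster and demo_fetch are hypothetical, and each node is assumed to be identified by an int. Because the index only advances from the random start, a single down node is tried at most once per request, unlike calling rand() on every retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-node fetch callback: returns true if the node served the object. */
typedef bool (*fetch_fn)(int node_id, void *ctx);

/*
 * Try each of the n nodes at most once. The starting index is randomized
 * exactly once, before the loop. After that the nodes are walked in order,
 * wrapping around, so every node is attempted and none is attempted twice.
 * Returns the index (into nodes[]) of the node that succeeded, or -1 if all failed.
 */
static int fetch_with_failover(const int *nodes, size_t n,
                               fetch_fn fetch, void *ctx)
{
    assert(nodes != NULL || n == 0);
    if (n == 0)
        return -1;

    size_t start = (size_t)rand() % n;      /* randomize once */
    for (size_t i = 0; i < n; i++) {
        size_t idx = (start + i) % n;       /* deterministic cycle */
        if (fetch(nodes[idx], ctx))
            return (int)idx;
    }
    return -1;
}

/* Hypothetical stand-in for a cluster, used only to exercise the sketch. */
struct demo_cluster {
    const bool *up;   /* up[node_id] says whether that node answers */
    int attempts;     /* number of fetches issued so far */
};

static bool demo_fetch(int node_id, void *ctx)
{
    struct demo_cluster *c = ctx;
    c->attempts++;
    return c->up[node_id];
}
```

With four nodes of which only node 2 is up, the call always succeeds within four attempts, whatever the random start. With every node down, it gives up after exactly four attempts instead of retrying the same dead node.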
applied