Actually, looks like Xiaoxi beat you to it for infernalis! 42a3ab95ec459042e92198fb061c8393146bd1b4

-Sam

On Thu, Nov 19, 2015 at 12:30 PM, Marcin Gibuła <m.gibula@xxxxxxxxx> wrote:
>> Judging from the debug output, the problem is in journal recovery, when it
>> tries to delete an object with a huge number of keys (several million; it
>> is the radosgw index for a bucket with over 50 million objects) using
>> leveldb's rmkeys_by_prefix() method.
>>
>> Looking at the source code, rmkeys_by_prefix() batches all operations
>> into one list, and then submit_transaction() executes them all atomically.
>>
>> I'd love to write a patch for this issue, but it seems unfixable (or is
>> it?) with the current API and method behaviour. Could you offer any advice
>> on how to proceed?
>
> Answering myself: could anyone verify whether the attached patch looks ok?
> It should reduce the memory footprint a bit.
>
> When I first read this code, I assumed that the data pointed to by a
> leveldb::Slice has to stay reachable until db->Write is called.
>
> However, looking into leveldb and its source code, there is no such
> requirement: leveldb makes its own copy of the key, so we're effectively
> doubling the memory footprint for no reason.
>
> --
> mg

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html