Hi Andreas,

I took a shot at implementing a cache for the jerasure plugin, to avoid computing the decoding matrix every time a 4KB stripe needs it, in the same way you did it for the ISA plugin. The draft is at https://github.com/dachary/ceph/commit/a6fb5257fabd810704405c8bc13743d1592ecc54 if you're curious.

Then I did some benchmarking and was quite disappointed. Whenever the matrix needs to be computed, jerasure_invert_matrix takes ~4000 cycles. Compared to the cost of galois_w08_region_multiply (~4 million cycles), that is very small [1]. With the ISA plugin, ec_init_table is less expensive than jerasure_invert_matrix, at ~1200 cycles, and so is the function ec_encode_data_avx (~1.5 million cycles) [2]. In both cases, though, the ratio stays at roughly 1000 to 1, which makes me wonder whether I'm missing something.

What do you think?

Cheers

[1] jerasure profiling:

    make -j4 ceph_erasure_code_benchmark &&
    rm bench.callgrind &&
    valgrind --tool=callgrind --callgrind-out-file=bench.callgrind \
        ./ceph_erasure_code_benchmark --plugin jerasure \
        --parameter directory=.libs --workload decode --verbose \
        --parameter technique=reed_sol_van --parameter k=4 --parameter m=2 \
        --iterations 1024 --erased 1 --erased 2 &&
    kcachegrind bench.callgrind

[2] isa profiling:

    make -j4 ceph_erasure_code_benchmark &&
    rm bench.callgrind &&
    valgrind --tool=callgrind --callgrind-out-file=bench.callgrind \
        ./ceph_erasure_code_benchmark --plugin isa \
        --parameter directory=.libs --workload decode --verbose \
        --parameter technique=reed_sol_van --parameter k=4 --parameter m=2 \
        --iterations 1024 --erased 1 --erased 2 &&
    kcachegrind bench.callgrind

-- 
Loïc Dachary, Artisan Logiciel Libre
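
To put the 1000 to 1 ratio in perspective, here is a rough back-of-the-envelope sketch. It assumes the cycle counts quoted above are measured over the same unit of work, and that a perfect cache would remove the matrix setup cost entirely. The function name max_saving is made up for this illustration and is not part of Ceph, jerasure or ISA-L.

```python
def max_saving(setup_cycles, region_cycles):
    """Upper bound on the fraction of cycles a perfect cache could save.

    Assumption: the cache removes the setup cost (matrix inversion or
    table initialization) completely and leaves the region work unchanged.
    """
    return setup_cycles / (setup_cycles + region_cycles)


# Approximate figures quoted in the message above.
jerasure = max_saving(4_000, 4_000_000)  # jerasure_invert_matrix vs galois_w08_region_multiply
isa = max_saving(1_200, 1_500_000)       # ec_init_table vs ec_encode_data_avx

print(f"jerasure: at most {jerasure:.2%} of cycles saved")
print(f"isa:      at most {isa:.2%} of cycles saved")
```

Under these assumptions, both bounds come out at or below about 0.1%, which is consistent with the disappointing benchmark.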