On Fri, Oct 23, 2015 at 9:06 PM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> On Fri, Oct 23, 2015 at 9:00 PM, ronny.hegewald@xxxxxxxxx
> <ronny.hegewald@xxxxxxxxx> wrote:
>>> Could you share the entire log snippet for those 10 minutes?
>>
>> Thats all in the logs. But if more information would be useful tell me which logs
>> to activate and i will give it another run. At least this part is easy to reproduce.
>>
>>> Which kernel was this on?
>>
>> The latest kernel i used which produced the corruption was 3.19.8.
>> The earliest one was 3.11.
>
> No need for now, I'll poke around and report back.

So the "bad crc" errors are of course easily reproducible, but I haven't
managed to reproduce ext4 corruptions. I amended your patch to only
require stable pages in case we actually compute checksums, see
https://github.com/ceph/ceph-client/commit/4febcceb866822c1a1aee2836c9c810e3ef29bbf.

Any other data points you can share? Can you describe your cluster
(boxes, OSDs, clients, rbds mapped - where, how many, ext4 mkfs and
mount options, etc) in more detail? Is there anything special about
your setup that you can think of?

You've mentioned that the best test case in your experience is kernel
compilation. What .config are you using, how many threads (make -jX)
and how long does it take to build a kernel with that .config and that
number of threads? You have more than one rbd device mapped on the same
box - how many exactly, do you put any load on the rest while the kernel
is compiling on one of them? What about rbd devices mapped on other
boxes?

You get the idea - every bit counts.

Thanks,

                Ilya