On Tue, Jul 23, 2019 at 6:58 AM Marc Schöchlin <ms@xxxxxxxxxx> wrote:
>
> On 23.07.19 at 07:28, Marc Schöchlin wrote:
> >
> > Okay, I already experimented with high timeouts (i.e. 600 seconds). As far
> > as I can remember, this led to a pretty unusable system when I put high
> > amounts of I/O on the EC volume.
> > This system also runs a krbd volume which saturates the system with
> > ~30-60% iowait - that volume never had a problem.
> >
> > A commenter on https://tracker.ceph.com/issues/40822#change-141205
> > suggested that I reduce the rbd cache.
> > What do you think about that?
>
> Tests with a reduced rbd cache still fail, so I ran further tests with the
> rbd cache disabled:
>
> - disabled the rbd cache with "rbd cache = false"
> - unmounted and unmapped the image
> - mapped and mounted the image
> - re-executed my test:
>   find /srv_ec -type f -name "*.sql" -exec gzip -v {} \;
>
> It took several hours, but in the end I hit the same error situation.

Can you please test a consistent Ceph release with a known working kernel
release? It sounds like you have changed two variables, so it's hard to know
which one is broken. We need *you* to isolate which specific Ceph or kernel
release causes the break.

We really haven't made many changes to rbd-nbd, but the kernel has had major
changes to the nbd driver. As Mike pointed out on the tracker ticket, one of
those major changes effectively capped the number of devices at 256.

Can you repeat this with a single device? Can you repeat this on Ceph
rbd-nbd 12.2.11 with an older kernel?

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
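
[Editor's note: a minimal sketch of the "cache disabled" retest described in
the quoted message, for readers who want to reproduce it. The pool/image name
"rbd/srv_ec" and the /dev/nbd0 device are assumptions made for illustration;
only the /srv_ec mount point, the "rbd cache = false" setting, and the find
command come from the thread.]

    # /etc/ceph/ceph.conf on the client -- disable the librbd cache
    [client]
    rbd cache = false

    # unmount and unmap, then re-map and re-mount so the new setting takes effect
    umount /srv_ec
    rbd-nbd unmap /dev/nbd0
    rbd-nbd map rbd/srv_ec
    mount /dev/nbd0 /srv_ec

    # re-run the workload that triggers the error
    find /srv_ec -type f -name "*.sql" -exec gzip -v {} \;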