Re: 14.2.10 QE Nautilus validation status

On Fri, Jun 19, 2020 at 12:19 AM Yuri Weinstein <yweinste@xxxxxxxxxx> wrote:
>
> Details of this release summarized here:
> https://tracker.ceph.com/issues/46039#note-2
>
> rados - FAILED approved Neha?
> rgw - FAILED approved Casey?
> rbd - FAILED approved Jason?
> krbd - FAILED approved Jason, Ilya?

xfstests with msgr-failures/many.yaml timed out because some OSDs
crashed on out of order ops:

  src/osd/PrimaryLogPG.cc: 4050: ceph_abort_msg("out of order op")

I looked at one of the OSDs and this appears to be a failure
injection corner case.  The kernel attempted to resend the op in
question over a thousand times, with either the request message or
the reply message getting lost to a session reset each time.
Eventually the PG log got trimmed in PGLog::IndexedLog::trim(), and
the next retry was no longer detected as a dup:

  osd.1 ... do_op osd_op(client.4551.0:679461 ... RETRY=1217
  osd.1 ... do_op dup client.4551.0:679461

  trim ... modify ... by client.4551.0:679461

  osd.1 ... do_op osd_op(client.4551.0:679461 ... RETRY=1218
  osd.1 ... bad op order, already applied 680251 > this 679461
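
For anyone who wants to poke at the sequence above, here is a toy
model of it in Python.  This is only a sketch under my own
assumptions, not the OSD code: ToyPG, max_log_entries and the string
return values are made up for illustration, and the real OSD aborts
instead of returning "out of order".  It just shows how a retry that
was detected as a dup stops being detected once its log entry is
trimmed away:

```python
from collections import OrderedDict


class ToyPG:
    """Toy model of dup detection backed by a bounded log (not Ceph code)."""

    def __init__(self, max_log_entries):
        self.max_log_entries = max_log_entries  # hypothetical trim bound
        self.log = OrderedDict()                # tid -> True, oldest first
        self.last_applied_tid = 0               # highest tid applied so far

    def trim(self):
        # Drop the oldest entries once the log exceeds its bound.
        while len(self.log) > self.max_log_entries:
            self.log.popitem(last=False)

    def do_op(self, tid):
        # A retry is recognized as a dup only while its entry is in the log.
        if tid in self.log:
            return "dup"
        # Not in the log, but older than what was applied: order violation.
        if tid <= self.last_applied_tid:
            return "out of order"
        self.log[tid] = True
        self.last_applied_tid = tid
        self.trim()
        return "applied"


pg = ToyPG(max_log_entries=3)
results = [
    pg.do_op(679461),  # first delivery of the op
    pg.do_op(679461),  # retry while the entry is still in the log
]
# Later ops from the same client push the old entry out of the log.
for tid in (679462, 679463, 679464):
    pg.do_op(tid)
results.append(pg.do_op(679461))  # retry after the entry was trimmed
print(results)
```

The last retry falls through the dup check and trips the ordering
check instead, which is the same shape as the RETRY=1217 / RETRY=1218
pair in the log excerpt.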

Neha, Josh, let me know if I'm off the rails here ;)

Approved.

Thanks,

                Ilya