Re: 16.2.7 pacific rocksdb Corruption: CURRENT

Andrej Filipcic <andrej.filipcic@xxxxxx> · Mon, 20 Dec 2021 13:59:45 +0100

On 12/20/21 13:48, Igor Fedotov wrote:

Andrej,

do you remember the OSDs that crashed on other days and failed to 
start? Did they eventually show (similar?) BlueFS/RocksDB issues, or 
was that something completely different?

It was different, at least for the crashes below. Though it seems to 
me there is a ~0.1% probability that an OSD will not restart OK and 
will get corrupted at some point.

And generally - do you think your cluster is susceptible to OSDs being 
corrupted on restart/failure? I.e., is the probability of that event 
fairly notable? If so, would it be possible to try to trigger the issue 
intentionally on some OSD, with more verbose logging, once the cluster 
is healthy again?

In general, the corruptions are quite rare and typically related to OSD 
restarts or crashes. So it's not a problem if corruption is triggered 
on one OSD; the annoying thing is that the EC recovery time is quite 
long. I use 16+3 erasure coding, so even with 5 or 6 failed OSDs, the 
data loss probability is pretty low.
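
For a rough sense of the numbers behind the 16+3 remark, here is a 
minimal sketch, not a statement about this cluster. It assumes each PG 
sits on 19 OSDs picked uniformly at random (ignoring CRUSH rules and 
failure domains) and that the cluster has about 1400 OSDs; that count 
is a guess based only on the osd.1396 id in the crash report, and the 
function name is made up for illustration. A PG loses data only if 
more than 3 of its 19 OSDs are among the failed ones.

```python
from math import comb


def pg_loss_probability(n_osds: int, k: int, m: int, failed: int) -> float:
    """Chance that one PG of width k+m has more than m of its OSDs among
    `failed` simultaneously failed OSDs (hypergeometric tail).

    Assumes uniformly random placement, which real CRUSH rules do not give.
    """
    width = k + m
    total = comb(n_osds, width)
    lost = sum(
        comb(failed, i) * comb(n_osds - failed, width - i)
        for i in range(m + 1, min(failed, width) + 1)
    )
    return lost / total


N_OSDS = 1400  # assumption, not taken from the thread

for failed in (5, 6):
    p = pg_loss_probability(N_OSDS, k=16, m=3, failed=failed)
    print(f"{failed} failed OSDs: per-PG loss probability ~ {p:.2e}")
```

The per-PG figure still has to be multiplied by the number of PGs in 
the pool to estimate how many PGs would be affected.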

Best regards,
Andrej

Thanks,

Igor

On 12/20/2021 3:25 PM, Andrej Filipcic wrote:
On 12/20/21 13:14, Igor Fedotov wrote:

On 12/20/2021 2:58 PM, Andrej Filipcic wrote:
On 12/20/21 12:47, Igor Fedotov wrote:

Thanks for the info.

Just in case - is write caching disabled for the disk in question? 
What's the output for "hdparm -W </path-to-disk-dev>" ?

No, it is enabled. Shall I disable it on all OSDs?

I can't tell you for sure whether this is the root cause. Generally, 
upstream recommends disabling write caching because of multiple 
performance issues we have observed. I don't recall any reports of 
data corruption, though I can still imagine something like that. On 
the other hand, as far as I could see from the initial log, there was 
no node reboot/shutdown during the upgrade, hence hardware write 
caching is unlikely to be involved. Am I right that there was no node 
shutdown in your case?
yes, only the ceph services were restarted.

And it would be an interesting experiment to see whether the data 
corruption is indeed related. So it would be great if you could test 
that...
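
To survey the write-cache state across many disks before such a test, 
the text printed by "hdparm -W" can be collected and parsed. This is a 
minimal sketch: the sample text below is an assumed example of that 
output, the device names are placeholders, and parse_write_cache is a 
hypothetical helper, not part of any Ceph tooling.

```python
import re


def parse_write_cache(hdparm_output: str) -> dict:
    """Map device path -> True (cache on) / False (cache off)."""
    states = {}
    device = None
    for line in hdparm_output.splitlines():
        line = line.strip()
        if line.startswith("/dev/") and line.endswith(":"):
            # Header line naming the device the next lines refer to.
            device = line[:-1]
            continue
        match = re.match(r"write-caching\s*=\s*(\d+)", line)
        if match and device is not None:
            states[device] = match.group(1) != "0"
    return states


# Assumed sample of `hdparm -W /dev/sda /dev/sdb` output (placeholder devices).
SAMPLE = """
/dev/sda:
 write-caching =  1 (on)

/dev/sdb:
 write-caching =  0 (off)
"""

for dev, enabled in parse_write_cache(SAMPLE).items():
    print(dev, "write cache enabled" if enabled else "write cache disabled")
```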

One more question please - is this a bare metal deployment or a 
containerized (Rook?) one?
Bare metal on RHEL 8.3, with a 5.11.4 ELRepo kernel that needs updating 
at some point. It was initially deployed with ceph-ansible.

And I presume an OSD restart is a rare event in your cluster, isn't 
it? That's probably why you haven't faced the issue before...
OSD restarts are rare; the OSDs have been running since September. 
Well, I had several crashes in the meantime, and some of them also 
caused corruption.

Actually, when updating to 16.2.7, many OSDs hit this crash but 
recovered OK:
[root@lcst0001 ~]# ceph crash info 2021-12-20T05:28:07.001230Z_bd286ae8-7867-4040-89c6-8d2de7794a76
{
   "assert_condition": "(sharded_in_flight_list.back())->ops_in_flight_sharded.empty()",
   "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/common/TrackedOp.cc",
   "assert_func": "OpTracker::~OpTracker()",
   "assert_line": 173,
   "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/common/TrackedOp.cc: In function 'OpTracker::~OpTracker()' thread 7fbd3373e080 time 2021-12-20T06:28:06.993807+0100\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/common/TrackedOp.cc: 173: FAILED ceph_assert((sharded_in_flight_list.back())->ops_in_flight_sharded.empty())\n",
   "assert_thread_name": "ceph-osd",
   "backtrace": [
       "/lib64/libpthread.so.0(+0x12b20) [0x7fbd316ecb20]",
       "gsignal()",
       "abort()",
       "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x5593fc36a59d]",
       "/usr/bin/ceph-osd(+0x56a766) [0x5593fc36a766]",
       "(OpTracker::~OpTracker()+0x39) [0x5593fc755559]",
       "(OSD::~OSD()+0x304) [0x5593fc4a9994]",
       "(OSD::~OSD()+0xd) [0x5593fc4a9b5d]",
       "main()",
       "__libc_start_main()",
       "_start()"
   ],
   "ceph_version": "16.2.6",
   "crash_id": "2021-12-20T05:28:07.001230Z_bd286ae8-7867-4040-89c6-8d2de7794a76",
   "entity_name": "osd.1396",
   "os_id": "rhel",
   "os_name": "Red Hat Enterprise Linux",
   "os_version": "8.3 (Ootpa)",
   "os_version_id": "8.3",
   "process_name": "ceph-osd",
   "stack_sig": "d247f79a887d3f92ed5377a4aabc407a8e8ab4392f99134800755c6450b8ce6f",
   "timestamp": "2021-12-20T05:28:07.001230Z",
   "utsname_hostname": "lcst0057",
   "utsname_machine": "x86_64",
   "utsname_release": "5.11.4-1.el8.elrepo.x86_64",
   "utsname_sysname": "Linux",
   "utsname_version": "#1 SMP Sun Mar 7 08:41:44 EST 2021"
}

Another series of crashes appeared when I disabled scrubbing, and 
some of the OSDs had to be reinitialized afterwards:

[root@lcst0001 ~]# ceph crash info 2021-12-04T14:19:28.102548Z_9b2606a1-a334-4a97-95ec-169cd013bf0b
{
   "assert_condition": "state_cast<const NotActive*>()",
   "assert_file": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/osd/scrub_machine.cc",
   "assert_func": "void Scrub::ScrubMachine::assert_not_active() const",
   "assert_line": 55,
   "assert_msg": "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/osd/scrub_machine.cc: In function 'void Scrub::ScrubMachine::assert_not_active() const' thread 7fcde174e700 time 2021-12-04T15:19:28.092559+0100\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.6/rpm/el8/BUILD/ceph-16.2.6/src/osd/scrub_machine.cc: 55: FAILED ceph_assert(state_cast<const NotActive*>())\n",
   "assert_thread_name": "tp_osd_tp",
   "backtrace": [
       "/lib64/libpthread.so.0(+0x12b20) [0x7fce05769b20]",
       "gsignal()",
       "abort()",
       "(ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1a9) [0x563f50b6a59d]",
       "/usr/bin/ceph-osd(+0x56a766) [0x563f50b6a766]",
       "/usr/bin/ceph-osd(+0x9e4dcf) [0x563f50fe4dcf]",
       "(PgScrubber::replica_scrub_op(boost::intrusive_ptr<OpRequest>)+0x4bf) [0x563f50fd530f]",
       "(PG::replica_scrub(boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x62) [0x563f50d209e2]",
       "(PrimaryLogPG::do_request(boost::intrusive_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x7bb) [0x563f50de5f4b]",
       "(OSD::dequeue_op(boost::intrusive_ptr<PG>, boost::intrusive_ptr<OpRequest>, ThreadPool::TPHandle&)+0x309) [0x563f50c6f1b9]",
       "(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&, ThreadPool::TPHandle&)+0x68) [0x563f50ecc868]",
       "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xa58) [0x563f50c8f1e8]",
       "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x563f512fa6c4]",
       "(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x563f512fd364]",
       "/lib64/libpthread.so.0(+0x814a) [0x7fce0575f14a]",
       "clone()"
   ],
   "ceph_version": "16.2.6",
   "crash_id": "2021-12-04T14:19:28.102548Z_9b2606a1-a334-4a97-95ec-169cd013bf0b",
   "entity_name": "osd.209",
   "os_id": "rhel",
   "os_name": "Red Hat Enterprise Linux",
   "os_version": "8.3 (Ootpa)",
   "os_version_id": "8.3",
   "process_name": "ceph-osd",
   "stack_sig": "42f4a4f71dfb4c78153e86327c6b3213f94806652d846a9002bd7ebcb05552cf",
   "timestamp": "2021-12-04T14:19:28.102548Z",
   "utsname_hostname": "lcst0007",
   "utsname_machine": "x86_64",
   "utsname_release": "5.11.4-1.el8.elrepo.x86_64",
   "utsname_sysname": "Linux",
   "utsname_version": "#1 SMP Sun Mar 7 08:41:44 EST 2021"
}
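
When many OSDs report crashes like the two above, it can help to 
reduce each report to one line and count them per assert. The sketch 
below is only an illustration: summarize_crash is a hypothetical 
helper, and the two inputs are trimmed copies of the reports quoted 
above rather than live output. In practice the JSON text would be the 
saved output of "ceph crash info <id>".

```python
import json
from collections import Counter


def summarize_crash(report: dict) -> str:
    """One-line summary built from fields present in the reports above."""
    return "{} ({}): {} line {}".format(
        report.get("entity_name", "?"),
        report.get("ceph_version", "?"),
        report.get("assert_func", "?"),
        report.get("assert_line", "?"),
    )


# Trimmed copies of the two crash reports quoted in this thread.
RAW_REPORTS = [
    '{"entity_name": "osd.1396", "ceph_version": "16.2.6", '
    '"assert_func": "OpTracker::~OpTracker()", "assert_line": 173}',
    '{"entity_name": "osd.209", "ceph_version": "16.2.6", '
    '"assert_func": "void Scrub::ScrubMachine::assert_not_active() const", '
    '"assert_line": 55}',
]

reports = [json.loads(raw) for raw in RAW_REPORTS]
for report in reports:
    print(summarize_crash(report))

# Count crashes per failing function to see which assert dominates.
by_assert = Counter(r.get("assert_func", "?") for r in reports)
for func, count in by_assert.most_common():
    print(count, func)
```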

Best regards,
Andrej

Thanks in advance,

--
_____________________________________________________________
    prof. dr. Andrej Filipcic,   E-mail:Andrej.Filipcic@xxxxxx
    Department of Experimental High Energy Physics - F9
    Jozef Stefan Institute, Jamova 39, P.o.Box 3000
    SI-1001 Ljubljana, Slovenia
    Tel.: +386-1-477-3674    Fax: +386-1-477-3166
-------------------------------------------------------------
--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
