Re: slow pacific osd startup


 



On 14/02/2022 16:07, Igor Fedotov wrote:
Hi Andrej,


On 2/12/2022 9:56 AM, Andrej Filipcic wrote:
On 11/02/2022 15:22, Igor Fedotov wrote:
Hi Andrej,

you might want to set debug_bluestore and debug_bluefs to 10 and check what's happening during the startup...
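For reference, a minimal sketch of how that could be done, assuming the cluster uses the centralized config database and taking osd.0 as a placeholder for the affected OSD:

    # persist the higher debug levels so they apply on the next startup (osd.0 is a placeholder id)
    ceph config set osd.0 debug_bluestore 10
    ceph config set osd.0 debug_bluefs 10
    # restart the OSD to capture the slow startup, then drop back to the defaults
    ceph config rm osd.0 debug_bluestore
    ceph config rm osd.0 debug_bluefs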

Alternatively, you might try compacting the slow OSD's DB with ceph-kvstore-tool and check whether that speeds up the startup...
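A minimal sketch of that offline compaction, assuming a non-containerized deployment with the usual data directory layout (the OSD id and path below are placeholders, and the OSD must be stopped first):

    systemctl stop ceph-osd@0
    # compact the RocksDB that lives on BlueFS inside the OSD's data dir
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 compact
    systemctl start ceph-osd@0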

with debug 10,

You set both debug_bluefs and debug_bluestore to 10, right?
yes.

We have already drained most of those OSDs and recreated them. Some are still there and can be debugged further (the log you asked for is linked below). One more thing I found strange is that even these drained OSDs (osd out, with no PGs) are very slow to start; it takes about the same time as when they were half full.



...
2022-02-12T07:50:18.672+0100 7ff0ad32b700  4 rocksdb: [compaction/compaction_job.cc:1349] [default] [JOB 14] Generated table #867263: 84569 keys, 45766553 bytes
2022-02-12T07:50:18.672+0100 7ff0ad32b700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1644648618672869, "cf_name": "default", "job": 14, "event": "table_file_creation", "file_number": 867263, "file_size": 45766553, "table_properties": {"data_size": 44720602, "index_size": 833544, "index_partitions": 0, "top_level_index_size": 0, "index_key_is_user_key": 0, "index_value_is_delta_encoded": 0, "filter_size": 211525, "raw_key_size": 5705417, "raw_average_key_size": 67, "raw_value_size": 42291491, "raw_average_value_size": 500, "num_data_blocks": 11468, "num_entries": 84569, "num_deletions": 0, "num_merge_operands": 0, "num_range_deletions": 0, "format_version": 0, "fixed_key_len": 0, "filter_policy": "rocksdb.BuiltinBloomFilter", "column_family_name": "default", "column_family_id": 0, "comparator": "leveldb.BytewiseComparator", "merge_operator": ".T:int64_array.b:bitwise_xor", "prefix_extractor_name": "nullptr", "property_collectors": "[]", "compression": "NoCompression", "compression_options": "window_bits=-14; level=32767; strategy=0; max_dict_bytes=0; zstd_max_train_bytes=0; enabled=0; ", "creation_time": 1626461453, "oldest_key_time": 0, "file_creation_time": 1644648616}}
2022-02-12T07:50:18.693+0100 7ff0ad32b700  4 rocksdb: [compaction/compaction_job.cc:1415] [default] [JOB 14] Compacted 1@3 + 25@4 files to L4 => 1625871428 bytes
2022-02-12T07:50:18.827+0100 7ff0ad32b700  4 rocksdb: (Original Log Time 2022/02/12-07:50:18.827605) [compaction/compaction_job.cc:760] [default] compacted to: files[0 1 30 348 1853 0 0] max score 0.95, MB/sec: 30.0 rd, 28.7 wr, level 4, files in(1, 25) out(24) MB in(50.9, 1574.2) out(1550.6), read-write-amplify(62.3) write-amplify(30.4) OK, records in: 3428443, records dropped: 404881 output_compression: NoCompression

2022-02-12T07:50:18.827+0100 7ff0ad32b700  4 rocksdb: (Original Log Time 2022/02/12-07:50:18.827635) EVENT_LOG_v1 {"time_micros": 1644648618827623, "job": 14, "event": "compaction_finished", "compaction_time_micros": 56736833, "compaction_time_cpu_micros": 4401723, "output_level": 4, "num_output_files": 24, "total_output_size": 1625871428, "num_input_records": 3428443, "num_output_records": 3023562, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 1, 30, 348, 1853, 0, 0]}
2022-02-12T07:50:18.827+0100 7ff0ad32b700  4 rocksdb: EVENT_LOG_v1 {"time_micros": 1644648618828078, "job": 14, "event": "table_file_deletion", "file_number": 867202}
2022-02-12T07:50:18.827+0100 7ff0acb2a700  4 rocksdb: (Original Log Time 2022/02/12-07:50:18.828165) [db_impl/db_impl_compaction_flush.cc:2611] Compaction nothing to do

The above point is reached in ~4 minutes, and then the extreme reading takes over with no additional logging. So I guess compaction is not the issue.
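(One way to narrow down what is being read during that silent phase, assuming the admin socket is already responsive at that point and taking osd.0 as a placeholder id, is to poll the BlueFS perf counters while the reads are ongoing, e.g.

    ceph daemon osd.0 perf dump bluefs

and watch whether the read counters there keep growing.)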

Could I see the bluestore/bluefs output preceding the rocksdb output, please?
I have copied the full log here; it is rather large:
http://www-f9.ijs.si/~andrej/ceph-osd.0.log-20220213.xz

Best regards,
Andrej




Just in case - is bluefs_buffered_io set to true for these OSDs?
yes, that's set.
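For anyone following the thread: one way to double-check the running value (osd.0 is a placeholder id) is

    ceph config show osd.0 bluefs_buffered_io
    # or, via the admin socket on the OSD host
    ceph daemon osd.0 config get bluefs_buffered_io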

Andrej


Thanks,

Igor

On 2/11/2022 11:14 AM, Andrej Filipcic wrote:

Hi,

with 16.2.7, some OSDs are very slow to start, e.g. it takes ~30 min for an HDD (12 TB, 5 TB used) to become active. After initialization, there are 20-40 min of extreme reading from the OSD at ~150 MB/s, just after
-------
Uptime(secs): 602.2 total, 0.0 interval
Flush(GB): cumulative 0.101, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 0.72 GB write, 1.22 MB/s write, 0.71 GB read, 1.21 MB/s read, 33.1 seconds
Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [default] **
-------
and the log (no special debug conf) is not updated during that time.
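As a sanity check, that read burst can also be watched from the OS side during the quiet window, e.g. with iostat from sysstat; /dev/sdX below is a placeholder for the OSD's data device:

    iostat -xm 5 /dev/sdX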

These slow OSDs were created with Nautilus, and I am not quite sure when this slow startup appeared. The newer OSDs are fast to start. The minimum OSD release is set to Pacific.

Any hints before I start debugging the problem? The history of installs:
2020-09-15T09:42:29Z SUBDEBUG Installed: ceph-base-2:15.2.4-0.el8.x86_64
2020-12-23T11:02:29Z SUBDEBUG Upgrade: ceph-base-2:15.2.8-0.el8.x86_64
2021-08-31T08:46:29Z SUBDEBUG Upgrade: ceph-base-2:16.2.5-0.el8.x86_64
2022-02-09T09:38:42+0100 SUBDEBUG Upgrade: ceph-base-2:16.2.7-0.el8.x86_64

Best regards,
Andrej





--
_____________________________________________________________
   prof. dr. Andrej Filipcic,   E-mail: Andrej.Filipcic@xxxxxx
   Department of Experimental High Energy Physics - F9
   Jozef Stefan Institute, Jamova 39, P.o.Box 3000
   SI-1001 Ljubljana, Slovenia
   Tel.: +386-1-477-3674    Fax: +386-1-425-7074
-------------------------------------------------------------

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





