Thanks, Colin. As you said, when the system's average workload is detected to be too high, the stall may come from the low-level ext3 writes: because the sync cannot return a result within 10 minutes, the cosd process commits suicide. We do not use btrfs; we use ext3 instead. Does this make any difference?

Another phenomenon: sometimes the client transfers only 4 MB of data every 5 seconds:

2011-07-04 14:35:17.512670    pg v582: 396 pgs: 396 active+clean+degraded; 1844 MB data, 6020 MB used, 124 GB / 137 GB avail; 482/964 degraded (50.000%)
2011-07-04 14:35:22.514094    pg v583: 396 pgs: 396 active+clean+degraded; 1848 MB data, 6024 MB used, 124 GB / 137 GB avail; 483/966 degraded (50.000%)
2011-07-04 14:35:27.513259    pg v584: 396 pgs: 396 active+clean+degraded; 1852 MB data, 6032 MB used, 124 GB / 137 GB avail; 484/968 degraded (50.000%)
2011-07-04 14:35:32.513605    pg v585: 396 pgs: 396 active+clean+degraded; 1856 MB data, 6036 MB used, 124 GB / 137 GB avail; 485/970 degraded (50.000%)
2011-07-04 14:35:37.513930    pg v586: 396 pgs: 396 active+clean+degraded; 1860 MB data, 6040 MB used, 124 GB / 137 GB avail; 486/972 degraded (50.000%)
2011-07-04 14:35:42.514776    pg v587: 396 pgs: 396 active+clean+degraded; 1864 MB data, 6040 MB used, 124 GB / 137 GB avail; 487/974 degraded (50.000%)
2011-07-04 14:35:47.514993    pg v588: 396 pgs: 396 active+clean+degraded; 1868 MB data, 6048 MB used, 124 GB / 137 GB avail; 488/976 degraded (50.000%)

but sometimes it reaches 100 MB/s for a few consecutive seconds. Is this related to which OSD it writes to?

Thanks!
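For what it's worth, the suicide-on-stalled-sync behavior described above boils down to a watchdog around the filestore sync: if the sync makes no progress within the timeout, the process aborts itself. A minimal Python sketch of that idea (not Ceph's actual C++ FileStore code; the function name and structure here are illustrative only, and the 600-second value matches the "sync_entry timed out after 600 seconds" message in the earlier log):

```python
import os
import signal
import threading

# 600 s matches the "sync_entry timed out after 600 seconds" log message.
SYNC_TIMEOUT_SEC = 600


def sync_with_suicide_timeout(sync_fn, timeout=SYNC_TIMEOUT_SEC):
    """Run sync_fn in a thread; if it does not finish within `timeout`
    seconds, abort the whole process (the OSD "suicide")."""
    done = threading.Event()

    def worker():
        sync_fn()
        done.set()

    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout):
        # In the real OSD a backtrace is dumped before the abort.
        os.kill(os.getpid(), signal.SIGABRT)
    return True
```

So a filesystem (btrfs or ext3) that genuinely stalls a sync for 10 minutes trips the watchdog regardless of what caused the stall.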
2011/7/4 Colin McCabe <cmccabe@xxxxxxxxxxxxxx>:
> On Sun, Jul 3, 2011 at 6:42 AM, huang jun <hjwsm1989@xxxxxxxxx> wrote:
>> hi, all
>> I tested ceph 0.30 on linux-2.6.37 recently. After I built the cluster:
>> bsd12:/# ceph -s
>> 2011-07-04 09:37:42.920166    pg v66: 198 pgs: 198 active+clean+degraded; 1008 MB data, 11363 MB used, 2986 MB / 15118 MB avail; 273/546 degraded (50.000%)
>> 2011-07-04 09:37:42.920674   mds e4: 1/1/1 up {0=0=up:active}
>> 2011-07-04 09:37:42.920723   osd e2: 1 osds: 1 up, 1 in
>> 2011-07-04 09:37:42.920786   log 2011-07-04 09:15:47.239098 osd0 192.168.1.102:6801/7646 73 : [INF] 1.6 scrub ok
>> 2011-07-04 09:37:42.920860   mon e1: 1 mons at {0=192.168.1.102:6789/0}
>>
>> Then I mounted the ceph fs on /mnt:
>> bsd12:/mnt/dd# dd if=/dev/zero of=sa bs=4M count=200
>> and got nothing back: it hung.
>>
>> During the write, I used sar to monitor eth0, but found there was no
>> data transferred at all:
>> 09:31:29 PM  IFACE  rxpck/s  txpck/s  rxkB/s  txkB/s  rxcmp/s  txcmp/s  rxmcst/s
>> 09:31:30 PM  lo     0.00     0.00     0.00    0.00    0.00     0.00     0.00
>> 09:31:30 PM  eth0   0.00     0.00     0.00    0.00    0.00     0.00     0.00
>> 09:31:30 PM  eth1   4.00     2.00     0.41    0.26    0.00     0.00     0.00
>>
>> It seems the OSD did not do the write, so the client could not go on writing.
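As an aside, the constant 50.000% degraded in the status above is exactly what you would expect from a 2-replica pool running on a single OSD: each object has only one of its two expected copies, so half of all replicas are missing. A rough sketch of the arithmetic (the function name is my own, not a Ceph API):

```python
def degraded_fraction(num_objects, pool_size, osds_up):
    """Fraction of expected object replicas that are missing when only
    `osds_up` of the `pool_size` desired replicas can be placed."""
    expected = num_objects * pool_size                   # e.g. 273 * 2 = 546
    missing = num_objects * max(pool_size - osds_up, 0)  # e.g. 273 * 1 = 273
    return missing / expected


# "273/546 degraded (50.000%)" from the ceph -s output above:
print(degraded_fraction(273, 2, 1))  # -> 0.5
```

That also explains why the percentage never moves as data is written: every new object is immediately degraded by the same ratio.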
>> From the osd log, the scrub loadavg is very high:
>> 2011-07-04 09:25:52.163742 7f0b56f2c700 osd0 2 tick
>> 2011-07-04 09:25:52.163788 7f0b56f2c700 osd0 2 scrub_should_schedule loadavg 2 >= max 0.5 = no, load too high
>> 2011-07-04 09:25:52.163804 7f0b56f2c700 osd0 2 do_mon_report
>> 2011-07-04 09:25:52.163819 7f0b56f2c700 osd0 2 send_alive up_thru currently 0 want 0
>> 2011-07-04 09:25:52.163833 7f0b56f2c700 osd0 2 send_pg_stats
>> 2011-07-04 09:25:52.782851 7f0b4d517700 osd0 2 update_osd_stat osd_stat(11363 MB used, 2986 MB avail, 15118 MB total, peers []/[])
>> 2011-07-04 09:25:52.782887 7f0b4d517700 osd0 2 heartbeat: stat(2011-07-04 09:25:52.782813 oprate=0.339098 qlen=0 recent_qlen=0 rdlat=0 / 0 fshedin=0)
>> 2011-07-04 09:25:52.782902 7f0b4d517700 osd0 2 heartbeat: osd_stat(11363 MB used, 2986 MB avail, 15118 MB total, peers []/[])
>> 2011-07-04 09:25:53.012934 7f0b51f22700 FileStore: sync_entry timed out after 600 seconds.
>> 2011-07-04 09:25:53.012969 1: (SafeTimer::timer_thread()+0x311) [0x6028d1]
>> 2011-07-04 09:25:53.012976 2: (SafeTimerThread::entry()+0xd) [0x604f3d]
>> 2011-07-04 09:25:53.012985 3: (()+0x68ba) [0x7f0b5bba78ba]
>> 2011-07-04 09:25:53.012992 4: (clone()+0x6d) [0x7f0b5a80302d]
>> 2011-07-04 09:25:53.012997 *** Caught signal (Aborted) **
>> in thread 0x7f0b51f22700
>>
>> I'd like to know: can too high a scrub workload cause the OSD to abort?
>> And why is scrub designed in here? To protect the consistency of PGs?
>
> That error only occurs when 10 minutes go past without any activity.
> It seems unlikely that any amount of scrubbing could cause the
> filesystem to pause for that long. Are there any backtraces from btrfs
> in the syslog? Also, you might try mounting the btrfs filesystem
> yourself to see if it works for you.
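On the scrub question: the "loadavg 2 >= max 0.5 = no, load too high" line in the log is just a threshold comparison. The OSD deliberately defers scrubbing while the machine is busy, so a high load cannot make scrub run more; it makes it run less. A minimal sketch of that decision (the 0.5 default corresponds to the OSD scrub load threshold config option, if I remember the name right; this is not the actual C++ code):

```python
import os


def scrub_should_schedule(loadavg, load_threshold=0.5):
    """Mirror of the decision in the log: refuse to schedule a scrub
    while the load average is at or above the threshold."""
    if loadavg >= load_threshold:
        return False  # "load too high"
    return True


# In the OSD this would be fed the live 1-minute load average:
# scrub_should_schedule(os.getloadavg()[0])
```

So the abort is not caused by scrub at all; scrub was being skipped, and the crash came from the filestore sync timing out.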
>
> regards,
> Colin

--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html