Re: Watch for fstrim running on your Ubuntu systems

> Op 7 juli 2017 om 2:20 schreef Reed Dier <reed.dier@xxxxxxxxxxx>:
> 
> 
> I could easily see that being the case, especially with Micron as a common thread, but it appears that I am on the latest FW for both the SATA and the NVMe:
> 
> > $ sudo ./msecli -L | egrep 'Device|FW'
> > Device Name          : /dev/sda
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdb
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdc
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdd
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sde
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdf
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdg
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdh
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdi
> > FW-Rev               : D0MU027
> > Device Name          : /dev/sdj
> > FW-Rev               : D0MU027
> > Device Name          : /dev/nvme0
> > FW-Rev               : 0091634
> 
> D0MU027 and 1634 are the latest reported FW from Micron, current as of 04/12/2017 and 12/07/2016, respectively.
> 
> Could be that the current FW doesn’t play nice, so that’s on the table. But for now, it’s a thread that can’t be pulled any further.
> 

Also keep in mind that Queued TRIM only exists as of SATA 3.1. Many controllers out there are SATA 3.0, and they block while trimming.

So it could be either the firmware or the SATA controller. I also have PM863a SSDs which block during TRIM because the controller (Intel X99) doesn't support queued TRIM.
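A rough way to check what the drive and link advertise (assuming hdparm is installed; /dev/sda below is just an example device):

$ sudo hdparm -I /dev/sda | grep -i -E 'transport|trim'
# The "Transport:" line lists the SATA revisions the drive reports
# (look for "SATA Rev 3.1"); the TRIM lines show whether Data Set
# Management TRIM is supported at all.
$ lsblk --discard /dev/sda
# Non-zero DISC-GRAN/DISC-MAX means the kernel will issue discards to it.

This doesn't tell you whether the HBA/controller passes queued TRIM through, so treat it as a hint only.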

Wido

> Appreciate the feedback,
> 
> Reed
> 
> > On Jul 6, 2017, at 1:18 PM, Peter Maloney <peter.maloney@xxxxxxxxxxxxxxxxxxxx> wrote:
> > 
> > Hey,
> > 
> > I have some SAS Micron S630DC-400 drives which came with firmware M013 and did the same or worse (it takes very long... 100% blocked for about 5 min to trim 16 GB), but they work just fine with firmware M017 (4 s for 32 GB trimmed). So maybe you just need an update.
> > 
> > Peter
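Regarding the before/after timings above: fstrim -v reports how many bytes were trimmed, so timing a manual run before and after a firmware update gives a rough comparison. The mountpoint below is just an example:

$ time sudo fstrim -v /var/lib/ceph/osd/ceph-0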
> > 
> > 
> > 
> > On 07/06/17 18:39, Reed Dier wrote:
> >> Hi Wido,
> >> 
> >> I came across this ancient ML entry with no responses and wanted to follow up with you to see if you recalled any solution to this.
> >> Copying the ceph-users list to preserve any replies that may result for archival.
> >> 
> >> I have a couple of boxes with 10x Micron 5100 SATA SSDs, journaled on Micron 9100 NVMe SSDs; Ceph 10.2.7; Ubuntu 16.04 with a 4.8 kernel.
> >> 
> >> I have noticed, now twice, that I’ve had SSDs flapping due to fstrim eating up 100% of the I/O.
> >> It eventually righted itself after a little less than 8 hours.
> >> The noout flag was set, so it didn’t trigger any unnecessary rebalancing.
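Setting and clearing that flag around a known trim window is a one-liner on any node with the admin keyring, for example:

$ ceph osd set noout      # before the fstrim run
$ ceph osd unset noout    # once all OSDs are stable again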
> >> 
> >> The timeline shows that only one OSD ever went down at a time, but they seemed to go down in a rolling fashion during the fstrim session.
> >> You can actually see in the OSD graph all 10 OSDs on this node go down one by one over time.
> >> 
> >> 
> >> And the OSDs were going down because of:
> >> 
> >>> 2017-07-02 13:47:32.618752 7ff612721700  1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7ff5ecd0c700' had timed out after 15
> >>> 2017-07-02 13:47:32.618757 7ff612721700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7ff608d9e700' had timed out after 60
> >>> 2017-07-02 13:47:32.618760 7ff612721700  1 heartbeat_map is_healthy 'FileStore::op_tp thread 0x7ff608d9e700' had suicide timed out after 180
> >>> 2017-07-02 13:47:32.624567 7ff612721700 -1 common/HeartbeatMap.cc: In function 'bool ceph::HeartbeatMap::_check(const ceph::heartbeat_handle_d*, const char*, time_t)' thread 7ff612721700 time 2017-07-02 13:47:32.618784
> >>> common/HeartbeatMap.cc: 86: FAILED assert(0 == "hit suicide timeout")
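Those numbers (15, 60, 180) come from the OSD's thread timeout settings. If you want to see what a given OSD is actually running with, the admin socket will show them (osd.0 is just an example; run this on the host where that OSD lives):

$ sudo ceph daemon osd.0 config show | grep -E 'thread_timeout|suicide'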
> >> 
> >> 
> >> I am curious if you were able to nice it or something similar to mitigate this issue?
> >> Oddly, I have similar machines with Samsung SM863a SSDs and Intel P3700 journals that do not appear to be affected by the fstrim load issue, despite having the identical weekly cron job enabled. Only the (newer) Micron drives have had these issues.
> >> 
> >> Appreciate any pointers,
> >> 
> >> Reed
> >> 
> >>> Wido den Hollander <wido at 42on.com>
> >>> Tue Dec 9 01:21:16 PST 2014
> >>> Hi,
> >>> 
> >>> Last sunday I got a call early in the morning that a Ceph cluster was
> >>> having some issues. Slow requests and OSDs marking each other down.
> >>> 
> >>> Since this is a 100% SSD cluster I was a bit confused and started
> >>> investigating.
> >>> 
> >>> It took me about 15 minutes to see that fstrim was running and was
> >>> utilizing the SSDs 100%.
> >>> 
> >>> On Ubuntu 14.04 there is a weekly cron job which executes fstrim-all. It
> >>> detects all mountpoints which can be trimmed and starts to trim them.
> >>> 
> >>> On the Intel SSDs used here it caused them to become 100% busy for a
> >>> couple of minutes. That was enough for them to no longer respond to
> >>> heartbeats, thus timing out and being marked down.
> >>> 
> >>> Luckily we had the "out interval" set to 1800 seconds on that cluster,
> >>> so no OSD was marked as "out".
> >>> 
> >>> fstrim-all does not execute fstrim with an ionice priority. From what I
> >>> understand, but haven't tested yet, running fstrim with ionice
> >>> -c idle should solve this.
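A minimal sketch of that idea (the cron path is from memory, so double-check it on your release): run the trim under the idle I/O scheduling class, either by hand or by wrapping the weekly job:

$ sudo ionice -c3 fstrim -v /var/lib/ceph/osd/ceph-0
# or edit /etc/cron.weekly/fstrim so it calls
#   ionice -c3 fstrim-all
# instead of fstrim-all

Note that the idle class is only honoured by I/O schedulers that implement priorities (CFQ at the time), so it may not help on every setup.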
> >>> 
> >>> It's weird that this issue didn't come up earlier on that cluster, but
> >>> after killing fstrim all problems were resolved and the cluster ran
> >>> happily again.
> >>> 
> >>> So watch out for fstrim on early Sunday mornings on Ubuntu!
> >>> 
> >>> -- 
> >>> Wido den Hollander
> >>> 42on B.V.
> >>> Ceph trainer and consultant
> >> 
> >> 
> > 
> 
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



