Re: journal or cache tier on SSDs ?

On Tue, 10 May 2016 17:51:24 +0200 Yoann Moulin wrote:

[snip]
> >>>> Journal or cache Storage : 2 x SSD 400GB Intel S3300 DC (no Raid)
> >>>
> >>> These SSDs do not exist according to the Intel site and the only
> >>> references I can find for them are on "no longer available" European
> >>> sites.
> >>
> >> I made a mistake, it's not 400GB but 480GB; smartctl gives me model
> >> SSDSC2BB480H4
> >>
> > OK, that's not good.
> > Firstly, that model number still doesn't get us any hits from Intel,
> > strangely enough.
> > 
> > Secondly, it is 480GB (instead of 400GB, which would indicate extra
> > overprovisioning) and matches the 480GB DC S3510 model number except
> > for the last 2 characters.
> > And that has an endurance of 275TBW, not something you want to use for
> > either journals or cache pools.
> 
> I see; here is the information from the reseller:
> 
> "The S3300 series is the OEM version of S3510 and 1:1 the same drive"
> 
Given the SMART output below, it seems to be 3500-based, but that doesn't
change things.
 
> >>> Without knowing the specifications for these SSDs, I can't recommend
> >>> them. I'd use a DC S3610 or S3710 instead; this very much depends on
> >>> how much endurance (TBW) you need.
> >>
> >> As I wrote above, I already have those SSDs, so I'm looking for the
> >> best setup with the hardware I have.
> >>
> > 
> > Unless they have at least an endurance of 3 DWPD like the 361x (and
> > their model number, size and the 3300 naming suggest they do NOT),
> > your 480GB SSDs aren't suited for intense Ceph usage.
> > 
> > How much have you used them so far, and what is their smartctl status?
> > In particular these values (from an 800GB DC S3610 in my cache pool):
> > ---
> > 232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
> > 233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
> > 241 Host_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       869293
> > 242 Host_Reads_32MiB        0x0032   100   100   000    Old_age   Always       -       43435
> > 243 NAND_Writes_32MiB       0x0032   100   100   000    Old_age   Always       -       1300884
> > ---
> > 
> > Not even 1% down after 40TBW, at which point your SSDs are likely to be
> > 15% down...
> 
> More or less the same values on the 10 hosts in my beta cluster:
> 
> 232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always - 0
> 233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age  Always - 0
> 241 Total_LBAs_Written      0x0032 100 100 000 Old_age  Always - 233252
> 242 Total_LBAs_Read         0x0032 100 100 000 Old_age  Always - 13
> 

From the read count it's obvious that you used those as journals. ^.^
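
For reference, converting those raw counters to something readable, a
quick Python sketch (assuming, as the attribute names in my output above
suggest, that the raw values are in units of 32MiB):

---
# Intel reports attributes 241/242 in units of 32MiB on these drives;
# convert the raw values from the 480GB SSDs above.
UNIT = 32 * 1024**2              # 32 MiB in bytes

host_writes = 233252 * UNIT      # attribute 241
host_reads  = 13 * UNIT          # attribute 242

print("written: %.2f TiB" % (host_writes / float(1024**4)))  # ~7.12 TiB
print("read:    %.0f MiB" % (host_reads / float(1024**2)))   # 416 MiB
---

416MiB read versus 7TiB written is exactly the pattern you'd expect from
journals, which only get read when an OSD has to replay after a crash.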

As I hinted above, if these were 3510-based they should also have the 243
attribute, as in my 3610 example.
You may want to upgrade your smartctl and/or its drive database (on Debian
that can be done with "update-smart-drivedb").
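
If you want to verify that across all 10 hosts quickly, something along
these lines should do (a rough sketch, run as root, with the device names
adjusted to your journal SSDs):

---
# Check whether a drive exposes SMART attribute 243 (NAND_Writes_32MiB);
# its absence may just mean an outdated smartctl drive database rather
# than a different SSD generation.
import subprocess

def has_attribute(device, attr_id):
    out = subprocess.check_output(["smartctl", "-A", device]).decode()
    for line in out.splitlines():
        fields = line.split()
        if fields and fields[0] == str(attr_id):
            return True
    return False

for dev in ("/dev/sda", "/dev/sdb"):   # placeholders, adjust to yours
    print(dev, "has 243:", has_attribute(dev, 243))
---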

Intel's media wearout calculation has always seemed very fuzzy to me;
given your 7TB written, I'd expect it to be at 98%, or at least 99%.

But then again, a 200GB DC S3700 of mine has written 90TB of its 3650TBW
rating and is at 99%, when I would expect it to be at 98%.
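
The naive linear expectation I'm using here is simply this (a sketch
that assumes wear scales linearly with host writes, which it clearly
doesn't quite do):

---
# Naive media wearout estimate: 100 minus the percentage of the rated
# TBW that has already been written.
def expected_wearout(written_tb, rated_tbw):
    return 100 - 100.0 * written_tb / rated_tbw

print(expected_wearout(7, 275))    # your 480GB drives: ~97.5
print(expected_wearout(90, 3650))  # my 200GB DC S3700: ~97.5
---

Both drives sitting above that naive estimate is presumably Intel
counting actual NAND wear (P/E cycles) and rounding generously.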

Either way, those SSDs are rated for 275TBW (about 0.3 DWPD), and if they
are used as journals they will wear out quickly once those 100TB+ datasets
get updated.
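
To put a rough number on "quickly" (the write volumes and SSD count below
are placeholders; plug in your own figures from attribute 241):

---
# Rough journal SSD lifetime: rated endurance divided by the daily
# write volume per SSD. With filestore, every byte written to the OSDs
# behind a journal SSD passes through that SSD first.
RATED_TBW = 275.0

def lifetime_days(daily_tb_per_ssd):
    return RATED_TBW / daily_tb_per_ssd

# e.g. rewriting a 100TB dataset over a month, spread over a
# hypothetical 20 journal SSDs:
print("%.0f days" % lifetime_days(100.0 / 30 / 20))  # ~1650 days

# but a sustained 1TB/day per SSD burns them out in under a year:
print("%.0f days" % lifetime_days(1.0))              # 275 days
---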

They _might_ survive longer with a very carefully tuned cache tier
(promoting only really hot objects), but the risk of losing SSDs there can
be even higher than with journals.

[snap]

Regards,

Christian
-- 
Christian Balzer        Network/Systems Engineer                
chibi@xxxxxxx   	Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


