Re: OSD and Journal Files

Excellent overview Mike!

Mark

On 09/18/2013 10:03 AM, Mike Dawson wrote:
Ian,

There are two schools of thought here. Some people say to run the journal
on a separate partition of the spinner, alongside the OSD partition, and
not to bother with SSDs for journals at all. This may be the best practice
for high-density chassis.

The other design is to use SSDs for journals, but design with an
appropriate ratio of journals per SSD. Plus, you need to understand that
losing an SSD will cause the loss of ALL of the OSDs that had their
journals on the failed SSD.

For now, I'll assume you want to use SSDs and offer some suggestions.

First, you probably don't want RAID1 for the journal SSDs. It isn't
particularly needed for resiliency and certainly isn't beneficial from a
throughput perspective.

Next, the best practice is to have enough throughput in the Journals
(SSDs) so your OSDs (spinners) aren't starved. Let's assume your SSDs
sustain writes at 450MB/s and the spinners can do 120MB/s.

450MB/s divided by 120MB/s = 3.75

I would round that to a ratio of four OSD journals per SSD.
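
If it helps, here's a minimal sketch of that sizing arithmetic in Python,
assuming the 450MB/s and 120MB/s figures above (plug in the sustained
write speeds of your actual drives):

    # Journal-to-OSD ratio sizing (figures assumed from the discussion above).
    ssd_write_mb_s = 450      # sustained sequential write of one journal SSD
    spinner_write_mb_s = 120  # sustained sequential write of one OSD spinner

    raw_ratio = ssd_write_mb_s / spinner_write_mb_s  # 3.75
    journals_per_ssd = round(raw_ratio)              # rounds to 4 journals per SSD
    print(f"raw ratio: {raw_ratio:.2f}, journals per SSD: {journals_per_ssd}")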

Since it appears you are using 24-drive chassis and the first two drives
are taken by the RAID1 set for the OS, you have 22 drives left. You
could do:

- 4 SSDs, each with 4 Journals
- 16 spinners, each running an OSD process
- 2 RAID1 OS
- 2 Empty

Or, if you want to push the ratio a bit further (6 OSD journals per SSD):

- 3 SSDs, each with 6 Journals
- 18 spinners, each running an OSD process
- 1 spinner for OS (no RAID1)
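
Here's a rough sketch of that bay-allocation arithmetic, assuming a
24-bay chassis and the two layouts above (the bay counts are illustrative,
not a recommendation):

    # Split a chassis' remaining bays into journal SSDs + OSD spinners.
    def layout(total_bays, os_drives, journals_per_ssd):
        data_bays = total_bays - os_drives
        # Each group of 1 SSD plus its spinners uses journals_per_ssd + 1 bays.
        groups = data_bays // (journals_per_ssd + 1)
        ssds = groups
        spinners = groups * journals_per_ssd
        empty = data_bays - ssds - spinners
        return ssds, spinners, empty

    print(layout(24, os_drives=2, journals_per_ssd=4))  # (4, 16, 2) -> 4 SSDs, 16 OSDs, 2 empty bays
    print(layout(24, os_drives=1, journals_per_ssd=6))  # (3, 18, 2) -> 3 SSDs, 18 OSDs, 2 empty bays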

Because your 10Gb network will peak at 1,250MB/s, the 6:1 ratio shown
above should be fine (you're limited to ~70MB/s per OSD by the network
anyway).
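
A quick check of that network-limit arithmetic (assuming a single 10Gb
link and 18 OSDs per node):

    # Per-OSD bandwidth ceiling imposed by a 10Gb link (ignores protocol overhead).
    link_mb_s = 10 * 1000 / 8      # ~1250 MB/s of line rate
    osds_per_node = 18
    print(f"~{link_mb_s / osds_per_node:.0f} MB/s per OSD")  # ~69 MB/s, well under a spinner's 120MB/s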

I think you'll be OK on CPU and RAM.

Journals are small (default of 1GB, I run 10GB). Create a 10GB
unformatted partition for each journal and leave the rest of the SSD
unallocated (it will be used for wear-leveling). If you use
high-endurance SSDs, you could certainly consider smaller drives as long
as they maintain sufficient performance characteristics.
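
As a rough sketch of what that leaves per SSD, assuming the 512GB drives
and the 4-journals-per-SSD ratio discussed above:

    # Journal partition budget on one 512GB SSD (illustrative figures from this thread).
    ssd_capacity_gb = 512
    journal_size_gb = 10     # per-journal partition size suggested above
    journals_on_ssd = 4      # ratio worked out earlier
    allocated_gb = journal_size_gb * journals_on_ssd    # 40 GB in journal partitions
    unallocated_gb = ssd_capacity_gb - allocated_gb     # ~472 GB left unpartitioned for wear-leveling
    print(allocated_gb, unallocated_gb)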

Thanks,

Mike Dawson
Co-Founder & Director of Cloud Architecture
Cloudapt LLC


On 9/18/2013 9:52 AM, Ian_M_Porter@xxxxxxxx wrote:
*Dell - Internal Use - Confidential *

Hi,

I read in the Ceph documentation that one of the main performance snags
in Ceph is running the OSDs and journal files on the same disks, and that
at a minimum you should consider running the journals on SSDs.

Given I am looking to design a 150 TB cluster, I’m considering the
following configuration for the storage nodes:

No of replicas: 3

Each node

- 18 x 1 TB drives for storage (1 OSD per disk; journals for each OSD stored on a volume on the SSDs)
- 2 x 512 GB SSD drives configured as RAID 1 to store the journal files (assuming journal files are not replicated, correct me if I'm wrong)
- 2 x 300 GB drives for OS/software (RAID 1)
- 48 GB RAM
- 2 x 10 Gb for public and storage networks
- 1 x 1 Gb for management network
- Dual E2660 CPUs

No of nodes required for 150 TB = 150*3/(18*1) = 25
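
(In Python, for what it's worth, a quick check of that node-count
arithmetic using the figures above:)

    # Nodes needed for 150 TB usable with 3x replication and 18 x 1 TB per node.
    usable_tb, replicas = 150, 3
    drives_per_node, drive_tb = 18, 1
    print(usable_tb * replicas / (drives_per_node * drive_tb))  # 25.0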

Unfortunately I don't have any metrics on the throughput into the
cluster, so I can't tell whether 512 GB for journal files will be
sufficient; it's a best guess and may be overkill. Also, are there any
concerns regarding the number of OSDs running on each node? I've seen
some articles on the web saying the sweet spot is around 8 OSDs per node.

Thanks

Ian

Dell Corporation Limited is registered in England and Wales. Company
Registration Number: 2081369
Registered address: Dell House, The Boulevard, Cain Road, Bracknell,
Berkshire, RG12 1LF, UK.
Company details for other Dell UK entities can be found on
www.dell.co.uk.



_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com




