Re: Journal / WAL drive size?

Thanks for the advice.

We're aiming for the highest possible performance on the cluster. The servers have 4-port 10GbE NICs, and I have seen traffic of up to about 9Gb/s on the current single port during testing. So I want to make sure the "cache drive" can handle the current load (9 fairly heavily loaded physical servers) and accommodate future growth with ease.

In your testing, once you reached the 50GB DB, did you grow it to see if that made any difference? Our storage is about 87TB.

There is a lot of network traffic between the mail servers and the spam servers. There's a Postfix and an MS Exchange server. The 3 mail servers are constantly busy: as mail comes in, it goes to the spam filter, runs through a couple of different programs and custom scripts to check for spam, viruses, malware, etc., and is then delivered to either the Postfix or the Exchange server depending on destination. Outgoing mail has an email footer injected and is also checked for spam and viruses. The delay on email, even internal, is very bad right now.

And then there are 4 MS Windows servers with MS SQL and a lot of database traffic between the servers, and between the servers and office staff. Month-end accounting sends the servers into an "overload frenzy" due to stock-taking and invoicing (both internal and to clients).

On Fri, Nov 24, 2017 at 3:33 AM, David Byte <dbyte@xxxxxxxx> wrote:
The answer tends to be “it depends”. For my test system with 144 6TB drives, I use a 50GB DB partition. In another test case, we have a ratio of about 10GB per TB. What you have to watch out for is the performance difference when the DB overruns the SSD partition.
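
As a rough worked example of that 10GB-per-TB ratio applied to the OSDs further down in this thread (an assumption, since the right size ultimately depends on object count): 7.3TB per OSD × ~10GB/TB ≈ 73GB of block.db per OSD, so a 400GB SSD holding the DBs for two such OSDs would want roughly 2 × 73GB ≈ 146GB set aside for block.db.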

For the WAL, I provision 2GB and haven’t experienced any issues with that. You will probably also need to adjust the ratios, but that was covered in other threads previously.
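
If you want ceph-disk to carve a fixed-size WAL partition like that at OSD creation time, a minimal ceph.conf sketch would be the following (the value is just 2 × 1024³ bytes to match the 2GB above; treat it as an illustration, not a recommendation):

[global]
bluestore_block_wal_size = 2147483648

The block.db counterpart, bluestore_block_db_size, is shown with an example further down in this thread.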

David Byte
Sr. Technical Strategist
IHV Alliances and Embedded
SUSE

Sent from my iPhone. Typos are Apple's fault. 

On Nov 23, 2017, at 3:19 PM, Rudi Ahlers <rudiahlers@xxxxxxxxx> wrote:

Hi Richard, 

So do you rely on Ceph to automatically decide the WAL device's location and size?

On Thu, Nov 23, 2017 at 4:04 PM, Richard Hesketh <richard.hesketh@xxxxxxxxxxxx> wrote:
Keep in mind that we don't yet have good estimates for how large BlueStore metadata DBs may become, but the size will be somewhat proportional to your number of objects. Considering the size of your OSDs, a 15GB block.db partition is almost certainly too small. Unless you've got a compelling reason not to, you should probably partition your SSDs to use all the available space. Personally, I manually partition my block.db devices into even partitions and then invoke creation with

ceph-disk prepare --bluestore /dev/sdX --block.db /dev/disk/by-partuuid/whateveritis

Invoking by partition UUID is done because, if presented with an existing partition rather than the root block device, ceph-disk will just symlink precisely to the argument given; using /dev/sdXY-style arguments is therefore dangerous, as they may not be consistent across hardware changes, or even just reboots, depending on your system.
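
A rough sketch of that workflow, for illustration only (device names, partition sizes and the UUID placeholder are assumptions based on the 2x 400GB SSD / 4x 8TB OSD layout quoted below, i.e. two roughly even block.db partitions per SSD):

sgdisk -n 1:0:+186G /dev/sde        # first block.db partition
sgdisk -n 2:0:0 /dev/sde            # second partition, rest of the SSD
ls -l /dev/disk/by-partuuid/        # note which UUIDs point at sde1/sde2
ceph-disk prepare --bluestore /dev/sda --block.db /dev/disk/by-partuuid/<uuid-of-sde1>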

Rich

On 23/11/17 09:53, Caspar Smit wrote:
> Rudi,
>
> First of all, do not deploy an OSD specifying the same separate device for both DB and WAL:
>
> Please read the following why:
>
> http://docs.ceph.com/docs/master/rados/configuration/bluestore-config-ref/
>
>
> That said, you have a fairly large amount of SSD space available, so I recommend using it as block.db:
>
> You can specify a fixed size block.db size in ceph.conf using:
>
> [global]
> bluestore_block_db_size = 16106127360
>
> The above is a 15GB block.db size
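>
> (The value is just the target size in GiB × 1024³; for example, a hypothetical 30GB block.db would be 30 × 1024³ = 32212254720.)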
>
> Now, when you deploy an OSD with a separate block.db device, the partition will be 15GB.
>
> The default size is a percentage of the device, I believe, and not always a usable amount.
>
> Caspar
>
> Met vriendelijke groet,
>
> Caspar Smit
> Systemengineer
> SuperNAS
> Dorsvlegelstraat 13
> 1445 PA Purmerend
>
> t: (+31) 299 410 414
> e: casparsmit@xxxxxxxxxxx
> w: www.supernas.eu
>
> 2017-11-23 10:27 GMT+01:00 Rudi Ahlers <rudiahlers@xxxxxxxxx>:
>
>     Hi, 
>
>     Can someone please explain this to me in layman's terms: how big a WAL drive do I really need?
>
>     I have 2x 400GB SSDs used as WAL / DB drives and 4x 8TB HDDs used as OSDs. When I look at the drive partitions, the DB / WAL partitions are only 576MB & 1GB. This feels a bit small.
>
>
>     root@virt1:~# lsblk
>     NAME               MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>     sda                  8:0    0   7.3T  0 disk
>     ├─sda1               8:1    0   100M  0 part /var/lib/ceph/osd/ceph-0
>     └─sda2               8:2    0   7.3T  0 part
>     sdb                  8:16   0   7.3T  0 disk
>     ├─sdb1               8:17   0   100M  0 part /var/lib/ceph/osd/ceph-1
>     └─sdb2               8:18   0   7.3T  0 part
>     sdc                  8:32   0   7.3T  0 disk
>     ├─sdc1               8:33   0   100M  0 part /var/lib/ceph/osd/ceph-2
>     └─sdc2               8:34   0   7.3T  0 part
>     sdd                  8:48   0   7.3T  0 disk
>     ├─sdd1               8:49   0   100M  0 part /var/lib/ceph/osd/ceph-3
>     └─sdd2               8:50   0   7.3T  0 part
>     sde                  8:64   0 372.6G  0 disk
>     ├─sde1               8:65   0     1G  0 part
>     ├─sde2               8:66   0   576M  0 part
>     ├─sde3               8:67   0     1G  0 part
>     └─sde4               8:68   0   576M  0 part
>     sdf                  8:80   0 372.6G  0 disk
>     ├─sdf1               8:81   0     1G  0 part
>     ├─sdf2               8:82   0   576M  0 part
>     ├─sdf3               8:83   0     1G  0 part
>     └─sdf4               8:84   0   576M  0 part
>     sdg                  8:96   0   118G  0 disk
>     ├─sdg1               8:97   0     1M  0 part
>     ├─sdg2               8:98   0   256M  0 part /boot/efi
>     └─sdg3               8:99   0 117.8G  0 part
>       ├─pve-swap       253:0    0     8G  0 lvm  [SWAP]
>       ├─pve-root       253:1    0  29.3G  0 lvm  /
>       ├─pve-data_tmeta 253:2    0    68M  0 lvm
>       │ └─pve-data     253:4    0  65.9G  0 lvm
>       └─pve-data_tdata 253:3    0  65.9G  0 lvm
>         └─pve-data     253:4    0  65.9G  0 lvm
>
>
>
>
>     --
>     Kind Regards
>     Rudi Ahlers
>     Website: http://www.rudiahlers.co.za
>
>


--
Richard Hesketh
Systems Engineer, Research Platforms
BBC Research & Development






--
Kind Regards
Rudi Ahlers
Website: http://www.rudiahlers.co.za



--
Kind Regards
Rudi Ahlers
Website: http://www.rudiahlers.co.za
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
