RE: [LSF/MM/BPF TOPIC] Cloud storage optimizations


 



> -----Original Message-----
> From: Martin K. Petersen [mailto:martin.petersen@xxxxxxxxxx]
> Sent: Thursday, March 9, 2023 2:28 PM
> To: Luis Chamberlain <mcgrof@xxxxxxxxxx>
> Cc: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>; Dan
> Helmick <dan.helmick@xxxxxxxxxxx>; Martin K. Petersen
> <martin.petersen@xxxxxxxxxx>; Javier González
> <javier.gonz@xxxxxxxxxxx>; Matthew Wilcox <willy@xxxxxxxxxxxxx>;
> Theodore Ts'o <tytso@xxxxxxx>; Hannes Reinecke <hare@xxxxxxx>; Keith
> Busch <kbusch@xxxxxxxxxx>; Pankaj Raghav <p.raghav@xxxxxxxxxxx>;
> Daniel Gomez <da.gomez@xxxxxxxxxxx>; lsf-pc@xxxxxxxxxxxxxxxxxxxxxxxxxx;
> linux-fsdevel@xxxxxxxxxxxxxxx; linux-mm@xxxxxxxxx; linux-
> block@xxxxxxxxxxxxxxx
> Subject: Re: [LSF/MM/BPF TOPIC] Cloud storage optimizations
> 
> 
> Luis,
> 
> > A big future question is of course how / when to use these for
> > filesystems.  Should there be, for instance a 'mkfs --optimal-bs' or
> > something which may look whatever hints the media uses ? Or do we just
> > leaves the magic incantations to the admins?
> 
> mkfs already considers the reported queue limits (for the filesystems most
> people use, anyway).
> 
> The problem is mainly that the devices don't report them. At least not very
> often in the NVMe space. For SCSI devices, reporting these parameters is
> quite common.
> 
> --
> Martin K. Petersen	Oracle Linux Engineering

Support for the NVMe optimal performance parameters is increasing in the vendor ecosystem, and customers are increasingly requiring them from vendors.  For example, the OCP Datacenter NVMe SSD spec includes requirements NVMe-AD-2 and NVMe-OPT-7 [1].  The momentum is continuing: optimal read parameters were recently added to NVMe as well.  More companies making these parameters a drive requirement on their vendors would definitely help that momentum further.

I think there has been confusion among vendors in the past about how to set these values for the best host behavior.  A drive has multiple (sometimes minor) inflection points in its performance.  4KB is clearly too small for the drive to report, but should we report our 16KB inflection, the 128KB one, or some other inflection?  How large a value can we push?  Left to ourselves, we would always favor the bigger number.
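
For context on the host side: whichever inflection the drive reports ends up surfaced by Linux as the block queue limits that mkfs already consults.  A minimal sketch of reading them from sysfs follows; the device name is just an example, and exactly how the NVMe fields map into these limits depends on the kernel and driver version:

/* Sketch: read the queue limits that mkfs and friends consult.
 * The sysfs attribute names are the stable block ABI; "nvme0n1"
 * is only an example device. */
#include <stdio.h>

static long read_limit(const char *dev, const char *attr)
{
        char path[256];
        long val = -1;
        FILE *f;

        snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", dev, attr);
        f = fopen(path, "r");
        if (!f)
                return -1;
        if (fscanf(f, "%ld", &val) != 1)
                val = -1;
        fclose(f);
        return val;
}

int main(void)
{
        const char *dev = "nvme0n1";    /* example device */

        printf("logical_block_size: %ld\n", read_limit(dev, "logical_block_size"));
        printf("minimum_io_size:    %ld\n", read_limit(dev, "minimum_io_size"));
        printf("optimal_io_size:    %ld\n", read_limit(dev, "optimal_io_size"));
        return 0;
}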

Larger IOs benefit both the host and the drive (HDD and SSD alike).  Even if a drive reports imperfect optimal parameters today, the SW changes for larger IOs can be incubated now.  If nothing else, you immediately save the overhead of issuing a larger number of smaller commands.  Further, an IO sized to a multiple of the optimal parameters is also optimal.  Enabling anything in the 16KB - 64KB range would likely be a great start.
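
And once the limit is known, rounding a request up to a multiple of it is cheap.  A rough sketch; the 16KB fallback here is purely an assumption for drives that report nothing, not anything taken from the spec:

/* Sketch: round a request length up to a multiple of the reported
 * optimal I/O size.  The 16KB fallback is an assumed default for
 * devices that report 0. */
#include <stddef.h>

static size_t round_to_optimal(size_t len, size_t opt_io)
{
        if (opt_io == 0)
                opt_io = 16 * 1024;     /* assumed fallback */
        return ((len + opt_io - 1) / opt_io) * opt_io;
}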

[1] https://www.opencompute.org/documents/datacenter-nvme-ssd-specification-v2-0r21-pdf


Dan



