On Wed, Jan 25, 2023 at 5:57 PM Stefan Hajnoczi <stefanha@xxxxxxxxxx> wrote:
>
> Hi,
> What sort of memory usage is expected under heavy I/O to an rbd block
> device with O_DIRECT?
>
> For example:
> - Page cache: none (O_DIRECT)
> - Socket snd/rcv buffers: yes

Hi Stefan,

There is a socket open to each OSD (object storage daemon).  A Ceph
cluster may have tens, hundreds or even thousands of OSDs (although the
latter is rare -- usually folks end up with several smaller clusters
instead of a single large cluster).  Under heavy random I/O and given
a big enough RBD image, it's reasonable to assume that most if not all
OSDs would be involved and therefore their sessions would be active.

A thing to note is that, by default, OSD sessions are shared between
RBD devices.  So as long as all RBD images that are mapped on a node
belong to the same cluster, the same set of sockets is used.  Idle OSD
sockets get closed after 60 seconds of inactivity.

> - Internal rbd buffers?
>
> I am trying to understand how similar Linux rbd block devices behave
> compared to local block device memory consumption (like NVMe PCI).

RBD doesn't do any internal buffering.  Data is read from/written to
the wire directly to/from BIO pages.  The only exception to that is the
"secure" mode -- built-in encryption for the Ceph on-the-wire protocol.
In that case the data is buffered, partly because RBD obviously can't
mess with plaintext data in the BIO and partly because the Linux kernel
crypto API isn't flexible enough.

There is some memory overhead associated with each I/O (mostly OSD
request metadata encoding).  It's certainly larger than in the NVMe PCI
case.  I don't have the exact number, but it should be less than 4K per
I/O in almost all cases.  This memory comes out of private SLAB caches
and would be reclaimable had we set SLAB_RECLAIM_ACCOUNT on them.

Thanks,

                Ilya
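
P.S. For reference, a minimal sketch of what creating a slab cache with
SLAB_RECLAIM_ACCOUNT looks like, so that its pages are accounted as
reclaimable.  The cache name and object size here are hypothetical and
not the actual RBD/libceph caches:

#include <linux/module.h>
#include <linux/slab.h>

/* Hypothetical cache for per-I/O request metadata. */
static struct kmem_cache *example_req_cache;

static int __init example_init(void)
{
	/*
	 * SLAB_RECLAIM_ACCOUNT makes the cache's pages count towards
	 * reclaimable slab memory (SReclaimable in /proc/meminfo).
	 */
	example_req_cache = kmem_cache_create("example_osd_req",
					      512,	/* hypothetical object size */
					      0,	/* default alignment */
					      SLAB_RECLAIM_ACCOUNT,
					      NULL);	/* no constructor */
	if (!example_req_cache)
		return -ENOMEM;
	return 0;
}

static void __exit example_exit(void)
{
	kmem_cache_destroy(example_req_cache);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");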