On Wed, Jul 14, 2021 at 5:45 AM ronnie sahlberg <ronniesahlberg@xxxxxxxxx> wrote:
>
> On Wed, Jul 14, 2021 at 8:58 AM Pavel Shilovsky <piastryyy@xxxxxxxxx> wrote:
> >
> > On Mon, Jul 12, 2021 at 22:41, Steve French <smfrench@xxxxxxxxx> wrote:
> > >
> > > And don't forget the "esize" mount option to allow offload of decryption...
> > >
> > > On Tue, Jul 13, 2021, 00:37 Shyam Prasad N <nspmangalore@xxxxxxxxx> wrote:
> > >>
> > >> Hi all,
> > >>
> > >> I'm trying to understand our read path (the big picture is to integrate the new netfs helpers into cifs.ko), and I wanted to get some understanding of why things are the way they are. @Pavel Shilovsky, I'm guessing that you've worked on most of this code, so I'm hoping that you'll be able to answer some of these:
> >
> > Thanks for taking a look at this.
> >
> > >> 1. For cases where both encryption and signing are disabled, it looks like we read from the socket page-by-page directly into pages and map those into the inode address space.
> >
> > Yes, we read directly into pre-allocated pages. For direct I/O we read into pages supplied by the user, and for non-direct I/O (including page I/O) we allocate intermediate pages.
> >
> > >> 2. For cases where encryption is enabled, we read the encrypted data into a set of pages first, then call the crypto APIs, which decrypt the data in place in those same pages. We then copy the decrypted data from these pages into the pages of the address space that are already mapped.
> >
> > Correct. Regardless of whether offloading is used or not, there is an extra copy.
> >
> > >> 3. Similarly, for the signing case, we read the data into a set of pages first, then verify the signature, then copy the pages to the mapped pages in the inode address space.
> >
> > Yes.
> >
> > >> For case 1, can't we read from the socket directly into the mapped pages?
> >
> > Yes - assuming no signing or encryption, we should be able to read directly into the mapped pages. But I doubt many people use SMB2 without signing or encryption, although there may be some use cases requiring performance over safety.
> >
> > >> For cases 2 and 3, instead of allocating temporary pages to read into, can't we read directly into the pages of the inode address space? The only risk I see is that if decryption or signature verification fails, it would overwrite the data that is already present in the page cache. But I see that as acceptable, because we're reading from the server, and the data we have in the cache is stale anyway.
> >
> > Theoretically we may try doing this with signing, but we need to make sure that no one can modify those pages locally, which may lead to signing errors. I don't think we reconnect an SMB session on those today, but we probably should.

How can someone modify those pages? Don't we keep the pages locked while the read is in progress?

> > For encryption, we do not know where the read data starts in an encrypted SMB packet, so there is almost no point in reading directly, because we would need to shift (copy) the data afterwards anyway.

Understood.
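Just so I'm sure I have the current flow right, this is roughly how I picture case 2 today, written as a simplified userspace-style sketch (the helper names, sizes and offsets below are made up for illustration; this is not the actual cifs.ko code):

/*
 * Illustrative sketch only; none of these names are real cifs.ko
 * symbols, and the sizes/offsets are simplified.
 */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XFORM_HDR_LEN 52        /* SMB2 transform header (after the 4-byte stream length) */

/* stand-ins for the real transport and crypto calls */
int read_exact(int sock, void *buf, size_t len);
int decrypt_in_place(uint8_t *buf, size_t len);

/*
 * Read one encrypted PDU into a temporary buffer, decrypt it in place,
 * then copy the plaintext payload out to the destination pages.  The
 * memcpy() at the end is the extra copy we are discussing.
 * pdu_len: bytes following the 4-byte stream length, i.e. transform
 * header plus ciphertext.
 */
int read_encrypted_pdu(int sock, size_t pdu_len,
                       uint8_t *tmp, uint8_t *dst, size_t dst_len)
{
        size_t cipher_len, data_off, data_len;
        int rc;

        rc = read_exact(sock, tmp, pdu_len);
        if (rc)
                return rc;

        cipher_len = pdu_len - XFORM_HDR_LEN;
        rc = decrypt_in_place(tmp + XFORM_HDR_LEN, cipher_len);
        if (rc)
                return rc;

        /*
         * The plaintext now holds the SMB2 header, the READ response
         * header and the data; only now do we know where the data sits,
         * so shift/copy it into the mapped pages.
         */
        data_off = 64 + 16;             /* simplified: typical READ DataOffset */
        data_len = cipher_len - data_off;
        if (data_len > dst_len)
                return -1;
        memcpy(dst, tmp + XFORM_HDR_LEN + data_off, data_len);
        return 0;
}

If that is accurate, then we only learn where the payload sits after decrypting, which is why the memcpy() at the end looks hard to avoid with the current layout.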
> Yes. But even if we knew where the data starts, it will not be aligned to a page, so we would need to memcpy() it anyway.
>
> But I think it is possible, with a lot of effort. And probably not worth the effort.
>
> But I have been toying with the idea of trying to do "zero-copy" SMB2 READ in libsmb2. It does read all the unencrypted READ payloads straight into the application buffer without a copy already, but just like cifs.ko it uses an intermediate buffer for encrypted packets, and thus an extra memcpy().
>
> The solution I was thinking of would be something like:
> 1. Read the first 4 + 64 bytes into a buffer. This will be enough to hold either:
> 1a. Unencrypted case: inspect the SMB2 header to find the opcode, then read the response header (16 bytes for SMB2 read/write responses), then read the payload straight into the application buffer. I do this already and it works without too much pain. Now we are finished for the unencrypted case.
> 2. Encrypted case: in this case we have the full transform header and also part of an encrypted SMB2 header. We know how big the transform header is, so we can copy the partial SMB2 header into a 64-byte temp buffer and then read the remaining part of the SMB2 header.
> 3. Start decryption, but only run it for the first 64 bytes, i.e. the SMB2 header.
> 3a. OPCODE != READ: inspect the decrypted SMB2 header. If it is NOT a read response, then just read all remaining data into a temp buffer, continue the decryption, and handle it the normal way.
> 4. If the opcode is a READ, then read an additional 16 bytes into a temp buffer and continue the decryption of these 16 bytes.
> 5. What follows now is the read response payload, so read this decrypted data straight into the application buffer (or pages) and, again, continue the decryption and decrypt the data in place.
> Voila, zero-copy encrypted read handling. This would really help on a PS2, since memcpy() is pretty expensive on its very slow memory.
>
> I think this would work, and it would be very interesting to implement. Not sure if it is worth it to do in kernel code, because there will be hairy special conditions to handle.

Very interesting. Understood that this will be a significant change. But I'm curious as to what special conditions you're referring to? (I have tried to sketch your sequence in a PS below.)

> For kernel code I think it would be better/safer to have fast hw offload for memcpy().

Understood.

> regards
> ronnie sahlberg
>
> > --
> > Best regards,
> > Pavel Shilovsky

--
Regards,
Shyam
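PS: To make sure I follow the zero-copy sequence, I tried mapping your steps onto code. Everything below is just an invented sketch to check my understanding; the helpers, names and error handling are placeholders, not libsmb2 code:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define XFORM_HDR_LEN   52      /* SMB2 transform header */
#define SMB2_HDR_LEN    64
#define READ_RSP_LEN    16      /* fixed part of the READ response */
#define SMB2_OP_READ    0x0008

/* placeholder transport/crypto helpers */
int read_exact(int sock, void *buf, size_t len);
int decrypt_update(void *ctx, uint8_t *buf, size_t len); /* incremental, in place */
int handle_non_read(int sock, void *ctx, const uint8_t *smb2_hdr); /* 3a. temp-buffer path */

int recv_encrypted_read(int sock, void *ctx, uint8_t *app_buf, size_t app_len)
{
        uint8_t hdr[4 + XFORM_HDR_LEN + SMB2_HDR_LEN + READ_RSP_LEN];
        uint8_t *smb2 = hdr + 4 + XFORM_HDR_LEN;
        size_t got = 4 + SMB2_HDR_LEN;
        size_t hdr_total = 4 + XFORM_HDR_LEN + SMB2_HDR_LEN;
        uint16_t opcode;
        int rc;

        /* 1. first 4 + 64 bytes: stream length, the full transform header
         *    and the first bytes of the (still encrypted) SMB2 header */
        rc = read_exact(sock, hdr, got);
        if (rc)
                return rc;

        /* 2. complete the encrypted 64-byte SMB2 header */
        rc = read_exact(sock, hdr + got, hdr_total - got);
        if (rc)
                return rc;

        /* 3. run the decryption over just the SMB2 header */
        rc = decrypt_update(ctx, smb2, SMB2_HDR_LEN);
        if (rc)
                return rc;

        memcpy(&opcode, smb2 + 12, sizeof(opcode)); /* Command field, LE assumed */
        if (opcode != SMB2_OP_READ)
                return handle_non_read(sock, ctx, smb2);

        /* 4. read and decrypt the 16-byte READ response header */
        rc = read_exact(sock, smb2 + SMB2_HDR_LEN, READ_RSP_LEN);
        if (rc)
                return rc;
        rc = decrypt_update(ctx, smb2 + SMB2_HDR_LEN, READ_RSP_LEN);
        if (rc)
                return rc;

        /* 5. the rest is payload: read it straight into the application
         *    buffer and decrypt it in place there (no extra copy) */
        rc = read_exact(sock, app_buf, app_len);
        if (rc)
                return rc;
        return decrypt_update(ctx, app_buf, app_len);
}

Is one of the hairy conditions the fact that with CCM/GCM the signature in the transform header can only be verified after the whole message has been decrypted, so a bad packet would be noticed only after the payload has already landed in the destination buffer (or pages)? And I guess compounded responses would also force a fallback to the temp-buffer path.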