[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: GSOC Project discussion : Async sftp implementation
[Thread Prev] | [Thread Next]
- Subject: Re: GSOC Project discussion : Async sftp implementation
- From: Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
- Reply-to: libssh@xxxxxxxxxx
- Date: Tue, 7 Mar 2023 16:12:34 +0530
- To: libssh@xxxxxxxxxx
- Cc: tom.deseyn@xxxxxxxxx
Hi Tom, > It should be possible to implement sftp using the existing async > channel write and read apis, without the need for additional threads > or io_uring. > When an operation completes, a callback is emitted. > I tried to find the existing async channel read and write apis but couldn't find any, It would help a lot if you provide name of one of the functions of that api since the ssh_channel_read and ssh_channel_write defined in src/channels.c are both synchronous. As with all async things, you need to track on-going state. > Yes but that means some library code has to run concurrently along with the code of the user after the async api function returns to monitor and/or change the on-going state as the operation proceeds. This makes the use of something like threads necessary. > Depending on whether you want to allow 1 on-going operation per sftp > channel or several, things become more complex. > If it's 1, you can track the state together with the sftp channel. > If it's multiple, the user will need to hold some variable that > captures the state per on-going operation. > I believe an async api would make sense if we allow more than one operation per sftp channel, assuming a scenario where the user requests to download chunks from 10 different files(say) in a single sftp session with the connected server. That would mean 10 async operations for that channel. Please comment on these thoughts Thanks, Eshan Kelkar On Tue, Mar 7, 2023 at 3:30 PM Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx> wrote: > > > ---------- Forwarded message --------- > From: Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx> > Date: Tue, Mar 7, 2023 at 3:28 PM > Subject: Re: GSOC Project discussion : Async sftp implementation > To: Jakub Jelen <jjelen@xxxxxxxxxx> > > > Hello Jakub, > > I would suggest starting from the benchmark code >> we have in tests/benchmarks/, which has an example of async donwload >> (but there is only the download part -- the upload part does not exist >> now). >> > > As per your suggestion, I have read that example and have gone through the > source code to understand how the current async read is working using > sftp_async_read_begin and sftp_async_read. Please correct me if I am wrong > but the way I understand how the api currently works is like this. > > First let me go through the synchronous read function which will help us > to better understand how the async read api works : > > [sync_read refers to the sftp_read function in normal(blocking mode) in > src/sftp.c] > sync_read(file_handle, buffer_to_read_into, count) > { > NOTE : In the text sftp session refers to the session corresponding to the > file_handle received in the parameter > > //Phase-1 (Registering the request) > //---------------------------------------------- > Step- 1 : Get an id for this read request with respect to the sftp session > corresponding > to the file whose handle is received in the parameter. > Step-2 : Pack a ssh_buffer with the id, file handle, offset and count of > bytes to read. > Step-3 : Write a packet using sftp_packet_write to send a read request > along with > this ssh_buffer. > Step-4 : After a successful Step-3 free this ssh_buffer as its no longer > needed and > its data has been sent along with the packet > > //Phase-2(Waiting for the request to get processed) > -------------------------------------------------------------------- > Now the packet gets sent along with the request id. The server processes it > and sends a packet back again containing the same id as in the request. > The id is kept the same to match the request and response. The incoming > packets > from the server are read using sftp_packet_read_and_dispatch which reads a > packet > and adds it to a queue of messages from the server corresponding to this > sftp session. > > The waiting is done like this > msg=NULL; > while(msg==NULL) > { > read_and_dispatch - read a response packet and add message to queue > sftp_dequeue - try to dequeue a message from the queue corresponding to > the id we got in Phase-1 and assign whatever it returns to the msg > variable. In case > the queue doesn't contain any message with the id of Phase-1 the dequeue > function > returns NULL which gets assigned to msg and the loop runs again. > }//while(msg==NULL) ends > > After we receive the message corresponding to the send request. > If the message contains some data(i.e msg->packet_type is SSH_FXP_DATA), > write that data in the location whose address has been received in the > parameter. > > So the loop of Phase-2 is responsible for the waiting, which is > essentially reading the > response packet adding the message received in it to a queue(corresponding > to a sftp session) > and then the loop checks whether the response received was corresponding > to the request sent > (matching of request and response based on the id of Phase-1) and if not > then the loop runs again. > }//sync_read ends > > This explanation of synchronous read is important to understand how the > asynchronous read currently > works, asynchronous read api is divided into two functions > sftp_async_read_begin and sftp_async_read > > sftp_async_read_begin(file_handle, count) > { > Phase-1's code(Registering request by sending a packet) of sync_read comes > here > and return the id for the request to the user which he'll pass while > calling sftp_async_read > } > > sftp_async_read(file_handle, where_to_store, bytes_to_read, request_id) > { > Phase-2's code to wait for the response message corresponding to the > request with that > request_id as received in parameter and then writing the data which came > in that message > comes here > } > > Now one problem I notice in this approach is that Phase-1's code for > sending/registering > a request involves the use of sftp_packet_write and sftp_packet_write > internally > uses ssh_channel_write (defined in src/channels.c) and this > ssh_channel_write is synchronous > write (it may block if unable to write to the channel). Hence according to > me sftp_async_read_begin > may also block if say the channel corresponding to that sftp session is > too much saturated/dirty > with pending writes. So technically this operation is not an asynchronous > one, its synchronous in > the sense that the control will return after the packet for registering > the read request has been written > to the underlying send buffer (corresponding to the channel). > > So is this current state of async read acceptable ? > > One may argue that the chances of blocking in case of registering a read > request are less > because we're sending less info in the packet : request id, from where we > have to read and from > what offset, but still a scope for blocking still exists and certainly > this kind of approach won't work for async write. > As in the write request we also send the data to write using > sftp_packet_write and in this case the chances of > blocking are significant if too much data is to be sent. > > Kindly comment on my interpretation of the code, and answer whether the > current state of async > read is as desired or not. If not, please give a rough overview of how it > should be and > what is expected out of the async libssh api. > (For example - my interpretation of the async read is that user issues a > read > request using an api function call, the function returns and he continues > to do > what he wants to do while the the read request is handled by the api and > the > data received from the server is written to the buffer supplied by the > user. API has some > means to communicate[via callbacks or some data structure of which user > has access] > to the user about the state of the operation) > > Thanks, > Eshan Kelkar > > > > > On Sun, Mar 5, 2023 at 5:30 PM Jakub Jelen <jjelen@xxxxxxxxxx> wrote: > >> On 3/3/23 06:36, Eshan Kelkar wrote: >> > Hi, I am Eshan Kelkar and would like to create the async sftp for >> libssh >> > as the GSOC project. I have gone through liburing to understand what >> > async i/o is and how it is implemented using io_uring. So in this async >> > sftp implementation I believe we can place calls to the liburing api >> > functions from inside of the async sftp api functions so that things >> > occur asynchronously. >> > >> > Another approach that comes to my mind is that on a call to async sftp >> > api function a seperate thread gets created which does the waiting and >> > all and on completion places a call to the callback function notifying >> > that the operation has occurred. The second approach is async >> > conceptually as the user of api can continue his job after the call as >> > the waiting occurs on the separate thread but this approach seems a bit >> > "naive" as for each api call a new thread gets created which is >> resource >> > expensive. >> > >> > Kindly comment on these two approaches and suggest any other approach >> > which you have in mind to implement the async sftp api, those >> > suggestions will help me prepare better before sending in the proposal. >> >> Hello Eshan, >> first of all, sorry for late reply. I saw your message on IRC, but >> before I got back to reply, you were already away so thank you for >> patience to reach out to us on other channels. >> >> The async SFTP implementation is one of our priorities and one of the >> more complicated tasks. I would suggest starting from the benchmark code >> we have in tests/benchmarks/, which has an example of async donwload >> (but there is only the download part -- the upload part does not exist >> now). >> >> I do not think it is a good idea to spawn more threads as it would >> require a lot of synchronization. The example in the benchmarks can run >> several download requests from a single thread, which can help saturate >> the network connection without the need for threads. >> >> I did not read much about io_uring yet, but it sounds like it solves the >> issues we have with speed of synchronous writes/reads caused by context >> switching so this would be our preference. There are already some >> hints/comments in the following issues, so if you will have some more >> questions, comments, feel free to ask here or in either of the following >> issues: >> >> https://gitlab.com/libssh/libssh-mirror/-/issues/65 >> https://gitlab.com/libssh/libssh-mirror/-/issues/124 >> >> Regards, >> -- >> Jakub Jelen >> Crypto Team, Security Engineering >> Red Hat, Inc. >> >>
GSOC Project discussion : Async sftp implementation | Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx> |
Re: GSOC Project discussion : Async sftp implementation | Jakub Jelen <jjelen@xxxxxxxxxx> |
Fwd: GSOC Project discussion : Async sftp implementation | Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx> |