[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: GSOC Project discussion : Async sftp implementation

Subject: Re: GSOC Project discussion : Async sftp implementation
From: Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
Reply-to: libssh@xxxxxxxxxx
Date: Tue, 7 Mar 2023 16:12:34 +0530
To: libssh@xxxxxxxxxx
Cc: tom.deseyn@xxxxxxxxx
Hi Tom,


> It should be possible to implement sftp using the existing async
> channel write and read apis, without the need for additional threads
> or io_uring.
> When an operation completes, a callback is emitted.
>

I tried to find the existing async channel read and write apis
but couldn't find any, It would help a lot if you provide name
of one of the functions of that api since the ssh_channel_read
and ssh_channel_write defined in src/channels.c are both synchronous.

As with all async things, you need to track on-going state.
>

Yes but that means some library code has to run concurrently
along with the code of the user after the async api function
returns to monitor and/or change the on-going state as the
operation proceeds. This makes the use of something like
threads necessary.


> Depending on whether you want to allow 1 on-going operation per sftp
> channel or several, things become more complex.
> If it's 1, you can track the state together with the sftp channel.
> If it's multiple, the user will need to hold some variable that
> captures the state per on-going operation.
>

I believe an async api would make sense if we allow more than
one operation per sftp channel, assuming a scenario where the user
requests to download chunks from 10 different files(say) in a single sftp
session with the connected server. That would mean 10 async operations
for that channel.

Please comment on these thoughts
Thanks,
Eshan Kelkar




On Tue, Mar 7, 2023 at 3:30 PM Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
wrote:

>
>
> ---------- Forwarded message ---------
> From: Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
> Date: Tue, Mar 7, 2023 at 3:28 PM
> Subject: Re: GSOC Project discussion : Async sftp implementation
> To: Jakub Jelen <jjelen@xxxxxxxxxx>
>
>
> Hello Jakub,
>
> I would suggest starting from the benchmark code
>> we have in tests/benchmarks/, which has an example of async donwload
>> (but there is only the download part -- the upload part does not exist
>> now).
>>
>
> As per your suggestion, I have read that example and have gone through the
> source code to understand how the current async read is working using
> sftp_async_read_begin and sftp_async_read. Please correct me if I am wrong
> but the way I understand how the api currently works is like this.
>
> First let me go through the synchronous read function which will help us
> to better understand how the async read api works :
>
> [sync_read refers to the sftp_read function in normal(blocking mode) in
> src/sftp.c]
> sync_read(file_handle, buffer_to_read_into, count)
> {
> NOTE : In the text sftp session refers to the session corresponding to the
> file_handle received in the  parameter
>
> //Phase-1 (Registering the request)
> //----------------------------------------------
> Step- 1 : Get an id for this read request with respect to the sftp session
> corresponding
> to the file whose handle is received in the parameter.
> Step-2 : Pack  a ssh_buffer with the id, file handle, offset and count of
> bytes to read.
> Step-3 : Write a packet using sftp_packet_write to send a read request
> along with
> this ssh_buffer.
> Step-4 : After a successful Step-3 free this ssh_buffer as its no longer
> needed and
> its data has been sent along with the packet
>
> //Phase-2(Waiting for the request to get processed)
> --------------------------------------------------------------------
> Now the packet gets sent along with the request id. The server processes it
> and sends a packet back again containing the same id as in the request.
> The id is kept the same to match the request and response. The incoming
> packets
> from the server are read using sftp_packet_read_and_dispatch which reads a
> packet
> and adds it to a queue of messages from the server corresponding to this
> sftp session.
>
> The waiting is done like this
> msg=NULL;
> while(msg==NULL)
> {
> read_and_dispatch - read a response packet and add message to queue
> sftp_dequeue - try to dequeue a message from the queue corresponding to
> the id we got in Phase-1 and assign whatever it returns to the msg
> variable. In case
> the queue doesn't contain any message with the id of Phase-1 the dequeue
> function
> returns NULL which gets assigned to msg and the loop runs again.
> }//while(msg==NULL) ends
>
> After we receive the message corresponding to the send request.
> If the message contains some data(i.e msg->packet_type is SSH_FXP_DATA),
> write that data in the location whose address has been received in the
> parameter.
>
> So the loop of Phase-2 is responsible for the waiting, which is
> essentially reading the
> response packet adding the message received in it to a queue(corresponding
> to a sftp session)
> and then the loop checks whether the response received was corresponding
> to the request sent
> (matching of request and response based on the id of Phase-1) and if not
> then the loop runs again.
> }//sync_read ends
>
> This explanation of synchronous read is important to understand how the
> asynchronous read currently
> works, asynchronous read api is divided into two functions
> sftp_async_read_begin and sftp_async_read
>
> sftp_async_read_begin(file_handle, count)
> {
> Phase-1's code(Registering request by sending a packet) of sync_read comes
> here
> and return the id for the request to the user which he'll pass while
> calling sftp_async_read
> }
>
> sftp_async_read(file_handle, where_to_store, bytes_to_read, request_id)
> {
> Phase-2's code to wait for the response message corresponding to the
> request with that
> request_id as received in parameter and then writing the data which came
> in that message
> comes here
> }
>
> Now one problem I notice in this approach is that Phase-1's code for
> sending/registering
> a request involves the use of sftp_packet_write and sftp_packet_write
> internally
> uses ssh_channel_write (defined in src/channels.c) and this
> ssh_channel_write is synchronous
> write (it may block if unable to write to the channel). Hence according to
> me sftp_async_read_begin
> may also block if say the channel corresponding to that sftp session is
> too much saturated/dirty
> with pending writes. So technically this operation is not an asynchronous
> one, its synchronous in
> the sense that the control will return after the packet for registering
> the read request has been written
> to the underlying send buffer (corresponding to the channel).
>
> So is this current state of async read acceptable ?
>
> One may argue that the chances of blocking in case of registering a read
> request are less
> because we're sending less info in the packet  : request id, from where we
> have to read and from
> what offset, but still a scope for blocking still exists and certainly
> this kind of approach won't work for async write.
> As in the write request we also send the data to write using
> sftp_packet_write and in this case the chances of
> blocking are significant if too much data is to be sent.
>
> Kindly comment on my interpretation of the code, and answer whether the
> current state of async
> read is as desired or not. If not, please give a rough overview of how it
> should be and
> what is expected out of the async libssh api.
> (For example - my interpretation of the async read is that user issues a
> read
> request using an api function call, the function returns and he continues
> to do
> what he wants to do while the the read request is handled by the api and
> the
> data received from the server is written to the buffer supplied by the
> user. API has some
> means to communicate[via callbacks or some data structure of which user
> has access]
> to the user about the state of the operation)
>
> Thanks,
> Eshan Kelkar
>
>
>
>
> On Sun, Mar 5, 2023 at 5:30 PM Jakub Jelen <jjelen@xxxxxxxxxx> wrote:
>
>> On 3/3/23 06:36, Eshan Kelkar wrote:
>> > Hi, I am Eshan Kelkar and would like to create the async sftp for
>> libssh
>> > as the GSOC project. I have gone through liburing to understand what
>> > async i/o is and how it is implemented using io_uring. So in this async
>> > sftp implementation I believe we can place calls to the liburing api
>> > functions from inside of the async sftp api functions so that things
>> > occur asynchronously.
>> >
>> > Another approach that comes to my mind is that on a call to async sftp
>> > api function a seperate thread gets created which does the waiting and
>> > all and on completion places a call to the callback function notifying
>> > that the operation has occurred. The second approach is async
>> > conceptually as the user of api can continue his job after the call as
>> > the waiting occurs on the separate thread but this approach seems a bit
>> > "naive" as for each api call a new thread gets created which is
>> resource
>> > expensive.
>> >
>> > Kindly comment on these two approaches and suggest any other approach
>> > which you have in mind to implement the async sftp api, those
>> > suggestions will help me prepare better before sending in the proposal.
>>
>> Hello Eshan,
>> first of all, sorry for late reply. I saw your message on IRC, but
>> before I got back to reply, you were already away so thank you for
>> patience to reach out to us on other channels.
>>
>> The async SFTP implementation is one of our priorities and one of the
>> more complicated tasks. I would suggest starting from the benchmark code
>> we have in tests/benchmarks/, which has an example of async donwload
>> (but there is only the download part -- the upload part does not exist
>> now).
>>
>> I do not think it is a good idea to spawn more threads as it would
>> require a lot of synchronization. The example in the benchmarks can run
>> several download requests from a single thread, which can help saturate
>> the network connection without the need for threads.
>>
>> I did not read much about io_uring yet, but it sounds like it solves the
>> issues we have with speed of synchronous writes/reads caused by context
>> switching so this would be our preference. There are already some
>> hints/comments in the following issues, so if you will have some more
>> questions, comments, feel free to ask here or in either of the following
>> issues:
>>
>> https://gitlab.com/libssh/libssh-mirror/-/issues/65
>> https://gitlab.com/libssh/libssh-mirror/-/issues/124
>>
>> Regards,
>> --
>> Jakub Jelen
>> Crypto Team, Security Engineering
>> Red Hat, Inc.
>>
>>
References:
GSOC Project discussion : Async sftp implementation	Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
Re: GSOC Project discussion : Async sftp implementation	Jakub Jelen <jjelen@xxxxxxxxxx>
Fwd: GSOC Project discussion : Async sftp implementation	Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
Archive administrator: postmaster@lists.cynapses.org