[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Proposal for High level sftp api for uploads and downloads

Subject: Re: Proposal for High level sftp api for uploads and downloads
From: Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
Reply-to: libssh@xxxxxxxxxx
Date: Tue, 6 Jun 2023 20:04:43 +0530
To: libssh@xxxxxxxxxx
Thanks a lot for the inputs.

Based on the suggestions the api will look something as follows :

enum sftp_file_transfer_direction_e
{
SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL,
SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE,
SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_LOCAL,
SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE
} ;

struct sftp_file_transfer_struct
{
enum sftp_file_transfer_direction_e direction;

union
{
int local_fd;
sftp_file remote_file;
} source, target;

size_t chunk_size;
size_t in_flight_requests;

void *user_data;
void (*progress_callback)(------parameters not yet decided--------);
};

typedef struct sftp_file_transfer_struct* sftp_file_transfer;

sftp_file_transfer sftp_file_transfer_new()
{
/* Allocate a new file transfer structure, assign default
 * values to the members and return the structure's
 * address
 */
}

enum sftp_file_transfer_options_e
{
SFTP_FILE_TRANSFER_OPTIONS_DIRECTION,

SFTP_FILE_TRANSFER_OPTIONS_SOURCE_LOCAL,
SFTP_FILE_TRANSFER_OPTIONS_SOURCE_REMOTE,
SFTP_FILE_TRANSFER_OPTIONS_TARGET_LOCAL,
SFTP_FILE_TRANSFER_OPTIONS_TARGET_REMOTE,

SFTP_FILE_TRANSFER_OPTIONS_CHUNK_SIZE,
SFTP_FILE_TRANSFER_OPTIONS_IN_FLIGHT_REQESTS,

SFTP_FILE_TRANSFER_OPTIONS_USER_DATA
SFTP_FILE_TRANSFER_OPTIONS_PROGRESS_CALLBACK
} ;

int sftp_file_transfer_options_set  (sftp_file_transfer ft, enum
sftp_file_transfer_options_e option,  void *value);

void sftp_file_transfer_free(sftp_file_transfer ft);

If the short reads occurs for normal files, I think the right thing is
> to fail and report the issue to the user. If the file was already
> partially transferred, it might make sense to provide some information
> how much was transferred and maybe provide a way to resume partial
> transfers, but I would probably keep this for later time. But it would
> be good to think about this possible extension where we would need to
> update the offset in the file structure with the size of the target file
> (or again check how openssh does this).
>

For being able to support partial transfers, I think we should
begin read-writes on the files from current offsets of the local file,
remote sftp file instead of necessarily starting from offset 0.

 For example, say we are reading a file (for the sake of this sftp file
transfer) and the read offset for that file initially is 10 (when
sftp_file_transfer()
is called) and the file size is 1000 bytes. Then the read operation will be
started not from offset 0 but from offset 10 and the transfer will occur
for
(1000 - 10) bytes.

This will help to support and resume partial transfers, but it puts the
responsibility on
the user to set offsets depending on where he/she wants to begin read/write
from
before calling this sftp_file_transfer() to start read/writes from that
offset. If user
wants to start the transfer of a file from the offset 100 not 0, then he
must set the offset
of that file to 100 (using sftp_seek() for remote files and lseek() for
local files) and
then call sftp_file_transfer().

The user should also be made aware that a call to sftp_file_transfer()
updates the offset of the files. i.e if initially offset is 0 and he calls
the
sftp_file_transfer() for the file and it successfully transfers the whole
file
then the offset after sftp_file_transfer() returns will be at the end of
the
file.

The sftp_file_transfer() function will return the number of bytes
transferred
successfully and -1 on encountering some error before transferring even
a single byte provided the file to transfer had > 0 bytes. If the file to
transfer
is a 0 byte file the function will simply return 0.

(We can use off_t or int64_t for type of number of bytes transferred
[not unsigned as on error we have to return SSH_ERROR i.e -1], off_t
is the standard type for representing file size/file offsets, but on 32 bit
systems
its of 32 bits and signed which limits the file size it can store to 2^31-1
bytes
i.e 2147483647 bytes which is approx 2GB, but since files that are to be
transferred
can certainly be greater than 2GB therefore to be able to handle larger
files we
have to define a macro _FILE_OFFSET_BITS and set it to 64 or specify
-D_FILE_OFFSET=64 during compilation so that files larger than 2GB can be
handled on 32 bit systems)

off_t sftp_file_transfer(sftp file_transfer ft)
{
off_t bytes;

switch(ft->direction) {

case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL :
bytes  = sftp_file_transfer_local_to_local(ft);
break;

case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE :
bytes = sftp_file_transfer_local_to_remote(ft);
break;

case SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE :
bytes = sftp_file_transfer_remote_to_remote(ft);
break;

default:
bytes = SSH_ERROR;
}

return bytes;
}


If you use these function which look simple to use you want to also add a
> progress callback passed down and a userdata pointer.
>
> This way a UI (also cmdline) could display some progress.
>

I think this suggestion was given so that a callback is called
as the bytes are transferred so that the user of the api can
display the progress of the file transfer on command line/UI.
Based on this suggestion two members userdata and progress_callback
have been added in the struct sftp_file_transfer_struct.

progress_callback() set by the user will be called after a
successful transfer (i.e on getting a successful response
for our request) and will be passed info which will help the
user to do something/act upon the progress in the transfer.

struct sftp_file_transfer_progress_struct
{
off_t bytes_transferred;
off_t bytes_total;
void *user_data;
};

void my_callback(const struct sftp_file_transfer_progress_struct *p)
{
This function will be written by the library user.

This function can access the members of the structure but
cannot change the value of the members of the structure p points to.
The structure that p points to is allocated by our api.
}

Note that for partial transfers bytes_total will not be the same as
the file_size.

This is the approach I prefer, but other ways to implement this
progress_callback() is passing it  :

 *  Percentage of transfer occurred along with user_data pointer.
 *  Passing the number of chunks successfully transferred (which can also
be
called as iteration number) along with the user_data pointer.

The problems (that currently come to my mind) with the introduction of
this progress_callback() are :

1) If copy-data extension is supported in the case of remote
to remote copy, then according to the protocol
(https://github.com/openssh/openssh-portable/blob/master/PROTOCOL#L581)
if we specify 0 as read-data-length then the server will read data
from the read handle till EOF is reached. So ideally in one copy-data
request to the server we can perform the whole remote to remote transfer.
But since the progress_callback() is called on the client side, it will
only come
to know about progress after the whole transfer has been completed (i.e on
getting a successful response to the previous copy-data request) and won't
be
able to show any actual progress as the remote to remote copy is occurring
on
the server side.

The solution to this is to send copy-data requests with read-data-length as
chunk-size and issue multiple copy-data requests, on successful completion
of each request call the progress_callback(). But this solution isn't
the best as
those many requests weren't  really needed when we could have done the
same job with just one request and with better performance (due to lesser
client-server interactions).

2) In case of local to local file copy, on linux the fastest way to do that
is to
use the sendfile() system call (
https://man7.org/linux/man-pages/man2/sendfile.2.html),
but using it has the same problem as 1), i.e we'll get to know about the
status of
the transfer after the whole transfer has completed and the
progress_callback()
will be called after the whole transfer occurs, and hence won't be of much
use to
show progress.

Again, the solution to this is similar to solution of 1) i.e calling
sendfile() for
chunk_size bytes and not for the whole file_size(or total_bytes < file_size
for partial transfers), and on each successful sendfile() call, we call
progress_callback().
But this solution is not the best as multiple calls to sendfile() weren't
needed
when we could've done the same thing with just one sendfile() call.

Currently the return type of the progress_callback() is kept void. We can
also keep its return type int and make it return 0(SSH_OK) if it worked
successfully and -1(SSH_ERROR) if it failed. And if the progress_callback()
fails then the sftp_file_transfer() does not continue further and returns.

Suggestions for the implementation of progress_callback() and the
problems associated with it as well as general suggestions regarding
the design of the discussed api are welcome.
Follow-Ups:
Re: Proposal for High level sftp api for uploads and downloads	Andreas Schneider <asn@xxxxxxxxxxxxxx>
Re: Proposal for High level sftp api for uploads and downloads	Jakub Jelen <jjelen@xxxxxxxxxx>
References:
Re: Proposal for High level sftp api for uploads and downloads	Andreas Schneider <asn@xxxxxxxxxxxxxx>
Archive administrator: postmaster@lists.cynapses.org