[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Proposal for High level sftp api for uploads and downloads

Subject: Re: Proposal for High level sftp api for uploads and downloads
From: Jakub Jelen <jjelen@xxxxxxxxxx>
Reply-to: libssh@xxxxxxxxxx
Date: Wed, 7 Jun 2023 11:19:11 +0200
To: libssh@xxxxxxxxxx
On Tue, Jun 6, 2023 at 4:41 PM Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx> wrote:
>
> Thanks a lot for the inputs.
>
> Based on the suggestions the api will look something as follows :
>
> enum sftp_file_transfer_direction_e
> {
> SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL,
> SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE,
> SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_LOCAL,
> SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE
> } ;
>
> struct sftp_file_transfer_struct
> {
> enum sftp_file_transfer_direction_e direction;
>
> union
> {
> int local_fd;
> sftp_file remote_file;
> } source, target;
>
> size_t chunk_size;
> size_t in_flight_requests;
>
> void *user_data;
> void (*progress_callback)(------parameters not yet decided--------);
> };
>
> typedef struct sftp_file_transfer_struct* sftp_file_transfer;
>
> sftp_file_transfer sftp_file_transfer_new()
> {
> /* Allocate a new file transfer structure, assign default
>  * values to the members and return the structure's
>  * address
>  */
> }
>
> enum sftp_file_transfer_options_e
> {
> SFTP_FILE_TRANSFER_OPTIONS_DIRECTION,
>
> SFTP_FILE_TRANSFER_OPTIONS_SOURCE_LOCAL,
> SFTP_FILE_TRANSFER_OPTIONS_SOURCE_REMOTE,
> SFTP_FILE_TRANSFER_OPTIONS_TARGET_LOCAL,
> SFTP_FILE_TRANSFER_OPTIONS_TARGET_REMOTE,
>
> SFTP_FILE_TRANSFER_OPTIONS_CHUNK_SIZE,
> SFTP_FILE_TRANSFER_OPTIONS_IN_FLIGHT_REQESTS,
>
> SFTP_FILE_TRANSFER_OPTIONS_USER_DATA
> SFTP_FILE_TRANSFER_OPTIONS_PROGRESS_CALLBACK
> } ;


Note here that I am a bit worried about the length of the names as
they already have ~40 characters, fitting them reasonably on one line
will be hard.

I would probably propose to drop the FILE_TRANSFER part or shorten it
to FT or similar. The same for the direction enum.

> int sftp_file_transfer_options_set  (sftp_file_transfer ft, enum sftp_file_transfer_options_e option,  void *value);
>
> void sftp_file_transfer_free(sftp_file_transfer ft);
>
>> If the short reads occurs for normal files, I think the right thing is
>> to fail and report the issue to the user. If the file was already
>> partially transferred, it might make sense to provide some information
>> how much was transferred and maybe provide a way to resume partial
>> transfers, but I would probably keep this for later time. But it would
>> be good to think about this possible extension where we would need to
>> update the offset in the file structure with the size of the target file
>> (or again check how openssh does this).
>
>
> For being able to support partial transfers, I think we should
> begin read-writes on the files from current offsets of the local file,
> remote sftp file instead of necessarily starting from offset 0.
>
>  For example, say we are reading a file (for the sake of this sftp file
> transfer) and the read offset for that file initially is 10 (when sftp_file_transfer()
> is called) and the file size is 1000 bytes. Then the read operation will be
> started not from offset 0 but from offset 10 and the transfer will occur for
> (1000 - 10) bytes.
>
> This will help to support and resume partial transfers, but it puts the responsibility on
> the user to set offsets depending on where he/she wants to begin read/write from
> before calling this sftp_file_transfer() to start read/writes from that offset. If user
> wants to start the transfer of a file from the offset 100 not 0, then he must set the offset
> of that file to 100 (using sftp_seek() for remote files and lseek() for local files) and
> then call sftp_file_transfer().

I think this approach is suitable for the low-level api, but the
high-level API needs to handle this somehow internally automatically.
I think by default it should truncate the target file and reset the
offset of the source file to 0. If the user elects to use some offset,
I think it should be given as an high-level API option.

One of the high-level options might be the attempt to resume partial
transfers, similarly as you can see in the sftp from openssh. I do not
think how practical would it be to read files from other specific
offsets or write to some specific offsets using sftp.

> The user should also be made aware that a call to sftp_file_transfer()
> updates the offset of the files. i.e if initially offset is 0 and he calls the
> sftp_file_transfer() for the file and it successfully transfers the whole file
> then the offset after sftp_file_transfer() returns will be at the end of the
> file.
>
> The sftp_file_transfer() function will return the number of bytes transferred
> successfully and -1 on encountering some error before transferring even
> a single byte provided the file to transfer had > 0 bytes. If the file to transfer
> is a 0 byte file the function will simply return 0.
>
> (We can use off_t or int64_t for type of number of bytes transferred
> [not unsigned as on error we have to return SSH_ERROR i.e -1], off_t
> is the standard type for representing file size/file offsets, but on 32 bit systems
> its of 32 bits and signed which limits the file size it can store to 2^31-1 bytes
> i.e 2147483647 bytes which is approx 2GB, but since files that are to be transferred
> can certainly be greater than 2GB therefore to be able to handle larger files we
> have to define a macro _FILE_OFFSET_BITS and set it to 64 or specify
> -D_FILE_OFFSET=64 during compilation so that files larger than 2GB can be
> handled on 32 bit systems)
>
> off_t sftp_file_transfer(sftp file_transfer ft)
> {
> off_t bytes;
>
> switch(ft->direction) {
>
> case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL :
> bytes  = sftp_file_transfer_local_to_local(ft);
> break;
>
> case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE :
> bytes = sftp_file_transfer_local_to_remote(ft);
> break;
>
> case SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE :
> bytes = sftp_file_transfer_remote_to_remote(ft);
> break;
>
> default:
> bytes = SSH_ERROR;
> }
>
> return bytes;
> }
>
>
>> If you use these function which look simple to use you want to also add a
>> progress callback passed down and a userdata pointer.
>>
>> This way a UI (also cmdline) could display some progress.
>
>
> I think this suggestion was given so that a callback is called
> as the bytes are transferred so that the user of the api can
> display the progress of the file transfer on command line/UI.
> Based on this suggestion two members userdata and progress_callback
> have been added in the struct sftp_file_transfer_struct.
>
> progress_callback() set by the user will be called after a
> successful transfer (i.e on getting a successful response
> for our request) and will be passed info which will help the
> user to do something/act upon the progress in the transfer.
>
> struct sftp_file_transfer_progress_struct
> {
> off_t bytes_transferred;
> off_t bytes_total;
> void *user_data;
> };
>
> void my_callback(const struct sftp_file_transfer_progress_struct *p)
> {
> This function will be written by the library user.
>
> This function can access the members of the structure but
> cannot change the value of the members of the structure p points to.
> The structure that p points to is allocated by our api.
> }
>
> Note that for partial transfers bytes_total will not be the same as
> the file_size.
>
> This is the approach I prefer, but other ways to implement this
> progress_callback() is passing it  :
>
>  *  Percentage of transfer occurred along with user_data pointer.
>  *  Passing the number of chunks successfully transferred (which can also be
> called as iteration number) along with the user_data pointer.
>
> The problems (that currently come to my mind) with the introduction of
> this progress_callback() are :
>
> 1) If copy-data extension is supported in the case of remote
> to remote copy, then according to the protocol
> (https://github.com/openssh/openssh-portable/blob/master/PROTOCOL#L581)
> if we specify 0 as read-data-length then the server will read data
> from the read handle till EOF is reached. So ideally in one copy-data
> request to the server we can perform the whole remote to remote transfer.
> But since the progress_callback() is called on the client side, it will only come
> to know about progress after the whole transfer has been completed (i.e on
> getting a successful response to the previous copy-data request) and won't be
> able to show any actual progress as the remote to remote copy is occurring on
> the server side.
>
> The solution to this is to send copy-data requests with read-data-length as
> chunk-size and issue multiple copy-data requests, on successful completion
> of each request call the progress_callback(). But this solution isn't the best as
> those many requests weren't  really needed when we could have done the
> same job with just one request and with better performance (due to lesser
> client-server interactions).

I think it should be ok to document this limitation. The `copy-data`
requests will be faster than the network transfers anyway so just best
effort calling the callback at the beginning and end of the "transfer"
should be enough.

> 2) In case of local to local file copy, on linux the fastest way to do that is to
> use the sendfile() system call (https://man7.org/linux/man-pages/man2/sendfile.2.html),
> but using it has the same problem as 1), i.e we'll get to know about the status of
> the transfer after the whole transfer has completed and the progress_callback()
> will be called after the whole transfer occurs, and hence won't be of much use to
> show progress.
>
> Again, the solution to this is similar to solution of 1) i.e calling sendfile() for
> chunk_size bytes and not for the whole file_size(or total_bytes < file_size
> for partial transfers), and on each successful sendfile() call, we call progress_callback().
> But this solution is not the best as multiple calls to sendfile() weren't needed
> when we could've done the same thing with just one sendfile() call.
>
> Currently the return type of the progress_callback() is kept void. We can
> also keep its return type int and make it return 0(SSH_OK) if it worked
> successfully and -1(SSH_ERROR) if it failed. And if the progress_callback()
> fails then the sftp_file_transfer() does not continue further and returns.

I think the progress callback should be a passive listener and should
not be able to make the file transfer to fail by returning an error.

> Suggestions for the implementation of progress_callback() and the
> problems associated with it as well as general suggestions regarding
> the design of the discussed api are welcome.

Jakub
References:
Re: Proposal for High level sftp api for uploads and downloads	Andreas Schneider <asn@xxxxxxxxxxxxxx>
Re: Proposal for High level sftp api for uploads and downloads	Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
Archive administrator: postmaster@lists.cynapses.org