[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Proposal for High level sftp api for uploads and downloads
[Thread Prev] | [Thread Next]
- Subject: Re: Proposal for High level sftp api for uploads and downloads
- From: Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx>
- Reply-to: libssh@xxxxxxxxxx
- Date: Tue, 6 Jun 2023 20:04:43 +0530
- To: libssh@xxxxxxxxxx
Thanks a lot for the inputs. Based on the suggestions the api will look something as follows : enum sftp_file_transfer_direction_e { SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL, SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE, SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_LOCAL, SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE } ; struct sftp_file_transfer_struct { enum sftp_file_transfer_direction_e direction; union { int local_fd; sftp_file remote_file; } source, target; size_t chunk_size; size_t in_flight_requests; void *user_data; void (*progress_callback)(------parameters not yet decided--------); }; typedef struct sftp_file_transfer_struct* sftp_file_transfer; sftp_file_transfer sftp_file_transfer_new() { /* Allocate a new file transfer structure, assign default * values to the members and return the structure's * address */ } enum sftp_file_transfer_options_e { SFTP_FILE_TRANSFER_OPTIONS_DIRECTION, SFTP_FILE_TRANSFER_OPTIONS_SOURCE_LOCAL, SFTP_FILE_TRANSFER_OPTIONS_SOURCE_REMOTE, SFTP_FILE_TRANSFER_OPTIONS_TARGET_LOCAL, SFTP_FILE_TRANSFER_OPTIONS_TARGET_REMOTE, SFTP_FILE_TRANSFER_OPTIONS_CHUNK_SIZE, SFTP_FILE_TRANSFER_OPTIONS_IN_FLIGHT_REQESTS, SFTP_FILE_TRANSFER_OPTIONS_USER_DATA SFTP_FILE_TRANSFER_OPTIONS_PROGRESS_CALLBACK } ; int sftp_file_transfer_options_set (sftp_file_transfer ft, enum sftp_file_transfer_options_e option, void *value); void sftp_file_transfer_free(sftp_file_transfer ft); If the short reads occurs for normal files, I think the right thing is > to fail and report the issue to the user. If the file was already > partially transferred, it might make sense to provide some information > how much was transferred and maybe provide a way to resume partial > transfers, but I would probably keep this for later time. But it would > be good to think about this possible extension where we would need to > update the offset in the file structure with the size of the target file > (or again check how openssh does this). > For being able to support partial transfers, I think we should begin read-writes on the files from current offsets of the local file, remote sftp file instead of necessarily starting from offset 0. For example, say we are reading a file (for the sake of this sftp file transfer) and the read offset for that file initially is 10 (when sftp_file_transfer() is called) and the file size is 1000 bytes. Then the read operation will be started not from offset 0 but from offset 10 and the transfer will occur for (1000 - 10) bytes. This will help to support and resume partial transfers, but it puts the responsibility on the user to set offsets depending on where he/she wants to begin read/write from before calling this sftp_file_transfer() to start read/writes from that offset. If user wants to start the transfer of a file from the offset 100 not 0, then he must set the offset of that file to 100 (using sftp_seek() for remote files and lseek() for local files) and then call sftp_file_transfer(). The user should also be made aware that a call to sftp_file_transfer() updates the offset of the files. i.e if initially offset is 0 and he calls the sftp_file_transfer() for the file and it successfully transfers the whole file then the offset after sftp_file_transfer() returns will be at the end of the file. The sftp_file_transfer() function will return the number of bytes transferred successfully and -1 on encountering some error before transferring even a single byte provided the file to transfer had > 0 bytes. If the file to transfer is a 0 byte file the function will simply return 0. (We can use off_t or int64_t for type of number of bytes transferred [not unsigned as on error we have to return SSH_ERROR i.e -1], off_t is the standard type for representing file size/file offsets, but on 32 bit systems its of 32 bits and signed which limits the file size it can store to 2^31-1 bytes i.e 2147483647 bytes which is approx 2GB, but since files that are to be transferred can certainly be greater than 2GB therefore to be able to handle larger files we have to define a macro _FILE_OFFSET_BITS and set it to 64 or specify -D_FILE_OFFSET=64 during compilation so that files larger than 2GB can be handled on 32 bit systems) off_t sftp_file_transfer(sftp file_transfer ft) { off_t bytes; switch(ft->direction) { case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL : bytes = sftp_file_transfer_local_to_local(ft); break; case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE : bytes = sftp_file_transfer_local_to_remote(ft); break; case SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE : bytes = sftp_file_transfer_remote_to_remote(ft); break; default: bytes = SSH_ERROR; } return bytes; } If you use these function which look simple to use you want to also add a > progress callback passed down and a userdata pointer. > > This way a UI (also cmdline) could display some progress. > I think this suggestion was given so that a callback is called as the bytes are transferred so that the user of the api can display the progress of the file transfer on command line/UI. Based on this suggestion two members userdata and progress_callback have been added in the struct sftp_file_transfer_struct. progress_callback() set by the user will be called after a successful transfer (i.e on getting a successful response for our request) and will be passed info which will help the user to do something/act upon the progress in the transfer. struct sftp_file_transfer_progress_struct { off_t bytes_transferred; off_t bytes_total; void *user_data; }; void my_callback(const struct sftp_file_transfer_progress_struct *p) { This function will be written by the library user. This function can access the members of the structure but cannot change the value of the members of the structure p points to. The structure that p points to is allocated by our api. } Note that for partial transfers bytes_total will not be the same as the file_size. This is the approach I prefer, but other ways to implement this progress_callback() is passing it : * Percentage of transfer occurred along with user_data pointer. * Passing the number of chunks successfully transferred (which can also be called as iteration number) along with the user_data pointer. The problems (that currently come to my mind) with the introduction of this progress_callback() are : 1) If copy-data extension is supported in the case of remote to remote copy, then according to the protocol (https://github.com/openssh/openssh-portable/blob/master/PROTOCOL#L581) if we specify 0 as read-data-length then the server will read data from the read handle till EOF is reached. So ideally in one copy-data request to the server we can perform the whole remote to remote transfer. But since the progress_callback() is called on the client side, it will only come to know about progress after the whole transfer has been completed (i.e on getting a successful response to the previous copy-data request) and won't be able to show any actual progress as the remote to remote copy is occurring on the server side. The solution to this is to send copy-data requests with read-data-length as chunk-size and issue multiple copy-data requests, on successful completion of each request call the progress_callback(). But this solution isn't the best as those many requests weren't really needed when we could have done the same job with just one request and with better performance (due to lesser client-server interactions). 2) In case of local to local file copy, on linux the fastest way to do that is to use the sendfile() system call ( https://man7.org/linux/man-pages/man2/sendfile.2.html), but using it has the same problem as 1), i.e we'll get to know about the status of the transfer after the whole transfer has completed and the progress_callback() will be called after the whole transfer occurs, and hence won't be of much use to show progress. Again, the solution to this is similar to solution of 1) i.e calling sendfile() for chunk_size bytes and not for the whole file_size(or total_bytes < file_size for partial transfers), and on each successful sendfile() call, we call progress_callback(). But this solution is not the best as multiple calls to sendfile() weren't needed when we could've done the same thing with just one sendfile() call. Currently the return type of the progress_callback() is kept void. We can also keep its return type int and make it return 0(SSH_OK) if it worked successfully and -1(SSH_ERROR) if it failed. And if the progress_callback() fails then the sftp_file_transfer() does not continue further and returns. Suggestions for the implementation of progress_callback() and the problems associated with it as well as general suggestions regarding the design of the discussed api are welcome.
Re: Proposal for High level sftp api for uploads and downloads | Andreas Schneider <asn@xxxxxxxxxxxxxx> |
Re: Proposal for High level sftp api for uploads and downloads | Jakub Jelen <jjelen@xxxxxxxxxx> |
Re: Proposal for High level sftp api for uploads and downloads | Andreas Schneider <asn@xxxxxxxxxxxxxx> |