[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: Proposal for High level sftp api for uploads and downloads
[Thread Prev] | [Thread Next]
- Subject: Re: Proposal for High level sftp api for uploads and downloads
- From: Jakub Jelen <jjelen@xxxxxxxxxx>
- Reply-to: libssh@xxxxxxxxxx
- Date: Wed, 7 Jun 2023 11:19:11 +0200
- To: libssh@xxxxxxxxxx
On Tue, Jun 6, 2023 at 4:41 PM Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx> wrote: > > Thanks a lot for the inputs. > > Based on the suggestions the api will look something as follows : > > enum sftp_file_transfer_direction_e > { > SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL, > SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE, > SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_LOCAL, > SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE > } ; > > struct sftp_file_transfer_struct > { > enum sftp_file_transfer_direction_e direction; > > union > { > int local_fd; > sftp_file remote_file; > } source, target; > > size_t chunk_size; > size_t in_flight_requests; > > void *user_data; > void (*progress_callback)(------parameters not yet decided--------); > }; > > typedef struct sftp_file_transfer_struct* sftp_file_transfer; > > sftp_file_transfer sftp_file_transfer_new() > { > /* Allocate a new file transfer structure, assign default > * values to the members and return the structure's > * address > */ > } > > enum sftp_file_transfer_options_e > { > SFTP_FILE_TRANSFER_OPTIONS_DIRECTION, > > SFTP_FILE_TRANSFER_OPTIONS_SOURCE_LOCAL, > SFTP_FILE_TRANSFER_OPTIONS_SOURCE_REMOTE, > SFTP_FILE_TRANSFER_OPTIONS_TARGET_LOCAL, > SFTP_FILE_TRANSFER_OPTIONS_TARGET_REMOTE, > > SFTP_FILE_TRANSFER_OPTIONS_CHUNK_SIZE, > SFTP_FILE_TRANSFER_OPTIONS_IN_FLIGHT_REQESTS, > > SFTP_FILE_TRANSFER_OPTIONS_USER_DATA > SFTP_FILE_TRANSFER_OPTIONS_PROGRESS_CALLBACK > } ; Note here that I am a bit worried about the length of the names as they already have ~40 characters, fitting them reasonably on one line will be hard. I would probably propose to drop the FILE_TRANSFER part or shorten it to FT or similar. The same for the direction enum. > int sftp_file_transfer_options_set (sftp_file_transfer ft, enum sftp_file_transfer_options_e option, void *value); > > void sftp_file_transfer_free(sftp_file_transfer ft); > >> If the short reads occurs for normal files, I think the right thing is >> to fail and report the issue to the user. If the file was already >> partially transferred, it might make sense to provide some information >> how much was transferred and maybe provide a way to resume partial >> transfers, but I would probably keep this for later time. But it would >> be good to think about this possible extension where we would need to >> update the offset in the file structure with the size of the target file >> (or again check how openssh does this). > > > For being able to support partial transfers, I think we should > begin read-writes on the files from current offsets of the local file, > remote sftp file instead of necessarily starting from offset 0. > > For example, say we are reading a file (for the sake of this sftp file > transfer) and the read offset for that file initially is 10 (when sftp_file_transfer() > is called) and the file size is 1000 bytes. Then the read operation will be > started not from offset 0 but from offset 10 and the transfer will occur for > (1000 - 10) bytes. > > This will help to support and resume partial transfers, but it puts the responsibility on > the user to set offsets depending on where he/she wants to begin read/write from > before calling this sftp_file_transfer() to start read/writes from that offset. If user > wants to start the transfer of a file from the offset 100 not 0, then he must set the offset > of that file to 100 (using sftp_seek() for remote files and lseek() for local files) and > then call sftp_file_transfer(). I think this approach is suitable for the low-level api, but the high-level API needs to handle this somehow internally automatically. I think by default it should truncate the target file and reset the offset of the source file to 0. If the user elects to use some offset, I think it should be given as an high-level API option. One of the high-level options might be the attempt to resume partial transfers, similarly as you can see in the sftp from openssh. I do not think how practical would it be to read files from other specific offsets or write to some specific offsets using sftp. > The user should also be made aware that a call to sftp_file_transfer() > updates the offset of the files. i.e if initially offset is 0 and he calls the > sftp_file_transfer() for the file and it successfully transfers the whole file > then the offset after sftp_file_transfer() returns will be at the end of the > file. > > The sftp_file_transfer() function will return the number of bytes transferred > successfully and -1 on encountering some error before transferring even > a single byte provided the file to transfer had > 0 bytes. If the file to transfer > is a 0 byte file the function will simply return 0. > > (We can use off_t or int64_t for type of number of bytes transferred > [not unsigned as on error we have to return SSH_ERROR i.e -1], off_t > is the standard type for representing file size/file offsets, but on 32 bit systems > its of 32 bits and signed which limits the file size it can store to 2^31-1 bytes > i.e 2147483647 bytes which is approx 2GB, but since files that are to be transferred > can certainly be greater than 2GB therefore to be able to handle larger files we > have to define a macro _FILE_OFFSET_BITS and set it to 64 or specify > -D_FILE_OFFSET=64 during compilation so that files larger than 2GB can be > handled on 32 bit systems) > > off_t sftp_file_transfer(sftp file_transfer ft) > { > off_t bytes; > > switch(ft->direction) { > > case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_LOCAL : > bytes = sftp_file_transfer_local_to_local(ft); > break; > > case SFTP_FILE_TRANSFER_DIRECTION_LOCAL_TO_REMOTE : > bytes = sftp_file_transfer_local_to_remote(ft); > break; > > case SFTP_FILE_TRANSFER_DIRECTION_REMOTE_TO_REMOTE : > bytes = sftp_file_transfer_remote_to_remote(ft); > break; > > default: > bytes = SSH_ERROR; > } > > return bytes; > } > > >> If you use these function which look simple to use you want to also add a >> progress callback passed down and a userdata pointer. >> >> This way a UI (also cmdline) could display some progress. > > > I think this suggestion was given so that a callback is called > as the bytes are transferred so that the user of the api can > display the progress of the file transfer on command line/UI. > Based on this suggestion two members userdata and progress_callback > have been added in the struct sftp_file_transfer_struct. > > progress_callback() set by the user will be called after a > successful transfer (i.e on getting a successful response > for our request) and will be passed info which will help the > user to do something/act upon the progress in the transfer. > > struct sftp_file_transfer_progress_struct > { > off_t bytes_transferred; > off_t bytes_total; > void *user_data; > }; > > void my_callback(const struct sftp_file_transfer_progress_struct *p) > { > This function will be written by the library user. > > This function can access the members of the structure but > cannot change the value of the members of the structure p points to. > The structure that p points to is allocated by our api. > } > > Note that for partial transfers bytes_total will not be the same as > the file_size. > > This is the approach I prefer, but other ways to implement this > progress_callback() is passing it : > > * Percentage of transfer occurred along with user_data pointer. > * Passing the number of chunks successfully transferred (which can also be > called as iteration number) along with the user_data pointer. > > The problems (that currently come to my mind) with the introduction of > this progress_callback() are : > > 1) If copy-data extension is supported in the case of remote > to remote copy, then according to the protocol > (https://github.com/openssh/openssh-portable/blob/master/PROTOCOL#L581) > if we specify 0 as read-data-length then the server will read data > from the read handle till EOF is reached. So ideally in one copy-data > request to the server we can perform the whole remote to remote transfer. > But since the progress_callback() is called on the client side, it will only come > to know about progress after the whole transfer has been completed (i.e on > getting a successful response to the previous copy-data request) and won't be > able to show any actual progress as the remote to remote copy is occurring on > the server side. > > The solution to this is to send copy-data requests with read-data-length as > chunk-size and issue multiple copy-data requests, on successful completion > of each request call the progress_callback(). But this solution isn't the best as > those many requests weren't really needed when we could have done the > same job with just one request and with better performance (due to lesser > client-server interactions). I think it should be ok to document this limitation. The `copy-data` requests will be faster than the network transfers anyway so just best effort calling the callback at the beginning and end of the "transfer" should be enough. > 2) In case of local to local file copy, on linux the fastest way to do that is to > use the sendfile() system call (https://man7.org/linux/man-pages/man2/sendfile.2.html), > but using it has the same problem as 1), i.e we'll get to know about the status of > the transfer after the whole transfer has completed and the progress_callback() > will be called after the whole transfer occurs, and hence won't be of much use to > show progress. > > Again, the solution to this is similar to solution of 1) i.e calling sendfile() for > chunk_size bytes and not for the whole file_size(or total_bytes < file_size > for partial transfers), and on each successful sendfile() call, we call progress_callback(). > But this solution is not the best as multiple calls to sendfile() weren't needed > when we could've done the same thing with just one sendfile() call. > > Currently the return type of the progress_callback() is kept void. We can > also keep its return type int and make it return 0(SSH_OK) if it worked > successfully and -1(SSH_ERROR) if it failed. And if the progress_callback() > fails then the sftp_file_transfer() does not continue further and returns. I think the progress callback should be a passive listener and should not be able to make the file transfer to fail by returning an error. > Suggestions for the implementation of progress_callback() and the > problems associated with it as well as general suggestions regarding > the design of the discussed api are welcome. Jakub
Re: Proposal for High level sftp api for uploads and downloads | Andreas Schneider <asn@xxxxxxxxxxxxxx> |
Re: Proposal for High level sftp api for uploads and downloads | Eshan Kelkar <eshankelkar@xxxxxxxxxxxxx> |