
Libssh blocking instead of timing out


Hello all,

I wrote a few months ago about a problem I was having with libssh 0.2 hanging under certain conditions, and at the time we decided I should upgrade our code onto 0.3 to see what happened.

So I did that back then, pretty uneventfully, and it's been fine until last night. Last night we had two automated processes hang trying to connect to an SSH host. In both cases, the situation seems to have played out like this:

1) We dialed up and connected to a remote system, and established a PPP session
2) We tried to connect to the remote host over the PPP link
3) The PPP link failed almost right away, with PPPD ultimately terminating with "Peer not responding to echo requests"
4) libssh never returned (we found it over 12 hours later)

Dial-up links with some of our customers can be flaky, so the fact that the PPP link was never entirely stable isn't shocking. However, we can't have libssh hanging in a failure case. I took this stack trace from the process; it was identical on both of the machines that had a hang:

ssh_connect
	ssh_get_kex
		packet_wait
			packet_read2
				ssh_socket_wait_for_data
					ssh_socket_completeread
						ssh_socket_unbuffered_read
							recvfrom
								recvfrom

Reading the man page for recvfrom (ssh_socket_unbuffered_read actually calls recv, but the man page says that call is redundant and may go away, so I'm guessing it's just implemented as recvfrom), it sounds like the ability to time out depends on the socket having been created or opened with a timeout option set.

So... is there something I should be doing with the library to enable timeouts, or is it something that libssh needs to be doing?
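
For illustration, here is a minimal sketch of the two usual POSIX ways to bound a blocking read: setting SO_RCVTIMEO on the socket, or polling for readability before calling recv. It assumes direct access to the socket descriptor, which libssh may not expose, and the helper names set_recv_timeout and wait_readable are hypothetical and not part of libssh.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not a libssh function): ask the kernel to make
 * blocking recv()/recvfrom() calls on fd give up after `seconds`.
 * After the timeout, recv() returns -1 with errno set to EAGAIN or
 * EWOULDBLOCK. Returns 0 on success, -1 on error.
 */
int set_recv_timeout(int fd, long seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/*
 * Hypothetical helper (not a libssh function): wait up to timeout_ms
 * for fd to become readable, independent of any socket options.
 * Returns 1 if data (or EOF/error) is ready, 0 on timeout, -1 on error.
 * Note: an interrupted poll() is retried with the full timeout again.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}
```

Either approach would turn the indefinite hang in recv into an error the caller can handle; which one fits depends on whether the code doing the read can be changed or only the socket setup.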

Thanks for any help.

Chris Backas
Software Engineer
Bristol Capital Inc.

