
Re: Infinite loop with concurrent ssh_connect() on CentOS 6


It looks like I can get around this last problem by calling ssh_init() at
the beginning of the program, before spawning any threads.  From an API
design standpoint, however, if a call to ssh_init() is a prerequisite for
calling ssh_connect() concurrently, then ssh_connect() should verify that
ssh_init() has been called.  If it hasn't been called, ssh_connect() should
fail with an informative error code/message.
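
A minimal C++ sketch of that suggestion follows.  It is not libssh code:
the demo namespace, init(), connect() and connect_all() are hypothetical
stand-ins for ssh_init(), ssh_connect() and the calling program, and the
"connection" does nothing.  It only illustrates the two ideas above:
initialize once before any thread exists, and have the connect call
report an error rather than initialize lazily.

```cpp
#include <atomic>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a library with process-wide state, such as
// libssh on top of libcrypto. None of these names are libssh API.
namespace demo {

std::atomic<bool> g_initialized{false};
std::once_flag g_init_once;

// One-time global setup; repeated calls are harmless.
int init() {
  std::call_once(g_init_once, [] {
    // A real library would populate its global crypto tables here.
    g_initialized.store(true, std::memory_order_release);
  });
  return 0;
}

// Connect entry point that refuses to run before init() instead of
// initializing lazily from whichever thread happens to arrive first.
int connect(const std::string &host, std::string &error) {
  if (!g_initialized.load(std::memory_order_acquire)) {
    error = "connect(" + host + "): init() has not been called";
    return -1;
  }
  return 0;  // A real implementation would open the connection here.
}

}  // namespace demo

// Initialize on the calling thread, then connect from one worker thread
// per host. Returns the number of successful connects.
int connect_all(const std::vector<std::string> &hosts) {
  demo::init();  // must happen before any worker thread is spawned
  std::atomic<int> ok{0};
  std::vector<std::thread> workers;
  for (const auto &host : hosts)
    workers.emplace_back([&ok, host] {
      std::string error;
      if (demo::connect(host, error) == 0)
        ++ok;
    });
  for (auto &w : workers)
    w.join();
  return ok.load();
}
```

With this shape, a caller that forgets init() gets -1 and a readable
message from connect() instead of a busy loop, and connect_all() shows
the ordering that works around the problem in practice.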

- Doug


On Tue, Oct 28, 2014 at 8:36 AM, Doug Judd <doug@xxxxxxxxxxxxxx> wrote:

> Here's another infinite loop problem when calling ssh_connect()
> concurrently.  This time it looks like it's in the libcrypto initialization
> code:
>
> Thread 13 (Thread 0x7f833a002700 (LWP 24506)):
> #0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
> #8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> #9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
> #10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
> #11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> #12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
> Thread 12 (Thread 0x7f8339601700 (LWP 24507)):
> #0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
> #8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> #9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
> #10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
> #11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> #12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
> Thread 11 (Thread 0x7f8338c00700 (LWP 24508)):
> #0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> #4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> #7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
> #8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> #9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
> #10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
> #11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> #12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
> ...
>
> The process is stuck at 100% CPU utilization.  The program is linked
> against OpenSSL 1.0.2-beta3.
>
> - Doug
>
> On Tue, Oct 21, 2014 at 10:32 AM, Doug Judd <doug@xxxxxxxxxxxxxx> wrote:
>
>> I'm developing a multi-host ssh tool using libssh 0.6.3.  The tool
>> establishes connections asynchronously and in parallel.  Intermittently,
>> the tool will get stuck in a busy loop with a stack trace such as the
>> following:
>>
>> Thread 13 (Thread 0x7f65f9b41700 (LWP 13549)):
>> #0  0x0000003ee3adc613 in poll () from /lib64/libc.so.6
>> #1  0x0000003ee3b0fe3c in clntudp_call () from /lib64/libc.so.6
>> #2  0x0000003ee76058bb in do_ypcall () from /lib64/libnsl.so.1
>> #3  0x0000003ee76060ab in yp_match () from /lib64/libnsl.so.1
>> #4  0x00007f65f1f14f79 in _nss_nis_getpwuid_r () from /lib64/libnss_nis.so.2
>> #5  0x0000003ee3aaa4ed in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
>> #6  0x00007f6601b0152e in ssh_path_expand_tilde () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
>> #7  0x00007f6601b02bc3 in ssh_options_set () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
>> #8  0x00007f6601b036fb in ssh_options_apply () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
>> #9  0x00007f6601af68fe in ssh_connect () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
>> #10 0x000000000043547a in Hypertable::SshSocketHandler::handle(int, int) ()
>> #11 0x0000000000484b9a in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
>> #12 0x0000000000492ac4 in Hypertable::ReactorRunner::operator()() ()
>> #13 0x00007f66010f8ce3 in thread_proxy () from /opt/hypertable/doug/0.9.8.2/lib/libboost_thread.so.1.54.0
>> #14 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
>> #15 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
>>
>> I did a little digging around and came across a ticket filed against sssd
>> <https://fedorahosted.org/sssd/ticket/640> which I believe is the source
>> of the problem.  It appears that getpwuid_r() is not thread safe under
>> certain circumstances.
>>
>> From what I can tell, ssh_connect() will use ~/.ssh as the ssh directory
>> if one is not explicitly supplied.  It's during the expansion of the ~
>> character that getpwuid_r() gets called.  The workaround is to
>> explicitly set the ssh directory using a path that does not include the ~
>> character, for example:
>>
>> const char *home = getenv("HOME");  // getenv() is declared in <cstdlib>
>> if (home == nullptr)
>>   error("Environment variable HOME is not set");
>> // Build an absolute path so libssh never has to expand '~'.
>> string ssh_dir(home);
>> ssh_dir.append("/.ssh");
>> ssh_options_set(m_ssh_session, SSH_OPTIONS_SSH_DIR, ssh_dir.c_str());
>>
>> Attached is a patch to libssh that eliminates the ~ expansion for the
>> default case (~/.ssh).  In my test environment, the problem is very
>> intermittent and I don't have a reproducible test case, so I'm not 100%
>> sure this solution solves the problem.  However, given the evidence, I
>> think it's a safe bet.
>>
>> - Doug
>>
>> --
>> Doug Judd
>> www.hypertable.com
>>
>>
>
>
> --
> Doug Judd
> CEO, Hypertable Inc.
>



-- 
Doug Judd
CEO, Hypertable Inc.
