Re: Infinite loop with concurrent ssh_connect() on CentOS 6
- Subject: Re: Infinite loop with concurrent ssh_connect() on CentOS 6
- From: Doug Judd <doug@xxxxxxxxxxxxxx>
- Reply-to: libssh@xxxxxxxxxx
- Date: Tue, 28 Oct 2014 08:36:06 -0700
- To: libssh@xxxxxxxxxx
Here's another infinite loop problem when calling ssh_connect() concurrently. This time it looks like it's in the libcrypto initialization code:

Thread 13 (Thread 0x7f833a002700 (LWP 24506)):
#0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
#8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
#9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
#10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
#11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6

Thread 12 (Thread 0x7f8339601700 (LWP 24507)):
#0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
#8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
#9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
#10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
#11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6

Thread 11 (Thread 0x7f8338c00700 (LWP 24508)):
#0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
#4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
#7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
#8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
#9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
#10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
#11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
#12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6

...

The process is stuck at 100% CPU utilization. The version of OpenSSL that the program is linked with is 1.0.2-beta3.
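In the trace above, three reactor threads are all inside OPENSSL_add_all_algorithms_noconf(), reached through ssh_connect() → ssh_init() → ssh_crypto_init(), and all spinning in lh_insert(). That pattern is consistent with several threads running the library's one-time global initialization at the same moment. A common mitigation, suggested here rather than taken from this thread, is to make sure that initialization completes exactly once before any connection is attempted: libssh's documentation advises calling ssh_init() from the main thread before other threads are created, and the libssh 0.6 threading guide additionally calls ssh_threads_set_callbacks(ssh_threads_get_pthread()) before ssh_init() (check the documentation for the version in use). The C++ sketch below shows the same "initialize once" idea with std::call_once. The names global_library_init(), ensure_library_initialized() and run_concurrent_connects() are hypothetical, and the stand-in initializer only counts its runs so the example compiles without libssh.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Counts how many times the one-time initializer actually ran.
static std::atomic<int> g_init_runs{0};

// Hypothetical stand-in for a global initializer that must not run
// concurrently (in the traced program: ssh_init(), which reaches
// OPENSSL_add_all_algorithms_noconf()). Here it only bumps a counter.
static void global_library_init() {
    g_init_runs.fetch_add(1);
}

static std::once_flag g_init_flag;

// Safe to call from any thread: std::call_once runs global_library_init()
// exactly once, and concurrent callers block until that run has finished.
void ensure_library_initialized() {
    std::call_once(g_init_flag, global_library_init);
}

// Simulates n_threads reactor threads that each initialize before connecting.
// Returns the total number of times the initializer has run.
int run_concurrent_connects(int n_threads) {
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (int i = 0; i < n_threads; ++i) {
        workers.emplace_back([] {
            ensure_library_initialized();
            // ... a real reactor thread would call ssh_connect() here ...
        });
    }
    for (auto& t : workers) {
        t.join();
    }
    return g_init_runs.load();
}
```

In the program from the trace, the simplest equivalent would be a single ssh_init() call in main() before the ReactorRunner threads start; a call_once wrapper like this one is only a fallback for code paths where that ordering cannot be guaranteed.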
- Doug

On Tue, Oct 21, 2014 at 10:32 AM, Doug Judd <doug@xxxxxxxxxxxxxx> wrote:
> I'm developing a multi-host ssh tool using libssh 0.6.3. The tool
> establishes connections asynchronously and in parallel. Intermittently,
> the tool will get stuck in a busy loop with a stack trace such as the
> following:
>
> Thread 13 (Thread 0x7f65f9b41700 (LWP 13549)):
> #0  0x0000003ee3adc613 in poll () from /lib64/libc.so.6
> #1  0x0000003ee3b0fe3c in clntudp_call () from /lib64/libc.so.6
> #2  0x0000003ee76058bb in do_ypcall () from /lib64/libnsl.so.1
> #3  0x0000003ee76060ab in yp_match () from /lib64/libnsl.so.1
> #4  0x00007f65f1f14f79 in _nss_nis_getpwuid_r () from /lib64/libnss_nis.so.2
> #5  0x0000003ee3aaa4ed in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
> #6  0x00007f6601b0152e in ssh_path_expand_tilde () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> #7  0x00007f6601b02bc3 in ssh_options_set () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> #8  0x00007f6601b036fb in ssh_options_apply () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> #9  0x00007f6601af68fe in ssh_connect () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> #10 0x000000000043547a in Hypertable::SshSocketHandler::handle(int, int) ()
> #11 0x0000000000484b9a in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> #12 0x0000000000492ac4 in Hypertable::ReactorRunner::operator()() ()
> #13 0x00007f66010f8ce3 in thread_proxy () from /opt/hypertable/doug/0.9.8.2/lib/libboost_thread.so.1.54.0
> #14 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> #15 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
>
> I did a little digging around and came across a ticket filed against sssd
> <https://fedorahosted.org/sssd/ticket/640> which I believe is the source
> of the problem.
> It appears that getpwuid_r() is not thread safe under
> certain circumstances.
>
> From what I can tell, ssh_connect() will use ~/.ssh as the ssh directory
> if one is not explicitly supplied. It's during the expansion of the ~
> character that getpwuid_r() gets called. The workaround is to explicitly
> set the ssh directory using a path that does not include the ~ character,
> for example:
>
>   char *home = getenv("HOME");
>   if (home == nullptr)
>     error("Environment variable HOME is not set");
>   string ssh_dir(home);
>   ssh_dir.append("/.ssh");
>   ssh_options_set(m_ssh_session, SSH_OPTIONS_SSH_DIR, ssh_dir.c_str());
>
> Attached is a patch to libssh that eliminates the ~ expansion for the
> default case (~/.ssh). In my test environment, the problem is very
> intermittent and I don't have a reproducible test case, so I'm not 100%
> sure this solution solves the problem. However, given the evidence, I
> think it's a safe bet.
>
> - Doug
>
> --
> Doug Judd
> www.hypertable.com

--
Doug Judd
CEO, Hypertable Inc.
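The HOME-based workaround in the quoted message can be packaged as a small helper. This is a sketch, assuming C++17 for std::optional; the name default_ssh_dir() is hypothetical. It builds $HOME/.ssh directly, so the path handed to libssh contains no ~ to expand, and it returns std::nullopt when HOME is unset or empty instead of constructing a std::string from a null pointer, leaving the error reporting to the caller.

```cpp
#include <cassert>
#include <cstdlib>
#include <optional>
#include <string>

// Hypothetical helper: returns "$HOME/.ssh" with no '~' in the path, so
// libssh has no tilde to expand and (per the quoted analysis) no reason to
// call getpwuid_r(). Returns std::nullopt if HOME is unset or empty.
std::optional<std::string> default_ssh_dir() {
    const char* home = std::getenv("HOME");
    if (home == nullptr || *home == '\0') {
        return std::nullopt;
    }
    std::string ssh_dir(home);
    ssh_dir.append("/.ssh");
    return ssh_dir;
}

// Intended use with libssh (left as a comment so this compiles without it):
//   if (auto dir = default_ssh_dir()) {
//       ssh_options_set(session, SSH_OPTIONS_SSH_DIR, dir->c_str());
//   } else {
//       /* report that HOME is not set */
//   }
```

The directory has to be set with ssh_options_set() before ssh_connect() is called; otherwise, per the quoted analysis, ssh_connect() falls back to the ~/.ssh default and the tilde expansion (and its getpwuid_r() call) happens anyway.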