
Re: Infinite loop with concurrent ssh_connect() on CentOS 6


Hi Aris,

Thanks for the reply.  I did read that section, but I may have
misinterpreted it.  My understanding is that I only need to link with
libssh_threads if I am using pthreads.  Since my application uses Boost
threads and C++11 threads, I assumed that I was not using pthreads, so I
followed the advice of bullet point #3: I implemented my own threading
hooks and did not link with libssh_threads.  However, grepping through the
Boost header files turns up over 500 lines that reference pthread, so maybe
Boost threads use pthreads under the hood.  What do you recommend for my
situation (a combination of Boost threads and C++11 threads)?  Do I still
need to link with libssh_threads, or is implementing my own threading hooks
sufficient?
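
For illustration, hooks of this kind could be sketched with std::mutex
roughly as follows.  This is a sketch under assumptions, not the code from
the application discussed here: the `void **` lock-slot signatures and the
0-on-success return convention are assumed to match the callback structure
declared in libssh's callbacks.h and should be checked against that header.
The hook_* names and hammer_counter() are hypothetical, and registration
with libssh is left out so the sample builds without libssh installed.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <new>
#include <thread>
#include <vector>

// Hypothetical hooks backed by std::mutex. Each takes a pointer to an
// opaque lock slot and returns 0 on success (assumed convention).
int hook_mutex_init(void **lock) {
  *lock = new (std::nothrow) std::mutex();
  return *lock != nullptr ? 0 : 1;
}

int hook_mutex_destroy(void **lock) {
  delete static_cast<std::mutex *>(*lock);
  *lock = nullptr;
  return 0;
}

int hook_mutex_lock(void **lock) {
  static_cast<std::mutex *>(*lock)->lock();
  return 0;
}

int hook_mutex_unlock(void **lock) {
  static_cast<std::mutex *>(*lock)->unlock();
  return 0;
}

// Stable per-thread number derived from std::thread::id.
unsigned long hook_thread_id() {
  return static_cast<unsigned long>(
      std::hash<std::thread::id>()(std::this_thread::get_id()));
}

// Exercises the hooks the way a library would: several threads bump a
// shared counter, taking the lock around each increment.
long hammer_counter(int threads, int iters) {
  void *lock = nullptr;
  if (hook_mutex_init(&lock) != 0) return -1;
  long counter = 0;
  std::vector<std::thread> pool;
  for (int t = 0; t < threads; ++t) {
    pool.emplace_back([&lock, &counter, iters] {
      for (int i = 0; i < iters; ++i) {
        hook_mutex_lock(&lock);
        ++counter;
        hook_mutex_unlock(&lock);
      }
    });
  }
  for (auto &th : pool) th.join();
  hook_mutex_destroy(&lock);
  return counter;
}
```

Because std::mutex works the same whether the threads were started by
Boost or by std::thread, hooks written this way do not depend on which
thread library created the caller.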

- Doug


On Tue, Oct 28, 2014 at 2:12 PM, Aris Adamantiadis <aris@xxxxxxxxxxxx>
wrote:

> On 28/10/14 17:12, Doug Judd wrote:
> > It looks like I can get around this last problem by calling ssh_init()
> > at the beginning of the program, before spawning any threads.  From an
> > API design standpoint, however, if a call to ssh_init() is a
> > prerequisite for calling ssh_connect() concurrently, then
> > ssh_connect() should verify that ssh_init() has been called.  If it
> > has not, it should fail with an informative error code or message.
> >
> > - Doug
> Hi Doug,
>
> Thanks for your feedback.
> While this could be a solution, 99% of developers using libssh are not
> using threads and this issue is already extensively covered in the
> documentation (http://api.libssh.org/master/libssh_tutor_threads.html).
> Calling ssh_init() at the beginning of your program is not enough; you
> must explicitly link with libssh_threads and use
> ssh_threads_set_callbacks(), or implement your own threading hooks.
> I have looked at other alternatives, like doing this automatically, and
> they were not satisfactory from a performance/dependency point of view.
> We basically copied the libcrypto model (while providing a shared
> library for pthreads for convenience).
>
> Regards,
>
> Aris
>
> >
> >
> > On Tue, Oct 28, 2014 at 8:36 AM, Doug Judd <doug@xxxxxxxxxxxxxx> wrote:
> >
> >     Here's another infinite loop problem when calling ssh_connect()
> >     concurrently.  This time it looks like it's in the libcrypto
> >     initialization code:
> >
> >     Thread 13 (Thread 0x7f833a002700 (LWP 24506)):
> >     #0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
> >     #8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> >     #9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
> >     #10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
> >     #11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> >     #12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
> >     Thread 12 (Thread 0x7f8339601700 (LWP 24507)):
> >     #0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
> >     #8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> >     #9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
> >     #10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
> >     #11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> >     #12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
> >     Thread 11 (Thread 0x7f8338c00700 (LWP 24508)):
> >     #0  0x00007f83420af3e9 in lh_insert () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #1  0x00007f834200e7e5 in OBJ_NAME_add () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #2  0x00007f83420c0176 in OpenSSL_add_all_ciphers () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #3  0x00007f83420bfd0e in OPENSSL_add_all_algorithms_noconf () from /opt/hypertable/doug/0.9.8.3/lib/libcrypto.so.1.0.0
> >     #4  0x00007f8342675515 in ssh_crypto_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #5  0x00007f83426776a2 in ssh_init () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #6  0x00007f8342673929 in ssh_connect () from /opt/hypertable/doug/0.9.8.3/lib/libssh.so.4
> >     #7  0x0000000000435614 in Hypertable::SshSocketHandler::handle(int, int) ()
> >     #8  0x0000000000484efa in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> >     #9  0x0000000000492e24 in Hypertable::ReactorRunner::operator()() ()
> >     #10 0x00007f83415bace3 in thread_proxy () from /opt/hypertable/doug/0.9.8.3/lib/libboost_thread.so.1.54.0
> >     #11 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> >     #12 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
> >     ...
> >
> >     The process is stuck at 100% CPU utilization.  The version of
> >     openssl that the program is linked with is 1.0.2-beta3.
> >
> >     - Doug
> >
> >
> >     On Tue, Oct 21, 2014 at 10:32 AM, Doug Judd <doug@xxxxxxxxxxxxxx> wrote:
> >
> >         I'm developing a multi-host ssh tool using libssh 0.6.3.  The
> >         tool establishes connections asynchronously and in parallel.
> >         Intermittently, the tool will get stuck in a busy loop with a
> >         stack trace such as the following:
> >
> >         Thread 13 (Thread 0x7f65f9b41700 (LWP 13549)):
> >         #0  0x0000003ee3adc613 in poll () from /lib64/libc.so.6
> >         #1  0x0000003ee3b0fe3c in clntudp_call () from /lib64/libc.so.6
> >         #2  0x0000003ee76058bb in do_ypcall () from /lib64/libnsl.so.1
> >         #3  0x0000003ee76060ab in yp_match () from /lib64/libnsl.so.1
> >         #4  0x00007f65f1f14f79 in _nss_nis_getpwuid_r () from /lib64/libnss_nis.so.2
> >         #5  0x0000003ee3aaa4ed in getpwuid_r@@GLIBC_2.2.5 () from /lib64/libc.so.6
> >         #6  0x00007f6601b0152e in ssh_path_expand_tilde () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> >         #7  0x00007f6601b02bc3 in ssh_options_set () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> >         #8  0x00007f6601b036fb in ssh_options_apply () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> >         #9  0x00007f6601af68fe in ssh_connect () from /opt/hypertable/doug/0.9.8.2/lib/libssh.so.4
> >         #10 0x000000000043547a in Hypertable::SshSocketHandler::handle(int, int) ()
> >         #11 0x0000000000484b9a in Hypertable::IOHandlerRaw::handle_event(epoll_event*, long) ()
> >         #12 0x0000000000492ac4 in Hypertable::ReactorRunner::operator()() ()
> >         #13 0x00007f66010f8ce3 in thread_proxy () from /opt/hypertable/doug/0.9.8.2/lib/libboost_thread.so.1.54.0
> >         #14 0x0000003ee42077f1 in start_thread () from /lib64/libpthread.so.0
> >         #15 0x0000003ee3ae5ccd in clone () from /lib64/libc.so.6
> >
> >         I did a little digging around and came across a ticket filed
> >         against sssd <https://fedorahosted.org/sssd/ticket/640> which
> >         I believe is the source of the problem.  It appears that
> >         getpwuid_r() is not thread safe under certain circumstances.
> >
> >         From what I can tell, ssh_connect() will use ~/.ssh as the ssh
> >         directory if one is not explicitly supplied.  It's during the
> >         expansion of the ~ character that getpwuid_r() gets called.
> >         The workaround is to explicitly set the ssh directory using a
> >         path that does not include the ~ character, for example:
> >
> >         // Build an absolute ssh directory so libssh never expands "~".
> >         const char *home = getenv("HOME");
> >         if (home == nullptr)
> >           error("Environment variable HOME is not set");
> >         string ssh_dir(home);
> >         ssh_dir.append("/.ssh");
> >         ssh_options_set(m_ssh_session, SSH_OPTIONS_SSH_DIR, ssh_dir.c_str());
> >
> >         Attached is a patch to libssh that eliminates the ~ expansion
> >         for the default case (~/.ssh).  In my test environment, the
> >         problem is very intermittent and I don't have a reproducible
> >         test case, so I'm not 100% sure this solution solves the
> >         problem.  However, given the evidence, I think it's a safe bet.
> >
> >         - Doug
> >
> >         --
> >         Doug Judd
> >         www.hypertable.com <http://www.hypertable.com>
> >
> >
> >
> >
> >     --
> >     Doug Judd
> >     CEO, Hypertable Inc.
> >
> >
> >
> >
> > --
> > Doug Judd
> > CEO, Hypertable Inc.
>
>
>
>
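
On the point quoted above about calling ssh_init() before any thread
calls ssh_connect(): one way an application might enforce that ordering is
std::call_once.  The sketch below is only an illustration under stated
assumptions.  library_init_stand_in() is a hypothetical placeholder for a
non-thread-safe global initializer such as ssh_init(), and run_workers()
and ensure_initialized() are likewise invented names.  It covers the
initialization ordering only; as Aris notes, the threading callbacks must
still be installed separately.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a global initializer such as ssh_init().
// It only records how many times it was invoked.
static std::atomic<int> g_init_calls{0};
static void library_init_stand_in() { ++g_init_calls; }

static std::once_flag g_init_once;

// Every worker calls this before touching the library. std::call_once
// runs the initializer exactly once and makes all other callers wait
// until that single run has completed.
static void ensure_initialized() {
  std::call_once(g_init_once, library_init_stand_in);
}

// Spawns n workers that all race to initialize, then returns how many
// times the initializer actually ran.
int run_workers(int n) {
  std::vector<std::thread> workers;
  for (int i = 0; i < n; ++i) {
    workers.emplace_back([] {
      ensure_initialized();
      // A real worker would start its connection here.
    });
  }
  for (auto &t : workers) t.join();
  return g_init_calls.load();
}
```

Calling the initializer once from main() before starting any threads
achieves the same ordering with less machinery; the guard is mainly useful
when there is no single obvious startup path.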


-- 
Doug Judd
CEO, Hypertable Inc.
