Lokasi ngalangkungan proxy:   [ UP ]  
[Ngawartoskeun bug]   [Panyetelan cookie]                
Skip to content

ssl: support IO-like object as the underlying transport#736

Open
rhenium wants to merge 13 commits into
ruby:masterfrom
rhenium:ky/ssl-ruby-io
Open

ssl: support IO-like object as the underlying transport#736
rhenium wants to merge 13 commits into
ruby:masterfrom
rhenium:ky/ssl-ruby-io

Conversation

@rhenium

@rhenium rhenium commented Mar 27, 2024

Copy link
Copy Markdown
Member

An implementation of #731. The test suite passes on my box, but it needs more testing especially around the error handling.

This adds support for IO-like object that is not backed by a file descriptor by defining a BIO_METHOD to wrap the following Ruby methods.

  • #read_nonblock(len, exception: false)
  • #write_nonblock(str, exception: false)
  • #wait_readable
  • #wait_writable
  • #flush
  • #closed?
  • #close

Comment thread ext/openssl/ossl_bio.c Outdated
Comment thread ext/openssl/ossl_bio.c Outdated
Comment thread ext/openssl/ossl_bio.c Outdated
BIO_clear_retry_flags(p->bio);

VALUE fargs[] = { INT2NUM(p->dlen), nonblock_kwargs };
VALUE ret = rb_funcallv_public_kw(io, rb_intern("read_nonblock"),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should't this ret be somhow marked so it's not moved? (thinking of compaction here).

probably too early for optimizations, ,but this bit could benefit of a socket-local string to act as a buffer (second argument of :read_nonblock)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The content is copied in this function and the String object won't be kept, so it shouldn't be necessary.

probably too early for optimizations, ,but this bit could benefit of a socket-local string to act as a buffer (second argument of :read_nonblock)

Yes, this is definitely a possible optimization.

Comment thread ext/openssl/ossl_bio.c Outdated
Comment thread ext/openssl/ossl_ssl.c Outdated
Comment thread ext/openssl/ossl_ssl.c
@HoneyryderChuck

Copy link
Copy Markdown
Contributor

@rhenium anything I can help with to move this forward?

@HoneyryderChuck

Copy link
Copy Markdown
Contributor

@rhenium friendly ping

@rhenium

rhenium commented Mar 5, 2025

Copy link
Copy Markdown
Member Author

I just pushed what I have in my local branch so far, rebased on top of current master to resolve merge conflicts. This is not ready to merge yet.

Some unresolved issues:

  • We have many configurable callbacks that may be triggered by various methods on SSLSocket. If a callback attempts a tag jump (e.g., raising an exception), it is caught by rb_protect() and the jump tag is stored in the ID_callback_state ivar. After returning from the OpenSSL function, the ivar is checked and the jump is resumed. Currently, if a callback fails, no other callbacks will be executed from the same OpenSSL function call (unless I'm mistaken), so this is working as expected.

    A jump from some callbacks is considered critical to the connection and a TLS alert must be sent to the peer. For example, an exception in SSLContext#servername_cb should cause an unrecognized_name alert to be sent to the client.

    Since writing to the underlying socket requires running Ruby code, this means we need to execute Ruby code while a jump is still on hold. This PR currently doesn't try to handle this, and it's possible to cause a segfault (as you can see in the two failing test cases I added in a [DO NOT MERGE] commit). How should this be fixed?

    I initially tried saving rb_errinfo() temporarily and restoring later with rb_set_errinfo(), but this can't work for all cases because rb_set_errinfo() only accepts exceptions.

  • If both a callback and the underlying socket does a jump in the same single OpenSSL function call, which one should take precedence?

  • OpenSSL suppresses a certain kind of errors when writing TLS 1.3 NewSessionTicket to the underlying socket, when it is harmless (https://github.com/openssl/openssl/blob/e599893a9fec932701ca824d73a794a0c9ce02e9/ssl/statem/statem_srvr.c#L852-L870). While we could rescue exceptions like Errno::ECONNRESET raised by underlying_socket.write_nonblock to replicate the behavior, I'm not sure if that would be a good idea. This PR currently doesn't handle this. This is a slight difference compared to when using the socket BIO.

@HoneyryderChuck

Copy link
Copy Markdown
Contributor

This PR currently doesn't try to handle this, and it's possible to cause a segfault

Indeed, I didn't know that doing that was dangerous, though that rb_protect could deal with it.

One option could be to assume that the IO-like object must indeed be an SSLSocket, and one could deal with it by setting an ivar "flag" telling the internal implementation to skip 2nd-level rb_protect/rb_jump_tag? Other than that, I'm out of ideas.

If both a callback and the underlying socket does a jump in the same single OpenSSL function call

Can they happen at the same time? Perhaps suggestion above deals with it?

An alternative approach could be to go back to the drawing board and try again the approach using BIO_mem. I think this approach could be more pragmatic, at the cost of having to do a lot of "callback to rubyland", just not sure at this point if the entension API allows that.

@rhenium

rhenium commented Mar 10, 2025

Copy link
Copy Markdown
Member Author

Can they happen at the same time?

Please see test_synthetic_io_errors_in_callback_and_socket for an example. SSLContext#servername_cb is triggered when the server receives a ClientHello. If the callback doesn't succeed, the server then has to send an unrecognized_name alert. Both happen inside the same SSL_accept() function call.

I think the segfault is fixable by (ab)using rb_ensure(), but I wasn't sure about in which way I should make it work:

If both a callback and the underlying socket does a jump in the same single OpenSSL function call, which one should take precedence?

@rhenium rhenium marked this pull request as draft March 12, 2025 16:33
@HoneyryderChuck

Copy link
Copy Markdown
Contributor

Sorry for not replying sooner, have been stumped with other chores. I think "last exception wins" is reasonable, and considering this is a new feature, it's not really breaking compatibility.

Lmk when you work around the jump tag issue, I can run a few sanity checks on my side as well.

@rhenium

rhenium commented Mar 30, 2025

Copy link
Copy Markdown
Member Author

This branch needs more polishing, but it should be working aside from some edge cases. I'd appreciate if you could test it.

I should have worded more accurately, but the last commit changed it so that the later tag jump (which includes raising an exception) wins against the former one. This should fix the segfault.

One concern is that this can accidentally suppress an important jump like the one created by Thread#kill. Please see this Ruby issue for details: https://bugs.ruby-lang.org/issues/13882

I haven't come up with a nice solution for this.

@HoneyryderChuck

Copy link
Copy Markdown
Contributor

just a heads-up that I've been trying to incorporate this in my branch's CI, but having issues due to C extension compilation and openssl being an stdlib I guess (opened a ticket here, in case you know why that happens and can recommend a workaround).

@rhenium

rhenium commented Apr 8, 2025

Copy link
Copy Markdown
Member Author

just a heads-up that I've been trying to incorporate this in my branch's CI, but having issues due to C extension compilation and openssl being an stdlib I guess (opened a ticket here, in case you know why that happens and can recommend a workaround).

That Gemfile snippet works as expected for me (ruby-build Ruby 3.4 on Linux). I'm not sure why it's failing for you. What happens if you clone the repository, run gem build to make a .gem archive, and try installing it?

@HoneyryderChuck

Copy link
Copy Markdown
Contributor

I did some effort to build openssl from branch in httpx CI, and now I'm getting this error a bit all over the place:

FaradayTest#test_adapter_get_handles_compression:
RangeError: integer 139928495023720 too big to convert to 'int'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::SSL::SSLSocket#syswrite_nonblock'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::Buffering#write_nonblock'
    lib/httpx/io/tcp.rb:129:in 'HTTPX::TCP#write'

I may only be able to do some investigation in one week.

@rhenium

rhenium commented Apr 16, 2025

Copy link
Copy Markdown
Member Author

RangeError: integer 139928495023720 too big to convert to 'int'

I was able to reproduce it locally. This is a bug in #syswrite{,_nonblock} with a String-convertible object that responds to #to_str (such as HTTPX::Buffer). #881 should fix this.

@HoneyryderChuck

Copy link
Copy Markdown
Contributor

thx, tests are passing now 👍

@liath

liath commented May 14, 2025

Copy link
Copy Markdown

Just spent the day scratching my head over Net::HTTP's behavior around proxies and didn't see the tickets that led to this PR until now lol
Anything I can do to help here?

@rhenium

rhenium commented Jan 23, 2026

Copy link
Copy Markdown
Member Author

Rebased on master to resolve conflicts caused by the recent tab expansion. Two preparatory commits from this PR have been merged separately (46c9223 ac8df7f).

rhenium added 6 commits July 5, 2026 18:22
We can implement this now that we require Ruby 2.7 or later.

The return value of OpenSSL's *_set_ex_data() functions is not checked
since replacing an existing index should not fail.
We will need to store additional internal states in SSLSocket in the
upcoming patches. Prepare a new struct for such purposes and store it
in SSL's ex_data.

Refactor existing internal states that are local to C to use this new
struct:

  - The back reference to the Ruby object, currently in another ex_data
  - The callback jump tag, currently in an instance variable
Do not attempt to execute Ruby callback when another callback has
already raised an exception and we are waiting for a chance to call
rb_jump_tag().

As far as I know, this never happens currently. However, an upcoming
patch to use custom BIO_METHOD in SSLSocket may make it possible. Let's
handle it explicitly.
Set the callback to SSL_CTX instead of SSL so that we do not have to
configure every SSL object, similar to all other existing callbacks.

Also, wrap rb_funcall() in rb_protect() to prevent exiting from
SSL_accept() prematurely.
Re-raise exceptions raised inside client_cert_cb from the originating
Ruby method, which can be any of #connect, #sysread*, or #syswrite*.
Look up the callback procs by checking the corresponding instance
variables on SSLContext directly. The removed methods
SSLSocket#session_new_cb and #session_get_cb are private and have not
been part of the public interface of ruby/openssl.

Also remove the unused method SSLSocket#client_cert_cb.
Prepare the argument array object inside rb_protect(). The callback is
already called in it, but the array allocation can also potentially
fail and needs to be done there.
rhenium added 6 commits July 5, 2026 19:37
Store the IO object in struct ossl_ssl_data instead of an instance
variable that is writable from Ruby.
Implement a bare minimal BIO_METHOD required for SSL/TLS. The
underlying IO-like object must implement the following methods:

 - #read_nonblock(len, exception: false)
 - #write_nonblock(str, exception: false)
 - #flush

A later commit will wire it into OpenSSL::SSL::SSLSocket.
The result value is used for generating an informative error message
for OpenSSL::SSL::SSLError. We can simply skip it if it is not
available.
Assume true (no buffering) if the underlying IO-like object does not
respond to sync.
There are use cases to establish a TLS connection on top of a non-OS
stream, such as another TLS connection. To do this today, a workaround
using dummy socket pairs is necessary.

Currently, OpenSSL::SSL::SSLSocket.new requires a T_FILE object because
we pass the file descriptor the OpenSSL. This patch relaxes this to
allow any Ruby object that responds to non-blocking IO methods, such as
read_nonblock.

OpenSSL's TLS implementation uses an IO abstraction layer called BIO to
interact with the underlying socket. By passing the file descriptor to
SSL_set_fd(), a BIO with the BIO_s_socket() BIO_METHOD is implicitly
created. We can set up a BIO using our own custom BIO_METHOD
implementation to redirect read/write request to Ruby method calls.
The previous patch added such a BIO_METHOD implementation.

We continue to use the BIO_s_socket() BIO_METHOD if the user passes a
real IO object because it is more efficient. Behavior change is not
expected for existing programs.
Let's see what will break with this.
@rhenium rhenium marked this pull request as ready for review July 5, 2026 10:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants