RBS, duck-typing, meta-programming, and typing at httpx
16 Oct 2020
Ruby 3 is just around the corner, and with the recent release candidate, there’s been some experimentation in the ruby community, along with the usual posts, comments and unavoidable rants.
Ruby 3 has 3 major features:
- JIT (since 2.6 in experimental mode)
- Ractors
- Thread.scheduler, aka autofibers
- Gradual typing (via rbs)
(aaaaand we’re off by one)
From the point of view of httpx, JIT is implicit, and Ractors won’t do much for it (although I still have to check whether calls can be made across ractors). The autofibers feature seems interesting, and will be experimented with at some point.
But typing is where, IMO, a library like httpx can get the most immediate benefit.
Typing is a very controversial topic in the Ruby community. Most of us started our journey as Ruby developers by running away from statically-typed languages (mostly Java), and fell in love with the quick feedback loop and fast prototyping that Ruby, and its lack of typing, enabled. Over time, we’ve crossed the “Peak of Inflated Expectations” all the way into the “Trough of Disillusionment”, where the monolithic codebases most of us find ourselves working in fail in the most unpredictable ways due to runtime errors off the happy path (NoMethodErrors everywhere), and the act of simply updating external dependencies, let alone big refactorings, introduces so much risk that most businesses prefer to halt upgrades indefinitely, until it’s 2020 and you’re still running Rails 2 in production.
Different cults rose around the one true solution for the conundrum, TDD being the most famous of them all. The belief was that, through industrious unit testing, functional testing, contract testing, load testing, E2E testing, migration testing, and some more, code will be resilient enough. At the cost of a ratio of 100 lines of test code per 1 line of actual code. But still, those pesky NoMethodErrors keep meddling in our affairs.
So we’ve gone full-circle and typing will save us all from this disaster, just like it does all of those Java projects we ran away from years ago!
How can we have our cake, and eat it too? Matz & co. don’t want to do Java. They want Ruby to be Ruby. How could Ruby be Ruby and have types? It took some time, but the result is here: rbs was announced in July 2020. I won’t bother you with details you can read in the linked articles, but in short, rbs is the “typing language format”, whereas other tools (such as https://github.com/soutaro/steep or sorbet, Stripe’s type checker) will use it to perform type analysis. It caught my attention as soon as I got the guarantee that “duck typing” wasn’t going to be left behind. So I started analysing whether I’d need it.
Does a ruby HTTP library benefit from typing? Based on my experience maintaining it, I guess one can make that argument. Public API is a particular example where some strictness can be beneficial both for the end user, and for the maintainer (example: troubleshooting a bug report, where one can ask the reporter to run the example with type check enabled). It can also work well as extra documentation, and can potentially help me avoid some weird bugs in certain edge cases which, sadly, can easily be overlooked when writing tests.
But Ruby is a hard language to type. At any point, a random anonymous class can be created. An existing class can be modified. Also, I love “duck typing”. Any typing I’ll use has to take that into consideration. Also, there are the coercion protocols, implicit and explicit. And then there are the “common” interfaces. How can I stay true to Ruby, keep my developer happiness and not go full Java?
So I’ve decided to start integrating it. Not go full-types yet. At first as an experiment, to see whether I can “bend” type declarations to my will, while also finding limitations early in the process, and potentially contribute some feedback to the rbs team.
This is the chronicle of that journey.
Start me up
rbs type definitions are done in a separate file from the source code being typed. The convention seems to be that, while your code goes to lib/, signatures go to sig/ (this is at least what the core team has been doing with the stdlib gems). There’s some controversy about maintaining signatures in separate files, but bear in mind that, if you care about backwards-compatibility, this way your code can also run in ruby 2.x free of modifications. More on that later.
My first step was integrating type-checking in the project, and the most obvious place is the test suite. I’ve decided to not go full-blown static analyzer yet, so I’m using plain rbs runtime signature checks. This will instrument method calls and check whether they’re compliant with the corresponding signature. This strategy only covers the code you run, or in this case, the code my tests run. You can activate it by defining the following environment variables:
# the first if you're using bundler, the second because you need to load it
> export RUBYOPT='-rbundler/setup -rrbs/test/setup'
# raise an exception when there's a type check violation
> export RBS_TEST_RAISE=true
# control log verbosity of rbs
> export RBS_TEST_LOGLEVEL=error
# point where the definitions are. Also, require the stdlib modules whose interfaces you'll use (in my case, uri and json)
> export RBS_TEST_OPT='-Isig -ruri -rjson'
# which namespaces will be checked
> export RBS_TEST_TARGET='HTTPX*'
Then I picked a class to type check, HTTPX::Session (do start with things your users use), and I prototyped it:
> bundle exec rbs prototype runtime -rhttpx HTTPX::Session > sig/session.rbs
And then I started the “type - run tests - fix or refactor” loop.
First impressions
My initial work was mostly aided by the guides, which are very sparse right now (hopefully there’ll be some improvement there). They’re a good starting point, nevertheless.
The syntax grammar is a bit more restrictive than plain ruby. For instance:
- interfaces must be prefixed with “_” (ex: interface _Sweet);
- composite types must start with lower-case (ex: type candybar);
I’ve learnt this the hard way, by interpreting the errors being raised while my tests were running, and scratching my head hard at the error messages.
Method signatures also have their quirks. For instance, optional types have a ? suffix: Integer? means “an integer or nil”. In parameters, however, it’s the other way round: a ? prefix has a different meaning, as it marks the parameter itself as optional:
# a(1) #=> valid
# a(nil) #=> valid
# a() #=> invalid
def a: (Integer? size) -> Integer
# a() #=> now it's also valid
def a: (?Integer? size) -> Integer
# and this returns integer or nil
def a: (?Integer? size) -> Integer?
# optional kwarg :size, can be integer or nil
def a: (?size: Integer?) -> Integer
This was rather confusing at first, and again, error messages didn’t help.
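To make the distinction more concrete, here is a minimal sketch of plain Ruby methods that would match each shape of signature. The method names (nilable_size, optional_size, keyword_size) are made up for illustration, and the signatures in the comments are my reading of the syntax, not output from rbs.

```ruby
# def nilable_size: (Integer? size) -> Integer
# The argument is required, but nil is an accepted value.
def nilable_size(size)
  size || 0
end

# def optional_size: (?Integer? size) -> Integer?
# The argument can be left out entirely, and nil is accepted too.
def optional_size(size = nil)
  size
end

# def keyword_size: (?size: Integer?) -> Integer
# Optional keyword argument, which may also be nil.
def keyword_size(size: nil)
  size || 0
end

nilable_size(1)    # fine
nilable_size(nil)  # fine, nil is part of the type

missing_argument_error =
  begin
    nilable_size() # the argument itself is not optional
    nil
  rescue ArgumentError => e
    e
  end

optional_size      # fine, the argument can be omitted
keyword_size       # fine, the keyword can be omitted
```

The suffix describes the value (it may be nil), while the prefix describes the call site (the argument may be absent).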
The “module function” method signatures are also confusing, as they again rely on the ? suffix for self, which feels like the wrong token to convey meaning:
# this is how you define a signature for
# a module function
# module Math
# module_function
# def sqrt
# ...
def self?.sqrt: (Numeric) -> Numeric
After some time around these concepts, I started moving forward, and all started making more sense.
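For reference, this is roughly what the self? form is describing on the Ruby side. Geometry and Shape are made-up names for this sketch; the point is only that module_function yields both a singleton method and a private instance method.

```ruby
# Geometry is a made-up module, used only for illustration.
module Geometry
  module_function

  # rbs: def self?.double: (Numeric) -> Numeric
  def double(number)
    number * 2
  end
end

# Callable on the module itself (the "self." half of "self?.")...
Geometry.double(2) # => 4

# ...and as a private instance method wherever the module is included.
class Shape
  include Geometry

  def doubled_side(side)
    double(side)
  end
end

Shape.new.doubled_side(3)      # => 6
Shape.new.respond_to?(:double) # => false, the instance method is private
```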
Plateau of productivity
If you have some experience with type checking, you’ll find a lot of the familiar concepts in rbs, albeit named differently. You get classes, interfaces, unions, intersections, tuples, type and method aliases, literals, type variables, you name it.
Then you have a few “ruby-related” definitions, such as ivar definitions, singletons, mixins, visibility (only public and private, another nail in the protected coffin), Proc/block signatures, and a few more interesting concepts.
Quacking like a type
Your first thought will be to sign a method with class types. You’ll find yourself looking at:
def log(message)
puts "log: #{message}"
end
and you’ll define it as:
def log: (String) -> void
However, in theory you’ll want to log anything, aka its string representation. In “duck typing” lingo, “anything that quacks #to_s”. So you’ll define something like:
interface _Stringable
  def to_s: () -> String
end
def log: (_Stringable) -> void
Bam, interfaces ftw. But hold on, this is a pretty common interface! Surely rbs figured that out already! Well, yes it did! It’s not widely documented yet (let’s wait for the official release), but there are common interfaces already defined for you, along with aliased types, which is what rbs uses in its own stdlib definitions. For instance, for our example above, there’s already a _ToS interface, and a string type joining the _ToStr interface (implements #to_str) with the String class:
def log: (string) -> void
(There are analogous int and real types, and probably more will follow.)
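The difference between the two “ducks” is easier to see in plain Ruby. In this sketch, Temperature and Path are made-up classes, and log returns the string instead of printing it so the result is easy to inspect: #to_s is the explicit conversion used by interpolation, while #to_str is the implicit one that core String methods look for.

```ruby
# Temperature and Path are made-up classes for this example.
class Temperature
  def initialize(degrees)
    @degrees = degrees
  end

  # Explicit conversion: "I can describe myself as a string" (_ToS).
  def to_s
    "#{@degrees} C"
  end
end

class Path
  def initialize(path)
    @path = path
  end

  # Implicit conversion: "treat me as a String" (_ToStr).
  def to_str
    @path
  end
end

def log(message)
  "log: #{message}"
end

log(Temperature.new(21))    # => "log: 21 C", interpolation calls #to_s
"/home" + Path.new("/user") # => "/home/user", String#+ calls #to_str

implicit_conversion_error =
  begin
    "it is " + Temperature.new(21) # no #to_str, so no implicit conversion
    nil
  rescue TypeError => e
    e
  end
```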
This also helped me deal with my own definitions. For instance, httpx request methods receive a uri parameter. It’s not very clear from the documentation, but besides a string, a URI object can also be passed as an argument. So my signature for a uri became:
type uri = URI::HTTP | URI::HTTPS | string
Also, a lot of methods receive headers or options, which can be instances of HTTPX::Headers or HTTPX::Options, but also plain hashes and arrays. So I’ve also done this:
# for headers
type headers_value = string | Array[string]
type headers_hash = Hash[String | Symbol, headers_value]
type headers = Headers | headers_hash
# for options
type options = Options | Hash[Symbol | String, untyped]
A combination of these strategies and judicious use of existing and custom interfaces allowed me to continue using “duck-typing”.
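As an illustration of what such permissive aliases allow on the Ruby side, here is a small sketch of a helper that accepts the shapes described by headers_hash. normalize_headers is a hypothetical name and is not how httpx handles headers internally.

```ruby
# normalize_headers is a hypothetical helper, not part of httpx.
# It accepts String or Symbol keys, and a string or an array of
# strings as values, mirroring the `headers_hash` alias.
def normalize_headers(headers)
  headers.each_with_object({}) do |(name, value), normalized|
    key = name.to_s.downcase
    values = Array(value).map { |v| String(v) }
    (normalized[key] ||= []).concat(values)
  end
end

normalized = normalize_headers("Accept" => "text/html", accept: ["application/json"])
# normalized has a single "accept" key holding both values, in call order
```

The signature stays loose for the caller, while the code behind it works with a single predictable shape.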
Learnings from go
httpx makes heavy use of common IO-related implicit interfaces. For instance, instances of classes defined under lib/httpx/io, and both request and response bodies, can be used with stdlib methods like IO.select or IO.copy_stream, by implementing #read, #write and/or #to_io.
These have all been implicit until now, but we can now make them explicit, by defining their interfaces. This would be very similar to how go defines the Reader and Writer types, both very simple, but with a lot of intrinsic meaning. go’s structural typing has actually been referenced as an example of “duck-typing done right”.
rbs didn’t have them implemented like that though, but work is underway to make them a reality come ruby 3.
Until then, httpx defines these interfaces internally.
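To show what these IO “ducks” look like in practice, here is a minimal sketch with two made-up classes, BufferWriter and ChunkReader (neither is an httpx class). It assumes IO.copy_stream only needs #write on the destination and #read (or #readpartial) on the source, which is what its documentation describes.

```ruby
require "stringio"

# BufferWriter and ChunkReader are made-up classes for this sketch.

# Anything with #write can be a destination for IO.copy_stream.
class BufferWriter
  attr_reader :buffer

  def initialize
    @buffer = +""
  end

  # Appends the chunk and reports how many bytes were consumed.
  def write(chunk)
    @buffer << chunk
    chunk.bytesize
  end
end

# Anything with #read (or #readpartial) can be a source.
class ChunkReader
  def initialize(data)
    @data = data.b # binary copy, so slicing is byte-based
  end

  # Fills outbuf with up to `length` bytes; returns nil at end of data.
  def read(length = nil, outbuf = +"")
    return nil if @data.empty?

    chunk = @data.slice!(0, length || @data.bytesize)
    outbuf.replace(chunk)
  end
end

writer = BufferWriter.new
IO.copy_stream(StringIO.new("hello"), writer)
writer.buffer # => "hello"

output = StringIO.new
IO.copy_stream(ChunkReader.new("world"), output)
output.string # => "world"
```

Neither class inherits from IO, and that is exactly the kind of contract a small interface can spell out.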
Under construction
There’s still work to be done, though. Ruby is not Java, and is certainly not Javascript, so one can’t just get away with “importing” what these type systems can do; rbs-novel ways have to be figured out to express advanced meta-programming. For instance, what can we do with runtime-level module includes, or even anonymous classes? Any complete ruby type system will have to deal at some point with those.
Also, the rbs runtime check module is very recent, so there are a lot of rough edges. I’ve been communicating some of my findings via issue, feature or merge requests to the rbs team. This is my personal wishlist for Christmas 2020.
Better error messages
This is what happens if a function returns an object of the incorrect type:
# rbs
def bytesize: () -> String
# then your code does smth:
buffer.bytesize
#RBS::Test::Tester::TypeError: TypeError: [HTTPX::Response::Body#bytesize] ReturnTypeError: expected `::string` but returns `3`
# /rbs/lib/rbs/test/tester.rb:156:in `call'
# /rbs/lib/rbs/test/observer.rb:8:in `notify'
# /rbs/lib/rbs/test/hook.rb:146:in `bytesize__with__RBS_TEST_c7b28f'
# test/response_test.rb:78:in `test_response_body_read'
This is when a definition for a type isn’t found:
def bytesize: () -> integer
# RuntimeError: Unknown name for expand_alias: integer
# /rbs/lib/rbs/definition_builder.rb:1154:in `expand_alias'
# /rbs/lib/rbs/test/type_check.rb:304:in `value'
# /rbs/lib/rbs/test/type_check.rb:93:in `return'
# /rbs/lib/rbs/test/type_check.rb:47:in `method_call'
# /rbs/lib/rbs/test/type_check.rb:23:in `block in overloaded_call'
# /rbs/lib/rbs/test/type_check.rb:22:in `map'
# /rbs/lib/rbs/test/type_check.rb:22:in `overloaded_call'
# /rbs/lib/rbs/test/tester.rb:150:in `call'
# /rbs/lib/rbs/test/observer.rb:8:in `notify'
# /rbs/lib/rbs/test/hook.rb:146:in `bytesize__with__RBS_TEST_0d7e38'
# test/response_test.rb:78:in `test_response_body_read'
This is a syntax error in a signature definition:
def bytesize: () - Integer
# parser.y:1380:in `on_error': parse error on value: #<RBS::Parser::LocatedValue:0x0000559907a84418 @location=#<RBS::Location:1860 @buffer=sig/response.rbs, @pos=971...972, source='-', start_line=44, start_column=23>, @value="-"> (tOPERATOR) (RBS::Parser::SyntaxError)
# from (eval):3:in `_racc_do_parse_c'
# from (eval):3:in `do_parse'
# from parser.y:1110:in `parse_signature'
# from /rbs/lib/rbs/environment_loader.rb:134:in `block in each_decl'
# from /rbs/lib/rbs/environment_loader.rb:132:in `each'
# from /rbs/lib/rbs/environment_loader.rb:132:in `each_decl'
# from /rbs/lib/rbs/environment_loader.rb:147:in `load'
# from /rbs/lib/rbs/environment.rb:130:in `block in from_loader'
# from <internal:kernel>:90:in `tap'
# from /rbs/lib/rbs/environment.rb:129:in `from_loader'
# from /rbs/lib/rbs/test/setup.rb:41:in `<top (required)>'
# from /usr/local/bundle/bin/bundle:in `require'
(the right definition is def bytesize: () -> Integer.)
What’s wrong? Well, they’re just plain exception backtraces. They give you enough information to know what went wrong and where (except in the second case, where I have to grep to find that keyword), but they’re not user-friendly. A proper type-checker (even a runtime one) will have to do much better than that.
Elm became renowned for the compiler errors UX, so much that Rust made it a goal to reach its standard. I know that it’s still early days, but I think that rbs can get there too.
Runtime require support
(Reported)
Signatures are loaded at boot time, and break if the typed class/module isn’t available yet. I had to patch this behaviour by loading all plugins ahead of time when rbs is available, which is obviously something I’d like to avoid.
() -> void alias
(Reported)
def do_that: () -> void
This’ll be a very repeated, albeit pointless, method signature, which begs to be aliased into something shorter. In the spirit of “DRY”, here’s hoping the rbs team figures out a way to put all of these definitions in one basket.
Exceptions
As of the time of writing this post, there is not yet a way to declare that a function may raise an exception (or throw something). This is particularly important in methods that seem harmless, but may fail unexpectedly, such as TCPSocket#close, which most of the time just closes the socket, but may fail with an Errno, such as when sending the FIN packet fails. (is it Errno::ECONNRESET? Can’t remember.)
Although not off the table, it seems that such a feature won’t make it to ruby 3.0.
Delegated methods
Ruby has a few ways to delegate methods to another object (usually an instance variable). This is a very common ruby idiom, and will need an easy way to signal these delegations.
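For context, one of those delegation idioms is the stdlib Forwardable module. The sketch below uses made-up House and Person classes; it only shows the Ruby side that a signature would need to describe.

```ruby
require "forwardable"

# Person and House are made-up classes for illustration.
Person = Struct.new(:first_name, :last_name)

class House
  extend Forwardable

  # Forward only the listed methods to the @owner instance variable.
  def_delegators :@owner, :first_name, :last_name

  def initialize(owner)
    @owner = owner
  end
end

house = House.new(Person.new("Jane", "Doe"))
house.first_name # => "Jane"
house.last_name  # => "Doe"
```

Today each delegated method has to be repeated by hand in the signature file, which is the duplication I’d like to avoid.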
Here’s my “napkin” proposal:
class House
@owner: Person
define_from @owner
# or
define_from @owner, first_name, last_name
...
method_missing
(Reported)
How do you type a method_missing handler? It’s a bit difficult, as we’re in “shit happens” territory.
In most cases though, method_missing just dynamically codes delegation to instance variables based on runtime rules. So if the technique described above could work here (we should define respond_to_missing? alongside anyway, so we know which methods are accepted most of the time), I’d be a happy dev already.
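As a reminder of what that usually looks like, here is a minimal sketch of method_missing delegating to an instance variable, paired with respond_to_missing?. Settings is a made-up class, not something from httpx.

```ruby
# Settings is a made-up class for illustration.
class Settings
  def initialize(values)
    @values = values
  end

  # Answer unknown calls from @values, but only for keys it knows about.
  def method_missing(name, *args, &block)
    return @values[name] if args.empty? && @values.key?(name)

    super
  end

  # Keeps respond_to? consistent with what method_missing accepts.
  def respond_to_missing?(name, include_private = false)
    @values.key?(name) || super
  end
end

settings = Settings.new(timeout: 5)
settings.timeout               # => 5
settings.respond_to?(:timeout) # => true
settings.respond_to?(:retries) # => false
```

The set of accepted methods is only known at runtime here, which is why a static signature for it is hard to write.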
Subclasses
(I really wish this one makes it to ruby 3.0.)
In rbs, one uses singleton(MyClass) to refer to the MyClass class. For example, if a method returns that class, the signature would be:
def get_class: () -> singleton(MyClass)
However, there’s no way to declare a subclass of MyClass.
And this is just a small part of my biggest wish for rbs.
(sorbet implements this as T.attached_class).
Dynamic Classes
Probably the most difficult and the most ambitious of my “wish” features, dynamic classes will be a challenge to type, once we (hopefully) get a proposal off the ground.
httpx plugins rely on a heavy dose of meta-programming, with HTTPX core classes being extended at runtime in a contained way, using anonymized subclasses and mixins. I came up with a “clunky” way of typing them, but it’s a bit limited.
If you think that such meta-programming is rare, I’ll disclaim here that I didn’t come up with this design myself, as I “borrowed” it from sequel, roda, rodauth and shrine, all of them very popular gems.
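For readers who haven’t seen the pattern, this is a heavily simplified sketch of the idea. Client, Logging and the InstanceMethods convention are assumptions made for this example; it is not the actual implementation in httpx or in any of the gems above.

```ruby
# A minimal, hypothetical sketch of the plugin pattern.
class Client
  def self.plugin(mod)
    # Extend an anonymous subclass instead of reopening Client itself.
    klass = Class.new(self)
    klass.include(mod::InstanceMethods) if mod.const_defined?(:InstanceMethods, false)
    klass
  end

  def get(uri)
    "GET #{uri}"
  end
end

module Logging
  module InstanceMethods
    def get(uri)
      "[logged] #{super}"
    end
  end
end

logging_client = Client.plugin(Logging)
logging_client.name                           # => nil, the class is anonymous
logging_client.new.get("https://example.com") # => "[logged] GET https://example.com"
Client.new.get("https://example.com")         # => "GET https://example.com"
```

The class returned by plugin has no name and a method table only known at runtime, so there is nothing for a signature file to point at.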
I’m not getting my hopes up with this one for the ruby 3.0 release though. I can recognize a difficult task when I see one.
Quick wins
A question I asked myself while typing httpx was “what do you expect to gain now?”. I mean, rbs, and typing in Ruby, is a seed, and it’ll take months (years?) until the community reaps tangible benefits. The standard library will have to be fully typed (it’s an ongoing effort as of the time of writing this post, and they need your help), the baseline / most common transitive libraries will have to as well, and one day, the “ripple effect” engulfs us in a sea of typed ruby. So why not wait it out?
I decided to go ahead and type now. Here’s what I found out.
Unknown bugs
Type checking evangelists always mention the necessity of having an ultra-comprehensive test suite in an untyped language, because you just don’t know how your APIs are going to be (ab)used. I always thought there was some truth in this statement, and I built a pretty comprehensive test suite around httpx public APIs.
And yet, a bug slipped through the cracks: HTTPX::Session#build_requests was handling 2 or 3 arguments the same way, although the iteration block clearly only handled 2. Besides that, the second parameter should accept any object implementing #each. No unit test was ever written for this, so I never noticed.
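The “any object implementing #each” part is the kind of contract an interface captures well. Here is a minimal sketch; each_request is a hypothetical helper and does not reproduce what HTTPX::Session#build_requests actually does.

```ruby
require "set"

# each_request is a hypothetical helper, not httpx's build_requests.
# The second parameter is "anything implementing #each".
def each_request(verb, uris)
  requests = []
  uris.each { |uri| requests << [verb, uri] }
  requests
end

from_array = each_request(:get, ["https://a.example", "https://b.example"])
from_set = each_request(:get, Set["https://a.example"])
from_enum = each_request(:get, %w[https://a.example].each) # an Enumerator works too
```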
Sure, both cases were probably best considered “edge cases”, and to this day no one complained, so probably the APIs aren’t being abused just yet (or everybody’s “partying hard”, if you know what I mean). Nevertheless, I’m left thinking how would such an error be described by a confused user of the library, and how typing just eliminated that conversation altogether.
Interface segregation
Although APIs in Ruby are notoriously “bloated”, that doesn’t mean your library needs to be. And in fact, I make “keeping APIs frugal” an over-arching goal of the project, and forego the “magic human readable method fatigue”, so pervasive in a lot of ruby libraries in general, HTTP clients in particular (looking at you, all libraries implementing a response.ok?).
Internally, particular implicit interfaces, such as the Encoder/Decoder concept, or the several IO “ducks”, were designed with this goal in mind. And they mostly worked. However, while type-checking, I found out that somehow the abstraction leaked.
In the process of building the :compression plugins, I reused the Encoder/Decoder APIs internally, and during the implementation process, I let some accidental complexity leak. This can be seen by how the compression/brotli encoder turned out: finish and close methods were defined just because they were needed for the gzip and deflate compression plugins, which came first, and these didn’t work around the zlib APIs.
While typing, it became clear that deferring to the zlib APIs wasn’t right. So:
- compression plugins now implement the _Deflater and _Inflater interfaces, which implement only inflate and deflate;
- bookkeeping of the “inflating” process does not leak to the Response anymore;
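To give an idea of how small such interfaces can be, here is a sketch of two objects that would satisfy them. ZlibDeflater and ZlibInflater are made-up names, I’m assuming deflate belongs to _Deflater and inflate to _Inflater, and I’m compressing whole payloads at once for brevity, whereas a real plugin would stream chunks.

```ruby
require "zlib"

# Hypothetical shapes for the two interfaces; class names are made up.

# Only needs to respond to #deflate.
class ZlibDeflater
  def deflate(chunk)
    Zlib::Deflate.deflate(chunk)
  end
end

# Only needs to respond to #inflate.
class ZlibInflater
  def inflate(chunk)
    Zlib::Inflate.inflate(chunk)
  end
end

payload = "hello " * 100
compressed = ZlibDeflater.new.deflate(payload)
restored = ZlibInflater.new.inflate(compressed)
```

Nothing about finish or close appears here: an encoder that doesn’t wrap zlib has no reason to implement them.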
Clean out the trash
In the process of typing, I just found out that some methods were needlessly defined. In internal structures, there was no registered use of them. In more “public” data structures, they were just not adding any value, and were neither tested nor documented.
Given that the cost of maintaining unused code just rose (I now have to keep the implementation and the type signature), it made me think about whether it was worth keeping it.
So I just removed them altogether.
Conclusion
httpx isn’t fully typed yet (and neither is the standard library), and rbs still has a few rough edges. There’s also not a lot of tooling around to make the experience even more productive (I’ve defined all the signatures in Sublime Text without a proper syntax highlighting plugin). There will probably be a few iterations until there’s sufficient community buy-in and we start to see the compounding benefits paying off.
All that being said, I’m pretty satisfied with this experiment. I’ve extracted enough value from it to make it worthwhile, and can estimate further benefits of typing even more code.
What about Sorbet?
Recently, Brandur Leach from Stripe wrote a post about rbs and Ruby 3.0, where he expresses some disappointment for sorbet, Stripe’s ruby type checker and type syntax language, not having been officially adopted for Ruby 3.
Diversity of opinions is a good thing, and I’ll try my best to explain why I don’t agree with some of the statements made in the post. And although I’ve only known rbs for the last 2/3 weeks, that’s more experience than I have using sorbet, so feel free to correct me somewhere in the internet where this might get shared.
.rbs files
Instead, developers specify type signatures in separate .rbs files that mirror the declarations of a companion “.rb” file…
You can probably tell by now that I think this is a mistake of fairly colossal proportion…
The first issue the author has with rbs is defining the signatures in a separate file. It’s a fair argument, as it’s definitely a deviation from the more standard way of typing.
However, “separate signature files” is not a novelty: sorbet also supports rbi files, which is the recommended way to type a 3rd-party gem you don’t control and has no type annotations (check sorbet-typed for a collection of types for popular libraries). In the post, *.d.ts Typescript files are also mentioned, as a way to do the same for 3rd party Javascript code. And although this is suggested “only for code you don’t control”, there are other examples in the wild of holding signatures in two separate files. Like Java Interfaces. Or C header files. Both of them serve different use cases from one another and from the “rbi/d.ts” case. I’m not using these examples as “good standards”, just mentioning that rbs didn’t invent it.
In the rbs case, keeping signatures separate allows for code to run in older versions of ruby. If you’ve been following Matz’s talks in the last years, you know he wants to avoid a Python 2-to-3 catastrophe. And personally, I’d like to avoid using a “transpiler” to distribute ruby 2 compatible code (a whole different discussion).
Considering this, is it wrong to keep those signatures separate?
Humans reading types
And while static analysis is great, we shouldn’t forget that type signatures are for people too. Being able to see what the expected types of any particular variable or method while reading code is a huge boon for comprehension.
I guess every person will see this differently. While the author claims that people want to see the types, lots has been written about how type annotation verbosity makes code very hard to read (see this or this). I guess the truth lies in the middle, and maybe humans want to see type information when it’s relevant to them.
And when is that? I’d argue that function type information gives the most value when one uses it, more than when one reads it; AKA the “Ctrl + Space” feature from most IDEs. And rbs will deliver on that, same as sorbet already does. Text editors will also have to catch up.
(it might make code harder to write, or make IDEs more needed. Or you just don’t type.)
python vs. ruby
…it’s disappointing to see it fall yet another step behind its sister language, Python. Along with better performance, much better documentation, a concurrency model, and an ever growing popularity disparity in Python’s favor, Python can now definitively boast the better type system, despite Ruby having had more than five years longer to think about its design and implementation.
I don’t even know where to start. I do a lot of python daily as part of my day-job, and I don’t get what these bold claims are based on.
Regarding performance, it is known that both languages aren’t very fast, that ruby caught up with python in most synthetic benchmarks since 2.0, that python in fact regressed from 2.7 to 3, only having recovered recently.
Type hinting has been around since v3.5 (released in 2015). Due to the less “elastic” nature of python, it is probably easier to understand. However, types aren’t that widespread in that community: popular packages such as requests or boto3 still don’t ship with type hinting (is it because signatures are done in the same file, and they still support older python versions?) Also, type hinting support for async functions is still limited (can’t type async generators yet). We also prefer not to type at work.
(in 5 years, ruby might be here.)
Regarding documentation, I can’t really provide numbers to back the claim. I guess one finds good examples in both ecosystems(?).
But the claim that got my attention the most was “… a concurrency model”, linking to an asyncio page. I maintain both asyncio and standard python at work, and I can say that, if someone moved me away from the asyncio projects, I’d be a lot happier. Its completely different execution model and reliance on async/await keywords makes it look like a language within a language, and in fact it originated an ecosystem within an ecosystem, as networked libraries have to be duplicated for both IO models (requests/aiohttp, or boto3/aioboto, are some examples). This article perfectly summarizes the problems with asyncio.
Sorry, nothing to do with types, but I had to get this off my chest. Moving on.
Type syntax
Although sorbet seems to fit Stripe’s requirements in regards to the health of its codebase, it is verbose, and not so easy to read.
Compare T.nilable(String) to String?. Or sig {params(name: String).returns(Integer)} to def meth: (String) -> Integer.
It also doesn’t seem to support coercion, or at least no one bothered yet to define what rbs calls the built-in types.
On the other hand, it already supports subclasses. And supports inlining (via T.let). And the UX of the static analyzer error messages seems very friendly to me. And their static analyzer is very fast, according to what I read.
Stripe <3
I love Stripe (everyone who worked with me can attest my respect for what the company achieved), and I received sorbet positively; it deserves credit for moving the type-checking discussion forward in the ruby community.
But even Brandur agrees that (quote) “As nice as Sorbet is, an officially endorsed solution that the entire community can rally around is far preferable to one being developed by a single company on the side”.
When sorbet eventually supports the rbs syntax, all the community will be better for it.