<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="honeyryderchuck.gitlab.io/feed.xml" rel="self" type="application/atom+xml" /><link href="honeyryderchuck.gitlab.io/" rel="alternate" type="text/html" /><updated>2026-03-11T13:06:30+00:00</updated><id>honeyryderchuck.gitlab.io/feed.xml</id><title type="html">honeyryder</title><subtitle>Welcome to the house of chuck.</subtitle><entry><title type="html">Context: the missing API in ruby logger</title><link href="honeyryderchuck.gitlab.io/2025/11/12/context-missing-api-in-logger.html" rel="alternate" type="text/html" title="Context: the missing API in ruby logger" /><published>2025-11-12T00:00:00+00:00</published><updated>2025-11-12T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2025/11/12/context-missing-api-in-logger</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2025/11/12/context-missing-api-in-logger.html"><![CDATA[<p>Over the last few years, I’ve spent quite a significant chunk of my “dayjob” time working on, and thinking about, observability in general, and <strong>logging</strong> in particular. After a lot of rewriting and overwriting, “<del>don’t</del> repeat yourself” and coping with ecosystem limitations, I figured it was time to write a blog post on the current <em>state of the art</em> of logging in ruby, what I think it’s missing and what I’m doing about it.</p>

<h2 id="what-is-logging">What is logging?</h2>

<p>(skip this section if you’re above being lectured about what logging is, again).</p>

<p><strong>Logging</strong> is one of those fundamental features of <strong>any</strong> type of program you use. At a high level, it keeps a <strong>record of what a program is doing and has been doing</strong>, be it error messages or general information, which can be used for audit trails, debugging issues, or just figuring out what the hell is happening with a process.</p>

<p>Because this is a feature as old as time, a lot of energy has been spent trying to standardize it. The most widely accepted <strong>standard</strong> (in <strong>UNIX</strong> corners at least) has been the <a href="https://en.wikipedia.org/wiki/Syslog">Syslog</a> standard, which separates the <strong>program generating the message</strong> (ex: logging library interface writing to stdout, or a file, or a socket, or all at the same time) from the <strong>program managing its storage</strong> (ex: <code class="language-plaintext highlighter-rouge">logrotate</code>, <code class="language-plaintext highlighter-rouge">logstash</code>…) and the <strong>program reporting/analysing it</strong> (ex: <code class="language-plaintext highlighter-rouge">kibana</code>, or plain <code class="language-plaintext highlighter-rouge">tail</code> and <code class="language-plaintext highlighter-rouge">grep</code>).</p>

<p>(Even) more standards have existed for the message format, which may depend on the type of program you’re using (an example being the <a href="https://en.wikipedia.org/wiki/Common_Log_Format">common log format for server logs</a>). Some general rules are agreed upon though, such as: there is one log entry per line, and a log entry should identify its <strong>severity level</strong> (examples: “debug”, “info”, “error”, “warn”, “alert”, …) and contain a <strong>timestamp</strong>, alongside the actual log <strong>message</strong>.</p>

<h2 id="logging-in-ruby">Logging in ruby</h2>

<p>The <code class="language-plaintext highlighter-rouge">ruby</code> gateway to logging is the <a href="https://github.com/ruby/logger">logger</a> standard library. In a nutshell, users log via <a href="https://rubyapi.org/3.3/o/logger">Logger</a> objects, which know <strong>where</strong> to write log entries (internally called the “log device”), and <strong>how</strong> to format them (the “formatter”):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"logger"</span>

<span class="c1"># logger which writes messages to standard out</span>
<span class="n">logger</span> <span class="o">=</span> <span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">STDOUT</span><span class="p">)</span>

<span class="c1"># writes debug message with the default message format:</span>
<span class="c1">#=&gt; $&lt;capital letter for severity level&gt;, [$&lt;timestamp ruby to_s&gt; #$&lt;process id&gt;] $&lt;severity full again, why, we know it already&gt; -- : $&lt;log message&gt;</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">debug</span> <span class="s2">"foo"</span>
<span class="c1">#=&gt; D, [2025-11-05T12:10:08.282220 #72227] DEBUG -- : foo</span>

<span class="c1"># only writes messages with INFO level or higher</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info!</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"foo"</span>
<span class="c1">#=&gt; I, [2025-11-05T12:10:54.862196 #72227]  INFO -- : foo</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">debug</span> <span class="s2">"foo"</span>
<span class="c1">#=&gt;</span>
<span class="c1"># use block notation to avoid allocating the message string when the level is disabled</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">debug</span> <span class="p">{</span> <span class="s2">"foo"</span> <span class="p">}</span>
<span class="c1">#=&gt;</span>

<span class="k">class</span> <span class="nc">MyCustomFormatter</span>
  <span class="c1"># formatters must at least implement this method</span>
  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">severity</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">progname</span><span class="p">,</span> <span class="n">msg</span><span class="p">)</span>
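    <span class="s2">"my format -&gt; </span><span class="si">#{</span><span class="n">msg</span><span class="si">}</span><span class="se">\n</span><span class="s2">"</span> <span class="c1"># the formatter must add the trailing newline itself</span>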
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># swap formatter</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">formatter</span> <span class="o">=</span> <span class="no">MyCustomFormatter</span><span class="p">.</span><span class="nf">new</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span> <span class="p">{</span> <span class="s2">"foo"</span> <span class="p">}</span>
<span class="c1">#=&gt; "my format -&gt; foo"</span>

<span class="c1"># enable daily log rotation</span>
<span class="n">daily_logger</span> <span class="o">=</span> <span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"my.log"</span><span class="p">,</span> <span class="ss">:daily</span><span class="p">)</span>
<span class="n">daily_logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"foo"</span> <span class="c1">#=&gt; writes log entry into my.log</span>
<span class="c1"># sleep for one day...</span>
<span class="n">daily_logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"foo"</span> <span class="c1">#=&gt; will rename my.log to my.log.1 and write new message to brand new my.log file</span>
</code></pre></div></div>

<p><code class="language-plaintext highlighter-rouge">logger</code> is a mixed bag. The default formatter is certainly unusual (although it feels like every programming language has its own default logging format, so perhaps a historical artifact?), and considering <code class="language-plaintext highlighter-rouge">ruby</code>’s historical UNIX friendliness, I’m always surprised that default messages do not include the system user. Swapping the formatter is easy though.</p>

<p>The log device interface feels a bit more limiting. While writing to stdout/stderr or a file is easy, writing to a socket (like a syslog server) is much harder than it needs to be (you have to write your own <a href="https://docs.ruby-lang.org/en/3.3/Logger/LogDevice.html">Logger::LogDevice</a> subclass). It also works a bit counter to the Syslog standard described above: being a utility to “streamline the generation of messages”, it shouldn’t really care about storage details (such as log rotation), yet it does, while not supporting writing to multiple locations at once.</p>
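<p>For the “multiple locations” part, there is a workaround. <code class="language-plaintext highlighter-rouge">Logger::LogDevice</code> accepts any object responding to <code class="language-plaintext highlighter-rouge">write</code> and <code class="language-plaintext highlighter-rouge">close</code> as its destination. That means an IO-like object can fan each formatted entry out to several targets. Below is a minimal sketch. <code class="language-plaintext highlighter-rouge">MultiIO</code> is a hypothetical helper (not part of <code class="language-plaintext highlighter-rouge">logger</code>), and all destinations necessarily share the same formatter and level:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "logger"
require "stringio"

# Hypothetical IO-like object: Logger only needs #write and #close from its
# destination, so every formatted entry can be fanned out to several targets.
class MultiIO
  def initialize(*targets)
    @targets = targets
  end

  # Called by the log device with the already formatted entry.
  def write(*chunks)
    @targets.each { |target| target.write(*chunks) }
  end

  def close
    @targets.each { |target| target.close }
  end
end

console = StringIO.new # stands in for STDOUT
file = StringIO.new    # stands in for File.open("my.log", "a")
logger = Logger.new(MultiIO.new(console, file))
logger.info "foo"
# both console.string and file.string now hold the same "... INFO -- : foo" entry
</code></pre></div></div>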

<p>Still, it’s rather straightforward to use, as long as none of the limitations mentioned above matter to you.</p>

<h2 id="logging-in-rack">Logging in rack</h2>

<p>One of the main uses of <code class="language-plaintext highlighter-rouge">ruby</code> in the industry has been web applications. Most of them are wrapped inside <a href="https://github.com/rack/rack">rack</a> containers and deployed using application servers like <a href="https://github.com/ruby/webrick">webrick</a> or <a href="https://github.com/puma/puma">puma</a>. <code class="language-plaintext highlighter-rouge">rack</code> ships with a <a href="https://github.com/rack/rack/blob/main/lib/rack/common_logger.rb">common logger</a> middleware, which emits a log entry per request using the <a href="https://httpd.apache.org/docs/current/logs.html#common">apache common logging format</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># example of a web request log:
# client ip, identity ("-"), user or "-", datetime, method, path, http version, status code, response body size in bytes (or "-"), request duration in seconds
#
127.0.0.1 - - [01/May/2025:07:20:10 +0000] "GET /index.html HTTP/1.1" 200 9481 0.0100
</code></pre></div></div>

<p>You can use it in your rack application by adding it to your <code class="language-plaintext highlighter-rouge">config.ru</code> file:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config.ru</span>
<span class="n">use</span> <span class="no">Rack</span><span class="o">::</span><span class="no">CommonLogger</span>

<span class="n">run</span> <span class="no">MyApp</span>
</code></pre></div></div>

<p>The above isn’t common though, as the framework you use to build your application may do it for you, or ship with its own logger middleware implementation. For instance, both <a href="https://roda.jeremyevans.net/rdoc/classes/Roda/RodaPlugins/CommonLogger.html">roda</a> and <a href="https://sinatrarb.com/contrib/#Common+Extensions">sinatra</a> ship or recommend their own extension plugin, for different reasons, such as performance or configurability.</p>
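<p>Whichever implementation you end up with, the mechanics are roughly the same: wrap the app, time the call, and emit one line per request. Since a rack app is just an object responding to <code class="language-plaintext highlighter-rouge">call(env)</code>, this can be sketched without any gems. <code class="language-plaintext highlighter-rouge">MiniCommonLogger</code> is hypothetical and simplified. Unlike <code class="language-plaintext highlighter-rouge">Rack::CommonLogger</code>, it logs as soon as the app returns rather than when the response body is closed. It also goes through a regular <code class="language-plaintext highlighter-rouge">Logger</code>, so each line gets the default formatter’s prefix:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "logger"
require "stringio"

# Hypothetical, simplified access log middleware in the spirit of
# Rack::CommonLogger: one Common Log Format line per request.
class MiniCommonLogger
  FORMAT = '%s - %s [%s] "%s %s %s" %d %s %0.4f'

  def initialize(app, logger)
    @app = app
    @logger = logger
  end

  def call(env)
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    status, headers, body = @app.call(env)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

    @logger.info(format(FORMAT,
                        env["REMOTE_ADDR"] || "-",
                        env["REMOTE_USER"] || "-",
                        Time.now.strftime("%d/%b/%Y:%H:%M:%S %z"),
                        env["REQUEST_METHOD"],
                        env["PATH_INFO"],
                        env["SERVER_PROTOCOL"],
                        status,
                        headers["content-length"] || "-",
                        elapsed))

    [status, headers, body]
  end
end

# A bare rack-compatible app: call(env) returns [status, headers, body].
app = lambda { |_env| [200, { "content-length" => "5" }, ["hello"]] }
out = StringIO.new
middleware = MiniCommonLogger.new(app, Logger.new(out))

env = {
  "REMOTE_ADDR" => "127.0.0.1",
  "REQUEST_METHOD" => "GET",
  "PATH_INFO" => "/index.html",
  "SERVER_PROTOCOL" => "HTTP/1.1"
}
middleware.call(env)
# out.string now holds one entry containing:
# 127.0.0.1 - - [...] "GET /index.html HTTP/1.1" 200 5 ...
</code></pre></div></div>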

<h2 id="logging-in-rails">Logging in rails</h2>

<p>In <code class="language-plaintext highlighter-rouge">rails</code> applications, most developers interact with logging via the <code class="language-plaintext highlighter-rouge">Rails.logger</code> singleton object. While mostly API compatible with the standard <code class="language-plaintext highlighter-rouge">logger</code> library counterpart, it bundles its own (<code class="language-plaintext highlighter-rouge">rails</code>) conventions on top of it.</p>

<p>Like a true Schrödinger’s cat, <code class="language-plaintext highlighter-rouge">Rails.logger</code> is and is not a logger at the same time: the <a href="https://guides.rubyonrails.org/debugging_rails_applications.html#the-logger">documentation</a> says it’s an instance of <a href="https://api.rubyonrails.org/classes/ActiveSupport/Logger.html">ActiveSupport::Logger</a> (a subclass of stdlib’s <code class="language-plaintext highlighter-rouge">Logger</code>), but if you inspect it in the console, it’s actually something else:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Rails</span><span class="p">.</span><span class="nf">logger</span> <span class="c1">#=&gt; instance of ActiveSupport::BroadcastLogger</span>
</code></pre></div></div>

<p>Rails documents that one can change the logger in application config (a common use case is to write test logs to <code class="language-plaintext highlighter-rouge">/dev/null</code> by setting <code class="language-plaintext highlighter-rouge">config.logger = Logger.new("/dev/null")</code> in <code class="language-plaintext highlighter-rouge">config/environments/test.rb</code>), but in the end, the singleton instance is an instance of <a href="https://api.rubyonrails.org/classes/ActiveSupport/BroadcastLogger.html">ActiveSupport::BroadcastLogger</a>, a proxy object which can register multiple loggers and forward message calls to all of them. From the official docs:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">stdout_logger</span> <span class="o">=</span> <span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">STDOUT</span><span class="p">)</span>
<span class="n">file_logger</span>   <span class="o">=</span> <span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"development.log"</span><span class="p">)</span>
<span class="n">broadcast</span> <span class="o">=</span> <span class="no">BroadcastLogger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">stdout_logger</span><span class="p">,</span> <span class="n">file_logger</span><span class="p">)</span>

<span class="n">broadcast</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"Hello world!"</span><span class="p">)</span> <span class="c1"># Writes the log to STDOUT and the development.log file.</span>
</code></pre></div></div>

<p>It seems that the broadcast logger was <code class="language-plaintext highlighter-rouge">rails</code>’ internal solution to the lack of support for multiple log devices per <a href="https://rubyapi.org/3.3/o/logger">Logger</a> instance in the <code class="language-plaintext highlighter-rouge">logger</code> standard library.</p>

<p>The <code class="language-plaintext highlighter-rouge">rails</code> logger also ships with its own formatter, which does the simplest possible thing:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Rails</span><span class="p">.</span><span class="nf">logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"foo"</span> <span class="c1">#=&gt; "foo"</span>
</code></pre></div></div>
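<p>With the standard library alone, a similar “message only” formatter is a one-liner, since any object responding to <code class="language-plaintext highlighter-rouge">call</code> (such as a proc) can act as a formatter. This sketches the same idea. It is not the actual <code class="language-plaintext highlighter-rouge">rails</code> formatter class:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "logger"
require "stringio"

out = StringIO.new
logger = Logger.new(out)
# ignore severity, time and progname: write just the message and a newline
logger.formatter = proc { |_severity, _time, _progname, msg| "#{msg}\n" }

logger.info "foo"
out.string # =&gt; "foo\n"
</code></pre></div></div>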

<p>On top of <code class="language-plaintext highlighter-rouge">ActiveSupport::Logger</code>, <code class="language-plaintext highlighter-rouge">rails</code> also ships <a href="https://api.rubyonrails.org/classes/ActiveSupport/TaggedLogging.html">ActiveSupport::TaggedLogging</a>, which wraps a logger and adds the ability to attach “context tags” to a block scope. Every message logged within that scope is prefixed with those tags:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span> <span class="o">=</span> <span class="no">ActiveSupport</span><span class="o">::</span><span class="no">TaggedLogging</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">Logger</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">STDOUT</span><span class="p">))</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">tagged</span><span class="p">(</span><span class="s2">"FOO"</span><span class="p">)</span> <span class="p">{</span> <span class="n">logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"Stuff"</span> <span class="p">}</span> <span class="c1">#=&gt; Logs "[FOO] Stuff"</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">tagged</span><span class="p">(</span><span class="s2">"BAR"</span><span class="p">)</span> <span class="k">do</span>
  <span class="n">logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"Stuff"</span> <span class="c1">#=&gt; Logs "[BAR] Stuff"</span>
<span class="k">end</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">tagged</span><span class="p">(</span><span class="s2">"FOO"</span><span class="p">,</span> <span class="s2">"BAR"</span><span class="p">)</span> <span class="p">{</span> <span class="n">logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"Stuff"</span> <span class="p">}</span> <span class="c1">#=&gt; Logs "[FOO] [BAR] Stuff"</span>
</code></pre></div></div>
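<p>Conceptually, this boils down to a formatter that keeps a stack of tags for the current scope. Here is a rough, stdlib-only sketch. <code class="language-plaintext highlighter-rouge">TaggedFormatter</code> is made up for illustration, and this is not how <code class="language-plaintext highlighter-rouge">ActiveSupport::TaggedLogging</code> is implemented:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "logger"
require "stringio"

# Hypothetical formatter prefixing each message with the tags of the current
# scope. Tags live in Thread.current[], which is fiber-local.
class TaggedFormatter
  def initialize(formatter = Logger::Formatter.new)
    @formatter = formatter
  end

  def tags
    Thread.current[:tagged_formatter_tags] ||= []
  end

  # Pushes tags for the duration of the block, and pops them on exit.
  def tagged(*new_tags)
    tags.concat(new_tags)
    yield
  ensure
    tags.pop(new_tags.size)
  end

  def call(severity, time, progname, msg)
    prefix = tags.map { |tag| "[#{tag}] " }.join
    @formatter.call(severity, time, progname, "#{prefix}#{msg}")
  end
end

out = StringIO.new
formatter = TaggedFormatter.new
logger = Logger.new(out, formatter: formatter)

formatter.tagged("FOO", "BAR") { logger.info "Stuff" } # entry ends with "[FOO] [BAR] Stuff"
logger.info "Stuff"                                    # entry ends with "Stuff"
</code></pre></div></div>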

<h2 id="structured-logging">Structured logging</h2>

<p>All those standards and message formats are nice and all, but in 2025, everyone and their mothers want <strong>structured logging</strong>. The most common format, at least in the corners I work in, is <strong>JSON</strong>. That probably has to do with it being, in spite of its deficiencies, a simple and widely adopted serialization format, which guarantees virtually <strong>universal support</strong>. As a counterpart to the log management stack for syslog-type systems, new stacks started popping up, such as the open-source <strong>fluentd/logstash/elasticsearch/kibana</strong> stack, alongside SaaS solutions like <strong>Splunk</strong> or <strong>Datadog</strong>.</p>

<p>There was renewed interest in re-standardizing log message “envelopes”, one of the emerging standards being the logstash event format.</p>

<pre><code class="language-log"># logstash event format
'{"message":"foo","tags":["tag1"],"source":"127.0.0.1","@version":"1","@timestamp":"2025-11-05T12:10:08.282Z"}'
</code></pre>

<p>That being said, the ecosystem hasn’t really consolidated on formats yet, so it’s common to see different standards in use across different systems. What all of them have in common, though, is the need to structure the log <strong>message</strong> separately from its associated metadata, or <strong>context</strong>.</p>
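<p>For illustration, a formatter emitting a logstash-style envelope could look like the sketch below. <code class="language-plaintext highlighter-rouge">LogstashishFormatter</code> and its <code class="language-plaintext highlighter-rouge">"level"</code> field are illustrative assumptions. Note that all it has to work with is what the formatter is handed: severity, time, program name and message.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "json"
require "logger"
require "stringio"
require "time"

# Hypothetical formatter emitting one logstash-style JSON event per line.
class LogstashishFormatter
  def call(severity, time, _progname, msg)
    event = {
      "@timestamp" =&gt; time.getutc.iso8601(3),
      "@version" =&gt; "1",
      "level" =&gt; severity,
      "message" =&gt; msg.to_s
    }
    "#{JSON.generate(event)}\n"
  end
end

out = StringIO.new
logger = Logger.new(out, formatter: LogstashishFormatter.new)
logger.info "foo"
# out.string: {"@timestamp":"...","@version":"1","level":"INFO","message":"foo"}
</code></pre></div></div>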

<p>Nowadays, structured logging fills a complementary role in the larger picture of observability.</p>

<h2 id="the-new-world-of-observability">The new world of observability</h2>

<p>Monitoring the health of a system isn’t a new requirement. As mentioned above, logging is quite an old OS telemetry feature. Back in the “old days” of server/system administration, it was common to set up software like <a href="https://www.nagios.org/">Nagios</a> to collect OS-level telemetry data and visualize, e.g., memory consumption, CPU usage and instance connectivity, among other data points, in user-friendly web GUIs.</p>

<p>Since the explosion of Cloud Computing and the Google SRE playbook, and trends such as microservices or lambda functions, observability took center stage and grew until it incorporated several concepts which used to be thought of as separate from each other. Nowadays the buzzwords are <a href="https://en.wikipedia.org/wiki/Real_user_monitoring">RUM</a>, <a href="https://opentelemetry.io/">Open Telemetry</a>, <a href="https://en.wikipedia.org/wiki/Application_performance_management">APM</a>, <a href="https://www.splunk.com/en_us/blog/learn/red-monitoring.html">RED metrics</a>, error tracking, among others, and these are all driven by <strong>system and application-emitted metrics, logs</strong>, and their more recent friend, <a href="https://opentelemetry.io/docs/concepts/signals/traces/">traces</a>. Traces visualize an execution flow, and the related execution flows nested within it (usually called “spans”), as horizontal bars along a shared timeline.</p>

<p><img src="/images/context-logger/traces.png" alt="tracing in an image" /></p>

<p>That center stage translated into big business, and companies like <strong>Datadog</strong>, <strong>Sentry</strong> or <strong>Honeycomb</strong> became almost as critical to a client’s success as the features that client provides. Observing, measuring, monitoring the health / performance / volume of our applications has never been as easy (and as expensive).</p>

<h2 id="ruby-logging-in-2025">ruby logging in 2025</h2>

<p>Sadly, the ruby <code class="language-plaintext highlighter-rouge">logger</code> library APIs didn’t keep up with the times, and are quite limited for this new paradigm. While nothing stops anyone from swapping the default formatter with a JSON capable counterpart, the <a href="https://rubyapi.org/3.4/o/logger/formatter">Logger::Formatter</a> API, which relies on implementation of <code class="language-plaintext highlighter-rouge">call</code> with a fixed set of positional arguments, makes it impossible to support metadata other than what the function already expects:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"json"</span>

<span class="k">class</span> <span class="nc">MyJSONFormatter</span>
  <span class="c1"># formatters must at least implement this method</span>
  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">severity</span><span class="p">,</span> <span class="n">time</span><span class="p">,</span> <span class="n">progname</span><span class="p">,</span> <span class="n">msg</span><span class="p">)</span>
    <span class="c1"># can't receive e.g. user data, just the 4 arguments above:</span>
    <span class="p">{</span> <span class="ss">severity:</span><span class="p">,</span> <span class="ss">time:</span><span class="p">,</span> <span class="ss">progname:</span><span class="p">,</span> <span class="ss">message: </span><span class="n">msg</span> <span class="p">}.</span><span class="nf">to_json</span> <span class="o">+</span> <span class="s2">"</span><span class="se">\n</span><span class="s2">"</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This diminishes its reusability, and as a result, every other logger library in the ecosystem which logs JSON (and other formats) <strong>does not use</strong> the <code class="language-plaintext highlighter-rouge">logger</code> library as its foundation layer, and ends up reinventing the Formatter API to suit its needs.</p>

<p>But don’t take my word for it. Looking at the <a href="https://www.ruby-toolbox.com/categories/Logging?display=table&amp;order=score">most used logging libraries in the ruby toolbox</a> which support structured JSON format, <a href="https://github.com/colbygk/log4r">log4r</a> has its own base formatter class which defines <code class="language-plaintext highlighter-rouge">#format(String event)</code> as the overridable method; <a href="https://github.com/roidrage/lograge">lograge</a> also has its own base formatter class which defines <code class="language-plaintext highlighter-rouge">#call(Hash data)</code> as its own, while <a href="https://github.com/reidmorrison/semantic_logger">semantic logger</a> also has one, this time defining <code class="language-plaintext highlighter-rouge">#call(SemanticLogger::Log log, SemanticLogger::Formatters::Base logger)</code>, and <a href="https://github.com/dwbutler/logstash-logger">logstash-logger</a> has its own base formatter too, which funnily enough supports… the same <code class="language-plaintext highlighter-rouge">call</code> API as ruby <code class="language-plaintext highlighter-rouge">logger</code> formatters!</p>

<p>This is <a href="https://xkcd.com/927/">official xkcd territory</a>.</p>

<p>(Practically all of the above also solve the problem of writing to multiple log devices, in most cases naming this feature “log appenders”. But this is not the feature I’m writing the post about).</p>

<h2 id="rails-logging-in-2025">rails logging in 2025</h2>

<p>Given that <code class="language-plaintext highlighter-rouge">ActiveSupport::Logger</code> is a subclass of <code class="language-plaintext highlighter-rouge">Logger</code>, it also inherits (OO-pun intended) its problems, therefore by the transitive property, the <code class="language-plaintext highlighter-rouge">rails</code> logger does not support structured logging (and JSON in particular). So if your <code class="language-plaintext highlighter-rouge">rails</code> application emits JSON logs, you’re either using one of the alternatives above, or an in-house library made out of spare parts of everything mentioned so far, or worse (gulp) a parser (like grok) regex-matching your string entry and spitting JSON out of it.</p>

<p>The most stable and, to my knowledge, most widely adopted logging libraries are <a href="https://github.com/roidrage/lograge">lograge</a> and <a href="https://github.com/reidmorrison/rails_semantic_logger">(rails) semantic logger</a>.</p>

<p>In both cases, the <code class="language-plaintext highlighter-rouge">Rails.logger</code> singleton instance broadcasts to a custom logger implementation provided by the library, and the main log-related subscriptions for <a href="https://api.rubyonrails.org/classes/ActiveSupport/Notifications.html">default notifications</a> in-and-around business operations (like processing web requests) are replaced by custom (library-specific) subscriptions, which make use of the logger API <strong>and</strong> allow adding extra context to each of these log messages.</p>

<h3 id="lograge">lograge</h3>

<p><code class="language-plaintext highlighter-rouge">lograge</code> documents a <a href="https://github.com/roidrage/lograge">custom_options callback</a>, which receives the request’s notification event and returns a hash. The event’s payload is the hash which gets passed to request-level event notifications, and it can be augmented in controllers by re-defining the controller <a href="https://api.rubyonrails.org/classes/ActionController/Instrumentation.html#method-i-append_info_to_payload">append_info_to_payload</a> callback. The returned hash gets passed “as is” to the eventual JSON log entry (which also contains a readable “message”), giving almost full control of the JSON message format.</p>
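<p>Put together, the setup looks roughly like the following. This is a config sketch based on the lograge README and the <code class="language-plaintext highlighter-rouge">rails</code> API linked above. <code class="language-plaintext highlighter-rouge">:user_id</code> and <code class="language-plaintext highlighter-rouge">session[:user_id]</code> are hypothetical stand-ins for application-specific context:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># config/environments/production.rb
Rails.application.configure do
  config.lograge.enabled = true
  config.lograge.formatter = Lograge::Formatters::Json.new
  # receives the request notification event, returns extra fields for the JSON entry
  config.lograge.custom_options = lambda do |event|
    { user_id: event.payload[:user_id] }
  end
end

# app/controllers/application_controller.rb
class ApplicationController &lt; ActionController::Base
  private

  # adds data to the payload of the request notification event
  def append_info_to_payload(payload)
    super
    payload[:user_id] = session[:user_id] # hypothetical context
  end
end
</code></pre></div></div>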

<p>It has several drawbacks though, one of them being that it only subscribes to action-controller-level events, so active jobs will keep being logged by the “standard” rails logger. Also, it’s not possible to share context with, or add different context to, other logger calls made via <code class="language-plaintext highlighter-rouge">Rails.logger.info</code> and friends.</p>

<p>If you’re using the <code class="language-plaintext highlighter-rouge">rails</code> framework for anything other than web requests, I wouldn’t recommend it.</p>

<p>(It also subscribes to action cable events, but I suspect very few applications running in production use it).</p>

<h3 id="semantic-logger">semantic logger</h3>

<p>In turn, <a href="https://github.com/reidmorrison/rails_semantic_logger">(rails) semantic logger</a> subscribes not only to action controller events, but to active job events as well (and active record events, and action view, and action mailer… if it can be subscribed to, it will be subscribed to!), which makes it more compelling to use. It also ships with interesting features which allow not only adding context to direct logging calls, but also setting context for a given scope:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"hi"</span><span class="p">,</span> <span class="ss">payload: </span><span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>
<span class="c1">#=&gt; '{"message":"hi","payload":{"foo":"bar"}....'</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"hi"</span><span class="p">)</span>
<span class="c1">#=&gt; '{"message":"hi",....'</span>
<span class="no">SemanticLogger</span><span class="p">.</span><span class="nf">tagged</span><span class="p">(</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">)</span> <span class="k">do</span>
  <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"hi"</span><span class="p">)</span>
  <span class="c1">#=&gt; '{"message":"hi","payload":{"foo":"bar"}....'</span>
<span class="k">end</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"hi"</span><span class="p">)</span>
<span class="c1">#=&gt; '{"message":"hi",....'</span>
</code></pre></div></div>

<p>Despite having this feature, <code class="language-plaintext highlighter-rouge">semantic logger</code> disappoints by recommending a similar type of integration as <code class="language-plaintext highlighter-rouge">lograge</code> does for requests (<a href="https://logger.rocketjob.io/rails.html"><code class="language-plaintext highlighter-rouge">log_tags</code> callback + <code class="language-plaintext highlighter-rouge">append_info_to_payload</code></a>), which limits the scope of the request-level payload to the single logger call happening within log subscribers. It feels like a lost opportunity: it’d be great to share that context with all user-defined logger calls happening within the scope of request processing (including calls from within the controller action), and other <code class="language-plaintext highlighter-rouge">rails</code>-level business transactions (such as active job <code class="language-plaintext highlighter-rouge">#perform</code> calls) do not have an <code class="language-plaintext highlighter-rouge">append_info_to_payload</code> counterpart (perhaps someone should suggest that feature to <code class="language-plaintext highlighter-rouge">rails</code>?).</p>

<p>The resulting JSON format (all non-standard context under <code class="language-plaintext highlighter-rouge">"payload"</code>, some things under <code class="language-plaintext highlighter-rouge">"named_tags"</code> when using some obscure API) isn’t the friendliest either, and in most cases, ends up being rewritten by a pre-processing step before log ingestion happens.</p>

<p>Still, despite all its flaws and somewhat clunky API, it showcases the potential of, for lack of a better name, a logger <strong>context API</strong>.</p>

<h2 id="context-api">Context API</h2>

<p>Imagine if, during request processing, several context scopes could be nested, each one with its own context, and each sub-context were torn down when exiting its block. This context could then be used in the log analysis engine to group messages by the tags of each particular context, allowing more fine-grained filtering.</p>

<p>If you’re using any type of tracing integration, you don’t need to imagine, because this is how the tracing API works! For example, if you are using the datadog SDK:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># from the datadog sdk docs:</span>
<span class="k">def</span> <span class="nf">index</span>
  <span class="c1"># Get the active span and set customer_id -&gt; 254889</span>
  <span class="no">Datadog</span><span class="o">::</span><span class="no">Tracing</span><span class="p">.</span><span class="nf">active_span</span><span class="o">&amp;</span><span class="p">.</span><span class="nf">set_tag</span><span class="p">(</span><span class="s1">'customer.id'</span><span class="p">,</span> <span class="n">params</span><span class="p">.</span><span class="nf">permit</span><span class="p">([</span><span class="ss">:customer_id</span><span class="p">]))</span>

  <span class="c1"># create child span, add tags to it</span>
  <span class="no">Datadog</span><span class="o">::</span><span class="no">Tracing</span><span class="p">.</span><span class="nf">trace</span><span class="p">(</span><span class="s1">'web.request'</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">span</span><span class="o">|</span>
    <span class="n">span</span><span class="p">.</span><span class="nf">set_tag</span><span class="p">(</span><span class="s1">'http.url'</span><span class="p">,</span> <span class="n">request</span><span class="p">.</span><span class="nf">path</span><span class="p">)</span>
    <span class="n">span</span><span class="p">.</span><span class="nf">set_tag</span><span class="p">(</span><span class="s1">'&lt;TAG_KEY&gt;'</span><span class="p">,</span> <span class="s1">'&lt;TAG_VALUE&gt;'</span><span class="p">)</span>
    <span class="c1"># execute something here ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Something like this, using plain loggers, should be possible too:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">index</span>
  <span class="n">logger</span><span class="p">.</span><span class="nf">add_context</span><span class="p">(</span><span class="ss">customer_id: </span><span class="n">params</span><span class="p">.</span><span class="nf">permit</span><span class="p">([</span><span class="ss">:customer_id</span><span class="p">]))</span>
  <span class="c1"># logger.info calls will include the "customer_id" field</span>
  <span class="n">logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">http_url: </span><span class="n">request</span><span class="p">.</span><span class="nf">path</span><span class="p">,</span> <span class="ss">tag_key: </span><span class="s2">"tag_value"</span><span class="p">)</span> <span class="k">do</span>
    <span class="c1"># logger.info calls will include the "customer_id", "http_url" and "tag_key" fields</span>
  <span class="k">end</span>
  <span class="c1"># logger.info calls will only include the "customer_id" field</span>
<span class="k">end</span>
</code></pre></div></div>

<p>And that’s why, to somewhat stitch the inconsistencies described above together, I’m proposing such an API to the <code class="language-plaintext highlighter-rouge">logger</code> standard library.</p>

<h2 id="feature-request">Feature Request</h2>

<p>For a more detailed description, you can read the <a href="https://github.com/ruby/logger/issues/131">issue</a> and <a href="https://github.com/ruby/logger/pull/132">PR</a> description/comments. In a nutshell, it introduces two ways of adding context: per block (via <code class="language-plaintext highlighter-rouge">Logger#with_context</code>) and per call (a <code class="language-plaintext highlighter-rouge">context:</code> keyword argument in <code class="language-plaintext highlighter-rouge">Logger#info</code>, <code class="language-plaintext highlighter-rouge">Logger#error</code> and friends):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># per block</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">a: </span><span class="mi">1</span><span class="p">)</span> <span class="k">do</span>
  <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">)</span> <span class="c1">#=&gt; I, [a=1] [2025-08-13T15:00:03.830782 #5374]  INFO -- : foo</span>
<span class="k">end</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">a: </span><span class="mi">1</span><span class="p">)</span> <span class="k">do</span>
  <span class="n">logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">b: </span><span class="mi">2</span><span class="p">)</span> <span class="k">do</span>
    <span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">)</span> <span class="c1">#=&gt; I, [a=1] [b=2] [2025-08-13T15:00:03.830782 #5374]  INFO -- : foo</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># per call</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">,</span> <span class="ss">context: </span><span class="p">{</span><span class="ss">user_id: </span><span class="mi">1</span><span class="p">})</span> <span class="c1">#=&gt; I, [user_id=1] [2025-08-13T15:00:03.830782 #5374]  INFO -- : foo</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="ss">context: </span><span class="p">{</span><span class="ss">user_id: </span><span class="mi">1</span><span class="p">})</span> <span class="p">{</span> <span class="s2">"foo"</span> <span class="p">}</span> <span class="c1">#=&gt; I, [user_id=1] [2025-08-13T15:00:03.830782 #5374]  INFO -- : foo</span>
</code></pre></div></div>
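<p>For illustration only, both entry points can be approximated on top of the current stdlib <code class="language-plaintext highlighter-rouge">Logger</code>. This is a rough sketch under stated assumptions, not the implementation from the PR: <code class="language-plaintext highlighter-rouge">ContextLogger</code> is a hypothetical subclass, contexts live in fiber-local storage (<code class="language-plaintext highlighter-rouge">Thread.current[]</code>) keyed per logger instance, and the tags are spliced into the output of the default formatter.</p>

```ruby
require "logger"
require "stringio"

# Hypothetical subclass sketching the proposed API; NOT the code from the PR.
class ContextLogger < Logger
  # Per-block context: pushed before the block runs, popped when it exits.
  def with_context(**ctx)
    stack = context_stack
    stack.push(ctx)
    yield
  ensure
    stack.pop
  end

  # Per-call context: a context: keyword on #info, #error and friends, which
  # behaves like a with_context block wrapping a single call.
  %i[debug info warn error fatal unknown].each do |name|
    severity = Logger.const_get(name.upcase)
    define_method(name) do |progname = nil, context: nil, &block|
      return add(severity, nil, progname, &block) if context.nil?

      with_context(**context) { add(severity, nil, progname, &block) }
    end
  end

  private

  # Thread.current[] is fiber-local, so each fiber gets its own stack.
  def context_stack
    Thread.current[:"logger_context_#{object_id}"] ||= []
  end

  def format_message(severity, datetime, progname, msg)
    line = super
    ctx = context_stack.reduce({}) { |acc, scope| acc.merge(scope) }
    return line if ctx.empty?

    tags = ctx.map { |key, value| "[#{key}=#{value}]" }.join(" ")
    # Assumes the default "I, [time #pid] ..." layout: insert tags after "I, ".
    line.sub(", ", ", #{tags} ")
  end
end

logger = ContextLogger.new($stdout)
logger.with_context(a: 1) do
  logger.with_context(b: 2) { logger.info("foo") } # I, [a=1] [b=2] [TIMESTAMP #PID]  INFO -- : foo
end
logger.info("foo", context: { user_id: 1 })        # I, [user_id=1] [TIMESTAMP #PID]  INFO -- : foo
```

<p>Splicing tags into the formatted line keeps the default format recognizable, but a custom formatter would need its own way of receiving the context, which is part of why the PR discussion (see below) asks where contexts should live.</p>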

<p>The proposal retrofits context into the current default message format, and does not propose a JSON message formatter, at least not until this one is done.</p>

<p>That’s it!</p>

<p>The devil is in the details, though, and if you read through the PR discussions, you’ll find that many meaningful points were raised:</p>

<ul>
  <li>how/where to manage contexts?
    <ul>
      <li>ruby should manage contexts per thread <strong>AND</strong> per fiber, which raises some questions around context sharing across parent and child fibers, what the runtime supports out of the box, and certain core APIs which spawn fibers under the hood (such as <code class="language-plaintext highlighter-rouge">Enumerator#next</code>).</li>
    </ul>
  </li>
  <li>should context be managed in formatters rather than logger instances?
    <ul>
      <li>I’m leaning towards the latter, but it’ll depend on future developments in <code class="language-plaintext highlighter-rouge">logger</code>. For example, will it ever support multiple log devices per instance? And if so, will each log device have its own formatter? In such a case, should context be shared across formatters?</li>
    </ul>
  </li>
  <li>what’s the bare minimum feature set?
    <ul>
      <li>do we need per-call context? can we get away with <code class="language-plaintext highlighter-rouge">with_context</code> only?</li>
    </ul>
  </li>
</ul>
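<p>To see why the fiber part matters, here’s a small experiment with today’s ruby primitives (no logger involved): values stored via <code class="language-plaintext highlighter-rouge">Thread.current[]</code> are fiber-local, so context stored there is invisible to child fibers, including the one <code class="language-plaintext highlighter-rouge">Enumerator#next</code> runs the enumerator block in, while thread variables are shared by every fiber of a thread. Which of these behaviours a logger context should follow is one of the open questions above.</p>

```ruby
# Fiber-local storage: each fiber has its own Thread.current[] values.
Thread.current[:log_context] = { request_id: "r1" }
# Thread-local storage: shared by all fibers running on this thread.
Thread.current.thread_variable_set(:log_context, { request_id: "r1" })

child = Fiber.new do
  [Thread.current[:log_context], Thread.current.thread_variable_get(:log_context)]
end
fiber_local, thread_local = child.resume
# fiber_local is nil: the child fiber doesn't see the parent's value
# thread_local is { request_id: "r1" }

# External iteration with Enumerator#next runs the block in a separate fiber,
# so the fiber-local context is lost there too.
enum = Enumerator.new { |yielder| yielder << Thread.current[:log_context] }
seen_by_enumerator = enum.next # nil
```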

<h3 id="logging-context-in-rack">Logging context in rack</h3>

<p>Unlocking per-request logging context becomes as simple as including this middleware in your rack application:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">LoggingContext</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">app</span><span class="p">,</span> <span class="n">logger</span><span class="p">)</span>
    <span class="vi">@app</span> <span class="o">=</span> <span class="n">app</span>
    <span class="vi">@logger</span> <span class="o">=</span> <span class="n">logger</span>
  <span class="k">end</span>

  <span class="c1"># opens a context scope, torn down once the request has been processed</span>
  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
    <span class="vi">@logger</span><span class="p">.</span><span class="nf">with_context</span> <span class="p">{</span> <span class="vi">@app</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span> <span class="p">}</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># then in config.ru, where LOGGER is your application's logger</span>
<span class="n">use</span> <span class="no">LoggingContext</span><span class="p">,</span> <span class="no">LOGGER</span>

<span class="n">run</span> <span class="no">MyApp</span>
</code></pre></div></div>
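<p>The middleware above only opens an empty scope. A hypothetical variant could also seed it with a few request fields taken from the rack env. Since <code class="language-plaintext highlighter-rouge">with_context</code> is the proposed (not yet released) API, the sketch below exercises it against <code class="language-plaintext highlighter-rouge">RecordingLogger</code>, a made-up stand-in which only records the context active at each log call.</p>

```ruby
# Hypothetical variant of the middleware, seeding the request scope.
class RequestLoggingContext
  def initialize(app, logger)
    @app = app
    @logger = logger
  end

  def call(env)
    context = {
      method: env["REQUEST_METHOD"],
      path: env["PATH_INFO"],
      request_id: env["HTTP_X_REQUEST_ID"]
    }.compact # drop fields missing from the request
    @logger.with_context(**context) { @app.call(env) }
  end
end

# Made-up stand-in for a logger implementing the proposed with_context API.
class RecordingLogger
  attr_reader :entries

  def initialize
    @entries = []
    @stack = []
  end

  def with_context(**ctx)
    @stack.push(ctx)
    yield
  ensure
    @stack.pop
  end

  # Records the message together with the merged active context.
  def info(msg)
    @entries << [msg, @stack.reduce({}) { |acc, scope| acc.merge(scope) }]
  end
end

logger = RecordingLogger.new
app = ->(env) { logger.info("hello"); [200, {}, ["ok"]] }
middleware = RequestLoggingContext.new(app, logger)
response = middleware.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/users", "HTTP_X_REQUEST_ID" => "abc" })
```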

<p>You could then make use of this API in your application, knowing that context will be correctly torn down at the end of the request lifecycle:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># This is just an example of how to add request info as logging context, it is NOT supposed to be a recommendation about how to log</span>
<span class="c1"># authentication info.</span>

<span class="c1"># roda (with rodauth) endpoint</span>
<span class="k">class</span> <span class="nc">MyApp</span> <span class="o">&lt;</span> <span class="no">Roda</span>
  <span class="n">plugin</span> <span class="ss">:common_logger</span>
  <span class="n">plugin</span> <span class="ss">:rodauth</span>

  <span class="c1"># ...</span>

  <span class="n">route</span> <span class="k">do</span> <span class="o">|</span><span class="n">r</span><span class="o">|</span>
    <span class="n">logger</span> <span class="o">=</span> <span class="no">LOGGER</span> <span class="c1"># the same logger instance given to the LoggingContext middleware</span>
    <span class="n">r</span><span class="p">.</span><span class="nf">rodauth</span>
    <span class="n">rodauth</span><span class="p">.</span><span class="nf">require_authentication</span>

    <span class="n">r</span><span class="p">.</span><span class="nf">get</span> <span class="s1">'index'</span> <span class="k">do</span>
      <span class="vi">@user</span> <span class="o">=</span> <span class="no">DB</span><span class="p">[</span><span class="ss">:accounts</span><span class="p">].</span><span class="nf">where</span><span class="p">(</span><span class="ss">id: </span><span class="n">rodauth</span><span class="p">.</span><span class="nf">session_value</span><span class="p">).</span><span class="nf">first</span>

      <span class="n">logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">user: </span><span class="p">{</span> <span class="ss">id: </span><span class="vi">@user</span><span class="p">[</span><span class="ss">:id</span><span class="p">]</span> <span class="p">})</span> <span class="k">do</span>
        <span class="n">view</span> <span class="s1">'index'</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># rails controller action</span>
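<span class="k">class</span> <span class="nc">MyController</span> <span class="o">&lt;</span> <span class="no">ApplicationController</span>
  <span class="n">before_action</span> <span class="ss">:require_user</span> <span class="c1"># assumed to set @user</span>
  <span class="n">around_action</span> <span class="ss">:add_logging_context</span>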

  <span class="c1"># ...</span>

  <span class="k">def</span> <span class="nf">index</span>
    <span class="no">Rails</span><span class="p">.</span><span class="nf">logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"about to index"</span> <span class="c1"># will log user.id in context</span>
  <span class="k">end</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">add_logging_context</span>
    <span class="no">Rails</span><span class="p">.</span><span class="nf">logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">user: </span><span class="p">{</span> <span class="ss">id: </span><span class="vi">@user</span><span class="p">.</span><span class="nf">id</span> <span class="p">})</span> <span class="p">{</span> <span class="k">yield</span> <span class="p">}</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<h3 id="logging-context-in-background-jobs">Logging context in background jobs</h3>

<p>Similar approaches can be applied for your preferred background job framework. For brevity, I’ll just show below how you could use the same callback/middleware strategy for <strong>Sidekiq</strong> and <strong>Active Job</strong>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># 1. Sidekiq</span>
<span class="k">class</span> <span class="nc">SidekiqLoggingContext</span>
  <span class="kp">include</span> <span class="no">Sidekiq</span><span class="o">::</span><span class="no">ServerMiddleware</span>
  <span class="k">def</span> <span class="nf">initialize</span><span class="p">(</span><span class="n">logger</span><span class="p">)</span>
    <span class="vi">@logger</span> <span class="o">=</span> <span class="n">logger</span>
  <span class="k">end</span>

  <span class="c1"># job is the job instance, payload is the job hash (which holds the "jid")</span>
  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">job</span><span class="p">,</span> <span class="n">payload</span><span class="p">,</span> <span class="n">queue</span><span class="p">)</span>
    <span class="vi">@logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">job: </span><span class="p">{</span> <span class="ss">queue: </span><span class="n">queue</span><span class="p">,</span> <span class="ss">id: </span><span class="n">payload</span><span class="p">[</span><span class="s2">"jid"</span><span class="p">]</span> <span class="p">})</span> <span class="p">{</span> <span class="k">yield</span> <span class="p">}</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># when initializing...</span>
<span class="no">Sidekiq</span><span class="p">.</span><span class="nf">configure_server</span> <span class="k">do</span> <span class="o">|</span><span class="n">config</span><span class="o">|</span>
  <span class="n">config</span><span class="p">.</span><span class="nf">server_middleware</span> <span class="k">do</span> <span class="o">|</span><span class="n">chain</span><span class="o">|</span>
    <span class="c1"># if you're using rails, replace LOGGER below with Rails.logger</span>
    <span class="n">chain</span><span class="p">.</span><span class="nf">add</span> <span class="no">SidekiqLoggingContext</span><span class="p">,</span> <span class="no">LOGGER</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># then in job...</span>
<span class="k">class</span> <span class="nc">MyJob</span>
  <span class="kp">include</span> <span class="no">Sidekiq</span><span class="o">::</span><span class="no">Job</span>

  <span class="k">def</span> <span class="nf">perform</span><span class="p">(</span><span class="n">arg1</span><span class="p">,</span> <span class="n">arg2</span><span class="p">)</span>
    <span class="no">LOGGER</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"performing"</span> <span class="c1"># will include job.queue and job.id in context</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># 2. Active Job</span>
<span class="k">class</span> <span class="nc">ApplicationJob</span> <span class="o">&lt;</span> <span class="no">ActiveJob</span><span class="o">::</span><span class="no">Base</span>
  <span class="n">around_perform</span> <span class="k">do</span> <span class="o">|</span><span class="n">job</span><span class="p">,</span> <span class="n">block</span><span class="o">|</span>
    <span class="no">Rails</span><span class="p">.</span><span class="nf">logger</span><span class="p">.</span><span class="nf">with_context</span><span class="p">(</span><span class="ss">job: </span><span class="p">{</span> <span class="ss">queue: </span><span class="n">job</span><span class="p">.</span><span class="nf">queue_name</span><span class="p">,</span> <span class="ss">id: </span><span class="n">job</span><span class="p">.</span><span class="nf">job_id</span> <span class="p">})</span> <span class="k">do</span>
      <span class="n">block</span><span class="p">.</span><span class="nf">call</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># then in job...</span>
<span class="k">class</span> <span class="nc">MyJob</span> <span class="o">&lt;</span> <span class="no">ApplicationJob</span>
  <span class="k">def</span> <span class="nf">perform</span><span class="p">(</span><span class="n">arg1</span><span class="p">,</span> <span class="n">arg2</span><span class="p">)</span>
    <span class="no">Rails</span><span class="p">.</span><span class="nf">logger</span><span class="p">.</span><span class="nf">info</span> <span class="s2">"performing"</span> <span class="c1"># will include job.queue and job.id in context</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<h2 id="logging-context-in-other-languages">Logging context in other languages</h2>

<p>Another angle of this discussion is looking at how other ecosystems solve this problem. I’ll just mention a few examples, as my purpose is not to be exhaustive, so apologies in advance if I skipped your second-preferred language.</p>

<h3 id="java">Java</h3>

<p>While the core Java logging APIs do not seem to support this, many applications use the <a href="https://logging.apache.org/log4j/2.x/index.html">log4j</a> library, which provides a feature called <a href="https://logging.apache.org/log4j/2.x/manual/thread-context.html">Thread Context</a>, very similar to the one described above:</p>

<div class="language-java highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nc">ThreadContext</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"ipAddress"</span><span class="o">,</span> <span class="n">request</span><span class="o">.</span><span class="na">getRemoteAddr</span><span class="o">());</span>
<span class="nc">ThreadContext</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"hostName"</span><span class="o">,</span> <span class="n">request</span><span class="o">.</span><span class="na">getServerName</span><span class="o">());</span>
<span class="nc">ThreadContext</span><span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"loginId"</span><span class="o">,</span> <span class="n">session</span><span class="o">.</span><span class="na">getAttribute</span><span class="o">(</span><span class="s">"loginId"</span><span class="o">));</span>

<span class="kt">void</span> <span class="nf">performWork</span><span class="o">()</span> <span class="o">{</span>
  <span class="c1">// push an entry onto the thread context stack for this function; map entries set above via put() remain visible</span>
  <span class="nc">ThreadContext</span><span class="o">.</span><span class="na">push</span><span class="o">(</span><span class="s">"performWork()"</span><span class="o">);</span>
  <span class="no">LOGGER</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"Performing work"</span><span class="o">);</span> <span class="c1">// will include ipAddress, etc...</span>
  <span class="c1">// do work</span>
  <span class="nc">ThreadContext</span><span class="o">.</span><span class="na">pop</span><span class="o">();</span>
<span class="o">}</span>

<span class="c1">// or with auto-closing enabled</span>
<span class="k">try</span> <span class="o">(</span><span class="nc">CloseableThreadContext</span><span class="o">.</span><span class="na">Instance</span> <span class="n">ignored</span> <span class="o">=</span> <span class="nc">CloseableThreadContext</span>
        <span class="o">.</span><span class="na">put</span><span class="o">(</span><span class="s">"ipAddress"</span><span class="o">,</span> <span class="n">request</span><span class="o">.</span><span class="na">getRemoteAddr</span><span class="o">())</span>
        <span class="o">.</span><span class="na">push</span><span class="o">(</span><span class="s">"performWork()"</span><span class="o">))</span> <span class="o">{</span>

    <span class="no">LOGGER</span><span class="o">.</span><span class="na">debug</span><span class="o">(</span><span class="s">"Performing work"</span><span class="o">);</span>
    <span class="c1">// do work</span>
<span class="o">}</span>
</code></pre></div></div>

<p>Verbose (it’s Java), but it works!</p>

<p>Java 21 introduced <a href="https://docs.oracle.com/en/java/javase/21/core/virtual-threads.html">Virtual Threads</a>, lightweight threads which, somewhat like coroutines, are scheduled by the JVM across a smaller number of OS threads. It’s not clear to me whether <code class="language-plaintext highlighter-rouge">log4j</code> thread contexts support them out of the box.</p>

<h3 id="go">go</h3>

<p>One of <code class="language-plaintext highlighter-rouge">go</code>’s main features is the wide array of functionality provided by its standard library, and logging context is no exception.</p>

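<p>The standard library logging package is called <code class="language-plaintext highlighter-rouge">slog</code>. In the usual <code class="language-plaintext highlighter-rouge">go</code> way, its logging methods have variants (<code class="language-plaintext highlighter-rouge">InfoContext</code> and friends) which take a <code class="language-plaintext highlighter-rouge">context.Context</code> and forward it to the handler, although the built-in handlers don’t extract any values from it, so using it to carry structured context requires a custom <code class="language-plaintext highlighter-rouge">slog.Handler</code>. It also supports extending logger instances themselves with per-instance context, via the <code class="language-plaintext highlighter-rouge">.With</code> call:</p>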

<p>(<code class="language-plaintext highlighter-rouge">slog</code> also ships with a JSON formatter.)</p>

<div class="language-go highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">package</span> <span class="n">main</span>

<span class="k">import</span> <span class="p">(</span>
	<span class="s">"context"</span>
	<span class="s">"log/slog"</span>
	<span class="s">"net/http"</span>
	<span class="s">"os"</span>
<span class="p">)</span>

<span class="k">func</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
	<span class="n">logger</span> <span class="o">:=</span> <span class="n">slog</span><span class="o">.</span><span class="n">New</span><span class="p">(</span><span class="n">slog</span><span class="o">.</span><span class="n">NewJSONHandler</span><span class="p">(</span><span class="n">os</span><span class="o">.</span><span class="n">Stdout</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">slog</span><span class="o">.</span><span class="n">HandlerOptions</span><span class="p">{</span>
		<span class="n">Level</span><span class="o">:</span> <span class="n">slog</span><span class="o">.</span><span class="n">LevelInfo</span><span class="p">,</span>
	<span class="p">}))</span>
	<span class="c">// Add default attributes to all log entries</span>
	<span class="n">baseLogger</span> <span class="o">:=</span> <span class="n">logger</span><span class="o">.</span><span class="n">With</span><span class="p">(</span>
		<span class="s">"app"</span><span class="p">,</span> <span class="s">"example"</span><span class="p">,</span>
		<span class="s">"env"</span><span class="p">,</span> <span class="s">"production"</span><span class="p">,</span>
	<span class="p">)</span>
	<span class="n">slog</span><span class="o">.</span><span class="n">SetDefault</span><span class="p">(</span><span class="n">baseLogger</span><span class="p">)</span>

	<span class="n">http</span><span class="o">.</span><span class="n">HandleFunc</span><span class="p">(</span><span class="s">"/"</span><span class="p">,</span> <span class="k">func</span><span class="p">(</span><span class="n">w</span> <span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span> <span class="n">r</span> <span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span> <span class="p">{</span>
		<span class="c">// Extract or generate a request ID for tracing</span>
		<span class="n">requestID</span> <span class="o">:=</span> <span class="n">r</span><span class="o">.</span><span class="n">Header</span><span class="o">.</span><span class="n">Get</span><span class="p">(</span><span class="s">"X-Request-ID"</span><span class="p">)</span>
		<span class="k">if</span> <span class="n">requestID</span> <span class="o">==</span> <span class="s">""</span> <span class="p">{</span>
			<span class="n">requestID</span> <span class="o">=</span> <span class="s">"default-id"</span>
		<span class="p">}</span>

		<span class="c">// Attach the request ID to context (only used by handlers which read it)</span>
		<span class="n">ctx</span> <span class="o">:=</span> <span class="n">context</span><span class="o">.</span><span class="n">WithValue</span><span class="p">(</span><span class="n">r</span><span class="o">.</span><span class="n">Context</span><span class="p">(),</span> <span class="s">"request_id"</span><span class="p">,</span> <span class="n">requestID</span><span class="p">)</span>

		<span class="c">// Create request-scoped logger, which also inherits the default attributes</span>
		<span class="n">reqLogger</span> <span class="o">:=</span> <span class="n">baseLogger</span><span class="o">.</span><span class="n">With</span><span class="p">(</span>
			<span class="s">"request_id"</span><span class="p">,</span> <span class="n">requestID</span><span class="p">,</span>
			<span class="s">"path"</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">URL</span><span class="o">.</span><span class="n">Path</span><span class="p">,</span>
			<span class="s">"method"</span><span class="p">,</span> <span class="n">r</span><span class="o">.</span><span class="n">Method</span><span class="p">,</span>
		<span class="p">)</span>

		<span class="n">handleRequest</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">reqLogger</span><span class="p">,</span> <span class="n">w</span><span class="p">,</span> <span class="n">r</span><span class="p">)</span>
	<span class="p">})</span>

	<span class="n">http</span><span class="o">.</span><span class="n">ListenAndServe</span><span class="p">(</span><span class="s">":8080"</span><span class="p">,</span> <span class="no">nil</span><span class="p">)</span>
<span class="p">}</span>

<span class="k">func</span> <span class="n">handleRequest</span><span class="p">(</span><span class="n">ctx</span> <span class="n">context</span><span class="o">.</span><span class="n">Context</span><span class="p">,</span> <span class="n">logger</span> <span class="o">*</span><span class="n">slog</span><span class="o">.</span><span class="n">Logger</span><span class="p">,</span> <span class="n">w</span> <span class="n">http</span><span class="o">.</span><span class="n">ResponseWriter</span><span class="p">,</span> <span class="n">r</span> <span class="o">*</span><span class="n">http</span><span class="o">.</span><span class="n">Request</span><span class="p">)</span> <span class="p">{</span>
	<span class="n">logger</span><span class="o">.</span><span class="n">InfoContext</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="s">"Handling request"</span><span class="p">)</span> <span class="c">// includes app, env, request_id, path, method</span>
	<span class="n">w</span><span class="o">.</span><span class="n">Write</span><span class="p">([]</span><span class="kt">byte</span><span class="p">(</span><span class="s">"Request handled"</span><span class="p">))</span>
	<span class="n">logger</span><span class="o">.</span><span class="n">InfoContext</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="s">"Request processed"</span><span class="p">)</span> <span class="c">// includes app, env, request_id, path, method</span>
<span class="p">}</span>
</code></pre></div></div>

<p>While it takes some getting used to having two ways of doing the same thing, it’s still interesting to see how explicit context forwarding permeates the whole ecosystem, logging included.</p>
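<p>For comparison, something close to slog’s per-instance <code class="language-plaintext highlighter-rouge">.With</code> can already be approximated with ruby’s stdlib logger, by duplicating a logger and wrapping its formatter. A minimal sketch, where <code class="language-plaintext highlighter-rouge">logger_with</code> is a hypothetical helper (not part of the <code class="language-plaintext highlighter-rouge">logger</code> gem) and the tags are simply prepended to the message:</p>

```ruby
require "logger"
require "stringio"

# Hypothetical helper, loosely modelled on slog's Logger.With: returns a copy
# of +logger+ which prefixes every message with the given key=value pairs.
# The copy shares the parent's log device; the parent itself is untouched.
def logger_with(logger, **attrs)
  child = logger.dup
  parent_formatter = logger.formatter || Logger::Formatter.new
  prefix = attrs.map { |key, value| "[#{key}=#{value}]" }.join(" ")
  child.formatter = proc do |severity, time, progname, msg|
    parent_formatter.call(severity, time, progname, "#{prefix} #{msg}")
  end
  child
end

io = StringIO.new
base_logger = logger_with(Logger.new(io), app: "example", env: "production")
req_logger = logger_with(base_logger, request_id: "abc", path: "/")
req_logger.info("Handling request") # message: [app=example] [env=production] [request_id=abc] [path=/] Handling request
base_logger.info("booted")          # message: [app=example] [env=production] booted
```

<p>Like slog’s <code class="language-plaintext highlighter-rouge">.With</code>, this requires passing the derived logger around explicitly, which is exactly the kind of plumbing a block-scoped context API avoids.</p>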

<h3 id="python">python</h3>

<p>As usual with all things <code class="language-plaintext highlighter-rouge">python</code>, it’s all a bit of a mess, and in accordance with the “there’s always one obvious way to do something” reality, there are at least 2 ways of doing it.</p>

<p>First, when using the standard <code class="language-plaintext highlighter-rouge">logging</code> package, per-call context is supported via the <code class="language-plaintext highlighter-rouge">extra</code> keyword argument:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">()</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">info</span><span class="p">(</span><span class="sh">"</span><span class="s">msg</span><span class="sh">"</span><span class="p">,</span> <span class="n">extra</span><span class="o">=</span><span class="p">{</span><span class="sh">"</span><span class="s">foo</span><span class="sh">"</span><span class="p">:</span> <span class="sh">"</span><span class="s">bar</span><span class="sh">"</span><span class="p">})</span>
</code></pre></div></div>

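<p>Internally, each logging call creates a <code class="language-plaintext highlighter-rouge">logging.LogRecord</code>, an object holding multiple attributes; each key passed in <code class="language-plaintext highlighter-rouge">extra</code> becomes an attribute of that record (<code class="language-plaintext highlighter-rouge">record.foo</code> in the example above). Records then get passed to formatters, which can reference those attributes when formatting the message (e.g. <code class="language-plaintext highlighter-rouge">%(foo)s</code> in the format string).</p>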

<p>Now that we got that out of the way…</p>

<p>The <code class="language-plaintext highlighter-rouge">logging</code> package doesn’t provide a dedicated API to manage contexts, instead offering extension points such as the <code class="language-plaintext highlighter-rouge">logging.LoggerAdapter</code> class, which can inject context into every call:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">flask</span> <span class="kn">import</span> <span class="n">g</span>

<span class="k">class</span> <span class="nc">UserAdapter</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">LoggerAdapter</span><span class="p">):</span>
  <span class="k">def</span> <span class="nf">process</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">msg</span><span class="p">,</span> <span class="n">kwargs</span><span class="p">):</span>
    <span class="n">extra</span> <span class="o">=</span> <span class="n">kwargs</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="sh">"</span><span class="s">extra</span><span class="sh">"</span><span class="p">,</span> <span class="p">{})</span>
    <span class="n">extra</span><span class="p">[</span><span class="sh">'</span><span class="s">user_id</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">g</span><span class="p">.</span><span class="n">user_id</span>
    <span class="n">kwargs</span><span class="p">[</span><span class="sh">'</span><span class="s">extra</span><span class="sh">'</span><span class="p">]</span> <span class="o">=</span> <span class="n">extra</span>
    <span class="k">return</span> <span class="n">msg</span><span class="p">,</span> <span class="n">kwargs</span>

<span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">adapter</span> <span class="o">=</span> <span class="nc">UserAdapter</span><span class="p">(</span><span class="n">logger</span><span class="p">)</span>
</code></pre></div></div>

<p>The adapter above relies on importing external context store APIs, which tend to be framework-specific; for instance, the example above will only work with <code class="language-plaintext highlighter-rouge">flask</code>, so you may have trouble reusing it elsewhere, such as in a background task execution lifecycle (something like <code class="language-plaintext highlighter-rouge">celery</code>). Even if the background task framework supports a similar context store API, in order to reuse the adapter you’ll still have to play a game of “which execution context am I in?”. All in all, you’ll have a hard time if you want to use a local variable as context transparently across multiple log calls.</p>

<p>Some of these limitations can be circumvented by using the <code class="language-plaintext highlighter-rouge">contextvars</code> package.</p>
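<p>For comparison, a rough Ruby analogue of a <code class="language-plaintext highlighter-rouge">contextvars</code>-style approach is a block-scoped context kept in fiber-local storage. The sketch below assumes a hypothetical <code class="language-plaintext highlighter-rouge">LogContext</code> helper (not a stdlib or framework API); it uses <code class="language-plaintext highlighter-rouge">Thread.current[]</code>, which ruby keeps per fiber, and restores the outer context when the block exits, so nested scopes and concurrent fibers don’t leak into each other.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper, not part of ruby's stdlib: a block-scoped logging
# context stored in Thread.current[], whose values ruby keeps per fiber.
module LogContext
  KEY = :__log_context

  # Returns the context visible in the current fiber ({} if none was set).
  def self.current
    Thread.current[KEY] || {}
  end

  # Merges +ctx+ on top of the current context for the duration of the block,
  # restoring the previous context afterwards (even if the block raises).
  def self.with(**ctx)
    previous = Thread.current[KEY]
    Thread.current[KEY] = current.merge(ctx).freeze
    yield
  ensure
    Thread.current[KEY] = previous
  end
end

LogContext.with(user_id: 42) do
  LogContext.with(job: "export") do
    LogContext.current # includes both user_id and job
  end
  LogContext.current # back to just user_id
end
LogContext.current # empty again outside any block
</code></pre></div></div>

<p>A logger formatter could then read <code class="language-plaintext highlighter-rouge">LogContext.current</code> on every call, without the framework-specific imports the python adapter needs.</p>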

<p>Another recommended way to add contextual info is to use <code class="language-plaintext highlighter-rouge">logging.Filter</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="n">logging</span>
<span class="kn">from</span> <span class="n">flask</span> <span class="kn">import</span> <span class="n">g</span>

<span class="k">class</span> <span class="nc">UserFilter</span><span class="p">(</span><span class="n">logging</span><span class="p">.</span><span class="n">Filter</span><span class="p">):</span>
  <span class="k">def</span> <span class="nf">filter</span><span class="p">(</span><span class="n">self</span><span class="p">,</span> <span class="n">record</span><span class="p">):</span>
    <span class="n">record</span><span class="p">.</span><span class="n">user_id</span> <span class="o">=</span> <span class="n">g</span><span class="p">.</span><span class="n">user_id</span>
    <span class="k">return</span> <span class="bp">True</span>

<span class="c1"># later, you'll have to explicitly add the filter to the logger
</span><span class="n">logger</span> <span class="o">=</span> <span class="n">logging</span><span class="p">.</span><span class="nf">getLogger</span><span class="p">(</span><span class="n">__name__</span><span class="p">)</span>
<span class="n">f</span> <span class="o">=</span> <span class="nc">UserFilter</span><span class="p">()</span>
<span class="n">logger</span><span class="p">.</span><span class="nf">addFilter</span><span class="p">(</span><span class="n">f</span><span class="p">)</span>
</code></pre></div></div>

<p>Applying this to all (or a subset of) endpoints of a web application will require middleware similar to what the <code class="language-plaintext highlighter-rouge">LoggerAdapter</code> approach needs, with the same limitations, so I’m not sure what this abstraction buys you, besides making things a bit more explicit in some cases.</p>

<p>All in all, python’s approach(es) do not feel ergonomic at all, requiring boilerplate to get things done. It is truly the most low-level of high-level languages.</p>

<h2 id="beyond-logging">Beyond logging</h2>

<p>If the feature gets accepted, most of the inconsistencies described above can be dealt with. For one, all base formatters from the libraries described above can be based on the standard library <code class="language-plaintext highlighter-rouge">Logger::Formatter</code>, thereby standardizing on a single API and enabling reusable extensions. Adding a simpler json formatter variant will be much easier (who knows, perhaps the standard library can ship with one). <code class="language-plaintext highlighter-rouge">rack</code> could ship with a logging context middleware.</p>
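<p>To make the formatter point more concrete, here is a minimal sketch of a json formatter built on the standard library <code class="language-plaintext highlighter-rouge">Logger::Formatter</code>, assuming the context is simply a hash handed to the formatter. The <code class="language-plaintext highlighter-rouge">JSONFormatter</code> class and its <code class="language-plaintext highlighter-rouge">context:</code> keyword are hypothetical; <code class="language-plaintext highlighter-rouge">logger</code> does not ship them today.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "json"
require "logger"
require "stringio"
require "time"

# Hypothetical formatter (not shipped by the logger gem): one JSON object
# per line, with a caller-supplied context hash merged into every entry.
class JSONFormatter &lt; Logger::Formatter
  def initialize(context: {})
    super()
    @context = context.transform_keys { |key| key.to_s }
  end

  # Logger invokes formatters with (severity, time, progname, msg).
  def call(severity, time, progname, msg)
    core = {
      "severity" => severity,
      "time" => time.getutc.iso8601(3),
      "progname" => progname,
      "message" => msg2str(msg) # Logger::Formatter helper for strings/exceptions
    }
    # core fields are merged last, so context keys can't overwrite them
    "#{JSON.generate(@context.merge(core))}\n"
  end
end

io = StringIO.new
logger = Logger.new(io, progname: "web")
logger.formatter = JSONFormatter.new(context: { user_id: 42 })
logger.info("request completed")
# io.string holds one JSON line with severity, time, progname, message and user_id
</code></pre></div></div>

<p>Where the context comes from (a static hash here) is exactly the part a standard context API would settle.</p>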

<p>It also opens up quite a few opportunities for context coalescing.</p>

<p>For instance, sharing context across logs, traces and metrics. Imagine tools like the datadog SDK, or its OTel counterpart. What if, instead of adding tags to traces only, one could add them automatically to the context of a known logger instance?</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Datadog</span><span class="p">.</span><span class="nf">active_logger</span> <span class="o">=</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">logger</span>

<span class="c1"># add as context to current active trace and log</span>
<span class="no">Datadog</span><span class="p">.</span><span class="nf">active_trace</span><span class="p">.</span><span class="nf">set_tags</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="p">)</span>
<span class="c1"># instead of the current version, which only adds to active trace</span>
<span class="no">Datadog</span><span class="o">::</span><span class="no">Tracing</span><span class="p">.</span><span class="nf">active_trace</span><span class="p">.</span><span class="nf">set_tags</span><span class="p">(</span><span class="s2">"foo"</span><span class="p">,</span> <span class="s2">"bar"</span><span class="p">)</span>
</code></pre></div></div>

<p>The datadog dashboard already links traces with logs which contain a corresponding “trace_id” field. Now imagine not having to deal with the mental burden of knowing which tags are searchable in APM trace search, which ones are searchable for logs, which ones are common, which ones are similar… there’d be a single context to deal with! (Now, if only datadog would listen to their users and import user-defined trace tags into trace-generated metrics…).</p>

<p>This could be the rug that ties the whole room together.</p>
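<p>A toy illustration of that idea: one context store feeding both trace tags and log lines. Everything here is hypothetical: <code class="language-plaintext highlighter-rouge">SharedContext</code> and <code class="language-plaintext highlighter-rouge">FakeSpan</code> are stand-ins invented for this sketch, not datadog or OTel APIs, and the logger is a plain stdlib <code class="language-plaintext highlighter-rouge">Logger</code> with a custom formatter.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "logger"
require "stringio"

# Hypothetical store shared by the tracer side and the logger side.
class SharedContext
  def initialize
    @tags = {}
  end

  def set(key, value)
    @tags[key.to_s] = value
  end

  def to_h
    @tags.dup
  end
end

# Stand-in for a tracer span; not a real SDK class.
FakeSpan = Struct.new(:name, :tags)

context = SharedContext.new
context.set(:user_id, 42)

# tracer side: the span is tagged from the shared context
span = FakeSpan.new("web.request", context.to_h)

# logger side: every line is formatted with the same context
io = StringIO.new
logger = Logger.new(io)
logger.formatter = proc do |severity, _time, _progname, msg|
  pairs = context.to_h.map { |key, value| "#{key}=#{value}" }.join(" ")
  "#{severity} #{msg} #{pairs}\n"
end
logger.info("request done")
# io.string == "INFO request done user_id=42\n"
</code></pre></div></div>

<p>The tag is set once, and both the span and the log line carry <code class="language-plaintext highlighter-rouge">user_id</code>, which is the single context the paragraph above wishes for.</p>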

<h2 id="rails-8-new-event-subscription-api">Rails 8 new event subscription API</h2>

<p>If you mostly use <code class="language-plaintext highlighter-rouge">ruby</code> through the lens of <code class="language-plaintext highlighter-rouge">rails</code>, you may have looked at the recent 8.1 announcement and read about <a href="https://guides.rubyonrails.org/8_1_release_notes.html#structured-event-reporting">Structured Event Reporting</a>, and may be thinking “that solves it, right?”.</p>

<p>Sorta, kinda, and no.</p>

<p>It sorta solves the problem around sending context into events. Above, I complained about <code class="language-plaintext highlighter-rouge">append_info_to_payload</code> being the only way to arbitrarily inject data into the event object, and this only working for the web request case. So this is a +1.</p>

<p>It kinda makes it work for “rails logs”, as event subscription is how rails default request/view/activerecord logs are emitted. This is probably why most of the API around <code class="language-plaintext highlighter-rouge">Rails.event</code> mimics some of the <code class="language-plaintext highlighter-rouge">Rails.logger</code> API (<code class="language-plaintext highlighter-rouge">#tagged</code> being the most obvious one), and hints at it being the main motivating factor behind the feature (it was developed by a Shopify employee, so you’d have to confirm that with someone who works there).</p>

<p>But ultimately, it does not solve the main issue around logging context. <code class="language-plaintext highlighter-rouge">Rails.logger</code> is public API. As application users, we are encouraged to use it as the gateway to write our own logs. Event subscription is nice, but I’m not going to pivot to “emit events so I can write logs”. So while nice, it looks a bit like a rails solution to a rails problem.</p>

<h2 id="what-now">What now?</h2>

<p>This does not solve the lack of support for multiple log devices. Nor support for non-file log devices. Those are battles of their own. If you feel strongly about any of them though, don’t hesitate, go ahead and propose a solution.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Over the last few years, I’ve spent quite a significant chunk of my “dayjob” time working on, and thinking about, observability in general, and logging in particular. After a lot of rewriting and overwriting, “don’t repeat yourself” and coping with ecosystem limitations, I figured it was time to write a blog post on the current state of the art of logging in ruby, what I think it’s missing and what I’m doing about it.]]></summary></entry><entry><title type="html">http-2 1.0.0, a fork’s tale</title><link href="honeyryderchuck.gitlab.io/2024/07/10/http-2-a-fork-tale.html" rel="alternate" type="text/html" title="http-2 1.0.0, a fork’s tale" /><published>2024-07-10T00:00:00+00:00</published><updated>2024-07-10T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2024/07/10/http-2-a-fork-tale</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2024/07/10/http-2-a-fork-tale.html"><![CDATA[<p><strong>TL;DR</strong> The <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> gem has been officially <strong>archived</strong>, and has been <strong>replaced</strong> by <a href="https://github.com/igrigorik/http-2">http-2</a> (the gem <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> was originally forked from) as the <strong>only direct dependency</strong> of <a href="https://gitlab.com/os85/httpx">httpx</a>, after being merged back into the latter.</p>

<h2 id="origin-story">Origin story</h2>

<p>The <a href="https://github.com/igrigorik/http-2">http-2</a> gem, is a (quote) <em>pure ruby implementation of the HTTP/2 protocol and HPACK header compression</em>. It’s “transport agnostic”, as in, it does not mess directly with sockets, instead accepting byte strings (via <code class="language-plaintext highlighter-rouge">conn &lt;&lt; bytes</code>), and allowing callbacks to be registered, in order to be called at key moments of an HTTP/2 connection management lifecycle.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># from the README</span>
<span class="nb">require</span> <span class="s1">'http/2'</span>

<span class="n">socket</span> <span class="o">=</span> <span class="no">YourTransport</span><span class="p">.</span><span class="nf">new</span>

<span class="n">conn</span> <span class="o">=</span> <span class="no">HTTP2</span><span class="o">::</span><span class="no">Client</span><span class="p">.</span><span class="nf">new</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">on</span><span class="p">(</span><span class="ss">:frame</span><span class="p">)</span> <span class="p">{</span><span class="o">|</span><span class="n">bytes</span><span class="o">|</span> <span class="n">socket</span> <span class="o">&lt;&lt;</span> <span class="n">bytes</span> <span class="p">}</span>

<span class="k">while</span> <span class="n">bytes</span> <span class="o">=</span> <span class="n">socket</span><span class="p">.</span><span class="nf">read</span>
 <span class="n">conn</span> <span class="o">&lt;&lt;</span> <span class="n">bytes</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Internally, it handles the head-scratching details of the HTTP/2 spec, such as binary frame encoding, stream multiplexing, header compression, and so on, so that, to the end-user, it almost feels like using an HTTP/1 parser. And it does all that using approachable pure ruby code. It’s been around <strong>since 2014</strong> (long before I planned on maintaining an HTTP library), and I’d go as far as calling it the reference implementation of HTTP/2 in ruby.</p>
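<p>As an illustration of the “binary frame encoding” part: every HTTP/2 frame starts with a fixed 9-byte header (RFC 9113, section 4.1), made of a 24-bit payload length, an 8-bit type, 8 bits of flags, and a reserved bit followed by a 31-bit stream identifier. The helpers below are a standalone sketch using <code class="language-plaintext highlighter-rouge">Array#pack</code> and <code class="language-plaintext highlighter-rouge">String#unpack</code>; they are not taken from the <a href="https://github.com/igrigorik/http-2">http-2</a> source.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sketch of HTTP/2 frame header (de)serialization following RFC 9113, section 4.1.
# Illustrative helpers only; they are not part of the http-2 gem's API.
FRAME_HEADER_FORMAT = "CnCCN" # length (8 + 16 bits), type, flags, reserved bit + stream id

def encode_frame_header(length:, type:, flags:, stream_id:)
  raise ArgumentError, "length must fit in 24 bits" unless length.between?(0, 0xFFFFFF)

  [length >> 16, length &amp; 0xFFFF, type, flags, stream_id &amp; 0x7FFFFFFF]
    .pack(FRAME_HEADER_FORMAT)
end

def decode_frame_header(bytes)
  high, low, type, flags, stream_id = bytes.unpack(FRAME_HEADER_FORMAT)
  {
    length: (high &lt;&lt; 16) | low,
    type: type,
    flags: flags,
    stream_id: stream_id &amp; 0x7FFFFFFF # drop the reserved bit
  }
end

# DATA frame (type 0x0) with END_STREAM (0x1), 5-byte payload, on stream 1
header = encode_frame_header(length: 5, type: 0x0, flags: 0x1, stream_id: 1)
header.bytesize # => 9
decode_frame_header(header) # returns the same length, type, flags and stream_id
</code></pre></div></div>

<p>Multiply this by ten frame types, flow control windows and HPACK state, and it becomes clear why having a gem abstract it away is so welcome.</p>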

<p>So when I started toying around with building an HTTP application server, and ultimately came up with an HTTP client (<a href="https://gitlab.com/os85/httpx">httpx</a>, no less), it was a no-brainer decision to pick <a href="https://github.com/igrigorik/http-2">http-2</a> for the HTTP/2 parts of it. Over time, I also became a contributor, authoring several patches, and ultimately getting to learn the head-scratching details of the HTTP/2 protocol, which the gem initially abstracted away for me.</p>

<h2 id="a-fork-in-the-road">A fork in the road</h2>

<p><img src="/images/a-fork-tale/fork-spaghetti.jpg" alt="Git Fork on Spaghetti Code" />
<em>git forks serve the best spaghetti code</em></p>

<p>As <a href="https://gitlab.com/os85/httpx">httpx</a> usage by the community picked up, so did the bug reports, some of them related to <a href="https://github.com/igrigorik/http-2">http-2</a>. Being sort of involved in its development, I could see some cracks which weren’t evident in the beginning, namely spec compliance, and some performance issues here and there. <a href="https://github.com/igrigorik/http-2">http-2</a> being critical to my “HTTP library that could”, I set myself to solve the ones I was able to, and propose the patches upstream, in one pull request.</p>

<p><a href="https://github.com/igrigorik/http-2">http-2</a> had a single maintainer at the time, Ilya Grigorik, which was also the author. I could see that, over <strong>time</strong>, he took more <strong>time</strong> to answer issues or review pull requests in github, sometimes months. Which can mean a lot of things, but if one could reduce it to common characteristics, it usually means that people are just busy with life and/or overwhelmed with “dayjob” responsibilities, and have very little, if any <strong>time</strong> left for interesting-but-ultimately-unpaid work.</p>

<p>The format (one single PR) in which the changes were proposed certainly presented a challenge, given the scope, even if each change was contextually in its own commit (I guess github pull request review flows aren’t optimized for that use-case yet). There were requests to break them down into smaller pull requests, but this was easier said than done (later changes often depended on earlier ones), and ultimately demanded that I spend even more of my personal <strong>time</strong> on work that wasn’t receiving much of it from everyone else involved. This left the pull request stuck in a social deadlock, where the reviewer had neither the <strong>time</strong> nor the motivation to review the full scope of changes, the requester had neither the <strong>time</strong> nor the energy to adjust the scope of the changes, and the community had neither the <strong>time</strong> nor the context to help either of them. The tool certainly didn’t help, but <strong>time</strong> was the essence of the problem here.</p>

<p>This standstill was only worsened by having to regularly rebase changes and resolve the resulting conflicts from upstream, and a growing frustration from not being able to solve the production issues I ultimately needed to fix. I felt that, in order to progress with <a href="https://gitlab.com/os85/httpx">httpx</a>, I needed to solve the problem of not owning its critical dependencies, so I needed to do something drastic.</p>

<p>So I forked <a href="https://github.com/igrigorik/http-2">http-2</a>, and <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> was born. And <a href="https://gitlab.com/os85/httpx">httpx</a> has been using it since version <code class="language-plaintext highlighter-rouge">0.6.0</code>, released around November 2019.</p>

<h2 id="good-times">Good times</h2>

<p>Freed from the metaphorical shackles of collaboration, I was finally able to improve on what was missing, and then some: compliance tests became a first-class continuous integration citizen; benchmarks were run regularly; new, more performant, ruby APIs were being used, while the gem public API remained backwards-compatible. All this contributed to improved <a href="https://gitlab.com/os85/httpx">httpx</a> performance when benchmarked against other HTTP clients.</p>

<p>On the other hand, the parent was receiving very little activity (less than 10 commits since the fork).</p>

<p>Overall, the decision to fork was an overwhelming net-positive for <a href="https://gitlab.com/os85/httpx">httpx</a>, despite some hiccups along the way.</p>

<p>But the main drawback of the decision was, nobody was watching.</p>

<h2 id="bad-times">Bad times</h2>

<p>The <a href="https://github.com/igrigorik/http-2">http-2</a> gem was quite popular by the time the fork happened: it’s still over 800 stars even today, and still relied upon: 711 github repositories reference it, and is a dependency from some noteworthy gems, such as the ruby <a href="https://github.com/aws/aws-sdk-ruby">AWS SDK</a>.</p>

<p>There have been other “forks” as well: <a href="https://github.com/socketry/async-http">async-http</a>, the HTTP workhorse of the <a href="https://github.com/socketry/async">async</a> ecosystem, used to have it as a dependency, having been replaced meanwhile by <a href="https://github.com/socketry/protocol-http2">protocol-http2</a>, which although not officially a fork, it certainly used it as reference; <a href="https://github.com/digital-fabric/tipi">tipi</a>, a fiber-based HTTP application server, still declares it as a dependency, but its author <a href="https://github.com/digital-fabric/h2p">has since forked http-2 under a new name</a>, probably with the intent of releasing it as a separate gem.</p>

<p>Whether these forks happened for the same reasons as mine did is irrelevant, as the outcome should be evident: duplication effort and community fragmentation. All these forks have to solve the same issues of the original implementation (spec compliance above all), while not talking to and collaborating with each other. The ecosystems using these “forks” also ultimately determine their popularity, usage, and consequently, the conditions under which a certain category of bugs is found and reported; and when reporting them, <a href="https://gitlab.com/os85/httpx">httpx</a> gem users will use the <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> repo, while users of <a href="https://github.com/socketry/async">async</a> gems will report bugs under the <a href="https://github.com/socketry/protocol-http2">protocol-http2</a> repo.</p>

<p>Only 3 bug reports have been filled overall for <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> (almost 2 million downloads). 4 for <a href="https://github.com/socketry/protocol-http2">protocol-http2</a> (over 5 million). Since 2019, <a href="https://github.com/igrigorik/http-2">http-2</a> has had 8 bug reports (over 17 million downloads overall).</p>

<p>The numbers above are to be taken with a grain of salt. Bugs may have been reported in the repo of the parent gem depending on them. Nevertheless, are the low bug reports correlated with higher quality / less bugs, or lower usage? There’s not a definitive answer.</p>

<p>What I do know is that, despite full API compatibility with the parent gem, no other gem besides <a href="https://gitlab.com/os85/httpx">httpx</a> declares <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> as a direct dependency (the same happens for <a href="https://github.com/socketry/protocol-http2">protocol-http2</a> and <a href="https://github.com/socketry/async-http">async-http</a>, but there’s no API parity there). They’ve been around for at least 5 years, so why is that? Why hasn’t the community migrated to a better alternative? Are they blind?</p>

<p>It turns out that such a thing rarely, if ever, happens.</p>

<p>You got to have a “carrot”. It can be a <strong>certification</strong>. In real life, ain’t nobody got the time to validate whether your fork legitimately improves compliance. There may be multiple forks around claiming the same. Who’s the regulatory authority ensuring specifications are held up? What, there’s no “HTTP/2 certified seal of approval”? What, you said specs run in your CI? Sure, I’ll take your word for it…</p>

<p>It can be convincing prominent gems that use the parent gem to <strong>switch</strong> to yours. Depending on whom you’re asking, guarantees will be demanded. And without a certification, all that is left is <strong>trust</strong> in the fork maintainer (reliance on <strong>social capital</strong>), or <strong>usage metrics</strong>, such as github <strong>repository stats</strong> (which can be inflated by maintainer popularity, proglang userbase volume, or well-timed devrel on HN) or <strong>number of gem downloads</strong> (which can be inflated by misconfigured CIs and internet bots). Now, I hate making decisions on dependencies based on github stars as much as the next guy, but I also work, and have worked, in places where convincing managers to take your side in decision logs often involves a table comparing options, where “measure X is bigger for option 1 than for option 2”, no one really understands X, but it’s important to make decisions based on data (and in some cases, yes, X was github stars, and I felt dirty).</p>

<p>Awareness of your fork can also be raised in other ways. You can present it at a conference. You can write a few blog posts about it (hello there!). Ultimately, that requires investing more of your <strong>time</strong>, which you may not have, and which may have been the main reason for forking in the first place (as per above, it was the case for <a href="https://github.com/igrigorik/http-2">http-2</a>).</p>

<p>And even if you do all of the above, the path of least resistance will keep most on the parent gem. Despite all of its known flaws. Despite being somewhat inactive. It’s the devil they know. It’ll fail in unexpected ways, may or may not get reported back, and the fork maintainer will have no other option but to monitor the changes from the parent repo.</p>

<p>To sum up, while the decision to fork was an overwhelming net-positive for <a href="https://gitlab.com/os85/httpx">httpx</a>, that’s certainly debatable for the maintainers, and the community as a whole.</p>

<h2 id="a-light-that-never-goes-out">A light that never goes out</h2>

<p>Recently, a ruby <a href="https://github.com/aws/aws-sdk-ruby">AWS SDK</a> maintainer became a committer, and started picking up outstanding issues in the <a href="https://github.com/igrigorik/http-2">http-2</a> repository. He eventually stumbled upon my at-the-time-still-open pull request, and promptly asked me whether I wanted to resume the work. I gave him a very short version of the history described above, and suggested using <a href="https://gitlab.com/os85/http-2-next">http-2-next</a>, which was turned down as being “too difficult” (probably not technically, as per what I wrote in the previous section). He was nonetheless interested in helping remove the obstacles preventing it from having been merged in the past. So I found myself considering whether it was worth doing it.</p>

<p>It’s been 5 years. A lot of things were against it: <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> source code is primarily hosted in gitlab, and integrated with <strong>gitlab CI</strong> (readers of this blog should already know <a href="/2020/10/03/how-i-decreased-the-carbon-footprint-of-ci.html">I’m a gitlab fanboy</a>). I had since adapted code style and linting rules to my own personal preferences (for instance, I prefer having double quote strings everywhere and avoid the ambiguity of dealing with both; I know, <a href="https://anti-pattern.com/always-use-double-quoted-strings-in-ruby">controversial</a>). The fork also had things that didn’t exist upstream, like RBS type signatures. The scope of changes was therefore much greater than before, which would make reviewing it even harder; accomplishing it would not be possible by just cherry-picking commits from one side to the other, as both the main repo and the fork had moved forward, and the potential for conflicts was just too high.</p>

<p>On the other side of the coin, there was a lot going for it. For example, there was no breaking public API change, so it’s not like a wildly different gem being merged into another, which would have held adoption back. <a href="https://github.com/igrigorik/http-2">http-2</a> still has a lot more community watching the repo or reporting bugs, and that would help validate the performance and compliance benefits committed to the fork even more.</p>

<p>So we all sat together (virtually), and came to an agreement. <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> was to be ported “as-is” into the “main” branch of <a href="https://github.com/igrigorik/http-2">http-2</a>, in one giant pull request. Once reviewed, this would become the repository HEAD. Once that was done, I’d become co-maintainer, with gem push rights.</p>

<p>There were compromises made: one giant commit instead of multiple smaller commits meant both that the <a href="https://github.com/igrigorik/http-2">http-2</a> maintainers had to accept extra changes they perhaps would not agree to otherwise (different linting rules, for example), and that the <a href="https://gitlab.com/os85/http-2-next">http-2-next</a> side would lose the commit history of each change from the fork (the old repo will always be there for consultation purposes though), all in the name of reducing the overhead of getting the changes upstream and publishing a release. It also meant I had to say goodbye to <strong>gitlab CI</strong> and just learn how to bake the same cake with <strong>Github Actions</strong>, although some things were lost along the way; for instance, I was able to publish coverage docs in gitlab and link to them on the coverage badge, and I still don’t know how to generate coverage badges in <strong>Github Actions</strong>, nor how to make coverage docs publicly available (if someone knows how to do it, I’ll wait for your pull request:) ).</p>

<p>It took what it had to take, but we did it! <a href="https://github.com/igrigorik/http-2">http-2</a> 1.0.0 was released in June 2024, and, 5 years later, <a href="https://gitlab.com/os85/httpx">httpx</a> 1.4.0 became the first version since 0.6.0 to declare <a href="https://github.com/igrigorik/http-2">http-2</a> as a dependency.</p>

<h2 id="conclusion">Conclusion</h2>

<p>I wrote this post as a celebration of a fork successfully being merged back into the mothership. This is not just about me, the ruby community, or my own particular gem drop in the rubygems ocean. Generally, this type of event is the exception, not the rule. In the FOSS world, forks are allowed, and encouraged. And for many good reasons. It’s empowering. It’s liberating. It can help breed innovation. But sometimes, they’re unnecessary fragmentation. Of contributors, and users. They generate effort duplication. They may lead to competing efforts in an environment where there may ultimately be no trophy at the end of the line, rather an inbox full of angry users and bug reports, or complete silence, and ultimately burnout. And when you realize it, it’s too late, or costly, to go back.</p>

<p>Back then, I was so obsessed with the idea of “killing” my dependencies, that I couldn’t see the bigger picture. In hindsight, if I could do things differently, I would have tried to contact Ilya in order to figure out whether I could help with reducing his burden, perhaps not being fearful of suggesting becoming a maintainer and getting a no for an answer. Essentially, just try to solve the social collaboration problem first, before jumping into implementing a technical solution.</p>

<p>Raise your glass to all forks, old and new, dead and gone, alive and well! May they all find their way back to the Source!</p>

<p><img src="/images/a-fork-tale/gatsby.jpg" alt="Gatsby Toast" /></p>]]></content><author><name></name></author><summary type="html"><![CDATA[TL;DR The http-2-next gem has been officially archived, and has been replaced by http-2 (the gem http-2-next was originally forked from) as the only direct dependency of httpx, after being merged back into the latter.]]></summary></entry><entry><title type="html">The state of HTTP clients, or why you should use httpx</title><link href="honeyryderchuck.gitlab.io/2023/10/15/state-of-ruby-http-clients-use-httpx.html" rel="alternate" type="text/html" title="The state of HTTP clients, or why you should use httpx" /><published>2023-10-15T00:00:00+00:00</published><updated>2023-10-15T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2023/10/15/state-of-ruby-http-clients-use-httpx</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2023/10/15/state-of-ruby-http-clients-use-httpx.html"><![CDATA[<p><strong>TL;DR</strong> <em>most http clients you’ve been using since the ruby heyday are either broken, unmaintained, or stale, and you should be using <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> nowadays.</em></p>

<p>Every year, a few articles come out with a title similar to “the best ruby http clients of the year of our lord 20xx”. Most of the community dismisses them as clickbait, either because of the reputation of the content owner website, companies pushing their developers to write meaningless content in their company tech blog for marketing purposes, or AI bots trained on similar articles from the previous decade and serving you the same content over and over.</p>

<p>And they’re right. Most of the time, these articles are hollow, devoid of meaningful examples or discussions about relevant features, trade-offs or performance characteristics, and mostly rely on shallow popularity metrics such as total downloads, number of stars on GitHub, or number of twitter followers from the core maintainer, to justify selections. They’ll repeat what you’ve known for years: <a href="https://github.com/lostisland/faraday">faraday</a> is downloaded 20 million times a year, <a href="https://github.com/jnunemaker/httparty">httparty</a> parties hard, no one likes <a href="https://github.com/ruby/net-http/">net-http</a>, and there are too many http clients in the ruby community.</p>

<p>These articles very rarely mention newcomers. Being the developer of <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>, a relatively recent (created in 2017) HTTP client, and having extensively researched the competition, I can’t help but feel that there’s a lot that hasn’t been mentioned yet. So, given the context I gathered all over these years, I believe I can write <em>the article I’d have liked someone else to write about the topic already, but nobody did</em> myself.</p>

<p>So, this is yet another “the state of ruby HTTP clients in 2023”. <a href="https://www.youtube.com/watch?v=Hgd2F2QNfEE">There are many like it, but this one is mine</a>. And while you’ll find it hardly surprising that I recommend you use <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> nowadays (I’m the maintainer after all), I’ll try to make the analysis as unbiased as possible, and play the devil’s advocate here and there.</p>

<h2 id="population">Population</h2>

<p>As of the time of writing this article, there are <a href="https://www.ruby-toolbox.com/categories/http_clients?display=table&amp;order=score">33 http client gems listed in ruby toolbox</a>. It would take a book to cover them all! How can I limit the sample to relevant gems only? What classifies as “relevant” anyway?</p>

<p>While the ruby toolbox ranking suffers from the “social” factor as well (github and number of stars are an important metric in their score calculation after all), it does collect data around maintenance health, which is a variable to take into account.</p>

<p>Categorization is not very precise either; some of the listed gems are hardly HTTP “clients”, but rather a layer built on top of other HTTP clients. For instance, <a href="https://www.ruby-toolbox.com/projects/flexirest">flexirest</a> or <a href="https://www.ruby-toolbox.com/projects/restfulie">restfulie</a> are DSLs around “RESTful API” concepts; <a href="https://www.ruby-toolbox.com/projects/hyperclient">hyperclient</a> is a DSL to build <a href="https://stateless.group/hal_specification.html">HAL JSON API</a> clients; <a href="https://www.ruby-toolbox.com/projects/json_api_client">json_api_client</a> does the same for APIs following the JSON API Spec; all of them use <a href="https://github.com/ruby/net-http/">net-http</a>, ruby’s own standard library HTTP client, under the hood though. So one can dismiss them as <em>not really</em> HTTP clients.</p>

<p>Some of the listed gems can’t even perform HTTP requests. For instance, <a href="https://www.ruby-toolbox.com/projects/multipart-post">multipart-post</a>, the second best-ranked by project score index, is essentially a group of components to be used with <a href="https://github.com/ruby/net-http/">net-http</a> to enable generation of multipart requests. You still have to use <a href="https://github.com/ruby/net-http/">net-http</a> directly though! There are other gems of this kind (I’ll address them later) which aren’t part of this list either.</p>

<p>Filtering by these two metrics alone, we come to a much shorter list of candidates, which most rubyists should be familiar with:</p>

<ul>
  <li><a href="https://www.ruby-toolbox.com/projects/faraday">faraday</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/excon">excon</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/rest-client">rest-client</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/httparty">httparty</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/httpclient">httpclient</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/typhoeus">typhoeus</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/http">HTTPrb</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/mechanize">mechanize</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/httpi">httpi</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/curb">curb</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/em-http-request">em-http-request</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/httpx">httpx</a></li>
  <li><a href="https://github.com/ruby/net-http">net-http</a></li>
</ul>

<p>But we can go even further.</p>

<h3 id="active-maintenance">Active maintenance</h3>

<p>While I don’t personally judge gems by how often their source code changes, as I believe there is such a thing as “feature complete” software, that line of thought can’t be applied to gems with frequent complaints and bug reports that barely get a response from any maintainer. And some entries in our remaining list of candidates, although very popular by number of downloads and GitHub stars, haven’t been very (if at all) responsive to user feedback in the last couple of years.</p>

<p>Take <a href="https://github.com/rest-client/rest-client">rest-client</a> for example: one of the oldest and most downloaded gems of the list, its last release was in 2019, with several unanswered bug reports and open pull requests since then.</p>

<p><a href="https://github.com/nahi/httpclient">httpclient</a>, even older than <a href="https://github.com/rest-client/rest-client">rest-client</a>, is in an even worse condition: last released in 2016(!), several unanswered issues, including <a href="https://github.com/nahi/httpclient/issues/445">this one which is particularly concerning, and should render the gem unusable</a>.</p>

<p>For another example, there’s also <a href="https://github.com/typhoeus/typhoeus">typhoeus</a>, last released in 2020, with several open issues as well.</p>

<p>While maintainers shouldn’t be criticized for exercising the freedom of leaving their maintenance duties behind, I find it concerning nonetheless that <a href="https://www.scrapingdog.com/blog/ruby-http-clients/">articles keep popping up recommending their orphaned gems</a>. Consider as well that these gems are still dependencies of thousands of other gems. As an example, <a href="https://github.com/typhoeus/typhoeus">typhoeus</a> is the default HTTP client library in <a href="https://github.com/OpenAPITools/openapi-generator/blob/master/docs/generators/ruby.md">openapi-generator</a>, which automates the generation of API client SDKs in several programming languages (including ruby).</p>

<p>So while I’ll probably mention some of them here and there, I won’t further analyse any of the alternatives which are <em>de facto</em> unmaintained.</p>

<h3 id="wrappers-wrappers-everywhere">Wrappers, wrappers everywhere</h3>

<p>When it comes to HTTP clients in ruby, there are 3 main groups:</p>

<ul>
  <li>Those which wrap <a href="https://github.com/ruby/net-http/">net-http</a></li>
  <li>Those which wrap <a href="https://curl.se/">curl</a></li>
  <li>Everything else</li>
</ul>

<p>On top of these, you’ll find the “general wrappers” which integrate with as many HTTP “backends” as possible, and aim at providing common interfaces and functionality on top. This group includes <a href="https://github.com/lostisland/faraday">faraday</a>, the best-ranked gem by project score in Ruby Toolbox, and <a href="https://github.com/savonrb/httpi">httpi</a>, which is a transitive dependency of <a href="https://github.com/savonrb/savon">savon</a>, the most popular ruby <a href="https://en.wikipedia.org/wiki/SOAP">SOAP</a> client. This means that, for most of the purposes of this article’s research, they’re irrelevant, although I’ll still include <a href="https://github.com/lostisland/faraday">faraday</a> due to its popularity.</p>

<h4 id="faraday">Faraday</h4>

<p><a href="https://github.com/lostisland/faraday">faraday</a> provides a common HTTP API, plus an integration layer that any HTTP client can plug into, and builds common functionality on top of it. In a nutshell, it aims at doing for HTTP clients what <a href="https://github.com/rack/rack">rack</a> did for application servers: provide a “common middleware” layer and make the “engine” swappable. Its mirroring of <a href="https://github.com/rack/rack">rack</a>’s strategy goes beyond that, as it even copies some of its quirks, such as the rack <code class="language-plaintext highlighter-rouge">env</code>, all the way to the “status - headers - body” interface, and the concept of middlewares.</p>

<p>Its approach has had undeniable success: not only the most downloaded, it’s also the HTTP client gem with the most reverse dependencies. Nevertheless, it’s far from the “one true way” of putting HTTP requests in front of people.</p>

<p>For one, it does not guarantee full feature coverage for all supported backends: whether that is even feasible is debatable, but maintaining the integration layer requires decent knowledge of both <a href="https://github.com/lostisland/faraday">faraday</a> and the underlying HTTP client, for each of the supported clients, and there aren’t enough people around with the skills, time and motivation to do it. So just assume that something will always be missing for a given integration: some recently added feature, some feature which only exists in that particular backend, and so on. The benefit of being able to switch backends is therefore heavily constrained by how deeply you use the <a href="https://github.com/lostisland/faraday">faraday</a> feature set.</p>

<p>Moreover, the features it offers (usually via <a href="https://lostisland.github.io/faraday/#/middleware/index">middlewares</a>) often repeat functionality already provided by some of the backends, and are sometimes incomplete in comparison. For instance, <a href="https://github.com/lostisland/faraday">faraday</a> provides HTTP auth, json encoding, or multipart encoding, as features; however, it only supports <a href="https://en.wikipedia.org/wiki/Basic_access_authentication">Basic HTTP auth</a> (some backends support other authentication schemes, such as <a href="https://en.wikipedia.org/wiki/Digest_access_authentication">Digest HTTP auth</a>). Also, some of the backends already deal with multipart requests (in some cases in a more complete manner; we’ll get to that later), and dealing with <code class="language-plaintext highlighter-rouge">JSON</code> may arguably not be a “hard” problem worth having a middleware for (the <a href="https://github.com/flori/json">json</a> standard library already makes that quite easy). Some of the value of these middlewares is therefore a bit diluted, at least when not dealing with more involved features (like retries, for instance).</p>
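<p>To illustrate how little the <code class="language-plaintext highlighter-rouge">json</code> standard library leaves for a middleware to do, here is a minimal sketch which encodes a request body and decodes a payload using only stdlib (<code class="language-plaintext highlighter-rouge">net/http</code> and <code class="language-plaintext highlighter-rouge">json</code>). The endpoint URL and the payload are made up for illustration, and the request is only built, never sent:</p>

```ruby
require "json"
require "net/http"
require "uri"

# hypothetical endpoint: the request object is only built, never sent
uri = URI("https://api.example.com/items")

# encoding: a POST request carrying a JSON body, no middleware involved
request = Net::HTTP::Post.new(uri, "Content-Type" => "application/json")
request.body = JSON.generate({ name: "widget", quantity: 2 })

# decoding: a made-up JSON payload turned into plain ruby objects
payload = '{"id": 1, "name": "widget", "quantity": 2}'
item = JSON.parse(payload, symbolize_names: true)

puts request["content-type"] # => application/json
puts request.body            # => {"name":"widget","quantity":2}
puts item[:quantity]         # => 2
```

<p>Sending it is then a matter of passing the request to a <code class="language-plaintext highlighter-rouge">Net::HTTP</code> session, which is where the backend’s own features (or lack thereof) come in.</p>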

<p>Also, by basing itself on the <a href="https://github.com/rack/rack">rack</a> protocol, it inherits its problems. The <a href="https://github.com/rack/rack">rack</a> API, although simple, ain’t easy. Consider the lowest common denominator:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
  <span class="p">[</span><span class="mi">200</span><span class="p">,</span> <span class="p">{},</span> <span class="p">[</span><span class="s2">"Hello World"</span><span class="p">]]</span>
<span class="k">end</span>
</code></pre></div></div>

<p>That <code class="language-plaintext highlighter-rouge">env</code> variable isn’t self-explanatory; it’s a bucket of key-value junk. And while the <a href="https://github.com/rack/rack/blob/main/SPEC.rdoc">rack spec</a> does a reasonable job of specifying which keys must or should be there and what they should point to, <a href="https://github.com/lostisland/faraday">faraday</a> does not provide a specification. So <code class="language-plaintext highlighter-rouge">env</code> ends up being an undefined “object which is a hash?”, where you can call things such as <code class="language-plaintext highlighter-rouge">env.request</code>, <code class="language-plaintext highlighter-rouge">env.ssl</code>, <code class="language-plaintext highlighter-rouge">env.body</code>, <code class="language-plaintext highlighter-rouge">env[:method]</code> or <code class="language-plaintext highlighter-rouge">env[:parallel_manager]</code>, and the only way to know which is which is by reading the code of existing adapters and hoping (or testing) that you’re using the right thing. All of that for the convenience of having something similar to <a href="https://github.com/rack/rack">rack</a>, because it makes things… simple? 🤷</p>

<p>In hindsight, building features on top of middlewares was also a mistake inherited from <a href="https://github.com/rack/rack">rack</a>: <a href="https://github.com/lostisland/faraday/issues/1458">order matters</a>.</p>
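<p>To make the ordering problem concrete, here is a framework-free sketch of a rack-style stack. Each middleware wraps the next one, so the first one in the list sees the request first and the response last. The two middlewares (a hypothetical “retry once” and a hypothetical “raise on 5xx”) and the flaky app are plain ruby written for this illustration, not faraday’s own classes: the same two pieces retry correctly in one order, and never retry in the other.</p>

```ruby
# A fake "backend": fails with 503 on the first call, succeeds afterwards.
class FlakyApp
  def initialize
    @calls = 0
  end

  def call(env)
    @calls += 1
    @calls == 1 ? [503, {}, "unavailable"] : [200, {}, "ok"]
  end
end

ServerError = Class.new(StandardError)

# Hypothetical middleware: turns 5xx responses into exceptions.
class RaiseOnServerError
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    raise ServerError, "got #{status}" if status >= 500

    [status, headers, body]
  end
end

# Hypothetical middleware: retries once, but only when the inner stack raises.
class RetryOnce
  def initialize(app)
    @app = app
  end

  def call(env)
    @app.call(env)
  rescue ServerError
    @app.call(env)
  end
end

# Builds the stack so that the first middleware in the list is the outermost one.
def build(middlewares, app)
  middlewares.reverse.reduce(app) { |inner, middleware| middleware.new(inner) }
end

# retry outside: the error is raised inside the retry, so the request is retried
good = build([RetryOnce, RaiseOnServerError], FlakyApp.new)
p good.call({}) # => [200, {}, "ok"]

# retry inside: it only ever sees a 503 response (no exception), so the error escapes
bad = build([RaiseOnServerError, RetryOnce], FlakyApp.new)
begin
  bad.call({})
rescue ServerError => e
  puts "failed: #{e.message}" # => failed: got 503
end
```

<p>Nothing in either middleware is wrong in isolation; the bug only exists in how they are stacked, which is exactly what makes it easy to miss.</p>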

<p>To sum up, although <a href="https://github.com/lostisland/faraday">faraday</a> treats the backends it integrates with as dumb pipes, they’re rarely dumb. Its choice of integration path also makes it rather limiting to build adapters for, and the “spread ownership” that comes from shipping adapters as separate gems (a decision of the <a href="https://github.com/lostisland/faraday">faraday</a> maintainers) results in adapters covering a “lowest common denominator” subset of features. That makes it hard to switch adapters, so gems integrating with <a href="https://github.com/lostisland/faraday">faraday</a> usually settle on just one. Its user-facing API is reasonably OK (if you forget about parallel requests or multipart support); however, most third-party SDKs/gems based on <a href="https://github.com/lostisland/faraday">faraday</a> just treat it as an implementation detail, and end up <strong>not</strong> exposing <a href="https://github.com/lostisland/faraday">faraday</a> connections to end users, whether to “augment with middlewares” or even to change backends. And they still have to deal with its other quirks. <a href="https://github.com/stripe/stripe-ruby/issues/795#issuecomment-502707959">The stripe gem decided not to wait any longer for that upside</a>.</p>

<p>So if you want an HTTP client to implement an SDK on top of, do your research and pick your own HTTP client, instead of defaulting to <a href="https://github.com/lostisland/faraday">faraday</a>.</p>

<h4 id="wrapping-curl">Wrapping curl</h4>

<p><a href="https://curl.se/">curl</a> is the most widely used HTTP client in all of software. It’s probably in the top 10 of the most used software, period. <a href="https://daniel.haxx.se/blog/2021/12/03/why-curl-is-used-everywhere-even-on-mars/">It’s even used on Mars</a>. That makes it synonymous with “battle-tested”, “fully-featured”, and “performant”. Being written in C, it’s no wonder that, for a multitude of runtimes with any sort of C ABI interoperability, there are <a href="https://github.com/topics/curl-library">a lot of wrappers for it</a>. And ruby is no exception: <a href="https://github.com/typhoeus/typhoeus">typhoeus</a>, <a href="https://github.com/taf2/curb">curb</a> and <a href="https://github.com/toland/patron">patron</a>, at least, are all <a href="https://curl.se/libcurl/">libcurl</a> wrappers, interfacing with it either via <code class="language-plaintext highlighter-rouge">libffi</code> or C extensions.</p>

<p>This is no free lunch either. For one, HTTP is only <a href="https://everything.curl.dev/protocols/curl">one of the many protocols supported by curl for transfers</a>. The integration therefore has to make sure that no other protocol can be abused (so that, for example, some vulnerable FTP code path isn’t accidentally called), which is only possible by custom-building <a href="https://curl.se/">curl</a> with support for HTTP alone; however, in most cases, integrations target the system-installed <a href="https://curl.se/libcurl/">libcurl</a>, which is open-ended in that regard.</p>

<p>This also makes deployments and dependency tracking harder: now you’ll have to follow changes and security announcements related both to the ruby HTTP library <strong>and</strong> <a href="https://curl.se/libcurl/">libcurl</a>. Otherwise, how will you know that a bugfix has been released, or worse, a security fix? (Did I already mention that <a href="https://curl.se/libcurl/">libcurl</a> is written in C? <a href="https://daniel.haxx.se/blog/2023/10/11/how-i-made-a-heap-overflow-in-curl/">Here’s a recent reminder.</a>) You’ll also need to ensure that the version of <a href="https://curl.se/libcurl/">libcurl</a> you want to compile against is installed on your production servers, which makes server setups (containers or not) more cumbersome to maintain: installing <a href="https://curl.se/">curl</a>, or <a href="https://curl.se/libcurl/">libcurl</a>, is usually left to the system package manager (<code class="language-plaintext highlighter-rouge">apt-get</code>, <code class="language-plaintext highlighter-rouge">yum</code>, <code class="language-plaintext highlighter-rouge">brew</code>…), but these tend to take years to adopt the “latest greatest” version of <a href="https://curl.se/libcurl/">libcurl</a>, in this case the one containing that security fix you so desperately need. So you’ll have to do the work of downloading, unpacking and installing it as a pre-compiled system package yourself (don’t forget to do the same with the <a href="https://curl.se/docs/libs.html">several libcurl dependencies</a>, like <code class="language-plaintext highlighter-rouge">libidn2</code>, <code class="language-plaintext highlighter-rouge">nghttp2</code>, etc…).
To mitigate some of that pain, it’s usually best practice for the ruby interface to support multiple installable versions of <a href="https://curl.se/libcurl/">libcurl</a>, at the cost of increased risk and maintenance overhead for the gem maintainers.</p>

<p>Alternatively, you can include it as an on-the-fly-compiled vendored C dependency from the gem. That will come <a href="https://github.com/taf2/curb/issues/75">with its own can of worms though</a>. Even <a href="https://github.com/typhoeus/ethon/issues/206">FFI-based integrations aren’t free of system-related problems</a>. This is the type of overhead that a pure ruby package does not incur.</p>

<p>API usability is also a problem. However good the <a href="https://curl.se/libcurl/">libcurl</a> API is, it is idiomatic C, not idiomatic ruby. And for all the wrappers’ efforts to hide the details of the <a href="https://curl.se/libcurl/">libcurl</a> API, these tend to leak into end-user ruby code:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># using typhoeus</span>
<span class="nb">require</span> <span class="s2">"typhoeus"</span>

<span class="n">response</span> <span class="o">=</span> <span class="no">Typhoeus</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"https://www.google.com"</span><span class="p">)</span>
<span class="k">case</span> <span class="n">response</span><span class="p">.</span><span class="nf">code</span>
<span class="k">when</span> <span class="mi">200</span>
  <span class="c1"># success</span>
<span class="k">when</span> <span class="mi">0</span>
  <span class="c1"># special curl code for when something is wrong: no HTTP response was received,</span>
  <span class="c1"># and the libcurl error has to be looked up in response.return_message</span>
<span class="k">end</span>

<span class="c1"># using curb</span>
<span class="nb">require</span> <span class="s2">"curb"</span>

<span class="c1"># curl_easy and curl_multi are C-level libcurl interfaces</span>
<span class="c1"># curb exposes them to ruby code almost "as is"</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"https://http2.akamai.com"</span><span class="p">)</span>
<span class="c1"># this is the C way of setting connection options (this one enables HTTP/2), one call per option...</span>
<span class="n">c</span><span class="p">.</span><span class="nf">set</span><span class="p">(</span><span class="ss">:HTTP_VERSION</span><span class="p">,</span> <span class="no">Curl</span><span class="o">::</span><span class="no">HTTP_2_0</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="nf">perform</span>
</code></pre></div></div>

<p>This could be worth it if there were a huge feature gap, or if performance were much better than that of the non-curl based alternatives, but that is not the case either (more about this later).</p>

<p>So from the standpoint of coding in ruby, I don’t see many advantages which justify the downsides of choosing a library wrapping <a href="https://curl.se/libcurl/">libcurl</a>.</p>

<h4 id="wrapping-net-http">Wrapping net-http</h4>

<p><a href="https://github.com/ruby/net-http">net-http</a> is the standard library HTTP client. Because it ships with ruby, it’s probably the most widely used ruby HTTP client (I don’t have numbers to back that up, but I’m fairly certain of it). A significant portion of that usage is indirect though, given how many gems out there wrap it (<a href="https://github.com/jnunemaker/httparty">httparty</a> and <a href="https://github.com/rest-client/rest-client">rest-client</a> most notably; <a href="https://github.com/lostisland/faraday">faraday</a>’s default adapter also uses <a href="https://github.com/ruby/net-http/">net-http</a>).</p>

<p>And that’s because nobody likes writing <a href="https://github.com/ruby/net-http/">net-http</a> code. It’s easy to see why, <a href="http://www.rubyinside.com/nethttp-cheat-sheet-2940.html">just look at this cheatsheet</a>: its API is convoluted, verbose and needlessly OO-heavy (why does one need a separate class for every HTTP status code…); it just does not spark joy. Worse, there’s no fix for that: because it’s in the standard library, its clunky API is relied upon almost as much as ruby core syntax, including in a lot of legacy code, which makes it resistant to change; any change addressing the points above risks having a wide “blast radius” and breaking a significant portion of ruby production deployments.</p>
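<p>A small offline sketch of that OO-heaviness: net-http maps each status code to its own response class, keeps the status code as a String, and only raises for error statuses if you call <code class="language-plaintext highlighter-rouge">#value</code>. The responses below are built by hand (normally net-http builds them while reading from the socket), just to poke at the API; the <code class="language-plaintext highlighter-rouge">Net::HTTPClientException</code> name assumes ruby 2.6 or later.</p>

```ruby
require "net/http"

# net-http keeps a table mapping each status code to a dedicated class
puts Net::HTTPResponse::CODE_TO_OBJ["404"] # => Net::HTTPNotFound

# hand-built responses (normally created by net-http while reading the socket)
ok = Net::HTTPOK.new("1.1", "200", "OK")
not_found = Net::HTTPNotFound.new("1.1", "404", "Not Found")

puts ok.code.inspect                       # => "200", a String rather than an Integer
puts ok.is_a?(Net::HTTPSuccess)            # => true
puts not_found.is_a?(Net::HTTPClientError) # => true

# #value does nothing for 2xx responses and raises for anything else
ok.value
begin
  not_found.value
rescue Net::HTTPClientException => e # ruby 2.6+ name of the 4xx exception class
  puts "#{e.class}: #{e.message}"
end
```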

<p>For this reason, and for a while already (<a href="https://github.com/jnunemaker/httparty">httparty</a>’s first release is from 2008!), several libraries have been released with the express goal of exposing a user-friendlier DSL for doing HTTP requests, while internally abstracting away the difficulty of dealing with the <a href="https://github.com/ruby/net-http/">net-http</a> API. Of this wave, the “one that parties hard” and <a href="https://github.com/rest-client/rest-client">rest-client</a> have been the most popular ones. The improvements are perceived by many to offset the drawbacks of using <a href="https://github.com/ruby/net-http/">net-http</a>, while still retaining the whole “engine” intact. This creates a whole new set of problems though.</p>

<p>One is “feature parity drift”. <a href="https://github.com/ruby/net-http/">net-http</a> has many features AND lacks key features, but it still receives active development, which sometimes addresses the latter. For a wrapper, this means that there’s always going to be a subset of recent functionality which hasn’t been properly wrapped yet. <a href="https://github.com/jnunemaker/httparty">httparty</a> took years to include configuration covering all possible <a href="https://github.com/ruby/net-http/">net-http</a> options: as late as 2018, I remember ranting about <a href="https://www.rubydoc.info/stdlib/net/Net%2FHTTP:set_debug_output">not being able to enable net-http’s debug output from its API</a>, an option supported in <a href="https://github.com/ruby/net-http/">net-http</a> since at least the ruby 1.8.7 days; and somewhere, <a href="https://github.com/rest-client/rest-client/issues/687">someone’s still waiting for max_retries support to be added to rest-client</a>.</p>
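<p>For reference, this is the net-http option in question, <code class="language-plaintext highlighter-rouge">set_debug_output</code>. In the sketch below, a throwaway TCP server on localhost stands in for a real endpoint so that it runs offline, and the wire-level output is captured into a <code class="language-plaintext highlighter-rouge">StringIO</code> rather than the usual <code class="language-plaintext highlighter-rouge">$stderr</code>. (net-http’s own docs warn against using it in production code, as it dumps everything that goes over the wire, credentials included.)</p>

```ruby
require "net/http"
require "socket"
require "stringio"

# throwaway local server answering a single request (port 0 picks a free port)
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # skip the request line and headers, up to the blank line
  loop do
    line = client.gets
    break if line.nil? || line == "\r\n"
  end
  client.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
  client.close
end

log = StringIO.new
# nil as the proxy address stops net-http from picking up a proxy from the environment
http = Net::HTTP.new("127.0.0.1", port, nil)
http.set_debug_output(log) # must be set before the connection is started
response = http.get("/")
server_thread.join
server.close

puts response.body # => ok
puts log.string    # the raw request and response lines, as net-http saw them
```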

<p>Another is “implementation multiplication”. <a href="https://github.com/ruby/net-http/">net-http</a> lacks some basic core functionality one would expect from an HTTP client, like support for multipart requests or digest auth; so <a href="https://github.com/jnunemaker/httparty/blob/master/examples/multipart.rb">httparty has to</a> <a href="https://github.com/jnunemaker/httparty/blob/master/lib/httparty/net_digest_auth.rb">fill in the gaps</a>, just like <a href="https://github.com/lostisland/faraday-multipart">faraday</a> or <a href="https://github.com/rest-client/rest-client?search=1#multipart">rest-client</a>, and this despite <a href="https://github.com/socketry/multipart-post">known patches</a> <a href="https://github.com/drbrain/net-http-digest_auth">to net-http itself</a> having been developed by the community. All of this is a massive repetition of effort, where certain edge-case bugs may be present in some implementations but not in others; clearly not the most efficient use of a community’s time and energy.</p>

<p>And meanwhile, new features arrive in <a href="https://github.com/ruby/net-http/">net-http</a> every year; it being in the standard library, there’s always someone pushing for new features to be added, which translates into “continuous overhead” for wrapper maintainers, who have to perpetually shim the new functionality. If the wrappers are maintained at all, that is (<a href="https://github.com/rest-client/rest-client">rest-client</a> hasn’t had a release in 3 years, so it’s as good as “unmaintained”).</p>

<p>So while I agree with the overall sentiment that <a href="https://github.com/ruby/net-http/">net-http</a> is not code anyone enjoys reading or maintaining, and that its existence only reflects badly on ruby itself (no one will take a “ruby is beautiful” statement seriously after looking at its stdlib HTTP-related code), given the situation I just described, and given that economy of dependencies trumps freedom of solution choice, using <a href="https://github.com/ruby/net-http/">net-http</a> straight up is a better option than sticking with one of its wrappers.</p>
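<p>As a rough idea of what using <a href="https://github.com/ruby/net-http/">net-http</a> straight up can look like, here is a sketch performing two requests over a single keep-alive connection via the block form of <code class="language-plaintext highlighter-rouge">Net::HTTP.start</code>. The local server is a stand-in so that the snippet runs offline; since it accepts only one connection, both requests succeeding shows that the connection was reused.</p>

```ruby
require "net/http"
require "socket"

# throwaway local server: accepts ONE connection and answers every request sent over it
server = TCPServer.new("127.0.0.1", 0)
port = server.addr[1]
server_thread = Thread.new do
  client = server.accept
  # each iteration handles one request, until the client closes the connection
  while (request_line = client.gets)
    path = request_line.split[1]
    loop do
      line = client.gets
      break if line.nil? || line == "\r\n"
    end
    body = "you asked for #{path}"
    client.write("HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nContent-Length: #{body.bytesize}\r\n\r\n#{body}")
  end
  client.close
end

# the block form opens the connection once and closes it when the block returns;
# nil disables proxy lookup from the environment, read_timeout keeps a stuck request from hanging
bodies = Net::HTTP.start("127.0.0.1", port, nil, read_timeout: 2) do |http|
  ["/a", "/b"].map { |path| http.get(path).body }
end
server_thread.join
server.close

p bodies # => ["you asked for /a", "you asked for /b"]
```

<p>It is more ceremony than a one-liner in any of the wrappers, but there is no extra layer to keep in sync with the stdlib.</p>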

<h2 id="evaluation">Evaluation</h2>

<p>So far, one can see that, although there seems to be plenty of choice, there’s actually a short list one can reasonably hold on to:</p>

<ul>
  <li><a href="https://www.ruby-toolbox.com/projects/faraday">faraday</a></li>
  <li><a href="https://www.ruby-toolbox.com/projects/excon">excon</a></li>
  <li><del>rest-client</del> (no release in the last 3 years, high number of unanswered issues)</li>
  <li><a href="https://www.ruby-toolbox.com/projects/httparty">httparty</a></li>
  <li><del>httpclient</del> (no release in the last 3 years, high number of unanswered issues)</li>
  <li><del>typhoeus</del> (no release in the last 3 years, high number of unanswered issues)</li>
  <li><a href="https://www.ruby-toolbox.com/projects/http">HTTPrb</a></li>
  <li><del>mechanize</del></li>
  <li><del>httpi</del> (fringe HTTP client wrapper, no release in almost 2 years)</li>
  <li><a href="https://www.ruby-toolbox.com/projects/curb">curb</a></li>
  <li><del>em-http-request</del></li>
  <li><a href="https://www.ruby-toolbox.com/projects/httpx">httpx</a></li>
  <li><a href="https://github.com/ruby/net-http">net-http</a></li>
</ul>

<p>I’m also removing <a href="https://github.com/igrigorik/em-http-request">em-http-request</a> and <a href="https://github.com/sparklemotion/mechanize">mechanize</a> from this list. About <a href="https://github.com/igrigorik/em-http-request">em-http-request</a>: despite its low-but-existing activity rate, its adoption hinges on it being used via an async framework, <a href="https://github.com/eventmachine/eventmachine">eventmachine</a>, which itself hasn’t seen much activity lately, and has fallen out of use and popularity due to its API and runtime incompatibility with “standard” ruby network code. About <a href="https://github.com/sparklemotion/mechanize">mechanize</a>: despite technically being an HTTP client, it’s mostly a “web scraping” tool which interacts with webpages (fill in forms, click links, etc…), impersonating the role of a browser (which is also technically an HTTP client).</p>

<p>So now that we have a defined sample for the analysis, let’s begin.</p>

<h3 id="ux--developer-ergonomics">UX / Developer ergonomics</h3>

<h4 id="response">Response</h4>

<p>The most basic feature required from an HTTP client library is performing GET requests (for example, to download a webpage). And that’s a feature that every library mentioned in this article so far (and most probably all the others that haven’t been) can easily perform. In fact, it’s so easy that you can achieve it using a similar API for all of them:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># please download google front page</span>
<span class="n">uri</span> <span class="o">=</span> <span class="s2">"https://www.google.com"</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span> <span class="c1"># httpx</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span> <span class="c1"># Excon</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span> <span class="c1"># faraday</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span> <span class="c1"># HTTPrb</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span> <span class="c1"># httparty</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span> <span class="c1"># curb</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">get_response</span><span class="p">(</span><span class="no">URI</span><span class="p">(</span><span class="n">uri</span><span class="p">))</span>  <span class="c1"># even net-http manages to inline</span>
</code></pre></div></div>

<p>The response object that each of these calls returns will be a bit “different but similar” in most situations: some will return the response status code via a <code class="language-plaintext highlighter-rouge">.status</code> method, while others call it <code class="language-plaintext highlighter-rouge">.code</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">response</span><span class="p">.</span><span class="nf">status</span> <span class="c1">#=&gt; 200, for httpx, excon, faraday</span>
<span class="n">response</span><span class="p">.</span><span class="nf">code</span> <span class="c1">#=&gt; 200, for HTTPrb, httparty, curb</span>
<span class="n">response</span><span class="p">.</span><span class="nf">code</span> <span class="c1">#=&gt; "200", why, net-http…</span>
</code></pre></div></div>

<p>The response object also allows access to the response HTTP headers, in most cases via a <code class="language-plaintext highlighter-rouge">.headers</code> method. The returned object is not always the same, although in most cases it is, at the very least, something which allows <code class="language-plaintext highlighter-rouge">[key]</code> based lookups, and which can be turned into a Hash:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># httpx</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span> <span class="c1">#=&gt; a custom class, which implements basic [] and []=, responds to .to_h</span>
<span class="c1"># excon</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span> <span class="c1">#=&gt; instance of a custom class inheriting from Hash</span>
<span class="c1"># faraday</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span> <span class="c1">#=&gt; instance of a custom class inheriting from Hash</span>
<span class="c1"># HTTPrb</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span> <span class="c1">#=&gt; a custom class, which implements basic [] and []=, responds to .to_h</span>
<span class="c1"># httparty</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span> <span class="c1">#=&gt; a custom SimpleDelegator (to a Hash) class</span>
<span class="c1"># curb</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span> <span class="c1">#=&gt; a Hash</span>
<span class="c1"># net-http</span>
<span class="n">response</span><span class="p">.</span><span class="nf">header</span> <span class="c1">#=&gt; the response object itself (e.g. Net::HTTPOK, a Net::HTTPSuccess subclass, when 200), which includes Net::HTTPHeader</span>

<span class="c1"># all support case-insensitive lookup</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"content-type"</span><span class="p">]</span> <span class="c1">#=&gt; "text/html; charset=ISO-8859-1"</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"Content-Type"</span><span class="p">]</span> <span class="c1">#=&gt; "text/html; charset=ISO-8859-1"</span>

<span class="c1"># httpx also gives access to each value of a multi-value header</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"set-cookie"</span><span class="p">]</span> <span class="c1">#=&gt; "SOCS=CA…; AEC=AUEFqZe…; __Secure-ENID=12.SE=A8"</span>
<span class="n">response</span><span class="p">.</span><span class="nf">headers</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"set-cookie"</span><span class="p">)</span> <span class="c1">#=&gt; ["SOCS=CA…", "AEC=AUEFqZe…", "__Secure-ENID=12.SE=A8"] , accesses each "set-cookie" response header individually</span>
</code></pre></div></div>
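<p>For comparison, net-http exposes headers through the <code class="language-plaintext highlighter-rouge">Net::HTTPHeader</code> module, mixed into its response objects. A sketch with a hand-built response and made-up cookie values: lookups are case-insensitive, <code class="language-plaintext highlighter-rouge">[]</code> joins repeated fields with a comma, and <code class="language-plaintext highlighter-rouge">get_fields</code> returns each value individually.</p>

```ruby
require "net/http"

# a hand-built response; normally net-http fills this in from the socket
response = Net::HTTPOK.new("1.1", "200", "OK")
response.add_field("Content-Type", "text/html; charset=ISO-8859-1")
# made-up cookie values, added as two separate "set-cookie" fields
response.add_field("Set-Cookie", "a=1")
response.add_field("Set-Cookie", "b=2")

puts response["content-type"]        # => text/html; charset=ISO-8859-1
puts response["Content-Type"]        # => text/html; charset=ISO-8859-1
puts response["set-cookie"]          # => a=1, b=2
p response.get_fields("set-cookie")  # => ["a=1", "b=2"]
p response.to_hash["set-cookie"]     # => ["a=1", "b=2"]
```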

<p>Finally, the response object allows retrieving the response body, usually via a <code class="language-plaintext highlighter-rouge">.body</code> method. As with the headers above, the returned object is not always the same, but it can at the very least be turned into a String, and in some cases it can be handled as a “file”, i.e. read in chunks, which is ideal when dealing with chonky payloads. Some libraries also provide a custom API for decoding well-known encoding formats into plain ruby objects:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># httpx</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span> <span class="c1">#=&gt; a custom class</span>
<span class="n">response</span><span class="p">.</span><span class="nf">to_s</span> <span class="c1">#=&gt; a ruby string</span>
<span class="n">response</span><span class="p">.</span><span class="nf">form</span> <span class="c1">#=&gt; if "application/x-www-form-urlencoded" content-type, returns the ruby Hash</span>
<span class="n">response</span><span class="p">.</span><span class="nf">json</span> <span class="c1">#=&gt; if "application/json" content-type, returns the ruby Hash</span>
<span class="c1"># excon</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span> <span class="c1">#=&gt; a ruby string</span>
<span class="c1"># and that's it, no shortcut for decoding</span>
<span class="c1"># faraday</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span> <span class="c1">#=&gt; a ruby string</span>
<span class="c1"># HTTPrb</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span> <span class="c1">#=&gt; a custom class, which implements .to_s and .readpartial</span>
<span class="c1"># httparty</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span> <span class="c1">#=&gt; a ruby string</span>
<span class="c1"># faraday, with the json response middleware</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s1">'https://httpbin.org'</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="c1"># json decoder supported via faraday middleware</span>
  <span class="n">f</span><span class="p">.</span><span class="nf">response</span> <span class="ss">:json</span>
<span class="k">end</span>
<span class="n">json</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/get"</span><span class="p">).</span><span class="nf">body</span> <span class="c1"># already a ruby Hash</span>
<span class="c1"># curb</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span> <span class="c1">#=&gt; a ruby string</span>
<span class="c1"># net-http</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span> <span class="c1">#=&gt; a ruby string.</span>

<span class="c1"># --------</span>

<span class="n">big_file_url</span> <span class="o">=</span> <span class="s1">'https://some-cdn.com/path/to/file'</span>

<span class="c1"># httpx and HTTPrb support chunked response streaming via implementations of .read</span>
<span class="c1"># or .readpartial, so this is possible with both:</span>

<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">big_file_url</span><span class="p">)</span> <span class="c1"># httpx</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">big_file_url</span><span class="p">)</span> <span class="c1"># HTTPrb</span>

<span class="no">IO</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="s2">"/path/to/file"</span><span class="p">,</span> <span class="n">response</span><span class="p">.</span><span class="nf">body</span><span class="p">)</span>
<span class="c1"># HTTPX has an API just for this:</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span><span class="p">.</span><span class="nf">copy_to</span><span class="p">(</span><span class="s2">"/path/to/file"</span><span class="p">)</span>
<span class="c1"># both also implement .each, which yields chunks</span>
<span class="n">response</span><span class="p">.</span><span class="nf">body</span><span class="p">.</span><span class="nf">each</span> <span class="p">{</span> <span class="o">|</span><span class="n">chunk</span><span class="o">|</span> <span class="n">handle_chunk</span><span class="p">(</span><span class="n">chunk</span><span class="p">)</span> <span class="p">}</span>

<span class="c1"># other options have their own bespoke "read in chunks" callback</span>

<span class="c1"># excon</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s2">"/path/to/file"</span><span class="p">,</span> <span class="s2">"wb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="n">streamer</span> <span class="o">=</span> <span class="nb">lambda</span> <span class="k">do</span> <span class="o">|</span><span class="n">chunk</span><span class="p">,</span> <span class="n">remaining_bytes</span><span class="p">,</span> <span class="n">total_bytes</span><span class="o">|</span>
    <span class="n">f</span> <span class="o">&lt;&lt;</span> <span class="n">chunk</span>
  <span class="k">end</span>
  <span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">big_file_url</span><span class="p">,</span> <span class="ss">:response_block</span> <span class="o">=&gt;</span> <span class="n">streamer</span><span class="p">)</span>
<span class="k">end</span>

<span class="c1"># faraday</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s2">"/path/to/file"</span><span class="p">,</span> <span class="s2">"wb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="no">Faraday</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">big_file_url</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">req</span><span class="o">|</span>
    <span class="c1"># on_data must be assigned a proc; a block passed to the reader is silently ignored</span>
    <span class="n">req</span><span class="p">.</span><span class="nf">options</span><span class="p">.</span><span class="nf">on_data</span> <span class="o">=</span> <span class="nb">proc</span> <span class="k">do</span> <span class="o">|</span><span class="n">chunk</span><span class="p">,</span> <span class="n">overall_received_bytes</span><span class="p">,</span> <span class="n">env</span><span class="o">|</span>
      <span class="n">f</span> <span class="o">&lt;&lt;</span> <span class="n">chunk</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># httparty</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s2">"/path/to/file"</span><span class="p">,</span> <span class="s2">"wb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">big_file_url</span><span class="p">,</span> <span class="ss">stream_body: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">fragment</span><span class="o">|</span>
    <span class="k">if</span> <span class="n">fragment</span><span class="p">.</span><span class="nf">code</span> <span class="o">==</span> <span class="mi">200</span> <span class="c1"># yup, you gotta check the status code of every fragment…</span>
      <span class="n">f</span> <span class="o">&lt;&lt;</span> <span class="n">fragment</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># curb</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s2">"/path/to/file"</span><span class="p">,</span> <span class="s2">"wb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">big_file_url</span><span class="p">)</span>
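  <span class="c1"># curb expects the handler to return the number of bytes handled, which IO#write does</span>
  <span class="n">c</span><span class="p">.</span><span class="nf">on_body</span> <span class="p">{</span> <span class="o">|</span><span class="n">data</span><span class="o">|</span> <span class="n">f</span><span class="p">.</span><span class="nf">write</span><span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="p">}</span>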
  <span class="n">c</span><span class="p">.</span><span class="nf">perform</span>
<span class="k">end</span>

<span class="c1"># net-http</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="s2">"/path/to/file"</span><span class="p">,</span> <span class="s2">"wb"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="n">u</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">big_file_url</span><span class="p">)</span>
  <span class="c1"># https URLs need use_ssl, otherwise net-http speaks plain HTTP to port 443</span>
  <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">u</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">u</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="n">u</span><span class="p">.</span><span class="nf">scheme</span> <span class="o">==</span> <span class="s2">"https"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
    <span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">u</span><span class="p">)</span>
    <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">response</span><span class="o">|</span>
      <span class="n">response</span><span class="p">.</span><span class="nf">read_body</span> <span class="k">do</span> <span class="o">|</span><span class="n">chunk</span><span class="o">|</span>
        <span class="n">f</span> <span class="o">&lt;&lt;</span> <span class="n">chunk</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>And this is where the first usability differences are noticeable: 1) <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> and <a href="https://github.com/httprb/http">httprb</a> both make the task of dealing with response body chunking a bit more intuitive than the rest, which rely on “same but different” blocks; 2) <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> provides a few shortcuts to parse well-known mime-types into ruby objects (<a href="https://github.com/lostisland/faraday">faraday</a> does the same for JSON via some middleware boilerplate); 3) ruby stdlib mitigates some of the shortcomings of other libraries by supporting decoding of common mime types natively (<code class="language-plaintext highlighter-rouge">JSON.parse(response.body)</code> for strings works well enough).</p>
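<p>To make points 1) and 3) a bit more concrete, here is a minimal, library-agnostic sketch of what such a response body could look like: a wrapper over any rewindable IO which yields fixed-size chunks via <code class="language-plaintext highlighter-rouge">.each</code>, can stream itself to a file path, and decodes a couple of well-known mime types using only the ruby stdlib (<code class="language-plaintext highlighter-rouge">JSON</code> and <code class="language-plaintext highlighter-rouge">URI</code>). <code class="language-plaintext highlighter-rouge">SketchBody</code>, its chunk size and its <code class="language-plaintext highlighter-rouge">decode</code> method are hypothetical, and not how httpx or HTTPrb actually implement their response bodies.</p>

```ruby
require "json"
require "uri"
require "stringio"

# Hypothetical response body wrapper (not a real library class): wraps a
# rewindable IO, exposes chunked reads, and decodes a few mime types
# with the stdlib only.
class SketchBody
  include Enumerable

  CHUNK_SIZE = 16 * 1024 # assumption: arbitrary chunk size

  def initialize(io, content_type)
    @io = io
    @content_type = content_type.to_s
  end

  # yields the payload in chunks of at most CHUNK_SIZE bytes
  def each
    return enum_for(:each) unless block_given?

    @io.rewind
    while (chunk = @io.read(CHUNK_SIZE))
      yield chunk
    end
  end

  # buffers the whole payload into a String (avoid for chonky payloads)
  def to_s
    to_a.join
  end

  # streams the payload to +path+ chunk by chunk, without buffering it all
  def copy_to(path)
    File.open(path, "wb") { |f| each { |chunk| f.write(chunk) } }
  end

  # decodes well-known mime types into plain ruby objects
  def decode
    mime_type = @content_type.split(";").first.to_s.strip.downcase
    case mime_type
    when "application/json"
      JSON.parse(to_s)
    when "application/x-www-form-urlencoded"
      URI.decode_www_form(to_s).to_h
    else
      raise ArgumentError, "no decoder for #{mime_type.inspect}"
    end
  end
end

body = SketchBody.new(StringIO.new('{"foo":"bar"}'), "application/json; charset=utf-8")
body.decode #=> {"foo" => "bar"}
body.each { |chunk| puts chunk.bytesize } # a single 13-byte chunk here
```

Keeping chunked iteration and decoding on the body object is what lets the caller pick between buffering (<code class="language-plaintext highlighter-rouge">.to_s</code>, <code class="language-plaintext highlighter-rouge">.decode</code>) and streaming (<code class="language-plaintext highlighter-rouge">.each</code>, <code class="language-plaintext highlighter-rouge">.copy_to</code>) without a bespoke callback.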

<h4 id="request">Request</h4>

<p>Another common feature that all HTTP clients support is requests with other HTTP verbs, such as POST requests. This usually requires support for passing the request body, as well as setting headers (a feature which is also useful for GET requests btw), in a user-friendly manner.</p>
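<p>Stripped of any library's API, such a request boils down to a verb, a URL (possibly with a query string), some headers and an optionally encoded body. As a rough illustration, here's a minimal sketch using only the request classes from ruby's stdlib <code class="language-plaintext highlighter-rouge">net/http</code>, which can be built and inspected without sending anything over the wire. <code class="language-plaintext highlighter-rouge">build_request</code> and its <code class="language-plaintext highlighter-rouge">params:</code>/<code class="language-plaintext highlighter-rouge">form:</code>/<code class="language-plaintext highlighter-rouge">json:</code> keyword arguments are hypothetical, and not part of any of the libraries compared here.</p>

```ruby
require "net/http"
require "json"
require "uri"

# Hypothetical helper: builds (but does not send) a Net::HTTP request for
# +verb+, with optional headers, query params and a form or JSON body.
def build_request(verb, url, headers: {}, params: nil, form: nil, json: nil)
  uri = URI(url)
  uri.query = URI.encode_www_form(params) if params

  # :post -> Net::HTTP::Post, :get -> Net::HTTP::Get, etc.
  request = Net::HTTP.const_get(verb.to_s.capitalize).new(uri)

  if form
    # sets the body to "foo=bar" and the urlencoded content-type
    request.set_form_data(form)
  elsif json
    request.body = JSON.generate(json)
    request["content-type"] = "application/json"
  end

  headers.each { |name, value| request[name] = value }
  request
end

request = build_request(:post, "https://httpbin.org/post",
                        headers: { "x-api-token" => "SECRET" },
                        json: { "foo" => "bar" })
request.method          #=> "POST"
request.body            #=> "{\"foo\":\"bar\"}"
request["content-type"] #=> "application/json"
```

Mapping the verb to a request class is the stdlib's counterpart to a same-named client method; actually sending it would then be a matter of <code class="language-plaintext highlighter-rouge">Net::HTTP.start(uri.host, uri.port, use_ssl: true) { |http| http.request(request) }</code>.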

<p>In order to use another HTTP verb, most libraries provide a same-named, lowercase method, and rely on more or less verbose options to pass extra parameters:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># use-cases:</span>
<span class="c1"># 1. GET with the "x-api-token: SECRET" header</span>
<span class="c1"># 2. GET with the "?foo=bar" query param in the request URL</span>
<span class="c1"># 3. POST the "data" string</span>
<span class="c1"># 4. POST the "foo=bar" urlencoded form data</span>
<span class="c1"># 5. POST the '{"foo":"bar"}' JSON payload</span>
<span class="c1"># 6. POST the '{"foo":"bar"}' JSON payload with the "x-api-token: SECRET" header</span>
<span class="n">get_uri</span> <span class="o">=</span> <span class="s2">"https://httpbin.org/get"</span>
<span class="n">post_uri</span> <span class="o">=</span> <span class="s2">"https://httpbin.org/post"</span>

<span class="c1"># httpx</span>
<span class="c1"># 1.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="ss">headers: </span><span class="p">{</span> <span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span> <span class="p">})</span>
<span class="c1"># 2.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="ss">params: </span><span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>
<span class="c1"># 3.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="s2">"data"</span><span class="p">)</span> <span class="c1"># defaults to "application/octet-stream" content-type</span>
<span class="c1"># 4.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>
<span class="c1"># 5.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">json: </span><span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>
<span class="c1"># 6.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">headers: </span><span class="p">{</span> <span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span> <span class="p">},</span> <span class="ss">json: </span><span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>

<span class="c1"># excon</span>
<span class="c1"># 1.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="ss">headers: </span><span class="p">{</span> <span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span> <span class="p">})</span>
<span class="c1"># 2.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="ss">query: </span><span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>
<span class="c1"># 3.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="s2">"data"</span><span class="p">)</span> <span class="c1"># does not specify content type</span>

<span class="c1"># excon does not provide shortcuts for encoding the request body</span>
<span class="c1"># in well known encoding formats, so DIY.</span>
<span class="c1"># 4.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">:body</span> <span class="o">=&gt;</span> <span class="no">URI</span><span class="p">.</span><span class="nf">encode_www_form</span><span class="p">(</span><span class="s1">'foo'</span> <span class="o">=&gt;</span> <span class="s1">'bar'</span><span class="p">),</span> <span class="ss">:headers</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="s2">"Content-Type"</span> <span class="o">=&gt;</span> <span class="s2">"application/x-www-form-urlencoded"</span> <span class="p">})</span>
<span class="c1"># 5.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">:body</span> <span class="o">=&gt;</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="s1">'foo'</span> <span class="o">=&gt;</span> <span class="s1">'bar'</span><span class="p">),</span> <span class="ss">:headers</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="s2">"Content-Type"</span> <span class="o">=&gt;</span> <span class="s2">"application/json"</span> <span class="p">})</span>
<span class="c1"># 6.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">:body</span> <span class="o">=&gt;</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">(</span><span class="s1">'foo'</span> <span class="o">=&gt;</span> <span class="s1">'bar'</span><span class="p">),</span> <span class="ss">:headers</span> <span class="o">=&gt;</span> <span class="p">{</span> <span class="s2">"Content-Type"</span> <span class="o">=&gt;</span> <span class="s2">"application/json"</span><span class="p">,</span> <span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span> <span class="p">})</span>

<span class="c1"># faraday</span>
<span class="c1"># 1.</span>
<span class="c1"># starting on the wrong foot, here's a 2nd argument that needs to be nil...</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="kp">nil</span><span class="p">,</span> <span class="p">{</span> <span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span> <span class="p">})</span>
<span class="c1"># 2.</span>
<span class="c1"># the 2nd argument is either transformed into a URL query string (GET)</span>
<span class="c1"># or used as the request body (POST), depending on the verb</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>
<span class="c1"># 3.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="s2">"data"</span><span class="p">)</span> <span class="c1"># defaults to application/x-www-form-urlencoded content-type</span>
<span class="c1"># 4.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span> <span class="c1"># encodes the Hash as urlencoded form data by default</span>
<span class="c1"># 5.</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s1">'https://httpbin.org'</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="c1"># json encoder supported, again via more middleware boilerplate</span>
  <span class="n">f</span><span class="p">.</span><span class="nf">request</span> <span class="ss">:json</span>
<span class="k">end</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="s2">"/post"</span><span class="p">,</span> <span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>
<span class="c1"># 6.</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="s2">"/post"</span><span class="p">,</span> <span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">},</span> <span class="p">{</span> <span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span> <span class="p">})</span>

<span class="c1"># HTTPrb</span>
<span class="c1"># 1.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">headers</span><span class="p">(</span><span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span><span class="p">).</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">)</span>
<span class="c1"># 2.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="ss">params: </span><span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>
<span class="c1"># 3.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="s2">"data"</span><span class="p">)</span> <span class="c1"># does not specify content type...</span>
<span class="c1"># 4.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>
<span class="c1"># 5.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">json: </span><span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>
<span class="c1"># 6.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">headers</span><span class="p">(</span><span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span><span class="p">).</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">json: </span><span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>

<span class="c1"># httparty</span>
<span class="c1"># 1.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="ss">headers: </span><span class="p">{</span> <span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span> <span class="p">})</span>
<span class="c1"># 2.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="ss">query: </span><span class="p">{</span> <span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span> <span class="p">})</span>
<span class="c1"># 3.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="s2">"data"</span><span class="p">)</span> <span class="c1"># defaults to application/x-www-form-urlencoded content-type</span>
<span class="c1"># 4.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span> <span class="c1"># also encodes the Hash as urlencoded form data by default</span>
<span class="c1"># 5.</span>
<span class="c1"># no shortcut provided for json, DIY</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">}),</span> <span class="ss">headers: </span><span class="p">{</span><span class="s2">"content-type"</span> <span class="o">=&gt;</span> <span class="s2">"application/json"</span><span class="p">})</span>
<span class="c1"># 6.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">}),</span> <span class="ss">headers: </span><span class="p">{</span><span class="s2">"x-api-token"</span> <span class="o">=&gt;</span> <span class="s2">"SECRET"</span><span class="p">,</span> <span class="s2">"content-type"</span> <span class="o">=&gt;</span> <span class="s2">"application/json"</span><span class="p">})</span>

<span class="c1"># curb</span>
<span class="c1"># 1.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">get_uri</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="n">http</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s1">'x-api-token'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'SECRET'</span>
<span class="k">end</span>
<span class="c1"># 2.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="no">Curl</span><span class="p">.</span><span class="nf">urlalize</span><span class="p">(</span><span class="n">get_uri</span><span class="p">,</span> <span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">}))</span>
<span class="c1"># 3.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="s2">"data"</span><span class="p">)</span> <span class="c1"># defaults to application/x-www-form-urlencoded content-type, like curl</span>
<span class="c1"># 4.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>
<span class="c1"># 5.</span>
<span class="c1"># needs block-mode to add headers...</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">}))</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="n">http</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"content-type"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"application/json"</span>
<span class="k">end</span>
<span class="c1"># 6.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">}))</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="c1"># one of these for each new header you'll need to add...</span>
  <span class="n">http</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"content-type"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"application/json"</span>
  <span class="n">http</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"x-api-token"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"SECRET"</span>
<span class="k">end</span>

<span class="c1"># net-http</span>
<span class="n">get_uri</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">get_uri</span><span class="p">)</span>

<span class="c1"># 1. and 2.</span>
<span class="c1"># net-http does not provide a query params API; you have to use URI for that</span>
<span class="n">get_uri</span><span class="p">.</span><span class="nf">query</span> <span class="o">=</span> <span class="no">URI</span><span class="p">.</span><span class="nf">encode_www_form</span><span class="p">({</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>
<span class="c1"># and now you can do the request...</span>

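<span class="n">http</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">get_uri</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">get_uri</span><span class="p">.</span><span class="nf">port</span><span class="p">)</span>
<span class="n">http</span><span class="p">.</span><span class="nf">use_ssl</span> <span class="o">=</span> <span class="n">get_uri</span><span class="p">.</span><span class="nf">scheme</span> <span class="o">==</span> <span class="s2">"https"</span> <span class="c1"># TLS must be enabled explicitly</span>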
<span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">get_uri</span><span class="p">.</span><span class="nf">request_uri</span><span class="p">)</span>
<span class="n">request</span><span class="p">[</span><span class="s2">"x-api-token"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"SECRET"</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>

<span class="c1"># 3.</span>
<span class="n">post_uri</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">post_uri</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="s2">"data"</span><span class="p">)</span>  <span class="c1"># defaults to application/x-www-form-urlencoded content-type</span>

<span class="c1"># 4.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">post_form</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>

<span class="c1"># 5.</span>
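<span class="n">http</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">post_uri</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">post_uri</span><span class="p">.</span><span class="nf">port</span><span class="p">)</span>
<span class="n">http</span><span class="p">.</span><span class="nf">use_ssl</span> <span class="o">=</span> <span class="n">post_uri</span><span class="p">.</span><span class="nf">scheme</span> <span class="o">==</span> <span class="s2">"https"</span>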
<span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Post</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">post_uri</span><span class="p">.</span><span class="nf">request_uri</span><span class="p">)</span>
<span class="n">request</span><span class="p">[</span><span class="s2">"content-type"</span><span class="p">]</span> <span class="o">=</span> <span class="s2">"application/json"</span>
<span class="n">request</span><span class="p">.</span><span class="nf">body</span> <span class="o">=</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="s2">"foo"</span> <span class="o">=&gt;</span> <span class="s2">"bar"</span><span class="p">})</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>

<span class="c1"># and let's forget the last, I'm tired of writing net-http examples. you get the picture from the above</span>
</code></pre></div></div>

<p>This is not exhaustive, but it does tell us a few things: 1) <a href="https://github.com/ruby/net-http/">net-http</a> starts showing how verbose it can get; 2) for most clients, API shortcuts for encoding the request body are quite limited beyond “x-www-form-urlencoded”; 3) some clients get a bit too creative with the usage of blocks; 4) <a href="https://github.com/lostisland/faraday">faraday</a>’s positional arguments make simple requests a bit confusing to write; 5) <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> and <a href="https://github.com/httprb/http">httprb</a> manage to achieve all examples in concise one-liners; 6) as in the previous section, ruby has quite a lot of stdlib support to circumvent some of these shortcomings (via the <a href="https://github.com/ruby/uri">uri</a> or <a href="https://github.com/flori/json">json</a> default gems).</p>
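
<p>To illustrate point 6, here is a rough sketch of how those stdlib pieces can be combined to claw back some of the conciseness net-http lacks. <code class="language-plaintext highlighter-rouge">build_json_post</code> is a hypothetical helper, not part of net-http’s API; it only builds the request object, so nothing is sent over the network.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "uri"
require "json"
require "net/http"

# Hypothetical helper (not a net-http API): builds, but does not send,
# a POST request with a JSON body and extra headers, using only the stdlib.
def build_json_post(url, payload, headers = {})
  uri = URI(url)
  request = Net::HTTP::Post.new(uri.request_uri)
  request["content-type"] = "application/json"
  # net-http stores header names as downcased strings
  headers.each { |name, value| request[name.to_s] = value }
  request.body = JSON.dump(payload)
  request
end

# URI.encode_www_form covers the query params API net-http lacks
get_uri = URI("https://httpbin.org/get")
get_uri.query = URI.encode_www_form({ foo: "bar" })
puts get_uri # prints https://httpbin.org/get?foo=bar

request = build_json_post("https://httpbin.org/post", { foo: "bar" }, { "x-api-token": "SECRET" })
# sending it still requires something like:
# Net::HTTP.start("httpbin.org", 443, use_ssl: true) { |http| http.request(request) }
</code></pre></div></div>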

<h4 id="multipart">Multipart</h4>

<p>Another common and widely supported encoding format for uploading files is <code class="language-plaintext highlighter-rouge">multipart/form-data</code>, aka <a href="https://www.rfc-editor.org/rfc/rfc1867">Multipart</a>. While it’s an old and common standard, even supported by browsers for form submission, it’s surprising to find that some HTTP clients either don’t implement it, require a separate dependency for it, or only implement it partially. Let’s demonstrate:</p>
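
<p>First, for reference, a minimal sketch of what a <code class="language-plaintext highlighter-rouge">multipart/form-data</code> body looks like on the wire: each field becomes a part delimited by a boundary string, carrying its own <code class="language-plaintext highlighter-rouge">Content-Disposition</code> and <code class="language-plaintext highlighter-rouge">Content-Type</code> headers. The <code class="language-plaintext highlighter-rouge">guess_mime_type</code> and <code class="language-plaintext highlighter-rouge">build_multipart</code> helpers are hypothetical, the extension table is deliberately tiny, and the boundary is fixed for readability; real clients generate a random boundary and look mime types up in a proper database, which is exactly where some of them go wrong in the examples that follow.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "json"

# Hypothetical mime type lookup by file extension (deliberately tiny table)
def guess_mime_type(filename)
  case File.extname(filename).downcase
  when ".jpeg", ".jpg" then "image/jpeg"
  when ".mp4" then "video/mp4"
  else "application/octet-stream"
  end
end

# Hypothetical helper: serializes parts into a multipart/form-data body.
# Each part is a hash with :name and :body, and optionally :filename / :content_type.
def build_multipart(parts, boundary)
  body = parts.map do |part|
    disposition = %(form-data; name="#{part[:name]}")
    disposition += %(; filename="#{part[:filename]}") if part[:filename]
    # explicit content type wins, then file extension, then text/plain for plain fields
    content_type = part[:content_type] ||
      (part[:filename] ? guess_mime_type(part[:filename]) : "text/plain")
    "--#{boundary}\r\n" \
      "Content-Disposition: #{disposition}\r\n" \
      "Content-Type: #{content_type}\r\n" \
      "\r\n" \
      "#{part[:body]}\r\n"
  end.join
  # the closing delimiter carries two trailing dashes
  body + "--#{boundary}--\r\n"
end

boundary = "RubySketchBoundary"
body = build_multipart([
  { name: "document", filename: "document.jpeg", body: "(jpeg bytes)" },
  { name: "selfie", filename: "selfie.mp4", body: "(mp4 bytes)" },
  { name: "data", content_type: "application/json", body: JSON.dump({ name: "Joe", age: 20 }) }
], boundary)
# the request's own content-type header must advertise the same boundary
request_content_type = "multipart/form-data; boundary=#{boundary}"
puts body
</code></pre></div></div>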

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># please:</span>
<span class="c1"># 1. POST a "document.jpeg" file</span>
<span class="c1"># 2. POST a "selfie.mp4" file</span>
<span class="c1"># 3. POST a "document.jpeg" file and a "selfie.mp4" file</span>
<span class="c1"># 4. POST a "document.jpeg" file, a "selfie.mp4" file, and a "name=Joe" text field</span>
<span class="c1"># 5. POST a "document.jpeg" file, a "selfie.mp4" file, and a "{"name": "Joe", "age": 20}" JSON "data" field</span>
<span class="n">post_uri</span> <span class="o">=</span> <span class="s2">"https://httpbin.org/post"</span>
<span class="n">doc_path</span> <span class="o">=</span> <span class="s2">"/path/to/document.jpeg"</span>
<span class="n">selfie_path</span> <span class="o">=</span> <span class="s2">"/path/to/selfie.mp4"</span>

<span class="c1"># httpx</span>
<span class="c1"># 1.</span>
<span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="ss">document: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">)</span> <span class="p">})</span>
<span class="c1"># multipart payload</span>
<span class="c1"># single part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>

<span class="c1"># 2.</span>
<span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="ss">selfie: </span><span class="no">Pathname</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span> <span class="p">})</span> <span class="c1"># also supports pathnames</span>
<span class="c1"># multipart payload</span>
<span class="c1"># single part with name="selfie", filename="selfie.mp4" and content-type=video/mp4</span>

<span class="c1"># 3.</span>
<span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="ss">document: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span> <span class="ss">selfie: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span> <span class="p">})</span>
<span class="c1"># multipart payload</span>
<span class="c1"># first part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=video/mp4</span>

<span class="c1"># 4.</span>
<span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="ss">document: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span> <span class="ss">selfie: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">),</span> <span class="ss">name: </span><span class="s2">"Joe"</span> <span class="p">})</span>
<span class="c1"># first part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=video/mp4</span>
<span class="c1"># third part with name="name", content-type=text/plain</span>

<span class="c1"># 5.</span>
<span class="no">HTTPX</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="ss">document: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span> <span class="ss">selfie: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">),</span> <span class="ss">data: </span><span class="p">{</span> <span class="ss">content_type: </span><span class="s2">"application/json"</span><span class="p">,</span> <span class="ss">body: </span><span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">name: </span><span class="s2">"Joe"</span><span class="p">,</span> <span class="ss">age: </span><span class="mi">20</span><span class="p">})</span> <span class="p">}})</span>
<span class="c1"># first part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=video/mp4</span>
<span class="c1"># third part with name="data", content-type=application/json</span>


<span class="c1"># excon</span>
<span class="c1"># does not support multipart requests</span>

<span class="c1"># faraday</span>
<span class="c1"># does not support multipart requests OOTB</span>
<span class="c1"># requires separate faraday-multipart extension gem for that: https://github.com/lostisland/faraday-multipart</span>
<span class="nb">require</span> <span class="s1">'faraday'</span>
<span class="nb">require</span> <span class="s1">'faraday/multipart'</span>

<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="n">f</span><span class="p">.</span><span class="nf">request</span> <span class="ss">:multipart</span>
<span class="k">end</span>
<span class="c1"># 1.</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span><span class="ss">document: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span> <span class="s1">'image/jpeg'</span><span class="p">)</span> <span class="p">})</span>
<span class="c1"># requires using a specific faraday-multipart class for file parts</span>
<span class="c1"># mime types need to be known ahead of time!</span>

<span class="c1"># 2.</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span><span class="ss">selfie: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">),</span> <span class="s1">'video/mp4'</span><span class="p">)</span> <span class="p">})</span>

<span class="c1"># 3.</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span>
  <span class="ss">document: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span> <span class="s1">'image/jpeg'</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">),</span> <span class="s1">'video/mp4'</span><span class="p">)</span>
<span class="p">})</span>

<span class="c1"># 4.</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span>
  <span class="ss">document: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span> <span class="s1">'image/jpeg'</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">),</span> <span class="s1">'video/mp4'</span><span class="p">),</span>
  <span class="ss">name: </span><span class="s2">"Joe"</span>
<span class="p">})</span>
<span class="c1"># when it comes to text/plain, you can just pass a string</span>

<span class="c1"># 5.</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="p">{</span>
  <span class="ss">document: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span> <span class="s1">'image/jpeg'</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">FilePart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">),</span> <span class="s1">'video/mp4'</span><span class="p">),</span>
  <span class="ss">data: </span><span class="no">Faraday</span><span class="o">::</span><span class="no">Multipart</span><span class="o">::</span><span class="no">ParamPart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
    <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">name: </span><span class="s2">"Joe"</span><span class="p">,</span> <span class="ss">age: </span><span class="mi">20</span><span class="p">}),</span>
    <span class="s1">'application/json'</span>
  <span class="p">)</span>
<span class="p">})</span>
<span class="c1"># separate custom part class for other encodings!</span>

<span class="c1"># HTTPrb</span>
<span class="c1"># does not support multipart OOTB</span>
<span class="c1"># requires separate "http/form_data" gem: https://github.com/httprb/form_data</span>
<span class="c1"># 1.</span>
<span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="ss">document: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">doc_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"image/jpeg"</span><span class="p">)</span> <span class="p">})</span>
<span class="c1"># requires using a specific http/form_data class for file parts</span>
<span class="c1"># mime types need to be known ahead of time!</span>

<span class="c1"># 2.</span>
<span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span> <span class="ss">selfie: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"video/mp4"</span><span class="p">)</span> <span class="p">})</span>

<span class="c1"># 3.</span>
<span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span>
  <span class="ss">document: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">doc_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"image/jpeg"</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"video/mp4"</span><span class="p">)</span>
<span class="p">})</span>

<span class="c1"># 4.</span>
<span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span>
  <span class="ss">document: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">doc_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"image/jpeg"</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"video/mp4"</span><span class="p">),</span>
  <span class="ss">name: </span><span class="s2">"Joe"</span>
<span class="p">})</span>
<span class="c1"># encodes strings as text/plain</span>

<span class="c1"># 5.</span>
<span class="no">HTTP</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">form: </span><span class="p">{</span>
  <span class="ss">document: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">doc_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"image/jpeg"</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">,</span> <span class="ss">content_type: </span><span class="s2">"video/mp4"</span><span class="p">),</span>
  <span class="ss">data: </span><span class="no">HTTP</span><span class="o">::</span><span class="no">FormData</span><span class="o">::</span><span class="no">Part</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">name: </span><span class="s2">"Joe"</span><span class="p">,</span> <span class="ss">age: </span><span class="mi">20</span><span class="p">}),</span> <span class="ss">content_type: </span><span class="s1">'application/json'</span><span class="p">)</span>
<span class="p">})</span>
<span class="c1"># separate custom part class for other encodings!</span>


<span class="c1"># httparty</span>
<span class="c1"># some built-in multipart capabilities in place</span>

<span class="c1"># 1.</span>
<span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="p">{</span> <span class="ss">document: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">)</span> <span class="p">})</span>
<span class="c1"># multipart payload</span>
<span class="c1"># single part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>

<span class="c1"># 2.</span>
<span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="p">{</span> <span class="ss">selfie: </span><span class="no">File</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span> <span class="p">})</span>
<span class="c1"># multipart payload</span>
<span class="c1"># single part with name="selfie", filename="selfie.mp4" and content-type=application/mp4</span>
<span class="c1"># The content-type is wrong!</span>

<span class="c1"># 3.</span>
<span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="p">{</span>
  <span class="ss">document: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span>
<span class="p">})</span>
<span class="c1"># multipart payload</span>
<span class="c1"># first part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=application/mp4 (Wrong!)</span>

<span class="c1"># 4.</span>
<span class="no">HTTParty</span><span class="p">.</span><span class="nf">post</span><span class="p">(</span><span class="n">post_uri</span><span class="p">,</span> <span class="ss">body: </span><span class="p">{</span>
  <span class="ss">document: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">),</span>
  <span class="ss">selfie: </span><span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">),</span>
  <span class="ss">name: </span><span class="s2">"Joe"</span>
<span class="p">})</span>
<span class="c1"># first part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=application/mp4 (Wrong!)</span>
<span class="c1"># third part with name="name", content-type=text/plain</span>

<span class="c1"># 5.</span>
<span class="c1"># passing a custom json part is not supported!</span>

<span class="c1"># curb</span>
<span class="c1"># requires more calls to set it up</span>
<span class="c1"># 1.</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">post_uri</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="nf">multipart_form_post</span> <span class="o">=</span> <span class="kp">true</span>
<span class="n">c</span><span class="p">.</span><span class="nf">http_post</span><span class="p">(</span><span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'document'</span><span class="p">,</span> <span class="n">doc_path</span><span class="p">))</span>
<span class="c1"># multipart payload</span>
<span class="c1"># single part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>

<span class="c1"># 2.</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">post_uri</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="nf">multipart_form_post</span> <span class="o">=</span> <span class="kp">true</span>
<span class="n">c</span><span class="p">.</span><span class="nf">http_post</span><span class="p">(</span><span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'selfie'</span><span class="p">,</span> <span class="n">selfie_path</span><span class="p">))</span>
<span class="c1"># multipart payload</span>
<span class="c1"># single part with name="selfie", filename="selfie.mp4" and content-type=application/octet-stream</span>
<span class="c1"># this mime-type is wrong!</span>

<span class="c1"># 3.</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">post_uri</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="nf">multipart_form_post</span> <span class="o">=</span> <span class="kp">true</span>
<span class="n">c</span><span class="p">.</span><span class="nf">http_post</span><span class="p">(</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'document'</span><span class="p">,</span> <span class="n">doc_path</span><span class="p">),</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'selfie'</span><span class="p">,</span> <span class="n">selfie_path</span><span class="p">))</span>
<span class="c1"># multipart payload</span>
<span class="c1"># first part with name="document", filename="document.jpeg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=application/octet-stream (Wrong!)</span>

<span class="c1"># 4.</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">post_uri</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="nf">multipart_form_post</span> <span class="o">=</span> <span class="kp">true</span>
<span class="n">c</span><span class="p">.</span><span class="nf">http_post</span><span class="p">(</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'document'</span><span class="p">,</span> <span class="n">doc_path</span><span class="p">),</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'selfie'</span><span class="p">,</span> <span class="n">selfie_path</span><span class="p">),</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">content</span><span class="p">(</span><span class="s1">'name'</span><span class="p">,</span> <span class="s2">"Joe"</span><span class="p">))</span>
<span class="c1"># first part with name="document", filename="document.jpg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=application/octet-stream (Wrong!)</span>
<span class="c1"># third part with name="name", content-type=text/plain</span>

<span class="c1"># 5.</span>
<span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">post_uri</span><span class="p">)</span>
<span class="n">c</span><span class="p">.</span><span class="nf">multipart_form_post</span> <span class="o">=</span> <span class="kp">true</span>
<span class="n">c</span><span class="p">.</span><span class="nf">http_post</span><span class="p">(</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'document'</span><span class="p">,</span> <span class="n">doc_path</span><span class="p">),</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">file</span><span class="p">(</span><span class="s1">'selfie'</span><span class="p">,</span> <span class="n">selfie_path</span><span class="p">),</span>
  <span class="no">Curl</span><span class="o">::</span><span class="no">PostField</span><span class="p">.</span><span class="nf">content</span><span class="p">(</span><span class="s1">'data'</span><span class="p">,</span> <span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">name: </span><span class="s2">"Joe"</span><span class="p">,</span> <span class="ss">age: </span><span class="mi">20</span><span class="p">}),</span> <span class="s2">"application/json"</span><span class="p">))</span>
<span class="c1"># first part with name="document", filename="document.jpg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=application/octet-stream (Wrong!)</span>
<span class="c1"># third part with name="data", content-type=application/json</span>

<span class="c1"># net-http</span>
<span class="c1"># does not support multipart requests</span>
<span class="c1"># you can use the previously mentioned multipart-post gem</span>
<span class="c1"># https://github.com/socketry/multipart-post</span>
<span class="nb">require</span> <span class="s2">"net/http"</span>
<span class="nb">require</span> <span class="s1">'net/http/post/multipart'</span>

<span class="n">url</span> <span class="o">=</span> <span class="no">URI</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">post_uri</span><span class="p">)</span>


<span class="c1"># 1.</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">file</span><span class="o">|</span>
  <span class="n">req</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Post</span><span class="o">::</span><span class="no">Multipart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
    <span class="n">url</span><span class="p">.</span><span class="nf">path</span><span class="p">,</span>
    <span class="s2">"document"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="s2">"image/jpeg"</span><span class="p">)</span>
  <span class="p">)</span>
  <span class="n">res</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
    <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">req</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
<span class="c1"># uses multipart-post provided class to build part</span>
<span class="c1"># mime type needs to be known ahead of time!</span>


<span class="c1"># 2.</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">file</span><span class="o">|</span>
  <span class="n">req</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Post</span><span class="o">::</span><span class="no">Multipart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
    <span class="n">url</span><span class="p">.</span><span class="nf">path</span><span class="p">,</span>
    <span class="s2">"selfie"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">file</span><span class="p">,</span> <span class="s2">"video/mp4"</span><span class="p">)</span>
  <span class="p">)</span>
  <span class="n">res</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
    <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">req</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># 3.</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">doc_file</span><span class="o">|</span>
  <span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">selfie_file</span><span class="o">|</span>
    <span class="n">req</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Post</span><span class="o">::</span><span class="no">Multipart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
      <span class="n">url</span><span class="p">.</span><span class="nf">path</span><span class="p">,</span>
      <span class="s2">"document"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">doc_file</span><span class="p">,</span> <span class="s2">"image/jpeg"</span><span class="p">),</span>
      <span class="s2">"selfie"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_file</span><span class="p">,</span> <span class="s2">"video/mp4"</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
      <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">req</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># 4.</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">doc_file</span><span class="o">|</span>
  <span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">selfie_file</span><span class="o">|</span>
    <span class="n">req</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Post</span><span class="o">::</span><span class="no">Multipart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
      <span class="n">url</span><span class="p">.</span><span class="nf">path</span><span class="p">,</span>
      <span class="s2">"document"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">doc_file</span><span class="p">,</span> <span class="s2">"image/jpeg"</span><span class="p">),</span>
      <span class="s2">"selfie"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_file</span><span class="p">,</span> <span class="s2">"video/mp4"</span><span class="p">),</span>
      <span class="s2">"name"</span> <span class="o">=&gt;</span> <span class="s2">"Joe"</span>
    <span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
      <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">req</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
<span class="c1"># text inputs will be encoded as text/plain</span>

<span class="c1"># 5.</span>
<span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">doc_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">doc_file</span><span class="o">|</span>
  <span class="no">File</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="n">selfie_path</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">selfie_file</span><span class="o">|</span>
    <span class="n">req</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Post</span><span class="o">::</span><span class="no">Multipart</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
      <span class="n">url</span><span class="p">.</span><span class="nf">path</span><span class="p">,</span>
      <span class="s2">"document"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">doc_file</span><span class="p">,</span> <span class="s2">"image/jpeg"</span><span class="p">),</span>
      <span class="s2">"selfie"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">selfie_file</span><span class="p">,</span> <span class="s2">"video/mp4"</span><span class="p">),</span>
      <span class="s2">"data"</span> <span class="o">=&gt;</span> <span class="no">UploadIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">StringIO</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">JSON</span><span class="p">.</span><span class="nf">dump</span><span class="p">({</span><span class="ss">name: </span><span class="s2">"Joe"</span><span class="p">,</span> <span class="ss">age: </span><span class="mi">20</span><span class="p">})),</span> <span class="s2">"application/json"</span><span class="p">)</span>
    <span class="p">)</span>
    <span class="n">res</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">url</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">url</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
      <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">req</span><span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
<span class="c1"># kinda works...</span>
<span class="c1"># first part with name="document", filename="document.jpg" and content-type=image/jpeg</span>
<span class="c1"># second part with name="selfie", filename="selfie.mp4" and content-type=video/mp4 (as passed explicitly)</span>
<span class="c1"># third part with name="data", content-type=application/json...</span>
<span class="c1"># but also filename=local.path, which is wrong!!!</span>

</code></pre></div></div>

<p>As mentioned earlier, multipart encoding support across the HTTP clients we researched is quite… non-standardized. <a href="https://github.com/excon/excon">excon</a>, <a href="https://github.com/lostisland/faraday">faraday</a>, <a href="https://github.com/httprb/http">httprb</a> and <a href="https://github.com/ruby/net-http/">net-http</a> do not support it “out-of-the-box”, although for the last 3 there are at least well-known “extension gems” adding support for it. Some of them require the “parts” to be passed as instances of a custom class (<code class="language-plaintext highlighter-rouge">Faraday::Multipart::FilePart</code> for <a href="https://github.com/lostisland/faraday">faraday</a>, <code class="language-plaintext highlighter-rouge">HTTP::FormData::File</code> for <a href="https://github.com/httprb/http">httprb</a>, <code class="language-plaintext highlighter-rouge">Curl::PostField</code> for <a href="https://github.com/taf2/curb">curb</a>, <code class="language-plaintext highlighter-rouge">UploadIO</code> for <a href="https://github.com/ruby/net-http/">net-http</a>), which makes orchestrating these requests needlessly cumbersome, as the ruby <code class="language-plaintext highlighter-rouge">File</code> object they wrap should already give them everything they need (the ones which also require a wrapper class for “non-file” parts are puzzling). Still, the fact that they accept or wrap <code class="language-plaintext highlighter-rouge">File</code> objects suggests that, at best, they stream the multipart request payload in chunks (at worst, they may buffer the payload in a file; I didn’t research them that thoroughly).</p>
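<p>To show that a plain <code class="language-plaintext highlighter-rouge">File</code> (or <code class="language-plaintext highlighter-rouge">Pathname</code>/<code class="language-plaintext highlighter-rouge">Tempfile</code>) already carries what a part needs, here is a minimal sketch of a multipart/form-data encoder (the layout from RFC 7578) which takes such objects directly, with no wrapper class. <code class="language-plaintext highlighter-rouge">multipart_body</code> and its <code class="language-plaintext highlighter-rouge">detect:</code> parameter are hypothetical names, not any client’s API, and unlike a streaming implementation, this sketch buffers the whole body in memory.</p>

```ruby
require "securerandom"
require "tmpdir"

# Hypothetical helper, not the API of any client discussed here: encodes a
# Hash of field name => value as a multipart/form-data body.
# - values responding to #to_path (File, Tempfile, Pathname) become file
#   parts, with filename = File.basename of their path
# - any other value is converted with #to_s and sent as a text/plain part
# The file part content type comes from the +detect+ callable, defaulting to
# application/octet-stream. The whole body is buffered in memory; a real
# client would rather stream the parts in chunks.
def multipart_body(fields, detect: ->(_path) { "application/octet-stream" })
  boundary = "----ruby-multipart-#{SecureRandom.hex(12)}"
  parts = fields.map do |name, value|
    if value.respond_to?(:to_path)
      path = value.to_path
      headers = "Content-Disposition: form-data; name=\"#{name}\"; " \
                "filename=\"#{File.basename(path)}\"\r\n" \
                "Content-Type: #{detect.call(path)}\r\n"
      content = File.binread(path)
    else
      headers = "Content-Disposition: form-data; name=\"#{name}\"\r\n" \
                "Content-Type: text/plain\r\n"
      content = value.to_s
    end
    # each part: boundary line, part headers, blank line, raw content
    "--#{boundary}\r\n".b + headers.b + "\r\n".b + content.b + "\r\n".b
  end
  # the closing boundary has a trailing "--"
  body = parts.join + "--#{boundary}--\r\n".b
  [body, "multipart/form-data; boundary=#{boundary}"]
end

# usage: a plain File object is enough to build the "document" part
Dir.mktmpdir do |dir|
  path = File.join(dir, "document.jpg")
  File.binwrite(path, "\xFF\xD8\xFF\xE0 not really a jpeg".b)
  File.open(path) do |file|
    # plug a real mime type detector in here; this one is hard-coded
    body, content_type = multipart_body({ "document" => file, "name" => "Joe" },
                                        detect: ->(_path) { "image/jpeg" })
    puts content_type
    puts body
  end
end
```

<p>The only thing the caller still has to provide is the mime type detector, which is exactly the part most clients get wrong.</p>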

<p>The feature that is “broken” in most cases is mime type detection. The <a href="https://github.com/lostisland/faraday">faraday</a>, <a href="https://github.com/httprb/http">httprb</a> and <a href="https://github.com/ruby/net-http/">net-http</a> extensions pass the “burden” of identifying it to the caller, who then has to figure out how to do it and orchestrate the whole thing; in other cases (<a href="https://github.com/jnunemaker/httparty">httparty</a>, <a href="https://github.com/taf2/curb">curb</a>, <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>), the job is outsourced to a separate module or library, but the devil is in the details: <a href="https://github.com/jnunemaker/httparty">httparty</a> delegates it to <a href="https://github.com/discourse/mini_mime">mini_mime</a>, a “lighter” version of the <a href="https://github.com/mime-types/ruby-mime-types/">mime-types</a> gem which keeps a registry of “file extension to mime type” mappings and, as seen in the snippet above, isn’t accurate for mp4. I don’t know what <a href="https://github.com/taf2/curb">curb</a> uses internally, but it isn’t accurate for mp4 either (perhaps, like <a href="https://github.com/typhoeus/typhoeus">typhoeus</a>, it integrates with <code class="language-plaintext highlighter-rouge">mime-types</code>?).</p>
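<p>To make that failure mode concrete, here is a minimal sketch of extension-based lookup. The registry below is a hypothetical three-entry table, not the data shipped by mini_mime or mime-types; the point is that the answer only ever depends on the file name and on what the table contains, never on the file’s contents.</p>

```ruby
# Hypothetical, deliberately tiny registry: same idea as mini_mime or
# mime-types (file extension => mime type), but not their actual data.
EXTENSION_REGISTRY = {
  ".jpg" => "image/jpeg",
  ".jpeg" => "image/jpeg",
  ".json" => "application/json",
}.freeze

# Looks only at the file name: the file contents are never read.
def mime_type_from_extension(filename)
  EXTENSION_REGISTRY.fetch(File.extname(filename).downcase, "application/octet-stream")
end

mime_type_from_extension("document.jpg") # => "image/jpeg"
mime_type_from_extension("selfie.mp4")   # => "application/octet-stream" (no ".mp4" entry here)
mime_type_from_extension("selfie.jpg")   # => "image/jpeg", whatever the bytes inside actually are
```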

<p><a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> uses one of several known ruby gems which detect a file’s mime type by inspecting its <a href="https://en.wikipedia.org/wiki/List_of_file_signatures">magic bytes</a> (the most accurate way to figure it out); if none is available, it falls back to the <a href="https://en.wikipedia.org/wiki/File_(command)">file</a> command, which requires a shell call, but uses the same approach to detect mime types and is widely available. Besides that, it directly accepts “lowest common denominator” objects from core and stdlib, such as <a href="https://docs.ruby-lang.org/en/3.2/File.html">File</a>, <a href="https://docs.ruby-lang.org/en/3.2/Pathname.html">Pathname</a> or <a href="https://docs.ruby-lang.org/en/3.2/Tempfile.html">Tempfile</a>, as “parts”, and therefore requires no custom external class to deal with multipart payloads.</p>
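<p>As a rough illustration of the magic bytes approach, here is a sketch which checks a few well-known signatures and falls back to the <code class="language-plaintext highlighter-rouge">file</code> command. The signature table and the method names are my own for this example, not what httpx or the gems it integrates with actually do; those know far more formats.</p>

```ruby
require "open3"
require "tmpdir"

# Brands commonly found in the "ftyp" box of .mp4 files. Other ISO base
# media formats (mov, m4a, heic, 3gp, ...) use different brands.
MP4_BRANDS = %w[isom iso2 mp41 mp42 avc1].freeze

# Returns a mime type for a few well-known signatures, or nil.
# Illustrative table only; dedicated detectors know hundreds of formats.
def sniff_mime_type(bytes)
  return "image/jpeg" if bytes.start_with?("\xFF\xD8\xFF".b)
  return "image/png" if bytes.start_with?("\x89PNG\r\n\x1A\n".b)
  return "application/pdf" if bytes.start_with?("%PDF-".b)
  # ISO base media file: 4-byte box size, then "ftyp", then the major brand
  if bytes.byteslice(4, 4) == "ftyp" && MP4_BRANDS.include?(bytes.byteslice(8, 4))
    return "video/mp4"
  end

  nil
end

# Magic bytes first; the file(1) command as a fallback (it inspects magic bytes too).
def detect_mime_type(path)
  sniffed = sniff_mime_type(File.binread(path, 16) || "".b)
  return sniffed if sniffed

  begin
    # argv form: runs file(1) directly, without going through a shell
    out, status = Open3.capture2("file", "-b", "--mime-type", path)
    status.success? ? out.strip : "application/octet-stream"
  rescue Errno::ENOENT # the file command is not installed
    "application/octet-stream"
  end
end

# usage: the file name says nothing, the bytes say mp4
Dir.mktmpdir do |dir|
  path = File.join(dir, "selfie.bin")
  File.binwrite(path, "\x00\x00\x00\x18ftypisom".b + ("\x00".b * 16))
  puts detect_mime_type(path)
end
```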

<h4 id="networking">Networking</h4>

<p>When deploying HTTP clients in production setups, you’ll often find yourself trying to minimize the impact of HTTP requests on your business operations. For instance, you’ll want to make sure that you’re reusing connections when possible, in order to minimize the impact of TCP slow start, and that very slow peers won’t hog you for longer than you consider reasonable. In short, we’re looking at support for persistent connections and timeouts.</p>
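<p>For reference, this is roughly what the timeout knobs look like in stdlib net-http, the baseline all the others can be compared against. <code class="language-plaintext highlighter-rouge">configured_client</code> is a hypothetical helper and the values are arbitrary examples, not recommendations.</p>

```ruby
require "net/http"

# Hypothetical helper: a Net::HTTP client with explicit limits, so that a
# slow peer can only hold us for a bounded time per operation.
def configured_client(host, port = 443)
  http = Net::HTTP.new(host, port)
  http.use_ssl = (port == 443)
  http.open_timeout = 2        # establishing the connection
  http.read_timeout = 5        # waiting on each individual socket read
  http.write_timeout = 5       # waiting on each individual socket write (ruby 2.6+)
  http.keep_alive_timeout = 10 # reuse an open socket only if idle for less than this
  http
end

http = configured_client("news.ycombinator.com")
# nothing is sent until the session is started, e.g.:
# http.start { |conn| conn.get("/news") }
```

<p>Note that <code class="language-plaintext highlighter-rouge">read_timeout</code> applies to each read on the socket, not to the whole response, so a peer trickling a few bytes at a time can still take much longer than 5 seconds overall.</p>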

<p>Most of the bunch support persistent connections (via HTTP/1.1 keep-alive) to some extent, in most cases by exposing a ruby block as the “persistent” context, and in some cases via a client flag. Some clients only allow persistent connections to a single peer per block, whereas others enable persistence for all requests within a block. Some not only allow connection reuse, but also support sending multiple requests at the same time, by leveraging HTTP/1.1 pipelining or HTTP/2 multiplexing.</p>
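<p>To see keep-alive reuse happen, here is a sketch using stdlib net-http against a throwaway local server, which counts the TCP connections it accepts. The server is a toy written only for this demo (one connection at a time, no request bodies, fixed response); stdlib net-http does keep-alive but not pipelining, so this only covers the first part of the paragraph above.</p>

```ruby
require "net/http"
require "socket"

# Toy HTTP/1.1 server, written only for this demo: one connection at a time,
# GET requests without bodies, a fixed "hello" response. It counts accepted
# TCP connections and answered requests.
def start_counting_server
  server = TCPServer.new("127.0.0.1", 0) # port 0: let the OS pick a free port
  stats = { connections: 0, requests: 0 }
  thread = Thread.new do
    loop do
      client = server.accept
      stats[:connections] += 1
      # answer requests on this socket until the client closes it
      while client.gets # request line, e.g. "GET /news HTTP/1.1"
        loop do # skip the request headers, up to the blank line
          line = client.gets
          break if line.nil? || line == "\r\n"
        end
        stats[:requests] += 1
        client.write("HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello")
      end
      client.close
    end
  end
  [server, thread, stats]
end

server, thread, stats = start_counting_server
port = server.addr[1]

# One Net::HTTP session: both requests travel over the same kept-alive socket.
Net::HTTP.start("127.0.0.1", port) do |http|
  http.get("/news")
  http.get("/news?p=2")
end
p stats # 1 connection, 2 requests

# One-shot calls: each one opens and closes its own connection.
Net::HTTP.get(URI("http://127.0.0.1:#{port}/news"))
Net::HTTP.get(URI("http://127.0.0.1:#{port}/news?p=2"))
p stats # 3 connections, 4 requests

thread.kill.join
server.close
```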

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># please download hackernews first 2 pages</span>
<span class="n">uris</span> <span class="o">=</span> <span class="sx">%w[https://news.ycombinator.com/news https://news.ycombinator.com/news?p=2]</span>

<span class="c1"># httpx</span>
<span class="c1"># using HTTP/2 multiplexing or HTTP/1.1 pipelining, depending on peer server support</span>
<span class="n">responses</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="o">*</span><span class="n">uris</span><span class="p">)</span>
<span class="c1"># will make requests concurrently when targeting different peers</span>
<span class="n">responses</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"https://www.google.com"</span><span class="p">,</span> <span class="o">*</span><span class="n">uris</span><span class="p">)</span>
<span class="c1"># also supports persistent blocks</span>
<span class="no">HTTPX</span><span class="p">.</span><span class="nf">wrap</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="c1"># if you need to do sequential requests and want to reuse the connection</span>
  <span class="n">r1</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uris</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span>
  <span class="n">r2</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uris</span><span class="p">[</span><span class="mi">1</span><span class="p">])</span>
<span class="k">end</span>
<span class="c1"># explicitly setting the client to persistent by default</span>
<span class="c1"># will auto-reconnect when peer server disconnects due to inactivity</span>
<span class="c1"># will perform TLS session resumption when possible</span>
<span class="n">http</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">plugin</span><span class="p">(</span><span class="ss">:persistent</span><span class="p">)</span> <span class="c1"># also sets retries</span>
<span class="n">responses1</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="o">*</span><span class="n">uris</span><span class="p">)</span> <span class="c1"># conns open</span>
<span class="n">responses2</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="o">*</span><span class="n">uris</span><span class="p">)</span> <span class="c1"># conns still open</span>
<span class="n">http</span><span class="p">.</span><span class="nf">close</span> <span class="c1"># in order to explicitly close connections</span>

<span class="c1"># Excon</span>
<span class="c1"># persistent connection set for a single peer</span>
<span class="n">connection</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">,</span> <span class="ss">:persistent</span> <span class="o">=&gt;</span> <span class="kp">true</span><span class="p">)</span>
<span class="c1"># sequential requests, reusing the connection</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="ss">path: </span><span class="s2">"/news"</span><span class="p">)</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="ss">path: </span><span class="s2">"/news?p=2"</span><span class="p">)</span>
<span class="c1"># or send them at once using HTTP/1.1 pipelining (if peer supports)</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">requests</span><span class="p">([{</span><span class="ss">method: </span><span class="ss">:get</span><span class="p">,</span> <span class="ss">path: </span><span class="s2">"/news"</span> <span class="p">},</span> <span class="p">{</span><span class="ss">method: </span><span class="ss">:get</span><span class="p">,</span> <span class="ss">path: </span><span class="s2">"/news?p=2"</span><span class="p">}])</span>
<span class="n">connection</span><span class="p">.</span><span class="nf">reset</span> <span class="c1"># don't forget to close them when you don't need them anymore</span>

<span class="c1"># faraday by itself does not support persistent connections, so you'll have to pick</span>
<span class="c1"># adapters which actually support that</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:url</span> <span class="o">=&gt;</span> <span class="s2">"https://news.ycombinator.com"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="c1"># the net-http-persistent adapter supports it</span>
  <span class="n">f</span><span class="p">.</span><span class="nf">adapter</span> <span class="ss">:net_http_persistent</span><span class="p">,</span> <span class="ss">pool_size: </span><span class="mi">5</span>
  <span class="c1"># or the httpx adapter, which supports it too</span>
  <span class="n">f</span><span class="p">.</span><span class="nf">adapter</span> <span class="ss">:httpx</span><span class="p">,</span> <span class="ss">persistent: </span><span class="kp">true</span>
<span class="k">end</span>
<span class="c1"># and now you can re-use</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news"</span><span class="p">)</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news?p=2"</span><span class="p">)</span>
<span class="c1"># faraday also supports a weird parallel api, which only the httpx and typhoeus adapters support, AFAIK</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">:url</span> <span class="o">=&gt;</span> <span class="s2">"https://news.ycombinator.com"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">faraday</span><span class="o">|</span>
  <span class="n">faraday</span><span class="p">.</span><span class="nf">adapter</span> <span class="ss">:httpx</span>
  <span class="c1"># or</span>
  <span class="n">faraday</span><span class="p">.</span><span class="nf">adapter</span> <span class="ss">:typhoeus</span>
<span class="k">end</span>
<span class="n">response1</span> <span class="o">=</span> <span class="n">response2</span> <span class="o">=</span> <span class="kp">nil</span> <span class="c1"># declared outside the block, so they're visible after it</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">in_parallel</span> <span class="k">do</span>
  <span class="n">response1</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news"</span><span class="p">)</span> <span class="c1"># does not block</span>
  <span class="n">response2</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news?p=2"</span><span class="p">)</span> <span class="c1"># does not block</span>
<span class="k">end</span> <span class="c1"># waits until requests are done</span>
<span class="n">response1</span><span class="p">.</span><span class="nf">body</span><span class="p">.</span><span class="nf">to_s</span> <span class="c1">#=&gt; the response as a ruby String</span>
<span class="n">response2</span><span class="p">.</span><span class="nf">body</span><span class="p">.</span><span class="nf">to_s</span> <span class="c1">#=&gt; the response as a ruby String</span>

<span class="c1"># HTTPrb</span>
<span class="c1"># supports persistent connections on a single peer via block:</span>
<span class="no">HTTP</span><span class="p">.</span><span class="nf">persistent</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="n">r1</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news"</span><span class="p">).</span><span class="nf">to_s</span>
  <span class="c1"># BIG CAVEAT: because httprb delays consuming the response payload,</span>
  <span class="c1"># you have to eager-consume it within the block before the next request</span>
  <span class="c1"># is sent (hence the #to_s calls)</span>
  <span class="n">r2</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news?p=2"</span><span class="p">).</span><span class="nf">to_s</span>
<span class="k">end</span>
<span class="c1"># or initializes the client, and it's up to you when to close</span>
<span class="n">http</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">persistent</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">)</span>
<span class="n">r1</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news"</span><span class="p">).</span><span class="nf">to_s</span> <span class="c1"># remember to eager load!</span>
<span class="n">r2</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news?p=2"</span><span class="p">)</span> <span class="c1"># remember to eager load!</span>
<span class="n">http</span><span class="p">.</span><span class="nf">close</span> <span class="c1"># you forgot to eager load! payloads may have been lost!</span>

<span class="c1"># httparty does not support persistent connections!</span>

<span class="c1"># curb</span>
<span class="c1"># supports persistent and parallel requests, also via HTTP/2,</span>
<span class="c1"># via the curl multi api ruby shim, which feels like writing C, if you ask me</span>
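<span class="n">m</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Multi</span><span class="p">.</span><span class="nf">new</span>
<span class="n">responses</span> <span class="o">=</span> <span class="p">{}</span> <span class="c1"># collects each response body, keyed by URL</span>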
<span class="c1"># add a few easy handles</span>
<span class="n">uris</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">url</span><span class="o">|</span>
  <span class="n">responses</span><span class="p">[</span><span class="n">url</span><span class="p">]</span> <span class="o">=</span> <span class="s2">""</span>
  <span class="n">c</span> <span class="o">=</span> <span class="no">Curl</span><span class="o">::</span><span class="no">Easy</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">url</span><span class="p">)</span> <span class="k">do</span><span class="o">|</span><span class="n">curl</span><span class="o">|</span>
    <span class="n">curl</span><span class="p">.</span><span class="nf">follow_location</span> <span class="o">=</span> <span class="kp">true</span>
    <span class="n">curl</span><span class="p">.</span><span class="nf">on_body</span><span class="p">{</span><span class="o">|</span><span class="n">data</span><span class="o">|</span> <span class="n">responses</span><span class="p">[</span><span class="n">url</span><span class="p">]</span> <span class="o">&lt;&lt;</span> <span class="n">data</span><span class="p">;</span> <span class="n">data</span><span class="p">.</span><span class="nf">size</span> <span class="p">}</span>
    <span class="n">curl</span><span class="p">.</span><span class="nf">on_success</span> <span class="p">{</span><span class="o">|</span><span class="n">easy</span><span class="o">|</span> <span class="nb">puts</span> <span class="s2">"success, add more easy handles"</span> <span class="p">}</span>
  <span class="k">end</span>
  <span class="n">m</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">c</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">m</span><span class="p">.</span><span class="nf">perform</span>

<span class="c1"># net-http</span>
<span class="c1"># supports persistent connection on a single peer via block</span>
<span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="s2">"news.ycombinator.com"</span><span class="p">,</span> <span class="mi">443</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="c1"># sequential requests only</span>
  <span class="n">responses</span> <span class="o">=</span> <span class="n">uris</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">uri</span><span class="o">|</span>
    <span class="n">req</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="no">URI</span><span class="p">(</span><span class="n">uri</span><span class="p">).</span><span class="nf">request_uri</span><span class="p">)</span>
    <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">req</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This example shows how <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> makes persistent, and even concurrent, usage of connections obvious, convenient and flexible. It also starts exposing the limitations of the alternatives. The ones that support persistent connections at all only support them for one peer per connection/session object. While all of those support plain sequential keep-alive requests, only <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> and <a href="https://github.com/taf2/curb">curb</a> support concurrent requests via HTTP/2 multiplexing <strong>and</strong> HTTP/1.1 pipelining (<a href="https://github.com/excon/excon">excon</a> only supports the latter). <a href="https://github.com/lostisland/faraday">faraday</a> does not provide the low-level networking features itself, yet it builds quite the convoluted API on top of them to support persistent connections and parallel requests. <a href="https://github.com/taf2/curb">curb</a> provides access to the low-level features we all expect <a href="https://curl.se/">curl</a> to support, but the API to use them feels almost like a verbatim translation of its C API, which is far from “idiomatic ruby” and does not look like the easiest code to maintain. And oh well, <a href="https://github.com/ruby/net-http/">net-http</a> keeps looking verbose and limited (although not as limited as <a href="https://github.com/jnunemaker/httparty">httparty</a> in that regard).</p>
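<p>To illustrate the “one peer per session object” limitation, here is a minimal sketch, using only <a href="https://github.com/ruby/net-http/">net-http</a> from the standard library, of the bookkeeping needed to keep connections alive across several peers: one <code class="language-plaintext highlighter-rouge">Net::HTTP</code> object per origin, kept in a hash. <code class="language-plaintext highlighter-rouge">PerOriginSessions</code> and its methods are hypothetical names made up for this example (not part of net-http), and requests are still sent sequentially.</p>

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of net-http): one keep-alive Net::HTTP session
# per origin, because a Net::HTTP object is bound to a single host and port.
class PerOriginSessions
  def initialize
    @sessions = {}
  end

  # Returns the session for the URI's origin, creating it on first use.
  # Net::HTTP.new does not open a connection yet.
  def session_for(uri)
    key = [uri.scheme, uri.host, uri.port]
    @sessions[key] ||= build_session(uri)
  end

  # Sends a GET, opening the connection on first use and keeping it alive.
  # Still sequential: one request in flight per origin at a time.
  def get(url)
    uri = URI(url)
    http = session_for(uri)
    http.start unless http.started?
    http.request(Net::HTTP::Get.new(uri.request_uri))
  end

  # Closes every open connection and forgets all sessions.
  def close
    @sessions.each_value { |http| http.finish if http.started? }
    @sessions.clear
  end

  private

  def build_session(uri)
    http = Net::HTTP.new(uri.host, uri.port)
    http.use_ssl = (uri.scheme == "https")
    http
  end
end
```

<p>Calling <code class="language-plaintext highlighter-rouge">get</code> twice with URLs on <code class="language-plaintext highlighter-rouge">https://news.ycombinator.com</code> reuses a single connection, while a URL on another host opens a second one; <code class="language-plaintext highlighter-rouge">close</code> belongs in an <code class="language-plaintext highlighter-rouge">ensure</code> clause.</p>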

<p>The ability to set timeouts is the other key feature required to protect service delivery against service throttling or network congestion. With ruby being so widely adopted in the startup world, where one sometimes needs to run before one can walk, such matters are usually brushed aside during early product delivery, until production incidents happen. Perhaps given this context, it’s not surprising that it took until 2018 for <a href="https://github.com/ruby/net-http/">net-http</a> to introduce a write timeout. But overall, ruby HTTP clients tend to provide timeouts which monitor read/write IO readiness, i.e. “the tcp read syscall should not take more than 3 seconds”, instead of a more “cancellation-oriented” approach, i.e. “the HTTP response should be received within 3 seconds”. This is a leaky default, as it still exposes clients to <a href="https://www.netscout.com/what-is-ddos/slowloris-attacks">slowloris-type situations</a>: if you set a 15 second <code class="language-plaintext highlighter-rouge">read_timeout</code> using <a href="https://github.com/ruby/net-http/">net-http</a>, it can still take you minutes to receive a response if the server sends one byte at a time, each just under 15 seconds apart. That’s why <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> supports cancellation-type timeouts: the <code class="language-plaintext highlighter-rouge">write_timeout</code>, <code class="language-plaintext highlighter-rouge">read_timeout</code> and <code class="language-plaintext highlighter-rouge">request_timeout</code> options cover the <strong>total time</strong> taken to write an HTTP request, to receive an HTTP response, or both combined, respectively.</p>
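<p>The difference between both timeout flavours can be sketched with plain ruby IO, using a pipe and a thread that drips one byte at a time as a stand-in for a slow peer. The method names below are hypothetical and not taken from any HTTP client; the sketch only shows why bounding each wait is not the same as bounding the whole operation.</p>

```ruby
require "io/wait"

# Readiness-based timeout: each wait for data is bounded by `timeout`,
# but the total time spent reading is not.
def read_with_readiness_timeout(io, size, timeout)
  buffer = +""
  while buffer.bytesize < size
    raise IOError, "read timeout" unless io.wait_readable(timeout)
    chunk = io.read_nonblock(size - buffer.bytesize, exception: false)
    break if chunk.nil?             # EOF: the peer closed its end
    next if chunk == :wait_readable # nothing to read after all, wait again
    buffer << chunk
  end
  buffer
end

# Cancellation-based timeout: a single deadline bounds the whole operation,
# so each wait only gets whatever time is left.
def read_with_deadline(io, size, total)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + total
  buffer = +""
  while buffer.bytesize < size
    remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
    raise IOError, "deadline exceeded" if remaining <= 0 || !io.wait_readable(remaining)
    chunk = io.read_nonblock(size - buffer.bytesize, exception: false)
    break if chunk.nil?
    next if chunk == :wait_readable
    buffer << chunk
  end
  buffer
end

# Simulated slowloris-style peer: sends one byte every `interval` seconds.
def slow_peer(size, interval)
  reader, writer = IO.pipe
  Thread.new do
    size.times do
      sleep(interval)
      writer.write("x")
    end
  rescue IOError, Errno::EPIPE
    # the reading side went away, nothing left to send
  ensure
    writer.close unless writer.closed?
  end
  reader
end
```

<p>With a peer sending one byte every 0.1 seconds, <code class="language-plaintext highlighter-rouge">read_with_readiness_timeout(io, 10, 0.5)</code> returns all 10 bytes after about a second, since no single wait ever exceeds 0.5 seconds, whereas <code class="language-plaintext highlighter-rouge">read_with_deadline(io, 10, 0.5)</code> raises once half a second has elapsed.</p>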

<p>Some of the other clients also provide extra timeout options with similar semantics, but these are usually incompatible with the defaults, or broken when combined with other, unrelated features.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># please download hackernews main page</span>
<span class="n">uri</span> <span class="o">=</span> <span class="s2">"https://news.ycombinator.com/news"</span>

<span class="c1"># httpx</span>
<span class="c1"># 10 seconds to write the request, 30 seconds to receive the response</span>
<span class="c1"># raises `HTTPX::WriteTimeoutError` or `HTTPX::ReadTimeoutError` (both `HTTPX::TimeoutError`)</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">,</span> <span class="ss">timeout: </span><span class="p">{</span> <span class="ss">write_timeout: </span><span class="mi">10</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">30</span> <span class="p">})</span>
<span class="c1"># 3 seconds to fully establish the TLS connection, 40 seconds to send request AND get the response</span>
<span class="c1"># raises `HTTPX::ConnectionTimeoutError` or `HTTPX::RequestTimeoutError` (both `HTTPX::TimeoutError`)</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">,</span> <span class="ss">timeout: </span><span class="p">{</span> <span class="ss">connect_timeout: </span><span class="mi">3</span><span class="p">,</span> <span class="ss">request_timeout: </span><span class="mi">40</span> <span class="p">})</span>

<span class="c1"># excon</span>
<span class="c1"># monitors IO "read" readiness and connection establishment, via `IO.select`</span>
<span class="c1"># raises `Excon::Error::Timeout`</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">,</span> <span class="ss">connect_timeout: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">write_timeout: </span><span class="mi">2</span><span class="p">)</span>

<span class="c1"># faraday</span>
<span class="c1"># timeout mechanism implemented by adapters</span>
<span class="c1"># raises `Faraday::TimeoutError` on error</span>
<span class="c1"># requires construction of a connection object</span>
<span class="c1"># supports a general timeout for the whole request</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">,</span> <span class="ss">request: </span><span class="p">{</span> <span class="ss">timeout: </span><span class="mi">5</span> <span class="p">})</span>
<span class="c1"># supports granular timeout options</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">,</span> <span class="ss">request: </span><span class="p">{</span> <span class="ss">open_timeout: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">write_timeout: </span><span class="mi">2</span><span class="p">})</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">conn</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news"</span><span class="p">)</span>

<span class="c1"># but what happens if:</span>
<span class="c1"># :timeout is mixed with granular timeouts</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">,</span> <span class="ss">request: </span><span class="p">{</span> <span class="ss">timeout: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">open_timeout: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">write_timeout: </span><span class="mi">2</span><span class="p">})</span>
<span class="c1"># answer: :timeout is ignored.</span>

<span class="c1"># timeouts are also set in the adapter</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">,</span> <span class="ss">request: </span><span class="p">{</span> <span class="ss">read_timeout: </span><span class="mi">2</span><span class="p">})</span> <span class="k">do</span> <span class="o">|</span><span class="n">f</span><span class="o">|</span>
  <span class="n">f</span><span class="p">.</span><span class="nf">adapter</span> <span class="ss">:httpx</span><span class="p">,</span> <span class="ss">timeout: </span><span class="p">{</span> <span class="ss">read_timeout: </span><span class="mf">0.1</span> <span class="p">}</span>
<span class="k">end</span>
<span class="c1"># answer: `HTTPX::ReadTimeoutError` is raised, i.e. if the adapter allows it, timeouts can be set both in faraday and in the adapter!!</span>

<span class="c1"># HTTPrb</span>
<span class="c1"># monitors IO read/write readiness, via `IO#wait_readable` and `IO#wait_writable`, for operation timeouts</span>
<span class="c1"># uses Timeout.timeout for TCP/SSL Socket connect timeout</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">timeout</span><span class="p">(</span><span class="ss">connect: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">write: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">read: </span><span class="mi">10</span><span class="p">).</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span>
<span class="c1"># single timeout for the whole request/response operation</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">timeout</span><span class="p">(</span><span class="mi">10</span><span class="p">).</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span>

<span class="c1"># its meaning is unclear in the block form: it is in fact a timeout for the whole block, which somewhat</span>
<span class="c1"># contradicts its documented "upper bound of how long a request can take"</span>
<span class="no">HTTP</span><span class="p">.</span><span class="nf">timeout</span><span class="p">(</span><span class="mi">5</span><span class="p">).</span><span class="nf">persistent</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="n">r1</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news"</span><span class="p">).</span><span class="nf">to_s</span>
  <span class="n">r2</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/news?page=2"</span><span class="p">).</span><span class="nf">to_s</span>
<span class="k">end</span>

<span class="c1"># httparty</span>
<span class="c1"># supports the same timeouts as the underlying net-http "engine"</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">,</span> <span class="p">{</span> <span class="ss">open_timeout: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">2</span><span class="p">,</span> <span class="ss">write_timeout: </span><span class="mi">2</span><span class="p">})</span>
<span class="c1"># has a single `timeout` option (`default_timeout` when set at the class level), applied to</span>
<span class="c1"># open_timeout, read_timeout and write_timeout alike, which is a bit confusing.</span>
<span class="n">response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">,</span> <span class="p">{</span> <span class="ss">timeout: </span><span class="mi">5</span> <span class="p">})</span>

<span class="c1"># curb</span>
<span class="c1"># just uses curl request/response cancellation-based timeout under the hood</span>
<span class="c1"># setting a default timeout</span>
<span class="no">Curl</span><span class="o">::</span><span class="no">Multi</span><span class="p">.</span><span class="nf">default_timeout</span> <span class="o">=</span> <span class="mi">5</span>

<span class="n">res</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="c1"># raises exception if request/response not handled within 5 seconds</span>
  <span class="n">http</span><span class="p">.</span><span class="nf">timeout</span> <span class="o">=</span> <span class="mi">5</span>
<span class="k">end</span>

<span class="c1"># net-http</span>
<span class="c1"># monitors IO read/write readiness, via `IO#wait_readable` and `IO#wait_writable`</span>
<span class="c1"># uses Timeout.timeout for TCP/SSL Socket connect timeout</span>
<span class="n">uri</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">uri</span><span class="p">)</span>
<span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">uri</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">uri</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">open_timeout: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">write_timeout: </span><span class="mi">5</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="c1"># ...</span>
<span class="k">end</span>
</code></pre></div></div>

<p>To sum up, when it comes to timeouts, two libraries, <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> and (in a less granular way) <a href="https://github.com/taf2/curb">curb</a>, use a cancellation-oriented mechanism towards a more resilient experience, whereas everything else defaults to readiness-based IO APIs, which do not fully protect against slow peers dragging operations out beyond what’s acceptable (which means you still have to build your own mechanism on top of them). Some of the alternatives try to build a more encompassing timeout on top, but, as in the case of <a href="https://github.com/httprb/http">httprb</a>, this results in an inconsistent experience when combined with other features (such as the “persistent” block).</p>

<h4 id="error-handling">Error handling</h4>

<p>In ruby, errors can be represented in two ways: a value representing an error, or a raised exception. HTTP clients may choose either of the two to signal errors from their method calls. For instance, we just talked about timeouts: when a request times out, an HTTP client may raise a “timeout exception” (<a href="https://github.com/typhoeus/typhoeus#handling-http-errors">typhoeus, for example, may use <code class="language-plaintext highlighter-rouge">response.code == 0</code> to signal errors, which is just confusing</a>). Of course, in HTTP requests, not all errors are alike. For instance, 4xx and 5xx response status codes are considered “error responses”, and it’s up to the client whether to model these as exceptions to be raised, or as plain response objects.</p>

<p>Given these options, it’s no wonder there is no consensus on how HTTP clients handle errors.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">uri_ok</span> <span class="o">=</span> <span class="s2">"https://httpbin.org/status/200"</span>
<span class="n">uri_404</span> <span class="o">=</span> <span class="s2">"https://httpbin.org/status/404"</span>
<span class="n">uri_timeout</span> <span class="o">=</span> <span class="s2">"https://httpbin.org/delay/10"</span>

<span class="c1"># httpx</span>
<span class="c1"># does not raise exceptions automatically</span>
<span class="n">http</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">with</span><span class="p">(</span><span class="ss">timeout: </span><span class="p">{</span> <span class="ss">request_timeout: </span><span class="mi">5</span> <span class="p">})</span>
<span class="n">ok_response</span><span class="p">,</span> <span class="n">error_response</span><span class="p">,</span> <span class="n">timeout_response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_ok</span><span class="p">,</span> <span class="n">uri_404</span><span class="p">,</span> <span class="n">uri_timeout</span><span class="p">)</span>
<span class="c1"># ok_response is a HTTPX::Response object, with status 200</span>
<span class="c1"># error_response is a HTTPX::Response object, with status 404</span>
<span class="c1"># timeout_response is a HTTPX::ErrorResponse, wrapping the HTTPX::RequestTimeoutError exception</span>
<span class="c1"># #raise_for_status raises explicitly, on demand</span>

<span class="n">ok_response</span><span class="p">.</span><span class="nf">raise_for_status</span> <span class="c1">#=&gt; 200 response, so does nothing</span>
<span class="n">error_response</span><span class="p">.</span><span class="nf">raise_for_status</span> <span class="c1">#=&gt; raises an HTTPX::HTTPError, which wraps the 404 error response</span>
<span class="n">timeout_response</span><span class="p">.</span><span class="nf">raise_for_status</span> <span class="c1">#=&gt; raises the wrapped exception</span>

<span class="c1"># httpx also allows using pattern matching</span>
<span class="p">[</span><span class="n">ok_response</span><span class="p">,</span> <span class="n">error_response</span><span class="p">,</span> <span class="n">timeout_response</span><span class="p">].</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">response</span><span class="o">|</span>
  <span class="k">case</span> <span class="n">response</span>
  <span class="k">in</span> <span class="p">{</span> <span class="ss">error: </span><span class="n">error</span> <span class="p">}</span>
    <span class="c1"># timeout_response will be here</span>
  <span class="k">in</span> <span class="p">{</span> <span class="ss">status: </span><span class="mi">400</span><span class="o">...</span> <span class="p">}</span>
    <span class="c1"># error_response will be here</span>
  <span class="k">else</span>
    <span class="c1"># ok_response will be here</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># excon</span>
<span class="c1"># returns a plain response for HTTP errors</span>
<span class="n">error_response</span> <span class="o">=</span> <span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_404</span><span class="p">)</span>
<span class="n">error_response</span><span class="p">.</span><span class="nf">status</span> <span class="c1">#=&gt; 404</span>
<span class="c1"># raises an exception on timeout</span>
<span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">5</span><span class="p">)</span> <span class="c1">#=&gt; raises Excon::Error::Timeout</span>

<span class="c1"># faraday</span>
<span class="c1"># same as excon</span>
<span class="n">error_response</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_404</span><span class="p">)</span>
<span class="n">error_response</span><span class="p">.</span><span class="nf">status</span> <span class="c1">#=&gt; 404</span>
<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">,</span> <span class="ss">request: </span><span class="p">{</span> <span class="ss">read_timeout: </span><span class="mi">5</span> <span class="p">})</span>
<span class="n">conn</span><span class="p">.</span><span class="nf">get</span> <span class="c1">#=&gt; raises Faraday::TimeoutError</span>

<span class="c1"># HTTPrb</span>
<span class="c1"># same as excon</span>
<span class="n">http</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">timeout</span><span class="p">(</span><span class="ss">read: </span><span class="mi">5</span><span class="p">)</span>
<span class="n">error_response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_404</span><span class="p">)</span>
<span class="n">error_response</span><span class="p">.</span><span class="nf">status</span> <span class="c1">#=&gt; 404</span>
<span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">)</span> <span class="c1">#=&gt; raises HTTP::TimeoutError</span>

<span class="c1"># httparty</span>
<span class="c1"># same as excon, with a twist</span>
<span class="n">error_response</span> <span class="o">=</span> <span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_404</span><span class="p">,</span> <span class="ss">timeout: </span><span class="mi">5</span><span class="p">)</span>
<span class="n">error_response</span><span class="p">.</span><span class="nf">code</span> <span class="c1">#=&gt; 404</span>
<span class="c1"># does not wrap errors coming from net-http engine</span>
<span class="no">HTTParty</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">5</span><span class="p">)</span> <span class="c1">#=&gt; raises Net::ReadTimeout</span>

<span class="c1"># curb</span>
<span class="no">Curl</span><span class="o">::</span><span class="no">Multi</span><span class="p">.</span><span class="nf">default_timeout</span> <span class="o">=</span> <span class="mi">5</span>
<span class="n">error_response</span> <span class="o">=</span> <span class="no">Curl</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_404</span><span class="p">)</span>
<span class="n">error_response</span><span class="p">.</span><span class="nf">status</span> <span class="c1">#=&gt; "404"</span>
<span class="no">Curl</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="n">http</span><span class="p">.</span><span class="nf">timeout</span> <span class="o">=</span> <span class="mi">5</span>
<span class="k">end</span> <span class="c1">#=&gt; raises Curl::Err::TimeoutError</span>

<span class="c1"># net-http</span>
<span class="n">uri_404</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">uri_404</span><span class="p">)</span>
<span class="n">uri_timeout</span> <span class="o">=</span> <span class="no">URI</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">)</span>
<span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">uri_404</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">uri_404</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">uri_404</span><span class="p">.</span><span class="nf">request_uri</span><span class="p">)</span>
  <span class="n">error_response</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
  <span class="n">error_response</span><span class="p">.</span><span class="nf">code</span> <span class="c1">#=&gt; "404"</span>
<span class="k">end</span>
<span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="p">.</span><span class="nf">start</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">.</span><span class="nf">host</span><span class="p">,</span> <span class="n">uri_timeout</span><span class="p">.</span><span class="nf">port</span><span class="p">,</span> <span class="ss">read_timeout: </span><span class="mi">5</span><span class="p">,</span> <span class="ss">use_ssl: </span><span class="kp">true</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">http</span><span class="o">|</span>
  <span class="n">request</span> <span class="o">=</span> <span class="no">Net</span><span class="o">::</span><span class="no">HTTP</span><span class="o">::</span><span class="no">Get</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="n">uri_timeout</span><span class="p">.</span><span class="nf">request_uri</span><span class="p">)</span>
  <span class="n">http</span><span class="p">.</span><span class="nf">request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="k">end</span> <span class="c1">#=&gt; raises Net::ReadTimeout</span>
</code></pre></div></div>

<p>From the examples above, one can see that most HTTP clients are remarkably consistent: HTTP errors result in plain responses, whereas networking errors result in raised exceptions, usually under the HTTP client’s namespace (<a href="https://github.com/jnunemaker/httparty">httparty</a> lets net-http’s own errors through). The outlier is <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>, which returns a response object in both cases (a different one for each), that can be “raised on demand”, in which case HTTP and networking errors result in different exceptions. This results in (arguably) better semantics and more options for the end user, at the cost of perhaps breaking rubyists’ expectations, and of at least one more method call to get the behaviour of the other clients.</p>
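<p>The “errors as values, raised on demand” approach is not tied to a particular library. Here is a minimal, self-contained sketch of it; <code class="language-plaintext highlighter-rouge">Response</code>, <code class="language-plaintext highlighter-rouge">ErrorResponse</code>, <code class="language-plaintext highlighter-rouge">HTTPStatusError</code>, <code class="language-plaintext highlighter-rouge">as_response</code> and <code class="language-plaintext highlighter-rouge">describe</code> are hypothetical names, loosely modelled on the behaviour described above, and are not httpx’s actual classes. No network is involved: the block passed to <code class="language-plaintext highlighter-rouge">as_response</code> stands in for the actual request.</p>

```ruby
# Hypothetical types for illustration; not httpx's implementation.
class HTTPStatusError < StandardError; end

Response = Struct.new(:status, :body, keyword_init: true) do
  # Raises only when asked to, and only for 4xx/5xx statuses.
  def raise_for_status
    raise HTTPStatusError, "HTTP #{status}" if status >= 400

    self
  end
end

ErrorResponse = Struct.new(:error, keyword_init: true) do
  # Re-raises the wrapped networking exception on demand.
  def raise_for_status
    raise error
  end
end

# Turns exceptions raised by the block into values instead of propagating them.
def as_response
  yield
rescue StandardError => e
  ErrorResponse.new(error: e)
end

# Structs implement #deconstruct_keys, so both types work with pattern matching.
def describe(response)
  case response
  in { error: }
    "network error: #{error.class}"
  in { status: 400.. }
    "HTTP error: #{response.status}"
  else
    "ok"
  end
end

responses = [
  as_response { Response.new(status: 200, body: "hello") },
  as_response { Response.new(status: 404, body: "") },
  as_response { raise IOError, "connection reset" }
]
responses.map { |r| describe(r) }
# => ["ok", "HTTP error: 404", "network error: IOError"]
```

<p>Callers who prefer exceptions can still call <code class="language-plaintext highlighter-rouge">raise_for_status</code> on every result, which is the “one more method call” trade-off mentioned above.</p>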

<h3 id="extensibility">Extensibility</h3>

<p>This is ruby: even if a library was not designed for extensibility, extending it is still possible; monkey-patching is the last resort.</p>

<p>That being said, it’s still better to rely on libraries with extension capabilities. Explicit extension points favour composability and code reuse through controlled contracts, and make it less likely that separate patches step on each other when customizing the library for one’s needs.</p>

<p>Some of our HTTP clients have supported extensions from the get-go, and even “dogfood” them by implementing some of their internals using the same contracts. Others added support much later, mostly as an “external” interface. And some of them (like <a href="https://github.com/ruby/net-http/">net-http</a>…) just don’t.</p>

<p><a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> comes with a plugin system, which was directly inspired by similar systems found in gems from Jeremy Evans, like <a href="https://github.com/jeremyevans/roda">roda</a> or <a href="https://github.com/jeremyevans/sequel">sequel</a>; and just like the mentioned examples, most features it provides ship as plugins (which means users don’t pay the cost for features they don’t use). For instance, this is how one enables retries:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">http</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">plugin</span><span class="p">(</span><span class="ss">:retries</span><span class="p">)</span>
<span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"https://news.ycombinator.com"</span><span class="p">)</span> <span class="c1"># will retry up to 3 times by default</span>
</code></pre></div></div>

<p>Plugins are essentially modules acting as namespaces for other modules which add functionality to core structures of the library:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">module</span> <span class="nn">MyPlugin</span>
  <span class="k">module</span> <span class="nn">ResponseMethods</span>
    <span class="c1"># adds the method to the response object</span>
    <span class="k">def</span> <span class="nf">get_server_metric</span>
      <span class="vi">@headers</span><span class="p">[</span><span class="s2">"x-server-response-time"</span><span class="p">]</span>
    <span class="k">end</span>
  <span class="k">end</span>

  <span class="k">module</span> <span class="nn">ConnectionMethods</span>
    <span class="k">def</span> <span class="nf">send</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
      <span class="n">start_time</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">now</span>
      <span class="n">request</span><span class="p">.</span><span class="nf">on</span><span class="p">(</span><span class="ss">:response</span><span class="p">)</span> <span class="k">do</span>
        <span class="nb">puts</span> <span class="s2">"this is how much it took: </span><span class="si">#{</span><span class="no">Time</span><span class="p">.</span><span class="nf">now</span> <span class="o">-</span> <span class="n">start_time</span><span class="si">}</span><span class="s2">"</span>
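      <span class="k">end</span>
      <span class="k">super</span> <span class="c1"># hand the request over to the default implementation</span>
    <span class="k">end</span>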
  <span class="k">end</span>
<span class="k">end</span>

<span class="n">http</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">plugin</span><span class="p">(</span><span class="no">MyPlugin</span><span class="p">)</span>
<span class="n">resp</span> <span class="o">=</span> <span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"http://internal-domain-with-metrics/this"</span><span class="p">)</span>
<span class="nb">puts</span> <span class="n">resp</span><span class="p">.</span><span class="nf">get_server_metric</span>
</code></pre></div></div>

<p><a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> plugins are also composable, and a <a href="https://honeyryderchuck.gitlab.io/httpx/wiki/Custom-Plugins">topic in itself</a>.</p>
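<p>How can a <code class="language-plaintext highlighter-rouge">plugin</code> call turn such namespaced modules into new behaviour? Below is a minimal sketch of the roda/sequel-style pattern, assuming each plugin call returns a fresh subclass whose response class includes the plugin’s <code class="language-plaintext highlighter-rouge">ResponseMethods</code>. <code class="language-plaintext highlighter-rouge">MiniSession</code>, <code class="language-plaintext highlighter-rouge">MiniResponse</code> and <code class="language-plaintext highlighter-rouge">TimingPlugin</code> are hypothetical; this is not how httpx is actually implemented.</p>

```ruby
# Minimal sketch of a roda/sequel-style plugin system. MiniSession, MiniResponse
# and TimingPlugin are hypothetical; this is not httpx's implementation.
class MiniResponse
  attr_reader :headers

  def initialize(headers)
    @headers = headers
  end
end

class MiniSession
  class << self
    attr_accessor :response_class
  end
  self.response_class = MiniResponse

  # Returns a new session subclass with its own response subclass, so enabling
  # a plugin never mutates the base classes.
  def self.plugin(mod)
    klass = Class.new(self)
    klass.response_class = Class.new(response_class)
    if mod.const_defined?(:ResponseMethods, false)
      klass.response_class.include(mod::ResponseMethods)
    end
    klass
  end

  # Stand-in for real I/O: always returns the same headers.
  def get(_url)
    self.class.response_class.new({ "x-server-response-time" => "12ms" })
  end
end

module TimingPlugin
  module ResponseMethods
    def get_server_metric
      headers["x-server-response-time"]
    end
  end
end

timed = MiniSession.plugin(TimingPlugin).new
timed.get("http://internal-domain-with-metrics/this").get_server_metric # => "12ms"
MiniSession.new.get("http://example.com").respond_to?(:get_server_metric) # => false
```

<p>Because <code class="language-plaintext highlighter-rouge">plugin</code> returns a class that itself responds to <code class="language-plaintext highlighter-rouge">plugin</code>, calls can be chained, which is one simple way to get the composability described above.</p>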

<p>Alternatively, <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> also provides event-based hooks one can register on the session object:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">started</span> <span class="o">=</span> <span class="p">{}</span>
<span class="n">http</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">on_request_started</span> <span class="k">do</span> <span class="o">|</span><span class="n">request</span><span class="o">|</span>
  <span class="n">started</span><span class="p">[</span><span class="n">request</span><span class="p">]</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">now</span>
<span class="k">end</span><span class="p">.</span><span class="nf">on_response_completed</span> <span class="k">do</span> <span class="o">|</span><span class="n">request</span><span class="p">,</span> <span class="n">response</span><span class="o">|</span>
  <span class="nb">puts</span> <span class="s2">"this is how much it took: </span><span class="si">#{</span><span class="no">Time</span><span class="p">.</span><span class="nf">now</span> <span class="o">-</span> <span class="n">started</span><span class="p">[</span><span class="n">request</span><span class="p">]</span><span class="si">}</span><span class="s2">"</span>
<span class="k">end</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"http://internal-domain-with-metrics/this"</span><span class="p">)</span>
</code></pre></div></div>

<p>The difference between the two: event-based hooks are a “high-level” way of intercepting the request/response lifecycle, easy to learn and use, whereas plugins are lower-level and more powerful, but also more involved, requiring some knowledge of <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> internals.</p>
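<p>Under the hood, an event-hook API boils down to a registry of blocks keyed by event name, invoked at fixed points of the lifecycle. The sketch below illustrates that idea only; <code class="language-plaintext highlighter-rouge">HookedSession</code>, <code class="language-plaintext highlighter-rouge">on</code> and the event names are hypothetical, loosely modelled on the httpx hooks shown above, and no real request is made.</p>

```ruby
# Minimal event-hook registry. HookedSession and the event names are hypothetical,
# loosely modelled on the on_request_started / on_response_completed hooks above.
class HookedSession
  def initialize
    @callbacks = Hash.new { |hash, event| hash[event] = [] }
  end

  # Registers a block for an event; returns self so calls can be chained.
  def on(event, &block)
    @callbacks[event] << block
    self
  end

  def request(req)
    emit(:request_started, req)
    response = { status: 200, request: req } # stand-in for real network I/O
    emit(:response_completed, req, response)
    response
  end

  private

  def emit(event, *args)
    @callbacks[event].each { |callback| callback.call(*args) }
  end
end

log = []
session = HookedSession.new
  .on(:request_started) { |req| log << "started #{req}" }
  .on(:response_completed) { |req, res| log << "completed #{req} with #{res[:status]}" }
session.request("GET /this")
log # => ["started GET /this", "completed GET /this with 200"]
```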

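<p><a href="https://github.com/excon/excon">excon</a> supports middlewares as extension points: essentially classes implementing up to three callbacks (<code class="language-plaintext highlighter-rouge">request_call</code>, <code class="language-plaintext highlighter-rouge">response_call</code> and <code class="language-plaintext highlighter-rouge">error_call</code>). The design is relatively simple, and used internally to build features such as following redirects or response decompression, among others. You can define and use one like this:</p>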

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MyMiddleware</span> <span class="o">&lt;</span> <span class="no">Excon</span><span class="o">::</span><span class="no">Middleware</span><span class="o">::</span><span class="no">Base</span>
  <span class="c1"># can override request_call, response_call and error_call</span>

  <span class="k">def</span> <span class="nf">response_call</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
    <span class="nb">puts</span> <span class="n">data</span><span class="p">[</span><span class="ss">:response</span><span class="p">][</span><span class="ss">:headers</span><span class="p">][</span><span class="s2">"x-server-response-time"</span><span class="p">]</span>
    <span class="vi">@stack</span><span class="p">.</span><span class="nf">response_call</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">Excon</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"http://internal-domain-with-metrics/this"</span><span class="p">,</span>
  <span class="c1"># don't forget to add defaults...</span>
  <span class="ss">middlewares: </span><span class="no">Excon</span><span class="p">.</span><span class="nf">defaults</span><span class="p">[</span><span class="ss">:middlewares</span><span class="p">]</span> <span class="o">+</span> <span class="p">[</span><span class="no">MyMiddleware</span><span class="p">]</span>
<span class="p">)</span>
</code></pre></div></div>

<p>Middlewares are called in order, which has its drawbacks: a data structure changed by one middleware may interfere with the execution of the next one. For instance, there’s a middleware to capture cookies, and another to follow redirect responses; if the second is set before the first, cookies won’t be applied to the redirected request. This type of design is prone to ordering errors.</p>
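<p>The ordering problem is easy to reproduce with a toy rack-style stack. In this sketch (all classes are hypothetical, not excon’s), the first middleware in the list is the outermost one, and a fake transport answers <code class="language-plaintext highlighter-rouge">/login</code> with a cookie plus a redirect to <code class="language-plaintext highlighter-rouge">/home</code>, which echoes back whatever cookie it receives. Swapping the two middlewares decides whether the cookie survives the redirect.</p>

```ruby
# Hypothetical rack-style stack: the first middleware in the list is the outermost.
StackRequest = Struct.new(:path, :headers)
StackResponse = Struct.new(:status, :headers)

# Fake server: /login sets a cookie and redirects to /home, which echoes the Cookie header.
class FakeTransport
  def call(req)
    if req.path == "/login"
      StackResponse.new(302, { "location" => "/home", "set-cookie" => "session=abc" })
    else
      StackResponse.new(200, { "echo-cookie" => req.headers["cookie"].to_s })
    end
  end
end

class CookieJar
  def initialize(app)
    @app = app
    @cookie = nil
  end

  def call(req)
    req.headers["cookie"] = @cookie if @cookie
    res = @app.call(req)
    @cookie = res.headers["set-cookie"] if res.headers["set-cookie"]
    res
  end
end

class FollowRedirects
  def initialize(app)
    @app = app
  end

  def call(req)
    res = @app.call(req)
    res = @app.call(StackRequest.new(res.headers["location"], {})) while res.status == 302
    res
  end
end

def build_stack(middlewares, transport)
  middlewares.reverse.reduce(transport) { |app, middleware| middleware.new(app) }
end

login = -> { StackRequest.new("/login", {}) }
build_stack([FollowRedirects, CookieJar], FakeTransport.new).call(login.call).headers["echo-cookie"] # => "session=abc"
build_stack([CookieJar, FollowRedirects], FakeTransport.new).call(login.call).headers["echo-cookie"] # => ""
```

<p>In the second stack, the redirect is followed inside the cookie middleware, so the cookie jar only ever sees the final response and never gets a chance to apply the cookie to <code class="language-plaintext highlighter-rouge">/home</code>.</p>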

<p>As mentioned earlier in the article, <a href="https://github.com/lostisland/faraday">faraday</a> uses a similar design, inspired by the rack middleware stack:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Middleware</span> <span class="o">&lt;</span> <span class="no">Faraday</span><span class="o">::</span><span class="no">Middleware</span>
  <span class="k">def</span> <span class="nf">on_request</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
    <span class="c1"># do smth with request env</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">on_complete</span><span class="p">(</span><span class="n">env</span><span class="p">)</span>
    <span class="nb">puts</span> <span class="n">env</span><span class="p">[</span><span class="ss">:response_headers</span><span class="p">][</span><span class="s2">"x-server-response-time"</span><span class="p">]</span>
  <span class="k">end</span>

  <span class="c1">### or alternatively, you could instead do:</span>

  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">request_env</span><span class="p">)</span>
    <span class="vi">@app</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">request_env</span><span class="p">).</span><span class="nf">on_complete</span> <span class="k">do</span> <span class="o">|</span><span class="n">response_env</span><span class="o">|</span>
      <span class="nb">puts</span> <span class="n">response_env</span><span class="p">[</span><span class="ss">:response_headers</span><span class="p">][</span><span class="s2">"x-server-response-time"</span><span class="p">]</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="n">conn</span> <span class="o">=</span> <span class="no">Faraday</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">conn</span><span class="o">|</span>
  <span class="n">conn</span><span class="p">.</span><span class="nf">request</span> <span class="no">Middleware</span> <span class="c1"># registers #on_request</span>
  <span class="n">conn</span><span class="p">.</span><span class="nf">response</span> <span class="no">Middleware</span> <span class="c1"># registers #on_complete</span>
  <span class="c1"># registers #call</span>
  <span class="n">conn</span><span class="p">.</span><span class="nf">use</span> <span class="no">Middleware</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Compared to the previous approach, it’s a bit confusing having two ways to accomplish something. And the same drawback applies: order matters. And with that, the <a href="https://github.com/lostisland/faraday/issues/1238">inevitable</a> <a href="https://github.com/lostisland/faraday/issues/1458">questions</a> follow.</p>
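<p>One way to see why both styles coexist: the <code class="language-plaintext highlighter-rouge">on_request</code>/<code class="language-plaintext highlighter-rouge">on_complete</code> hooks can be layered on top of a single rack-style <code class="language-plaintext highlighter-rouge">call</code>. The base class below is a simplified, hypothetical stand-in written to illustrate the idea, not faraday’s source code, and the lambda plays the role of the adapter.</p>

```ruby
# Hypothetical base class showing how on_request/on_complete hooks can be layered
# on top of a single rack-style #call; it is not Faraday's source code.
class HookMiddleware
  def initialize(app)
    @app = app
  end

  def call(env)
    on_request(env) if respond_to?(:on_request)
    response_env = @app.call(env)
    on_complete(response_env) if respond_to?(:on_complete)
    response_env
  end
end

class ServerTiming < HookMiddleware
  attr_reader :seen

  def on_request(env)
    env[:request_headers]["accept"] = "application/json"
  end

  def on_complete(env)
    (@seen ||= []) << env[:response_headers]["x-server-response-time"]
  end
end

# Innermost "app": pretends to perform the request and fills in response data.
adapter = lambda do |env|
  env.merge({ response_headers: { "x-server-response-time" => "12ms",
                                  "echo-accept" => env[:request_headers]["accept"] } })
end

middleware = ServerTiming.new(adapter)
result = middleware.call({ request_headers: {} })
result[:response_headers]["echo-accept"] # => "application/json"
middleware.seen                          # => ["12ms"]
```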

<p><a href="https://github.com/httprb/http">httprb</a> provides a feature called <a href="https://github.com/httprb/http/wiki/Logging-and-Instrumentation">features</a>, which is largely undocumented, albeit used internally to implement (de)compression and debug logging. Looking at a few internal examples, the approach is relatively similar to <a href="https://github.com/excon/excon">excon</a>’s:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">MyFeature</span> <span class="o">&lt;</span> <span class="no">HTTP</span><span class="o">::</span><span class="no">Feature</span>
  <span class="k">def</span> <span class="nf">wrap_request</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
    <span class="c1"># do smth</span>
    <span class="n">request</span> <span class="c1"># must return</span>
  <span class="k">end</span>

  <span class="k">def</span> <span class="nf">wrap_response</span><span class="p">(</span><span class="n">response</span><span class="p">)</span>
    <span class="nb">puts</span> <span class="n">response</span><span class="p">.</span><span class="nf">headers</span><span class="p">[</span><span class="s2">"x-server-response-time"</span><span class="p">]</span>
    <span class="n">response</span> <span class="c1"># must return</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># register the feature under a name, then enable it by that name</span>
<span class="no">HTTP</span><span class="o">::</span><span class="no">Options</span><span class="p">.</span><span class="nf">register_feature</span><span class="p">(</span><span class="ss">:my_feature</span><span class="p">,</span> <span class="no">MyFeature</span><span class="p">)</span>

<span class="n">http</span> <span class="o">=</span> <span class="no">HTTP</span><span class="p">.</span><span class="nf">use</span><span class="p">(</span><span class="ss">:my_feature</span><span class="p">)</span>
<span class="n">http</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="o">...</span><span class="p">)</span>
</code></pre></div></div>

<p>Being so similar to the examples above, the same drawbacks apply here. You’ll also have to take into account that, because <a href="https://github.com/httprb/http">httprb</a> responses are “lazy”, the <code class="language-plaintext highlighter-rouge">wrap_response</code> hook can be called before the full response body has been received.</p>
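<p>The “must return” comments in the example above matter because hooks of this shape are typically folded over the request and the response, each hook’s return value replacing what it received. The sketch below assumes a left-to-right fold and uses plain hashes and hypothetical feature classes rather than httprb objects; <code class="language-plaintext highlighter-rouge">RecordTiming</code> forgets to return the response on purpose.</p>

```ruby
# Hypothetical sketch: features folded left to right over the request and the
# response, each hook's return value replacing what it received. Plain hashes
# stand in for httprb's request/response objects.
class BaseFeature
  def wrap_request(request)
    request
  end

  def wrap_response(response)
    response
  end
end

class AcceptJSON < BaseFeature
  def wrap_request(request)
    request.merge("accept" => "application/json")
  end
end

class RecordTiming < BaseFeature
  attr_reader :timings

  def initialize
    @timings = []
  end

  # Bug on purpose: Array#<< returns the array, so that is what the next hook
  # (and the caller) will receive instead of the response.
  def wrap_response(response)
    @timings << response["x-server-response-time"]
  end
end

def perform(features, request)
  request = features.reduce(request) { |req, feature| feature.wrap_request(req) }
  response = { "x-server-response-time" => "12ms", "echo-accept" => request["accept"] }
  features.reduce(response) { |res, feature| feature.wrap_response(res) }
end

perform([AcceptJSON.new], {})["echo-accept"]             # => "application/json"
perform([AcceptJSON.new, RecordTiming.new], {}).class    # => Array, not a response
```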

<p><a href="https://github.com/jnunemaker/httparty">httparty</a> does not provide extension mechanisms like the previous ones. Instead, it promotes its class injection API as a way for users to decorate behaviour around API calls (which is the most popular way of using it):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Google</span>
  <span class="kp">include</span> <span class="no">HTTParty</span>
  <span class="nb">format</span> <span class="ss">:html</span>
  <span class="n">base_uri</span> <span class="s1">'https://www.google.com'</span>

  <span class="k">def</span> <span class="nf">q</span><span class="p">(</span><span class="n">options</span> <span class="o">=</span> <span class="p">{})</span>
    <span class="n">q_query</span> <span class="o">=</span> <span class="no">URI</span><span class="p">.</span><span class="nf">encode_www_form</span><span class="p">(</span><span class="n">options</span><span class="p">)</span>
    <span class="nb">self</span><span class="p">.</span><span class="nf">class</span><span class="p">.</span><span class="nf">get</span><span class="p">(</span><span class="s2">"/search?</span><span class="si">#{</span><span class="n">q_query</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
  <span class="k">end</span>

  <span class="c1"># intercepting all requests, invoke the monkeypatch:</span>
  <span class="k">class</span> <span class="o">&lt;&lt;</span> <span class="nb">self</span>
    <span class="k">def</span> <span class="nf">perform_request_with_log</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">block</span><span class="p">)</span>
      <span class="nb">puts</span> <span class="s2">"this: </span><span class="si">#{</span><span class="n">args</span><span class="si">}</span><span class="s2">"</span>
      <span class="n">perform_request_without_log</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">block</span><span class="p">)</span>
    <span class="k">end</span>
    <span class="kp">alias_method</span> <span class="ss">:perform_request_without_log</span><span class="p">,</span> <span class="ss">:perform_request</span>
    <span class="kp">alias_method</span> <span class="ss">:perform_request</span><span class="p">,</span> <span class="ss">:perform_request_with_log</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>As the example shows, there are limits to the extensions this approach enables: decorating behaviour is easy, but introspecting the client isn’t a first-class abstraction, and you’ll soon be adding a potentially unhealthy dose of monkey-patching to fill in the gaps.</p>
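<p>If monkey-patching is unavoidable, <code class="language-plaintext highlighter-rouge">Module#prepend</code> is usually a less fragile tool than <code class="language-plaintext highlighter-rouge">alias_method</code> chaining: nothing gets renamed, and <code class="language-plaintext highlighter-rouge">super</code> forwards arguments and blocks. In this sketch, <code class="language-plaintext highlighter-rouge">FakeClient</code> is a hypothetical stand-in for a class including HTTParty, and <code class="language-plaintext highlighter-rouge">perform_request</code> is assumed to be the method worth intercepting, as in the example above (it is an internal detail, not a public API).</p>

```ruby
# FakeClient stands in for a class that includes HTTParty; perform_request is
# assumed to be the method to intercept, as in the alias_method example above.
class FakeClient
  def self.perform_request(http_method, path, options = {}, &block)
    "#{http_method} #{path}"
  end
end

module RequestLogging
  def perform_request(*args, &block)
    request_log << args.first(2)
    super # calls the original, with the same arguments and block
  end

  def request_log
    @request_log ||= []
  end
end

# Prepending to the singleton class intercepts the class-level method without renaming anything.
FakeClient.singleton_class.prepend(RequestLogging)

FakeClient.perform_request("GET", "/search?q=ruby") # => "GET /search?q=ruby"
FakeClient.request_log                              # => [["GET", "/search?q=ruby"]]
```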

<p><a href="https://github.com/taf2/curb">curb</a> does not support anything of the kind. Either your needs are fulfilled by the wide array of <a href="https://curl.se/">curl</a> features it integrates with, or you’ll have a harder time beating it into shape.</p>

<p>And as for <a href="https://github.com/ruby/net-http/">net-http</a>… let’s just say that <a href="https://github.com/drbrain/net-http-persistent">there</a> <a href="https://github.com/drbrain/net-http-pipeline">are</a> <a href="https://github.com/drbrain/net-http-digest_auth">several</a> <code class="language-plaintext highlighter-rouge">net-http-$feature</code> gems around, which, at their best, inject APIs into core classes which work in isolation but rarely build well on top of each other, and at their worst, monkey-patch their way in (several tracing / logging / mock libraries do this).</p>

<p>To sum up, and discarding the ones which are not built for extension, most libraries allow extension based on a standard around chained hooks for “sending the request” and “getting a response” (the interpretation of which is library-dependent), and support a more or less friendly (depending on the library, and on personal taste) API for registering extensions. In most cases, the libraries’ own features are provided via these APIs. These extensions cover most high-level use cases, but become rather limiting for more advanced ones (such as getting information about DNS resolution, socket handshakes or byte-level progress). And that’s where <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>’s flexible approach to extensions works best, by providing both a high-level and a low-level way of doing it, the latter building on a design which has proven itself in some of the most respected gems within the ruby community.</p>

<h3 id="performance">Performance</h3>

<p>The first thing one can say about performance benchmarks is that you cannot fully trust them. Some of the numbers you’ll see will always be context- or environment-specific: does the gem use a C extension optimized for x86, while the benchmark runs on a different CPU architecture? Is the network IPv4-optimized, thereby penalizing traffic going via IPv6? Are payloads exactly the same?</p>

<p>There are ways to ensure some level of confidence though. First, you must have access to the benchmark code, in order to gain context; you should also have access to the run logs and history; also, benchmarks must run regularly.</p>

<p>Because I didn’t find an acceptable public benchmark which fits these requirements, I went ahead and <a href="https://gitlab.com/os85/http-clients-benchmark">rolled my own</a> in order to measure the performance difference between some of the ruby HTTP clients. While you’re free to inspect it, the gist of it is a pair of containers running in a GitLab CI pipeline, one with a test HTTP server, and another running the benchmark against it. It runs monthly, so results stay reasonably up to date. Running on a local area network keeps network interference in the measurements negligible. There’s a warmup phase, and garbage collection is turned off, ruling out “stop-the-world” pauses as well. The benchmark uses <a href="https://github.com/ruby/benchmark">the stdlib benchmark gem</a> to measure “real time”, and consists of a series of use cases (not all alternatives support all of them, which is why not every alternative appears in every graph).</p>
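<p>The measurement loop described above (warmup, GC off, wall-clock time) can be sketched in a few lines. This is an illustration under those stated assumptions, not the benchmark repository’s code; <code class="language-plaintext highlighter-rouge">measure</code> is a hypothetical helper, and the block is a placeholder for “perform the HTTP request(s)”.</p>

```ruby
require "benchmark"

# Sketch of the measurement approach described above (warmup, GC off, wall-clock
# time via Benchmark.realtime). Not the benchmark repository's code; the block
# is a placeholder for "perform the HTTP request(s)".
def measure(iterations: 100, warmup: 10)
  warmup.times { yield }
  GC.start   # start the timed section from a freshly collected heap
  GC.disable # no stop-the-world pauses during the timed section
  Benchmark.realtime { iterations.times { yield } }
ensure
  GC.enable
end

elapsed = measure(iterations: 1_000) { "GET /".split(" ") }
puts format("%.4fs", elapsed)
```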

<p><img src="/images/state-of-http-clients/http-single-bench.png" alt="Single Request Benchmark" />
<img src="/images/state-of-http-clients/http-persistent-bench.png" alt="Persistent Request Benchmark" />
<img src="/images/state-of-http-clients/http-pipelined-bench.png" alt="Pipelined Request Benchmark" />
<img src="/images/state-of-http-clients/http-concurrent-bench.png" alt="Concurrent Request Benchmark" /></p>

<p>While there could be more use cases in the benchmarks (feel free to suggest some by opening a merge request), this shows that the performance gap between alternatives is not huge, which makes sense: even in such contained scenarios, most time is spent waiting on the network. As <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> maintainer, it’s definitely reassuring to see it keeping up with the “top of the pack”, particularly considering that it is pure ruby (both the HTTP/1 and HTTP/2 parsers are written in ruby), while some of the alternatives claim much better performance from C-optimized code, without it materializing in these numbers (<a href="https://github.com/httprb/http">httprb</a> uses the nodeJS HTTP parser via FFI, and used to do it via a C extension; <a href="https://github.com/taf2/curb">curb</a> and <a href="https://github.com/typhoeus/typhoeus">typhoeus</a> use <a href="https://curl.se/libcurl/">libcurl</a> under the hood).</p>

<p>Honorable mention to <a href="https://github.com/ruby/net-http/">net-http</a>, which shows quite good numbers, partly offsetting some of its UX shortcomings (caveat though: the “pipelined” and “persistent” benchmarks were performed using the <code class="language-plaintext highlighter-rouge">net-http-pipeline</code> and <code class="language-plaintext highlighter-rouge">net-http-persistent</code> gems respectively).</p>

<h3 id="packaging">Packaging</h3>

<p>With the advent of containers as the ultimate deployment target, the art of setting up VMs has slowly been lost, and shifted into writing recipes, of which dockerfiles are the most popular today. That’s not to say everyone deploys to containers though: there are also serverless platforms. And “on-premise” never went anywhere either (it’s just under-practised). And what about ruby-based scripting tools (like Homebrew) for your laptop? Don’t forget Windows either: that &lt;2% of the community will chase you in your dreams if they are faced with difficulties. As a last resort, you can “write it in JRuby once and run it everywhere”. Bottom line, ruby is everywhere, and when building gems, you’d best take all this diversity into account, lest you be reminded periodically by someone having trouble with the things you build.</p>

<h4 id="system">System</h4>

<p>So, first thing: how hard is it to install any of our candidates? The options range from “relatively hard”, to “easy”, to “zilch”. Let’s start from the end. <a href="https://github.com/ruby/net-http/">net-http</a> is already there. Done. Now that we got that out of the way, we can go to the easy part of the equation: pure ruby gems. Which ones are they? As already mentioned, <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> is pure ruby; the only thing you need to do is use the <code class="language-plaintext highlighter-rouge">gem</code> command, or <code class="language-plaintext highlighter-rouge">bundler</code>, like you do with any of the other alternatives. <a href="https://github.com/excon/excon">excon</a> and <a href="https://github.com/jnunemaker/httparty">httparty</a> are no different: they’re also pure ruby. On the <code class="language-plaintext highlighter-rouge">moderate</code> side, you’ll find <a href="https://github.com/httprb/http">httprb</a>; it requires the compilation of the <code class="language-plaintext highlighter-rouge">llhttp</code> C extension or FFI binding (for the aforementioned nodeJS parser). This means that, in order to install it, you’ll need the whole “C compilation toolchain” including CMake, gcc, and the like. And that includes the deployment environment, as all of these compile on install (take that into account in your slim/alpine images). Last of this bunch, you have <a href="https://github.com/taf2/curb">curb</a>, which not only carries the same requirement of compiling a C extension on install, but also requires a (compatible) installation of <a href="https://curl.se/libcurl/">libcurl</a> (and bear in mind what was discussed about <a href="https://curl.se/libcurl/">libcurl</a>-based libs when you need something specific).
While not <code class="language-plaintext highlighter-rouge">nokogiri</code>-bad in terms of compilation times, it’s still setup overhead (credit to <code class="language-plaintext highlighter-rouge">nokogiri</code> though for adopting pre-compiled binaries, something which none of the extension-dependent libraries researched here does). I’ll omit <a href="https://github.com/lostisland/faraday">faraday</a> from the conversation here, as the bulk of the cost lies in the chosen adapter.</p>

<h4 id="rubygems">Rubygems</h4>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># dependency list</span>
httpx
  http-2-next
excon
faraday
  faraday-net_http
  ruby2_keywords
http
  addressable
    public_suffix
  http-cookie
    domain_name
      unf <span class="c"># C extension</span>
  http-form_data
  llhttp-ffi <span class="c"># C library loaded via FFI</span>
    ffi-compiler
httparty
  mini_mime
  multi_xml
curb <span class="c"># C extension</span>
net-http
</code></pre></div></div>

<p>Dependency-wise, the mileage also varies. As mentioned, <a href="https://github.com/ruby/net-http/">net-http</a> is built entirely on the standard library. <a href="https://github.com/excon/excon">excon</a> ships with no direct dependencies, which is impressive all things considered. <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> ships with one (the <a href="https://gitlab.com/os85/http-2-next/">http-2-next</a> parser, which is at least maintained by the same person). <a href="https://github.com/jnunemaker/httparty">httparty</a> ships with 2 (why is <a href="https://github.com/sferik/multi_xml">multi_xml</a> even required? Not sure). <a href="https://github.com/lostisland/faraday">faraday</a> has at least 2 (that is, if you stick with the default <a href="https://github.com/ruby/net-http/">net-http</a> adapter); <a href="https://github.com/httprb/http">httprb</a> has 4 direct dependencies, 8 total. <a href="https://github.com/taf2/curb">curb</a> has no direct ruby dependencies either (it does require <a href="https://curl.se/libcurl/">libcurl</a>, though).</p>

<p>Is all of that necessary? Perhaps; it depends. But I don’t see the point in <a href="https://github.com/httprb/http">httprb</a> carrying so much baggage <strong>by default</strong>: besides the aforementioned parser complication, it also declares <a href="https://github.com/httprb/form_data">http-form_data</a> (maintained by the same team, for multipart support), <a href="https://github.com/sparklemotion/http-cookie">http-cookie</a>, and <a href="https://github.com/sporkmonger/addressable">addressable</a>, i.e. things that could be optional (ruby already ships with a URI parser), or not loaded by default (I doubt that the majority of its users have used the cookies feature, although everyone pays the cost). The same could be said of <a href="https://github.com/jnunemaker/httparty">httparty</a> requiring <a href="https://github.com/sferik/multi_xml">multi_xml</a> (who’s still using XML?). Contrast that with <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>’s and <a href="https://github.com/excon/excon">excon</a>’s approach, where certain features do require the installation of a separate gem, but you only pay the cost if you enable the feature (<a href="https://github.com/excon/excon">excon</a> supports <a href="https://github.com/sporkmonger/addressable">addressable</a> as an alternative URI parser, and, to name one example for <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>, the <a href="https://honeyryderchuck.gitlab.io/httpx/wiki/GRPC">grpc</a> plugin requires the <a href="https://rubygems.org/gems/google-protobuf/versions/3.24.3">protobuf</a> gem).</p>
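<p>The “only pay for what you enable” approach mostly comes down to deferring <code class="language-plaintext highlighter-rouge">require</code> until a feature is switched on, and failing with a helpful message when the optional gem is missing. Here is a minimal sketch; the registry, <code class="language-plaintext highlighter-rouge">enable_feature</code> and the <code class="language-plaintext highlighter-rouge">:missing_demo</code> entry (pointing at a made-up library) are hypothetical, and neither httpx nor excon necessarily does it this way.</p>

```ruby
# Hypothetical feature registry: each optional feature names the library it needs,
# and nothing is required until the feature is enabled. The :missing_demo entry
# points at a made-up library to show the failure path.
OPTIONAL_FEATURES = {
  json: "json",
  grpc: "google/protobuf", # require path of the google-protobuf gem
  missing_demo: "some_gem_that_is_not_installed"
}.freeze

def enable_feature(name)
  lib = OPTIONAL_FEATURES.fetch(name) { raise ArgumentError, "unknown feature: #{name}" }
  require lib
  true
rescue LoadError => e
  raise LoadError, "feature #{name.inspect} needs #{lib.inspect}; add its gem to your Gemfile (#{e.message})"
end

enable_feature(:json) # => true
```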

<p>Nevertheless, if packaging is the most important variable to consider, you can’t really beat “shipped with ruby”, i.e. <a href="https://github.com/ruby/net-http/">net-http</a>.</p>

<h3 id="features">Features</h3>

<p>The feature set that can be built on top of an HTTP client is so large that it’s impossible to cover in a single blog post (I’d need a book for that, or several). Fortunately, nahi, the former maintainer of <a href="https://github.com/nahi/httpclient">httpclient</a>, made my job easier by building a “common feature matrix” for a presentation he gave many years ago at a ruby conference, which I’ll partially use here to highlight the intersection of features across the alternatives covered:</p>

<table>
  <thead>
    <tr>
      <th>feature</th>
      <th>httpx</th>
      <th>excon</th>
      <th>faraday</th>
      <th>httprb</th>
      <th>httparty</th>
      <th>curb</th>
      <th>net-http</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>compression</td>
      <td>✅ (also brotli)</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>Auth</td>
      <td>✅ (basic, digest, ntlm, bearer, aws-sigv4)</td>
      <td>✅ (basic)</td>
      <td>✅ (basic, bearer, token)</td>
      <td>✅ (basic, bearer)</td>
      <td>✅ (basic, digest)</td>
      <td>✅ (basic, digest, gssnegotiate, ntlm)</td>
      <td>✅ (basic)</td>
    </tr>
    <tr>
      <td>proxy</td>
      <td>✅ (HTTP, HTTPS, SOCKS4(a)/5, SSH)</td>
      <td>✅ (HTTP, HTTPS)</td>
      <td>🟠 (adapter-specific)</td>
      <td>✅ (HTTP, HTTPS)</td>
      <td>✅ (HTTP, HTTPS)</td>
      <td>✅ (HTTP, HTTPS, SOCKS4(a)/5, SSH)</td>
      <td>✅ (HTTP, HTTPS)</td>
    </tr>
    <tr>
      <td>proxy auth</td>
      <td>✅ (basic, digest, ntlm)</td>
      <td>❌</td>
      <td>🟠 (adapter-specific)</td>
      <td>✅ (basic)</td>
      <td>✅ (basic)</td>
      <td>✅ (basic, digest, gssnegotiate, ntlm)</td>
      <td>✅ (basic)</td>
    </tr>
    <tr>
      <td>cookies</td>
      <td>✅</td>
      <td>✅</td>
      <td>🟠 (separate middleware gem)</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>follow redirects</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>retries</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>multipart</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
      <td>🟠 (extra gem)</td>
      <td>✅</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>streaming</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>expect-100</td>
      <td>✅</td>
      <td>❌</td>
      <td>🟠 (adapter-specific)</td>
      <td>❌</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>UNIX Sockets</td>
      <td>✅</td>
      <td>✅</td>
      <td>🟠 (adapter-specific)</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>HTTP/2</td>
      <td>✅</td>
      <td>❌</td>
      <td>🟠 (adapter-specific)</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
      <td>❌</td>
    </tr>
    <tr>
      <td>jruby support</td>
      <td>✅</td>
      <td>❌</td>
      <td>🟠 (adapter-specific)</td>
      <td>✅</td>
      <td>❌</td>
      <td>❌</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>

<p>One important thing to take into account is that, just because the ✅ is there, it does not necessarily mean that all alternatives implement a feature the same way. For instance, <a href="https://github.com/taf2/curb">curb</a>’s GSSAPI support requires a <a href="https://curl.se/">curl</a> build compiled with <code class="language-plaintext highlighter-rouge">gssapi</code>; <a href="https://github.com/httprb/http">httprb</a> proxy support does not cover the <code class="language-plaintext highlighter-rouge">http_proxy/https_proxy/no_proxy</code> environment variables (which will always come as a surprise if you’re a sysadmin); all of the alternatives, except <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> and <a href="https://github.com/taf2/curb">curb</a> (via <a href="https://curl.se/libcurl/">libcurl</a>), implement poor MIME type detection of file parts, or none at all (as already mentioned in the multipart-related section); and, as I’ve explained earlier, the question about streaming response support is not “if”, but “how”.</p>
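<p>For reference, honouring the proxy environment variables does not take much: ruby’s standard library ships <code class="language-plaintext highlighter-rouge">URI::Generic#find_proxy</code>, which is what <a href="https://github.com/ruby/net-http/">net-http</a> relies on to pick up proxies from the environment. A minimal sketch, where <code class="language-plaintext highlighter-rouge">proxy_for</code> is a hypothetical helper name and the proxy host is made up:</p>

```ruby
require "uri"
require "socket"
require "ipaddr"

# Returns the proxy URI to use for +url+, or nil for a direct connection.
# +env+ defaults to the process environment, but any Hash works.
def proxy_for(url, env = ENV)
  # find_proxy reads "<scheme>_proxy", skips loopback addresses,
  # and returns nil for hosts matched by "no_proxy".
  URI(url).find_proxy(env)
end

env = {
  "http_proxy" => "http://proxy.internal:3128",
  "no_proxy" => "10.0.0.5"
}

proxy_for("http://203.0.113.10/", env).to_s # => "http://proxy.internal:3128"
proxy_for("http://10.0.0.5/", env)          # => nil, listed in no_proxy
```

<p>Passing the environment as an argument, rather than reading <code class="language-plaintext highlighter-rouge">ENV</code> directly, keeps the behaviour easy to test.</p>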

<p>Still, it does show that, when it comes to the features one expects from an HTTP client, the alternatives cover enough of them not to be considered useless. The only option ticking all the boxes here is <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>, but then again, I selected the boxes, so I’d be interested to know whether you think this feature matrix is fair.</p>

<h3 id="extensions">Extensions</h3>

<p>An HTTP client is not an island. In most cases, it’s a really small part of a large program, and this program will have certain expectations of its dependencies. In the context of an HTTP client, it’ll probably not want to send real requests in test mode. Some metrics / tracing support is usually a must. Can it easily log request information? The answers to these questions may make or break a library’s chances of being adopted in a given project.</p>

<p>While there’s plenty of tooling available, the ruby community has been settling on a group of dependencies which provide these types of extensions on top of well-known libraries. For instance, there’s <a href="https://github.com/bblimke/webmock">webmock</a> or <a href="https://github.com/vcr/vcr">vcr</a> for mocking HTTP requests. Tracing is usually vendor-specific (<a href="https://github.com/DataDog/dd-trace-rb">the datadog SDK</a>, for instance, ships an API and shims for well-known libraries), although things are slowly getting more standardized thanks to the <a href="https://opentelemetry.io/">OpenTelemetry toolchain</a>. And there are several tools for logging HTTP information, <a href="https://github.com/trusche/httplog">httplog</a> being one of them.</p>

<p>Which HTTP clients these libraries choose to support depends on how standard or popular those clients are, how many users rely on them (and for how long), or how much community “weight” they command. It’s expected, for instance, that <a href="https://github.com/ruby/net-http/">net-http</a> is supported by all of the above (no matter how anti “built-for-extension” it is).</p>

<table>
  <thead>
    <tr>
      <th> </th>
      <th>httpx</th>
      <th>excon</th>
      <th>faraday</th>
      <th>HTTPrb</th>
      <th>httparty</th>
      <th>curb</th>
      <th>net-http</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>webmock</td>
      <td>✅</td>
      <td>✅</td>
      <td>🟠</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>vcr</td>
      <td>❌</td>
      <td>✅</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
      <td>✅ (partially)</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>datadog</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>🟠</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>opentelemetry</td>
      <td>❌</td>
      <td>✅</td>
      <td>✅</td>
      <td>✅</td>
      <td>🟠</td>
      <td>❌</td>
      <td>✅</td>
    </tr>
    <tr>
      <td>httplog</td>
      <td>❌</td>
      <td>✅</td>
      <td>🟠</td>
      <td>✅</td>
      <td>✅</td>
      <td>❌</td>
      <td>✅</td>
    </tr>
  </tbody>
</table>

<p>This list is not exhaustive, but it does show where more recent alternatives like <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> struggle: joining the group of “well-known” libraries is hard work, especially when the library was created post-2014, and missed the heyday when every exciting application on the internet was being built in ruby, and every option was getting a slice of the pie.</p>

<h2 id="what-sets-httpx-apart">What sets httpx apart</h2>

<p>So far, the focus of this analysis was to provide perspective, and a wider overview of how well the current well-maintained ruby HTTP clients cover a reasonable set of MUST HAVE and NICE TO HAVE features, enough at least to make this reading enjoyable.</p>

<p>Still, there are things that only <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> does, which you’ll never think about until things don’t work and you <strong>need them</strong>.</p>

<p>For instance, did you know that <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> is the only pure ruby (excluding <a href="https://curl.se/">curl</a>-based tools here) HTTP client (the only networking library, I think?) that does connection establishment using <a href="https://www.rfc-editor.org/rfc/rfc8305">Happy Eyeballs 2</a>? It will hardly be noticeable if production means “always on IPv4” server-side deployments, or if you don’t care whether the tool you’re using gives preference to IPv4 (<a href="https://github.com/excon/excon/issues/794">this is what excon does, by the way</a>) as long as it “just works”; until it doesn’t, and then you blame the server. It is certainly a SHOULD HAVE for client-side programs running on multi-homed networks where connectivity may not be properly configured, such as games, or even <code class="language-plaintext highlighter-rouge">bundle install</code> (in fact, it’s so important that <a href="https://github.com/rubygems/rubygems/commit/f1d27c9d1b04129c3a1c239b805155145a9928f4">bundler has its own monkey-patch around TCP connection establishment which half-implements Happy Eyeballs</a>).</p>
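<p>The core of Happy Eyeballs 2 is racing connection attempts: addresses are interleaved by family (IPv6 first), a new attempt starts every 250ms (the RFC’s recommended “Connection Attempt Delay”) or as soon as the previous one fails, and the first one to connect wins. Below is a simplified sketch using only ruby’s socket stdlib. It assumes the addresses were already resolved, skips the DNS-related parts of the RFC, and the <code class="language-plaintext highlighter-rouge">happy_connect</code>/<code class="language-plaintext highlighter-rouge">interleave</code> names are made up; it is not how httpx implements it:</p>

```ruby
require "socket"

# Interleaves address families, IPv6 first (RFC 8305, section 4).
def interleave(addrinfos)
  v6, v4 = addrinfos.partition(&:ipv6?)
  Array.new([v6.size, v4.size].max) { |i| [v6[i], v4[i]] }.flatten.compact
end

# Races TCP connection attempts: a new attempt starts every +delay+ seconds,
# or right away when the previous one fails. Returns the first connected
# socket and closes the losing attempts.
def happy_connect(addrinfos, delay: 0.25, timeout: 5)
  clock = -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) }
  deadline = clock.call + timeout
  pending = interleave(addrinfos)
  attempts = [] # sockets with a connection attempt in flight
  next_start = clock.call

  loop do
    if pending.any? && (attempts.empty? || clock.call >= next_start)
      addr = pending.shift
      sock = nil
      begin
        sock = Socket.new(addr.afamily, Socket::SOCK_STREAM)
        if sock.connect_nonblock(addr, exception: false) == :wait_writable
          attempts << sock
          next_start = clock.call + delay
        else # connected immediately
          attempts.each(&:close)
          return sock
        end
      rescue SystemCallError # e.g. no IPv6 support, or refused right away
        sock&.close
      end
      next
    end

    remaining = deadline - clock.call
    if attempts.empty? || remaining <= 0
      attempts.each(&:close)
      raise Errno::ETIMEDOUT if remaining <= 0
      raise Errno::ECONNREFUSED, "all connection attempts failed"
    end

    wait = pending.any? ? [next_start - clock.call, remaining].min : remaining
    _, writable, = IO.select(nil, attempts, nil, [wait, 0].max)
    Array(writable).each do |sock|
      attempts.delete(sock)
      if sock.getsockopt(Socket::SOL_SOCKET, Socket::SO_ERROR).int.zero?
        attempts.each(&:close) # cancel the slower attempts
        return sock
      end
      sock.close
      next_start = clock.call # a failed attempt lets the next one start now
    end
  end
end
```

<p>In a real client, the address list would come from something like <code class="language-plaintext highlighter-rouge">Addrinfo.getaddrinfo(host, port, nil, :STREAM)</code>; a host whose IPv6 route is broken then costs one connection attempt delay, instead of a full connect timeout.</p>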

<p>It also supports DNS resolution via <a href="https://developers.cloudflare.com/1.1.1.1/encryption/dns-over-https/">DoH</a>, a feature so hard to backport to existing networking tools that there are products (such as Cloudflare Zero Trust) which intercept local UDP/TCP-based DNS traffic through a program installed on your machine, and “translate” it into DoH-based DNS traffic. (<a href="https://curl.se/">curl</a> supports DoH, but <a href="https://github.com/taf2/curb">curb</a> does not seem to expose it).</p>
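<p>DoH itself is simple to describe: it’s the regular DNS wire format, carried over HTTPS (RFC 8484). A minimal sketch of how a GET query path is built with core ruby only; <code class="language-plaintext highlighter-rouge">dns_query</code> and <code class="language-plaintext highlighter-rouge">doh_get_path</code> are hypothetical helpers, and the expected value is the example request from RFC 8484 itself:</p>

```ruby
# Builds a wire-format DNS query (RFC 1035) with a single question.
# type 1 is an A record; id 0 is what RFC 8484 recommends, as it keeps
# identical queries cache-friendly.
def dns_query(name, type: 1, id: 0)
  # header: id, flags (RD, "recursion desired"), 1 question, 0 answer/authority/additional
  header = [id, 0x0100, 1, 0, 0, 0].pack("n6")
  # each label is prefixed by its length: "www.example.com" => 3www7example3com
  qname = name.split(".").map { |label| [label.bytesize, label].pack("Ca*") }.join
  header + qname + [0, type, 1].pack("Cn2") # root label, QTYPE, QCLASS (IN)
end

# RFC 8484 GET requests carry the message base64url-encoded, without padding,
# in the "dns" query parameter.
def doh_get_path(name, **opts)
  encoded = [dns_query(name, **opts)].pack("m0").tr("+/", "-_").delete("=")
  "/dns-query?dns=#{encoded}"
end

doh_get_path("www.example.com")
# => "/dns-query?dns=AAABAAABAAAAAAAAA3d3dwdleGFtcGxlA2NvbQAAAQAB"
```

<p>That path is then requested over HTTPS from a DoH server (Cloudflare’s lives at <code class="language-plaintext highlighter-rouge">https://cloudflare-dns.com/dns-query</code>) with an <code class="language-plaintext highlighter-rouge">accept: application/dns-message</code> header, and the binary response is parsed with the same wire format.</p>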

<p>The ability to perform concurrent requests, very useful for scraping scripts for example, is also not often found (<a href="https://github.com/typhoeus/typhoeus">typhoeus</a> provides something similar, albeit via a less user-friendly API, and so does <a href="https://taf2.github.io/curb/">curb via Curl::Multi</a>).</p>

<p>It ships <a href="https://honeyryderchuck.gitlab.io/httpx/wiki/GRPC">a plugin to perform GRPC requests</a>, in case you want to forego the heavy dependency that is the <a href="https://github.com/grpc/grpc">grpc</a> gem (over 100MB pre-compiled, and it can take GBs of space if you have to compile it), or are on JRuby. And another one <a href="https://honeyryderchuck.gitlab.io/httpx/wiki/WebDav">supporting WebDAV</a>.</p>

<p>Even something as simple as passing the IP address to use for a given request <strong>and</strong> which hostname to set in the SNI extension, or in the “Host” header, is practically impossible with any other library, <a href="https://honeyryderchuck.gitlab.io/httpx/wiki/TLS#sni">and dead easy with httpx</a>.</p>

<p>Bottom line, while most HTTP clients cover the 70% just fine, and 85% with a few adjustments, <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> works really hard at making the 99% of use-cases accessible.</p>

<p>(Speaking about coverage, <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a> publishes how much of the code is covered in CI. Good luck finding numbers for any of the others.)</p>

<h2 id="conclusion">Conclusion</h2>

<p>The main takeaway from this “state of the ruby HTTP clients” is that, even if the “HTTP fringe features” don’t interest you, and you only care about covering the 80%, you should <strong>choose a library which is still maintained</strong>. If you have a favourite library that wasn’t taken into account in this article, that’s probably why it isn’t here.</p>

<p>Beyond that, the choice will probably be based on prior experience, your risk appetite for “trying new toys”, and the requirements you favour the most, which I have (hopefully) outlined and analysed well. Whether it’s API UX, adoption rate, performance or anything else, any of these options will give you some level of acceptable quality.</p>

<p>And when in doubt, use <a href="https://honeyryderchuck.gitlab.io/httpx">httpx</a>. As it was shown, it stacks well against the competition in any available metric, and is working hard to curb the adoption gap. So help me change that :)</p>]]></content><author><name></name></author><summary type="html"><![CDATA[TL;DR most http clients you’ve been using since the ruby heyday are either broken, unmaintained, or stale, and you should be using httpx nowadays.]]></summary></entry><entry><title type="html">Introducing tobox</title><link href="honeyryderchuck.gitlab.io/2023/04/29/introducing-tobox.html" rel="alternate" type="text/html" title="Introducing tobox" /><published>2023-04-29T00:00:00+00:00</published><updated>2023-04-29T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2023/04/29/introducing-tobox</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2023/04/29/introducing-tobox.html"><![CDATA[<p><a href="https://gitlab.com/os85/tobox">tobox</a> is a framework-as-a-gem I’ve been developing over the last year, to solve a particular requirement: guarantee that callback/post-action tasks and emission of events resulting from a business transaction stored in the database happen 100% of the time.</p>

<p>In order to talk about its value, and defend some of the choices made, some background is required.</p>

<h2 id="context">Context</h2>

<p>For the problem of offloading processing resulting from a given business transaction, the ruby community defaults to using background jobs. Most of us have used <a href="https://github.com/sidekiq/sidekiq">sidekiq</a> at one point or another in the last 10 years, while the elders among us may also be familiar with <a href="https://github.com/resque/resque">resque</a> or <a href="https://github.com/collectiveidea/delayed_job">delayed_job</a>, and here’s an honourable mention to <a href="https://github.com/ruby-shoryuken/shoryuken">shoryuken</a>, as integration with SQS is something that every other framework lacks.</p>

<p>These frameworks have mostly commoditized the “how do I defer this business flow after another one completes, while not making the client wait for it to finish” problem for us all. They achieve this by providing some sort of simple DSL to delegate the execution of a routine, by serializing and writing the required state into some broker, only to have another process read and execute it:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">Foo</span>
  <span class="k">def</span> <span class="nf">activate</span>
    <span class="c1"># heavy duty</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># then</span>

<span class="n">foo</span><span class="p">.</span><span class="nf">async</span><span class="p">.</span><span class="nf">activate</span>

<span class="c1"># service object style, most common nowadays</span>
<span class="k">class</span> <span class="nc">ActivateJob</span> <span class="o">&lt;</span> <span class="no">SpecialFrameworkSubclass</span>
  <span class="k">def</span> <span class="nf">perform</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
    <span class="n">user</span><span class="p">.</span><span class="nf">activate</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="no">ActivateJob</span><span class="p">.</span><span class="nf">perform_async</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>

</code></pre></div></div>
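<p>Under the DSL, the mechanism is roughly the same everywhere: serialize the job class name and its arguments, push them into a broker, and have a worker pop, deserialize and execute them. A toy, single-process sketch of that flow, where a <code class="language-plaintext highlighter-rouge">Queue</code> stands in for the broker and the worker runs inline; <code class="language-plaintext highlighter-rouge">ToyJob</code>, <code class="language-plaintext highlighter-rouge">BROKER</code> and <code class="language-plaintext highlighter-rouge">work_off</code> are made up for illustration:</p>

```ruby
require "json"

BROKER = Queue.new # stands in for redis, SQS, a database table...

class ToyJob
  # client side: only serialized state crosses the process boundary
  def self.perform_async(*args)
    BROKER << JSON.generate({ "class" => name, "args" => args })
  end
end

class ActivateJob < ToyJob
  ACTIVATED = []

  def perform(user_id)
    ACTIVATED << user_id # the "heavy duty" goes here
  end
end

# worker side: deserialize, look up the job class, and run it
def work_off(broker)
  until broker.empty?
    payload = JSON.parse(broker.pop)
    Object.const_get(payload["class"]).new.perform(*payload["args"])
  end
end

ActivateJob.perform_async(42)
work_off(BROKER)
ActivateJob::ACTIVATED # => [42]
```

<p>Note that an id is passed instead of the user object: whatever gets enqueued has to survive serialization.</p>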

<p>The solution is fairly similar for all of them (they mostly “stole” features from each other), so they differentiate themselves on other aspects, such as performance of the execution model (process/thread based), choice of broker (database, redis, SQS, rabbitMQ…), or advanced features (plugins, retry configuration, on complete callbacks, etc…).</p>

<p>One problem that is common to all of them is that one needs to be aware of the storage and execution characteristics of the deferred routines, in order not to be surprised by some unexpected behavior. Argument serialization is one: while <a href="https://github.com/rails/globalid">rails provides a solution for serializing model instances for activejob</a>, most complex objects can’t be serialized, so documentation and FAQ sections will contain caveats and recommendations about which types of objects can be used. Primitive types tend to be supported; however, even simple objects such as symbols aren’t supported everywhere (as an example, <a href="https://github.com/sidekiq/sidekiq">sidekiq</a> only accepts primitives which can be serialized into JSON).</p>
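<p>A quick way to see why these caveats exist is to do to the arguments what a JSON-based broker does, i.e. a JSON round trip:</p>

```ruby
require "json"

# serialize on enqueue, parse back on execution
args = [:pending, { retries: 3 }]
round_trip = JSON.parse(JSON.generate(args))

round_trip[0]      # => "pending", the symbol came back as a string
round_trip[1].keys # => ["retries"], symbol keys came back as strings
```

<p>Code in the job expecting <code class="language-plaintext highlighter-rouge">:pending</code> or <code class="language-plaintext highlighter-rouge">opts[:retries]</code> silently gets something else.</p>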

<p>But the main problem that gets everyone at some point in their careers is when the state stored in the database <strong>before</strong> deferring a function is <strong>not</strong> available once it gets executed. In fact, <a href="https://github.com/sidekiq/sidekiq/wiki/FAQ#why-am-i-seeing-a-lot-of-cant-find-modelname-with-id12345-errors-with-sidekiq">one of the oldest entries in the sidekiq wiki FAQ</a> contains the following:</p>

<blockquote>
  <p>Why am I seeing a lot of “Can’t find ModelName with ID=12345” errors with Sidekiq?</p>
</blockquote>

<blockquote>
  <p>Your client is creating the Model instance within a transaction and pushing a job to Sidekiq. Sidekiq is trying to execute your job before the transaction has actually committed. Use Rails’s after_commit :on =&gt; :create hook or move the job creation outside of the transaction block.</p>
</blockquote>

<h3 id="database-transactions">Database transactions</h3>

<p>Most rubyists building web services are using an <a href="https://en.wikipedia.org/wiki/ACID">ACID-compliant</a> database, usually via their favorite ORM; mine is <a href="https://github.com/jeremyevans/sequel">sequel</a>, but the majority probably knows <a href="https://guides.rubyonrails.org/active_record_basics.html">activerecord</a> best. For the context of this post, the most important property of the ACID family is <em>Atomicity</em>, which ensures that either all operations from a group complete, or none of them do. This covers errors in the operations themselves, but also “out of our control” events such as power outages or computer crashes. This is achieved by wrapping this group of operations (or SQL statements) in a database transaction:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">BEGIN</span><span class="p">;</span> <span class="c1">-- transaction starts here</span>
<span class="c1">-- UPDATE / INSERT / DELETE statements here</span>
<span class="k">COMMIT</span><span class="p">;</span> <span class="c1">-- or ROLLBACK; transaction ends here</span>
</code></pre></div></div>

<p>A transaction is a first-class citizen of your business logic, as it has to be explicitly started and finished. Ruby ORMs usually expose block-based DSLs to manage transactions:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># using sequel</span>
<span class="no">DB</span><span class="p">.</span><span class="nf">transaction</span> <span class="k">do</span> <span class="c1"># BEGIN</span>
  <span class="no">DB</span><span class="p">[</span><span class="ss">:foo</span><span class="p">].</span><span class="nf">insert</span><span class="p">(</span><span class="ss">bar: </span><span class="mi">1</span><span class="p">)</span> <span class="c1"># INSERT</span>
<span class="k">end</span> <span class="c1"># COMMIT; ROLLBACK if an error is raise inside the block</span>

<span class="c1"># using activerecord</span>
<span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">transaction</span> <span class="k">do</span>
  <span class="no">Foo</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">bar: </span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<p>Transactions are also managed implicitly by other features, such as model callbacks, and one has to be aware of that when using deferred routines:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">User</span>
  <span class="n">after_save</span> <span class="ss">:activate</span>

  <span class="k">def</span> <span class="nf">activate</span>
    <span class="no">ActivateJob</span><span class="p">.</span><span class="nf">perform_async</span><span class="p">(</span><span class="nb">self</span><span class="p">)</span>
    <span class="c1"># TRANSACTION DID NOT COMMIT YET HERE!</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>(The above is fine if you’re using <code class="language-plaintext highlighter-rouge">delayed_job</code>, as the broker is your database; not as fine if you’re using <code class="language-plaintext highlighter-rouge">sidekiq</code> or <code class="language-plaintext highlighter-rouge">shoryuken</code> though.)</p>
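<p>The <code class="language-plaintext highlighter-rouge">after_commit</code> fix recommended by the sidekiq FAQ works by collecting deferred calls during the transaction, and only running them once it commits (discarding them if it rolls back). A toy sketch of those semantics in plain ruby; <code class="language-plaintext highlighter-rouge">ToyTransaction</code> is made up, and this is not how activerecord (or sequel, which has a similar <code class="language-plaintext highlighter-rouge">after_commit</code> hook) implements it:</p>

```ruby
# Toy sketch of after_commit-style semantics: callbacks registered during
# the transaction only run after COMMIT, and are dropped on ROLLBACK.
class ToyTransaction
  attr_reader :state

  def self.run
    tx = new
    begin
      yield tx
    rescue StandardError
      tx.rollback
      raise
    end
    tx.commit
  end

  def initialize
    @state = :open
    @callbacks = []
  end

  def after_commit(&callback)
    @callbacks << callback
  end

  def commit
    @state = :committed
    # a crash right here would mean committed data with no job enqueued
    @callbacks.each(&:call)
  end

  def rollback
    @state = :rolled_back
    @callbacks.clear
  end
end

enqueued = []

ToyTransaction.run do |tx|
  tx.after_commit { enqueued << [:activate, tx.state] }
end

begin
  ToyTransaction.run do |tx|
    tx.after_commit { enqueued << [:never, tx.state] }
    raise "validation failed"
  end
rescue RuntimeError
  # the transaction rolled back, so its callback never ran
end

enqueued # => [[:activate, :committed]]
```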

<p>And then there are some other 3rd party gems which hide these calls under layers of DSL (looking at you, state machine gems). Given all the options available, and how convenient these deferred DSLs seem, it’s no wonder that, when using them, one is either oblivious to, or lost about, whether a transaction is open. Especially if this feature needs to be shipped by next Friday.</p>

<p>And if you deferred a function before a transaction is committed, and the function needs the state you’re writing into the database, and that transaction either fails or takes too long to commit, you’ll find yourself staring at an FAQ entry similar to the one I shared above.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># service object style, most common nowadays</span>
<span class="k">class</span> <span class="nc">ActivateJob</span> <span class="o">&lt;</span> <span class="no">SpecialFrameworkSubclass</span>
  <span class="k">def</span> <span class="nf">perform</span><span class="p">(</span><span class="n">user</span><span class="p">)</span>
    <span class="n">user</span><span class="p">.</span><span class="nf">activate</span> <span class="c1">#=&gt; Exception raised, RecordNotFound</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>But let’s say you lived to fight another day, you learned your lesson, untangled that 3rd party code you don’t own, and now you’re sure that the deferred function call happens after the transaction successfully commits. Problem solved, right?</p>

<h3 id="storagebroker-consistency">Storage/Broker consistency</h3>

<p>So you’re committing a database transaction to fully store the state of your business transaction, and then you’re invoking the “defer function” routine, which will push the serialized state into your broker:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">foo</span> <span class="o">=</span> <span class="no">ActiveRecord</span><span class="o">::</span><span class="no">Base</span><span class="p">.</span><span class="nf">transaction</span> <span class="k">do</span>
  <span class="no">Foo</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">bar: </span><span class="mi">1</span><span class="p">)</span>
<span class="k">end</span>
<span class="no">ActivateJob</span><span class="p">.</span><span class="nf">perform_async</span><span class="p">(</span><span class="n">foo</span><span class="p">)</span>
</code></pre></div></div>

<p>What if there’s an outage between the transaction committing <strong>and</strong> the job being enqueued? It’s terrible, given that your “jobs to be done” will probably be silently lost.</p>

<p>Such a conundrum is only possible to avoid if the database and the broker are protected by the same transaction guarantees, i.e. if the broker is the same database where your business resources are stored. From the background job alternatives mentioned above, only <a href="https://github.com/collectiveidea/delayed_job">delayed_job</a> fits the bill, given that the queue is a database table. Everything else (yes, including <a href="https://github.com/sidekiq/sidekiq">sidekiq</a>) is vulnerable to this problem.</p>

<p>This has been discussed at length in <a href="https://brandur.org/job-drain">this 2017 blog post</a>.</p>

<h2 id="transactional-outbox-pattern">Transactional outbox pattern</h2>

<p>While the description of the problem above mostly focuses on the background job use-case of ruby frameworks, the same type of problem happens if your business transaction requires performing some RPC call (HTTP, GRPC) to a separate system, which happens a lot if you’re using microservices.</p>

<p>A solution for this general problem was formalized in the <a href="https://microservices.io/patterns/data/transactional-outbox.html">transactional outbox pattern</a>. The gist of it is, business transactions store their “events to be emitted” in a separate database table (typically called “outbox”) <strong>within the same database transaction</strong>. This in itself ensures that the events associated with the business resources will always be stored <strong>if</strong> the resources are stored successfully. Then a separate worker (a thread in the same process, a separate process…) reads entries from the “outbox” table, and does the actual publishing of the event (or enqueuing of the job) before deleting the entry.</p>
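<p>A toy, in-memory sketch of the pattern’s two halves, with a made-up <code class="language-plaintext highlighter-rouge">ToyDB</code> standing in for a real database (in practice the “transaction” is a real database transaction, and the relay a separate thread or process):</p>

```ruby
# Toy in-memory "database": a transaction stages writes into every table
# and applies them all at once if the block succeeds, or none if it raises.
class ToyDB
  attr_reader :tables

  def initialize
    @tables = Hash.new { |hash, table| hash[table] = [] }
    @mutex = Mutex.new
  end

  def transaction
    staged = Hash.new { |hash, table| hash[table] = [] }
    yield staged # an exception here discards everything staged
    @mutex.synchronize do
      staged.each { |table, rows| @tables[table].concat(rows) }
    end
  end
end

db = ToyDB.new

# the business row and its event are written in the same transaction...
db.transaction do |tx|
  tx[:orders] << { id: 1, status: "processed" }
  tx[:outbox] << { type: "order_processed", data_after: { id: 1 } }
end

# ...so a failed business transaction leaves no orphan event behind
begin
  db.transaction do |tx|
    tx[:orders] << { id: 2, status: "processed" }
    tx[:outbox] << { type: "order_processed", data_after: { id: 2 } }
    raise "payment gateway timeout"
  end
rescue RuntimeError
end

# the relay publishes each event first, and only then deletes it
def relay(db)
  published = []
  while (event = db.tables[:outbox].first)
    published << event[:data_after][:id] # e.g. publish to SNS, or enqueue a job
    db.tables[:outbox].shift
  end
  published
end

published = relay(db) # => [1]
```

<p>Since an event is only deleted after it was published, a crash in between means it gets published again once the relay restarts: the pattern gives at-least-once delivery, so consumers should be idempotent.</p>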

<h2 id="tobox">tobox</h2>

<p>So what is <a href="https://gitlab.com/os85/tobox">tobox</a> again? In a nutshell, it’s a “transactional outbox” framework.</p>

<p>I built it because I needed its properties, and I couldn’t find a transactional outbox implementation for any programming language, just blog posts on how to hypothetically do your own.</p>

<p>The DSL is declarative and “event-based”, which means that one can register handlers bound to specific events:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># this is the config DSL</span>
<span class="c1"># tobox.rb</span>
<span class="n">on</span><span class="p">(</span><span class="s2">"order_processed"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">event</span><span class="o">|</span>
  <span class="no">Payment</span><span class="o">::</span><span class="no">Start</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">on</span><span class="p">(</span><span class="s2">"order_cancelled"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">event</span><span class="o">|</span>
  <span class="no">CustomerSupport</span><span class="o">::</span><span class="no">Notify</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
<span class="k">end</span>

<span class="c1"># if handling multiple events</span>
<span class="n">on</span><span class="p">(</span><span class="s2">"order_processed"</span><span class="p">,</span> <span class="s2">"order_cancelled"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">event</span><span class="o">|</span>
  <span class="no">Logs</span><span class="o">::</span><span class="no">Order</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
<span class="k">end</span>

<span class="c1">### app/handlers/payment_start.rb</span>
<span class="k">module</span> <span class="nn">Payment::Start</span>
  <span class="kp">module_function</span>

  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">event</span><span class="p">[</span><span class="ss">:after</span><span class="p">]</span>
    <span class="c1"># do something with the event data hash, perhaps enqueue it as a background job?</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
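<p>Conceptually, behind a DSL like the one above sits a registry mapping event types to handlers. A minimal sketch of that idea, not tobox’s actual internals; <code class="language-plaintext highlighter-rouge">HandlerRegistry</code> and <code class="language-plaintext highlighter-rouge">dispatch</code> are hypothetical names:</p>

```ruby
# Hypothetical event-type => handlers registry, in the spirit of on(...)
class HandlerRegistry
  def initialize
    @handlers = Hash.new { |hash, type| hash[type] = [] }
  end

  # registers the block for each of the given event types
  def on(*event_types, &handler)
    event_types.each { |type| @handlers[type] << handler }
  end

  # calls every handler registered for the event's type, in registration order
  def dispatch(event)
    @handlers.fetch(event[:type], []).each { |handler| handler.call(event) }
  end
end

calls = []
registry = HandlerRegistry.new
registry.on("order_processed") { |event| calls << [:payment, event[:after][:id]] }
registry.on("order_processed", "order_cancelled") { |event| calls << [:log, event[:type]] }

registry.dispatch({ type: "order_processed", after: { id: 1 } })
registry.dispatch({ type: "order_cancelled", after: { id: 2 } })
calls # => [[:payment, 1], [:log, "order_processed"], [:log, "order_cancelled"]]
```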

<p>An entry point script is also provided, to start a separate process to consume events from the <code class="language-plaintext highlighter-rouge">outbox</code> table:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> tobox <span class="nt">-r</span> ./app.rb —config tobox.rb

<span class="c"># if you’re using rails</span>
<span class="o">&gt;</span> tobox <span class="nt">-r</span> ./config/environment.rb —config tobox-dsl.rb
</code></pre></div></div>

<p>In the process, it handles the complexity of the “plumbing” involved in building a transactional outbox consumer, using a set of conventions and tricks:</p>

<h3 id="thread-and-fiber-based-worker-pools">Thread and Fiber-based worker pools</h3>

<p><a href="https://gitlab.com/os85/tobox">tobox</a>, by default, uses threads to handle many events at the same time in the same process, <a href="https://github.com/sidekiq/sidekiq">just like sidekiq’s</a>. You can tweak the number of threads in the config:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># tobox.rb</span>
<span class="n">concurrency</span> <span class="mi">25</span>
</code></pre></div></div>
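<p>The general model behind such a <code class="language-plaintext highlighter-rouge">concurrency</code> setting is a fixed-size pool of threads draining a shared queue. A minimal sketch of that model (<code class="language-plaintext highlighter-rouge">run_pool</code> is a hypothetical name, and this is not tobox’s actual implementation):</p>

```ruby
# Fixed-size thread pool draining a shared queue.
def run_pool(events, concurrency:, &handler)
  queue = Queue.new
  events.each { |event| queue << event }
  queue.close # pop returns nil once the closed queue is drained

  results = Queue.new
  workers = Array.new(concurrency) do
    Thread.new do
      while (event = queue.pop)
        results << handler.call(event)
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

doubled = run_pool(1..10, concurrency: 3) { |n| n * 2 }
doubled.sort # => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

<p>Since ruby threads release the GVL while waiting on IO, this model pays off for IO-bound handlers, which is what relaying events mostly is.</p>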

<p>You can, however, switch to using fibers instead of threads, if your event handling is very IO-bound (if you’re just relaying the events to SNS, it is):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># tobox.rb</span>
<span class="n">worker</span> <span class="ss">:fiber</span> <span class="c1"># :thread by default</span>
<span class="n">concurrency</span> <span class="mi">100</span> <span class="c1"># max 100 fibers running at the same time</span>
</code></pre></div></div>

<p>(This requires using the <a href="https://github.com/bruno-/fiber_scheduler">fiber_scheduler</a> gem).</p>

<h3 id="skip-locked">SKIP LOCKED</h3>

<p>When enabling multiple consumers for a given queue, one has to have the guarantee that a given event won’t be processed more than once by separate workers at the same time. One way to achieve that using the database is by locking the row where the event is stored, and deleting it after it has been handled. However, if two workers try to lock the same row, one of them will block waiting for the lock to be released, instead of picking up the next available event.</p>

<p>While the database row-level locking model wasn’t built to support the queue use-case, some recent features were added to some of the most popular database engines to accommodate it. One of these features is <a href="https://www.postgresql.org/docs/current/sql-select.html">the SKIP LOCKED clause</a>, a non-standard SQL clause which can be used with <code class="language-plaintext highlighter-rouge">SELECT … FOR UPDATE</code>, and will result in already locked rows being ignored (“skipped”) by the <code class="language-plaintext highlighter-rouge">SELECT</code> statement.</p>
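<p>To make the semantics concrete, here’s an in-memory analogy (not SQL, and not how any database implements row locks): each worker atomically claims the first row nobody holds, skipping locked rows rather than waiting on them. <code class="language-plaintext highlighter-rouge">ToyOutbox</code> and its methods are hypothetical:</p>

```ruby
# In-memory analogy of
#   SELECT * FROM outbox ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED
# each claim takes the first row no one else holds, skipping locked rows
# instead of waiting for them to be released.
class ToyOutbox
  Row = Struct.new(:id, :locked_by)

  def initialize(ids)
    @rows = ids.map { |id| Row.new(id, nil) }
    @mutex = Mutex.new # makes each claim/delete atomic, as the database would
  end

  def claim(worker)
    @mutex.synchronize do
      row = @rows.find { |r| r.locked_by.nil? }
      row.locked_by = worker if row
      row
    end
  end

  # the handler succeeded: delete the row, which also releases its lock
  def delete(row)
    @mutex.synchronize { @rows.delete(row) }
  end
end

outbox = ToyOutbox.new([1, 2, 3])
a = outbox.claim(:worker_a) # row 1
b = outbox.claim(:worker_b) # row 2: row 1 is skipped, not waited on
outbox.delete(a)
c = outbox.claim(:worker_a) # row 3
outbox.claim(:worker_c)     # => nil, every remaining row is locked
```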

<p>This feature is <a href="https://gitlab.com/os85/tobox#requirements">core to how tobox works, which is why it only supports databases including the <code class="language-plaintext highlighter-rouge">SKIP LOCKED</code> feature</a>.</p>

<p>(Supporting this many databases is only possible thanks to the <a href="https://github.com/jeremyevans/sequel">sequel gem</a>, by the way).</p>

<h3 id="plugin-dsl">Plugin DSL</h3>

<p><code class="language-plaintext highlighter-rouge">tobox</code> ships with a simple plugin system which supports intercepting handlers before and after they’re handled (or error out). It’s the foundation of a few plugins which already ship with the gem:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># tobox.rb</span>
<span class="n">plugin</span><span class="p">(</span><span class="ss">:zeitwerk</span><span class="p">)</span>
<span class="n">plugin</span><span class="p">(</span><span class="ss">:datadog</span><span class="p">)</span>
<span class="n">plugin</span><span class="p">(</span><span class="ss">:sentry</span><span class="p">)</span>
</code></pre></div></div>
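<p>Such a plugin system essentially wraps each handler call with hooks: an error hook could report to sentry, and before/after hooks could open and close a tracing span. A hypothetical sketch of that mechanism; the <code class="language-plaintext highlighter-rouge">HookedHandler</code> class and its <code class="language-plaintext highlighter-rouge">hook</code> method are made up, and this is not tobox’s plugin API:</p>

```ruby
# Wraps an event handler with before/after/error hooks.
class HookedHandler
  def initialize(&handler)
    @handler = handler
    @hooks = { before: [], after: [], error: [] }
  end

  # registers a hook for :before, :after or :error; returns self for chaining
  def hook(kind, &block)
    @hooks.fetch(kind) << block
    self
  end

  def call(event)
    @hooks[:before].each { |hook| hook.call(event) }
    result = @handler.call(event)
    @hooks[:after].each { |hook| hook.call(event) }
    result
  rescue StandardError => e
    # runs when the handler (or a hook) raises, then re-raises the error
    @hooks[:error].each { |hook| hook.call(event, e) }
    raise
  end
end

log = []
handler = HookedHandler.new { |event| raise "boom" if event[:type] == "bad" }
handler
  .hook(:before) { |event| log << "start #{event[:type]}" }
  .hook(:after) { |event| log << "done #{event[:type]}" }
  .hook(:error) { |event, error| log << "#{event[:type]} failed: #{error.message}" }

handler.call({ type: "good" })
begin
  handler.call({ type: "bad" })
rescue RuntimeError
end

log # => ["start good", "done good", "start bad", "bad failed: boom"]
```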

<h3 id="multilang-support">multilang support</h3>

<p>So far, I haven’t shown how to insert events into the queue. That’s because, for now, <code class="language-plaintext highlighter-rouge">tobox</code> does not provide any DSL for it. The reason is that working with database objects is probably already such a big part of your day-to-day work that moving that concern into a 3rd party gem may end up having more drawbacks than benefits. Moreover, this way it’s perhaps clearer that you can use a transactional outbox <strong>even if your application is not written in ruby</strong>.</p>

<p>For instance, here are several examples of how to write an event into the outbox:</p>

<h4 id="ruby">ruby</h4>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># using sequel dataset API</span>
<span class="no">DB</span><span class="p">[</span><span class="ss">:outbox</span><span class="p">].</span><span class="nf">insert</span><span class="p">(</span><span class="ss">type: </span><span class="s2">"order_created"</span><span class="p">,</span> <span class="ss">data_after: </span><span class="n">to_json</span><span class="p">(</span><span class="n">order</span><span class="p">))</span>
<span class="c1"># or an ActiveRecord model</span>
<span class="no">OutboxEvent</span><span class="p">.</span><span class="nf">create</span><span class="p">(</span><span class="ss">type: </span><span class="s2">"order_created"</span><span class="p">,</span> <span class="ss">data_after: </span><span class="n">to_json</span><span class="p">(</span><span class="n">order</span><span class="p">))</span>
</code></pre></div></div>

<h4 id="python">python</h4>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># SQLAlchemy
</span><span class="n">event</span> <span class="o">=</span> <span class="nc">OutboxEvent</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="sh">"</span><span class="s">order_created</span><span class="sh">"</span><span class="p">,</span> <span class="n">data_after</span><span class="o">=</span><span class="nf">to_json</span><span class="p">(</span><span class="n">order</span><span class="p">))</span>
<span class="n">db</span><span class="p">.</span><span class="n">session</span><span class="p">.</span><span class="nf">add</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
</code></pre></div></div>

<h4 id="elixir">elixir</h4>

<div class="language-elixir highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">OutboxRepo</span><span class="o">.</span><span class="n">insert</span><span class="p">(%</span><span class="no">OutboxEvent</span><span class="p">{</span>
  <span class="ss">type:</span> <span class="s2">"order_created"</span><span class="p">,</span>
  <span class="ss">data_after:</span> <span class="n">to_json</span><span class="p">(</span><span class="n">order</span><span class="p">)</span>
<span class="p">})</span>
</code></pre></div></div>

<h4 id="database-triggers">database triggers</h4>

<p>If it fits your use case, you can also “go implicit”. One way to do that is with database triggers, as in this postgresql example:</p>

<div class="language-plsql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">CREATE</span> <span class="kr">OR</span> <span class="kr">REPLACE</span> <span class="kr">FUNCTION</span> <span class="n">order_created_outbox_event</span><span class="p">()</span>
  <span class="n">RETURNS</span> <span class="kr">TRIGGER</span>
  <span class="k">LANGUAGE</span> <span class="n">PLPGSQL</span>
  <span class="kr">AS</span>
<span class="err">$$</span>
<span class="k">BEGIN</span>
	<span class="kr">INSERT</span> <span class="kr">INTO</span> <span class="n">outbox</span><span class="p">(</span><span class="n">type</span><span class="p">,</span> <span class="n">data_after</span><span class="p">)</span>
		 <span class="kr">VALUES</span><span class="p">(</span><span class="o">'</span><span class="s1">order_created</span><span class="o">'</span><span class="p">,</span> <span class="n">row_to_json</span><span class="p">(</span><span class="k">NEW</span><span class="p">));</span>
	<span class="k">RETURN</span> <span class="k">NEW</span><span class="p">;</span>
<span class="k">END</span><span class="p">;</span>
<span class="err">$$</span><span class="p">;</span>

<span class="kr">CREATE</span> <span class="kr">TRIGGER</span> <span class="n">order_created_outbox_event</span>
  <span class="k">AFTER</span> <span class="kr">INSERT</span>
  <span class="kr">ON</span> <span class="n">orders</span>
  <span class="kr">FOR</span> <span class="k">EACH</span> <span class="k">ROW</span>
  <span class="k">EXECUTE</span> <span class="k">PROCEDURE</span> <span class="n">order_created_outbox_event</span><span class="p">();</span>
</code></pre></div></div>

<h2 id="conclusion">Conclusion</h2>

<p><code class="language-plaintext highlighter-rouge">tobox</code> is a lightweight tool that you can use to ensure robustness and guarantee at-least-once semantics in your business workflows with little to no performance impact. It is not a silver bullet, though: it trades off some end-to-end latency (the extra step of writing the event to, and reading it from, the database) to achieve that robustness.</p>

<p>While it may “quack” like a background job framework, it is not designed to be one. Its features (do check the <a href="https://gitlab.com/os85/tobox/-/blob/master/README.md">README</a>) are more focused on the transactional outbox use-case, so if you require background job features, you should use <code class="language-plaintext highlighter-rouge">tobox</code> alongside such a framework.</p>

<p>The declarative DSL is a departure from the current “standard” for background jobs: IMO it is leaner, and it eliminates the antipattern of creating a job class only to call some other service object in its <code class="language-plaintext highlighter-rouge">#perform</code> method.</p>

<p>Some edges are still rough, and some features are still missing (no web dashboard yet, for example). But it already does “one thing well”, so that’s the 80% right there.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[tobox is a framework-as-a-gem I’ve been developing over the last year, to solve a particular requirement: guarantee that callback/post-action tasks and emission of events resulting from a business transaction stored in the database happen 100% of the time.]]></summary></entry><entry><title type="html">Aggregating data for analytics using postgresql, ruby and sequel</title><link href="honeyryderchuck.gitlab.io/2022/12/19/aggregating-data-for-analytics-using-postgresql-ruby-and-sequel.html" rel="alternate" type="text/html" title="Aggregating data for analytics using postgresql, ruby and sequel" /><published>2022-12-19T00:00:00+00:00</published><updated>2022-12-19T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2022/12/19/aggregating-data-for-analytics-using-postgresql-ruby-and-sequel</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2022/12/19/aggregating-data-for-analytics-using-postgresql-ruby-and-sequel.html"><![CDATA[<p>At my dayjob, I’ve spent most of this year of our lord 2022 working in a team taking this new flagship product from “alpha” to “general availability”. At such an early stage, you don’t know a lot of things: what your users want, how they will use the platform (or how you want them to use it), whether the thing you’re building is as valuable as you think it is. At this stage, the most important skill a team building and maintaining a product can have is the ability to ship features quickly; the sooner you know what “sticks” with your userbase, the sooner you’ll know how worthwhile it will be to improve it, whether to “pivot” to something else, or whether you’re better off throwing it all away.</p>

<p>How you get to “quickly” is usually a combination of a few factors: team skill, project scope, and a healthy dose of pragmatism. Shipping quickly means refusing perfect solutions; it means focusing on “0 to 1” before you consider “1 to 100”; it means “good enough” first. And you have to be comfortable absorbing the right amount of “tech debt”. Why the right amount? Because there’s a very big chance that the thing you’re building quickly now is going to be the thing you’ll be maintaining in 1 year (no matter how much product calls it a “throw-away POC”, or how often your engineering manager tells you that you will be able to rewrite it in kubernetes when it “wins”). Given enough experience, you learn how to compartmentalize debt so that it doesn’t “leak” too much into other sections of the codebase. You learn how to leverage these limitations, and reuse them to solve other problems. You learn how to do some forecasting, and ask yourself questions such as “can the way this feature was architected survive the next 3 months, or the next 3 years? How long can this reasonably hold in production until something entirely different needs to be considered?”.</p>

<p>After the initial core features, the next thing we knew would be very valuable for our customer base was customer-facing analytics dashboards. The product team wasn’t sure exactly what those would look like, and constantly debated how they could “learn from our users quickly”. A team of engineers was assigned the task of scoping the technical aspects of the project, and they went about designing a full-fledged “analytics pipeline”, with some of the most “state-of-the-art” technologies, such as <a href="https://spark.apache.org/">Spark</a>, <a href="https://flink.apache.org/">flink</a> or <a href="https://openwhisk.apache.org/">openwhisk</a>, to solve not only the immediate product’s analytics needs, but also to aggregate analytics data for all of the company’s other products and services. With that, the scope only grew, and needless to say, none of this was going to be ready in 2 weeks. Or 2 months. No “quickly”.</p>

<p>The plan of building an analytics pipeline could have other theoretical compound benefits for the company, but it’d take more than 6 months to ship. That’s quite risky, considering company priorities change all the time, and in times of economic uncertainty (hello 2023!), long-running costly projects are quickly thrown into the doghouse when they do not generate immediate revenue, and kept there until the good times roll again. So it could take years, in reality.</p>

<p>I proposed a shortcut instead: we could provide a couple of API endpoints for querying data aggregations, around which some dashboards could be built; the analytics team could then focus on building those dashboards immediately, while designing the long-term analytics pipeline for when this proposed solution would no longer scale. When I mentioned this could be shipped in about 2 weeks, everyone was sold on the idea.</p>

<p>So the question was, how could we deliver something in 2 weeks, that would not simply fall off the rails in 2 months, and could potentially still be operational 2 years from now, at a reasonable scale, if need be?</p>

<h2 id="data">Data</h2>

<p>Without going too much into detail, the product revolves around allowing customers to define user journeys, and their users to run them. This is what an over-simplification of the database would look like:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">definitions</span> <span class="p">(</span>
    <span class="n">id</span> <span class="n">UUID</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="k">DEFAULT</span> <span class="n">gen_random_uuid</span><span class="p">(),</span>
    <span class="n">client_id</span> <span class="n">UUID</span><span class="p">,</span>
    <span class="c1">-- ...</span>
<span class="p">);</span>

<span class="k">CREATE</span> <span class="k">TABLE</span> <span class="n">journeys</span> <span class="p">(</span>
    <span class="n">id</span> <span class="n">UUID</span> <span class="k">PRIMARY</span> <span class="k">KEY</span> <span class="k">DEFAULT</span> <span class="n">gen_random_uuid</span><span class="p">(),</span>
    <span class="n">client_id</span> <span class="n">UUID</span><span class="p">,</span>
    <span class="n">client_user_id</span> <span class="n">UUID</span><span class="p">,</span>
    <span class="n">definition_id</span> <span class="n">UUID</span><span class="p">,</span>
    <span class="n">status</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">255</span><span class="p">),</span> <span class="c1">-- "pass", "fail", "review"</span>
    <span class="n">error_code</span> <span class="nb">varchar</span><span class="p">(</span><span class="mi">255</span><span class="p">),</span> <span class="c1">-- "network_error", "file_error" ...</span>
    <span class="n">created_at</span> <span class="nb">TIMESTAMP</span> <span class="k">WITHOUT</span> <span class="nb">TIME</span> <span class="k">ZONE</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="n">NOW</span><span class="p">(),</span>
    <span class="n">updated_at</span> <span class="nb">TIMESTAMP</span> <span class="k">WITHOUT</span> <span class="nb">TIME</span> <span class="k">ZONE</span> <span class="k">NOT</span> <span class="k">NULL</span> <span class="k">DEFAULT</span> <span class="n">NOW</span><span class="p">(),</span>
    <span class="k">CONSTRAINT</span> <span class="n">fk_definition</span> <span class="k">FOREIGN</span> <span class="k">KEY</span><span class="p">(</span><span class="n">definition_id</span><span class="p">)</span> <span class="k">REFERENCES</span> <span class="n">definitions</span><span class="p">(</span><span class="n">id</span><span class="p">)</span>
<span class="p">);</span>
</code></pre></div></div>

<p>The initial requirements revolved around fetching, for a given time interval, metrics such as how many journeys were in progress, the percentage of completed/cancelled journeys, and the average, max or median duration. The data could be served either fully aggregated over the interval, or split as a time series (initially, by day).</p>
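
<p>As a reference for what those metrics mean, here is a minimal plain-ruby sketch that computes them for a single day from an in-memory list of journeys. The row shape and helper names are hypothetical, and it assumes, like the aggregation handlers further down, that journeys finished with an error code count as cancelled and are left out of the duration metrics:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Reference definitions only: computes one day's metrics from an
# in-memory list of journeys. Hypothetical row shape:
#   { finished: true/false, error_code: String or nil, duration: Integer (seconds) or nil }
# Journeys finished with an error_code count as cancelled and are
# left out of the duration metrics.
def daily_metrics(journeys)
  finished  = journeys.select { |j| j[:finished] }
  cancelled = finished.select { |j| j[:error_code] }
  completed = finished.reject { |j| j[:error_code] }
  durations = completed.map { |j| j[:duration] }.sort

  {
    in_progress: journeys.size - finished.size,
    completed_pct: percentage(completed.size, journeys.size),
    cancelled_pct: percentage(cancelled.size, journeys.size),
    min_duration: durations.min,
    max_duration: durations.max,
    # integer seconds, rounded down (like the integer columns used later)
    avg_duration: durations.empty? ? nil : durations.sum / durations.size,
    median_duration: median(durations)
  }
end

def percentage(part, total)
  total.zero? ? 0.0 : (part * 100.0 / total).round(2)
end

# Median of an already sorted array; nil when it is empty.
def median(sorted)
  return nil if sorted.empty?

  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end
</code></pre></div></div>

<p>Computed this way, every request would scan all raw rows in the interval, which is precisely the cost that pre-aggregation avoids.</p>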

<h2 id="planning">Planning</h2>

<p>Querying the product data tables directly was not an option: the required queries would be very complex and need all kinds of indexes, which would hurt write performance; and even then, the data volume of certain client accounts could render all those optimizations useless, even more so as the requested time intervals stretched further into the past.</p>

<p>We decided it’d be better to pre-aggregate data in a separate table. It’d be aggregated by day, as that would be the smallest unit of the requested time ranges (“today”, “last 15 days”…). And since we were using Postgresql, knowing we could fall back to <a href="https://www.postgresql.org/docs/current/ddl-partitioning.html">table partitioning</a> as time passed and data grew gave us enough confidence that this solution could scale well, in case something better never came along.</p>
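
<p>Bucketing by day also keeps the read side simple: a requested interval boils down to a range of UTC dates to select from the daily table. A minimal sketch using only ruby’s standard library; the helper names and the accepted interval strings are hypothetical, not the actual API:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "date"
require "time"

# Bucket a timestamp into the UTC day it belongs to.
def utc_day(time)
  time.getutc.to_date
end

# Expand a named interval into an inclusive range of UTC days.
# The interval names are hypothetical, and "last N days" is assumed
# to include the current day.
def requested_days(interval, today: utc_day(Time.now))
  case interval
  when "today"
    today..today
  when /\Alast_(\d+)_days\z/
    (today - (Regexp.last_match(1).to_i - 1))..today
  else
    raise ArgumentError, "unknown interval: #{interval}"
  end
end
</code></pre></div></div>

<p>For example, <code class="language-plaintext highlighter-rouge">requested_days("last_15_days", today: Date.new(2022, 12, 24))</code> covers the 15 days from 2022-12-10 to 2022-12-24.</p>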

<p>So there were two things to be done: aggregate data, and serve it via the API.</p>

<h2 id="aggregating">Aggregating</h2>

<p>Aggregation was to be done on the fly. While bulk-aggregating data in cron jobs was certainly possible, we wanted to serve data as fresh as possible, as the default time interval would be the “current day”. UTC was to be used everywhere.</p>

<p>This was to be done using two pieces: <a href="https://gitlab.com/os85/tobox">tobox</a>, a transactional outbox framework I was developing at the time, and which I was already considering integrating to solve other issues in our architecture (which deserves its own post); and <a href="https://sequel.jeremyevans.net/">sequel</a>, the best ORM/database toolkit you can find in any stack. Period.</p>

<p>The analytics table was created as per the following <code class="language-plaintext highlighter-rouge">sequel</code> migration:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">Sequel</span><span class="p">.</span><span class="nf">migration</span> <span class="k">do</span>
  <span class="n">up</span> <span class="k">do</span>
    <span class="n">create_table?</span><span class="p">(</span><span class="ss">:journeys_analytics_daily</span><span class="p">)</span> <span class="k">do</span>
      <span class="n">column</span> <span class="ss">:client_id</span><span class="p">,</span> <span class="ss">:uuid</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
      <span class="n">column</span> <span class="ss">:journey_id</span><span class="p">,</span> <span class="ss">:uuid</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
      <span class="n">column</span> <span class="ss">:definition_id</span><span class="p">,</span> <span class="ss">:uuid</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
      <span class="n">column</span> <span class="ss">:day</span><span class="p">,</span> <span class="ss">:date</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
      <span class="n">column</span> <span class="ss">:started_count</span><span class="p">,</span> <span class="ss">:integer</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span>
      <span class="n">column</span> <span class="ss">:completed_count</span><span class="p">,</span> <span class="ss">:integer</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span>
      <span class="n">column</span> <span class="ss">:cancelled_count</span><span class="p">,</span> <span class="ss">:integer</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span>
      <span class="n">column</span> <span class="ss">:status_count</span><span class="p">,</span> <span class="ss">:jsonb</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span><span class="p">,</span> <span class="ss">default: </span><span class="no">Sequel</span><span class="p">.</span><span class="nf">pg_jsonb_wrap</span><span class="p">({})</span>
      <span class="n">column</span> <span class="ss">:error_code_count</span><span class="p">,</span> <span class="ss">:jsonb</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span><span class="p">,</span> <span class="ss">default: </span><span class="no">Sequel</span><span class="p">.</span><span class="nf">pg_jsonb_wrap</span><span class="p">({})</span>
      <span class="c1"># no default: LEAST() ignores NULLs, so a row first created by a "started" event won't pin the minimum at 0</span>
      <span class="n">column</span> <span class="ss">:min_duration</span><span class="p">,</span> <span class="ss">:integer</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">true</span>
      <span class="n">column</span> <span class="ss">:max_duration</span><span class="p">,</span> <span class="ss">:integer</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span>
      <span class="n">column</span> <span class="ss">:avg_duration</span><span class="p">,</span> <span class="ss">:integer</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span>
      <span class="n">column</span> <span class="ss">:median_duration</span><span class="p">,</span> <span class="ss">:integer</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">true</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span>
      <span class="n">primary_key</span> <span class="sx">%i[client_id journey_id definition_id day]</span>
    <span class="k">end</span>
    <span class="c1"># ...</span>
</code></pre></div></div>

<p>Why were <code class="language-plaintext highlighter-rouge">status</code> and <code class="language-plaintext highlighter-rouge">error_code</code> data aggregated into JSONB columns, instead of mapping each possible value to its own column? Because new statuses and error codes would eventually be defined (and old ones retired), and each of those would require a schema change to add or remove columns. JSONB could satisfy the same requirements without any schema changes, with a bit of postgres “JSON function-fu”.</p>
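
<p>The trade-off is that, when an interval spans several days, the JSONB counters have to be summed key by key. In postgres, that could be done by expanding each row with <code class="language-plaintext highlighter-rouge">jsonb_each_text</code> and grouping by key; the plain-ruby sketch below (a hypothetical helper, not the actual implementation) shows the same merge over the hashes that the daily rows would deserialize into:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Sum per-day JSONB counters (e.g. status_count or error_code_count)
# into a single hash. Keys missing on some days simply count as 0,
# so newly introduced statuses or error codes need no schema change.
def merge_counters(daily_counters)
  totals = Hash.new(0)
  daily_counters.each do |counters|
    counters.each { |key, count| totals[key] += count }
  end
  # drop the default so that unknown keys read as nil again
  totals.default = nil
  totals
end
</code></pre></div></div>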

<p>In <code class="language-plaintext highlighter-rouge">tobox</code>, two handlers were subscribed to the “journey started” and “journey finished” outbox events:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># tobox config file</span>
<span class="n">on</span><span class="p">(</span><span class="s2">"journey_started"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">event</span><span class="o">|</span>
  <span class="no">Aggregation</span><span class="o">::</span><span class="no">JourneyStartedEvent</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
<span class="k">end</span>
<span class="n">on</span><span class="p">(</span><span class="s2">"journey_finished"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">event</span><span class="o">|</span>
  <span class="no">Aggregation</span><span class="o">::</span><span class="no">JourneyFinishedEvent</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
<span class="k">end</span>

<span class="c1"># Event structure:</span>
<span class="c1"># {</span>
<span class="c1">#   "event" =&gt; "journey_finished" # or "journey_started"</span>
<span class="c1">#   "event_id" =&gt; "e30aedaa-8eba-462c-b2c8-086b5c6ee824",</span>
<span class="c1">#   "emitted_at" =&gt; "2022-12-24T00:00:00Z",</span>
<span class="c1">#   "client_id" =&gt; "4cffca63-7f1f-48b6-a8bc-5b39b515d854",</span>
<span class="c1">#   "journey_id" =&gt;"87536aa8-de39-4428-9567-5824287111ff",</span>
<span class="c1">#   "definition_id" =&gt; "9ee3d3b9-2930-4212-bbdb-ef4e5852bde4",</span>
<span class="c1">#   "status" =&gt; "pass" # or "fail", "review", "drop", "delay"...</span>
<span class="c1">#   "error_code" =&gt; nil # or "network_error", "file_error", etc...</span>
<span class="c1">#   "created_at" =&gt; "2022-12-23T00:00:00Z",</span>
<span class="c1">#   "updated_at" =&gt; "2022-12-24T00:00:00Z",</span>
<span class="c1"># }</span>
</code></pre></div></div>
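
<p>Each handler derives two things from that payload: the primary key of the daily row it contributes to (with <code class="language-plaintext highlighter-rouge">emitted_at</code> bucketed into its UTC day), and, for finished journeys, the duration in seconds between <code class="language-plaintext highlighter-rouge">created_at</code> and <code class="language-plaintext highlighter-rouge">updated_at</code>. A plain-ruby sketch of both, using the field names from the event structure above (the helper names are hypothetical):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "time"

# Primary key of the journeys_analytics_daily row that an event contributes to.
def daily_row_key(event)
  {
    client_id: event["client_id"],
    journey_id: event["journey_id"],
    definition_id: event["definition_id"],
    # bucket by UTC day, regardless of the offset the timestamp was sent with
    day: Time.parse(event["emitted_at"]).utc.strftime("%Y-%m-%d")
  }
end

# Duration of a finished journey, in whole seconds.
def journey_duration(event)
  (Time.parse(event["updated_at"]) - Time.parse(event["created_at"])).to_i
end
</code></pre></div></div>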

<p>The handlers would then use the data sent in the event to craft an SQL query which would atomically increment the counter columns and update the duration metrics.</p>

<p>Events could be processed “out of order”, and a journey’s “finished” event could arrive the day after its “started” event. UPSERTs could help manage that: whichever event arrives first creates the daily row, and the following ones update it.</p>
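
<p>To illustrate why that makes processing order irrelevant for the counters, here is a toy in-memory model in plain ruby, where a Hash keyed by the primary key stands in for the table. It only sketches the semantics; the real handlers below do this atomically in a single <code class="language-plaintext highlighter-rouge">INSERT ... ON CONFLICT ... DO UPDATE</code> statement, and the class name is hypothetical:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Toy in-memory model of INSERT ... ON CONFLICT ... DO UPDATE for the
# counter columns: a Hash keyed by the primary key stands in for the table.
class DailyTable
  COUNTERS = %i[started_count completed_count cancelled_count].freeze

  def initialize
    @rows = {}
  end

  # Creates the row with zeroed counters if it doesn't exist yet
  # (the INSERT part), then increments the given counter (the UPDATE part).
  def upsert_increment(key, counter)
    row = (@rows[key] ||= COUNTERS.to_h { |c| [c, 0] })
    row[counter] += 1
    row
  end

  def [](key)
    @rows[key]
  end
end
</code></pre></div></div>

<p>Processing a “finished” event before its “started” event yields the same row as the reverse order, since increments commute.</p>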

<p>Atomic counter increments for integer columns were a no-brainer. But how could this work for JSONB columns? The solution is a combination of Postgresql JSONB functions. Let’s look at the <code class="language-plaintext highlighter-rouge">sequel</code> code first:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="no">TABLE_NAME</span> <span class="o">=</span> <span class="ss">:journeys_analytics_daily</span>
<span class="c1"># Aggregation::JourneyStartedEvent</span>
<span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
  <span class="n">event_time</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">event</span><span class="p">[</span><span class="s2">"emitted_at"</span><span class="p">])</span>

  <span class="no">DB</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">].</span><span class="nf">insert_conflict</span><span class="p">(</span>
    <span class="ss">constraint: :journeys_analytics_daily_pkey</span><span class="p">,</span>
    <span class="ss">update: </span><span class="p">{</span> <span class="ss">started_count: </span><span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:started_count</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">}</span>
  <span class="p">).</span><span class="nf">insert</span><span class="p">(</span>
    <span class="ss">client_id: </span><span class="n">event</span><span class="p">[</span><span class="s2">"client_id"</span><span class="p">],</span>
    <span class="ss">journey_id: </span><span class="n">event</span><span class="p">[</span><span class="s2">"journey_id"</span><span class="p">],</span>
    <span class="ss">definition_id: </span><span class="n">event</span><span class="p">[</span><span class="s2">"definition_id"</span><span class="p">],</span>
    <span class="ss">day: </span><span class="n">event_time</span><span class="p">.</span><span class="nf">strftime</span><span class="p">(</span><span class="s2">"%Y-%m-%d"</span><span class="p">),</span>
    <span class="ss">started_count: </span><span class="mi">1</span><span class="p">,</span>
  <span class="p">)</span>
<span class="k">end</span>

<span class="c1"># Aggregation::JourneyFinishedEvent</span>
<span class="k">def</span> <span class="nf">call</span><span class="p">(</span><span class="n">event</span><span class="p">)</span>
  <span class="n">event_time</span> <span class="o">=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">event</span><span class="p">[</span><span class="s2">"emitted_at"</span><span class="p">])</span>

  <span class="n">insert_args</span> <span class="o">=</span> <span class="p">{</span>
    <span class="ss">client_id: </span><span class="n">event</span><span class="p">[</span><span class="s2">"client_id"</span><span class="p">],</span>
    <span class="ss">journey_id: </span><span class="n">event</span><span class="p">[</span><span class="s2">"journey_id"</span><span class="p">],</span>
    <span class="ss">definition_id: </span><span class="n">event</span><span class="p">[</span><span class="s2">"definition_id"</span><span class="p">],</span>
    <span class="ss">day: </span><span class="n">event_time</span><span class="p">.</span><span class="nf">strftime</span><span class="p">(</span><span class="s2">"%Y-%m-%d"</span><span class="p">),</span>
  <span class="p">}</span>

  <span class="n">update_args</span> <span class="o">=</span> <span class="p">{}</span>

  <span class="k">if</span> <span class="n">error_code</span> <span class="o">=</span> <span class="n">event</span><span class="p">[</span><span class="s2">"error_code"</span><span class="p">]</span>
    <span class="c1"># journeys with errors aren't accounted for in duration metrics</span>
    <span class="n">insert_args</span><span class="p">[</span><span class="ss">:cancelled_count</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="n">update_args</span><span class="p">[</span><span class="ss">:cancelled_count</span><span class="p">]</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:cancelled_count</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span>

    <span class="n">error_code_column</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:error_code_count</span><span class="p">].</span><span class="nf">pg_jsonb</span>
    <span class="n">insert_args</span><span class="p">[</span><span class="ss">:error_code_count</span><span class="p">]</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">pg_jsonb_wrap</span><span class="p">({</span><span class="n">error_code</span> <span class="o">=&gt;</span> <span class="mi">1</span><span class="p">})</span>
    <span class="n">update_args</span><span class="p">[</span><span class="ss">:error_code_count</span><span class="p">]</span> <span class="o">=</span> <span class="n">error_code_column</span><span class="p">.</span><span class="nf">set</span><span class="p">(</span>
      <span class="s2">"{</span><span class="si">#{</span><span class="n">error_code</span><span class="si">}</span><span class="s2">}"</span><span class="p">,</span>
      <span class="p">(</span><span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:coalesce</span><span class="p">,</span> <span class="n">error_code_column</span><span class="p">[</span><span class="n">error_code</span><span class="p">],</span> <span class="s2">"0"</span><span class="p">).</span><span class="nf">cast_numeric</span> <span class="o">+</span> <span class="mi">1</span><span class="p">).</span><span class="nf">cast</span><span class="p">(</span><span class="ss">:text</span><span class="p">).</span><span class="nf">cast</span><span class="p">(</span><span class="ss">:jsonb</span><span class="p">),</span>
      <span class="kp">true</span>
    <span class="p">)</span>
  <span class="k">else</span>
    <span class="n">insert_args</span><span class="p">[</span><span class="ss">:completed_count</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="n">update_args</span><span class="p">[</span><span class="ss">:completed_count</span><span class="p">]</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:completed_count</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span>

    <span class="n">status</span> <span class="o">=</span> <span class="n">event</span><span class="p">[</span><span class="s2">"status"</span><span class="p">]</span>
    <span class="n">status_column</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:status_count</span><span class="p">].</span><span class="nf">pg_jsonb</span>
    <span class="n">insert_args</span><span class="p">[</span><span class="ss">:status_count</span><span class="p">]</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">pg_jsonb_wrap</span><span class="p">({</span><span class="n">status</span> <span class="o">=&gt;</span> <span class="mi">1</span><span class="p">})</span>
    <span class="n">update_args</span><span class="p">[</span><span class="ss">:status_count</span><span class="p">]</span> <span class="o">=</span> <span class="n">status_column</span><span class="p">.</span><span class="nf">set</span><span class="p">(</span>
      <span class="s2">"{</span><span class="si">#{</span><span class="n">status</span><span class="si">}</span><span class="s2">}"</span><span class="p">,</span>
      <span class="p">(</span><span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:coalesce</span><span class="p">,</span> <span class="n">status_column</span><span class="p">[</span><span class="n">status</span><span class="p">],</span> <span class="s2">"0"</span><span class="p">).</span><span class="nf">cast_numeric</span> <span class="o">+</span> <span class="mi">1</span><span class="p">).</span><span class="nf">cast</span><span class="p">(</span><span class="ss">:text</span><span class="p">).</span><span class="nf">cast</span><span class="p">(</span><span class="ss">:jsonb</span><span class="p">),</span>
      <span class="kp">true</span>
    <span class="p">)</span>

    <span class="c1"># duration</span>
    <span class="n">duration</span> <span class="o">=</span> <span class="p">(</span><span class="no">Time</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">event</span><span class="p">[</span><span class="s2">"updated_at"</span><span class="p">])</span> <span class="o">-</span> <span class="no">Time</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">event</span><span class="p">[</span><span class="s2">"created_at"</span><span class="p">])).</span><span class="nf">to_i</span>
    <span class="n">insert_args</span><span class="p">[</span><span class="ss">:min_duration</span><span class="p">]</span> <span class="o">=</span> <span class="n">insert_args</span><span class="p">[</span><span class="ss">:max_duration</span><span class="p">]</span> <span class="o">=</span> <span class="n">insert_args</span><span class="p">[</span><span class="ss">:avg_duration</span><span class="p">]</span> <span class="o">=</span> <span class="n">insert_args</span><span class="p">[</span><span class="ss">:median_duration</span><span class="p">]</span> <span class="o">=</span> <span class="n">duration</span>

    <span class="n">update_args</span><span class="p">[</span><span class="ss">:min_duration</span><span class="p">]</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:least</span><span class="p">,</span> <span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:min_duration</span><span class="p">],</span>  <span class="no">Sequel</span><span class="p">[</span><span class="ss">:excluded</span><span class="p">][</span><span class="ss">:min_duration</span><span class="p">])</span>
    <span class="n">update_args</span><span class="p">[</span><span class="ss">:max_duration</span><span class="p">]</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:greatest</span><span class="p">,</span> <span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:max_duration</span><span class="p">],</span>  <span class="no">Sequel</span><span class="p">[</span><span class="ss">:excluded</span><span class="p">][</span><span class="ss">:max_duration</span><span class="p">])</span>
    <span class="n">update_args</span><span class="p">[</span><span class="ss">:avg_duration</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span>
      <span class="p">(</span> <span class="p">(</span><span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:avg_duration</span><span class="p">]</span> <span class="o">*</span> <span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:completed_count</span><span class="p">])</span> <span class="o">+</span> <span class="no">Sequel</span><span class="p">[</span><span class="ss">:excluded</span><span class="p">][</span><span class="ss">:avg_duration</span><span class="p">])</span>
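      <span class="o">/</span> <span class="p">(</span><span class="no">Sequel</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">][</span><span class="ss">:completed_count</span><span class="p">]</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
    <span class="p">)</span>

    <span class="c1"># the median is more involved: it is recomputed with a subquery over the journeys table</span>
    <span class="n">update_args</span><span class="p">[</span><span class="ss">:median_duration</span><span class="p">]</span> <span class="o">=</span> <span class="no">DB</span><span class="p">[</span><span class="ss">:journeys</span><span class="p">].</span><span class="nf">where</span><span class="p">(</span>
      <span class="ss">:client_id</span> <span class="o">=&gt;</span> <span class="n">data</span><span class="p">[</span><span class="s2">"client_id"</span><span class="p">],</span>
      <span class="ss">:journey_id</span> <span class="o">=&gt;</span> <span class="n">data</span><span class="p">[</span><span class="s2">"journey_id"</span><span class="p">],</span>
      <span class="ss">:definition_id</span> <span class="o">=&gt;</span> <span class="n">data</span><span class="p">[</span><span class="s2">"definition_id"</span><span class="p">],</span>
      <span class="ss">:error_code</span> <span class="o">=&gt;</span> <span class="kp">nil</span>
    <span class="p">).</span><span class="nf">where</span><span class="p">(</span><span class="no">Sequel</span><span class="p">.</span><span class="nf">cast</span><span class="p">(</span><span class="no">Sequel</span><span class="p">[</span><span class="ss">:journeys</span><span class="p">][</span><span class="ss">:updated_at</span><span class="p">],</span> <span class="ss">:date</span><span class="p">)</span> <span class="o">=&gt;</span> <span class="n">event_time</span><span class="p">.</span><span class="nf">strftime</span><span class="p">(</span><span class="s2">"%Y-%m-%d"</span><span class="p">))</span>
      <span class="p">.</span><span class="nf">select</span><span class="p">(</span>
        <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:coalesce</span><span class="p">,</span>
          <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:percentile_cont</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">)</span>
            <span class="p">.</span><span class="nf">within_group</span><span class="p">(</span><span class="no">Sequel</span><span class="p">.</span><span class="nf">extract</span><span class="p">(</span><span class="ss">:epoch</span><span class="p">,</span> <span class="no">Sequel</span><span class="p">[</span><span class="ss">:journeys</span><span class="p">][</span><span class="ss">:updated_at</span><span class="p">]</span> <span class="o">-</span> <span class="no">Sequel</span><span class="p">[</span><span class="ss">:journeys</span><span class="p">][</span><span class="ss">:created_at</span><span class="p">])),</span>
          <span class="mi">0</span><span class="p">)</span>
      <span class="p">)</span>
  <span class="k">end</span>

  <span class="no">DB</span><span class="p">[</span><span class="no">TABLE_NAME</span><span class="p">].</span><span class="nf">insert_conflict</span><span class="p">(</span>
    <span class="ss">constraint: </span><span class="ss">:journeys_analytics_daily_pkey</span><span class="p">,</span>
    <span class="ss">update: </span><span class="n">update_args</span>
  <span class="p">).</span><span class="nf">insert</span><span class="p">(</span><span class="n">insert_args</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>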

<p>These generate the following SQL:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">-- for journey started</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="nv">"journeys_analytics_daily"</span>
<span class="p">(</span><span class="nv">"client_id"</span><span class="p">,</span> <span class="nv">"journey_id"</span><span class="p">,</span> <span class="nv">"definition_id"</span><span class="p">,</span> <span class="nv">"day"</span><span class="p">,</span> <span class="nv">"started_count"</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span>
  <span class="s1">'4cffca63-7f1f-48b6-a8bc-5b39b515d854'</span><span class="p">,</span>
  <span class="s1">'e30aedaa-8eba-462c-b2c8-086b5c6ee824'</span><span class="p">,</span>
  <span class="s1">'9ee3d3b9-2930-4212-bbdb-ef4e5852bde4'</span><span class="p">,</span>
  <span class="s1">'2022-12-23'</span><span class="p">,</span>
  <span class="mi">1</span>
<span class="p">)</span>
<span class="k">ON</span> <span class="n">CONFLICT</span> <span class="k">ON</span> <span class="k">CONSTRAINT</span> <span class="nv">"journeys_analytics_daily_pkey"</span>
<span class="k">DO</span> <span class="k">UPDATE</span> <span class="k">SET</span> <span class="nv">"started_count"</span> <span class="o">=</span> <span class="p">(</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"started_count"</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="n">RETURNING</span> <span class="nv">"client_id"</span>
<span class="c1">-- for journey completed with errors</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="nv">"journeys_analytics_daily"</span>
<span class="p">(</span><span class="nv">"client_id"</span><span class="p">,</span> <span class="nv">"journey_id"</span><span class="p">,</span> <span class="nv">"definition_id"</span><span class="p">,</span> <span class="nv">"day"</span><span class="p">,</span> <span class="nv">"cancelled_count"</span><span class="p">,</span> <span class="nv">"error_code_count"</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span>
  <span class="s1">'4cffca63-7f1f-48b6-a8bc-5b39b515d854'</span><span class="p">,</span>
  <span class="s1">'e30aedaa-8eba-462c-b2c8-086b5c6ee824'</span><span class="p">,</span>
  <span class="s1">'9ee3d3b9-2930-4212-bbdb-ef4e5852bde4'</span><span class="p">,</span>
  <span class="s1">'2022-12-23'</span><span class="p">,</span>
  <span class="mi">1</span><span class="p">,</span>
  <span class="s1">'{"network_error":1}'</span><span class="p">::</span><span class="n">json</span>
<span class="p">)</span>
<span class="k">ON</span> <span class="n">CONFLICT</span> <span class="k">ON</span> <span class="k">CONSTRAINT</span> <span class="nv">"journeys_analytics_daily_pkey"</span>
<span class="k">DO</span> <span class="k">UPDATE</span> <span class="k">SET</span>
  <span class="nv">"cancelled_count"</span> <span class="o">=</span> <span class="p">(</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"cancelled_count"</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span>
  <span class="nv">"error_code_count"</span> <span class="o">=</span> <span class="n">jsonb_set</span><span class="p">(</span>
    <span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"error_code_count"</span><span class="p">,</span>
    <span class="s1">'{network_error}'</span><span class="p">,</span>
    <span class="k">CAST</span><span class="p">(</span><span class="k">CAST</span><span class="p">((</span><span class="k">CAST</span><span class="p">(</span><span class="n">coalesce</span><span class="p">((</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"error_code_count"</span> <span class="o">-&gt;</span> <span class="s1">'network_error'</span><span class="p">),</span> <span class="s1">'0'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">integer</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">text</span><span class="p">)</span> <span class="k">AS</span> <span class="n">jsonb</span><span class="p">),</span>
    <span class="k">true</span>
  <span class="p">)</span> <span class="n">RETURNING</span> <span class="nv">"client_id"</span>
<span class="c1">-- for journey completed successfully</span>
<span class="k">INSERT</span> <span class="k">INTO</span> <span class="nv">"journeys_analytics_daily"</span>
<span class="p">(</span><span class="nv">"client_id"</span><span class="p">,</span> <span class="nv">"journey_id"</span><span class="p">,</span> <span class="nv">"definition_id"</span><span class="p">,</span> <span class="nv">"day"</span><span class="p">,</span> <span class="nv">"completed_count"</span><span class="p">,</span> <span class="nv">"state_count"</span><span class="p">,</span> <span class="nv">"min_duration"</span><span class="p">,</span> <span class="nv">"max_duration"</span><span class="p">,</span> <span class="nv">"avg_duration"</span><span class="p">,</span> <span class="nv">"median_duration"</span><span class="p">)</span>
<span class="k">VALUES</span> <span class="p">(</span>
  <span class="s1">'4cffca63-7f1f-48b6-a8bc-5b39b515d854'</span><span class="p">,</span>
  <span class="s1">'e30aedaa-8eba-462c-b2c8-086b5c6ee824'</span><span class="p">,</span>
  <span class="s1">'9ee3d3b9-2930-4212-bbdb-ef4e5852bde4'</span><span class="p">,</span>
  <span class="s1">'2022-12-24'</span><span class="p">,</span>
  <span class="mi">1</span><span class="p">,</span>
  <span class="s1">'{"pass":1}'</span><span class="p">::</span><span class="n">json</span><span class="p">,</span>
  <span class="mi">3600</span><span class="p">,</span>
  <span class="mi">3600</span><span class="p">,</span>
  <span class="mi">3600</span><span class="p">,</span>
  <span class="mi">3600</span>
<span class="p">)</span>
<span class="k">ON</span> <span class="n">CONFLICT</span> <span class="k">ON</span> <span class="k">CONSTRAINT</span> <span class="nv">"journeys_analytics_daily_pkey"</span>
<span class="k">DO</span> <span class="k">UPDATE</span> <span class="k">SET</span>
  <span class="nv">"completed_count"</span> <span class="o">=</span> <span class="p">(</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"completed_count"</span> <span class="o">+</span> <span class="mi">1</span><span class="p">),</span>
  <span class="nv">"state_count"</span> <span class="o">=</span> <span class="n">jsonb_set</span><span class="p">(</span>
    <span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"state_count"</span><span class="p">,</span>
    <span class="s1">'{pass}'</span><span class="p">,</span>
    <span class="k">CAST</span><span class="p">(</span><span class="k">CAST</span><span class="p">((</span><span class="k">CAST</span><span class="p">(</span><span class="n">coalesce</span><span class="p">((</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"state_count"</span> <span class="o">-&gt;</span> <span class="s1">'pass'</span><span class="p">),</span> <span class="s1">'0'</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">integer</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="k">AS</span> <span class="nb">text</span><span class="p">)</span> <span class="k">AS</span> <span class="n">jsonb</span><span class="p">),</span>
    <span class="k">true</span>
  <span class="p">),</span>
  <span class="nv">"min_duration"</span> <span class="o">=</span> <span class="n">least</span><span class="p">(</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"min_duration"</span><span class="p">,</span> <span class="nv">"excluded"</span><span class="p">.</span><span class="nv">"min_duration"</span><span class="p">),</span>
  <span class="nv">"max_duration"</span> <span class="o">=</span> <span class="n">greatest</span><span class="p">(</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"max_duration"</span><span class="p">,</span> <span class="nv">"excluded"</span><span class="p">.</span><span class="nv">"max_duration"</span><span class="p">),</span>
  <span class="nv">"avg_duration"</span> <span class="o">=</span> <span class="p">(((</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"avg_duration"</span> <span class="o">*</span> <span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"completed_count"</span><span class="p">)</span> <span class="o">+</span> <span class="nv">"excluded"</span><span class="p">.</span><span class="nv">"avg_duration"</span><span class="p">)</span> <span class="o">/</span> <span class="p">(</span><span class="nv">"journeys_analytics_daily"</span><span class="p">.</span><span class="nv">"completed_count"</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)),</span>
  <span class="nv">"median_duration"</span> <span class="o">=</span> <span class="p">(</span>
    <span class="k">SELECT</span> <span class="n">percentile_cont</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="p">)</span> <span class="n">WITHIN</span> <span class="k">GROUP</span> <span class="p">(</span>
      <span class="k">ORDER</span> <span class="k">BY</span> <span class="k">extract</span><span class="p">(</span>
        <span class="n">epoch</span> <span class="k">FROM</span> <span class="p">(</span><span class="nv">"journeys"</span><span class="p">.</span><span class="nv">"updated_at"</span> <span class="o">-</span> <span class="nv">"journeys"</span><span class="p">.</span><span class="nv">"created_at"</span><span class="p">))</span>
      <span class="p">)</span> <span class="k">FROM</span> <span class="nv">"journeys"</span>
        <span class="k">WHERE</span> <span class="p">(</span>
          <span class="p">(</span><span class="nv">"client_id"</span> <span class="o">=</span> <span class="s1">'4cffca63-7f1f-48b6-a8bc-5b39b515d854'</span><span class="p">)</span> <span class="k">AND</span>
          <span class="p">(</span><span class="nv">"journey_id"</span> <span class="o">=</span> <span class="s1">'e30aedaa-8eba-462c-b2c8-086b5c6ee824'</span><span class="p">)</span> <span class="k">AND</span>
          <span class="p">(</span><span class="nv">"definition_id"</span> <span class="o">=</span> <span class="s1">'9ee3d3b9-2930-4212-bbdb-ef4e5852bde4'</span><span class="p">)</span> <span class="k">AND</span>
          <span class="p">(</span><span class="nv">"error_code"</span> <span class="k">IS</span> <span class="k">NULL</span><span class="p">)</span> <span class="k">AND</span>
          <span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="nv">"journeys"</span><span class="p">.</span><span class="nv">"updated_at"</span> <span class="k">AS</span> <span class="nb">date</span><span class="p">)</span> <span class="o">=</span> <span class="s1">'2022-12-24'</span><span class="p">)))</span> <span class="n">RETURNING</span> <span class="nv">"client_id"</span>
</code></pre></div></div>

<p>That’s quite a lot of complex <code class="language-plaintext highlighter-rouge">sequel</code> and <code class="language-plaintext highlighter-rouge">SQL</code>. Let’s digest the hardest parts:</p>

<ul>
  <li>data is aggregated by day, which is achieved by including the day in the primary key; that primary key is the constraint targeted by <code class="language-plaintext highlighter-rouge">ON CONFLICT DO UPDATE</code>.</li>
  <li>“counter columns”, which are initialized on insert and incremented on update, use atomic SQL increments, via expressions such as <code class="language-plaintext highlighter-rouge">"completed_count" = ("journeys_analytics_daily"."completed_count" + 1)</code>; this way, there is no need to manage exclusive access to rows via techniques such as <code class="language-plaintext highlighter-rouge">SELECT FOR UPDATE</code>.</li>
  <li>the jsonb “counter columns” use a variation of the same technique, but need some extra work via the <code class="language-plaintext highlighter-rouge">jsonb_set</code> PostgreSQL function: since the value for a given status/error code may not be present yet, the <code class="language-plaintext highlighter-rouge">coalesce</code> function supplies a default; the value then goes through the sequence “convert to integer -&gt; increment -&gt; convert to text -&gt; convert to jsonb”, which has more overhead than traditional integer column increments, but still works without explicit locks or multiple SQL statements.</li>
  <li>the average is calculated on the fly as “multiply the current average duration by the number of journeys completed so far, add the ingested duration, and divide by that number + 1”.</li>
  <li>the median duration is calculated using the technique described <a href="https://www.skillslogic.com/blog/dashboards-data-warehousing/calculating-medians-in-postgresql-with-percentile_cont">here</a>.</li>
</ul>
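<p>The bullet points above describe what each upsert does to a row. The arithmetic can be checked without a database using a minimal in-memory sketch in plain Ruby. A Hash keyed by the primary-key columns stands in for <code class="language-plaintext highlighter-rouge">journeys_analytics_daily</code>, and each method applies the same update rules as the corresponding <code class="language-plaintext highlighter-rouge">ON CONFLICT DO UPDATE</code> clause. The <code class="language-plaintext highlighter-rouge">DailyAnalytics</code> class and its methods are hypothetical illustrations, not part of the code above. Only the “started” and “completed successfully” paths are modelled.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "date"

# Hypothetical in-memory stand-in for journeys_analytics_daily, keyed like
# its primary key: [client_id, journey_id, definition_id, day].
# Each method mimics what one INSERT ... ON CONFLICT DO UPDATE does to a row.
class DailyAnalytics
  Row = Struct.new(:started_count, :completed_count, :state_count,
                   :min_duration, :max_duration, :avg_duration)

  def initialize
    @rows = {}
  end

  # "journey started": insert a fresh row, or bump the existing counter
  def record_started(key)
    row = fetch_or_insert(key)
    row.started_count += 1
    row
  end

  # "journey completed successfully"
  def record_completed(key, state, duration)
    row = fetch_or_insert(key)
    # like jsonb_set + coalesce(..., '0') + 1: a missing key counts as 0
    row.state_count[state] = row.state_count.fetch(state, 0) + 1
    # like least()/greatest(), which ignore NULLs
    row.min_duration = [row.min_duration, duration].compact.min
    row.max_duration = [row.max_duration, duration].compact.max
    # running average; like SQL SET expressions, it reads the
    # completed_count from before this event is counted
    n = row.completed_count
    previous_total = n.zero? ? 0 : row.avg_duration * n
    row.avg_duration = (previous_total + duration) / (n + 1).to_f
    row.completed_count = n + 1
    row
  end

  private

  def fetch_or_insert(key)
    @rows[key] ||= Row.new(0, 0, {}, nil, nil, nil)
  end
end

table = DailyAnalytics.new
key = ["client", "journey", "definition", Date.new(2022, 12, 24)]
table.record_started(key)
table.record_completed(key, "pass", 3600)
row = table.record_completed(key, "fail", 1800)
p row.avg_duration # =&gt; 2700.0
p row.min_duration # =&gt; 1800
</code></pre></div></div>

<p>Modelling it this way surfaces one detail worth checking in the real table. A row created by a “journey started” event has a <code class="language-plaintext highlighter-rouge">completed_count</code> of 0. Unless the column has a default, it also has a NULL <code class="language-plaintext highlighter-rouge">avg_duration</code>. Since <code class="language-plaintext highlighter-rouge">NULL * 0</code> is NULL in SQL, the running-average expression would then yield NULL for that day's first completion. The sketch sidesteps this by treating the previous total as 0. In PostgreSQL, a <code class="language-plaintext highlighter-rouge">DEFAULT 0</code> or a <code class="language-plaintext highlighter-rouge">coalesce</code> would have the same effect.</p>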

<p>And with that, we can start ingesting analytics data.</p>

<h2 id="querying">Querying</h2>

<p>Querying the data is then mostly a matter of picking your framework of choice. The request interface could be handled by something like <a href="https://roda.jeremyevans.net/">roda</a>, which takes care of parsing request parameters and JSON-encoding the analytics data in the response:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">App</span> <span class="o">&lt;</span> <span class="no">Roda</span>
  <span class="n">plugin</span> <span class="ss">:json_parser</span>
  <span class="n">plugin</span> <span class="ss">:json</span>

  <span class="n">route</span> <span class="k">do</span> <span class="o">|</span><span class="n">r</span><span class="o">|</span>
    <span class="n">client_id</span> <span class="o">=</span> <span class="n">read_client_id_from_session</span>
    <span class="n">r</span><span class="p">.</span><span class="nf">is</span> <span class="s2">"journey-analytics"</span> <span class="k">do</span>
      <span class="c1"># GET /journey-analytics request</span>
      <span class="n">r</span><span class="p">.</span><span class="nf">get</span> <span class="k">do</span>
        <span class="c1"># data fetching delegated to separate module</span>
        <span class="n">query</span> <span class="o">=</span> <span class="no">JourneyAnalyticsQuery</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span>
          <span class="n">client_id</span><span class="p">,</span>
          <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"journey_id"</span><span class="p">],</span>
          <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"definition_id"</span><span class="p">],</span>
          <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"after"</span><span class="p">],</span>
          <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"before"</span><span class="p">],</span>
        <span class="p">)</span>

        <span class="n">data</span> <span class="o">=</span> <span class="n">apply_pagination</span><span class="p">(</span><span class="n">query</span><span class="p">)</span>
        <span class="n">json_serialize</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
      <span class="k">end</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
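<p>The route above forwards <code class="language-plaintext highlighter-rouge">after</code> and <code class="language-plaintext highlighter-rouge">before</code> straight from the query string. <code class="language-plaintext highlighter-rouge">Time.parse</code> raises <code class="language-plaintext highlighter-rouge">ArgumentError</code> on malformed input, so a bad value would surface as a server error. Below is a minimal sketch of validating both parameters up front, assuming dates arrive as ISO 8601 <code class="language-plaintext highlighter-rouge">YYYY-MM-DD</code> strings. <code class="language-plaintext highlighter-rouge">parse_day_param</code> and <code class="language-plaintext highlighter-rouge">parse_day_range</code> are hypothetical helpers, not Roda features.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "date"

# Hypothetical helper (not a Roda API): turns an optional "YYYY-MM-DD"
# query parameter into a Date, or nil when it is missing or blank.
def parse_day_param(params, name)
  value = params[name].to_s.strip
  return nil if value.empty?

  Date.iso8601(value)
rescue ArgumentError # recent date versions raise Date::Error, an ArgumentError subclass
  raise ArgumentError, "invalid #{name} parameter: #{value.inspect}"
end

# Parses both bounds and rejects an inverted range.
def parse_day_range(params)
  after = parse_day_param(params, "after")
  before = parse_day_param(params, "before")
  return [after, before] if after.nil? || before.nil?
  raise ArgumentError, "after (#{after}) is later than before (#{before})" if after > before

  [after, before]
end

after, before = parse_day_range({ "after" =&gt; "2022-12-23" })
puts after # prints 2022-12-23
p before   # prints nil
</code></pre></div></div>

<p>Inside the <code class="language-plaintext highlighter-rouge">r.get</code> block, one option is to rescue <code class="language-plaintext highlighter-rouge">ArgumentError</code>, set <code class="language-plaintext highlighter-rouge">response.status = 400</code> and return an error payload, which turns bad input into a client error. The resulting <code class="language-plaintext highlighter-rouge">Date</code> objects could then be compared against the <code class="language-plaintext highlighter-rouge">day</code> column directly, instead of re-parsing strings in the query module.</p>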

<p>The actual querying can be handled by a separate module, which takes care of picking up the table we’ve been ingesting data into, and applies the filters as per the parameters the client sent in the request.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># lib/journey_analytics_query.rb</span>

<span class="k">module</span> <span class="nn">JourneyAnalyticsQuery</span>
  <span class="kp">module_function</span>

  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span>
    <span class="n">client_id</span><span class="p">,</span>
    <span class="n">journey_id</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span>
    <span class="n">definition_id</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span>
    <span class="n">after</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span>
    <span class="n">before</span> <span class="o">=</span> <span class="kp">nil</span>
  <span class="p">)</span>
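    <span class="n">query</span> <span class="o">=</span> <span class="no">DB</span><span class="p">[</span><span class="ss">:journeys_analytics_daily</span><span class="p">].</span><span class="nf">where</span><span class="p">(</span><span class="ss">client_id: </span><span class="n">client_id</span><span class="p">)</span>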

    <span class="n">query</span> <span class="o">=</span> <span class="n">query</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">journey_id: </span><span class="n">journey_id</span><span class="p">)</span> <span class="k">if</span> <span class="n">journey_id</span>
    <span class="n">query</span> <span class="o">=</span> <span class="n">query</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="ss">definition_id: </span><span class="n">definition_id</span><span class="p">)</span> <span class="k">if</span> <span class="n">definition_id</span>
    <span class="n">query</span> <span class="o">=</span> <span class="n">query</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="no">Sequel</span><span class="p">[</span><span class="ss">:journeys_analytics_daily</span><span class="p">][</span><span class="ss">:day</span><span class="p">]</span> <span class="o">&gt;=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">after</span><span class="p">))</span> <span class="k">if</span> <span class="n">after</span>
    <span class="n">query</span> <span class="o">=</span> <span class="n">query</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="no">Sequel</span><span class="p">[</span><span class="ss">:journeys_analytics_daily</span><span class="p">][</span><span class="ss">:day</span><span class="p">]</span> <span class="o">&lt;=</span> <span class="no">Time</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">before</span><span class="p">))</span> <span class="k">if</span> <span class="n">before</span>

    <span class="c1"># COUNTERS</span>
    <span class="c1">#</span>
    <span class="c1"># aggregate sum of normalized columns</span>
    <span class="n">selectors</span> <span class="o">=</span> <span class="sx">%i[started completed cancelled]</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">key</span><span class="o">|</span>
      <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:sum</span><span class="p">,</span> <span class="ss">:"</span><span class="si">#{</span><span class="n">key</span><span class="si">}</span><span class="ss">_count"</span><span class="p">).</span><span class="nf">as</span><span class="p">(</span><span class="ss">:"</span><span class="si">#{</span><span class="n">key</span><span class="si">}</span><span class="ss">_count"</span><span class="p">)</span>
    <span class="k">end</span>

    <span class="c1"># aggregate sum of denormalized values</span>
    <span class="c1">#</span>
    <span class="c1"># this expects every possible status to be listed in the STATUSES constant</span>
    <span class="n">status_column</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">[</span><span class="ss">:status_count</span><span class="p">].</span><span class="nf">pg_jsonb</span>
    <span class="n">selectors</span> <span class="o">+=</span> <span class="no">STATUSES</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">status</span><span class="o">|</span>
      <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:sum</span><span class="p">,</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:coalesce</span><span class="p">,</span> <span class="n">status_column</span><span class="p">[</span><span class="n">status</span><span class="p">].</span><span class="nf">cast</span><span class="p">(</span><span class="ss">:integer</span><span class="p">),</span> <span class="mi">0</span><span class="p">)).</span><span class="nf">as</span><span class="p">(</span><span class="ss">:"status_</span><span class="si">#{</span><span class="n">status</span><span class="si">}</span><span class="ss">_count"</span><span class="p">)</span>
    <span class="k">end</span>
    <span class="c1"># this expects every possible error code to be listed in the ERROR_CODES constant</span>
    <span class="n">error_code_column</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">[</span><span class="ss">:error_code_count</span><span class="p">].</span><span class="nf">pg_jsonb</span>
    <span class="n">selectors</span> <span class="o">+=</span> <span class="no">ERROR_CODES</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">error_code</span><span class="o">|</span>
      <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:sum</span><span class="p">,</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:coalesce</span><span class="p">,</span> <span class="n">error_code_column</span><span class="p">[</span><span class="n">error_code</span><span class="p">].</span><span class="nf">cast</span><span class="p">(</span><span class="ss">:integer</span><span class="p">),</span> <span class="mi">0</span><span class="p">)).</span><span class="nf">as</span><span class="p">(</span><span class="ss">:"error_code_</span><span class="si">#{</span><span class="n">error_code</span><span class="si">}</span><span class="ss">_count"</span><span class="p">)</span>
    <span class="k">end</span>

    <span class="c1"># DURATION</span>
    <span class="c1">#</span>
    <span class="n">selectors</span> <span class="o">+=</span> <span class="sx">%i[min max avg]</span><span class="p">.</span><span class="nf">map</span> <span class="k">do</span> <span class="o">|</span><span class="n">agg</span><span class="o">|</span>
      <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="n">agg</span><span class="p">,</span> <span class="ss">:"</span><span class="si">#{</span><span class="n">agg</span><span class="si">}</span><span class="ss">_duration"</span><span class="p">).</span><span class="nf">as</span><span class="p">(</span><span class="ss">:"</span><span class="si">#{</span><span class="n">agg</span><span class="si">}</span><span class="ss">_duration"</span><span class="p">)</span>
    <span class="k">end</span>

    <span class="n">selectors</span> <span class="o">&lt;&lt;</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">function</span><span class="p">(</span><span class="ss">:percentile_cont</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">).</span><span class="nf">within_group</span><span class="p">(</span><span class="ss">:median_duration</span><span class="p">).</span><span class="nf">as</span><span class="p">(</span><span class="ss">:median_duration</span><span class="p">)</span>

    <span class="n">query</span><span class="p">.</span><span class="nf">select</span><span class="p">(</span><span class="o">*</span><span class="n">selectors</span><span class="p">).</span><span class="nf">reverse</span><span class="p">(</span><span class="ss">:day</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
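<p>The query module above lets PostgreSQL roll the daily rows up into one summary. The same roll-up can be sketched in plain Ruby to make its semantics explicit; <code class="language-plaintext highlighter-rouge">roll_up</code> and the sample rows are hypothetical.</p>

<p>The sketch makes one design choice differently from the query. It weights each day's <code class="language-plaintext highlighter-rouge">avg_duration</code> by that day's <code class="language-plaintext highlighter-rouge">completed_count</code>, which reproduces the mean over all completed journeys. A plain <code class="language-plaintext highlighter-rouge">AVG(avg_duration)</code> gives every day the same weight, so the two diverge whenever daily volumes differ (175.0 versus 150.0 for the rows below). In SQL, something like <code class="language-plaintext highlighter-rouge">SUM(avg_duration * completed_count) / NULLIF(SUM(completed_count), 0)</code> would be the weighted equivalent. In the same spirit, a percentile over daily medians approximates the median over all journeys rather than equalling it.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: rolls up daily rows (hashes shaped like
# journeys_analytics_daily records) into a single summary.
def roll_up(rows)
  completed = rows.sum { |row| row[:completed_count] }

  # sum the per-day jsonb-style counters key by key
  states = Hash.new(0)
  rows.each do |row|
    row[:state_count].each { |state, count| states[state] += count }
  end

  # only days with completed journeys carry duration statistics
  timed = rows.reject { |row| row[:completed_count].zero? }
  # each day's average weighted by that day's completed_count
  weighted_total = timed.sum { |row| row[:avg_duration] * row[:completed_count] }

  {
    started_count: rows.sum { |row| row[:started_count] },
    completed_count: completed,
    state_count: states,
    min_duration: timed.map { |row| row[:min_duration] }.min,
    max_duration: timed.map { |row| row[:max_duration] }.max,
    avg_duration: completed.zero? ? nil : weighted_total / completed.to_f
  }
end

rows = [
  { started_count: 3, completed_count: 1, state_count: { "pass" =&gt; 1 },
    min_duration: 100, max_duration: 100, avg_duration: 100.0 },
  { started_count: 2, completed_count: 3, state_count: { "pass" =&gt; 2, "fail" =&gt; 1 },
    min_duration: 50, max_duration: 400, avg_duration: 200.0 }
]
summary = roll_up(rows)
p summary[:avg_duration] # =&gt; 175.0

unweighted = rows.sum { |row| row[:avg_duration] } / rows.size
p unweighted # =&gt; 150.0
</code></pre></div></div>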

<p>This generates queries such as:</p>

<div class="language-sql highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">SELECT</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">started_count</span><span class="p">)</span> <span class="k">AS</span> <span class="n">started_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">completed_count</span><span class="p">)</span> <span class="k">AS</span> <span class="n">completed_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">cancelled_count</span><span class="p">)</span> <span class="k">AS</span> <span class="n">cancelled_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">COALESCE</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">state_count</span> <span class="o">-&gt;</span> <span class="s1">'pass'</span> <span class="k">AS</span> <span class="nb">INTEGER</span><span class="p">),</span> <span class="mi">0</span><span class="p">))</span> <span class="k">AS</span> <span class="n">status_pass_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">COALESCE</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">state_count</span> <span class="o">-&gt;</span> <span class="s1">'fail'</span> <span class="k">AS</span> <span class="nb">INTEGER</span><span class="p">),</span> <span class="mi">0</span><span class="p">))</span> <span class="k">AS</span> <span class="n">status_fail_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">COALESCE</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">state_count</span> <span class="o">-&gt;</span> <span class="s1">'review'</span> <span class="k">AS</span> <span class="nb">INTEGER</span><span class="p">),</span> <span class="mi">0</span><span class="p">))</span> <span class="k">AS</span> <span class="n">status_review_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">COALESCE</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">state_count</span> <span class="o">-&gt;</span> <span class="s1">'drop'</span> <span class="k">AS</span> <span class="nb">INTEGER</span><span class="p">),</span> <span class="mi">0</span><span class="p">))</span> <span class="k">AS</span> <span class="n">status_drop_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">COALESCE</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">error_code_count</span> <span class="o">-&gt;</span> <span class="s1">'network_error'</span> <span class="k">AS</span> <span class="nb">INTEGER</span><span class="p">),</span> <span class="mi">0</span><span class="p">))</span> <span class="k">AS</span> <span class="n">error_code_network_error_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">COALESCE</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">error_code_count</span> <span class="o">-&gt;</span> <span class="s1">'file_error'</span> <span class="k">AS</span> <span class="nb">INTEGER</span><span class="p">),</span> <span class="mi">0</span><span class="p">))</span> <span class="k">AS</span> <span class="n">error_code_file_error_count</span><span class="p">,</span>
  <span class="k">SUM</span><span class="p">(</span><span class="n">COALESCE</span><span class="p">(</span><span class="k">CAST</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">error_code_count</span> <span class="o">-&gt;</span> <span class="s1">'mailroom_error'</span> <span class="k">AS</span> <span class="nb">INTEGER</span><span class="p">),</span> <span class="mi">0</span><span class="p">))</span> <span class="k">AS</span> <span class="n">error_code_mailroom_error_count</span><span class="p">,</span>
  <span class="k">MIN</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">min_duration</span><span class="p">)</span> <span class="k">AS</span> <span class="n">min_duration</span><span class="p">,</span>
  <span class="k">MAX</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">max_duration</span><span class="p">)</span> <span class="k">AS</span> <span class="n">max_duration</span><span class="p">,</span>
  <span class="k">AVG</span><span class="p">(</span><span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">avg_duration</span><span class="p">)</span> <span class="k">AS</span> <span class="n">avg_duration</span><span class="p">,</span>
  <span class="n">PERCENTILE_CONT</span><span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="p">)</span> <span class="n">WITHIN</span> <span class="k">GROUP</span> <span class="p">(</span><span class="k">ORDER</span> <span class="k">BY</span> <span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">median_duration</span><span class="p">)</span> <span class="k">AS</span> <span class="n">median_duration</span>
<span class="k">FROM</span> <span class="n">journeys_analytics_daily</span>
<span class="k">WHERE</span>
  <span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">client_id</span> <span class="o">=</span> <span class="s1">'4cffca63-7f1f-48b6-a8bc-5b39b515d854'</span> <span class="k">AND</span>
  <span class="n">journeys_analytics_daily</span><span class="p">.</span><span class="n">journey_id</span> <span class="k">IN</span> <span class="p">(</span><span class="s1">'87536aa8-de39-4428-9567-5824287111ff'</span><span class="p">)</span> <span class="k">AND</span>
  <span class="c1">-- and so on</span>
</code></pre></div></div>

<h3 id="time-series">Time-series</h3>

<p>One thing you may want to show your customers is progress over time. If your metric is “per-day”, the data is already aggregated by day! An easy way to support this is to accept a <code class="language-plaintext highlighter-rouge">"by"</code> parameter, with <code class="language-plaintext highlighter-rouge">"day"</code>, or even <code class="language-plaintext highlighter-rouge">"definition_id"</code> (if you’d rather show statistics per definition), as possible values:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># api</span>
<span class="n">route</span> <span class="k">do</span> <span class="o">|</span><span class="n">r</span><span class="o">|</span>
  <span class="n">client_id</span> <span class="o">=</span> <span class="n">read_client_id_from_session</span>
  <span class="n">r</span><span class="p">.</span><span class="nf">is</span> <span class="s2">"journey-analytics"</span> <span class="k">do</span>
    <span class="c1"># GET /journey-analytics request</span>
    <span class="n">r</span><span class="p">.</span><span class="nf">get</span> <span class="k">do</span>
      <span class="c1"># data fetching delegated to separate module</span>
      <span class="no">JourneyAnalyticsQuery</span><span class="p">.</span><span class="nf">call</span><span class="p">(</span>
        <span class="n">client_id</span><span class="p">,</span>
        <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"journey_id"</span><span class="p">],</span>
        <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"definition_id"</span><span class="p">],</span>
        <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"before"</span><span class="p">],</span>
        <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"after"</span><span class="p">],</span>
        <span class="n">request</span><span class="p">.</span><span class="nf">params</span><span class="p">[</span><span class="s2">"by"</span><span class="p">],</span>
      <span class="p">)</span>
    <span class="k">end</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># module</span>
<span class="k">module</span> <span class="nn">JourneyAnalyticsQuery</span>
  <span class="kp">module_function</span>

  <span class="k">def</span> <span class="nf">call</span><span class="p">(</span>
    <span class="n">client_id</span><span class="p">,</span>
    <span class="n">journey_id</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span>
    <span class="n">definition_id</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span>
    <span class="n">before</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span>
    <span class="n">after</span> <span class="o">=</span> <span class="kp">nil</span><span class="p">,</span>
    <span class="n">by</span> <span class="o">=</span> <span class="kp">nil</span> <span class="c1"># or "day", or ["day", "definition_id"]</span>
  <span class="p">)</span>
    <span class="n">query</span> <span class="o">=</span> <span class="no">DB</span><span class="p">[</span><span class="ss">:journeys_analytics_daily</span><span class="p">]</span>

    <span class="c1"># ...</span>

    <span class="c1"># Array() accepts both a single column ("day") and a list of columns</span>
    <span class="n">query</span> <span class="o">=</span> <span class="n">query</span><span class="p">.</span><span class="nf">group_by</span><span class="p">(</span><span class="o">*</span><span class="no">Array</span><span class="p">(</span><span class="n">by</span><span class="p">).</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:to_sym</span><span class="p">))</span> <span class="k">if</span> <span class="n">by</span>

    <span class="c1"># ...</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This applies a <code class="language-plaintext highlighter-rouge">GROUP BY</code> clause to the query above, producing one row of aggregates per combination of the selected grouping keys.</p>
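
<p>Because <code class="language-plaintext highlighter-rouge">by</code> comes straight from the query string, Rack hands it over as a single string for <code class="language-plaintext highlighter-rouge">?by=day</code>, and as an array only for <code class="language-plaintext highlighter-rouge">?by[]=day&amp;by[]=definition_id</code>; and since its values end up as SQL column names, it may be worth normalizing it against an allowlist before it reaches <code class="language-plaintext highlighter-rouge">group_by</code>. A minimal sketch follows; <code class="language-plaintext highlighter-rouge">GroupByParam</code>, <code class="language-plaintext highlighter-rouge">ALLOWED</code> and <code class="language-plaintext highlighter-rouge">columns</code> are hypothetical names, and the allowlist assumes only the two values mentioned above:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: normalizes the "by" query parameter into a list of
# column symbols that are safe to pass to Sequel's group_by.
module GroupByParam
  # assumption: only these two columns may be used as grouping keys
  ALLOWED = %i[day definition_id].freeze

  module_function

  # Accepts nil, a single string ("day") or an array of strings
  # (["day", "definition_id"]), and returns only the allowed columns,
  # as symbols, without duplicates.
  def columns(by)
    Array(by)
      .map { |col| col.to_s.to_sym }
      .select { |col| ALLOWED.include?(col) }
      .uniq
  end
end

GroupByParam.columns(nil)                      # returns []
GroupByParam.columns("day")                    # returns [:day]
GroupByParam.columns(["day", "definition_id"]) # returns [:day, :definition_id]
GroupByParam.columns(["day", "client_id"])     # returns [:day]
</code></pre></div></div>

<p>The grouping step could then compute <code class="language-plaintext highlighter-rouge">cols = GroupByParam.columns(by)</code> and, when <code class="language-plaintext highlighter-rouge">cols</code> isn’t empty, use <code class="language-plaintext highlighter-rouge">query.select_append(*cols).group_by(*cols)</code>; appending the grouping keys to the selection (<code class="language-plaintext highlighter-rouge">select_append</code> is a Sequel dataset method) lets each result row tell which day or definition it aggregates.</p>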

<p>With such an endpoint, you can start creating a few useful dashboards and features!</p>

<h2 id="going-forward">Going forward</h2>

<p>If you made it this far, congratulations! Now go build that MVP!</p>

<p>I hope this post shows how powerful the <code class="language-plaintext highlighter-rouge">ruby</code>/<code class="language-plaintext highlighter-rouge">sequel</code>/<code class="language-plaintext highlighter-rouge">postgresql</code> combo is, and how much adaptability it provides as your requirements change. This is, after all, the foundation on top of which you’ll build everything else.</p>

<p>And now it’s up to you to decide what to do next: is “by day” too coarse an aggregation interval? You can adjust the interval of the aggregation time index, e.g. aggregate per hour; or you can apply the same strategy to aggregate into separate tables, e.g. hourly, daily and/or weekly, so that each query hits the table best suited to its desired range. You can ingest into one table, and ingest “indirectly” into the others by using <a href="https://www.postgresql.org/docs/current/sql-createtrigger.html">database triggers</a>; or, if you don’t need “soft real time”, you can aggregate periodically using cronjobs.</p>
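
<p>Whichever interval you pick, each event’s timestamp has to be truncated to the start of its bucket, which then serves as the time index of the aggregated row (on the database side, PostgreSQL’s <code class="language-plaintext highlighter-rouge">date_trunc</code> does the same job). A minimal Ruby sketch of that mapping, assuming UTC timestamps and weeks starting on Monday; <code class="language-plaintext highlighter-rouge">TimeBucket</code> and its interval names are hypothetical:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: truncates a timestamp to the start of its aggregation
# bucket, similar to what PostgreSQL's date_trunc does.
module TimeBucket
  SECONDS_PER_DAY = 86_400

  module_function

  # interval is one of :hour, :day or :week (weeks start on Monday)
  def start_of(time, interval)
    t = time.getutc
    case interval
    when :hour
      Time.utc(t.year, t.month, t.day, t.hour)
    when :day
      Time.utc(t.year, t.month, t.day)
    when :week
      day = Time.utc(t.year, t.month, t.day)
      # wday is 0 for Sunday, 1 for Monday, ...; step back to the last Monday
      day - ((t.wday - 1) % 7) * SECONDS_PER_DAY
    else
      raise ArgumentError, "unsupported interval: #{interval.inspect}"
    end
  end
end

ts = Time.utc(2022, 10, 5, 13, 45, 12) # a Wednesday
TimeBucket.start_of(ts, :hour) # returns 2022-10-05 13:00:00 UTC
TimeBucket.start_of(ts, :day)  # returns 2022-10-05 00:00:00 UTC
TimeBucket.start_of(ts, :week) # returns 2022-10-03 00:00:00 UTC (Monday)
</code></pre></div></div>

<p>Switching from daily to hourly aggregation then boils down to changing the interval used when computing the bucket key, plus the table it is written to.</p>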

<p>In time, <a href="https://www.postgresql.org/docs/current/ddl-partitioning.html">Postgres range partitioning</a> can further help keep your queries responsive. You can then follow the instructions <a href="https://janko.io/anything-i-want-with-sequel-and-postgres/">of this blog post, which explains how to do range partitioning using sequel</a>, which is just another example of these two technologies working in harmony.</p>

<p>And when none of that works anymore, time to build the spaceship. Hope you made some money by then!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[At my dayjob, I’ve been working, for the most of this year of our lord 2022, in a team taking this new flagship product from “alpha” to “general availability”. With products in such an early stage, you don’t know a lot of things: what your users want, how they will use (or you want them to use) the platform, whether the thing you’re building is as valuable as you think it is. In such a stage, the most important skill you should have, as a team building and maintaining a product, is to be able to ship features quickly; the sooner you know what “sticks” with your userbase, the sooner you’ll know how worthwhile will it be to improve it, whether to “pivot” to something else, or whether you’re better off throwing it all away.]]></summary></entry><entry><title type="html">How to “bundle install” in deployment mode, using bundler in docker</title><link href="honeyryderchuck.gitlab.io/2022/10/03/how-to-bundle-production-mode-in-docker.html" rel="alternate" type="text/html" title="How to “bundle install” in deployment mode, using bundler in docker" /><published>2022-10-03T00:00:00+00:00</published><updated>2022-10-03T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2022/10/03/how-to-bundle-production-mode-in-docker</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2022/10/03/how-to-bundle-production-mode-in-docker.html"><![CDATA[<p><strong>tl;dr</strong>: <code class="language-plaintext highlighter-rouge">BUNDLE_PATH=$GEM_HOME</code>.</p>

<p>I was recently setting up the deployment of a <code class="language-plaintext highlighter-rouge">ruby</code> service in my employer’s production environment, which uses <a href="https://aws.amazon.com/pt/eks/">EKS on AWS</a> and <a href="https://docs.docker.com/get-docker/">docker</a> containers. This time though, I wanted to see how hard it would be to generate the production image, as well as the dev/test one we use in CI, from the same <a href="https://docs.docker.com/engine/reference/builder/">Dockerfile</a>.</p>

<p>I figured that it was just a matter of juggling the right combination of <a href="https://docs.docker.com/engine/reference/builder/">ARG</a> and <a href="https://docs.docker.com/compose/environment-variables/">ENV</a> declarations. And while I was right, I thought the outcome was worth documenting in a blog post, to spare the next rubyist some suffering when going down the same path. And while I still appreciate <code class="language-plaintext highlighter-rouge">bundler</code>’s role and leadership in the <code class="language-plaintext highlighter-rouge">ruby</code> community, as well as its array of features and configurability, its defaults and user/permission handling leave something to be desired.</p>

<h2 id="development-setup">Development setup</h2>

<p>The initial Dockerfile used for development looked roughly like this:</p>

<div class="language-Dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">FROM</span><span class="s"> ruby:3.1.2-bullseye</span>

<span class="k">LABEL</span><span class="s"> maintainer=me</span>

<span class="k">RUN </span>adduser <span class="nt">--disabled-password</span> <span class="nt">--gecos</span> <span class="s1">''</span> app <span class="se">\
</span>    <span class="o">&amp;&amp;</span> <span class="nb">mkdir</span> <span class="nt">-p</span> /home/service <span class="se">\
</span>    <span class="o">&amp;&amp;</span> <span class="nb">chown </span>app:app /home/service

<span class="k">USER</span><span class="s"> app:app</span>

<span class="k">WORKDIR</span><span class="s"> /home/service</span>

<span class="k">COPY</span><span class="s"> --chown=app:app Gemfile Gemfile.lock /home/service/</span>

<span class="k">RUN </span>bundle <span class="nb">install</span>
<span class="k">COPY</span><span class="s"> --chown=app:app . .</span>

<span class="k">CMD</span><span class="s"> ["bundle", "exec", "start-it-up"]</span>
</code></pre></div></div>

<p>The Gemfile was very simple, with a test group:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Gemfile</span>

<span class="n">source</span> <span class="s2">"https://rubygems.org"</span>

<span class="n">gem</span> <span class="s2">"rake"</span>
<span class="n">gem</span> <span class="s2">"zeitwerk"</span>
<span class="n">gem</span> <span class="s2">"sentry-ruby"</span>
<span class="c1"># ...</span>

<span class="n">group</span> <span class="ss">:test</span> <span class="k">do</span>
  <span class="n">gem</span> <span class="s2">"minitest"</span>
  <span class="n">gem</span> <span class="s2">"standard"</span>
  <span class="n">gem</span> <span class="s2">"debug"</span>
  <span class="c1"># ...</span>
<span class="k">end</span>
</code></pre></div></div>

<p>This was all tied together locally using <a href="https://docs.docker.com/get-started/08_using_compose/">Docker Compose</a>, where the service declaration looked like this:</p>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># docker-compose.yml</span>

<span class="na">services</span><span class="pi">:</span>
  <span class="na">foo</span><span class="pi">:</span>
    <span class="na">env_file</span><span class="pi">:</span> <span class="s">.env</span>
    <span class="na">volumes</span><span class="pi">:</span>
      <span class="pi">-</span> <span class="s">./:/home/service</span>
</code></pre></div></div>

<p>This setup worked well locally, and was reused to run the tests in CI (we use <a href="https://docs.gitlab.com/runner/executors/docker.html">GitLab CI docker executors</a>).</p>

<p>It was ready to go to production.</p>

<h2 id="bundler-in-production">bundler in production</h2>

<p><a href="https://bundler.io/guides/deploying.html">Bundler’s deployment guide</a> gives you simple advice: <code class="language-plaintext highlighter-rouge">bundle install --deployment</code> and you’re good to go. My use case wasn’t as simple though, as I wanted to follow some best practices from the get-go, rather than retrofitting them when it’s too costly to do so.</p>

<p>For one, I didn’t want to install test dependencies in the final production image (benefits: a leaner production image, and less exposure to vulnerabilities in gems I don’t need on servers). I also didn’t want to use command-line options, as juggling the development/production options would make my single Dockerfile harder to read. Fortunately, <a href="https://bundler.io/man/bundle-config.1.html">bundler covers that by supporting environment variables for configuration</a>:</p>

<div class="language-Dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Dockerfile</span>
<span class="k">FROM</span><span class="s"> ruby:3.1.2-bullseye</span>

<span class="c"># to declare which bundler groups to ignore, aka bundle install --without</span>
<span class="k">ARG</span><span class="s"> BUNDLE_WITHOUT</span>
</code></pre></div></div>

<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># .gitlab-ci.yml</span>

<span class="na">Build Production Image</span><span class="pi">:</span>
  <span class="na">variables</span><span class="pi">:</span>
    <span class="na">DOCKER_BUILD_ARGS</span><span class="pi">:</span> <span class="s2">"--build-arg BUNDLE_DEPLOYMENT=1 --build-arg BUNDLE_WITHOUT=test"</span>
  <span class="na">script</span><span class="pi">:</span>
    <span class="pi">-</span> <span class="s">docker build ${DOCKER_BUILD_ARGS} ...</span>
</code></pre></div></div>

<div class="language-yml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># kubernetes service.yml</span>
<span class="na">env</span><span class="pi">:</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">BUNDLE_WITHOUT</span>
    <span class="na">value</span><span class="pi">:</span> <span class="s2">"test"</span>
  <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">BUNDLE_DEPLOYMENT</span>
    <span class="na">value</span><span class="pi">:</span> <span class="s2">"1"</span>
</code></pre></div></div>

<p>Simple, right? Or so I thought, so I deployed. And the service didn’t boot. Looking at the logs, I saw a variation of the following error:</p>

<pre><code class="language-log">Could not find rake-13.0.6, zeitwerk-2.6.0, ...(the rest) in any of the sources (Bundler::GemNotFound)
</code></pre>

<p>I couldn’t figure it out. It worked on my machine, and I vaguely remembered doing similar work in the past. So I started googling for “ruby dockerfile setup”, only to find similar dockerfiles. I started a pod and quickly checked <code class="language-plaintext highlighter-rouge">GEM_PATH</code>, which pointed to <code class="language-plaintext highlighter-rouge">/usr/local/bundle</code>, and in fact nothing was there.</p>

<p>I then spent the next two days playing with several other bundler flags, adding, removing and editing them, trying to get to a positive outcome, and in the process almost gave up on the idea altogether.</p>

<p>But this post is not about the journey. It’s about the solution. Which eventually became clear.</p>

<h2 id="root-non-root-bundler-and-rubygems">Root, non-root, bundler, and rubygems</h2>

<p>The main difference between my dockerfile and most of the “ruby docker” examples on the web: I wasn’t running the process as root.</p>

<p>The <a href="https://github.com/docker-library/ruby/blob/master/3.1/bullseye/Dockerfile">ruby base image</a> sets up some variables, some of them involving <code class="language-plaintext highlighter-rouge">bundler</code> and <code class="language-plaintext highlighter-rouge">rubygems</code> (both ship with ruby itself):</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># from ruby 3.1.2 bullseye dockerfile</span>

<span class="c"># don't create ".bundle" in all our apps</span>
<span class="k">ENV</span><span class="s"> GEM_HOME /usr/local/bundle</span>
<span class="k">ENV</span><span class="s"> BUNDLE_SILENCE_ROOT_WARNING=1 \</span>
	BUNDLE_APP_CONFIG="$GEM_HOME"
<span class="k">ENV</span><span class="s"> PATH $GEM_HOME/bin:$PATH</span>
<span class="c"># adjust permissions of a few directories for running "gem install" as an arbitrary user</span>
<span class="k">RUN </span><span class="nb">mkdir</span> <span class="nt">-p</span> <span class="s2">"</span><span class="nv">$GEM_HOME</span><span class="s2">"</span> <span class="o">&amp;&amp;</span> <span class="nb">chmod </span>777 <span class="s2">"</span><span class="nv">$GEM_HOME</span><span class="s2">"</span>
</code></pre></div></div>

<p>This means that:</p>

<ul>
  <li>gems are installed in <code class="language-plaintext highlighter-rouge">$GEM_HOME</code>;</li>
  <li>gem-installed binstubs are accessible in the <code class="language-plaintext highlighter-rouge">$PATH</code>;</li>
  <li><code class="language-plaintext highlighter-rouge">bundler</code> configs can be found under <code class="language-plaintext highlighter-rouge">$GEM_HOME</code>;</li>
</ul>

<p>When I switch to a non-privileged user, as the initial Dockerfile shows, and run <code class="language-plaintext highlighter-rouge">bundle install</code>, gems are installed under <code class="language-plaintext highlighter-rouge">$GEM_HOME/gems</code>; executables are under <code class="language-plaintext highlighter-rouge">$GEM_HOME/bin</code>. It works on my machine.</p>

<p>But when I do it with <code class="language-plaintext highlighter-rouge">BUNDLE_DEPLOYMENT=1</code>? Gems still get installed in the same place. Executables too. But running <code class="language-plaintext highlighter-rouge">bundle exec</code> breaks. That’s because, in deployment mode, <code class="language-plaintext highlighter-rouge">bundler</code> sets its internal bundle path, used for dependency resolution and lookup, <a href="https://github.com/rubygems/rubygems/blob/def27af571af48f7375cc0bdc58b845122dcb5b4/bundler/lib/bundler/settings.rb#L4">to <code class="language-plaintext highlighter-rouge">"vendor/bundle"</code></a>.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># from lib/bundler/settings.rb</span>
<span class="k">def</span> <span class="nf">path</span>
  <span class="n">configs</span><span class="p">.</span><span class="nf">each</span> <span class="k">do</span> <span class="o">|</span><span class="n">_level</span><span class="p">,</span> <span class="n">settings</span><span class="o">|</span>
    <span class="n">path</span> <span class="o">=</span> <span class="n">value_for</span><span class="p">(</span><span class="s2">"path"</span><span class="p">,</span> <span class="n">settings</span><span class="p">)</span>
    <span class="n">path</span> <span class="o">=</span> <span class="s2">"vendor/bundle"</span> <span class="k">if</span> <span class="n">value_for</span><span class="p">(</span><span class="s2">"deployment"</span><span class="p">,</span> <span class="n">settings</span><span class="p">)</span> <span class="o">&amp;&amp;</span> <span class="n">path</span><span class="p">.</span><span class="nf">nil?</span>
    <span class="c1"># ...</span>
</code></pre></div></div>

<p>But there’s nothing there because, as mentioned above, gems were installed under <code class="language-plaintext highlighter-rouge">$GEM_HOME</code>.</p>

<p>So the solution is right in the line above: just set the bundle path. The most straightforward way to do this in this setup was via <code class="language-plaintext highlighter-rouge">BUNDLE_PATH</code>:</p>

<div class="language-dockerfile highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># Dockerfile</span>
<span class="k">ENV</span><span class="s"> BUNDLE_PATH $GEM_HOME</span>
<span class="c"># and now, you can bundle exec</span>
</code></pre></div></div>

<p>That’s it. Annoying, but simple to fix.</p>
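
<p>To make the mismatch concrete, here is a deliberately simplified Ruby model of the lookup rule from <code class="language-plaintext highlighter-rouge">lib/bundler/settings.rb</code> quoted earlier. It is not bundler’s real code, and <code class="language-plaintext highlighter-rouge">lookup_path</code> and its keyword arguments are hypothetical; it only models which directory gets searched, assuming the ruby image’s <code class="language-plaintext highlighter-rouge">GEM_HOME</code> of <code class="language-plaintext highlighter-rouge">/usr/local/bundle</code>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Deliberately simplified model (NOT bundler's actual implementation) of how
# the bundle lookup path is picked. lookup_path and its keyword arguments are
# hypothetical; gem_home defaults to the ruby docker image's GEM_HOME.
def lookup_path(path: nil, deployment: false, gem_home: "/usr/local/bundle")
  # an explicit path (e.g. from BUNDLE_PATH) always wins
  return path if path
  # deployment mode without a path falls back to "vendor/bundle"
  return "vendor/bundle" if deployment
  # otherwise, in this model, gems are looked up where rubygems installs them
  gem_home
end

lookup_path                                              # returns "/usr/local/bundle"
lookup_path(deployment: true)                            # returns "vendor/bundle", where nothing was installed
lookup_path(deployment: true, path: "/usr/local/bundle") # returns "/usr/local/bundle", i.e. BUNDLE_PATH=$GEM_HOME
</code></pre></div></div>

<p>Setting <code class="language-plaintext highlighter-rouge">BUNDLE_PATH</code> to the directory <code class="language-plaintext highlighter-rouge">GEM_HOME</code> points to makes the first branch apply, so installation and lookup agree again.</p>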

<h2 id="conclusion">Conclusion</h2>

<p>While the solution was very straightforward (set this environment variable and you’re good to go), it took me some time and a lot of trial and error to get there, due to a combination of factors.</p>

<p>The first one is docker defaults and best practices: while it’s been known for some time in the security realm that <a href="https://stackoverflow.com/questions/68155641/should-i-run-things-inside-a-docker-container-as-non-root-for-safety">“thou shalt not run containers as root”</a>, if I type “dockerfile ruby” into google, of the <a href="https://lipanski.com/posts/dockerfile-ruby-best-practices">first</a> <a href="https://semaphoreci.com/community/tutorials/dockerizing-a-ruby-on-rails-application">5</a> <a href="https://www.cloudbees.com/blog/build-minimal-docker-container-ruby-apps">relevant</a> <a href="https://www.digitalocean.com/community/tutorials/containerizing-a-ruby-on-rails-application-for-development-with-docker-compose">results</a> <a href="https://docs.docker.com/samples/rails/">I</a> get (the last one being docker’s official recommendation for using <code class="language-plaintext highlighter-rouge">compose</code> and <code class="language-plaintext highlighter-rouge">rails</code>), only one sets a non-privileged user for running the container. And that single example does it <strong>after</strong> running <code class="language-plaintext highlighter-rouge">bundle install</code>.</p>

<p>Why is it important to run <code class="language-plaintext highlighter-rouge">bundle install</code> as non-root? You can read the details in <a href="https://snyk.io/blog/ruby-gem-installation-lockfile-injection-attacks/">this Snyk blog post</a>, but the tl;dr is, if the gem requires compiling C extensions, a <a href="https://blog.costan.us/2008/11/post-install-post-update-scripts-for.html">post-install callback can be invoked</a> which allows arbitrary code to run with the privileges of the user invoking <code class="language-plaintext highlighter-rouge">bundle install</code>, which becomes a privilege escalation attack when exploited.</p>

<p>Why does <code class="language-plaintext highlighter-rouge">bundler</code> default to <code class="language-plaintext highlighter-rouge">"vendor/bundle"</code> as the gems lookup dir, which is different from the default gem install dir, when deployment mode is activated? I have no idea. I’d say it looks like a bug: <a href="https://github.com/rubygems/rubygems/blob/def27af571af48f7375cc0bdc58b845122dcb5b4/bundler/lib/bundler/man/bundle-install.1.ronn#deployment-mode">the docs do say that gems are installed to “vendor/bundle” in deployment mode</a>, yet the ruby docker image overriding <code class="language-plaintext highlighter-rouge">GEM_HOME</code> causes <code class="language-plaintext highlighter-rouge">bundler</code> to install gems there instead, only for that directory to be ignored in path lookups? And somehow it works when the user can <code class="language-plaintext highlighter-rouge">sudo</code>? Do <code class="language-plaintext highlighter-rouge">bundler</code> and <code class="language-plaintext highlighter-rouge">rubygems</code> still have a few misalignments to work out? <code class="language-plaintext highlighter-rouge">bundler</code> defaults don’t seem to be the sanest; as <a href="https://felipec.wordpress.com/2022/08/25/fixing-ruby-gems-installation/">this blog post puts it, whether you agree with the tone or not</a>, it can definitely do better.</p>

<p>But don’t get me wrong, as it’s still better than dealing with the absolute scorched earth equivalent in <code class="language-plaintext highlighter-rouge">python</code> or <code class="language-plaintext highlighter-rouge">nodejs</code>.</p>

<p>No bundler options were deprecated while performing these reproductions.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[tl;dr: BUNDLE_PATH=$GEM_HOME.]]></summary></entry><entry><title type="html">Standing on the shoulders of giants and leaky abstractions</title><link href="honeyryderchuck.gitlab.io/2022/05/04/standing-on-the-shoulders-of-giants-and-leaky-abstractions.html" rel="alternate" type="text/html" title="Standing on the shoulders of giants and leaky abstractions" /><published>2022-05-04T00:00:00+00:00</published><updated>2022-05-04T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2022/05/04/standing-on-the-shoulders-of-giants-and-leaky-abstractions</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2022/05/04/standing-on-the-shoulders-of-giants-and-leaky-abstractions.html"><![CDATA[<p>Recently, a <a href="https://old.reddit.com/r/ruby/comments/tqlhsw/how_to_use_activerecord_in_a_library/">blog post about how to use activerecord as a library was shared on r/ruby</a>, which started an interesting discussion thread (where I was involved) from the premise “instead of using activerecord out of the rails, why not sequel”? While several arguments were made both for and against the premise, it felt that, at times, discussion deviated towards the merits of <code class="language-plaintext highlighter-rouge">sequel</code> vs. <code class="language-plaintext highlighter-rouge">activerecord</code>, rather than using or building a gem on top of them, as a dependency; and as usual in the social network sphere, comments may have been misunderstood, everybody went their separate ways, and the Earth completed another orbit around the sun.</p>

<p>While the topic of which of the ORMs <a href="https://samsaffron.com/archive/2018/06/01/an-analysis-of-memory-bloat-in-active-record-5-2">has better performance</a>, <a href="https://janko.io/ode-to-sequel/">more useful features</a>, <a href="https://ruby.libhunt.com/compare-sequel-vs-activerecord">is more popular</a> or has more plugins has been discussed <em>ad aeternum</em>, most of these discussions start from the premise of the ORM as a primary dependency, exposed to the application developer. This usually leads to less technical, more “pragmatic” discussions, given that constraints around the choice of tech stack are usually established for “less technical, more political” reasons, e.g. whatever the CTO likes more, whatever the team is most familiar with, what the company can find more specialists for, or its risk appetite for experimenting with alternative stacks.</p>

<p>But if you’re building a library, then picking any DB library/ORM as a dependency which does not “leak” to the end user (or just a little sometimes), can make one weigh alternatives differently. What’s the maintenance burden ratio gonna look like? How hard will it be to support the API as new versions come along? Will the API change a lot? Does it support all the features my library requires? Will it be community-friendly, will I get help maintaining it? These questions aren’t limited to the case of relying on a db library, they’re also valid when considering building on top of any 3rd party dependency, like a web framework or HTTP client.</p>

<p>So I’ll share my opinion on the matter, based on my experience as an OSS maintainer building on top of <code class="language-plaintext highlighter-rouge">sequel</code>, versus an alternative built for rails (and therefore, <code class="language-plaintext highlighter-rouge">activerecord</code>).</p>

<h2 id="rodauth-oauth-vs-doorkeeper">rodauth-oauth vs doorkeeper</h2>

<p>I’m the maintainer of <a href="https://honeyryderchuck.gitlab.io/rodauth-oauth/">rodauth-oauth</a>, the most complete and featureful OAuth/OIDC provider framework in the ruby ecosystem. This claim is backed by it being the ruby gem implementing the most OAuth 2.0 and OIDC RFCs.</p>

<p>It’s far from the most popular though; that title belongs to <a href="https://github.com/doorkeeper-gem/doorkeeper">doorkeeper</a>. The huge gap between them in terms of popularity can be explained by <code class="language-plaintext highlighter-rouge">doorkeeper</code> having existed for more than 10 years and gone through the “ruby hype” years, whereas <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> has only existed since 2020. But <code class="language-plaintext highlighter-rouge">doorkeeper</code> is nonetheless the reference implementation in the OAuth provider space, and both <a href="https://gitlab.com/">GitLab</a> and <a href="https://mastodon.social/about">Mastodon</a> are known products using it in production.</p>

<p>Tech-wise, <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> is built on top of the <a href="http://rodauth.jeremyevans.net/">rodauth</a>/<a href="http://roda.jeremyevans.net/">roda</a>/<a href="http://sequel.jeremyevans.net/">sequel</a> stack, whereas <code class="language-plaintext highlighter-rouge">doorkeeper</code> is a rails-only gem, managed as a classic rails engine, just like <a href="https://github.com/heartcombo/devise">devise</a>.</p>

<p>Product-wise, <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> has more features and covers more of the <a href="https://oauth.net/specs/">OAuth</a> and <a href="https://openid.net/developers/specs/">OpenID</a> specs (check <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth/-/wikis/Home#comparisons">this feature matrix</a>); these are shipped and can be tested together. The <code class="language-plaintext highlighter-rouge">doorkeeper</code> gem is not as comprehensive: it ships with support for opaque tokens only, the original 4 OAuth 2.0 grant flows (plus the refresh token grant), and PKCE; it has a bigger community of both users and contributors, and some of the missing features are provided by the community as 3rd-party “extension” gems (which, as usual in such a setup, don’t always work well together: as an example, <a href="https://github.com/doorkeeper-gem/doorkeeper-jwt/blob/master/doorkeeper-jwt.gemspec#L25">doorkeeper-jwt</a> and <a href="https://github.com/doorkeeper-gem/doorkeeper-openid_connect/blob/master/doorkeeper-openid_connect.gemspec#L28">doorkeeper-openid_connect</a> don’t even agree on which JWT library to use).</p>

<h2 id="building-for-rails-vs-building-for-rodauth">Building for rails vs. building for rodauth</h2>

<p>With <code class="language-plaintext highlighter-rouge">rails</code> being the most used framework in the ruby ecosystem, you’ll have a hard time getting your gem adopted if it doesn’t work on rails.</p>

<p>Although built on a different stack, <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> can be used with rails, thanks to <a href="https://github.com/janko/rodauth-rails">rodauth-rails</a>, which does the heavy lifting of providing a sane default configuration for rails, as well as a few handy rake tasks (its author recently published <a href="https://janko.io/how-i-enabled-sequel-to-reuse-active-record-connection/">a blog post about how sequel reuses the activerecord connection pool in rodauth-rails</a>, which is very enlightening).</p>

<p><code class="language-plaintext highlighter-rouge">doorkeeper</code> ships as a rails engine, in a very similar way to <code class="language-plaintext highlighter-rouge">devise</code>: a <code class="language-plaintext highlighter-rouge">doorkeeper:install</code> generator to bootstrap config files and database migrations, a route helper to load <code class="language-plaintext highlighter-rouge">doorkeeper</code> routes, default views and controllers one may copy to app folders and customize (or not), and an initializer where most of the configuration happens. Because it uses “vanilla rails” features, <code class="language-plaintext highlighter-rouge">doorkeeper</code> seems like the obvious choice, at least from the “looking for an OAuth provider gem for my rails app” angle.</p>

<p>That said, building a gem targeting rails first brings a lot of maintenance baggage with it.</p>

<h3 id="release-policy">Release policy</h3>

<p>Every year since 2004, a new major/minor version of rails gets released, greeted with as much fanfare and enthusiasm by the people looking forward to new features as dread and despair by the people in charge of upgrading the rails version in huge production apps. That’s because rails upgrades tend to change a lot of APIs, often in a breaking way, and an upgrade may take several developers months. One can argue about the point of a few of those changes, or just repeat that rails does not follow SemVer, but that’s just a fact, and it also impacts libraries built for rails.</p>

<p><code class="language-plaintext highlighter-rouge">doorkeeper</code> covers a lot of rails API “surface”, which means that it is inevitably affected by these changes, and a certain amount of time and energy has to be invested every year in adapting to them (this is not a <code class="language-plaintext highlighter-rouge">doorkeeper</code>-only phenomenon; any gem built on rails goes through the same).</p>

<p>Thanks to the simple and stable APIs of the roda/sequel/rodauth stack, and its commitment to backwards compatibility, <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> has not yet had to release a fix due to backwards-incompatible API changes. The rails integration bits have also been stable, although they cover less rails API “surface” in comparison (just generators and view templates).</p>

<p>(Take this analysis with a grain of salt, as <code class="language-plaintext highlighter-rouge">doorkeeper</code>’s blast radius is wider.)</p>

<h3 id="community-practices">Community practices</h3>

<p>A lot of the rails “convention over configuration” culture is all over <code class="language-plaintext highlighter-rouge">activesupport</code>, and a lot of the practices exposed via its public APIs become teaching material on “how to do things” in rails, sometimes called the Rails Way. The practice I’ll focus on is “class to tag to class again”, whereby, given a class, <code class="language-plaintext highlighter-rouge">ToothPick</code>, or an instance of it, certain operations (e.g. calculating HTML tag ids) will automatically infer <code class="language-plaintext highlighter-rouge">"tooth_pick"</code> (or <code class="language-plaintext highlighter-rouge">:tooth_pick</code>) by applying a sequence of operations on the class name, namely <code class="language-plaintext highlighter-rouge">.demodulize</code> and <code class="language-plaintext highlighter-rouge">.underscore</code>; in other cases, such as deserialization, the inverse set of operations, i.e. <code class="language-plaintext highlighter-rouge">classify</code> and <code class="language-plaintext highlighter-rouge">constantize</code>, will be applied to infer the class from the string tag.</p>

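<p>It’s, for instance, how you write <code class="language-plaintext highlighter-rouge">form_for @tooth_pick</code> and automatically get a <code class="language-plaintext highlighter-rouge">&lt;form&gt;</code> tag with an id derived from <code class="language-plaintext highlighter-rouge">tooth_pick</code> (e.g. <code class="language-plaintext highlighter-rouge">&lt;form id="new_tooth_pick"&gt;</code> for a new record). This blueprint can be found all over rails and rails-only gems.</p>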
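<p>To make the round trip concrete, here’s a minimal, stdlib-only sketch. The <code class="language-plaintext highlighter-rouge">Inflect</code> helpers are simplified stand-ins written for illustration, not ActiveSupport’s implementations (which also handle acronyms, custom inflections and full constant paths), and <code class="language-plaintext highlighter-rouge">Kitchen::ToothPick</code> is a made-up class:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Simplified stand-ins for ActiveSupport inflections (illustration only).
module Inflect
  module_function

  # "Kitchen::ToothPick" -> "ToothPick"
  def demodulize(name)
    name.split("::").last
  end

  # "ToothPick" -> "tooth_pick"
  def underscore(name)
    name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  end

  # "tooth_pick" -> "ToothPick"
  def camelize(tag)
    tag.split("_").map { |part| part.capitalize }.join
  end
end

module Kitchen
  class ToothPick; end
end

# class -> tag, e.g. to build an HTML id
tag = Inflect.underscore(Inflect.demodulize(Kitchen::ToothPick.name))
# tag -> class again (simplified: only looked up inside Kitchen)
klass = Kitchen.const_get(Inflect.camelize(tag))

puts tag                         # tooth_pick
puts klass == Kitchen::ToothPick # true
</code></pre></div></div>

<p>The whole scheme relies on the tag being a valid ruby identifier once camelized, which is exactly the assumption that breaks below.</p>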

<p>Instead of telling you what I think of this practice, I’ll show an example where it creates limitations, namely, <code class="language-plaintext highlighter-rouge">doorkeeper</code>’s inability to support the <a href="https://github.com/doorkeeper-gem/doorkeeper/issues/764">saml2 bearer grant</a>, or any other assertion grant type <a href="https://datatracker.ietf.org/doc/html/rfc7521#section-4.1">as defined by the IETF</a>.</p>

<p><code class="language-plaintext highlighter-rouge">doorkeeper</code> allows one to enable grant flows via an initializer option:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># config/initializers/doorkeeper.rb</span>
<span class="no">Doorkeeper</span><span class="p">.</span><span class="nf">configure</span> <span class="k">do</span>
  <span class="n">grant_flows</span> <span class="p">[</span><span class="s2">"client_credentials"</span><span class="p">]</span>
<span class="k">end</span>
</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">"client_credentials"</code> grant flow is implemented by many resources with <code class="language-plaintext highlighter-rouge">ClientCredentials</code> in their namespace: there’s a <code class="language-plaintext highlighter-rouge">Doorkeeper::Request::ClientCredentials</code>, a <code class="language-plaintext highlighter-rouge">Doorkeeper::OAuth::ClientCredentials::Validator</code>, a <code class="language-plaintext highlighter-rouge">Doorkeeper::OAuth::ClientCredentials::Issuer</code>, and so on. All of these will be auto-inferred at some point during program execution, thanks to the sequence of transformations explained above.</p>

<p>This works well when your grant flow is called <code class="language-plaintext highlighter-rouge">"client_credentials"</code>, but not when it’s called <code class="language-plaintext highlighter-rouge">"urn:ietf:params:oauth:grant-type:saml2-bearer"</code>.</p>
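<p>A small, hypothetical sketch shows where it breaks: apply the same kind of camelization (again a simplified stand-in, not ActiveSupport’s) to both grant type identifiers, and check whether the result is even a legal ruby constant name:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Simplified camelization, standing in for the "string tag -> class name" step.
def camelize(tag)
  tag.split("_").map { |part| part.capitalize }.join
end

# A (plain ASCII) ruby constant name starts with an uppercase letter
# and contains only word characters.
def valid_constant_name?(name)
  name.match?(/\A[A-Z]\w*\z/)
end

["client_credentials", "urn:ietf:params:oauth:grant-type:saml2-bearer"].each do |grant_type|
  name = camelize(grant_type)
  puts "#{grant_type} -> #{name} (valid constant: #{valid_constant_name?(name)})"
end
</code></pre></div></div>

<p>The first identifier becomes <code class="language-plaintext highlighter-rouge">ClientCredentials</code>; the URN keeps its colons and hyphens, so there is no class it could ever resolve to.</p>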

<p>This situation is exacerbated by the <code class="language-plaintext highlighter-rouge">doorkeeper</code> maintainers’ refusal to support any of these features themselves, suggesting instead that the community implement them as “extension” gems (<code class="language-plaintext highlighter-rouge">devise</code> does the same). This creates an incentive problem: unlocking the extension requires a fundamental, risky (and potentially breaking) change in the “base” gem, which gets little from it beyond maintenance burden and is therefore reluctant to commit to it; meanwhile, whoever is willing to develop the extension gem may stop at the workarounds needed to support an edge case the “base” gem never considered, and the community gets nothing in the process.</p>

<p>None of the above applies to <code class="language-plaintext highlighter-rouge">rodauth-oauth</code>, given that grant flow identifiers do not have to map to anything internally (they’re just literals), and OAuth extensions ship and are tested together (shipping extra functionality as a standalone gem is certainly possible, but I encourage anyone to contribute to mainline as long as it’s about OAuth).</p>
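<p>To illustrate the difference, here’s a hypothetical dispatcher (not <code class="language-plaintext highlighter-rouge">rodauth-oauth</code>’s actual internals) where grant types are plain string keys, so a URN is as good an identifier as any other. The handler bodies are placeholders:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical dispatcher: grant type identifiers are literals mapped to
# handlers, so nothing has to be inferred from how they are spelled.
GRANT_HANDLERS = {
  "client_credentials" =>
    lambda { |params| { grant: :client_credentials, client_id: params[:client_id] } },
  "urn:ietf:params:oauth:grant-type:saml2-bearer" =>
    lambda { |params| { grant: :saml2_bearer, assertion: params[:assertion] } }
}.freeze

def handle_token_request(params)
  handler = GRANT_HANDLERS[params[:grant_type]]
  # "unsupported_grant_type" is the error code defined by RFC 6749
  return { error: "unsupported_grant_type" } unless handler

  handler.call(params)
end

p handle_token_request(grant_type: "urn:ietf:params:oauth:grant-type:saml2-bearer", assertion: "placeholder")
p handle_token_request(grant_type: "password")
</code></pre></div></div>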

<p>If we move away from the macro perspective of “building on top of a web/auth framework” back to “building on top of ActiveRecord vs. Sequel”, there are also interesting points to discuss.</p>

<h3 id="activerecord-vs-sequel">ActiveRecord vs. Sequel</h3>

<p>A point that arguably needs little discussion is that <code class="language-plaintext highlighter-rouge">sequel</code> is the most flexible and featureful DB toolkit in ruby, whereas <code class="language-plaintext highlighter-rouge">activerecord</code> is certainly more popular and has more plugins/extensions available. And while the latter may turn the tables in favour of <code class="language-plaintext highlighter-rouge">activerecord</code> when it comes to supporting a particular use case or feature, when building a library whose DB functionality is abstracted away from the end user, one will usually lean towards the solution that allows writing the tersest, simplest and most maintainable code. In most cases, that’d be <code class="language-plaintext highlighter-rouge">sequel</code>, and that’s exactly the choice many libraries have made.</p>

<p>Unless you’re building on top of rails, in which case it’s probably best to stick to the defaults, and the default will be <code class="language-plaintext highlighter-rouge">activerecord</code>. <code class="language-plaintext highlighter-rouge">doorkeeper</code> falls in this category; it ships with support for <code class="language-plaintext highlighter-rouge">activerecord</code>, although there are other community-maintained extensions supporting <a href="https://github.com/nbulaj/doorkeeper-sequel">sequel</a> or <a href="https://github.com/acaprojects/doorkeeper-couchbase">couchbase</a> (how well do they work? No idea, but one of them has seen no updates in 6 years).</p>

<p><code class="language-plaintext highlighter-rouge">rodauth-oauth</code> builds on top of <code class="language-plaintext highlighter-rouge">rodauth</code>, which uses <code class="language-plaintext highlighter-rouge">sequel</code> under the hood. However, what’s worth mentioning here is that the ORM layer isn’t used at all; instead, only the dataset API (aka <code class="language-plaintext highlighter-rouge">sequel/core</code>) is used. This has several performance benefits (lower memory footprint, and faster execution by skipping <em>to-model</em> transformations), while also allowing the maintainer to focus on data access patterns that fetch only the data the functionality requires, and keeping the other advantages of building on top of a general DB library rather than directly on the DB client adapters (i.e. <a href="http://sequel.jeremyevans.net/rdoc/files/doc/opening_databases_rdoc.html">free support for a multitude of databases</a>).</p>

<p>Recently, a <a href="https://github.com/doorkeeper-gem/doorkeeper/pull/1542">performance-related issue</a> was reported in the <code class="language-plaintext highlighter-rouge">doorkeeper</code> repo which got my attention.</p>

<p>In <code class="language-plaintext highlighter-rouge">doorkeeper</code>, one can avoid creating multiple access tokens for the same account/client application, by reusing an existing and valid access token, via the <a href="https://github.com/doorkeeper-gem/doorkeeper/issues/383">reuse_access_token option</a>. This works by performing a database lookup for an access token for the given account/client application which has not expired yet.</p>

<p>The version prior to the pull request shared above used a fairly naive heuristic: it would load all access tokens for the given account/client application (in memory, as AR instances), then return the first one which hadn’t expired. Hardly a problem while your tables are small, this could potentially grind your application to a halt as tables grow and a sufficiently large number of access tokens have been issued for each user.</p>

<p>The solution was clear: eliminate the expired access tokens from the returned dataset. Given access tokens store the <code class="language-plaintext highlighter-rouge">expires_in</code> seconds, this required reaching for SQL time-based operations to build a query which could accomplish that. There’s just one problem: <code class="language-plaintext highlighter-rouge">activerecord</code> does not provide functions for that. So in order to fix the performance issue, <code class="language-plaintext highlighter-rouge">doorkeeper</code> had to <a href="https://github.com/doorkeeper-gem/doorkeeper/blob/b67046ee2d81c1c1d5017d62b6550ca1d273e13e/lib/doorkeeper/models/concerns/expiration_time_sql_math.rb#L17">drop down to raw SQL, for all supported database engines</a>:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># mysql</span>
<span class="no">Arel</span><span class="p">.</span><span class="nf">sql</span><span class="p">(</span><span class="s2">"DATE_ADD(</span><span class="si">#{</span><span class="n">table_name</span><span class="si">}</span><span class="s2">.created_at, INTERVAL </span><span class="si">#{</span><span class="n">table_name</span><span class="si">}</span><span class="s2">.expires_in SECOND)"</span><span class="p">)</span>
<span class="c1"># sqlite</span>
<span class="no">Arel</span><span class="p">.</span><span class="nf">sql</span><span class="p">(</span><span class="s2">"DATETIME(</span><span class="si">#{</span><span class="n">table_name</span><span class="si">}</span><span class="s2">.created_at, '+' || </span><span class="si">#{</span><span class="n">table_name</span><span class="si">}</span><span class="s2">.expires_in || ' SECONDS')"</span><span class="p">)</span>
<span class="c1"># postgres</span>
<span class="no">Arel</span><span class="p">.</span><span class="nf">sql</span><span class="p">(</span><span class="s2">"</span><span class="si">#{</span><span class="n">table_name</span><span class="si">}</span><span class="s2">.created_at + </span><span class="si">#{</span><span class="n">table_name</span><span class="si">}</span><span class="s2">.expires_in * INTERVAL '1 SECOND'"</span><span class="p">)</span>
<span class="c1"># and so on...</span>
</code></pre></div></div>

<p>And just like that, some raw SQL leaked into the library.</p>

<p><code class="language-plaintext highlighter-rouge">rodauth-oauth</code> also supports this feature, but it does not suffer from the same issue, for 2 key reasons. First, it uses a <code class="language-plaintext highlighter-rouge">sequel</code> extension which <a href="http://sequel.jeremyevans.net/rdoc-plugins/files/lib/sequel/extensions/date_arithmetic_rb.html">adds a DSL for SQL time-based math</a> on supported databases. No need to drop down to raw SQL: <code class="language-plaintext highlighter-rouge">sequel</code> does it for you.</p>

<p>The second reason is that <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> does not store the <code class="language-plaintext highlighter-rouge">expires_in</code> seconds; instead, it calculates the expiration timestamp on <code class="language-plaintext highlighter-rouge">INSERT</code> (using the DSL mentioned above to perform a “current time + expires in” operation), which is then used in subsequent queries as a simpler and more optimizable filter (you can index that column, which you can’t do in the <code class="language-plaintext highlighter-rouge">doorkeeper</code> variant, where the calculation happens on <code class="language-plaintext highlighter-rouge">SELECT</code>):</p>

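<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Sequel.date_add is provided by the date_arithmetic extension</span>
<span class="no">Sequel</span><span class="p">.</span><span class="nf">extension</span> <span class="ss">:date_arithmetic</span>
<span class="c1"># on insert</span>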
<span class="n">create_params</span><span class="p">[</span><span class="n">oauth_tokens_expires_in_column</span><span class="p">]</span> <span class="o">=</span> <span class="no">Sequel</span><span class="p">.</span><span class="nf">date_add</span><span class="p">(</span><span class="no">Sequel</span><span class="o">::</span><span class="no">CURRENT_TIMESTAMP</span><span class="p">,</span> <span class="ss">seconds: </span><span class="n">oauth_token_expires_in</span><span class="p">)</span>
<span class="n">db</span><span class="p">[</span><span class="n">oauth_tokens_table</span><span class="p">].</span><span class="nf">insert</span><span class="p">(</span><span class="n">create_params</span><span class="p">)</span><span class="o">...</span>
<span class="c1"># on select</span>
<span class="n">ds</span> <span class="o">=</span> <span class="n">db</span><span class="p">[</span><span class="n">oauth_tokens_table</span><span class="p">].</span><span class="nf">where</span><span class="p">(</span><span class="no">Sequel</span><span class="p">[</span><span class="n">oauth_tokens_table</span><span class="p">][</span><span class="n">oauth_tokens_expires_in_column</span><span class="p">]</span> <span class="o">&gt;=</span> <span class="no">Sequel</span><span class="o">::</span><span class="no">CURRENT_TIMESTAMP</span><span class="p">)</span>
</code></pre></div></div>
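<p>The same idea can be sketched in plain ruby, with an in-memory array standing in for the tokens table. This only illustrates the data layout: the <code class="language-plaintext highlighter-rouge">Token</code> struct and <code class="language-plaintext highlighter-rouge">TokenStore</code> class are hypothetical, and in the real thing both the computation and the filter happen in SQL:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical row layout: the absolute expiration time is stored on insert.
Token = Struct.new(:account_id, :client_id, :token, :expires_at, keyword_init: true)

class TokenStore
  def initialize
    @rows = []
  end

  # On insert: compute "current time + expires in" once.
  def create(account_id:, client_id:, token:, expires_in:, now: Time.now)
    row = Token.new(account_id: account_id, client_id: client_id,
                    token: token, expires_at: now + expires_in)
    @rows.push(row)
    row
  end

  # On select: plain comparisons against stored values (each would be a
  # WHERE clause in SQL), instead of recomputing created_at + expires_in.
  def reusable_token(account_id:, client_id:, now: Time.now)
    @rows
      .select { |row| row.account_id == account_id }
      .select { |row| row.client_id == client_id }
      .find { |row| row.expires_at >= now }
  end
end

store = TokenStore.new
t0 = Time.at(1_700_000_000)
store.create(account_id: 1, client_id: "app", token: "old", expires_in: 60, now: t0)
store.create(account_id: 1, client_id: "app", token: "fresh", expires_in: 3600, now: t0 + 120)
puts store.reusable_token(account_id: 1, client_id: "app", now: t0 + 180).token # fresh
</code></pre></div></div>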

<p>One could pick up this approach and implement it in <code class="language-plaintext highlighter-rouge">doorkeeper</code>, at the cost of some backwards incompatibility, as it would require a data migration. But the fact that such an optimization wasn’t obvious from the get-go is arguably a by-product of having the abstraction layer “obscure” the generated SQL in a way that the costs aren’t visible until late down the road, when the cost of “redoing it the right way” may outweigh the benefits.</p>

<h2 id="conclusion">Conclusion</h2>

<p>All this is not to say that <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> is better than <code class="language-plaintext highlighter-rouge">doorkeeper</code> (<a href="https://honeyryderchuck.gitlab.io/rodauth-oauth/wiki/FAQ">although I believe it is</a>; after all, I maintain it :) ). <code class="language-plaintext highlighter-rouge">doorkeeper</code> can objectively be considered more mature, and if you’re looking for a solution for rails and you don’t require the extra features <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> provides, no one ever got fired for buying IBM. I could have made the same argument using <a href="https://github.com/collectiveidea/delayed_job">delayed_job</a> as an example, but I don’t maintain a similar database-backed background job framework, so any points I made could be deemed just “theoretical”.</p>

<p>Bottom line, when it comes to how much the extra dependencies one builds on top of might influence its maintainability, overhead time spent on unrelated chores, and focus on building the best solution for whatever problem one wants to solve, <code class="language-plaintext highlighter-rouge">sequel</code> should definitely be up there in the consideration list.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[Recently, a blog post about how to use activerecord as a library was shared on r/ruby, which started an interesting discussion thread (where I was involved) from the premise “instead of using activerecord out of the rails, why not sequel”? While several arguments were made both for and against the premise, it felt that, at times, discussion deviated towards the merits of sequel vs. activerecord, rather than using or building a gem on top of them, as a dependency; and as usual in the social network sphere, comments may have been misunderstood, everybody went their separate ways, and the Earth completed another orbit around the sun.]]></summary></entry><entry><title type="html">HTTPX 0.19.0 - happy eyeballs, proxy improvements, curl to httpx</title><link href="honeyryderchuck.gitlab.io/2022/01/26/httpx-0-19-happy-eyeballs-curl-to-httpx.html" rel="alternate" type="text/html" title="HTTPX 0.19.0 - happy eyeballs, proxy improvements, curl to httpx" /><published>2022-01-26T00:00:00+00:00</published><updated>2022-01-26T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2022/01/26/httpx-0-19-happy-eyeballs-curl-to-httpx</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2022/01/26/httpx-0-19-happy-eyeballs-curl-to-httpx.html"><![CDATA[<p><code class="language-plaintext highlighter-rouge">httpx</code> v0.19.0, the first major (minor version) update of 2022 of the ruby HTTP “swiss-army-knife” client, has just been released. It brings a lot of improvements and bugfixes, as well as a feature that has been a long time coming.</p>

<p>But first, I’d like to share with you my “weekend project”.</p>

<h2 id="curl-to-httpx">curl to httpx</h2>

<p>Presenting the new addition to the <a href="https://honeyryderchuck.gitlab.io/httpx/">httpx website</a>: <code class="language-plaintext highlighter-rouge">curl to httpx</code>, a small widget where you can paste a <code class="language-plaintext highlighter-rouge">curl</code> command and get the equivalent <code class="language-plaintext highlighter-rouge">httpx</code> ruby code snippet.</p>

<p><img src="/images/curl-to-ruby.png" alt="curl to ruby" /></p>

<h3 id="why">Why?</h3>

<p>As the maintainer of <code class="language-plaintext highlighter-rouge">httpx</code>, I mostly interact with users via bug reports, and focus on “making it work”. But sometimes I get to see how others use it, and there are usually things to point out: users tend to forget error handling (<code class="language-plaintext highlighter-rouge">response.raise_for_status</code>), or reimplement native <code class="language-plaintext highlighter-rouge">httpx</code> features (<code class="language-plaintext highlighter-rouge">http.post(url, body: JSON.generate(hash), headers: {"content-type" =&gt; "application/json"})</code> instead of <code class="language-plaintext highlighter-rouge">http.post(url, json: hash)</code>, handling retries or redirects themselves…), among other things.</p>

<p>Although there’s plenty of documentation (and a <a href="https://honeyryderchuck.gitlab.io/httpx/wiki/home.html">wiki</a>), I’m mindful that most users don’t have the time to go through it, and “whatever works first” is a decent success metric. It could be better though. But how?</p>

<p>Turns out I wasn’t the first to think about it. Recently I found <a href="https://jhawthorn.github.io/curl-to-ruby/">curl-to-ruby</a>, a webform which translates <code class="language-plaintext highlighter-rouge">curl</code>-based commands (<a href="https://curl.se/">curl</a> is used extensively to query HTTP APIs) into ruby code using the <code class="language-plaintext highlighter-rouge">net-http</code> standard library (this webform is itself based on <a href="https://mholt.github.io/curl-to-go/">curl-to-go</a>, a similar tool for the <code class="language-plaintext highlighter-rouge">go</code> language). I found it pretty cool, because it reduces the cognitive load of using <code class="language-plaintext highlighter-rouge">net-http</code>’s terrible API (sparing you the inevitable trip through several <code class="language-plaintext highlighter-rouge">net-http</code> cheatsheets and “how-to-make-sense-of-net-http” websites), while keeping the benefit of not installing another HTTP client gem.</p>

<p>I’d like to think that the <code class="language-plaintext highlighter-rouge">httpx</code> API isn’t that terrible; still, such a tool would be pretty useful. So I looked into how to adapt it to use <code class="language-plaintext highlighter-rouge">httpx</code> instead. One issue though: <code class="language-plaintext highlighter-rouge">curl-to-ruby</code> is written in Javascript, and I wasn’t excited at the prospect of writing Javascript to generate ruby code.</p>

<p>So I started looking into how to solve this problem using ruby instead.</p>

<h3 id="how">How?</h3>

<p>The first step was to develop a simple script, using stdlib’s <a href="https://github.com/ruby/optparse">optparse</a>, which would “parse” the <code class="language-plaintext highlighter-rouge">curl</code> call and print the equivalent <code class="language-plaintext highlighter-rouge">httpx</code> ruby script to standard output. That turned out to be straightforward, even if repetitive (there are &gt;100 <code class="language-plaintext highlighter-rouge">curl</code> CLI options):</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># something like:</span>
<span class="nb">require</span> <span class="s2">"optparse"</span>

<span class="c1"># ...</span>

<span class="n">options</span> <span class="o">=</span> <span class="p">{}</span>
<span class="no">OptionParser</span><span class="p">.</span><span class="nf">new</span> <span class="k">do</span> <span class="o">|</span><span class="n">opts</span><span class="o">|</span>
	<span class="n">opts</span><span class="p">.</span><span class="nf">on</span><span class="p">(</span><span class="s2">"--basic"</span><span class="p">)</span> <span class="k">do</span> <span class="c1">#         Use HTTP Basic Authentication</span>
		<span class="n">options</span><span class="p">[</span><span class="ss">:auth</span><span class="p">]</span> <span class="o">=</span> <span class="ss">:basic_authentication</span>
		<span class="n">options</span><span class="p">[</span><span class="ss">:auth_method</span><span class="p">]</span> <span class="o">=</span> <span class="ss">:basic_auth</span>
	<span class="k">end</span>
	<span class="n">opts</span><span class="p">.</span><span class="nf">on</span><span class="p">(</span><span class="s2">"-F"</span><span class="p">,</span> <span class="s2">"--form NAME=CONTENT"</span><span class="p">)</span> <span class="k">do</span> <span class="o">|</span><span class="n">data</span><span class="o">|</span>
	<span class="c1"># ... and so on ...</span>
	<span class="k">end</span>
<span class="k">end</span><span class="p">.</span><span class="nf">parse</span><span class="p">(</span><span class="n">curl_command</span><span class="p">)</span>

<span class="nb">puts</span> <span class="n">to_httpx</span><span class="p">(</span><span class="n">options</span><span class="p">)</span>
</code></pre></div></div>
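<p>For the curious, here’s a self-contained sketch of that first step, covering only a handful of options. It is not the script behind the widget: the <code class="language-plaintext highlighter-rouge">curl_to_httpx</code> method, the option subset and the shape of the generated snippet are illustrative assumptions. Note the use of <code class="language-plaintext highlighter-rouge">Shellwords.split</code>, which tokenizes the pasted command the way a shell would before <code class="language-plaintext highlighter-rouge">OptionParser</code> sees it:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "optparse"
require "shellwords"

# Hypothetical, minimal curl -> httpx translator covering only -X, -H and -d.
def curl_to_httpx(curl_command)
  options = { verb: "get", headers: {} }
  # tokenize like a shell would, so quoted header values stay in one piece
  argv = Shellwords.split(curl_command)
  argv.shift if argv.first == "curl"

  parser = OptionParser.new do |opts|
    opts.on("-X", "--request METHOD") { |verb| options[:verb] = verb.downcase }
    opts.on("-H", "--header HEADER") do |header|
      name, value = header.split(":", 2)
      options[:headers][name.strip.downcase] = value.to_s.strip
    end
    opts.on("-d", "--data DATA") do |data|
      options[:body] = data
      # simplification of curl: sending data turns a GET into a POST
      options[:verb] = "post" if options[:verb] == "get"
    end
  end
  # whatever is left after option parsing is treated as the URL(s)
  urls = parser.parse(argv)

  args = [urls.first.inspect]
  args.push("headers: #{options[:headers].inspect}") unless options[:headers].empty?
  args.push("body: #{options[:body].inspect}") if options[:body]
  "HTTPX.#{options[:verb]}(#{args.join(', ')})"
end

puts curl_to_httpx(%q{curl -X PUT -H "Accept: application/json" -d '{"a":1}' https://example.com/items})
</code></pre></div></div>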

<p>The second step was to compile it to Javascript that could be used in the website. For that, I used <a href="https://github.com/opal/opal">opal</a>, a known “ruby to javascript” compiler.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># the gist of handling inputs via opal/js</span>
<span class="n">on_txt_change</span> <span class="o">=</span> <span class="nb">lambda</span> <span class="k">do</span> <span class="o">|</span><span class="n">evt</span><span class="o">|</span>
	<span class="n">command</span> <span class="o">=</span> <span class="sb">`</span><span class="si">#{</span><span class="n">evt</span><span class="si">}</span><span class="sb">.target.value`</span>
	<span class="n">options</span> <span class="o">=</span> <span class="p">{}</span>
	<span class="n">urls</span> <span class="o">=</span> <span class="n">parse_options</span><span class="p">(</span><span class="n">command</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
	<span class="n">output</span> <span class="o">=</span> <span class="n">to_httpx_output</span><span class="p">(</span><span class="n">urls</span><span class="p">,</span> <span class="n">options</span><span class="p">)</span>
<span class="k">end</span>

<span class="sx">%x{
	var input = document.getElementById('curl-command-input');
	input.addEventListener('input', on_txt_change, false);
	input.addEventListener('change', on_txt_change, false);
}</span>
</code></pre></div></div>

<p>I may switch to using WASM in the future, now that <a href="https://bugs.ruby-lang.org/issues/18462">ruby will support webassembly</a>, but this works well for now.</p>

<p>Then it was a matter of adding the HTML input tags in the <code class="language-plaintext highlighter-rouge">jekyll</code> templates, and it was a wrap.</p>

<p>(It took more than a weekend though 😂).</p>

<p>Doing this type of integration using (mostly) ruby felt very enabling. Cheers to the community! I hope you find the widget useful.</p>

<p>Now, back to the <code class="language-plaintext highlighter-rouge">v0.19.0</code> feature announcements.</p>

<h2 id="happy-eyeballs-v2">Happy Eyeballs v2</h2>

<p>The main new feature coming in <code class="language-plaintext highlighter-rouge">v0.19.0</code> is Happy Eyeballs support. If you want to know about it in detail, <a href="https://datatracker.ietf.org/doc/html/rfc8305">you can read the RFC</a>. But the tl;dr is: the DNS layer will request IPv6 and IPv4 addresses in parallel, and prefer IPv6 connectivity whenever possible (under the conditions defined by the RFC).</p>
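<p>The address ordering part of the RFC (section 4) can be sketched in a few lines: given the resolved addresses, alternate between families, starting with IPv6. This is a simplified sketch, not <code class="language-plaintext highlighter-rouge">httpx</code>’s implementation. It assumes a “First Address Family Count” of 1 and that each family’s addresses are already in preference order, and it leaves out the resolution and connection attempt delays, which are separate parts of the algorithm:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "ipaddr"

# Interleave addresses by family, IPv6 first (RFC 8305, section 4),
# with a "First Address Family Count" of 1.
def interleave_addresses(addresses)
  ipv6, ipv4 = addresses.map { |addr| IPAddr.new(addr) }.partition { |ip| ip.ipv6? }
  rounds = [ipv6.size, ipv4.size].max
  rounds.times.flat_map { |i| [ipv6[i], ipv4[i]] }.compact.map { |ip| ip.to_s }
end

p interleave_addresses(["192.0.2.1", "192.0.2.2", "2001:db8::1", "2001:db8::2", "2001:db8::3"])
# ["2001:db8::1", "192.0.2.1", "2001:db8::2", "192.0.2.2", "2001:db8::3"]
</code></pre></div></div>

<p>Interleaving, rather than trying all IPv6 addresses first, means that a broken IPv6 path costs at most one connection attempt delay before an IPv4 address gets its turn.</p>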

<h3 id="why-1">Why?</h3>

<p>Prior to <code class="language-plaintext highlighter-rouge">v0.19.0</code>, <code class="language-plaintext highlighter-rouge">httpx</code> would resolve hostnames by first attempting an IPv4 address resolution (DNS A record), and only if that failed would it request an IPv6 address (DNS AAAA record). In a nutshell, “IPv4 first”.</p>

<p>This decision was taken a long time ago, due to personal experiences with poor-quality IPv6-enabled networks, and the assumption that by targeting “stable legacy” IPv4 connectivity, I’d have fewer support worries.</p>

<p>Yet this always seemed at odds with <code class="language-plaintext highlighter-rouge">httpx</code>’s mission: it enables seamless HTTP/2, but gets you stuck with IPv4? That sounds off. Sure, ruby’s mainly used in the cloud, where private networks have been IPv4-only for a long time, but <a href="https://aws.amazon.com/pt/blogs/networking-and-content-delivery/dual-stack-ipv6-architectures-for-aws-and-hybrid-networks/">that’s changing</a>.</p>

<h3 id="how-1">How?</h3>

<p>All of the DNS resolvers use it now. The <code class="language-plaintext highlighter-rouge">:native</code> (default, pure ruby) resolver opens 2 sockets, one for each IP family, and uses them for each request; the <code class="language-plaintext highlighter-rouge">:https</code> (DoH) resolver uses the same HTTP/2 connection to multiplex both requests; the <code class="language-plaintext highlighter-rouge">:system</code> resolver was modified to use <code class="language-plaintext highlighter-rouge">getaddrinfo</code> (and doesn’t block anymore), which already does dual-stack under the hood. Caches are also dual-stack aware, as is the hosts resolver.</p>
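<p>For reference, this is roughly what “dual-stack under the hood” looks like from ruby’s standard library: passing no address family (<code class="language-plaintext highlighter-rouge">nil</code>, i.e. <code class="language-plaintext highlighter-rouge">AF_UNSPEC</code>) to <code class="language-plaintext highlighter-rouge">Addrinfo.getaddrinfo</code> returns IPv4 and IPv6 results in a single call. It’s a minimal sketch that uses <code class="language-plaintext highlighter-rouge">localhost</code> to avoid depending on external DNS; which families come back depends on the host’s configuration:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "socket"

# A nil family means AF_UNSPEC: getaddrinfo returns IPv4 and IPv6 results
# in a single call, in the order the system's resolver sorted them.
addresses = Addrinfo.getaddrinfo("localhost", 443, nil, :STREAM)

addresses.each do |addrinfo|
  family = addrinfo.ipv6? ? "IPv6" : "IPv4"
  puts "#{family}: #{addrinfo.ip_address}"
end
</code></pre></div></div>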

<p>One thing to note is that both the <code class="language-plaintext highlighter-rouge">:native</code> and <code class="language-plaintext highlighter-rouge">:https</code> resolvers are <a href="https://www.cloudflare.com/learning/performance/what-is-dns-load-balancing/">DNS-based load balancing friendly</a>, whereas the <code class="language-plaintext highlighter-rouge">:system</code> resolver is not, due to its reliance on <code class="language-plaintext highlighter-rouge">getaddrinfo</code>, which <a href="https://access.redhat.com/solutions/22132">sorts IPs before handing them to the caller</a>, thereby changing the order in which the DNS server returned them.</p>

<h2 id="wrap-up">Wrap up</h2>

<p>There were also <a href="https://honeyryderchuck.gitlab.io/httpx/rdoc/files/doc/release_notes/0_19_0_md.html">plenty of improvements in the proxy layer, and another round of bugfixes</a>. Give it a try!</p>]]></content><author><name></name></author><summary type="html"><![CDATA[httpx v0.19.0, the first major (minor version) update of 2022 of the ruby HTTP “swiss-army-knife” client, has just been released. It brings a lot of improvements and bugfixes, as well as a feature that has been a long time coming.]]></summary></entry><entry><title type="html">Build an OIDC provider with rodauth-oauth in rails, while keeping your authentication</title><link href="honeyryderchuck.gitlab.io/2021/09/08/using-rodauth-oauth-in-rails-without-rodauth-based-auth.html" rel="alternate" type="text/html" title="Build an OIDC provider with rodauth-oauth in rails, while keeping your authentication" /><published>2021-09-08T00:00:00+00:00</published><updated>2021-09-08T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2021/09/08/using-rodauth-oauth-in-rails-without-rodauth-based-auth</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2021/09/08/using-rodauth-oauth-in-rails-without-rodauth-based-auth.html"><![CDATA[<p>I’ve written before about rodauth-oauth and <a href="https://honeyryderchuck.gitlab.io/httpx/2021/03/15/oidc-provider-on-rails-using-rodauth-oauth.html">how to use it to make an OAuth2 or OpenID Connect provider out of a rails application</a>, where I also <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth-demo-rails">built a rails demo app based on Janko Marohnić’s rodauth-rails demo app as a workable tutorial</a>. It shows well what rodauth accomplishes, how integrating it in a rails app became significantly simpler thanks to <a href="https://github.com/janko/rodauth-rails">rodauth-rails</a>, and how one can build an OAuth/OIDC provider using <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth">rodauth-oauth</a> on top of that.</p>

<p>Recently, a former co-worker asked me what I’d suggest for building an OAuth provider in a rails app. I suggested <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth">rodauth-oauth</a>. “But we already have our own authentication. Doesn’t <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth">rodauth-oauth</a> require that authentication is handled by <a href="https://github.com/jeremyevans/rodauth/">rodauth</a>?”</p>

<p>I said “no, it does not, it just requires a few undocumented tweaks”. Then I realized that, for anyone not familiar with the toolchain, it’s not obvious how this would get done, and how much of a barrier to adoption that is. A lot of Rails deployments rely on <a href="https://github.com/heartcombo/devise">devise</a> or something else based on <a href="https://github.com/wardencommunity/warden">warden</a> for authentication, and while it’s certainly reasonable to “sell” <a href="https://github.com/jeremyevans/rodauth/">rodauth</a> as a much better alternative, buying into <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth">rodauth-oauth</a> ideally shouldn’t require a whole rewrite of the authentication system.</p>

<p>So if you’d like to try <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth">rodauth-oauth</a> for OAuth and keep your authentication logic, this tutorial is for you.</p>

<h2 id="1-rails-and-devise-sitting-in-a-tree">1. Rails and Devise sitting in a tree</h2>

<p>The first step is to have an example rails app to work with. To do so, I’ll <a href="https://janko.io/adding-authentication-in-rails-with-rodauth/">follow what Janko used in his first rodauth post</a> and use his blog bootstrapper example:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>git clone https://gitlab.com/janko-m/rails_bootstrap_starter.git rodauth-oauth-devise-demo
<span class="nv">$ </span><span class="nb">cd </span>rodauth-oauth-devise-demo
<span class="nv">$ </span>bin/setup
</code></pre></div></div>

<p>(This part was easier said than done. I have very little experience with <code class="language-plaintext highlighter-rouge">webpacker</code>, but every time I need it, running a command seems to fail and send me on a journey searching Google for workarounds. This one landed <a href="https://stackoverflow.com/questions/69046801/brand-new-rails-6-1-4-1-fails-with-webpack-error-typeerror-class-constructor">here</a>, where I found out that the latest-and-greatest <code class="language-plaintext highlighter-rouge">webpack</code> isn’t compatible with <code class="language-plaintext highlighter-rouge">webpacker</code>. Always something…)</p>

<p>Now, I will use <a href="https://github.com/heartcombo/devise">devise</a> for this tutorial.</p>

<p>(<strong>NOTE</strong>: I know there are other alternatives, but <a href="https://github.com/heartcombo/devise">devise</a> provides me with a “quick to prototype” bootstrap experience for this demo, while the tweaks can apply to any other framework):</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle add devise
</code></pre></div></div>

<p>And run its initializers:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle <span class="nb">exec </span>rails generate devise:install <span class="c"># adds initializers, configs...</span>
<span class="o">&gt;</span> bundle <span class="nb">exec </span>rails generate devise User <span class="c"># creates the user model and migrations</span>
</code></pre></div></div>

<p><strong>NOTE</strong>: make sure to uncomment the section in the migration file generated by <code class="language-plaintext highlighter-rouge">devise</code> referring to the <code class="language-plaintext highlighter-rouge">:trackable</code> module, and enable it in the model as well:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># in the migration file</span>
<span class="c1">## Trackable</span>
<span class="n">t</span><span class="p">.</span><span class="nf">integer</span>  <span class="ss">:sign_in_count</span><span class="p">,</span> <span class="ss">default: </span><span class="mi">0</span><span class="p">,</span> <span class="ss">null: </span><span class="kp">false</span>
<span class="n">t</span><span class="p">.</span><span class="nf">datetime</span> <span class="ss">:current_sign_in_at</span>
<span class="n">t</span><span class="p">.</span><span class="nf">datetime</span> <span class="ss">:last_sign_in_at</span>
<span class="n">t</span><span class="p">.</span><span class="nf">string</span>   <span class="ss">:current_sign_in_ip</span>
<span class="n">t</span><span class="p">.</span><span class="nf">string</span>   <span class="ss">:last_sign_in_ip</span>

<span class="c1"># in the User model</span>
<span class="n">devise</span> <span class="ss">:database_authenticatable</span><span class="p">,</span>
        <span class="c1"># ...</span>
        <span class="ss">:trackable</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle <span class="nb">exec </span>rails db:migrate
</code></pre></div></div>

<p>Now let’s add some useful links in the navbar:</p>

<div class="language-erb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- app/views/application/_navbar.html.erb --&gt;</span>
<span class="c">&lt;!-- ... ---&gt;</span>
<span class="cp">&lt;%</span> <span class="k">if</span> <span class="n">user_signed_in?</span> <span class="cp">%&gt;</span>
  <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"dropdown"</span><span class="nt">&gt;</span>
    <span class="cp">&lt;%=</span> <span class="n">link_to</span> <span class="n">current_user</span><span class="p">.</span><span class="nf">email</span><span class="p">,</span> <span class="s2">"#"</span><span class="p">,</span> <span class="ss">class: </span><span class="s2">"btn btn-info dropdown-toggle"</span><span class="p">,</span> <span class="ss">data: </span><span class="p">{</span> <span class="ss">toggle: </span><span class="s2">"dropdown"</span> <span class="p">}</span> <span class="cp">%&gt;</span>
    <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"dropdown-menu dropdown-menu-right"</span><span class="nt">&gt;</span>
      <span class="cp">&lt;%=</span> <span class="n">link_to</span> <span class="s2">"Change password"</span><span class="p">,</span> <span class="n">edit_user_password_path</span><span class="p">,</span> <span class="ss">class: </span><span class="s2">"dropdown-item"</span> <span class="cp">%&gt;</span>
      <span class="nt">&lt;div</span> <span class="na">class=</span><span class="s">"dropdown-divider"</span><span class="nt">&gt;&lt;/div&gt;</span>
      <span class="cp">&lt;%=</span> <span class="n">link_to</span> <span class="s2">"Sign out"</span><span class="p">,</span> <span class="n">destroy_user_session_path</span><span class="p">,</span> <span class="ss">method: :delete</span><span class="p">,</span> <span class="ss">class: </span><span class="s2">"dropdown-item"</span> <span class="cp">%&gt;</span>
    <span class="nt">&lt;/div&gt;</span>
  <span class="nt">&lt;/div&gt;</span>
<span class="cp">&lt;%</span> <span class="k">else</span> <span class="cp">%&gt;</span>
  <span class="nt">&lt;div&gt;</span>
    <span class="cp">&lt;%=</span> <span class="n">link_to</span> <span class="s2">"Sign in"</span><span class="p">,</span> <span class="n">new_user_session_path</span><span class="p">,</span> <span class="ss">class: </span><span class="s2">"btn btn-outline-primary"</span> <span class="cp">%&gt;</span>
    <span class="cp">&lt;%=</span> <span class="n">link_to</span> <span class="s2">"Sign up"</span><span class="p">,</span> <span class="n">new_user_registration_path</span><span class="p">,</span> <span class="ss">class: </span><span class="s2">"btn btn-success"</span> <span class="cp">%&gt;</span>
  <span class="nt">&lt;/div&gt;</span>
<span class="cp">&lt;%</span> <span class="k">end</span> <span class="cp">%&gt;</span>
<span class="c">&lt;!-- ... ---&gt;</span>
</code></pre></div></div>

<p>And restrict the posts section to authenticated users:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">class</span> <span class="nc">PostsController</span> <span class="o">&lt;</span> <span class="no">ApplicationController</span>
  <span class="n">before_action</span> <span class="ss">:authenticate_user!</span>
  <span class="c1"># ...</span>
</code></pre></div></div>

<p><img src="/images/using-rodauth-oauth-devise-rails/login-screen-1.png" alt="login-screen-1" /></p>

<p>And that’s it, we’re set!</p>

<h2 id="2-install-rodauth-rails-but-not-use-it-for-authentication-and-rodauth-oauth">2. Install rodauth-rails (but don’t use it for authentication) and rodauth-oauth</h2>

<p>Installing is accomplished simply by doing:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle add rodauth-rails
<span class="o">&gt;</span> bundle add rodauth-oauth
</code></pre></div></div>

<p>The first thing to do is run the <code class="language-plaintext highlighter-rouge">rodauth-rails</code> install generator:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle <span class="nb">exec </span>rails generate rodauth:install
      create  db/migrate/20210906132849_create_rodauth.rb
      create  config/initializers/rodauth.rb
      create  config/initializers/sequel.rb
      create  app/lib/rodauth_app.rb
      create  app/controllers/rodauth_controller.rb
      create  app/models/account.rb
      create  app/mailers/rodauth_mailer.rb
      create  app/views/rodauth_mailer/email_auth.text.erb
      create  app/views/rodauth_mailer/password_changed.text.erb
      create  app/views/rodauth_mailer/reset_password.text.erb
      create  app/views/rodauth_mailer/unlock_account.text.erb
      create  app/views/rodauth_mailer/verify_account.text.erb
      create  app/views/rodauth_mailer/verify_login_change.text.erb
</code></pre></div></div>

<p>As you can see from the output above, <code class="language-plaintext highlighter-rouge">rodauth-rails</code> expects that you’ll start using <code class="language-plaintext highlighter-rouge">rodauth</code> for authentication. There are a few switches, such as <code class="language-plaintext highlighter-rouge">--json</code> or <code class="language-plaintext highlighter-rouge">--jwt</code>, but they’re not very useful for our use-case, which is “just initializers please”.</p>

<p>So now it’s time to delete things :) Let’s start by removing the files we won’t need:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> <span class="nb">rm</span> <span class="nt">-rf</span> app/views/rodauth_mailer/
<span class="o">&gt;</span> <span class="nb">rm </span>app/mailers/rodauth_mailer.rb app/models/account.rb db/migrate/20210906132849_create_rodauth.rb
</code></pre></div></div>

<p>And then update the auto-generated config files:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app/lib/rodauth_app.rb</span>
<span class="k">class</span> <span class="nc">RodauthApp</span> <span class="o">&lt;</span> <span class="no">Rodauth</span><span class="o">::</span><span class="no">Rails</span><span class="o">::</span><span class="no">App</span>
  <span class="n">configure</span> <span class="k">do</span>
    <span class="c1"># List of authentication features that are loaded.</span>
<span class="o">-</span>    <span class="n">enable</span> <span class="ss">:create_account</span><span class="p">,</span> <span class="ss">:verify_account</span><span class="p">,</span> <span class="ss">:verify_account_grace_period</span><span class="p">,</span>
<span class="o">-</span>      <span class="ss">:login</span><span class="p">,</span> <span class="ss">:logout</span><span class="p">,</span> <span class="ss">:remember</span><span class="p">,</span>
<span class="o">-</span>      <span class="ss">:reset_password</span><span class="p">,</span> <span class="ss">:change_password</span><span class="p">,</span> <span class="ss">:change_password_notify</span><span class="p">,</span>
<span class="o">-</span>      <span class="ss">:change_login</span><span class="p">,</span> <span class="ss">:verify_login_change</span><span class="p">,</span>
<span class="o">-</span>      <span class="ss">:close_account</span>
<span class="o">+</span>    <span class="n">enable</span> <span class="ss">:base</span>
  <span class="c1"># ... delete every other default option</span>
<span class="o">+</span>    <span class="n">accounts_table</span> <span class="ss">:users</span>
  <span class="k">end</span>

  <span class="n">route</span> <span class="k">do</span> <span class="o">|</span><span class="n">r</span><span class="o">|</span>
<span class="o">-</span>    <span class="n">rodauth</span><span class="p">.</span><span class="nf">load_memory</span> <span class="c1"># only useful for auth-driven rodauth</span>
<span class="o">-</span>
     <span class="n">r</span><span class="p">.</span><span class="nf">rodauth</span> <span class="c1"># route rodauth requests</span>
</code></pre></div></div>

<p>And now it’s time to auto-generate <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> files:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle <span class="nb">exec </span>rails generate rodauth:oauth:install
      create  db/migrate/20210906134332_create_rodauth_oauth.rb
      create  app/models/oauth_application.rb
      create  app/models/oauth_grant.rb
      create  app/models/oauth_token.rb


<span class="o">&gt;</span> bundle <span class="nb">exec </span>rails generate rodauth:oauth:views <span class="nt">--all</span>
      create  app/views/rodauth/authorize.html.erb
      create  app/views/rodauth/oauth_applications.html.erb
      create  app/views/rodauth/oauth_application.html.erb
      create  app/views/rodauth/new_oauth_application.html.erb
</code></pre></div></div>

<p>Some changes will be required here as well before running the migrations, given that <code class="language-plaintext highlighter-rouge">devise</code> created a <code class="language-plaintext highlighter-rouge">users</code> table, not an <code class="language-plaintext highlighter-rouge">accounts</code> table like <code class="language-plaintext highlighter-rouge">rodauth</code> would have:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># db/migrate/20210906134332_create_rodauth_oauth.rb</span>
     <span class="n">create_table</span> <span class="ss">:oauth_applications</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
       <span class="n">t</span><span class="p">.</span><span class="nf">integer</span> <span class="ss">:account_id</span>
<span class="o">-</span>      <span class="n">t</span><span class="p">.</span><span class="nf">foreign_key</span> <span class="ss">:accounts</span><span class="p">,</span> <span class="ss">column: :account_id</span>
<span class="o">+</span>      <span class="n">t</span><span class="p">.</span><span class="nf">foreign_key</span> <span class="ss">:users</span><span class="p">,</span> <span class="ss">column: :account_id</span>
<span class="c1"># ...</span>
     <span class="n">create_table</span> <span class="ss">:oauth_grants</span> <span class="k">do</span> <span class="o">|</span><span class="n">t</span><span class="o">|</span>
       <span class="n">t</span><span class="p">.</span><span class="nf">integer</span> <span class="ss">:account_id</span>
<span class="o">-</span>      <span class="n">t</span><span class="p">.</span><span class="nf">foreign_key</span> <span class="ss">:accounts</span><span class="p">,</span> <span class="ss">column: :account_id</span>
<span class="o">+</span>      <span class="n">t</span><span class="p">.</span><span class="nf">foreign_key</span> <span class="ss">:users</span><span class="p">,</span> <span class="ss">column: :account_id</span>
<span class="c1"># ...</span>
</code></pre></div></div>

<p>And now you’re good to go. Run the migrations:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle <span class="nb">exec </span>rails db:migrate
</code></pre></div></div>
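<p>The configuration below expects an RSA key pair in <code class="language-plaintext highlighter-rouge">PRIV_KEY</code> and <code class="language-plaintext highlighter-rouge">PUB_KEY</code> for the <code class="language-plaintext highlighter-rouge">RS256</code> algorithm. If you don’t have one yet, here’s a minimal sketch using Ruby’s standard <code class="language-plaintext highlighter-rouge">openssl</code> library. The <code class="language-plaintext highlighter-rouge">generate_jwt_keypair</code> helper and the <code class="language-plaintext highlighter-rouge">privkey.pem</code>/<code class="language-plaintext highlighter-rouge">pubkey.pem</code> file names are hypothetical, so store the files wherever suits your deployment (and keep the private key out of version control). It also shows the split of roles: the private key signs, the public key verifies.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code>require "openssl"
require "tmpdir"

# Hypothetical helper: generates an RSA key pair and writes both halves as
# PEM files into +dir+, returning the two file paths.
def generate_jwt_keypair(dir, bits: 2048)
  key = OpenSSL::PKey::RSA.generate(bits)
  priv_path = File.join(dir, "privkey.pem")
  pub_path = File.join(dir, "pubkey.pem")
  File.write(priv_path, key.to_pem)           # private key: keep it secret
  File.write(pub_path, key.public_key.to_pem) # public key: safe to publish
  [priv_path, pub_path]
end

Dir.mktmpdir do |dir|
  priv_path, pub_path = generate_jwt_keypair(dir)
  # Load them back the same way the commented-out PRIV_KEY/PUB_KEY lines do.
  priv_key = OpenSSL::PKey::RSA.new(File.read(priv_path))
  pub_key = OpenSSL::PKey::RSA.new(File.read(pub_path))

  # RS256 is an RSA signature over a SHA-256 digest:
  # sign with the private key...
  signature = priv_key.sign(OpenSSL::Digest.new("SHA256"), "payload")
  # ...and verify with the public key.
  puts pub_key.verify(OpenSSL::Digest.new("SHA256"), signature, "payload") # true
end
</code></pre></div></div>

<p>In a real app you’d generate the pair once, outside the request path, and only load the files at boot.</p>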

<p>And enable the respective <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> plugins:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app/lib/rodauth_app.rb</span>

<span class="c1"># Declare the private key used to sign the id_token and the public key used to verify it</span>
<span class="c1"># (uncomment and point these at your own key files):</span>
<span class="c1"># PRIV_KEY = OpenSSL::PKey::RSA.new(File.read("path/to/privkey.pem"))</span>
<span class="c1"># PUB_KEY = OpenSSL::PKey::RSA.new(File.read("path/to/pubkey.pem"))</span>

<span class="n">enable</span> <span class="ss">:oidc</span><span class="p">,</span> <span class="ss">:oidc_dynamic_client_registration</span><span class="p">,</span> <span class="ss">:oauth_application_management</span>

<span class="c1"># list of OIDC and OAuth scopes you handle</span>
<span class="n">oauth_application_scopes</span> <span class="sx">%w[openid email profile posts.read]</span>


<span class="c1"># so helpers return model instances in rails, such as rodauth.current_oauth_account</span>
<span class="n">oauth_account_ds</span> <span class="p">{</span> <span class="o">|</span><span class="nb">id</span><span class="o">|</span> <span class="no">User</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="n">account_id_column</span> <span class="o">=&gt;</span> <span class="nb">id</span><span class="p">)</span> <span class="p">}</span>
<span class="n">oauth_application_ds</span> <span class="p">{</span> <span class="o">|</span><span class="nb">id</span><span class="o">|</span> <span class="no">OAuthApplication</span><span class="p">.</span><span class="nf">where</span><span class="p">(</span><span class="n">oauth_applications_id_column</span> <span class="o">=&gt;</span> <span class="nb">id</span><span class="p">)</span> <span class="p">}</span>

<span class="c1"># by default you're only allowed to use https redirect URIs. But we're developing,</span>
<span class="c1"># so it's fine.</span>
<span class="k">if</span> <span class="no">Rails</span><span class="p">.</span><span class="nf">env</span><span class="p">.</span><span class="nf">development?</span>
  <span class="n">oauth_valid_uri_schemes</span> <span class="sx">%w[http https]</span>
<span class="k">end</span>

<span class="n">oauth_jwt_keys</span><span class="p">(</span><span class="s2">"RS256"</span> <span class="o">=&gt;</span> <span class="no">PRIV_KEY</span><span class="p">)</span>
<span class="n">oauth_jwt_public_keys</span><span class="p">(</span><span class="s2">"RS256"</span> <span class="o">=&gt;</span> <span class="no">PUB_KEY</span><span class="p">)</span>

<span class="c1"># this callback is executed when gathering OIDC claims to build the</span>
<span class="c1"># ID token with.</span>
<span class="c1"># You should return the values for each of these claims.</span>
<span class="c1">#</span>
<span class="c1"># This callback is called in a loop for all available claims, so make sure</span>
<span class="c1"># you memoize access to the database models to avoid running the same query</span>
<span class="c1"># multiple times.</span>
<span class="n">get_oidc_param</span> <span class="k">do</span> <span class="o">|</span><span class="n">account</span><span class="p">,</span> <span class="n">param</span><span class="o">|</span>
  <span class="vi">@user</span> <span class="o">||=</span> <span class="no">User</span><span class="p">.</span><span class="nf">find_by</span><span class="p">(</span><span class="ss">id: </span><span class="n">account</span><span class="p">[</span><span class="ss">:id</span><span class="p">])</span>
  <span class="k">case</span> <span class="n">param</span>
  <span class="k">when</span> <span class="ss">:email</span>
    <span class="vi">@user</span><span class="p">.</span><span class="nf">email</span>
  <span class="k">when</span> <span class="ss">:email_verified</span>
    <span class="kp">true</span>
  <span class="k">when</span> <span class="ss">:name</span>
    <span class="vi">@user</span><span class="p">.</span><span class="nf">name</span>
  <span class="k">end</span>
<span class="k">end</span>
<span class="c1"># ...</span>
<span class="n">route</span> <span class="k">do</span> <span class="o">|</span><span class="n">r</span><span class="o">|</span>
  <span class="n">r</span><span class="p">.</span><span class="nf">rodauth</span> <span class="c1"># route rodauth requests</span>
  <span class="n">rodauth</span><span class="p">.</span><span class="nf">load_oauth_application_management_routes</span>
  <span class="n">rodauth</span><span class="p">.</span><span class="nf">load_openid_configuration_route</span>
  <span class="n">rodauth</span><span class="p">.</span><span class="nf">load_webfinger_route</span>
<span class="k">end</span>

<span class="c1"># app/models/user.rb</span>
<span class="k">class</span> <span class="nc">User</span> <span class="o">&lt;</span> <span class="no">ApplicationRecord</span>

  <span class="c1"># dirty hack, so that user has a name.</span>
  <span class="k">def</span> <span class="nf">name</span>
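    <span class="n">email</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="s2">"@"</span><span class="p">).</span><span class="nf">first</span><span class="p">.</span><span class="nf">split</span><span class="p">(</span><span class="s2">"."</span><span class="p">).</span><span class="nf">map</span><span class="p">(</span><span class="o">&amp;</span><span class="ss">:capitalize</span><span class="p">).</span><span class="nf">join</span><span class="p">(</span><span class="s2">" "</span><span class="p">)</span> <span class="c1"># "john.doe@example.com" -&gt; "John Doe"</span>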
  <span class="k">end</span>
  <span class="c1"># ...</span>
</code></pre></div></div>
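<p>To see why the memoization in <code class="language-plaintext highlighter-rouge">get_oidc_param</code> matters, here’s a plain-Ruby sketch that runs without rails or rodauth. <code class="language-plaintext highlighter-rouge">UserRecord</code>, <code class="language-plaintext highlighter-rouge">FakeUserStore</code> and <code class="language-plaintext highlighter-rouge">ClaimsBuilder</code> are hypothetical stand-ins for the <code class="language-plaintext highlighter-rouge">User</code> model and for the per-claim loop, which is assumed to call the block once per requested claim; <code class="language-plaintext highlighter-rouge">UserRecord#name</code> derives a display name by splitting the email’s local part on dots and capitalizing each piece. Note that the memoized <code class="language-plaintext highlighter-rouge">@user</code> lives as long as the object it’s set on, so this is only safe if that object isn’t reused across accounts (as far as I know, rodauth builds a new instance per request).</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical stand-in for a user row.
UserRecord = Struct.new(:id, :email) do
  # "john.doe@example.com" -- "John Doe"
  def name
    email.split("@").first.split(".").map { |part| part.capitalize }.join(" ")
  end
end

# Hypothetical stand-in for the users table that counts lookups.
class FakeUserStore
  attr_reader :queries

  def initialize(users)
    @users = users.to_h { |user| [user.id, user] }
    @queries = 0
  end

  def find_by(id:)
    @queries += 1
    @users[id]
  end
end

# Mirrors the shape of the get_oidc_param block: called once per claim.
class ClaimsBuilder
  def initialize(store)
    @store = store
  end

  def get_oidc_param(account, param)
    @user ||= @store.find_by(id: account[:id]) # only the first call queries
    case param
    when :email then @user.email
    when :email_verified then true
    when :name then @user.name
    end
  end

  def claims_for(account, params)
    params.to_h { |param| [param, get_oidc_param(account, param)] }
  end
end

store = FakeUserStore.new([UserRecord.new(1, "john.doe@example.com")])
claims = ClaimsBuilder.new(store).claims_for({ id: 1 }, %i[email email_verified name])
puts claims[:name]  # John Doe
puts store.queries  # 1
</code></pre></div></div>

<p>Without the <code class="language-plaintext highlighter-rouge">||=</code>, the same three claims would hit the store three times.</p>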

<div class="language-erb highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">&lt;!-- app/views/application/_navbar.html.erb --&gt;</span>
<span class="c">&lt;!-- ... ---&gt;</span>
         <span class="nt">&lt;li</span> <span class="na">class=</span><span class="s">"nav-item"</span><span class="nt">&gt;</span>
           <span class="cp">&lt;%=</span> <span class="n">link_to</span> <span class="s2">"Posts"</span><span class="p">,</span> <span class="n">posts_path</span><span class="p">,</span> <span class="ss">class: </span><span class="s2">"nav-link"</span> <span class="cp">%&gt;</span>
         <span class="nt">&lt;/li&gt;</span>
+        <span class="cp">&lt;%</span> <span class="k">if</span> <span class="n">user_signed_in?</span> <span class="cp">%&gt;</span>
+          <span class="nt">&lt;li</span> <span class="na">class=</span><span class="s">"nav-item </span><span class="cp">&lt;%=</span> <span class="s2">"active"</span> <span class="k">unless</span> <span class="n">current_page?</span><span class="p">(</span><span class="n">rodauth</span><span class="p">.</span><span class="nf">oauth_applications_path</span><span class="p">)</span> <span class="cp">%&gt;</span><span class="s">"</span><span class="nt">&gt;</span>
+            <span class="cp">&lt;%=</span> <span class="n">link_to_unless_current</span> <span class="s2">"Client Applications"</span><span class="p">,</span> <span class="n">rodauth</span><span class="p">.</span><span class="nf">oauth_applications_path</span><span class="p">,</span> <span class="ss">class: </span><span class="s2">"nav-link"</span> <span class="cp">%&gt;</span>
+          <span class="nt">&lt;/li&gt;</span>
+        <span class="cp">&lt;%</span> <span class="k">end</span> <span class="cp">%&gt;</span>
</code></pre></div></div>

<p>Now, let’s add some seed data we can test things with, such as a test user account:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># db/seeds.rb</span>
<span class="n">user</span> <span class="o">=</span> <span class="no">User</span><span class="p">.</span><span class="nf">create!</span><span class="p">(</span><span class="ss">email: </span><span class="s2">"john.doe@example.com"</span><span class="p">,</span> <span class="ss">password: </span><span class="s2">"password"</span><span class="p">)</span>
<span class="mi">10</span><span class="p">.</span><span class="nf">times</span> <span class="k">do</span> <span class="o">|</span><span class="n">i</span><span class="o">|</span>
  <span class="no">Post</span><span class="p">.</span><span class="nf">create!</span><span class="p">(</span><span class="ss">user: </span><span class="n">user</span><span class="p">,</span> <span class="ss">title: </span><span class="s2">"Post </span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span><span class="p">,</span> <span class="ss">body: </span><span class="s2">"a story about post </span><span class="si">#{</span><span class="n">i</span><span class="si">}</span><span class="s2">"</span><span class="p">)</span>
<span class="k">end</span>
</code></pre></div></div>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> bundle <span class="nb">exec </span>rails db:seed
</code></pre></div></div>

<p>Now we should be able to start registering our first OAuth application.</p>

<p><img src="/images/using-rodauth-oauth-devise-rails/login-1.png" alt="logging-in" /></p>

<p><img src="/images/using-rodauth-oauth-devise-rails/logged-in-1.png" alt="logged-in" /></p>

<p>Ok, now let’s add a new OAuth Application.</p>

<p><img src="/images/using-rodauth-oauth-devise-rails/oauth-applications-error-1.png" alt="oauth-applications-error" /></p>

<p>And here it is: <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> couldn’t recognize that the user is logged in. This is where we’ll start tweaking the configuration.</p>

<h2 id="4-user-is-account">4. User is account</h2>

<p>The main thing to stress here is that the default configuration is tailored for <code class="language-plaintext highlighter-rouge">rodauth</code>. However, it’s highly <strong>configurable</strong>! The first step was already done, namely defining <code class="language-plaintext highlighter-rouge">accounts_table</code> as the <code class="language-plaintext highlighter-rouge">:users</code> table that <code class="language-plaintext highlighter-rouge">devise</code> writes to. Now we have to tell <code class="language-plaintext highlighter-rouge">rodauth</code> when the user is logged in, which we do by adding the following set of custom configs:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app/lib/rodauth_app.rb</span>

  <span class="n">configure</span> <span class="k">do</span>
    <span class="c1"># ... after everything else...</span>

    <span class="c1"># to tell rodauth where to redirect if user is not logged in</span>
    <span class="n">require_login_redirect</span> <span class="p">{</span> <span class="s2">"/users/sign_in"</span> <span class="p">}</span>

    <span class="c1"># reuse devise controller helper</span>
    <span class="n">logged_in?</span> <span class="p">{</span> <span class="n">rails_controller_instance</span><span class="p">.</span><span class="nf">user_signed_in?</span> <span class="p">}</span>

    <span class="c1"># tell rodauth where to get the user ID from devise's session cookie</span>
    <span class="n">session_value</span> <span class="k">do</span>
      <span class="n">rails_controller_instance</span><span class="p">.</span><span class="nf">session</span>
        <span class="p">.</span><span class="nf">fetch</span><span class="p">(</span><span class="s2">"warden.user.user.key"</span><span class="p">,</span> <span class="p">[])</span>
        <span class="p">.</span><span class="nf">dig</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">)</span> <span class="o">||</span> <span class="k">super</span><span class="p">()</span>
    <span class="k">end</span>

    <span class="c1"># used by the oidc plugin to get the "auth_time" claim</span>
    <span class="n">get_oidc_account_last_login_at</span> <span class="p">{</span> <span class="o">|</span><span class="n">user_id</span><span class="o">|</span> <span class="no">User</span><span class="p">.</span><span class="nf">find</span><span class="p">(</span><span class="n">user_id</span><span class="p">).</span><span class="nf">last_sign_in_at</span> <span class="p">}</span>
    <span class="c1"># ...</span>
</code></pre></div></div>
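<p>The <code class="language-plaintext highlighter-rouge">session_value</code> block relies on the shape of devise’s session entry. As I understand devise’s default serialization, warden stores the signed-in user under <code class="language-plaintext highlighter-rouge">"warden.user.user.key"</code> as <code class="language-plaintext highlighter-rouge">[record.to_key, record.authenticatable_salt]</code>, i.e. <code class="language-plaintext highlighter-rouge">[[id], salt]</code>, which is why <code class="language-plaintext highlighter-rouge">.dig(0, 0)</code> yields the ID. Here’s a plain-Ruby sketch of that extraction, where the session hash, the salt value and the <code class="language-plaintext highlighter-rouge">devise_user_id</code> helper are hypothetical:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper mirroring the session_value block: extract the user ID
# from devise's session entry, or return +fallback+ (standing in for
# rodauth's super()) when nobody is signed in through devise.
def devise_user_id(session, fallback = nil)
  session.fetch("warden.user.user.key", []).dig(0, 0) || fallback
end

# Hypothetical session after a devise sign-in for the user with ID 42.
signed_in = {}
signed_in["warden.user.user.key"] = [[42], "hypothetical-salt"]

puts devise_user_id(signed_in)          # 42
puts devise_user_id({}).inspect         # nil
puts devise_user_id({}, "from-rodauth") # from-rodauth
</code></pre></div></div>

<p>The empty-array default matters: without it, <code class="language-plaintext highlighter-rouge">fetch</code> would raise <code class="language-plaintext highlighter-rouge">KeyError</code> for anonymous visitors instead of falling through to the fallback.</p>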

<p>Long story short, we override a couple of configuration methods that would otherwise expect a <code class="language-plaintext highlighter-rouge">rodauth</code> session to be defined, in order to determine whether the user is logged in and which user that is, and we “route” those to <code class="language-plaintext highlighter-rouge">devise</code> entities (i.e. the <code class="language-plaintext highlighter-rouge">"warden.user.user.key"</code> session entry, which is where <code class="language-plaintext highlighter-rouge">devise</code> puts the user ID). And once we do that:</p>

<p><img src="/images/using-rodauth-oauth-devise-rails/oauth-applications-1.png" alt="oauth-applications-1" /></p>

<p>Et voilà, the applications section is unlocked. After filling in the form <a href="https://honeyryderchuck.gitlab.io/httpx/2021/03/15/oidc-provider-on-rails-using-rodauth-oauth.html">exactly as described in the previous blog post</a>, I end up with the OAuth application we’ll use for the following steps:</p>

<p><img src="/images/using-rodauth-oauth-devise-rails/oauth-application-1.png" alt="oauth-application-1" /></p>

<h2 id="5-business-as-usual">5. Business as usual</h2>

<p>Now it’s time to hook our client application. For this purpose, we’ll do the same as described in the <a href="https://honeyryderchuck.gitlab.io/httpx/2021/03/15/oidc-provider-on-rails-using-rodauth-oauth.html">previous rodauth-oauth post</a>, and <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth/-/blob/master/examples/oidc/client_application.rb">reuse the same OIDC client application</a>, a single-file single-page app listing some books, fetched via an API request authorized via the ID token.</p>

<p>The same tweaks described there are applied, and the following script is run for it:</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="o">&gt;</span> <span class="nb">export </span><span class="nv">RESOURCE_SERVER_URI</span><span class="o">=</span>http://localhost:3000/posts
<span class="o">&gt;</span> <span class="nb">export </span><span class="nv">AUTHORIZATION_SERVER_URI</span><span class="o">=</span>http://localhost:3000
<span class="o">&gt;</span> <span class="nb">export </span><span class="nv">CLIENT_ID</span><span class="o">=</span>WJ5hWI_h050Rw0Ve4834lFK2H9Z01urcXiBIs27A5lQ
<span class="o">&gt;</span> <span class="nb">export </span><span class="nv">CLIENT_SECRET</span><span class="o">=</span>owxhtwsruvcltsvhycamoqnmulvfqgdjgpdxappjgywamwnrqdkwpgdlqbonegdo
<span class="o">&gt;</span> bundle <span class="nb">exec </span>ruby scripts/client_application.rb
</code></pre></div></div>

<p><img src="/images/using-rodauth-oauth-devise-rails/client-application-1.png" alt="client-application-1" /></p>

<p>And here we go:</p>

<p><img src="/images/using-rodauth-oauth-devise-rails/authorize-1.png" alt="authorize-1" /></p>

<p><img src="/images/using-rodauth-oauth-devise-rails/authorize-error-2.png" alt="authorize-error-2" /></p>

<p>The problem here is that access to the posts controller is protected by the <code class="language-plaintext highlighter-rouge">authenticate_user!</code> before action from <code class="language-plaintext highlighter-rouge">devise</code>. After the OIDC authentication, however, requests are authenticated via the ID token, which <code class="language-plaintext highlighter-rouge">devise</code> doesn’t know about. It’s now up to you to either provide a new set of before actions, or override the existing ones. For this demo, I’m going with the latter, but bear in mind there are other ways to accomplish this.</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># app/controllers/application_controller.rb</span>
<span class="k">class</span> <span class="nc">ApplicationController</span> <span class="o">&lt;</span> <span class="no">ActionController</span><span class="o">::</span><span class="no">Base</span>
  <span class="k">def</span> <span class="nf">authenticate_user!</span>
    <span class="n">rodauth</span><span class="p">.</span><span class="nf">session_value</span> <span class="o">||</span> <span class="k">super</span>
  <span class="k">end</span>
<span class="k">end</span>

<span class="c1"># app/controllers/posts_controller.rb</span>
<span class="k">class</span> <span class="nc">PostsController</span> <span class="o">&lt;</span> <span class="no">ApplicationController</span>
  <span class="c1"># expose via authorization header with bearer token</span>
  <span class="n">before_action</span> <span class="ss">:require_read_access</span><span class="p">,</span> <span class="ss">only: </span><span class="p">[</span><span class="ss">:index</span><span class="p">,</span> <span class="ss">:show</span><span class="p">]</span>
  <span class="c1"># everything else requires a signed-in user</span>
  <span class="n">before_action</span> <span class="ss">:authenticate_user!</span><span class="p">,</span> <span class="ss">except: </span><span class="p">[</span><span class="ss">:index</span><span class="p">,</span> <span class="ss">:show</span><span class="p">]</span>

  <span class="k">def</span> <span class="nf">index</span>
    <span class="n">account</span> <span class="o">=</span> <span class="n">current_user</span> <span class="o">||</span> <span class="n">current_oauth_account</span>
    <span class="vi">@posts</span> <span class="o">=</span> <span class="n">account</span><span class="p">.</span><span class="nf">posts</span><span class="p">.</span><span class="nf">all</span>
    <span class="c1"># ...</span>
  <span class="k">end</span>

  <span class="kp">private</span>

  <span class="k">def</span> <span class="nf">require_read_access</span>
    <span class="c1"># no bearer token: fall back to the (overridden) devise check</span>
    <span class="k">return</span> <span class="n">authenticate_user!</span> <span class="k">unless</span> <span class="n">request</span><span class="p">.</span><span class="nf">authorization</span> <span class="o">&amp;&amp;</span> <span class="n">request</span><span class="p">.</span><span class="nf">authorization</span><span class="p">.</span><span class="nf">start_with?</span><span class="p">(</span><span class="s2">"Bearer"</span><span class="p">)</span>

    <span class="n">rodauth</span><span class="p">.</span><span class="nf">require_oauth_authorization</span><span class="p">(</span><span class="s2">"posts.read"</span><span class="p">)</span>
  <span class="k">end</span>
<span class="k">end</span>
</code></pre></div></div>
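<p>The <code class="language-plaintext highlighter-rouge">require_read_access</code> method above treats any <code class="language-plaintext highlighter-rouge">Authorization</code> header starting with <code class="language-plaintext highlighter-rouge">Bearer</code> as a token request. If you’d rather be a bit stricter about the header format, here’s a plain-ruby sketch of that check. <code class="language-plaintext highlighter-rouge">bearer_token</code> is a hypothetical helper (it’s not part of <code class="language-plaintext highlighter-rouge">devise</code> or <code class="language-plaintext highlighter-rouge">rodauth-oauth</code>), and the actual token validation should still be left to <code class="language-plaintext highlighter-rouge">rodauth.require_oauth_authorization</code>.</p>

```ruby
# Hypothetical helper: returns the token from an RFC 6750 "Bearer" Authorization
# header value, or nil when the header is missing, empty, or uses another scheme.
def bearer_token(authorization)
  return nil if authorization.nil?

  scheme, token = authorization.strip.split(/\s+/, 2)
  # auth schemes are case-insensitive, so "bearer" is accepted as well
  return nil unless scheme&.casecmp?("Bearer")
  return nil if token.nil? || token.empty?

  token
end

bearer_token("Bearer abc.def.ghi") # => "abc.def.ghi"
bearer_token("Basic dXNlcjpwYXNz") # => nil
bearer_token("BearerXYZ")          # => nil (a plain start_with?("Bearer") would accept it)
```

<p>In the controller, the guard clause could then test <code class="language-plaintext highlighter-rouge">bearer_token(request.authorization)</code> instead of <code class="language-plaintext highlighter-rouge">start_with?</code>.</p>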

<p>Now let’s do this again:</p>

<p><img src="/images/using-rodauth-oauth-devise-rails/authorize-1.png" alt="authorize-1" /></p>

<p><img src="/images/using-rodauth-oauth-devise-rails/authorized-1.png" alt="authorized-1" /></p>

<p>Success!</p>

<h2 id="6-conclusion">6. Conclusion</h2>

<p>As this article shows, it is possible to use <code class="language-plaintext highlighter-rouge">rodauth-oauth</code> without actually using <code class="language-plaintext highlighter-rouge">rodauth</code> for authentication, with a few configuration tweaks. <code class="language-plaintext highlighter-rouge">devise</code> was used for demonstration purposes, but the same lessons apply to any other authentication library (<code class="language-plaintext highlighter-rouge">sorcery</code>, <code class="language-plaintext highlighter-rouge">warden-rails</code>, plain <code class="language-plaintext highlighter-rouge">warden</code>…).</p>

<p>It’s now up to the user to decide whether these tweaks are worth it, compared to the alternative frameworks for OAuth or OIDC.</p>

<p>And who knows, maybe you’ll like <code class="language-plaintext highlighter-rouge">rodauth</code>’s approach so much so that you’ll start migrating your authentication system to it :) .</p>

<p>You can find the demo app under <a href="https://gitlab.com/honeyryderchuck/rodauth-oauth-devise-demo">this gitlab repository</a>.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[I’ve written before about rodauth-oauth and how to use it to make an OAuth2 or OpenID Connect provider out of a rails application, where I built a rails demo app based on Janko Marohnić’s rodauth-rails demo app as a workable tutorial. It shows well what rodauth accomplishes, how integrating it in a rails app became significantly simpler thanks to rodauth-rails, and how one can build an OAuth/OIDC provider using rodauth-oauth on top of that.]]></summary></entry><entry><title type="html">Tensorflow Serving with Ruby</title><link href="honeyryderchuck.gitlab.io/2021/08/26/tensorflow-serving-with-ruby.html" rel="alternate" type="text/html" title="Tensorflow Serving with Ruby" /><published>2021-08-26T00:00:00+00:00</published><updated>2021-08-26T00:00:00+00:00</updated><id>honeyryderchuck.gitlab.io/2021/08/26/tensorflow-serving-with-ruby</id><content type="html" xml:base="honeyryderchuck.gitlab.io/2021/08/26/tensorflow-serving-with-ruby.html"><![CDATA[<p>The <a href="https://www.tensorflow.org/">Tensorflow framework</a> is the most used framework when it comes to developing, training and deploying Machine Learning models. It ships with first-class API support for <code class="language-plaintext highlighter-rouge">python</code> and <code class="language-plaintext highlighter-rouge">C++</code>, the former being a favourite of most data scientists, which explains the pervasiveness of <code class="language-plaintext highlighter-rouge">python</code> in virtually all of the companies relying on ML for their products.</p>

<p>When it comes to deploying ML-based web services, there are two options. The first one is to develop a <code class="language-plaintext highlighter-rouge">python</code> web service, using something like <code class="language-plaintext highlighter-rouge">flask</code> or <code class="language-plaintext highlighter-rouge">django</code>, add <code class="language-plaintext highlighter-rouge">tensorflow</code> as a dependency, and run the model from within it. This approach is straightforward, but it comes with its own set of problems. Model upgrades have to be rolled out to each application using the model. Even ensuring that the same <code class="language-plaintext highlighter-rouge">tensorflow</code> version is used everywhere tends to be difficult: it’s a pretty heavy dependency, which often conflicts with other libraries in the python ecosystem, and is frequently the subject of CVEs. All of this introduces risk in the long run.</p>

<p>The other approach is to deploy the models using <a href="https://www.tensorflow.org/tfx/guide/serving">Tensorflow Serving</a> (<a href="https://pytorch.org/serve/inference_api.html">pytorch has something similar, torchserve</a>). In short, it exposes the execution of ML models over the network “as a service”. It supports model versioning, and can be interfaced with via gRPC or a REST API, which solves the main integration issues of the previously described approach. It thus compartmentalizes the risks of that approach, while also making it possible to throw dedicated hardware at it.</p>

<p>It also allows you to ditch <code class="language-plaintext highlighter-rouge">python</code> when building applications.</p>

<h3 id="research-and-development">Research and Development</h3>

<p>Now, I’m not a <code class="language-plaintext highlighter-rouge">python</code> hater. It’s an accessible programming language. It shares a lot of benefits and drawbacks with <code class="language-plaintext highlighter-rouge">ruby</code>. But by the time a company decides to invest in ML to improve their product, the tech team might already be heavily familiar with a different tech stack. Maybe it’s <code class="language-plaintext highlighter-rouge">ruby</code>, maybe <code class="language-plaintext highlighter-rouge">java</code>, maybe <code class="language-plaintext highlighter-rouge">go</code>. It’s unreasonable to replace all of them with <code class="language-plaintext highlighter-rouge">python</code> experts. It’s possible to ask them to use a bit of <code class="language-plaintext highlighter-rouge">python</code>, but that comes at the cost of learning a new stack (thereby decreasing quality of delivery) and alienating the employees (thereby increasing turnover).</p>

<p>It’s also unreasonable to ask the new data science team not to use their preferred <code class="language-plaintext highlighter-rouge">python</code> tech stack. It’s the ML <em>lingua franca</em>, and many more years of investment and resources have been poured into libraries like <a href="https://numpy.org/">numpy</a> or <a href="https://scikit-learn.org/stable/index.html">scikit</a>. And although there’s definitely value in improving the state of ML in your preferred languages (shout-out to the <a href="http://sciruby.com/">SciRuby</a> folks) and in reducing the industry’s overall dependency on <code class="language-plaintext highlighter-rouge">python</code>, that should not come at the cost of decreasing the quality of your product.</p>

<p>Therefore, <code class="language-plaintext highlighter-rouge">tensorflow-serving</code> allows the tech team to focus on developing and shipping the best possible product, and the research team to focus on developing the best possible  models. Everyone’s productive and happy.</p>

<h3 id="tensorflow-serving-with-json">Tensorflow Serving with JSON</h3>

<p>As stated above, <code class="language-plaintext highlighter-rouge">tensorflow serving</code> exposes its services via <code class="language-plaintext highlighter-rouge">gRPC</code> and a REST API. If you haven’t used <code class="language-plaintext highlighter-rouge">gRPC</code> before, you’ll probably favour the latter: you’ve written HTTP JSON clients for other APIs before, so how hard can it be to write one for this?</p>

<p>While certainly possible, going this route comes at a cost: besides making the HTTP layer work reliably (persistent connections, timeouts, etc.), there’s the cost of JSON.</p>
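<p>For comparison, here’s roughly what the REST route looks like with nothing but the ruby standard library. It’s a sketch under a few assumptions: the REST API is exposed on port 8501 (as in the official <code class="language-plaintext highlighter-rouge">tensorflow/serving</code> docker image), the model is called <code class="language-plaintext highlighter-rouge">mnist</code> and takes a 784-wide float array, and <code class="language-plaintext highlighter-rouge">build_rest_predict_request</code> is a hypothetical helper. It only builds the request; sending it reliably is precisely the part that needs the extra care mentioned above.</p>

```ruby
require "json"
require "net/http"
require "uri"

# Hypothetical helper: builds a TensorFlow Serving REST "predict" request.
# Uses the row format: one entry in "instances" per input example.
def build_rest_predict_request(host, model, instances)
  uri = URI("http://#{host}/v1/models/#{model}:predict")
  request = Net::HTTP::Post.new(uri)
  request["Content-Type"] = "application/json"
  request.body = JSON.generate({ "instances" => instances })
  [uri, request]
end

uri, request = build_rest_predict_request("localhost:8501", "mnist", [[0.0] * 784])

# Sending it (timeouts, connection reuse, retries...) is still up to you, e.g.:
# Net::HTTP.start(uri.host, uri.port, open_timeout: 1, read_timeout: 5) { |http| http.request(request) }
```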

<p><code class="language-plaintext highlighter-rouge">tensorflow</code> (like other ML frameworks in general) makes heavy use of “tensors”, multi-dimensional arrays of a single type (vectors, matrices…), describing, for example, the coordinates of a face recognized in an image. These tensors are laid out in memory as contiguous arrays, and can therefore easily be serialized into a bytestream. Libraries like <code class="language-plaintext highlighter-rouge">numpy</code> (or <code class="language-plaintext highlighter-rouge">numo</code> in ruby) take advantage of this memory layout to provide high-performance mathematical and logical operations.</p>
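<p>To illustrate the contiguous layout without any extra gem, the sketch below serializes a flat float array with core ruby’s <code class="language-plaintext highlighter-rouge">Array#pack</code>. The little-endian float32 layout (<code class="language-plaintext highlighter-rouge">"e*"</code>) is an assumption matching common x86 and ARM hosts; with <code class="language-plaintext highlighter-rouge">numo</code>, <code class="language-plaintext highlighter-rouge">Numo::SFloat#to_binary</code> plays the same role.</p>

```ruby
# A flat 784-element float tensor (think of one flattened 28x28 image).
# Multiples of 0.25 are exactly representable as float32, so the round trip is lossless.
values = Array.new(784) { |i| i / 4.0 }

# "e*" packs every element as a little-endian single-precision float:
# one contiguous bytestream, 4 bytes per element, no separators.
bytes = values.pack("e*")
bytes.bytesize # => 3136 (784 * 4)

# unpacking reads the same layout back
bytes.unpack("e*") == values # => true
```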

<p>JSON is a UTF-8 text format, and can’t carry raw byte streams; in order to send and receive them through the REST API, you’ll have to convert to and from base64. Besides the CPU overhead of these conversions, you should expect a ~33% increase in the transmitted payload, since base64 encodes every 3 bytes as 4 characters.</p>
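<p>The ~33% figure is easy to check with core ruby. In the sketch below, <code class="language-plaintext highlighter-rouge">pack("m0")</code> is strict base64 (no line breaks), equivalent to <code class="language-plaintext highlighter-rouge">Base64.strict_encode64</code>, and the payload is random bytes standing in for a serialized tensor. TensorFlow Serving’s REST API expects binary values wrapped in a JSON object like <code class="language-plaintext highlighter-rouge">{"b64": "..."}</code>, so the real payload is slightly bigger still.</p>

```ruby
# 3000 random bytes standing in for a serialized tensor
raw = Random.new(42).bytes(3_000)

# strict base64: every 3 input bytes become 4 ASCII characters
encoded = [raw].pack("m0")

overhead = (encoded.bytesize - raw.bytesize) * 100.0 / raw.bytesize
puts "#{raw.bytesize} bytes -> #{encoded.bytesize} bytes (+#{overhead.round(1)}%)"
# prints "3000 bytes -> 4000 bytes (+33.3%)"

# decoding it back costs another pass over the data on the receiving side
decoded = encoded.unpack1("m0")
```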

<p>The <code class="language-plaintext highlighter-rouge">tensorflow-serving</code> REST API proxies to the <code class="language-plaintext highlighter-rouge">gRPC</code> layer, so there’s also this extra level of indirection to account for.</p>

<p><code class="language-plaintext highlighter-rouge">gRPC</code> doesn’t suffer from these drawbacks: it runs on top of <code class="language-plaintext highlighter-rouge">HTTP/2</code>, which not only improves connectivity but also provides multiplexing and streaming, and it uses <code class="language-plaintext highlighter-rouge">protobufs</code>, a typed message serialization protocol which supports byte streams.</p>
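<p>Here’s what “supports byte streams” means on the wire: a serialized protobuf message is sent as-is, behind a 5-byte prefix (a compressed flag and a big-endian length), as described in the gRPC over HTTP/2 protocol specification. The sketch below frames and unframes a message by hand; the helper names are hypothetical, and in practice the <code class="language-plaintext highlighter-rouge">grpc</code> gem (or <code class="language-plaintext highlighter-rouge">httpx</code>’s grpc plugin) does this for you.</p>

```ruby
# gRPC length-prefixed message:
#   1 byte   compressed flag (0 = uncompressed)
#   4 bytes  message length, big-endian unsigned
#   N bytes  the serialized protobuf message, raw (no base64)
def grpc_frame(message, compressed: false)
  [compressed ? 1 : 0, message.bytesize].pack("CN") + message.b
end

def grpc_unframe(data)
  flag, length = data.unpack("CN")
  [flag == 1, data.byteslice(5, length)]
end

# stand-in for an encoded protobuf message (field 1, length 5, "mnist")
payload = "\x0A\x05mnist".b
frame = grpc_frame(payload)
frame.bytesize                          # => 12 (5-byte prefix + 7-byte message)
grpc_unframe(frame) == [false, payload] # => true
```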

<p>How can it be used in <code class="language-plaintext highlighter-rouge">ruby</code> then?</p>

<h3 id="tensorflow-serving-with-protobufs">Tensorflow Serving with Protobufs</h3>

<p>Tensorflow Serving calls are performed using a standardized set of common protobufs, whose <code class="language-plaintext highlighter-rouge">.proto</code> definitions can be found both in the <a href="https://github.com/tensorflow/tensorflow">tensorflow</a> repo and in the <a href="https://github.com/tensorflow/serving">tensorflow-serving</a> repo. The most important one for our case is <a href="https://github.com/tensorflow/serving/blob/master/tensorflow_serving/apis/prediction_service.proto">prediction_service.proto</a>, which defines the prediction service and, through its imports, the request and response protobufs declaring which model version to run, and how input and output tensors are laid out.</p>

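<p>Both projects already package the <code class="language-plaintext highlighter-rouge">python</code> protobufs. To use them in <code class="language-plaintext highlighter-rouge">ruby</code>, you have to compile them yourself, for example with <code class="language-plaintext highlighter-rouge">grpc_tools_ruby_protoc</code> from the <code class="language-plaintext highlighter-rouge">grpc-tools</code> gem, which generates code for the <code class="language-plaintext highlighter-rouge">google-protobuf</code> and <code class="language-plaintext highlighter-rouge">grpc</code> gems. For this particular case, compiling can be a pretty involved process, which looks like this:</p>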

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># gem install grpc-tools</span>

<span class="nv">TF_VERSION</span><span class="o">=</span><span class="s2">"2.5.0"</span>
<span class="nv">TF_SERVING_VERSION</span><span class="o">=</span><span class="s2">"2.5.1"</span>
<span class="nv">PROTO_PATH</span><span class="o">=</span>path/to/protos
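<span class="nb">set</span> <span class="nt">-euo</span> pipefail
<span class="nb">mkdir</span> <span class="nt">-p</span> <span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>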

curl <span class="nt">-L</span> <span class="nt">-o</span> tensorflow.zip https://github.com/tensorflow/tensorflow/archive/v<span class="nv">$TF_VERSION</span>.zip
unzip tensorflow.zip <span class="o">&amp;&amp;</span> <span class="nb">rm </span>tensorflow.zip
<span class="nb">mv </span>tensorflow-<span class="nv">$TF_VERSION</span> <span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow

curl <span class="nt">-L</span> <span class="nt">-o</span> tf-serving.zip https://github.com/tensorflow/serving/archive/<span class="nv">$TF_SERVING_VERSION</span>.zip
unzip tf-serving.zip <span class="o">&amp;&amp;</span> <span class="nb">rm </span>tf-serving.zip
<span class="nb">mv </span>serving-<span class="nv">$TF_SERVING_VERSION</span>/tensorflow_serving <span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow


<span class="nv">TF_SERVING_PROTO</span><span class="o">=</span><span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/ruby
<span class="nb">mkdir</span> <span class="nt">-p</span> <span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span>

<span class="c"># the .proto files are positional arguments; --proto_path is the root imports are resolved from</span>
grpc_tools_ruby_protoc <span class="se">\</span>
    <span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow/tensorflow/core/framework/<span class="k">*</span>.proto <span class="se">\</span>
    <span class="nt">--ruby_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--grpc_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--proto_path</span><span class="o">=</span><span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow

grpc_tools_ruby_protoc <span class="se">\</span>
    <span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow/tensorflow/core/example/<span class="k">*</span>.proto <span class="se">\</span>
    <span class="nt">--ruby_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--grpc_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--proto_path</span><span class="o">=</span><span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow

grpc_tools_ruby_protoc <span class="se">\</span>
    <span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow/tensorflow/core/protobuf/<span class="k">*</span>.proto <span class="se">\</span>
    <span class="nt">--ruby_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--grpc_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--proto_path</span><span class="o">=</span><span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow

grpc_tools_ruby_protoc <span class="se">\</span>
    <span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow/tensorflow_serving/apis/<span class="k">*</span>.proto <span class="se">\</span>
    <span class="nt">--ruby_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--grpc_out</span><span class="o">=</span><span class="k">${</span><span class="nv">TF_SERVING_PROTO</span><span class="k">}</span> <span class="se">\</span>
    <span class="nt">--proto_path</span><span class="o">=</span><span class="k">${</span><span class="nv">PROTO_PATH</span><span class="k">}</span>/tensorflow

<span class="nb">ls</span> <span class="nv">$TF_SERVING_PROTO</span>
</code></pre></div></div>

<p><strong>NOTE</strong>: There’s also the <a href="https://github.com/nubbel/tensorflow_serving_client-ruby">tensorflow-serving-client</a> gem, which already ships with the necessary <code class="language-plaintext highlighter-rouge">ruby</code> protobufs; however, it hasn’t been updated in more than 5 years, so I can’t vouch for its state of maintenance. If you want to use this in production, make sure you generate the ruby stubs from the latest version of the definitions.</p>

<p>Once the protobufs are available, creating a <code class="language-plaintext highlighter-rouge">PredictRequest</code> is simple. Here’s how you’d encode a request to a model called <code class="language-plaintext highlighter-rouge">mnist</code>, taking a 784-wide float array as input:</p>

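<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># the generated files require each other by path, so their root must be in the load path</span>
<span class="vg">$LOAD_PATH</span><span class="p">.</span><span class="nf">unshift</span><span class="p">(</span><span class="s2">"path/to/protos/ruby"</span><span class="p">)</span>
<span class="nb">require</span> <span class="s2">"tensorflow_serving/apis/prediction_service_pb"</span>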

<span class="n">tensor</span> <span class="o">=</span> <span class="p">[</span><span class="mf">0.0</span><span class="p">]</span> <span class="o">*</span> <span class="mi">784</span>

<span class="n">request</span> <span class="o">=</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">Serving</span><span class="o">::</span><span class="no">PredictRequest</span><span class="p">.</span><span class="nf">new</span>
<span class="n">request</span><span class="p">.</span><span class="nf">model_spec</span> <span class="o">=</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">Serving</span><span class="o">::</span><span class="no">ModelSpec</span><span class="p">.</span><span class="nf">new</span> <span class="ss">name: </span><span class="s1">'mnist'</span>
<span class="n">request</span><span class="p">.</span><span class="nf">inputs</span><span class="p">[</span><span class="s1">'images'</span><span class="p">]</span> <span class="o">=</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">TensorProto</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
  <span class="ss">float_val: </span><span class="n">tensor</span><span class="p">,</span>
  <span class="ss">tensor_shape: </span><span class="no">Tensorflow</span><span class="o">::</span><span class="no">TensorShapeProto</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
    <span class="ss">dim: </span><span class="p">[</span>
      <span class="no">Tensorflow</span><span class="o">::</span><span class="no">TensorShapeProto</span><span class="o">::</span><span class="no">Dim</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">size: </span><span class="mi">1</span><span class="p">),</span>
      <span class="no">Tensorflow</span><span class="o">::</span><span class="no">TensorShapeProto</span><span class="o">::</span><span class="no">Dim</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">size: </span><span class="mi">784</span><span class="p">)</span>
    <span class="p">]</span>
  <span class="p">),</span>
  <span class="ss">dtype: </span><span class="no">Tensorflow</span><span class="o">::</span><span class="no">DataType</span><span class="o">::</span><span class="no">DT_FLOAT</span>
<span class="p">)</span>
</code></pre></div></div>

<p><strong>NOTE</strong>: <code class="language-plaintext highlighter-rouge">tensorflow</code> python API ships with a very useful function called <a href="https://www.tensorflow.org/api_docs/python/tf/make_tensor_proto">make_tensor_proto</a>, which could do the above as a “one-liner”. While it’s certainly possible to code a similar function in <code class="language-plaintext highlighter-rouge">ruby</code>, it’s a pretty involved process which is beyond the scope of this post.</p>

<p>This example is easy to grasp. In production, however, we’ll have to deal with much larger tensors, which get heavier and slower to handle as plain <code class="language-plaintext highlighter-rouge">ruby</code> arrays.</p>
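<p>Going back to <code class="language-plaintext highlighter-rouge">make_tensor_proto</code>: a full port is indeed out of scope, but the narrow case of a (possibly nested) ruby array of floats only needs shape inference and flattening. The following is a hypothetical helper, not a port of the python function: it only handles <code class="language-plaintext highlighter-rouge">DT_FLOAT</code>, and it returns plain attributes, assuming the <code class="language-plaintext highlighter-rouge">google-protobuf</code> gem accepts nested hashes and enum symbols when initializing messages.</p>

```ruby
# Hypothetical helper: builds TensorProto attributes for a nested array of floats.
def tensor_proto_attributes(nested)
  # walk down the first element of each level to infer the shape, e.g. [1, 784]
  shape = []
  level = nested
  while level.is_a?(Array)
    shape << level.size
    level = level.first
  end

  values = Array(nested).flatten
  expected = shape.reduce(1, :*)
  # a cheap size check: catches most (not all) ragged arrays
  unless values.size == expected
    raise ArgumentError, "expected #{expected} values for shape #{shape.inspect}, got #{values.size}"
  end

  {
    dtype: :DT_FLOAT,
    tensor_shape: { dim: shape.map { |size| { size: size } } },
    float_val: values.map(&:to_f)
  }
end

attrs = tensor_proto_attributes([[0.0] * 784])
attrs[:tensor_shape][:dim].map { |d| d[:size] } # => [1, 784]

# with the generated protobufs loaded, this should be usable as:
#   request.inputs["images"] = Tensorflow::TensorProto.new(attrs)
```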

<h3 id="tensorflow-serving-with-numo-and-grpc">Tensorflow Serving with Numo and GRPC</h3>

<p>In <code class="language-plaintext highlighter-rouge">python</code>, the standard for using n-dimensional arrays is <a href="https://numpy.org/">numpy</a>. <code class="language-plaintext highlighter-rouge">ruby</code> has a similar library called <a href="https://github.com/ruby-numo/numo">numo</a>.</p>

<p>It aims at providing the same APIs as <code class="language-plaintext highlighter-rouge">numpy</code>, which is mostly an aspirational goal, as keeping up with <code class="language-plaintext highlighter-rouge">numpy</code> is hard (progress can be tracked <a href="https://github.com/ruby-numo/numo-narray/wiki/Numo-vs-numpy">here</a>).</p>

<p>A lot can be done already though, such as <a href="https://github.com/yoshoku/magro">image processing</a>. If our model requires an image, this is how it can be done in <code class="language-plaintext highlighter-rouge">python</code>:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># using numpy
</span><span class="kn">import</span> <span class="n">grpc</span>
<span class="kn">import</span> <span class="n">numpy</span> <span class="k">as</span> <span class="n">np</span>
<span class="kn">from</span> <span class="n">PIL</span> <span class="kn">import</span> <span class="n">Image</span>
<span class="kn">import</span> <span class="n">tensorflow</span> <span class="k">as</span> <span class="n">tf</span>
<span class="kn">from</span> <span class="n">tensorflow_serving.apis</span> <span class="kn">import</span> <span class="n">predict_pb2</span><span class="p">,</span> <span class="n">prediction_service_pb2_grpc</span>

<span class="n">img</span> <span class="o">=</span> <span class="n">Image</span><span class="p">.</span><span class="nf">open</span><span class="p">(</span><span class="sh">'</span><span class="s">test-image.png</span><span class="sh">'</span><span class="p">)</span>
<span class="n">tensor</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="nf">asarray</span><span class="p">(</span><span class="n">img</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">.</span><span class="n">shape</span> <span class="c1">#=&gt; [512,512,3]
</span>

<span class="n">request</span> <span class="o">=</span> <span class="n">predict_pb2</span><span class="p">.</span><span class="nc">PredictRequest</span><span class="p">()</span>
<span class="n">request</span><span class="p">.</span><span class="n">model_spec</span><span class="p">.</span><span class="n">name</span> <span class="o">=</span> <span class="sh">"</span><span class="s">mnist</span><span class="sh">"</span>
<span class="n">request</span><span class="p">.</span><span class="n">inputs</span><span class="p">[</span><span class="sh">'</span><span class="s">images</span><span class="sh">'</span><span class="p">].</span><span class="nc">CopyFrom</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="nf">make_tensor_proto</span><span class="p">(</span><span class="n">tensor</span><span class="p">))</span>


<span class="n">stub</span> <span class="o">=</span> <span class="n">prediction_service_pb2_grpc</span><span class="p">.</span><span class="nc">PredictionServiceStub</span><span class="p">(</span><span class="n">grpc</span><span class="p">.</span><span class="nf">insecure_channel</span><span class="p">(</span><span class="sh">"</span><span class="s">localhost:9000</span><span class="sh">"</span><span class="p">))</span>
<span class="n">response</span> <span class="o">=</span> <span class="n">stub</span><span class="p">.</span><span class="nc">Predict</span><span class="p">(</span><span class="n">request</span><span class="p">)</span>
<span class="nf">print</span><span class="p">(</span><span class="n">response</span><span class="p">.</span><span class="n">outputs</span><span class="p">)</span>
</code></pre></div></div>

<p>And this is the equivalent <code class="language-plaintext highlighter-rouge">ruby</code> code:</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"grpc"</span>
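<span class="vg">$LOAD_PATH</span><span class="p">.</span><span class="nf">unshift</span><span class="p">(</span><span class="s2">"path/to/protos/ruby"</span><span class="p">)</span>
<span class="c1"># *_services_pb defines the PredictionService stub (and loads the messages)</span>
<span class="nb">require</span> <span class="s2">"tensorflow_serving/apis/prediction_service_services_pb"</span>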

<span class="c1"># magro reads images to numo arrays</span>
<span class="nb">require</span> <span class="s2">"magro"</span>


<span class="k">def</span> <span class="nf">build_predict_request</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span>
  <span class="n">request</span> <span class="o">=</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">Serving</span><span class="o">::</span><span class="no">PredictRequest</span><span class="p">.</span><span class="nf">new</span>
  <span class="n">request</span><span class="p">.</span><span class="nf">model_spec</span> <span class="o">=</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">Serving</span><span class="o">::</span><span class="no">ModelSpec</span><span class="p">.</span><span class="nf">new</span> <span class="ss">name: </span><span class="s1">'mnist'</span>
  <span class="n">request</span><span class="p">.</span><span class="nf">inputs</span><span class="p">[</span><span class="s1">'images'</span><span class="p">]</span> <span class="o">=</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">TensorProto</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
    <span class="ss">tensor_content: </span><span class="n">tensor</span><span class="p">.</span><span class="nf">to_binary</span><span class="p">,</span>
    <span class="ss">tensor_shape: </span><span class="no">Tensorflow</span><span class="o">::</span><span class="no">TensorShapeProto</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span>
      <span class="ss">dim: </span><span class="n">tensor</span><span class="p">.</span><span class="nf">shape</span><span class="p">.</span><span class="nf">map</span><span class="p">{</span> <span class="o">|</span><span class="n">size</span><span class="o">|</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">TensorShapeProto</span><span class="o">::</span><span class="no">Dim</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="ss">size: </span><span class="n">size</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">),</span>
    <span class="ss">dtype: </span><span class="no">Tensorflow</span><span class="o">::</span><span class="no">DataType</span><span class="o">::</span><span class="no">DT_UINT8</span>
  <span class="p">)</span>
<span class="k">end</span>

<span class="n">tensor</span> <span class="o">=</span> <span class="no">Magro</span><span class="o">::</span><span class="no">IO</span><span class="p">.</span><span class="nf">imread</span><span class="p">(</span><span class="s2">"test-image.png"</span><span class="p">)</span>
<span class="n">tensor</span><span class="p">.</span><span class="nf">shape</span> <span class="c1">#=&gt; [512,512,3]</span>

<span class="c1"># using tensorflow-serving-client example</span>
<span class="n">stub</span> <span class="o">=</span> <span class="no">Tensorflow</span><span class="o">::</span><span class="no">Serving</span><span class="o">::</span><span class="no">PredictionService</span><span class="o">::</span><span class="no">Stub</span><span class="p">.</span><span class="nf">new</span><span class="p">(</span><span class="s1">'localhost:9000'</span><span class="p">,</span> <span class="ss">:this_channel_is_insecure</span><span class="p">)</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">stub</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span> <span class="n">build_predict_request</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span> <span class="p">)</span>
<span class="nb">puts</span> <span class="n">res</span><span class="p">.</span><span class="nf">outputs</span> <span class="c1"># map of output name =&gt; Tensorflow::TensorProto</span>
</code></pre></div></div>

<p>That’s it!</p>
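<p>One thing to watch out for when sending raw bytes via <code class="language-plaintext highlighter-rouge">tensor_content</code>: the bytes carry no shape information, so their length must match the declared shape and dtype exactly. The 512x512x3 image above is 786,432 bytes as <code class="language-plaintext highlighter-rouge">DT_UINT8</code>, but would need 4 times as many as <code class="language-plaintext highlighter-rouge">DT_FLOAT</code>. A cheap client-side check could look like this; the helper and its small dtype table are hypothetical and cover just a few dtypes.</p>

```ruby
# Bytes per element for a few common dtypes (hypothetical lookup table).
DTYPE_ITEMSIZE = { DT_UINT8: 1, DT_INT32: 4, DT_FLOAT: 4, DT_DOUBLE: 8 }.freeze

# Hypothetical helper: raises if raw tensor bytes don't match shape and dtype.
def check_tensor_content!(content, shape, dtype)
  expected = shape.reduce(1, :*) * DTYPE_ITEMSIZE.fetch(dtype)
  return true if content.bytesize == expected

  raise ArgumentError, "#{dtype} tensor of shape #{shape.inspect} needs #{expected} bytes, got #{content.bytesize}"
end

# a 512x512 RGB image as uint8 is 512 * 512 * 3 = 786_432 bytes
check_tensor_content!("\x00".b * 786_432, [512, 512, 3], :DT_UINT8) # => true
```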

<h3 id="grpc-over-httpx">GRPC over HTTPX</h3>

<p><a href="https://honeyryderchuck.gitlab.io/httpx/wiki/GRPC">httpx ships with a grpc plugin</a>. This being a blog mostly about <code class="language-plaintext highlighter-rouge">httpx</code>, it’s only fitting I show how to do the above using it :) .</p>

<div class="language-ruby highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">require</span> <span class="s2">"httpx"</span>
<span class="nb">require</span> <span class="s2">"magro"</span>
<span class="vg">$LOAD_PATH</span><span class="p">.</span><span class="nf">unshift</span><span class="p">(</span><span class="s2">"path/to/protos/ruby"</span><span class="p">)</span>
<span class="nb">require</span> <span class="s2">"tensorflow_serving/apis/prediction_service_services_pb"</span>

<span class="c1"># ... same as above ...</span>

<span class="n">stub</span> <span class="o">=</span> <span class="no">HTTPX</span><span class="p">.</span><span class="nf">plugin</span><span class="p">(</span><span class="ss">:grpc</span><span class="p">).</span><span class="nf">build_stub</span><span class="p">(</span><span class="s2">"localhost:9000"</span><span class="p">,</span> <span class="ss">service: </span><span class="no">Tensorflow</span><span class="o">::</span><span class="no">Serving</span><span class="o">::</span><span class="no">PredictionService</span><span class="o">::</span><span class="no">Service</span><span class="p">)</span>
<span class="n">res</span> <span class="o">=</span> <span class="n">stub</span><span class="p">.</span><span class="nf">predict</span><span class="p">(</span> <span class="n">build_predict_request</span><span class="p">(</span><span class="n">tensor</span><span class="p">)</span> <span class="p">)</span>
<span class="nb">puts</span> <span class="n">res</span><span class="p">.</span><span class="nf">outputs</span> <span class="c1"># map of output name =&gt; Tensorflow::TensorProto</span>
</code></pre></div></div>
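<p>Whichever client you pick, the response’s <code class="language-plaintext highlighter-rouge">outputs</code> is a map from output name to <code class="language-plaintext highlighter-rouge">Tensorflow::TensorProto</code>, so you still need to turn those back into ruby values. Here’s a sketch for float outputs; the helper and the <code class="language-plaintext highlighter-rouge">scores</code> output name are hypothetical (check your model’s signature), and it assumes little-endian float32 when the values arrive packed in <code class="language-plaintext highlighter-rouge">tensor_content</code> instead of <code class="language-plaintext highlighter-rouge">float_val</code>.</p>

```ruby
# Hypothetical helper: turns a float32 output tensor into a flat ruby array.
# Values may come either as repeated float_val, or packed into tensor_content.
def decode_float_output(float_val:, tensor_content:)
  values = float_val.to_a
  return values unless values.empty?

  tensor_content.unpack("e*")
end

# usage with the response above:
#   out = res.outputs["scores"]
#   decode_float_output(float_val: out.float_val, tensor_content: out.tensor_content)
decode_float_output(float_val: [], tensor_content: [0.5, 0.25].pack("e*")) # => [0.5, 0.25]
```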

<h3 id="conclusion">Conclusion</h3>

<p>Hopefully this has sparked enough interest in the <code class="language-plaintext highlighter-rouge">ruby</code> ML toolchain for you to investigate further. Who knows, maybe you can even teach your researcher friends about it. Still, the ML industry won’t move away from <code class="language-plaintext highlighter-rouge">python</code> any time soon, so at least you now know a bit more about how you can keep using <code class="language-plaintext highlighter-rouge">ruby</code> to build your services, while interfacing remotely, over the gRPC protocol, with ML models running on dedicated hardware.</p>]]></content><author><name></name></author><summary type="html"><![CDATA[The Tensorflow framework is the most used framework when it comes to developing, training and deploying Machine Learning models. It ships with first-class API support for python and C++, the former being a favourite of most data scientists, which explains the pervasiveness of python in virtually all of the companies relying on ML for their products.]]></summary></entry></feed>