Tensorflow Serving with Ruby

The Tensorflow framework is the most widely used framework for developing, training and deploying Machine Learning models. It ships with first-class API support for python and C++, the former being a favourite of most data scientists, which explains the pervasiveness of python in virtually all of the companies relying on ML for their products.

When it comes to deploying ML-based web services, there are two options. The first one is to develop a python web service, using something like flask or django, add tensorflow as a dependency, and run the model from within it. This approach is straightforward, but it comes with its own set of problems: rolling out model upgrades has to be done for each application embedding them, and even ensuring that the same tensorflow library version is used everywhere tends to be difficult; it’s a pretty heavy dependency, it often conflicts with other libraries in the python ecosystem, and it’s frequently the subject of CVEs. All of this introduces risk in the long run.

The other approach is to deploy the models using Tensorflow Serving (pytorch has something similar, torchserve). In short, it exposes the execution of the ML models over the network “as a service”. It supports model versioning, and can be interfaced with via gRPC or a REST API, which solves the main integration issues of the previously described approach. It thus compartmentalizes the risks of embedding the runtime in every application, while also opening up the possibility of throwing dedicated hardware at it.

It also allows you to ditch python when building applications.

Research and Development

Now, I’m not a python hater. It’s an accessible programming language. It shares a lot of benefits and drawbacks with ruby. But by the time a company decides to invest in ML to improve its product, the tech team might already be heavily invested in a different tech stack. Maybe it’s ruby, maybe java, maybe go. It’s unreasonable to replace all of them with python experts. It’s possible to ask them to use a bit of python, but that comes at the cost of learning a new stack (thereby decreasing quality of delivery) and alienating the employees (thereby increasing turnover).

It’s also unreasonable to ask the new data science team not to use their preferred python tech stack. It’s the ML lingua franca, and there are many more years of investment and resources poured into libraries like numpy or scikit-learn. And although there’s definitely value in improving the state of ML in your preferred languages (shout out to the SciRuby folks) and diminishing the overall industry dependency on python, that should not come at the cost of decreasing the quality of your product.

Therefore, tensorflow-serving allows the tech team to focus on developing and shipping the best possible product, and the research team to focus on developing the best possible models. Everyone’s productive and happy.

Tensorflow Serving with JSON

As stated above, tensorflow serving services are exposed via gRPC and REST APIs. If you haven’t used gRPC before, you’ll probably favour the latter; you’ve written HTTP JSON clients for other APIs before, how hard can it be to build one for this?

While certainly possible, going this route comes at a cost: besides ensuring that the HTTP layer works reliably (persistent connections, timeouts, etc.), there’s the cost of JSON.

tensorflow (and other ML frameworks in general) makes heavy use of “tensors”, multi-dimensional same-type arrays (vectors, matrices…), describing, for example, the coordinates of a face recognized in an image. These tensors are represented in memory as contiguous arrays, and can therefore easily be serialized into a bytestream. Libraries like numpy (or numo in ruby) take advantage of this memory layout to provide high-performance mathematical and logical operations.
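
To make that more concrete, here’s a small sketch (assuming the numo-narray gem is available) of how a numo tensor maps to a flat byte string and back:

require "numo/narray"

# a 2x3 matrix of 32-bit floats, stored contiguously in memory
matrix = Numo::SFloat[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# serialize the underlying buffer into a byte string (6 floats * 4 bytes)
bytes = matrix.to_binary
bytes.bytesize #=> 24

# rebuild the same tensor from the bytes plus the known shape
restored = Numo::SFloat.from_binary(bytes).reshape(2, 3)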

JSON is UTF-8 text, and can’t encode raw byte streams; in order to send and receive them using the REST API interface, you’ll have to convert to and from Base64 notation. This means that, besides the CPU usage overhead of these conversions, you should expect a ~33% increase in the transmitted payload.
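
For illustration, this is roughly what a predict call over the REST API could look like; the model name (“mnist”), the input name (“images”), the default REST port (8501) and a model accepting raw image bytes are all assumptions for the sake of the example. Note the Base64 wrapping the binary payload has to go through:

require "net/http"
require "json"
require "base64"

# hypothetical model "mnist" served on the default REST port (8501)
uri = URI("http://localhost:8501/v1/models/mnist:predict")

image_bytes = File.binread("test-image.png")

# binary values have to be wrapped in {"b64" => ...} objects,
# inflating the payload by roughly a third
payload = {
  "instances" => [
    { "images" => { "b64" => Base64.strict_encode64(image_bytes) } }
  ]
}

response = Net::HTTP.post(uri, JSON.generate(payload), "Content-Type" => "application/json")
outputs = JSON.parse(response.body)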

The tensorflow-serving REST API proxies to the gRPC layer, so there’s also this extra level of indirection to account for.

gRPC doesn’t suffer from these drawbacks: built on top of HTTP/2, it not only improves connectivity, it also gets multiplexing and streaming for free; and by using protobufs, it has a typed message serialization format which supports byte streams.

How can it be used in ruby then?

Tensorflow Serving with Protobufs

Tensorflow Serving calls are performed using a standardized set of common protobufs, whose .proto definitions can be found both in the tensorflow repo and in the tensorflow-serving repo. The most important ones for our case are declared under prediction_service.proto, which defines the request and response protobufs declaring which model version to run, and how input and output tensors are laid out.

Both repos above already package the python protobufs. To use them in ruby, you have to compile the ruby stubs yourself, using the grpc-tools gem. For this particular case, compiling can be a pretty involved process, which looks like this:

# gem install grpc-tools

TF_VERSION="2.5.0"
TF_SERVING_VERSION="2.5.1"
PROTO_PATH=path/to/protos
set -o pipefail

curl -L -o tensorflow.zip https://github.com/tensorflow/tensorflow/archive/v$TF_VERSION.zip
unzip tensorflow.zip && rm tensorflow.zip
mv tensorflow-$TF_VERSION ${PROTO_PATH}/tensorflow

curl -L -o tf-serving.zip https://github.com/tensorflow/serving/archive/$TF_SERVING_VERSION.zip
unzip tf-serving.zip && rm tf-serving.zip
mv serving-$TF_SERVING_VERSION/tensorflow_serving ${PROTO_PATH}/tensorflow


TF_SERVING_PROTO=${PROTO_PATH}/ruby
mkdir ${TF_SERVING_PROTO}

grpc_tools_ruby_protoc \
    ${PROTO_PATH}/tensorflow/tensorflow/core/framework/*.proto \
    --ruby_out=${TF_SERVING_PROTO} \
    --grpc_out=${TF_SERVING_PROTO} \
    --proto_path=${PROTO_PATH}/tensorflow

grpc_tools_ruby_protoc \
    ${PROTO_PATH}/tensorflow/tensorflow/core/example/*.proto \
    --ruby_out=${TF_SERVING_PROTO} \
    --grpc_out=${TF_SERVING_PROTO} \
    --proto_path=${PROTO_PATH}/tensorflow

grpc_tools_ruby_protoc \
    ${PROTO_PATH}/tensorflow/tensorflow/core/protobuf/*.proto \
    --ruby_out=${TF_SERVING_PROTO} \
    --grpc_out=${TF_SERVING_PROTO} \
    --proto_path=${PROTO_PATH}/tensorflow

grpc_tools_ruby_protoc \
    ${PROTO_PATH}/tensorflow/tensorflow_serving/apis/*.proto \
    --ruby_out=${TF_SERVING_PROTO} \
    --grpc_out=${TF_SERVING_PROTO} \
    --proto_path=${PROTO_PATH}/tensorflow

ls $TF_SERVING_PROTO
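
If everything went well, the output directory now contains the generated message files (*_pb.rb) and gRPC stub files (*_services_pb.rb). Since the generated files require each other using paths relative to that root, one way to load them (a sketch, assuming the PROTO_PATH used in the script above) is to put the output directory on the load path:

# assuming PROTO_PATH=path/to/protos, as in the script above
$LOAD_PATH.unshift("path/to/protos/ruby")

require "tensorflow_serving/apis/prediction_service_pb"
require "tensorflow_serving/apis/prediction_service_services_pb"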

NOTE: There’s also the tensorflow-serving-client gem, which already ships with the necessary ruby protobufs; however, there haven’t been any updates in more than 5 years, so I can’t attest to its state of maintenance. If you want to use this in production, make sure you generate the ruby stubs from the latest version of the definitions.

Once the protobufs are available, creating a PredictRequest is simple. Here’s how you’d encode a request to a model called mnist, taking a 784-wide float array as input:

require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"

tensor = [0.0] * 784

request = Tensorflow::Serving::PredictRequest.new
request.model_spec = Tensorflow::Serving::ModelSpec.new name: 'mnist'
request.inputs['images'] = Tensorflow::TensorProto.new(
  float_val: tensor,
  tensor_shape: Tensorflow::TensorShapeProto.new(
    dim: [
      Tensorflow::TensorShapeProto::Dim.new(size: 1),
      Tensorflow::TensorShapeProto::Dim.new(size: 784)
    ]
  ),
  dtype: Tensorflow::DataType::DT_FLOAT
)

NOTE: the tensorflow python API ships with a very useful function called make_tensor_proto, which can do the above as a “one-liner”. While it’s certainly possible to write a similar function in ruby, it’s a pretty involved process which is beyond the scope of this post.

As an example, this one is easy to grasp. In production, however, we’ll have to deal with much larger tensors, which get heavy and slow to handle with plain ruby arrays.

Tensorflow Serving with Numo and GRPC

In python, the standard for using n-dimensional arrays is numpy. ruby has a similar library called numo.

It aims at providing the same APIs as numpy, which is mostly an aspirational goal, as keeping up with numpy is hard (progress can be tracked here).
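
For a taste of that API parity, here’s a quick, purely illustrative sketch of a few numpy idioms next to their numo counterparts:

require "numo/narray"

# numpy: np.zeros((2, 3))           numo: Numo::DFloat.zeros(2, 3)
# numpy: np.arange(6).reshape(2, 3) numo: Numo::Int64.new(6).seq.reshape(2, 3)
a = Numo::DFloat.zeros(2, 3)
b = Numo::Int64.new(6).seq.reshape(2, 3)

# elementwise operations, reductions and slicing work like in numpy
(a + b).sum #=> 15.0
b[0, true]  #=> first row, like b[0, :] in numpy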

A lot can be done already though, such as image processing. If our model requires an image, this is how it can be done in python:

# using numpy
import grpc
import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc

img = Image.open('test-image.png')
tensor = np.asarray(img)
tensor.shape  # => (512, 512, 3)


request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"
request.inputs['images'].CopyFrom(tf.make_tensor_proto(tensor))


stub = prediction_service_pb2_grpc.PredictionServiceStub(grpc.insecure_channel("localhost:9000"))
response = stub.Predict(request)
print(response.outputs)

And this is the equivalent ruby code:

require "grpc"
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"

# magro reads images to numo arrays
require "magro"


def build_predict_request(tensor)
  request = Tensorflow::Serving::PredictRequest.new
  request.model_spec = Tensorflow::Serving::ModelSpec.new name: 'mnist'
  # raw bytes from the numo array go into tensor_content, along with shape and dtype
  request.inputs['images'] = Tensorflow::TensorProto.new(
    tensor_content: tensor.to_binary,
    tensor_shape: Tensorflow::TensorShapeProto.new(
      dim: tensor.shape.map{ |size| Tensorflow::TensorShapeProto::Dim.new(size: size) }
    ),
    dtype: Tensorflow::DataType::DT_UINT8
  )
  request
end

tensor = Magro::IO.imread("test-image.png")
tensor.shape #=> [512,512,3]

# using tensorflow-serving-client example
stub = Tensorflow::Serving::PredictionService::Stub.new('localhost:9000', :this_channel_is_insecure)
res = stub.predict( build_predict_request(tensor) )
puts res.outputs # res is a PredictResponse; outputs maps output names to TensorProtos

That’s it!

GRPC over HTTPX

httpx ships with a grpc plugin. This being a blog mostly about httpx, it’s only fitting that I show how to do the above using it :).

require "httpx"
require "magro"
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"

# ... same as above ...

stub = HTTPX.plugin(:grpc).build_stub("localhost:9000", service: Tensorflow::Serving::PredictionService)
res = stub.predict( build_predict_request(tensor) )
puts res.outputs # res is a PredictResponse, as in the previous example

Conclusion

Hopefully you’ve gained enough interest in the ruby ML toolchain to investigate further. Who knows, maybe you can teach your researcher friends about it. The ML industry won’t move away from python any time soon, but at least you now know a bit more about how you can keep using ruby to build your services while interfacing remotely with ML models, running on dedicated hardware, over the gRPC protocol.