Tensorflow Serving with Ruby
26 Aug 2021

The Tensorflow framework is the most widely used framework for developing, training and deploying Machine Learning models. It ships with first-class API support for python and C++, the former being a favourite of most data scientists, which explains the pervasiveness of python in virtually all companies relying on ML for their products.
When it comes to deploying ML-based web services, there are two options. The first one is to develop a python web service, using something like flask or django, add tensorflow as a dependency, and run the model from within it. This approach is straightforward, but it comes with its own set of problems: rolling out model upgrades has to be done for each application using them, and even ensuring that the same tensorflow library version is used everywhere tends to be difficult, as it is a pretty heavy dependency which often conflicts with other libraries in the python ecosystem and is frequently the subject of CVEs. All of this introduces risk in the long run.
The other approach is to deploy the models using Tensorflow Serving (pytorch has something similar, torchserve). In short, it exposes the execution of ML models over the network “as a service”. It supports model versioning, and can be interfaced with via gRPC or REST API, which solves the main integration issues of the previously described approach. It thus compartmentalizes the risks, while also opening up the possibility of throwing dedicated hardware at it.

It also allows you to ditch python when building applications.
Research and Development
Now, I’m not a python hater. It’s an accessible programming language, and it shares a lot of benefits and drawbacks with ruby. But by the time a company decides to invest in ML to improve its product, the tech team might already be deeply familiar with a different tech stack. Maybe it’s ruby, maybe java, maybe go. It’s unreasonable to replace all of them with python experts. It’s possible to ask them to use a bit of python, but that comes at the cost of learning a new stack (thereby decreasing quality of delivery) and alienating the employees (thereby increasing turnover).
It’s also unreasonable to ask the new data science team not to use their preferred python tech stack. Python is the ML lingua franca, and there are many more years of investment and resources poured into libraries like numpy or scikit. And although there’s definitely value in improving the state of ML in your preferred languages (shout out to the SciRuby folks) and diminishing the overall industry dependency on python, that should not come at the cost of decreasing the quality of your product.

Therefore, tensorflow-serving allows the tech team to focus on developing and shipping the best possible product, and the research team to focus on developing the best possible models. Everyone’s productive and happy.
Tensorflow Serving with JSON
As stated above, tensorflow serving services are exposed using gRPC and REST APIs. If you haven’t used gRPC before, you’ll probably favour the latter; you’ve written HTTP JSON clients for other APIs before, so how hard can it be to write one for this?
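For reference, such a client could look like the following minimal sketch using net/http (the model name, port and payload layout are assumptions based on the Tensorflow Serving REST predict API):

require "json"
require "net/http"

# hypothetical input: a batch containing a single 784-wide float tensor
payload = { instances: [[0.0] * 784] }

uri = URI("http://localhost:8501/v1/models/mnist:predict")
response = Net::HTTP.post(uri, JSON.generate(payload), "Content-Type" => "application/json")
puts JSON.parse(response.body)["predictions"]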
While certainly possible, going this route comes at a cost; besides ensuring that the HTTP layer works reliably (persistent connections, timeouts, etc.), there’s the cost of JSON itself.
tensorflow (and other ML frameworks in general) makes heavy use of “tensors”: multi-dimensional same-type arrays (vectors, matrices…) describing, for example, the coordinates of a face recognized in an image. These tensors are represented in memory as contiguous arrays, and can therefore be easily serialized into a bytestream. Libraries like numpy (or numo in ruby) take advantage of this memory layout to provide high-performance mathematical and logical operations.
JSON is UTF-8 text, and can’t encode raw byte streams; in order to send and receive byte streams using the REST API interface, you’ll have to convert to and from base64. This means that, besides the CPU overhead of these conversions, you should expect a ~33% increase in the transmitted payload.
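To make that overhead concrete, here’s a small sketch with a 784-float tensor (sizes computed with ruby’s standard library):

require "base64"

# 784 single-precision floats, packed into a raw byte stream
raw = ([0.5] * 784).pack("f*")
raw.bytesize                         # => 3136
Base64.strict_encode64(raw).bytesize # => 4184, roughly a third larger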
The tensorflow-serving REST API proxies to the gRPC layer, so there’s also this extra level of indirection to account for.
gRPC doesn’t suffer from these drawbacks: built on top of HTTP/2, it not only improves connectivity, it also supports multiplexing and streaming; and, using protobufs, it has a typed message serialization protocol which supports byte streams.

How can it be used in ruby then?
Tensorflow Serving with Protobufs
Tensorflow Serving calls are performed using a standardized set of common protobufs, whose .proto definitions can be found both in the tensorflow repo and in the tensorflow-serving repo. The most important ones for our case are declared under prediction_service.proto, which defines the request and response protobufs declaring which model version to run, and how input and output tensors are laid out.
Both repositories above already package the python protobufs. To use them in ruby, you have to compile them yourself using the grpc-tools gem. For this particular case, compiling can be a pretty involved process, which looks like this:
# gem install grpc-tools
TF_VERSION="2.5.0"
TF_SERVING_VERSION="2.5.1"
PROTO_PATH=path/to/protos
set -o pipefail
curl -L -o tensorflow.zip https://github.com/tensorflow/tensorflow/archive/v$TF_VERSION.zip
unzip tensorflow.zip && rm tensorflow.zip
mv tensorflow-$TF_VERSION ${PROTO_PATH}/tensorflow
curl -L -o tf-serving.zip https://github.com/tensorflow/serving/archive/$TF_SERVING_VERSION.zip
unzip tf-serving.zip && rm tf-serving.zip
mv serving-$TF_SERVING_VERSION/tensorflow_serving ${PROTO_PATH}/tensorflow
TF_SERVING_PROTO=${PROTO_PATH}/ruby
mkdir ${TF_SERVING_PROTO}
grpc_tools_ruby_protoc \
${PROTO_PATH}/tensorflow/tensorflow/core/framework/*.proto \
--ruby_out=${TF_SERVING_PROTO} \
--grpc_out=${TF_SERVING_PROTO} \
--proto_path=${PROTO_PATH}/tensorflow
grpc_tools_ruby_protoc \
${PROTO_PATH}/tensorflow/tensorflow/core/example/*.proto \
--ruby_out=${TF_SERVING_PROTO} \
--grpc_out=${TF_SERVING_PROTO} \
--proto_path=${PROTO_PATH}/tensorflow
grpc_tools_ruby_protoc \
${PROTO_PATH}/tensorflow/tensorflow/core/protobuf/*.proto \
--ruby_out=${TF_SERVING_PROTO} \
--grpc_out=${TF_SERVING_PROTO} \
--proto_path=${PROTO_PATH}/tensorflow
grpc_tools_ruby_protoc \
${PROTO_PATH}/tensorflow/tensorflow_serving/apis/*.proto \
--ruby_out=${TF_SERVING_PROTO} \
--grpc_out=${TF_SERVING_PROTO} \
--proto_path=${PROTO_PATH}/tensorflow
ls $TF_SERVING_PROTO
NOTE: There’s also the tensorflow-serving-client gem, which already ships with the necessary ruby protobufs; however, there haven’t been any updates in more than 5 years, so I can’t attest to its state of maintenance. If you want to use this in production, make sure you generate the ruby stubs from the latest version of the definitions.
Once the protobufs are available, creating a PredictRequest is simple. Here’s how you’d encode a request to a model called mnist, taking a 784-wide float array as input:
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"
tensor = [0.0] * 784
request = Tensorflow::Serving::PredictRequest.new
request.model_spec = Tensorflow::Serving::ModelSpec.new name: 'mnist'
request.inputs['images'] = Tensorflow::TensorProto.new(
float_val: tensor,
tensor_shape: Tensorflow::TensorShapeProto.new(
dim: [
Tensorflow::TensorShapeProto::Dim.new(size: 1),
Tensorflow::TensorShapeProto::Dim.new(size: 784)
]
),
dtype: Tensorflow::DataType::DT_FLOAT
)
NOTE: the tensorflow python API ships with a very useful function called make_tensor_proto, which could do the above as a “one-liner”. While it’s certainly possible to code a similar function in ruby, a full port is a pretty involved process which is beyond the scope of this post.
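Still, here’s what a minimal, float-only stand-in could look like (the helper name and its limited coverage are my own, it’s not a port of make_tensor_proto):

# builds a TensorProto from a flat ruby array of floats and an explicit shape
def make_float_tensor_proto(values, shape)
  Tensorflow::TensorProto.new(
    dtype: Tensorflow::DataType::DT_FLOAT,
    float_val: values,
    tensor_shape: Tensorflow::TensorShapeProto.new(
      dim: shape.map { |size| Tensorflow::TensorShapeProto::Dim.new(size: size) }
    )
  )
end

request.inputs['images'] = make_float_tensor_proto([0.0] * 784, [1, 784])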
As an example, the request above is easy to grasp. However, we’ll have to deal with much larger tensors in production, which are going to get heavier and slower to handle with plain ruby arrays.
Tensorflow Serving with Numo and GRPC
In python, the standard for working with n-dimensional arrays is numpy. ruby has a similar library called numo. It aims to provide the same APIs as numpy, which is mostly an aspirational goal, as keeping up with numpy is hard (progress can be tracked here).
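To give a flavour of how close the APIs already are, here’s a small illustrative example (values are arbitrary):

require "numo/narray"

# a 2x3 single-precision float array, analogous to numpy's np.zeros((2, 3), dtype=np.float32)
a = Numo::SFloat.zeros(2, 3)
a[0, 0] = 1.5
a.shape      # => [2, 3]
(a * 2).sum  # => 3.0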
A lot can be done already though, such as image processing. If our model requires an image, this is how it can be done in python:
# using numpy
import grpc
import numpy as np
from PIL import Image
import tensorflow as tf
from tensorflow_serving.apis import predict_pb2, prediction_service_pb2_grpc
img = Image.open('test-image.png')
tensor = np.asarray(img)
tensor.shape # (512, 512, 3)
request = predict_pb2.PredictRequest()
request.model_spec.name = "mnist"
request.inputs['images'].CopyFrom(tf.make_tensor_proto(tensor))
stub = prediction_service_pb2_grpc.PredictionServiceStub(grpc.insecure_channel("localhost:9000"))
response = stub.Predict(request)
print(response.outputs)
And this is the equivalent ruby code:
require "grpc"
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"
# magro reads images to numo arrays
require "magro"
def build_predict_request(tensor)
request = Tensorflow::Serving::PredictRequest.new
request.model_spec = Tensorflow::Serving::ModelSpec.new name: 'mnist'
request.inputs['images'] = Tensorflow::TensorProto.new(
tensor_content: tensor.to_binary,
tensor_shape: Tensorflow::TensorShapeProto.new(
dim: tensor.shape.map{ |size| Tensorflow::TensorShapeProto::Dim.new(size: size) }
),
dtype: Tensorflow::DataType::DT_UINT8
)
end
tensor = Magro::IO.imread("test-image.png")
tensor.shape #=> [512,512,3]
# using tensorflow-serving-client example
stub = Tensorflow::Serving::PredictionService::Stub.new('localhost:9000', :this_channel_is_insecure)
res = stub.predict( build_predict_request(tensor) )
puts res.outputs # returns PredictResponses
That’s it!
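The returned tensors can also be loaded back into numo arrays for post-processing; here’s a minimal sketch, assuming the model exposes a float output named 'scores':

output = res.outputs['scores']
shape = output.tensor_shape.dim.map(&:size)

# depending on how the server serialized the tensor, values may arrive either
# in the typed repeated fields (float_val) or as a raw byte stream (tensor_content)
scores = if output.float_val.any?
  Numo::SFloat.cast(output.float_val.to_a).reshape(*shape)
else
  Numo::SFloat.from_binary(output.tensor_content).reshape(*shape)
end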
GRPC over HTTPX
httpx ships with a grpc plugin. This being a blog mostly about httpx, it’s only fitting I show how to do the above using it :).
require "httpx"
require "magro"
require "path/to/protos/ruby/tensorflow_serving/apis/prediction_service_pb"
# ... same as above ...
stub = HTTPX.plugin(:grpc).build_stub("localhost:9000", service: Tensorflow::Serving::PredictionService)
res = stub.predict( build_predict_request(tensor) )
puts res.outputs # returns PredictResponses
Conclusion
Hopefully you’ve gained enough interest in the ruby ML toolchain to investigate it further. Who knows, maybe you can even teach your researcher friends about it. The ML industry won’t move away from python anytime soon, but at least now you know a bit more about how you can still use ruby to build your services, while interfacing remotely with ML models, running on dedicated hardware, over the gRPC protocol.