Hypertuned TensorFlow Neural Net + BentoML (Fashion MNIST)
This project uses TensorFlow Keras to train a neural network on the Fashion MNIST dataset and BentoML to deploy and serve the trained model through an API for clothing image classification.
Introduction
Using the popular Fashion MNIST dataset, the project trains and tests a model with TensorFlow Keras, with hyperparameters tuned by the Keras Tuner library. To make the model usable in practice, it is integrated with BentoML, which exposes the trained model through an API. BentoML is also used to package the model into a Docker container, which simplifies deployment and keeps the runtime consistent across environments.
Loading and preprocessing the dataset
First we create the Fashion MNIST dataset object and then create the training and testing splits of the dataset. We then expand the dimensions of the image arrays and normalize the image values to be between 0 and 1.
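import numpy as np
import tensorflow as tf
# Fashion MNIST dataset object bundled with Keras
fmnist = tf.keras.datasets.fashion_mnist
(training_images, training_labels), (test_images, test_labels) = fmnist.load_data()
# Add a channel axis: (N, 28, 28) -> (N, 28, 28, 1)
training_images = np.expand_dims(training_images, 3)
test_images = np.expand_dims(test_images, 3)
# Scale pixel values from [0, 255] to [0, 1]
training_images = training_images.astype('float32') / 255.0
test_images = test_images.astype('float32') / 255.0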
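To see what the two preprocessing steps do without downloading the dataset, here is a minimal sketch that applies them to a synthetic array. The array raw_images is a made-up stand-in for a handful of Fashion MNIST images, not real data.

```python
import numpy as np

# Synthetic stand-in for four raw 28x28 grayscale images (uint8, 0-255).
rng = np.random.default_rng(0)
raw_images = rng.integers(0, 256, size=(4, 28, 28), dtype=np.uint8)

# Add a trailing channel axis so each image becomes (28, 28, 1).
images = np.expand_dims(raw_images, 3)

# Convert to float32 and scale into the [0, 1] range.
images = images.astype("float32") / 255.0

print(images.shape, images.dtype)
```

The channel axis is needed because Conv2D layers expect inputs of shape (height, width, channels).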
Hyperparameter tuning
To get the most out of the model, we first search for a good set of hyperparameters. We start by defining the 'model_builder' function, which builds and compiles a TF Keras model with tunable hyperparameters. The hyperparameter tuning library we use is Keras Tuner.
Let's create our model builder function
import ast

def model_builder(hp):
    model = tf.keras.Sequential()
    # Tunable convolution block: filter count, kernel size and activation
    filters_1 = hp.Int('conv2D_1_units', min_value=8, max_value=128, step=8)
    # Kernel sizes are offered as strings and parsed back into tuples
    kernel_size_choice_1 = hp.Choice('conv2D_1_kernel_size', ['(2,2)', '(3,3)', '(4,4)'])
    kernel_size_1 = ast.literal_eval(kernel_size_choice_1)
    activation_1 = hp.Choice('conv2D_1_activation', ['relu', 'elu'])
    model.add(tf.keras.layers.Conv2D(filters=filters_1,
                                     kernel_size=kernel_size_1,
                                     activation=activation_1,
                                     input_shape=(28, 28, 1)))
    model.add(tf.keras.layers.MaxPool2D(2, 2))
    # Classification head with a tunable dense layer width
    model.add(tf.keras.layers.Flatten())
    hp_units = hp.Int('units', min_value=32, max_value=512, step=32)
    model.add(tf.keras.layers.Dense(units=hp_units, activation='relu'))
    # The output layer returns raw logits for the 10 classes
    model.add(tf.keras.layers.Dense(10))
    hp_learning_rate = hp.Choice('learning_rate', values=[1e-2, 1e-3, 1e-4])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=hp_learning_rate),
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=['accuracy'])
    return model
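To get a feel for the search space this function defines, the following sketch counts the possible hyperparameter combinations and works out how many features reach the dense layer for a given kernel size and filter count. The helper names count_values and flattened_features are illustrative and not part of Keras Tuner; the shape arithmetic assumes the Conv2D defaults of stride 1 and 'valid' padding.

```python
def count_values(min_value, max_value, step):
    """Number of values in an inclusive integer range with a fixed step."""
    return (max_value - min_value) // step + 1


# Sizes of each hyperparameter's set of options, taken from model_builder.
filters_options = count_values(8, 128, 8)    # 'conv2D_1_units'
units_options = count_values(32, 512, 32)    # 'units'
kernel_options = 3                           # (2,2), (3,3), (4,4)
activation_options = 2                       # relu, elu
lr_options = 3                               # 1e-2, 1e-3, 1e-4

search_space_size = (filters_options * kernel_options * activation_options
                     * units_options * lr_options)


def flattened_features(kernel, filters, image_size=28, pool=2):
    """Feature count after Conv2D ('valid', stride 1), MaxPool2D and Flatten."""
    conv_out = image_size - kernel + 1   # convolution shrinks each side
    pooled = conv_out // pool            # pooling floors the division
    return pooled * pooled * filters


print(search_space_size)
print(flattened_features(kernel=3, filters=32))
```

With several thousand combinations, training every candidate to completion would be expensive, which is why a budget-aware search strategy is useful here.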
Then we initialize the Hyperband tuner
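Here we create a Hyperband tuner that searches the hyperparameter space defined in 'model_builder', using validation accuracy as the objective. Each candidate is trained for at most 10 epochs, and the factor of 3 controls how aggressively weaker candidates are discarded between rounds.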
import keras_tuner as kt

tuner = kt.Hyperband(model_builder,
                     objective='val_accuracy',
                     max_epochs=10,
                     factor=3,
                     directory='my_dir',
                     project_name='space_search_fmnist')
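Hyperband builds on the idea of successive halving: train many candidates briefly, keep the best fraction, and give the survivors a larger training budget. The sketch below is a simplified illustration of that idea only; it is not Keras Tuner's actual schedule, and toy_score is a hypothetical stand-in for the validation accuracy a real training run would report.

```python
import math


def successive_halving(configs, score_fn, min_epochs=1, max_epochs=10, factor=3):
    """Keep the top 1/factor of configs each round while growing the epoch budget."""
    survivors = list(configs)
    epochs = min_epochs
    while True:
        # Rank the remaining candidates at the current budget, best first.
        scored = sorted(survivors, key=lambda c: score_fn(c, epochs), reverse=True)
        if len(scored) == 1 or epochs >= max_epochs:
            return scored[0]
        # Discard all but the top fraction and increase the budget.
        survivors = scored[:max(1, len(scored) // factor)]
        epochs = min(max_epochs, epochs * factor)


def toy_score(learning_rate, epochs):
    # Hypothetical score that peaks at a learning rate of 1e-3.
    return -abs(math.log10(learning_rate) + 3) + 0.01 * epochs


candidates = [1e-1, 3e-2, 1e-2, 3e-3, 1e-3, 3e-4, 1e-4, 3e-5, 1e-5]
best = successive_halving(candidates, toy_score)
print(best)
```

With nine candidates and a factor of 3, the first round keeps three candidates and the second keeps one, so most of the training budget goes to the promising configurations.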
Retrieving the best hyperparameters
We now let the tuner search for the best hyperparameters by training the model repeatedly with different configurations, and then retrieve the best set found. We also pass a Keras early stopping callback that ends a trial when the validation loss has not improved for 5 consecutive epochs.
stop_early = tf.keras.callbacks.EarlyStopping(monitor='val_loss', patience=5)
tuner.search(training_images, training_labels, epochs=50, validation_split=0.2, callbacks=[stop_early])
best_hps = tuner.get_best_hyperparameters(num_trials=1)[0]
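The patience rule behind early stopping can be illustrated in a few lines of plain Python. This is a simplified sketch of the rule rather than the Keras implementation, and the loss values are invented for the example.

```python
def stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None if it never stops early."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: count the epoch against the patience budget.
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Made-up validation losses: the best value occurs at epoch 3.
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.30]
print(stopping_epoch(losses))
```

In this example training stops at epoch 8, five epochs after the last improvement, so the lower loss at epoch 9 is never reached. A larger patience trades longer training for a lower chance of stopping too soon.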
Building and training
Now that we have the best set of hyperparameters, we build a new model instance from them with the tuner.hypermodel.build function, which returns a compiled model. The compiled model is then trained for 50 epochs (it could also be trained for 100 or more), and its per-epoch performance is stored in the history object.
model = tuner.hypermodel.build(best_hps)
history = model.fit(training_images, training_labels, epochs=50, validation_split=0.2)
Best Epoch
With the stored training history, we can find the epoch at which the model reached its highest validation accuracy and use that as the number of epochs to train for.
val_acc_per_epoch = history.history['val_accuracy']
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
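As a quick worked example of this lookup, the sketch below runs the same two lines on an invented list of validation accuracies, standing in for history.history['val_accuracy'].

```python
# Invented validation accuracies, one value per epoch.
val_acc_per_epoch = [0.85, 0.88, 0.90, 0.91, 0.905, 0.91]

# index() returns the first position of the maximum; +1 converts the
# 0-based list index into a 1-based epoch count.
best_epoch = val_acc_per_epoch.index(max(val_acc_per_epoch)) + 1
print(best_epoch)
```

When the maximum occurs more than once, as it does here at epochs 4 and 6, the earliest epoch is chosen, which keeps the retraining run as short as possible.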
Building the Hypermodel
With the best set of hyperparameters and the best number of epochs in hand, we build and train the model once more and can rightly call it a hypermodel.
hypermodel = tuner.hypermodel.build(best_hps)
hypermodel.fit(training_images, training_labels, epochs=best_epoch, validation_split=0.2)
Evaluating and Testing
Let's now evaluate the model on the test dataset and note down the loss and accuracy.
eval_result = hypermodel.evaluate(test_images, test_labels)
print("[test loss, test accuracy]:", eval_result)
Once we are satisfied with the evaluation results, we need to make the model available as a service so that users can send it images and get predictions back.
BentoML for model serving
For this project we use BentoML because it streamlines machine learning model deployment: it packages models easily, supports deployment options including Docker and cloud platforms, and generates a REST API server. It supports multiple ML frameworks, including TensorFlow, which helps make models production-ready.
Saving the model
We use BentoML to save the hypermodel along with a configuration for serving. The signatures argument indicates that the model's default call method is used to serve predictions. The batchable option is set to True so that incoming prediction requests can be batched, which improves throughput and efficiency.
import bentoml

bentoml.tensorflow.save_model(
    "tensorflow_mnist",
    hypermodel,
    signatures={"__call__": {"batchable": True}},
)
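Conceptually, batching means stacking several requests along the first axis, running the model once, and splitting the output back per request. The sketch below illustrates this with NumPy; it is not how BentoML implements batching internally, and fake_model is a hypothetical stand-in for the trained network.

```python
import numpy as np


def fake_model(batch):
    # Hypothetical stand-in for the network: maps (N, 28, 28, 1) to (N, 10).
    flat = batch.reshape(batch.shape[0], -1)
    return np.stack([flat.mean(axis=1) * k for k in range(10)], axis=1)


# Three separate requests, each carrying a single image.
requests = [np.full((1, 28, 28, 1), v, dtype="float32") for v in (0.1, 0.5, 0.9)]

# Stack along the batch axis, run the model once, then split the results.
batch = np.concatenate(requests, axis=0)
outputs = fake_model(batch)
per_request = np.split(outputs, len(requests), axis=0)

print(batch.shape, outputs.shape, per_request[0].shape)
```

This only works when each row of the output depends solely on the matching row of the input, which holds for a feed-forward classifier like the one trained here.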
Creating BentoML service for model serving
Here we build a BentoML service that is configured to use a model runner for efficient model serving.
First we get the latest version of the saved model and convert it to a model runner, which provides efficient serving capabilities such as batching requests and running inference in a separate process or on a separate machine.
We then initialize the service.
mnist_runner = bentoml.tensorflow.get("tensorflow_mnist:latest").to_runner()
svc = bentoml.Service(
    name="tensorflow_mnist_hypertuned",
    runners=[mnist_runner],
)
Create the API endpoint
We can now define an asynchronous API endpoint for the BentoML service that accepts an image, runs the prediction and returns the result.
from bentoml.io import Image, NumpyNdarray
from PIL.Image import Image as PILImage

@svc.api(input=Image(), output=NumpyNdarray(dtype="float32"))
async def predict_image(f: PILImage) -> "np.ndarray":
    # Scale pixels to [0, 1], matching the training preprocessing
    arr = np.array(f) / 255.0
    # Add batch and channel axes: (28, 28) -> (1, 28, 28, 1)
    arr = np.expand_dims(arr, (0, 3)).astype("float32")
    return await mnist_runner.async_run(arr)
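The preprocessing inside the endpoint can be checked in isolation. The sketch below applies the same two steps to a synthetic 28x28 array standing in for the pixel data of a grayscale image; the function name preprocess is illustrative and not part of the service code.

```python
import numpy as np


def preprocess(pixels):
    """Scale to [0, 1] and add batch and channel axes."""
    arr = np.asarray(pixels) / 255.0
    return np.expand_dims(arr, (0, 3)).astype("float32")


# Synthetic grayscale image: black background with a white square.
gray = np.zeros((28, 28), dtype=np.uint8)
gray[10:18, 10:18] = 255

batch = preprocess(gray)
print(batch.shape, batch.dtype)
```

Note that this assumes a single-channel 28x28 input. An RGB image or one of a different size would not produce the (1, 28, 28, 1) shape the model expects, so such inputs would need to be converted and resized first.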
Test out the BentoML service
The API endpoint can be tested locally by starting a dev server with the 'bentoml serve' command and then querying the endpoint with 'curl' or any other HTTP client.
curl -H "Content-Type: multipart/form-data" -F 'fileobj=@samples/0.png;type=image/png' http://127.0.0.1:3000/predict_image
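Because the output layer has no softmax, the endpoint returns 10 raw logits per image rather than probabilities. A minimal sketch for turning such a response into a readable label is shown below, using the standard Fashion MNIST class order. The values in sample_logits are invented for illustration and are not real model output.

```python
import numpy as np

# Standard Fashion MNIST class names, indexed by label.
CLASS_NAMES = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]


def decode(logits):
    """Convert one row of logits into (class name, softmax probability)."""
    logits = np.asarray(logits, dtype="float64").reshape(-1)
    shifted = logits - logits.max()          # subtract max for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    idx = int(probs.argmax())
    return CLASS_NAMES[idx], float(probs[idx])


# Invented response for a single image: shape (1, 10).
sample_logits = [[-2.0, 0.5, -1.0, 0.0, -0.5, 1.0, -3.0, 4.0, 0.2, 1.5]]
label, confidence = decode(sample_logits)
print(label, round(confidence, 3))
```

The same decoding could instead be done inside the service before returning the response, at the cost of changing the endpoint's output format.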
Build the Bento for Deployment
The build is configured by the bentofile.yaml in the source directory. With the simple command below, all the components required to run the model are packaged together, ready for deployment, into a Bento - a format used by BentoML for model serving.
bentoml build
To view the list of all Bentos available for deployment, we can use the following command.
bentoml list
We can then create a Docker image of the Bento by running the command:
bentoml containerize tensorflow_mnist_hypertuned:latest
We can then use the standard Docker workflow to run a container from the image and deploy it on the server.
GitHub Repository
The GitHub repo with all the code deliverables can be found here: https://github.com/shivrajd/fmnst_tnsr_bnto