API Documentation

Remember: one of the motives behind stockroom is simplicity, and we have kept that in mind whenever we added or removed an API. In practice, this means we have tried hard to keep the number of APIs to a minimum while still meeting the needs of a developer. This page describes the Python APIs available in stockroom.

Initialization

Initialize a hangar repo, create the stock file, and add details to .gitignore.

StockRoom class

class StockRoom

This class is the only user entry point of stockroom that interacts with an existing stock repository; that is, all repository interaction a user performs has to go through an object of this class. Stockroom comes with three different storages:

  1. Model: Weights of models built with keras.Model or torch.nn.Module
  2. Data: Dataset as numpy arrays/tensors
  3. Tag: Information related to an experiment, such as metrics, parameters, etc.

An object of this class holds one object for each of these three storages, and each of them provides dictionary-style access.

Parameters: path (Union[str, Path, None]) – Path to the stock repository. If None, it traverses up from the current working directory until it finds the stock root (the stock root is the location where the head.stock file is located; ideally it also has a .git folder).

Note

By default (if no path is provided while initializing StockRoom), it checks for the stock root. A stock root is a directory that is

  1. a git repository (has .git folder)
  2. a hangar repository (has .hangar folder)
  3. a stock repository (has head.stock file)

If you’d like to skip these checks and just use stockroom, provide the path to the repository explicitly. For example, if you are a hangar user and use stockroom only for storing models in your hangar repository, the repository doesn’t need to be a stock repository and can skip these checks. The rationale is that if you provide the path, we trust that you know what you are doing on that path.
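The root discovery described above can be sketched with the standard library alone. This is a minimal illustration, not stockroom's actual implementation: `find_stock_root` and `is_full_stock_root` are hypothetical helper names, and the only facts taken from the text are the three markers (`.git`, `.hangar`, `head.stock`) and the upward traversal from the working directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_stock_root(start: Path) -> Optional[Path]:
    """Hypothetical helper: walk upward from `start` until a directory
    containing `head.stock` is found; return None if none is found."""
    current = start.resolve()
    for candidate in (current, *current.parents):
        if (candidate / "head.stock").is_file():
            return candidate
    return None


def is_full_stock_root(path: Path) -> bool:
    """Hypothetical helper: check the three markers listed in the note."""
    return (
        (path / ".git").is_dir()
        and (path / ".hangar").is_dir()
        and (path / "head.stock").is_file()
    )


# Build a throwaway directory tree to exercise the helpers.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    (root / ".hangar").mkdir()
    (root / "head.stock").touch()
    nested = root / "src" / "experiments"
    nested.mkdir(parents=True)

    found = find_stock_root(nested)          # starts two levels below the root
    found_is_root = found == root
    root_is_complete = is_full_stock_root(root)
    nested_is_complete = is_full_stock_root(nested)
```

Starting the search in a nested directory mirrors the case where `StockRoom()` is created without a path from somewhere inside a project.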

commit(message: str, update_head=True) → str

Make a stock commit. A stock commit is a hangar commit plus writing the commit hash to the stock file. This function opens the stock checkout in write mode and closes it after the commit, which means no other write operations should be running while a stock commit is in progress.
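The two steps of a stock commit can be sketched as follows. Everything here is a stand-in: `fake_hangar_commit` only imitates a commit by hashing the message, `stock_commit` is a hypothetical function, and both the stock file layout (a bare digest) and the meaning of `update_head=False` (skip writing the stock file) are assumptions that the text above does not confirm.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def fake_hangar_commit(message: str) -> str:
    """Stand-in for a real hangar commit; returns a made-up digest."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def stock_commit(hangar_commit: Callable[[str], str], stock_file: Path,
                 message: str, update_head: bool = True) -> str:
    """Hypothetical sketch: commit first, then record the digest."""
    digest = hangar_commit(message)
    if update_head:
        # Assumption: the stock file holds nothing but the latest digest.
        stock_file.write_text(digest)
    return digest


with tempfile.TemporaryDirectory() as tmp:
    head = Path(tmp) / "head.stock"
    head.write_text("")

    first = stock_commit(fake_hangar_commit, head, "add training data")
    head_after_first = head.read_text()

    second = stock_commit(fake_hangar_commit, head, "tune lr",
                          update_head=False)
    head_after_second = head.read_text()
```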

get_hangar_checkout(write: bool = False) → Any

Fetch the hangar checkout object that stockroom uses internally. Don’t do this unless you know what you are doing. If you are not familiar with how hangar stores data and with its APIs, interacting with hangar directly could corrupt the data stored by stockroom.

Parameters: write (bool) – Whether you need a write-enabled checkout or not
Returns: A hangar checkout object which can be used to interact with the repository data
Return type: Union[ReaderCheckout, WriterCheckout]

Warning

You won’t be able to fetch a write-enabled checkout while you are inside the optimize context manager. Similarly, if you fetch a write-enabled checkout from here, you will neither be able to do any write operation through stockroom nor be able to open the optimize context manager.

optimize()

On enter, this context manager asks the StockRepository object to open the global checkout. The global checkout is stored as a property of the repository singleton, so all downstream tasks get this opened checkout until it is closed. The global checkout is closed on exit of this context manager.
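The effect of a shared global checkout can be illustrated with a small stand-in. `FakeCheckout` and `FakeRepository` are hypothetical classes written for this sketch and do not reflect stockroom's or hangar's real internals; the sketch only shows the pattern of opening one checkout on enter, reusing it for every access, and closing it on exit.

```python
from contextlib import contextmanager


class FakeCheckout:
    """Stand-in for a hangar checkout; only tracks whether it is closed."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


class FakeRepository:
    """Stand-in for the repository singleton; counts checkout openings."""

    def __init__(self):
        self.opened = 0
        self.global_checkout = None

    def _open(self):
        self.opened += 1
        return FakeCheckout()

    @contextmanager
    def checkout(self):
        # Reuse the global checkout when one is open, otherwise open a
        # short-lived checkout and close it right after use.
        if self.global_checkout is not None:
            yield self.global_checkout
        else:
            co = self._open()
            try:
                yield co
            finally:
                co.close()

    @contextmanager
    def optimize(self):
        self.global_checkout = self._open()
        try:
            yield self.global_checkout
        finally:
            self.global_checkout.close()
            self.global_checkout = None


def read_sample(repo):
    """Pretend to read one sample through whatever checkout is available."""
    with repo.checkout() as co:
        return not co.closed


repo = FakeRepository()
for _ in range(3):
    read_sample(repo)
opened_without_optimize = repo.opened

with repo.optimize() as shared:
    for _ in range(3):
        read_sample(repo)
opened_with_optimize = repo.opened - opened_without_optimize
shared_closed_on_exit = shared.closed
```

In this stand-in, three reads outside the context manager open three checkouts, while three reads inside it share a single one.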

Storages

Stockroom introduces three different storages for different storage needs, and all the APIs in stockroom deal with these storages.

class Data

Data storage is essentially a wrapper over hangar’s column API that lets stockroom handle the checkout scope. Users do not create an instance directly; instead, a created instance is available on stockroom.StockRoom.

Note

Each __getitem__ or __setitem__ call opens and closes a hangar checkout. This matters more for data storage than for the other storages because data is read and written frequently in a pipeline, unlike saving or retrieving models, parameters, or metrics. To optimize this, make the data reads and writes inside the context manager stockroom.StockRoom.optimize().

Examples

>>> import numpy as np
>>> from stockroom import StockRoom
>>> stock = StockRoom()
>>> stock.data['column1']['sample1'] = np.arange(20).reshape(5, 4)
>>> sample = stock.data['column1']['sample1']

Inside context manager

>>> with stock.optimize():
...     sample = stock.data['column1']['sample1']

class Model

The Model class uses hangar columns to store the pieces of a model and hangar metadata to store the information required to collate them back into a model. Currently, it supports keras.Model and torch.nn.Module models. On stockroom.storages.Model.save_weights(), the ModelStore instance creates a few columns (one column for each data type) to store the weights and one column specifically to store the shape of each layer. The shape column is needed because the weights of each layer are flattened before saving; handling variable shapes and variable ranks is more complex than flattening the weights and reshaping them back.
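The flatten-and-reshape idea can be sketched without any framework. The helpers below (`shape_of`, `flatten`, `reshape`) and the layer names are hypothetical, and plain nested lists stand in for the arrays a real model would hold; the point is only that a flat list plus a separately stored shape is enough to recover a layer of any rank.

```python
from math import prod  # available from Python 3.8


def shape_of(nested):
    """Return the shape of a regular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return tuple(shape)


def flatten(nested):
    """Flatten a nested list of any rank into a single flat list."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def reshape(flat, shape):
    """Rebuild a nested list of the given shape from a flat list."""
    if not shape:
        return flat[0]
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])  # number of values in each sub-block
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


# Two layers of different rank, as a model might hold them.
weights = {
    "dense.kernel": [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]],
    "dense.bias": [0.1, 0.2, 0.3],
}

# "Save": every layer becomes a flat list, shapes go to a separate store.
stored_weights = {name: flatten(w) for name, w in weights.items()}
stored_shapes = {name: shape_of(w) for name, w in weights.items()}

# "Load": reshape each flat list back using the stored shape.
restored = {name: reshape(stored_weights[name], stored_shapes[name])
            for name in stored_weights}
```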

Examples

>>> import torch
>>> import tensorflow as tf
>>> torch_model = torch.nn.Sequential(...)
>>> stock.model['torch_model'] = torch_model.state_dict()
>>> tf_model = tf.keras.Sequential()
>>> tf_model.add(tf.keras.layers.Dense(64, activation='relu'))
>>> stock.model['tf_model'] = tf_model.get_weights()

You can make this easier by calling the helper functions that know how to fetch weights from a model and how to put weights back into it. Check out Model.save_weights() and Model.load_weights() for more details.

load_weights(name, model)

Load the parameters from the hangar repo and put them back into the model object. It looks for all the columns that match the model name and reshapes each one back to its actual shape (the actual shape is stored in another column). Different frameworks have different ways of loading parameters into a model object; to identify the right one, Model.save_weights() also saves the framework name while saving the model.

Parameters:
  • name (str) – Name of the key from which the model parameters are loaded
  • model (Any) – Model object from any supported framework onto which the parameters are loaded. Loading the parameters is an in-place operation and hence this function doesn’t return anything.
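The role of the saved framework name can be illustrated with a small dispatch sketch. The two dummy model classes, the `store` dictionary, and the `sketch_save` / `sketch_load` functions are all hypothetical and are not stockroom's code; the dummy classes only borrow the method names `state_dict()` / `load_state_dict()` and `get_weights()` / `set_weights()` that PyTorch and Keras models expose.

```python
class TorchLikeModel:
    """Dummy model with the state_dict()/load_state_dict() pair."""

    def __init__(self, state=None):
        self.state = dict(state or {})

    def state_dict(self):
        return dict(self.state)

    def load_state_dict(self, state):
        self.state = dict(state)


class KerasLikeModel:
    """Dummy model with the get_weights()/set_weights() pair."""

    def __init__(self, weights=None):
        self.weights = list(weights or [])

    def get_weights(self):
        return list(self.weights)

    def set_weights(self, weights):
        self.weights = list(weights)


store = {}  # stand-in for the hangar-backed storage


def sketch_save(name, model):
    """Save the weights together with the name of the framework."""
    if isinstance(model, TorchLikeModel):
        store[name] = {"framework": "torch", "weights": model.state_dict()}
    elif isinstance(model, KerasLikeModel):
        store[name] = {"framework": "keras", "weights": model.get_weights()}
    else:
        raise TypeError("unsupported framework")


def sketch_load(name, model):
    """Pick the loading call from the saved framework name; in-place."""
    record = store[name]
    if record["framework"] == "torch":
        model.load_state_dict(record["weights"])
    elif record["framework"] == "keras":
        model.set_weights(record["weights"])


sketch_save("torch_model", TorchLikeModel({"fc.weight": [0.5, -0.5]}))
sketch_save("tf_model", KerasLikeModel([[1.0, 2.0], [0.0]]))

fresh_torch = TorchLikeModel()
fresh_keras = KerasLikeModel()
result = sketch_load("torch_model", fresh_torch)  # nothing is returned
sketch_load("tf_model", fresh_keras)
```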

Examples

>>> stock.model.load_weights('torch_model', torch_model)

save_weights(name, model)

A convenience function to call when you don’t want to deal with extracting weights from the model yourself, regardless of which framework you use to write the model, as long as that framework is supported by stockroom. It expects a model object from one of the supported frameworks, calls the corresponding function of that framework to fetch the weights, and then calls Model.__setitem__() to save them.

Parameters:
  • name (str) – Name of the key to which the model parameters are saved
  • model (Any) – Object from any supported framework

Examples

>>> stock.model.save_weights('torch_model', torch_model)

class Tag

The Tag store, as the name suggests, stores tags related to an experiment. Ideally, and eventually, this store would keep information at the commit level and would not pass it down the commit history tree. The internal implementation of hangar currently doesn’t allow that, so the information is stored in hangar’s metadata store. It currently accepts int, float and str values and converts them to strings, the only data type supported by hangar metadata. Tag also stores the type of each value in another metadata “column”, which is used when pulling the data back from the Tag store.
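The string conversion and type bookkeeping can be sketched with two plain dictionaries. The names `values`, `types`, `set_tag` and `get_tag` are hypothetical and stand in for the hangar metadata the Tag store really uses; only the idea of storing a string alongside its original type name comes from the description above.

```python
# Hypothetical stand-ins for the two metadata "columns".
values = {}  # key -> value converted to a string
types = {}   # key -> name of the original type

CASTERS = {"int": int, "float": float, "str": str}


def set_tag(key, value):
    """Store the value as a string and remember its original type."""
    type_name = type(value).__name__
    if type_name not in CASTERS:
        raise TypeError(f"unsupported tag type: {type_name}")
    values[key] = str(value)
    types[key] = type_name


def get_tag(key):
    """Convert the stored string back using the remembered type."""
    return CASTERS[types[key]](values[key])


set_tag("epochs", 1000)
set_tag("lr", 0.0001)
set_tag("optimizer", "adam")

epochs = get_tag("epochs")
lr = get_tag("lr")
optimizer = get_tag("optimizer")
```

Without the second dictionary, every tag would come back as a string, and a value such as the learning rate would need to be cast by hand before use.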

Examples

>>> stock.tag['epochs'] = 1000
>>> stock.tag['lr'] = 0.0001
>>> stock.tag['optimizer'] = 'adam'