API Documentation
Remember: one of the motives behind stockroom is simplicity, and we have kept that in mind whenever we added or removed an API. In practice, this means we have tried hard to keep the number of APIs to a minimum while still catering to the requirements of a developer. Here we discuss the Python APIs available in stockroom.
Initialization
Initializes the hangar repository, creates the stock file, and adds the relevant details to .gitignore.
StockRoom class
class StockRoom
This class is the only user entry point of stockroom that interacts with an existing stock repository, i.e. all the repository interaction a user performs has to go through an object of this class. Stockroom also comes with three different storages:
- Model: Weights of models built with keras.Model or torch.nn
- Data: Dataset as numpy arrays/tensors
- Tag: Information related to an experiment such as metrics, parameters etc
An object of this class holds an object for each of these three storages, and each storage provides dictionary-style access.
Parameters: path (Union[str, Path, None]) – Path to the stock repository. If None, it traverses up from the current working directory until it finds the stock root (the stock root is the location where the head.stock file is located, and ideally it has a .git folder as well).
Note
By default (if no path is provided while initializing StockRoom), it checks for the stock root. A stock root is a directory that is
- a git repository (has .git folder)
- a hangar repository (has .hangar folder)
- a stock repository (has head.stock file)
If you’d like to skip these checks and just use stockroom, provide the path to the repository explicitly. For example, if you are a hangar user and use stockroom only for storing models in your hangar repository, it doesn’t need to be a stock repository, so the checks can be skipped. The rationale is that if you provide the path, we trust that you know what you are doing on that path.
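The lookup described above can be sketched with the standard library alone. This is only an illustration of the idea, not stockroom's actual code: `find_stock_root` and `missing_markers` are hypothetical helper names, and the sketch assumes the three markers are exactly the `.git` folder, the `.hangar` folder and the `head.stock` file listed above.

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def find_stock_root(start: Path) -> Optional[Path]:
    """Walk from `start` up to the filesystem root and return the first
    directory that contains a head.stock file, or None if none is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / "head.stock").is_file():
            return candidate
    return None


def missing_markers(root: Path) -> List[str]:
    """Return the stock-root markers that are absent from `root`."""
    markers = {
        ".git": Path.is_dir,        # git repository
        ".hangar": Path.is_dir,     # hangar repository
        "head.stock": Path.is_file, # stock repository
    }
    return [name for name, exists in markers.items() if not exists(root / name)]


# Demo on a throwaway directory tree (hypothetical layout).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / ".git").mkdir()
    (root / ".hangar").mkdir()
    (root / "head.stock").touch()
    nested = root / "experiments" / "run1"
    nested.mkdir(parents=True)

    found = find_stock_root(nested)   # search starts two levels below the root
    found_ok = found == root
    missing = missing_markers(found)  # empty when all three markers exist
```

Passing an explicit path corresponds to skipping both helpers and using the given directory as is.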
commit(message: str, update_head=True) → str
Make a stock commit. A stock commit is a hangar commit plus writing the commit hash to the stock file. This function opens the stock checkout in write mode and closes it after the commit, which means no other write operation should be running while a stock commit is in progress.
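As a rough mental model of "a hangar commit plus writing the commit hash to the stock file", consider the toy sketch below. Everything in it is hypothetical: `FakeCheckout` stands in for a write-enabled checkout, the stock file is assumed to hold just the hash, and `update_head` is assumed to control whether the hash is written. None of this is taken from stockroom's implementation.

```python
import hashlib
import tempfile
from pathlib import Path


class FakeCheckout:
    """Hypothetical stand-in for a write-enabled checkout."""

    def __init__(self) -> None:
        self.closed = False

    def commit(self, message: str) -> str:
        # A real backend would return the digest of the new commit; we fake one.
        return hashlib.sha1(message.encode()).hexdigest()

    def close(self) -> None:
        self.closed = True


def stock_commit(checkout, stock_file: Path, message: str, update_head: bool = True) -> str:
    """Commit through `checkout`, optionally record the digest, always close."""
    try:
        digest = checkout.commit(message)
        if update_head:  # assumption: this flag guards the stock file update
            stock_file.write_text(digest)
        return digest
    finally:
        checkout.close()  # the write checkout never outlives the commit


with tempfile.TemporaryDirectory() as tmp:
    stock_file = Path(tmp) / "head.stock"
    checkout = FakeCheckout()
    digest = stock_commit(checkout, stock_file, "first commit")
    recorded = stock_file.read_text()
```

Because the checkout is held only for the duration of the call, a second writer running at the same time would conflict with it, which is the constraint stated above.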
get_hangar_checkout(write: bool = False) → Any
Fetch the hangar checkout object that is used by stockroom internally. Don’t do this unless you know what you are doing. Directly interacting with hangar could tamper with the data stored by stockroom if you are not familiar with how hangar stores data and its APIs.
Parameters: write (bool) – Whether you need a write-enabled checkout or not
Returns: A hangar checkout object which can be used to interact with the repository data
Return type: Union[ReaderCheckout, WriterCheckout]
Warning
You won’t be able to fetch a write-enabled checkout if you are in the optimize context manager. Similarly, if you fetch a write-enabled checkout from here, you will neither be able to do any write operation through stockroom nor be able to open the optimize context manager.
optimize()
This context manager, on enter, asks the StockRepository object to open the global checkout. The global checkout is stored as a property of the repository singleton, so all downstream tasks get this opened checkout until it is closed. The global checkout is closed on exit from this context manager.
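The pattern described here, a shared checkout that lives for the duration of a `with` block, can be sketched as follows. `ToyRepository` and its attributes are hypothetical names used only for illustration; a plain dictionary stands in for a real checkout object.

```python
from contextlib import contextmanager


class ToyRepository:
    """Hypothetical stand-in for the repository singleton."""

    def __init__(self) -> None:
        self.global_checkout = None  # shared checkout, set only inside optimize()
        self.opened = 0              # counts how many checkouts were opened

    def _open(self) -> dict:
        self.opened += 1
        return {"id": self.opened}   # placeholder for a real checkout object

    @contextmanager
    def optimize(self):
        if self.global_checkout is not None:
            raise RuntimeError("optimize() is already active")
        self.global_checkout = self._open()
        try:
            yield
        finally:
            self.global_checkout = None  # "close" the shared checkout on exit

    def read(self, key: str):
        # Reuse the shared checkout when present, otherwise open one per call.
        checkout = self.global_checkout or self._open()
        return key, checkout["id"]


repo = ToyRepository()
repo.read("a")
repo.read("b")                 # two reads, two short-lived checkouts
outside = repo.opened

with repo.optimize():
    for key in "abc":
        repo.read(key)         # three reads share a single checkout
inside = repo.opened - outside
```

In this toy model, two reads outside the block open two checkouts, while three reads inside it open only one.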
Storages
Stockroom introduces three different storages for different storage needs, and all the APIs in stockroom exist to deal with these storages.
class Data
Data storage is essentially a wrapper over hangar’s column API which lets stockroom handle the checkout scope. Instance creation is not something a user would do directly. Instead, a created instance is available at stockroom.StockRoom.
Note
Each __getitem__ or __setitem__ call opens and closes a hangar checkout. Unlike for the other storages, this is crucial information for data storage, because reading and writing data happens quite frequently in a pipeline, unlike saving or retrieving a model, parameters, or metrics. To optimize this, you can do the data reads/writes inside the context manager stockroom.StockRoom.optimize().
Examples
>>> import numpy as np
>>> from stockroom import StockRoom
>>> stock = StockRoom()
>>> stock.data['column1']['sample1'] = np.arange(20).reshape(5, 4)
>>> sample = stock.data['column1']['sample5']
Inside context manager
>>> with stock.optimize():
...     sample = stock.data['column1']['sample1']
class Model
The Model class uses hangar columns to store the pieces of a model and hangar metadata to store the information required to collate them back into a model. Currently, it supports keras.Model and torch.nn.Module models. On stockroom.storages.Model.save_weights(), the ModelStore instance creates a few columns (one column for each data type) to store the weights and one column specifically to store the shape of each layer. This shape column is needed because the weights of each layer are flattened before saving. This is essential because handling variable shapes and variable ranks is more complex than flattening the weights and reshaping them back.
Examples
>>> import torch
>>> import tensorflow as tf
>>> torch_model = torch.nn.Sequential(...)
>>> stock.model['torch_model'] = torch_model.state_dict()
>>> tf_model = tf.keras.Sequential()
>>> tf_model.add(tf.keras.layers.Dense(64, activation='relu'))
>>> stock.model['tf_model'] = tf_model.get_weights()
But you can make it easier by calling special functions that know how to fetch the weights from the model and how to put the weights back into the model. Check out Model.save_weights() & Model.load_weights() for more details.
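The flatten-and-reshape idea behind the shape column can be illustrated without any framework. The sketch below is an assumption-laden toy: it uses nested Python lists instead of arrays, two dictionaries instead of hangar columns, and hypothetical helper names (`shape_of`, `flatten`, `reshape`). It only shows why storing the shape separately is enough to restore layers of different ranks.

```python
from math import prod


def shape_of(nested):
    """Shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(shape)


def flatten(nested):
    """Flatten a nested list of any rank into a flat list."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def reshape(flat, shape):
    """Rebuild the nested list described by `shape` from `flat`."""
    if not shape:
        return flat[0]              # rank 0: a single scalar
    if len(shape) == 1:
        return list(flat)
    step = prod(shape[1:])          # number of elements per outer slice
    return [reshape(flat[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0])]


# Two layers of different rank (hypothetical names and values).
layers = {
    "dense/kernel": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],  # rank 2
    "dense/bias": [0.0, 0.0, 0.0],                        # rank 1
}

# "Saving": every layer becomes a flat sequence plus a shape entry.
weights_column = {name: flatten(w) for name, w in layers.items()}
shapes_column = {name: shape_of(w) for name, w in layers.items()}

# "Loading": the shape entry is enough to restore the original structure.
restored = {name: reshape(weights_column[name], shapes_column[name])
            for name in layers}
```

Since every stored sample is one-dimensional, the weight storage never has to deal with mixed ranks; only the small shape entries differ per layer.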
load_weights(name, model)
Load the parameters from the hangar repo and put them back into the model object. It looks for all the columns that match the model name and reshapes them back to their actual shape (the actual shape is stored in another column). Different frameworks have different ways of loading the parameters into a model object. To identify the framework, Model.save_weights() also saves the framework name while saving the model.
Parameters:
- name (str) – Name of the key from which the model parameters are loaded
- model (Any) – Model object from any supported framework onto which the parameters are loaded. Loading the parameters is an in-place operation, and hence this function doesn’t return anything
Examples
>>> stock.model.load_weights('torch_model', torch_model)
save_weights(name, model)
A convenience function to call when you don’t want to deal with extracting the weights from the model, regardless of which framework you use to write the model, as long as that framework is supported by stockroom. This function expects a model object from one of the supported frameworks. It calls the corresponding function of that framework to fetch the weights and then calls Model.__setitem__() to save the weights.
Parameters:
- name (str) – Name of the key to which the model parameters are saved
- model (Any) – Object from any supported framework
Examples
>>> stock.model.save_weights('torch_model', torch_model)
class Tag
The Tag store, as the name suggests, stores tags related to an experiment. Ideally (and eventually), this store would keep information at the commit level and would not pass it down the commit history tree. But the internal implementation of hangar currently doesn’t allow that, and hence we store the information in the metadata store in hangar. It currently takes int, float & str data types and converts them to a string, which is the only data type supported by hangar metadata. But Tag stores the type of the data in another metadata “column”, which is used while pulling the data back from the Tag store.
Examples
>>> stock.tag['epochs'] = 1000
>>> stock.tag['lr'] = 0.0001
>>> stock.tag['optimizer'] = 'adam'
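The string conversion and the separate type "column" can be pictured with a small dictionary-backed sketch. `ToyTagStore` and `CASTERS` are hypothetical names, and two plain dictionaries stand in for hangar metadata. The sketch only illustrates the round trip described above and is not the library's implementation.

```python
# Constructors for the three value types the text says are accepted.
CASTERS = {"int": int, "float": float, "str": str}


class ToyTagStore:
    """Hypothetical stand-in: values are kept as strings, types kept separately."""

    def __init__(self) -> None:
        self._values = {}  # key -> value converted to str
        self._types = {}   # key -> name of the original type

    def __setitem__(self, key: str, value) -> None:
        type_name = type(value).__name__
        if type_name not in CASTERS:
            raise TypeError(f"unsupported tag type: {type_name}")
        self._values[key] = str(value)
        self._types[key] = type_name

    def __getitem__(self, key: str):
        # Use the recorded type name to cast the string back.
        return CASTERS[self._types[key]](self._values[key])


tags = ToyTagStore()
tags["epochs"] = 1000
tags["lr"] = 0.0001
tags["optimizer"] = "adam"

epochs = tags["epochs"]        # comes back as an int, not the string "1000"
lr = tags["lr"]                # comes back as a float
optimizer = tags["optimizer"]  # stays a str
```

Without the recorded type name, every value would come back as a string and the caller would have to guess how to convert it.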