amber.modeler
Rationale
This module, amber.modeler, provides classes and interfaces for converting an architecture (usually
a list of tokens/strings) to a model. For simple, sequential models, the out-of-the-box
tf.keras.Sequential is sufficient. However, more classes are needed for
advanced architectures, such as converting an ENAS super-net to sub-nets.
At a high level, we first need an analog of tf.keras.Sequential
that returns a model object
when called; in AMBER, this is amber.modeler.ModelBuilder
and its subclasses.
To wrap around different implementations of neural networks (e.g., a sequential Keras model
vs. an ENAS sub-net implemented in TensorFlow), ModelBuilder takes
amber.architect.ModelSpace
as the unifying reference for model architectures, so that
different implementation frameworks (TensorFlow vs. Keras vs. PyTorch) look the same to
the search algorithms in amber.architect,
easing their burden.
Moving one level further, we need an analog of tf.keras.Model
that provides training
and evaluation as class methods. This is implemented by amber.modeler.child.
Under the hood of the child models, the corresponding tensor operations and computation graphs are constructed
in the module amber.modeler.dag.
Currently AMBER builds the ENAS sub-graphs, as well as branching and multi-input/output Keras models;
next steps include constructing PyTorch computation graphs.
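As a minimal sketch of this flow (the Operation arguments and the builder-call convention below are illustrative assumptions, not the verbatim API), a model builder is constructed once and then called with an architecture to obtain a trainable model:

    from amber.architect import Operation
    from amber.modeler import KerasModelBuilder

    # Illustrative input/output operations; shapes and arguments are made up.
    inputs_op = Operation('input', shape=(1000, 4))
    output_op = Operation('dense', units=1, activation='sigmoid')

    builder = KerasModelBuilder(
        inputs_op=inputs_op,
        output_op=output_op,
        model_compile_dict={'loss': 'binary_crossentropy', 'optimizer': 'adam'},
        model_space=model_space,   # an amber.architect.ModelSpace, assumed defined
    )
    model = builder(arc_seq)       # architecture tokens -> compiled keras model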
Model Builders
enasModeler
- class DAGModelBuilder(inputs_op, output_op, model_space, model_compile_dict, num_layers=None, with_skip_connection=True, with_input_blocks=True, dag_func=None, *args, **kwargs)[source]
- class EnasAnnModelBuilder(session=None, controller=None, dag_func='EnasAnnDAG', l1_reg=0.0, l2_reg=0.0, with_output_blocks=False, use_node_dag=True, feature_model=None, dag_kwargs=None, *args, **kwargs)[source]
Bases:
amber.modeler.enasModeler.DAGModelBuilder
This class builds a feed-forward neural net (FFNN); see the usage sketch after the parameter list.
It uses the low-level TensorFlow API to define one big graph, in which each child network architecture is a sub-graph of this big DAG.
- Parameters
session (tf.Session) – tensorflow session for building enas DAG
controller (amber.architect.MultiIOController) – controller instance
dag_func (str) – string name for DAG to use
l1_reg (float) – regularizer strength for L1
l2_reg (float) – regularizer strength for L2
with_output_blocks (bool) – if True, add another architecture representation vector, to connect intermediate layers to output blocks.
use_node_dag (bool) – if True, use another amber.modeler.InputBlockDAG to represent the computation graph
feature_model (tf.keras.Model, or None) – If specified, use the provided upstream model for pre-transformations of inputs, instead of taking the raw input features.
dag_kwargs (dict, or None) – keyword arguments passed when initializing the DAG
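A hedged sketch of wiring this builder to a shared TensorFlow session and controller; the argument values are illustrative, and inputs_op/output_op/model_space follow the DAGModelBuilder signature above:

    import tensorflow as tf

    session = tf.Session()              # TF1-style session, as the ENAS graphs assume
    builder = EnasAnnModelBuilder(
        inputs_op=inputs_op,
        output_op=output_op,
        model_space=model_space,
        model_compile_dict=model_compile_dict,
        session=session,                # shared with the controller (see EnasAnnDAG)
        controller=controller,          # amber.architect.MultiIOController
        l1_reg=1e-8,
        l2_reg=1e-8,
    )
    child = builder(arc_seq)            # a sub-graph of the shared super-net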
kerasModeler
- class KerasBranchModelBuilder(inputs_op, output_op, model_compile_dict, model_space=None, with_bn=False, **kwargs)[source]
- class KerasModelBuilder(inputs_op, output_op, model_compile_dict, model_space=None, gpus=None, **kwargs)[source]
- class KerasMultiIOModelBuilder(inputs_op, output_op, model_compile_dict, model_space, with_input_blocks, with_output_blocks, dropout_rate=0.2, wsf=1, **kwargs)[source]
Bases:
amber.modeler.enasModeler.ModelBuilder
- Note:
Still not working if num_outputs=0
- class KerasResidualCnnBuilder(inputs_op, output_op, fc_units, flatten_mode, model_compile_dict, model_space, dropout_rate=0.2, wsf=1, add_conv1_under_pool=True, verbose=1, **kwargs)[source]
Bases:
amber.modeler.enasModeler.ModelBuilder
Function class for converting a sequence of architecture tokens to a Keras model; see the usage sketch after the parameter list.
- Parameters
inputs_op (amber.architect.modelSpace.Operation)
output_op (amber.architect.modelSpace.Operation)
fc_units (int) – number of units in the fully-connected layer
flatten_mode ({‘GAP’, ‘Flatten’}) – the flatten mode to convert conv layers to fully-connected layers.
model_compile_dict (dict)
model_space (amber.architect.modelSpace.ModelSpace)
dropout_rate (float) – dropout rate, must be 0<dropout_rate<1
wsf (int) – width scale factor
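For illustration, a hedged sketch of converting tokens to a residual CNN; the Operation arguments and token list are assumptions, not values from a real model space:

    builder = KerasResidualCnnBuilder(
        inputs_op=Operation('input', shape=(1000, 4)),
        output_op=Operation('dense', units=1, activation='sigmoid'),
        fc_units=100,
        flatten_mode='GAP',             # global average pooling before the FC layer
        model_compile_dict={'loss': 'binary_crossentropy', 'optimizer': 'adam'},
        model_space=model_space,
        dropout_rate=0.2,
        wsf=2,                          # double the width of every layer
    )
    model = builder(arc_seq)            # operation tokens + residual bits -> keras model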
- build_multi_gpu_sequential_model(model_states, input_state, output_state, model_compile_dict, gpus=4, **kwargs)[source]
- build_multi_gpu_sequential_model_from_string(model_states_str, input_state, output_state, state_space, model_compile_dict)[source]
Build a sequential model from a string of states.
- build_sequential_model(model_states, input_state, output_state, model_compile_dict, **kwargs)[source]
- Parameters
model_states (a list of operations sampled from the operator space)
input_state
output_state (specifies the output tensor, e.g. Dense(1, activation=’sigmoid’))
model_compile_dict (a dict of loss, optimizer and metrics)
- Return type
Keras.Model
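A small sketch of calling build_sequential_model; the Operation list stands in for states sampled from a real operator space:

    model = build_sequential_model(
        model_states=[Operation('dense', units=32, activation='relu'),
                      Operation('dense', units=16, activation='relu')],
        input_state=Operation('input', shape=(100,)),
        output_state=Operation('dense', units=1, activation='sigmoid'),
        model_compile_dict={'loss': 'mse', 'optimizer': 'adam'},
    )
    model.summary()                     # a compiled keras model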
Child Models: Training Interface
Child model classes that wrap the Keras Model API for more complex child-network manipulations
- class EnasAnnModel(inputs, outputs, arc_seq, dag, session, dropouts=None, name='EnasModel')[source]
Bases:
object
- fit(x, y, batch_size=None, nsteps=None, epochs=1, verbose=1, callbacks=None, validation_data=None)[source]
- fit_generator(generator, steps_per_epoch=None, epochs=1, verbose=1, callbacks=None, validation_data=None, max_queue_size=10, workers=1, use_multiprocessing=False, shuffle=True)[source]
- fit_ph(x, y, batch_size=None, nsteps=None, epochs=1, verbose=1, callbacks=None, validation_data=None)[source]
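Training a sampled child model looks like standard Keras; the arrays below are random placeholders for illustration:

    import numpy as np

    x = np.random.rand(128, 100).astype('float32')
    y = np.random.randint(0, 2, size=(128, 1))
    model.fit(
        x[32:], y[32:],
        batch_size=32,
        epochs=5,
        validation_data=(x[:32], y[:32]),   # illustrative held-out split
    )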
- class EnasCnnModel(inputs, outputs, labels, arc_seq, dag, session, dropouts=None, use_pipe=None, name='EnasModel', **kwargs)[source]
Bases:
object
Todo
rewrite weights save/load
use the input/output/label tensors provided by EnasConv1dDAG; this should unify the fit method between placeholder and Tensor pipelines, though two separate methods may still be needed
DAG: Computation Graph for Child Models
Represent a neural network computation graph as a directed acyclic graph (DAG) built from a list of architecture selections
- class ComputationNode(operation, node_name, merge_op=<class 'tensorflow.python.keras.layers.merge.Concatenate'>)[source]
Bases:
object
Computation Node is an analog to tf.keras.layers.Layer for making branching and multiple-input/output feed-forward neural network (FFNN) models, which AMBER represents as a directed acyclic graph (DAG).
The reason we need ComputationNode is that an amber.architect.Operation focuses on token-level computations but does not represent connectivity patterns well enough. When building DAG-represented FFNNs, we need finer-grained control over graph connectivity and validity.
This is a helper that provides building blocks for amber.modeler.DAG to use, and is not intended to be used by itself (see the construction sketch after the parameter list).
See also
amber.modeler.dag.DAG
- Parameters
operation (amber.architect.Operation) – defines the operation in current layer
node_name (str) – name of the node
merge_op (tf.keras.layers.merge, optional) – operation for merging multiple inputs
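A construction sketch (in practice, nodes are created and wired by the DAG classes rather than by hand); the Operation arguments are assumptions:

    from tensorflow.keras.layers import Concatenate

    node = ComputationNode(
        operation=Operation('dense', units=32, activation='relu'),
        node_name='hidden_AB',
        merge_op=Concatenate,           # how multiple parent tensors are merged
    )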
- class DAG(arc_seq, model_space, input_node, output_node, with_skip_connection=True, with_input_blocks=True, *args, **kwargs)[source]
Bases:
object
Construct a feed-forward neural network (FFNN) represented by a directed acyclic graph (DAG).
While a simple, linear, sequential neural network model is also a DAG, here we are trying to build more flexible, generalizable branching models. In other words, the primary use is to construct a block-sparse FFNN that creates a specific inductive bias for a specific question, although one may also use it for conv nets or other architectures with stronger inductive biases.
Note that we re-use the skip-connection search algorithms designed for building residual connections, but use them instead to build inter-layer connections without the "stem" connections of a ResNet. That is, the residual connection that would be summed into the output of the current layer is instead concatenated as input to the layer. By construction, it is possible for a node to have no input; such nodes are removed in _remove_disconnected_nodes(). A usage sketch follows the Return type entry below.
- Parameters
arc_seq (list, or numpy.array) – a list of integers, each is a token for neural network architecture specific to a model space
model_space (amber.architect.ModelSpace) – model space to sample model architectures from. Necessary for mapping token integers to operations.
input_node (amber.modeler.ComputationNode, or list) – a list of input layers/nodes; in case of a single input node, use a single element list
output_node (amber.modeler.ComputationNode, or list) – output node configuration
with_skip_connection (bool) – if False, disable inter-layers connections (i.e. skip-layer connections). Default is True.
with_input_blocks (bool) – if False, disable connecting partial inputs to intermediate layers. Default is True.
- Returns
model – a constructed model using keras Model API
- Return type
tf.keras.models.Model
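A hedged sketch of assembling a block-sparse FFNN; arc_seq and model_space are assumed to come from amber.architect, and how the finished keras model is retrieved from the instance is omitted here:

    # Four input blocks and one output node, following the ComputationNode pattern above.
    inputs = [ComputationNode(Operation('input', shape=(10,)), node_name='input_%i' % i)
              for i in range(4)]
    output = ComputationNode(Operation('dense', units=1, activation='sigmoid'),
                             node_name='output')
    dag = DAG(
        arc_seq=arc_seq,                # integer tokens for this model space
        model_space=model_space,
        input_node=inputs,
        output_node=output,
        with_skip_connection=True,      # inter-layer (skip) connections
        with_input_blocks=True,         # partial inputs to intermediate layers
    )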
- class EnasAnnDAG(model_space, input_node, output_node, model_compile_dict, session, l1_reg=0.0, l2_reg=0.0, with_skip_connection=True, with_input_blocks=True, with_output_blocks=False, controller=None, feature_model=None, feature_model_trainable=None, child_train_op_kwargs=None, name='EnasDAG')[source]
Bases:
object
EnasAnnDAG is a DAG model builder that uses the weight-sharing method for child models.
This class deals with feed-forward neural networks (FFNNs). The weights are shared among the W matrices of different hidden-unit sizes; that is, the weight matrix for a larger hidden size always contains those of the smaller ones (see the sketch after this parameter list).
- Parameters
model_space (amber.architect.ModelSpace) – model space to search architectures from
input_node (amber.architect.Operation, or list) – one or more input layers, each is a block of input features
output_node (amber.architect.Operation, or list) – one or more output layers, each is a block of output labels
model_compile_dict (dict) – compile dict for child models
session (tf.Session) – tensorflow session that hosts the computation graph; should use the same session as controller for sampling architectures
with_skip_connection (bool) – if False, disable inter-layer connections. Default is True.
with_input_blocks (bool) – if False, disable connecting input layers to hidden layers. Default is True.
with_output_blocks (bool) – if True, add another architecture representation vector, to connect intermediate layers to output blocks.
controller (amber.architect.MultiIOController, or None) – connect a controller to enable architecture sampling; if None, can only train fixed architecture manually provided
feature_model (tf.keras.Model, or None) – If specified, use the provided upstream model for pre-transformations of inputs, instead of taking the raw input features.
feature_model_trainable (bool, or None) – Boolean of whether pass gradients to the feature model.
child_train_op_kwargs (dict, or None) – Keyword arguments passed to model.fit()
name (str) – a string name for this instance
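The hidden-size weight sharing described above, in miniature: the weight matrix for a smaller hidden size is a leading slice of the largest one. This standalone numpy sketch only illustrates the idea, not AMBER's internals:

    import numpy as np

    max_units = 64
    W_shared = np.random.randn(100, max_units)   # one shared parameter block
    for units in (16, 32, 64):                   # candidate hidden sizes
        W_child = W_shared[:, :units]            # each child uses the leading columns
        print(units, W_child.shape)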
- class EnasConv1DwDataDescrption(data_description, *args, **kwargs)[source]
Bases:
amber.modeler.dag.EnasConv1dDAG
This is a modeler specified for convolutional networks with data-description features.
- class EnasConv1dDAG(model_space, input_node, output_node, model_compile_dict, session, with_skip_connection=True, batch_size=128, keep_prob=0.9, l1_reg=0.0, l2_reg=0.0, reduction_factor=4, controller=None, child_train_op_kwargs=None, stem_config=None, data_format='NWC', train_fixed_arc=False, fixed_arc=None, name='EnasDAG', **kwargs)[source]
Bases:
object
- class InputBlockAuxLossDAG(*args, **kwargs)[source]
Bases:
amber.modeler.dag.InputBlockDAG
Add intermediate outputs whenever two input blocks first meet and merge.
Compared to InputBlockDAG, the difference is best illustrated by an example:
    Input_A   Input_B   Input_C   Input_D
    -------   -------   -------   -------
       |         |         |         |
         Hidden_AB           Hidden_CD
          /      |           |      \
         /       Hidden_ABCD        \
    add_out1        |      \       add_out2
                Hidden_2   add_out3
                    |
                 Output
In amber.modeler.dag.InputBlockDAG, add_out3 will NOT be added, since only layers immediately connected to input blocks (i.e., Hidden_AB and Hidden_CD) have outputs added.
- Returns
model – a subclass of keras Model API with multiple intermediate outputs predicting the same label
- Return type
tf.keras.models.Model
- class InputBlockDAG(add_output=True, *args, **kwargs)[source]
Bases:
amber.modeler.dag.DAG
Add intermediate outputs to each level of network hidden layers. Based on DAG
Compared to DAG, the difference is best illustrated by an example:
    Input_A   Input_B   Input_C   Input_D
    -------   -------   -------   -------
       |         |         |         |
         Hidden_AB           Hidden_CD
          /      |           |      \
         /       Hidden_ABCD        \
    add_out1        |              add_out2
                Hidden_2
                    |
                 Output
In amber.modeler.dag.DAG, add_out1 and add_out2 will NOT be added. The loss for out1 and out2 will be the same as for output, but with a lower weight of 0.1.
See also
amber.modeler.dag.DAG
the base class.
amber.modeler.dag.InputBlockAuxLossDAG
add more auxiliary outputs whenever two inputs meet.
- Returns
model – a subclass of keras Model API with multiple intermediate outputs predicting the same label
- Return type
tf.keras.models.Model
- get_dag(arg)[source]
Getter method for obtaining a DAG class from a string identifier.
DAG refers to the underlying tensor computation graphs for child models. Whenever possible, we prefer to use the Keras Model API to get the job done. For ENAS, the parameter-sharing scheme is implemented in TensorFlow.
- Parameters
arg (str or callable) – if a string, return the DAG constructor corresponding to that identifier; if callable, assume it is already a DAG constructor and return it unchanged
- Returns
A DAG constructor
- Return type
callable
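Per the parameter description, a string is resolved to a DAG class while a callable passes through unchanged:

    dag_cls = get_dag('EnasAnnDAG')     # string identifier -> DAG constructor
    same_cls = get_dag(dag_cls)         # callables are returned as-is
    assert dag_cls is same_cls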
- get_layer(x, state, with_bn=False)[source]
Getter method for a Keras layer, including native Keras implementation and custom layers that are not included in Keras.
- Parameters
x (tf.keras.layers or None) – The input Keras layer
state (amber.architect.Operation, or callable) – The target layer to be built
with_bn (bool, optional) – If True, add batch normalization layers before activation
- Returns
x – The built target layer connected to input x
- Return type
tf.keras.layers
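A hedged sketch of stacking layers with get_layer; the Operation arguments are illustrative:

    from tensorflow.keras.layers import Input

    x = Input(shape=(1000, 4))
    x = get_layer(x, Operation('conv1d', filters=32, kernel_size=8), with_bn=True)
    x = get_layer(x, Operation('flatten'))
    x = get_layer(x, Operation('dense', units=1, activation='sigmoid'))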
Architecture Decoder
Classes for breaking down an architecture sequence into a more structured format for later use
- class ResConvNetArchitecture(model_space)[source]
Bases:
object
- decode(arc_seq)[source]
Decode a sequence of architecture tokens into operations and residual connections
- encode(operations, res_con)[source]
Encode operations and residual connections to a sequence of architecture tokens
This is the inverse of decode; see the round-trip sketch after the parameter list.
- Parameters
operations (list) – A list of integers for categorically-encoded operations
res_con (list) – A list of lists, where each entry is a binary-encoded residual connection
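A round-trip sketch: decode tokens into operations and residual connections, then encode them back; arc_seq is an assumed token list valid for model_space:

    arch = ResConvNetArchitecture(model_space=model_space)
    operations, res_con = arch.decode(arc_seq)
    arc_seq_2 = arch.encode(operations, res_con)
    assert list(arc_seq) == list(arc_seq_2)      # encode inverts decode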