
Class NIPALSNode



Perform Principal Component Analysis using the NIPALS algorithm.
This algorithm is particularly useful if you have more variables than
observations, or in general when the number of variables is huge and
calculating a full covariance matrix may be infeasible. It is also more
efficient than the standard PCANode if you expect the number of significant
principal components to be small. In this case, setting output_dim to
a certain fraction of the total variance, say 90%, may help.

Internal variables of interest:
self.avg -- Mean of the input data (available after training)
self.d -- Variance corresponding to the PCA components
self.v -- Transpose of the projection matrix (available after training)
self.explained_variance -- When output_dim has been specified as a fraction
                           of the total variance, this is the fraction
                           of the total variance that is actually explained
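
The iteration behind the description above can be sketched in a few lines.
This is a minimal illustration of the general NIPALS scheme, not the node's
actual code: the function name nipals_pca, the use of plain Python lists
instead of arrays, the choice of the first column as the starting score
vector, and the exact convergence test are all assumptions of the sketch.
The returned avg, d and v only mirror the meaning of the attributes listed
above.

```python
import math

def nipals_pca(rows, n_components, conv=1e-8, max_it=100000):
    """Hypothetical helper: return (avg, d, v) for the leading components."""
    n_obs, n_var = len(rows), len(rows[0])
    # Center the data by subtracting the mean of every variable.
    avg = [sum(r[j] for r in rows) / n_obs for j in range(n_var)]
    X = [[r[j] - avg[j] for j in range(n_var)] for r in rows]
    d, v = [], []
    for _ in range(n_components):
        # Start the score vector t from the first column (arbitrary choice).
        t = [x[0] for x in X]
        for _ in range(max_it):
            # Loading vector: p = X^T t, normalised to unit length.
            p = [sum(X[i][j] * t[i] for i in range(n_obs))
                 for j in range(n_var)]
            norm = math.sqrt(sum(pj * pj for pj in p))
            if norm == 0.0:
                raise ValueError("no variance left to extract")
            p = [pj / norm for pj in p]
            # New score vector: t_new = X p.
            t_new = [sum(X[i][j] * p[j] for j in range(n_var))
                     for i in range(n_obs)]
            diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(t_new, t)))
            t = t_new
            if diff < conv:
                break
        else:
            raise RuntimeError("no convergence after %d iterations" % max_it)
        # Deflate: remove the component just found from the data.
        X = [[X[i][j] - t[i] * p[j] for j in range(n_var)]
             for i in range(n_obs)]
        d.append(sum(ti * ti for ti in t) / (n_obs - 1))  # component variance
        v.append(p)                                       # unit-length loading
    return avg, d, v

# Five observations of two variables, spread mostly along the (1, 1) direction.
data = [[-1.9, -2.1], [-1.2, -0.8], [0.2, -0.2], [0.8, 1.2], [2.1, 1.9]]
avg, d, v = nipals_pca(data, n_components=2)
explained = d[0] / sum(d)  # fraction of variance held by the first component
```

No covariance matrix is ever formed: each pass only needs products of the
data with a single vector, which is why the approach suits data with many
variables and few significant components.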

Reference for NIPALS (Nonlinear Iterative Partial Least Squares):
Wold, H.
Nonlinear estimation by iterative least squares procedures
in David, F. (Editor), Research Papers in Statistics, Wiley,
New York, pp 411-444 (1966).

More information about Principal Component Analysis, also known as the
discrete Karhunen-Loeve transform, can be found in, among others,
I.T. Jolliffe, Principal Component Analysis, Springer-Verlag (1986).

Original code contributed by:
Michael Schmuker, Susanne Lezius, and Farzad Farkhooi (2008).

Nested Classes
    Inherited from Node
  __metaclass__
This metaclass overwrites the docstrings of methods such as execute, stop_training, and inverse with the ones defined in the corresponding private methods _execute, _stop_training, _inverse, etc.
Instance Methods
 
__init__(self, input_dim=None, output_dim=None, dtype=None, conv=1e-08, max_it=100000)
Specify the number of principal components to keep, either directly or as a fraction of the variance to be explained.
 
_stop_training(self, debug=False)
Transform the data list to an array object and reshape it.
 
_train(self, x)
Cumulate all input data in a one-dimensional list.

Inherited from object: __delattr__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__

    Inherited from Cumulator
 
stop_training(self, *args, **kwargs)
Transform the data list to an array object and reshape it.
 
train(self, x, *args, **kwargs)
Cumulate all input data in a one-dimensional list.
    Inherited from nodes.PCANode
 
_adjust_output_dim(self)
Return the eigenvector range and set the output dim if required.
 
_check_output(self, y)
 
_execute(self, x, n=None)
Project the input on the first 'n' principal components.
 
_get_supported_dtypes(self)
Return the list of dtypes supported by this node.
 
_inverse(self, y, n=None)
Project 'y' to the input space using the first 'n' components.
 
_set_output_dim(self, n)
 
execute(self, x, *args, **kargs)
Project the input on the first 'n' principal components.
 
get_explained_variance(self)
Return the fraction of the original variance that can be explained by self._output_dim PCA components.
 
get_projmatrix(self, transposed=1)
Return the projection matrix.
 
get_recmatrix(self, transposed=1)
Return the back-projection matrix.
 
inverse(self, y, *args, **kargs)
Project 'y' to the input space using the first 'n' components.
    Inherited from Node
 
__add__(self, other)
 
__call__(self, x, *args, **kargs)
Calling an instance of Node is equivalent to calling its 'execute' method.
 
__repr__(self)
repr(x)
 
__str__(self)
str(x)
 
_check_input(self, x)
 
_check_train_args(self, x, *args, **kwargs)
 
_get_train_seq(self)
 
_if_training_stop_training(self)
 
_pre_execution_checks(self, x)
This method contains all pre-execution checks.
 
_pre_inversion_checks(self, y)
This method contains all pre-inversion checks.
 
_refcast(self, x)
Helper function to cast arrays to the internal dtype.
 
_set_dtype(self, t)
 
_set_input_dim(self, n)
 
copy(self, protocol=-1)
Return a deep copy of the node.
 
get_current_train_phase(self)
Return the index of the current training phase.
 
get_dtype(self)
Return dtype.
 
get_input_dim(self)
Return input dimensions.
 
get_output_dim(self)
Return output dimensions.
 
get_remaining_train_phase(self)
Return the number of training phases still to accomplish.
 
get_supported_dtypes(self)
Return dtypes supported by the node as a list of numpy.dtype objects.
 
is_invertible(self)
Return True if the node can be inverted, False otherwise.
 
is_trainable(self)
Return True if the node can be trained, False otherwise.
 
is_training(self)
Return True if the node is in the training phase, False otherwise.
 
save(self, filename, protocol=-1)
Save a pickled serialization of the node to 'filename'.
 
set_dtype(self, t)
Set internal structures' dtype.
 
set_input_dim(self, n)
Set input dimensions.
 
set_output_dim(self, n)
Set output dimensions.
Properties

Inherited from object: __class__

    Inherited from Node
  _train_seq
List of tuples: [(training-phase1, stop-training-phase1), (training-phase2, stop-training-phase2), ...]
  dtype
dtype
  input_dim
Input dimensions
  output_dim
Output dimensions
  supported_dtypes
Supported dtypes
Method Details

__init__(self, input_dim=None, output_dim=None, dtype=None, conv=1e-08, max_it=100000)
(Constructor)

 

The number of principal components to be kept can be specified as
'output_dim' directly (e.g. 'output_dim=10' means 10 components
are kept) or by the fraction of variance to be explained
(e.g. 'output_dim=0.95' means that as many components as necessary
will be kept in order to explain 95% of the input variance).

Other Arguments:
   conv   - convergence threshold for the residual error.
   max_it - maximum number of iterations
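
The two ways of reading 'output_dim' described above can be illustrated
with a small sketch. The helper name components_to_keep and its exact rule
(an integer is a component count, a float is a variance fraction, and
components are added until the cumulative fraction reaches the target) are
assumptions made for illustration, not the node's own code.

```python
def components_to_keep(variances, output_dim):
    """Hypothetical helper: how many components an output_dim value selects.

    variances  -- per-component variances, sorted in decreasing order
    output_dim -- an int (number of components) or a float fraction
                  of the total variance to be explained
    """
    if isinstance(output_dim, int):
        # An integer is taken as the number of components directly.
        return output_dim
    total = sum(variances)
    cumulative = 0.0
    for k, var in enumerate(variances, start=1):
        cumulative += var
        # Stop as soon as the requested fraction of variance is explained.
        if cumulative / total >= output_dim:
            return k
    return len(variances)

variances = [6.0, 3.0, 0.8, 0.2]                 # made-up example values
n_fixed = components_to_keep(variances, 2)       # keep exactly two components
n_frac = components_to_keep(variances, 0.95)     # keep enough for 95% variance
```

With these made-up variances the first two components explain 90% of the
total, so a 95% target needs a third component, and the fraction actually
explained (98%) ends up above the requested one.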
   

Overrides: object.__init__

_stop_training(self, debug=False)

 
Transform the data list to an array object and reshape it.

Overrides: Node._stop_training

_train(self, x)

 
Cumulate all input data in a one-dimensional list.

Overrides: Node._train