# -*- mode: org -*-
#+STARTUP: content
#+SEQ_TODO: MAYBE(m) TODO(t) NEXT(n) STARTED(s) WAITING(w) | DONE(d) CANCELLED(c)
#+STARTUP: hidestars
* gradient descent
** DONE rename segmented-trainer to segmented-gd-trainer
* conjugate gradient
** WAITING clarify conjugate gradient license
   - State "WAITING"    [2008-07-21 Mon 21:53] \\
     Contacting Carl Rasmussen by email
** DONE portably implement float nan, inf handling
   There is no way to do traps in allegro so let's just catch
   ARITHMETIC-ERROR at some strategically chosen places.
** DONE failed line searches can still change the weights
   documented
* boltzmann machines
** MAYBE conditioning chunks are not really visible nor hidden
   Does it make sense to store them separately?
** DONE factored RBM: weight matrix is A*B
** DONE semi-restricted (connections between visibles)
** MAYBE higher order rbm
** MAYBE training with conjugate gradient
   Is it possible to calculate the cost function? Not quite, but there
   are two ways out: 1) rbm importance sampling paper to estimate the
   partition function 2) constrain features to be sparse and don't use
   a partition function at all.
** MAYBE training with SMD
** DONE sparsity constraint
   possible with WEIGHT-PENALTY
** MAYBE generic support for exponential family distributions
   http://www.ics.uci.edu/~michal/GenHarm3.pdf
** TODO cache inputs optionally
   This should make higher level RBMs a lot faster and can be
   implemented trivially with a samples -> node arrays hash table.
** TODO fix gradient accumulation
   When the accumulator's start is not zero, there are two ways to
   fail: a) go to the matlisp branch and put it into the wrong place
   as if start were 0, b) use the lisp branch and fail if there are
   multiple stripes. Currently, non-zero start with several stripes
   runs into an assert.
** DONE normalized-chunk: scale should be per stripe
** DONE positive phase: if sample-hidden-p, then is it (* visible hidden-mean) or (* visible hidden-sample)?
   HIDDEN-SAMPLING parameter in RBM-TRAINER
** DONE temporal rbm
** DONE calculate bias+conditioning activation only once per training example
   - State "DONE"       [2008-09-10 Wed 15:00] \\
     :CACHE-STATIC-ACTIVATIONS-P initarg to CHUNK
** DONE persistent contrastive divergence
** DONE general boltzmann machines
** DONE deep boltzmann machines
** DONE initialize DBM from DBN
** TODO implmement annealed importance sampling
** TODO unbreak temporal chunks
* backprop
** TODO VALIDATE-LUMP: set and check size if possible, check inputs
** DONE ADD-LUMP: set MAX-N-STRIPES to something saner
** TODO redo ->CROSS-ENTROPY
** TODO what to do with INDICES-TO-CALCULATE?
** DONE normalized-lump: scale should be per stripe
* unroll
** MAYBE unroll only part of the network
** MAYBE example for unrolling with missing values
** DONE fix or remove missing value support
   removed
** DONE unroll factored clouds
** TODO flip chunk
   When an input chunk is to be reconstructed it should go above the
   layer in the bpn instead of below it where it is in the BM.
* misc
** DONE faster exp? http://citeseer.ist.psu.edu/cache/papers/cs/43/ftp:zSzzSzftp.idsia.chzSzpubzSzniczSzexp.pdf/schraudolph98fast.pdf
Doesn't seem to speed up things.
** DONE remove dependency on BLAS: implement some of matlisp in Lisp
   Ripped matrix.lisp and various bits from Matlisp.
** TODO what to do with *USE-BLAS*?
   The wrappers should calculate cost and call matlisp if it is
   avaible.
** TODO float vector I/O for cmucl
** MAYBE support more lisps
   Matlisp only supports cmucl, sbcl and allegro.
** TODO optimize the missing value case (SSE?)
** TODO lookup table based exp/sigmoid
** TODO investigate SPARTNS
** DONE factor out most frequent and log-likelihood-ratio based feature selection
* examples
** DONE movie review example
** TODO netflix example
