Justin Le https://blog.jle.im (justin@jle.im)

Kiev Functional Programming, Aug 16, 2017

Slides available at https://talks.jle.im/kievfprog/dependent-types.html.

All code available at https://github.com/mstksg/talks/tree/master/kievfprog.

Libraries required (all available on Hackage): *hmatrix*, *singletons*, *MonadRandom*. GHC 8.x assumed.

The big question of Haskell: *What can types do for us?*

Dependent types are simply the extension of this question, pushing the power of types further.

Each layer receives an input vector, **x** : ℝ^{n}, and produces an output **y** : ℝ^{m}.

They are parameterized by a weight matrix *W* : ℝ^{m × n} (an *m* × *n* matrix) and a bias vector **b** : ℝ^{m}, and the result is, for some activation function `f`:

**y** = *f*(*W***x** + **b**)

A neural network would take a vector through many layers.
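The snippets below use `Weights` and `Network` types that aren't shown in this extract. A minimal sketch of what they might look like (constructor names `W`, `O`, and `:~` taken from the code below; field names are assumed):

```haskell
import Numeric.LinearAlgebra  -- from hmatrix

-- A layer: a bias vector of length o and an o x i weight matrix
-- (sketch; field names are assumed)
data Weights = W { wBiases :: !(Vector Double)
                 , wNodes  :: !(Matrix Double)
                 }

-- A network: either a single output layer,
-- or a layer feeding into the rest of the network
data Network = O !Weights
             | Weights :~ Network
infixr 5 :~
```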

```
randomWeights :: MonadRandom m => Int -> Int -> m Weights
randomWeights i o = do
    seed1 :: Int <- getRandom
    seed2 :: Int <- getRandom
    let wB = randomVector seed1 Uniform o * 2 - 1
        wN = uniformSample seed2 o (replicate i (-1, 1))
    return $ W wB wN

randomNet :: MonadRandom m => Int -> [Int] -> Int -> m Network
randomNet i []     o =     O <$> randomWeights i o
randomNet i (h:hs) o = (:~) <$> randomWeights i h <*> randomNet h hs o
```
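For instance, nothing stops argument mix-ups; a quick sketch (the sizes are arbitrary):

```haskell
-- Intended: 10 inputs, hidden layers of 7 and 5, 3 outputs.
net1 :: MonadRandom m => m Network
net1 = randomNet 10 [7, 5] 3

-- Oops: inputs and outputs swapped.  This typechecks just as happily.
net2 :: MonadRandom m => m Network
net2 = randomNet 3 [7, 5] 10
```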

- What if we mixed up the dimensions for `randomWeights`?
- What if the *user* mixed up the dimensions for `randomWeights`?
- What if layers in the network are incompatible?
- How does the user know what size vector a network expects?
- Are our `runLayer` and `runNet` implementations correct? (They are sketched just below.)
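For reference, roughly what the untyped `runLayer` and `runNet` look like (a sketch; `logistic` is the sigmoid activation and `logistic'` its derivative):

```haskell
{-# LANGUAGE BangPatterns #-}

logistic :: Floating a => a -> a
logistic x = 1 / (1 + exp (-x))

logistic' :: Floating a => a -> a
logistic' x = logistic x * (1 - logistic x)

-- run a single layer: y = Wx + b (activation applied by the caller)
runLayer :: Weights -> Vector Double -> Vector Double
runLayer (W wB wN) v = wB + wN #> v

-- feed a vector through every layer of the network
runNet :: Network -> Vector Double -> Vector Double
runNet (O w)     !v = logistic (runLayer w v)
runNet (w :~ n') !v = runNet n' (logistic (runLayer w v))
```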

```
go  :: Vector Double             -- ^ input vector
    -> Network                   -- ^ network to train
    -> (Network, Vector Double)
-- handle the output layer
go !x (O w@(W wB wN))
    = let y    = runLayer w x
          o    = logistic y
          -- the gradient (how much y affects the error)
          -- (logistic' is the derivative of logistic)
          dEdy = logistic' y * (o - target)
          -- new bias weights and node weights
          wB'  = wB - scale rate dEdy
          wN'  = wN - scale rate (dEdy `outer` x)
          w'   = W wB' wN'
          -- bundle of derivatives for next step
          dWs  = tr wN #> dEdy
      in  (O w', dWs)
```

```
-- handle the inner layers
go !x (w@(W wB wN) :~ n)
    = let y    = runLayer w x
          o    = logistic y
          -- get dWs', bundle of derivatives from rest of the net
          (n', dWs') = go o n
          -- the gradient (how much y affects the error)
          dEdy = logistic' y * dWs'
          -- new bias weights and node weights
          wB'  = wB - scale rate dEdy
          wN'  = wN - scale rate (dEdy `outer` x)
          w'   = W wB' wN'
          -- bundle of derivatives for next step
          dWs  = tr wN #> dEdy
      in  (w' :~ n', dWs)
```

- Haskell is all about the compiler guiding you as you write your code. But how much did the compiler help there?
- How can the “shape” of the matrices guide our programming?
- We basically rely on naming conventions to make sure we write our code correctly.

- How many ways can we write the function and have it still typecheck?
- How many of our functions are partial?

A layer with `i` inputs and `o` outputs is an `o x i` layer.

From *hmatrix* (the `Numeric.LinearAlgebra.Static` module): an `R 3` is a 3-vector, and an `L 4 3` is a 4 x 3 matrix.

With `-XDataKinds`, all values and types are lifted to types and kinds.

In addition to the values `True` and `False` and the type `Bool`, we also have the **types** `'True` and `'False` and the **kind** `Bool`.

In addition to `:` and `[]` and the list type, we have `':` and `'[]` and the list kind.
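The typed code below uses a `Weights i o` and a `Network i hs o` indexed by these promoted `Nat`s. One way they might be defined (a sketch, using hmatrix's `Numeric.LinearAlgebra.Static` types and a GADT for the network; field names are assumed):

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

import GHC.TypeLits                 (Nat, KnownNat)
import Numeric.LinearAlgebra.Static (R, L)

-- a layer with i inputs and o outputs: an o-vector of biases
-- and an o x i weight matrix
data Weights (i :: Nat) (o :: Nat) = W { wBiases :: !(R o)
                                       , wNodes  :: !(L o i)
                                       }

-- a network indexed by input size, hidden layer sizes, and output size
data Network :: Nat -> [Nat] -> Nat -> * where
    O    :: !(Weights i o) -> Network i '[] o
    (:~) :: KnownNat h
         => !(Weights i h) -> !(Network h hs o) -> Network i (h ': hs) o
infixr 5 :~
```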

```
runLayer :: (KnownNat i, KnownNat o)
         => Weights i o
         -> R i
         -> R o
runLayer (W wB wN) v = wB + wN #> v

runNet :: (KnownNat i, KnownNat o)
       => Network i hs o
       -> R i
       -> R o
runNet (O w)      !v = logistic (runLayer w v)
runNet (w :~ n')  !v = let v' = logistic (runLayer w v)
                       in  runNet n' v'
```

Exactly the same! No loss in expressivity!

Much better! Matrices and vector lengths are guaranteed to line up!

Also, note that the interface for `runNet` is stated directly in its type. No need to rely on documentation.

The user knows that they have to pass in an `R i`, and knows to expect an `R o`.
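A quick sketch of what that buys (the sizes here are arbitrary):

```haskell
-- The contract is in the type: this network wants an R 4 and yields an R 2.
feedForward :: Network 4 '[3] 2 -> R 4 -> R 2
feedForward = runNet

-- Feeding it the wrong size is a compile-time error, not a runtime crash:
-- badForward :: Network 4 '[3] 2 -> R 3 -> R 2
-- badForward = runNet   -- rejected: the input dimensions don't match
```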

```
randomWeights :: (MonadRandom m, KnownNat i, KnownNat o)
              => m (Weights i o)
randomWeights = do
    s1 :: Int <- getRandom
    s2 :: Int <- getRandom
    let wB = randomVector s1 Uniform * 2 - 1
        wN = uniformSample s2 (-1) 1
    return $ W wB wN
```

No need for explicit size arguments! The user can demand any `i` and `o`. No reliance on documentation or parameter order.
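For example, the caller can fix the sizes with nothing more than a type annotation (a sketch; the numbers are arbitrary, `-XDataKinds` assumed):

```haskell
-- i ~ 4 and o ~ 3 are picked up from the signature; no size arguments needed.
someLayer :: MonadRandom m => m (Weights 4 3)
someLayer = randomWeights
```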

But, for generating nets, we have a problem:

```
randomNet :: forall m i hs o. (MonadRandom m, KnownNat i, KnownNat o)
          => m (Network i hs o)
randomNet = case hs of [] -> ??   -- hs is a type, not a value; we can't case on it!
```

The solution for pattern matching on types: singletons.
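A sketch of how that can look: take a singleton for the `hs` list as an explicit argument and pattern match on it (constructor names `SNil`, `SCons`, and `SNat` as in the *singletons* library; `-XLambdaCase` assumed):

```haskell
randomNet' :: forall m i hs o. (MonadRandom m, KnownNat i, KnownNat o)
           => Sing hs -> m (Network i hs o)
randomNet' = \case
    -- no hidden layers left: generate the output layer
    SNil            ->     O <$> randomWeights
    -- matching SNat brings KnownNat h into scope for the head of the list
    SNat `SCons` ss -> (:~) <$> randomWeights <*> randomNet' ss
```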

Explicitly passing singletons can be ugly.

```
randomNet :: forall i hs o m. (MonadRandom m, KnownNat i, SingI hs, KnownNat o)
=> m (Network i hs o)
randomNet = randomNet' sing
```

Now the shape can be inferred from the functions that use the `Network`.
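For example (a sketch; concrete sizes chosen arbitrarily, using `evalRandIO` from *MonadRandom*):

```haskell
import Control.Monad.Random         (evalRandIO)
import Numeric.LinearAlgebra.Static (konst)

main :: IO ()
main = do
    -- the annotation alone fixes i, hs, and o for randomNet
    net <- evalRandIO randomNet :: IO (Network 10 '[7, 5] 3)
    print (runNet net (konst 0.5))
```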

```
train :: forall i hs o. (KnownNat i, KnownNat o)
      => Double            -- ^ learning rate
      -> R i               -- ^ input vector
      -> R o               -- ^ target vector
      -> Network i hs o    -- ^ network to train
      -> Network i hs o
train rate x0 target = fst . go x0
```

Ready for this?

```
go  :: forall j js. KnownNat j
    => R j                    -- ^ input vector
    -> Network j js o         -- ^ network to train
    -> (Network j js o, R j)
-- handle the output layer
go !x (O w@(W wB wN))
    = let y    = runLayer w x
          o    = logistic y
          -- the gradient (how much y affects the error)
          -- (logistic' is the derivative of logistic)
          dEdy = logistic' y * (o - target)
          -- new bias weights and node weights
          wB'  = wB - konst rate * dEdy
          wN'  = wN - konst rate * (dEdy `outer` x)
          w'   = W wB' wN'
          -- bundle of derivatives for next step
          dWs  = tr wN #> dEdy
      in  (O w', dWs)
-- handle the inner layers
go !x (w@(W wB wN) :~ n)
    = let y    = runLayer w x
          o    = logistic y
          -- get dWs', bundle of derivatives from rest of the net
          (n', dWs') = go o n
          -- the gradient (how much y affects the error)
          dEdy = logistic' y * dWs'
          -- new bias weights and node weights
          wB'  = wB - konst rate * dEdy
          wN'  = wN - konst rate * (dEdy `outer` x)
          w'   = W wB' wN'
          -- bundle of derivatives for next step
          dWs  = tr wN #> dEdy
      in  (w' :~ n', dWs)
```


Surprise! It’s actually *identical*! No loss in expressivity.

Also, typed holes can guide you while writing the code in many places, and all of the shapes are verified at compile time.

By the way, still waiting for linear types in GHC :)

The overall guiding principle is:

- Write an untyped implementation.
- Realize where things can go wrong:
  - Partial functions?
  - Many, many ways to implement a function incorrectly with the current types?
  - An unclear or documentation-reliant API?
- Gradually add types in selective places to handle these.

I recommend not going the other way (reaching for perfect type safety before figuring out where you actually need it). We call that "hasochism".

- Blog series: https://blog.jle.im/entries/series/+practical-dependent-types-in-haskell.html
- Extra resources: