using LossFunctions
l = L1DistLoss()
value(l, .1, .2) # |.2 - .1|
deriv(l, .1, .2) # sign(.2 - .1)
y = randn(100)
yhat = randn(100)
value(l, y, yhat) # vector of value mapped to y[i], yhat[i]
value(l, y, yhat, AvgMode.Sum()) # sum(value(l, y, yhat)), but better
value(l, y, yhat, AvgModel.Mean()) # mean(value(l, y, yhat)), but better
Assume we want to use a linear transformation so that our prediction of a vector is .
Our loss function looks like
with gradient
We can implement a naive, inefficient gradient descent algorithm as:
function f(l::Loss, x::Matrix, y::Vector; maxit=20, s=.5)
n, p = size(x)
β = zeros(p)
for i in 1:maxit
β -= (s / n) * x' * deriv(l, y, x * β)
end
β
end
This automatically works with whatever loss function I provide because multiple dispatch is amazing:
# make some fake data
x = randn(1000, 3)
y = x * [1.0, 2.0, 3.0] + randn(1000)
julia> f(L2DistLoss(), x, y)
3-element Array{Float64,1}:
0.966367
2.05046
2.94227
julia> f(L1DistLoss(), x, y)
3-element Array{Float64,1}:
1.00185
2.00302
2.95002
julia> f(HuberLoss(2.), x, y)
3-element Array{Float64,1}:
0.967642
2.0485
2.94429
An interface for penalty/regularization functions. It follows similar conventions to LossFunctions.
θ = rand(5) # parameter vector
p = L1Penalty()
value(p, θ) # sum(abs, θ)
grad(p, θ) # the gradient, sign.(θ)
prox(p, θ, .1) # proximal mapping (soft-thresholding) with step size .1
Let's change our gradient descent algorithm to proximal gradient method:
function g(l::Loss, pen::Penalty, x::Matrix, y::Vector; maxit=20, s=.5, λ=.1)
n, p = size(x)
β = zeros(p)
for i in 1:maxit
β = prox(pen, β - (s / n) * x' * deriv(l, y, x * β), s * λ)
end
β
end
We can now do proximal gradient method on arbitrary loss/penalty combinations because multiple dispatch is amazing:
julia> g(L2DistLoss(), L1Penalty(), x, y; λ = .4)
3-element Array{Float64,1}:
0.789894
1.80736
2.81209
julia> g(L1DistLoss(), L1Penalty(), x, y; λ = .4)
3-element Array{Float64,1}:
0.0
0.464488
1.5998
julia> g(HuberLoss(2.), L1Penalty(), x, y; λ = .4)
3-element Array{Float64,1}:
0.525633
1.57115
2.57907
These two packages create a consistent "grammar" for losses and regularization. Julia's multiple dispatch then allows us to write general algorithms like the proximal gradient method example. This paves the way for machine learning experimentation that is unavailable in any other language that I know of.