
Chapter 5: Training the classifier with regularization

Hi all,

In the book, it says

typically, you’d use either l2_penalty or l1_penalty, but not both in the same training session.

However, the code snippet that trains the model passes arguments to both l2_penalty and l1_penalty.

In addition, the questions section even suggests trying out different values for l2_penalty and l1_penalty.

What is the effect on training if we pass values for both l2_penalty and l1_penalty?

Thanks,
chalkdust

The penalty is a term that is added to the loss in order to reduce the magnitude of the weights (the learned parameters). This helps stop the model from overfitting: putting a constraint on the size of the weights makes it harder for the model to memorize specific training examples, because it can no longer pick whatever weights it feels like.
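To make "added to the loss" concrete, here is a minimal sketch in plain Python. The function name penalized_loss and the example numbers are made up for illustration, and the exact scaling (for instance an extra factor of 1/2 on the L2 term) differs between libraries, so don't read this as what the book's toolkit does internally.

```python
def penalized_loss(data_loss, weights, l1_penalty=0.0, l2_penalty=0.0):
    """Return the data loss plus L1 and L2 penalty terms (illustrative only)."""
    # L1 term: penalty strength times the sum of absolute weight values.
    l1_term = l1_penalty * sum(abs(w) for w in weights)
    # L2 term: penalty strength times the sum of squared weight values.
    l2_term = l2_penalty * sum(w * w for w in weights)
    return data_loss + l1_term + l2_term


# Hypothetical weights and data loss, just to show the effect.
weights = [0.5, -2.0, 0.0, 3.0]
print(penalized_loss(1.0, weights))                    # no penalty at all
print(penalized_loss(1.0, weights, l1_penalty=0.01))   # L1 only
print(penalized_loss(1.0, weights, l2_penalty=0.01))   # L2 only
print(penalized_loss(1.0, weights, l1_penalty=0.01, l2_penalty=0.01))  # both
```

Passing both values simply adds both terms, so large weights get penalized twice over. Note how the L2 term grows much faster for the big weight (3.0) than the L1 term does.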

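The reason you’d normally only use one of them is that they more-or-less do the same thing, namely shrink the weights, so you don’t need both. The main difference is that L1 tends to push some weights all the way to zero, while L2 shrinks all of them smoothly.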

Linear regression with an L1 penalty is also known as “lasso” regression; with an L2 penalty it is known as “ridge” regression. You may come across those terms in the machine learning literature.

There is also something called “elastic net”, which actually combines the L1 and L2 penalties. So it’s not unheard of to use both at the same time.
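To get a feel for what happens when both penalties are active, here is a toy sketch for a single weight. It assumes the objective 0.5 * (w - t)^2 + l1_penalty * |w| + 0.5 * l2_penalty * w^2, where t is the value the weight would take without any penalty. The function names and the factor of 0.5 are my own choices for the illustration, not anything from the book or its toolkit.

```python
def soft_threshold(t, l1_penalty):
    """Move t toward zero by l1_penalty, clamping at exactly zero."""
    if t > l1_penalty:
        return t - l1_penalty
    if t < -l1_penalty:
        return t + l1_penalty
    return 0.0


def elastic_net_weight(t, l1_penalty, l2_penalty):
    """Minimizer of 0.5*(w - t)**2 + l1_penalty*abs(w) + 0.5*l2_penalty*w**2."""
    # L1 part: subtract a fixed amount, which can zero the weight out.
    # L2 part: divide by (1 + l2_penalty), which scales the weight down.
    return soft_threshold(t, l1_penalty) / (1.0 + l2_penalty)


for t in [0.3, 2.0]:
    print(t,
          elastic_net_weight(t, 0.5, 0.0),   # L1 only
          elastic_net_weight(t, 0.0, 1.0),   # L2 only
          elastic_net_weight(t, 0.5, 1.0))   # both
```

In this toy setting the small weight (0.3) is set to exactly zero by the L1 part, while the larger one (2.0) is first reduced by L1 and then scaled down further by L2. So using both gives you some sparsity plus overall shrinkage.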

With neural networks, we usually just use L2, also known as “weight decay”.
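The name “weight decay” comes from what the L2 penalty does to a plain gradient descent update. Here is a small sketch, assuming an L2 term of 0.5 * lam * w^2 and vanilla SGD; the function names, lr and lam are made up for illustration. With adaptive optimizers such as Adam the two formulations are generally not identical.

```python
def sgd_step_l2(w, grad, lr, lam):
    """SGD step where the L2 penalty's gradient (lam * w) is added to grad."""
    return w - lr * (grad + lam * w)


def sgd_step_weight_decay(w, grad, lr, lam):
    """Same step, written as 'decay the weight, then apply the gradient'."""
    return (1.0 - lr * lam) * w - lr * grad


# Hypothetical values: both forms give the same updated weight.
w, grad, lr, lam = 2.0, 0.4, 0.1, 0.01
print(sgd_step_l2(w, grad, lr, lam))
print(sgd_step_weight_decay(w, grad, lr, lam))
```

On every step the weight is first multiplied by a number slightly below 1, so it “decays” toward zero unless the data gradient keeps pushing it back.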