One point that is often overlooked is that in many applications a substantial share of the features is categorical.
However, all algorithms we have introduced so far implicitly assume that features are numerical.
Two well-known ways of dealing with categorical features are one-hot encoding and dummy encoding.
Definition (One-hot encoding)
Given a feature \(X\) that can take \(k\) different categorical values, labelled \(1,\dots,k\), one-hot encoding entails transforming the feature into \(k\) binary features where \[ X^{(l)}=\mathbb 1(X=l), \quad l=1,\dots,k. \]
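As a minimal sketch of one-hot encoding, the following pure-Python function maps a list of categorical observations to \(k\) indicator columns, one per distinct value. The function name `one_hot_encode`, the example data, and the choice to order the levels by sorting are illustrative assumptions, not part of the definition.

```python
def one_hot_encode(values):
    """Return (levels, rows) with rows[i][j] = 1 if values[i] == levels[j], else 0."""
    # Assumption: the k levels are the distinct observed values, in sorted order.
    levels = sorted(set(values))
    # One indicator per level, so every row contains exactly one 1.
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows


colours = ["red", "green", "blue", "green"]
levels, encoded = one_hot_encode(colours)
print(levels)   # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

Each row sums to one, since every observation takes exactly one of the \(k\) values.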
Definition (Dummy encoding)
Given a feature \(X\) that can take \(k\) different categorical values, labelled \(1,\dots,k\), dummy encoding entails transforming the feature into \(k-1\) binary features where, given a reference level \(l^\ast\), \[ X^{(l)}=\mathbb 1(X=l), \quad l=1,\dots,l^\ast-1,l^\ast+1,\dots,k. \]
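Dummy encoding can be sketched in the same way: the indicator for the reference level \(l^\ast\) is dropped, leaving \(k-1\) columns. The function name `dummy_encode`, its `reference` argument, and the example data are hypothetical names chosen for illustration.

```python
def dummy_encode(values, reference):
    """Return (kept_levels, rows) with one indicator per non-reference level."""
    # Assumption: the k levels are the distinct observed values, in sorted order.
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} does not occur in the data")
    # Drop the reference level, leaving k - 1 indicator columns.
    kept_levels = [level for level in levels if level != reference]
    rows = [[1 if v == level else 0 for level in kept_levels] for v in values]
    return kept_levels, rows


colours = ["red", "green", "blue", "green"]
kept_levels, dummies = dummy_encode(colours, reference="blue")
print(kept_levels)  # ['green', 'red']
print(dummies)      # [[0, 1], [1, 0], [0, 0], [1, 0]]
```

An observation at the reference level is represented by a row of zeros. In contrast to one-hot encoding, the resulting columns no longer sum to one in every row, which avoids exact collinearity with an intercept term in linear models.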