In the previous post, we learned about PCA and how to use it in machine learning for various applications. In this tutorial, we are going to learn about linear discriminant analysis (LDA). In PCA, we project higher-dimensional data to lower dimensions, but we are only concerned with the features of the data. In LDA, we also consider the classes of the data.

This means LDA allows us to project data so that the classes are linearly well separated. LDA is therefore a supervised learning algorithm, since labels are involved. So, let’s get started. If you have any questions, please ask in the forum/community support.

## Prerequisites

- Understanding of linear algebra up to PCA
- Familiarity with statistical concepts such as variance, covariance and correlation
- Python, NumPy & Scikit-Learn

## What You Will Learn

- LDA concepts
- LDA derivation
- LDA applications

## What Is Linear Discriminant Analysis?

LDA is a dimensionality reduction technique in which we consider the class labels. In other words, we try to preserve as much of the class-discriminatory information as possible while performing dimensionality reduction.

**Definition:**

The goal of the LDA technique is to project the original data matrix onto a lower dimensional space. To achieve this goal, three steps needed to be performed. The first step is to calculate the separability between different classes (i.e. the distance between the means of different classes), which is called the between-class variance or between-class matrix.

The second step is to calculate the distance between the mean and the samples of each class, which is called the within-class variance or within-class matrix. The third step is to construct the lower dimensional space which maximizes the between-class variance and minimizes the within-class variance.

Alaa Tharwat^{1}

Fisher Linear Discriminant Analysis (also called Linear Discriminant Analysis(LDA)) are methods used in statistics, pattern recognition and machine learning to find a linear combination of features which characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.

LDA is closely related to PCA, for both of them are based on linear, i.e. matrix multiplication, transformations. For the case of PCA, the transformation is based on minimizing mean square error between original data vectors and data vectors that can be estimated from the reduced dimensionality data vectors. And the PCA does not take into account any difference in class. But for the case of LDA, the transformation is based on maximizing a ratio of “between-class variance” to “within-class variance” with the goal of reducing data variation in the same class and increasing the separation between classes.

Cheng Li, Bingyu Wang^{2}

One way of separating the classes is to maximize the distance between the projected class means. However, simply maximizing this distance is not sufficient, as it does not take into account the variance (spread) within the classes.
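
To see why the distance between projected means alone is not enough, here is a minimal sketch on synthetic data. The data, the helper names `mean_gap` and `fisher_score`, and the two candidate directions are illustrative assumptions, not part of the original tutorial.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical classes: wide spread along feature 1, narrow along feature 2
cov = np.array([[9.0, 0.0], [0.0, 0.25]])
class_a = rng.multivariate_normal([0.0, 0.0], cov, size=500)
class_b = rng.multivariate_normal([4.0, 1.5], cov, size=500)

def mean_gap(w, a, b):
    """Squared distance between the projected class means."""
    w = w / np.linalg.norm(w)
    return ((a @ w).mean() - (b @ w).mean()) ** 2

def fisher_score(w, a, b):
    """Projected mean gap divided by the summed projected within-class scatter."""
    w = w / np.linalg.norm(w)
    pa, pb = a @ w, b @ w
    scatter = ((pa - pa.mean()) ** 2).sum() + ((pb - pb.mean()) ** 2).sum()
    return mean_gap(w, a, b) / scatter

w_means = class_b.mean(axis=0) - class_a.mean(axis=0)  # direction joining the class means
w_narrow = np.array([0.0, 1.0])                        # direction with little within-class spread

for name, w in [("joining means", w_means), ("narrow axis", w_narrow)]:
    print(f"{name}: gap={mean_gap(w, class_a, class_b):.2f}, "
          f"fisher={fisher_score(w, class_a, class_b):.5f}")
```

The direction joining the means gives the larger mean gap, yet the narrow axis scores higher on the ratio of mean gap to within-class scatter, which is the quantity LDA maximizes.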

For this, we need a measure of within-class variability, which is called scatter. It is equivalent to variance with the 1/n factor removed, and it is defined as the sum of squared differences between the projected samples and their class mean. Furthermore, we define the within-class scatter matrix (SW), which measures the spread of data points within each class, and the between-class scatter matrix (SB), which measures the spread of the class means.

The between-class scatter matrix reflects how much the class means deviate from the overall mean. The within-class scatter matrix is calculated by summing the individual scatter matrices of each class. Detailed formulas for both are given in the derivation below. The derivation covers the two-class case, but the same principle applies to more classes.
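
For more than two classes, the two matrices can be computed as in the sketch below. The function name `scatter_matrices` and the tiny dataset are hypothetical, and weighting each class by its size in SB is one common convention that is assumed here rather than taken from this tutorial.

```python
import numpy as np

def scatter_matrices(X, y):
    """Return (SW, SB) for data X of shape (n_samples, n_features) and labels y."""
    overall_mean = X.mean(axis=0)
    n_features = X.shape[1]
    SW = np.zeros((n_features, n_features))
    SB = np.zeros((n_features, n_features))
    for label in np.unique(y):
        X_c = X[y == label]
        mean_c = X_c.mean(axis=0)
        centered = X_c - mean_c
        SW += centered.T @ centered               # scatter of this class around its own mean
        diff = (mean_c - overall_mean).reshape(-1, 1)
        SB += len(X_c) * (diff @ diff.T)          # class mean vs. overall mean, weighted by class size
    return SW, SB

# Tiny made-up dataset: three classes in 2D
X_demo = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
                   [6.0, 5.0], [7.0, 8.0], [8.0, 7.0],
                   [1.0, 8.0], [2.0, 9.0], [3.0, 7.0]])
y_demo = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
SW_demo, SB_demo = scatter_matrices(X_demo, y_demo)
print("SW:\n", SW_demo)
print("SB:\n", SB_demo)
```

With this weighting, SW + SB equals the total scatter of the data around the overall mean, which is a handy sanity check.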

## Derivation Of LDA:

Here is the detailed derivation of LDA:
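
A standard two-class version of the derivation can be sketched as follows. The notation ($w$ for the projection vector, $m_1, m_2$ for the class means, $C_i$ for the samples of class $i$, and $J(w)$ for the criterion) is an assumption chosen to match the scatter matrices $S_W$ and $S_B$ described above.

```latex
\begin{align*}
\tilde{m}_i &= w^{T} m_i, \qquad
\tilde{s}_i^{2} = \sum_{x \in C_i} \left(w^{T}x - \tilde{m}_i\right)^{2}, \qquad i = 1, 2 \\[4pt]
S_B &= (m_1 - m_2)(m_1 - m_2)^{T}, \qquad
S_W = \sum_{i=1}^{2} \sum_{x \in C_i} (x - m_i)(x - m_i)^{T} \\[4pt]
J(w) &= \frac{(\tilde{m}_1 - \tilde{m}_2)^{2}}{\tilde{s}_1^{2} + \tilde{s}_2^{2}}
      = \frac{w^{T} S_B\, w}{w^{T} S_W\, w} \\[4pt]
\frac{\partial J}{\partial w} = 0
  &\;\Longrightarrow\; (w^{T} S_W w)\, S_B w - (w^{T} S_B w)\, S_W w = 0 \\[4pt]
  &\;\Longrightarrow\; S_B w = \lambda\, S_W w, \qquad \lambda = J(w) \\[4pt]
  &\;\Longrightarrow\; S_W^{-1} S_B\, w = \lambda\, w \\[4pt]
w^{*} &\propto S_W^{-1}(m_1 - m_2)
\end{align*}
```

The last line holds only for two classes, because $S_B w$ then always points along $m_1 - m_2$.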

Solving the optimization leads to the generalized eigenvalue problem $S_B w = \lambda S_W w$, which can be rewritten as $S_W^{-1} S_B w = \lambda w$. In other words, $W = S_W^{-1} S_B$ is the transformation matrix and $\lambda$ denotes its eigenvalues. We calculate the eigenvalues and eigenvectors of $W$. The eigenvectors of $W$ represent the directions of the new space, and the corresponding eigenvalues represent the scaling factor, length, or magnitude of the eigenvectors.

Each eigenvector represents one axis of the LDA space, and the associated eigenvalue represents the robustness of this eigenvector. The robustness of the eigenvector reflects its ability to discriminate between different classes, i.e. increase the between-class variance, and decrease the within-class variance of each class; hence meets the LDA goal. Thus, the eigenvectors with the k highest eigenvalues are used to construct a lower dimensional space ($V_k$), while the other eigenvectors ($\{v_{k+1}, v_{k+2}, \dots, v_M\}$) are neglected. ^{4}
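
Selecting the k leading eigenvectors can be sketched as below. The function name `top_k_discriminants` and the small hand-made matrices are hypothetical, chosen so that the result is easy to check by hand.

```python
import numpy as np

def top_k_discriminants(SW, SB, k):
    """Eigenvalues and eigenvectors of SW^{-1} SB for the k largest eigenvalues."""
    # np.linalg.solve(SW, SB) computes SW^{-1} SB without forming the inverse explicitly
    eig_vals, eig_vecs = np.linalg.eig(np.linalg.solve(SW, SB))
    # The eigenvalues are real in theory; keep only the real parts of the numerical result
    eig_vals, eig_vecs = eig_vals.real, eig_vecs.real
    order = np.argsort(eig_vals)[::-1]            # largest eigenvalue first
    return eig_vals[order[:k]], eig_vecs[:, order[:k]]

# Hand-made example: SB has rank 1, so only one eigenvalue is non-zero
SW_toy = np.array([[2.0, 0.0], [0.0, 1.0]])
SB_toy = np.array([[1.0, 1.0], [1.0, 1.0]])
vals, V_k = top_k_discriminants(SW_toy, SB_toy, k=1)
print(vals)  # leading eigenvalue
print(V_k)   # one column per kept eigenvector
```

For these toy matrices the kept eigenvector is parallel to (1, 2), which is proportional to SW⁻¹ applied to (1, 1), so the sketch agrees with the hand calculation.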

^{5}

## Numpy & Scikit Learn Implementation:

The code is provided below. I recommend typing it out yourself rather than copying it. Outputs are not included, so run each snippet to see the results.

### Example 1 – PCA Vs LDA

```
import numpy as np
import matplotlib.pyplot as plt
X = np.array([[0, 1, 2, 3, 4, 5, 1, 2, 3, 3, 5, 6, 7, 8], [1, 2, 3, 3, 5, 5, 0, 1, 1, 2, 3, 5, 6, 6]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])
X = X.T
X
```

```
# plot the data
plt.scatter(X[:, 0], X[:, 1], c = y);
```

First, we will apply PCA to check how it is different from LDA. You will see that it is not separating the classes.

```
# Apply PCA
from sklearn.decomposition import PCA
pca = PCA(n_components = 1)
pca.fit(X)
Xr = pca.transform(X)
print(Xr)
```

```
# PCA projection (1D), drawn along a single horizontal axis
plt.scatter(Xr[:, 0], np.zeros_like(Xr[:, 0]), c = y);
```

Next, we apply LDA. Here the classes are clearly linearly separable along the projected axis.

```
# Apply LDA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)
# LDA projection (1D), drawn along a single horizontal axis
plt.scatter(X_lda[:, 0], np.zeros_like(X_lda[:, 0]), c = y);
```
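
The visual comparison can also be checked numerically. The sketch below repeats the Example 1 data and asks whether a single threshold on the 1D projection splits the two classes; the helper `separable_by_threshold` is a hypothetical name introduced only for this check.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Same toy data as in Example 1
X = np.array([[0, 1, 2, 3, 4, 5, 1, 2, 3, 3, 5, 6, 7, 8],
              [1, 2, 3, 3, 5, 5, 0, 1, 1, 2, 3, 5, 6, 6]]).T
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1])

def separable_by_threshold(z, labels):
    """True if one threshold on the 1D values z splits class 0 from class 1."""
    z0, z1 = z[labels == 0], z[labels == 1]
    return bool(z0.max() < z1.min() or z1.max() < z0.min())

z_pca = PCA(n_components=1).fit_transform(X)[:, 0]
z_lda = LinearDiscriminantAnalysis().fit_transform(X, y)[:, 0]

print("PCA projection separable:", separable_by_threshold(z_pca, y))
print("LDA projection separable:", separable_by_threshold(z_lda, y))
```

On this dataset the PCA projection interleaves the two classes, while the LDA projection leaves a gap between them.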

### Example 2 – Just Using Numpy

Now we will see an end-to-end example without using Sklearn.

```
# Generate simple 2D dataset with two classes
np.random.seed(42)
class1_data = np.random.randn(50, 2) + np.array([2, 2])
class2_data = np.random.randn(50, 2) + np.array([5, 5])
# Plot data
plt.scatter(class1_data[:, 0], class1_data[:, 1], label='Class 1')
plt.scatter(class2_data[:, 0], class2_data[:, 1], label='Class 2')
plt.title('Generated Data')
plt.xlabel('Feature 1')
plt.ylabel('Feature 2')
plt.legend()
plt.show()
```

```
# Step 1: Calculate Class Means
mean_class1 = np.mean(class1_data, axis=0)
mean_class2 = np.mean(class2_data, axis=0)
print("Mean of Class 1:", mean_class1)
print("Mean of Class 2:", mean_class2)
```

```
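# Step 2: Calculate Within-Class Scatter Matrix (SW)
# np.cov divides by (n - 1); multiplying it back gives the scatter matrix.
# With equal class sizes this only rescales SW, so the LDA direction is unchanged.
n1, n2 = len(class1_data), len(class2_data)
scatter_class1 = (n1 - 1) * np.cov(class1_data.T)
scatter_class2 = (n2 - 1) * np.cov(class2_data.T)
SW = scatter_class1 + scatter_class2
print("Within-Class Scatter Matrix (SW):\n", SW)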
```

```
# Step 3: Calculate Between-Class Scatter Matrix (SB)
mean_diff = (mean_class1 - mean_class2).reshape(2, 1)
SB = np.dot(mean_diff, mean_diff.T)
print("Between-Class Scatter Matrix (SB):\n", SB)
```

```
# Step 4: Solve the Generalized Eigenvalue Problem
# Compute the eigenvalues and eigenvectors
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(SW).dot(SB))
# Sort eigenvalues and corresponding eigenvectors
sorted_indices = np.argsort(eig_vals)[::-1]
eig_vals = eig_vals[sorted_indices]
eig_vecs = eig_vecs[:, sorted_indices]
print("Eigenvalues:\n", eig_vals)
print("Eigenvectors:\n", eig_vecs)
```

```
# Step 5: Choose Top Eigenvector and Project Data
# Choose the top eigenvector for projection (LDA dimensionality reduction)
W = eig_vecs[:, 0]
# Project the data onto the new feature subspace
lda_result_class1 = class1_data.dot(W)
lda_result_class2 = class2_data.dot(W)
# Plot the LDA results with correct decision boundary
plt.scatter(lda_result_class1, np.zeros_like(lda_result_class1), label='Class 1', alpha=0.7)
plt.scatter(lda_result_class2, np.zeros_like(lda_result_class2), label='Class 2', alpha=0.7)
# Plot the linear discriminant line (decision boundary)
decision_boundary = (mean_class1.dot(W) + mean_class2.dot(W)) / 2
plt.axvline(x=decision_boundary, color='black', linestyle='--', label='Decision Boundary')
plt.title('LDA Projection Results with Decision Boundary')
plt.xlabel('LDA Projection Value')
plt.legend()
plt.show()
```
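
For two classes, the top eigenvector should match the closed-form direction SW⁻¹(m1 − m2) up to sign and scale. The following sketch regenerates the Example 2 data and compares the two; the variable names `w_eig`, `w_closed` and `cosine` are introduced here only for illustration.

```python
import numpy as np

# Same synthetic data as in Example 2
np.random.seed(42)
class1_data = np.random.randn(50, 2) + np.array([2, 2])
class2_data = np.random.randn(50, 2) + np.array([5, 5])

mean_class1 = class1_data.mean(axis=0)
mean_class2 = class2_data.mean(axis=0)

# Within-class scatter from the centred samples of each class
c1 = class1_data - mean_class1
c2 = class2_data - mean_class2
SW = c1.T @ c1 + c2.T @ c2

mean_diff = (mean_class1 - mean_class2).reshape(2, 1)
SB = mean_diff @ mean_diff.T

# Direction from the eigen-decomposition, as in Steps 4 and 5
eig_vals, eig_vecs = np.linalg.eig(np.linalg.inv(SW) @ SB)
w_eig = eig_vecs[:, np.argmax(eig_vals.real)].real

# Closed-form direction for two classes: SW^{-1} (m1 - m2), normalised
w_closed = np.linalg.solve(SW, mean_diff).ravel()
w_closed /= np.linalg.norm(w_closed)

# |cosine| close to 1 means both directions agree up to sign
cosine = abs(w_eig @ w_closed)
print("abs cosine between the two directions:", cosine)
```

This shortcut avoids the eigen-decomposition entirely, but it applies only to the two-class case.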

```
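# Project the data onto both eigenvectors for comparison.
# With two classes SB has rank 1, so the second eigenvalue is (near) zero
# and the second eigenvector is not expected to separate the classes.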
W1 = eig_vecs[:, 0]
W2 = eig_vecs[:, 1]
lda_result_class1_W1 = class1_data.dot(W1)
lda_result_class2_W1 = class2_data.dot(W1)
lda_result_class1_W2 = class1_data.dot(W2)
lda_result_class2_W2 = class2_data.dot(W2)
# Plot the LDA results for the top two eigenvectors
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.scatter(lda_result_class1_W1, np.zeros_like(lda_result_class1_W1), label='Class 1', alpha=0.7)
plt.scatter(lda_result_class2_W1, np.zeros_like(lda_result_class2_W1), label='Class 2', alpha=0.7)
plt.axvline(x=decision_boundary, color='black', linestyle='--', label='Decision Boundary')
plt.title('LDA Projection - Eigenvector 1')
plt.xlabel('LDA Projection Value')
plt.legend()
plt.subplot(1, 2, 2)
plt.scatter(lda_result_class1_W2, np.zeros_like(lda_result_class1_W2), label='Class 1', alpha=0.7)
plt.scatter(lda_result_class2_W2, np.zeros_like(lda_result_class2_W2), label='Class 2', alpha=0.7)
plt.axvline(x=0, color='black', linestyle='--', label='x = 0 (reference only)')
plt.title('LDA Projection - Eigenvector 2')
plt.xlabel('LDA Projection Value')
plt.legend()
plt.tight_layout()
plt.show()
```

### Example 3 – On The Wine Dataset

- Number of Instances: The dataset consists of a total of 178 instances.
- Number of Features: There are 13 attributes (features) in the dataset, representing various chemical properties of the wines.
- Classes: The dataset is divided into three classes, each corresponding to a different type of wine. In scikit-learn's `load_wine`, the classes are labelled 0, 1, and 2.

Since there are 3 classes, LDA can give us at most 2 dimensions (*C* − 1). We will use both here.

```
from sklearn.datasets import load_wine
import pandas as pd
wine = load_wine()
X = np.array(wine.data)
y = np.array(wine.target)
print(X[1:5, :])
print(y)
```

`wine.feature_names`

```
# Apply PCA - Check the output
from sklearn.decomposition import PCA
pca = PCA(n_components = 2)
result = pca.fit(X)
Z = result.transform(X)
plt.scatter(Z[:,0], Z[:,1], c = y);
```

```
# Apply LDA - Check the output and compare it with PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
X_lda = lda.fit_transform(X, y)
plt.scatter(X_lda[:, 0], X_lda[:, 1], c = y);
```

You can also use LDA as a classifier. To do that, simply create a train and a test set and see how it performs. You can use the `predict` function provided by Sklearn. If you have any further queries, feel free to ask in community support.
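
A minimal sketch of that workflow on the Wine dataset is shown below. The split ratio, the `random_state` value and the stratified split are arbitrary choices made for this illustration.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

wine = load_wine()

# Hold out 30% of the samples for testing, keeping class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0, stratify=wine.target
)

clf = LinearDiscriminantAnalysis()
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)          # predicted class labels for the test set
accuracy = clf.score(X_test, y_test)  # fraction of correct predictions
print("Test accuracy:", accuracy)
```

Evaluating on a held-out test set gives a more honest estimate than scoring on the training data.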

## Limitations Of LDA

- In Linear Discriminant Analysis (LDA) for a classification problem with *C* classes, the maximum number of discriminant functions (or feature projections) that can be derived is *C*−1. Each discriminant function represents a direction in the feature space that maximizes the separation between classes.
- LDA assumes that the decision boundaries separating different classes are linear.
- LDA assumes that the feature vectors for each class follow a multivariate Gaussian distribution.
- LDA can fail when the discriminatory information lies mainly in the variance of the data rather than in the differences between class means.
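
The *C*−1 limit comes from the rank of the between-class scatter matrix. The sketch below illustrates this on synthetic data; the class centres, the sizes and the tolerance passed to `matrix_rank` are all assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n_classes, n_features, n_per_class = 3, 5, 40

# Hypothetical class centres in 5D; each class is a Gaussian blob around its centre
centres = rng.normal(scale=5.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centres])
y = np.repeat(np.arange(n_classes), n_per_class)

# Between-class scatter: class means against the overall mean, weighted by class size
overall_mean = X.mean(axis=0)
SB = np.zeros((n_features, n_features))
for label in range(n_classes):
    X_c = X[y == label]
    diff = (X_c.mean(axis=0) - overall_mean).reshape(-1, 1)
    SB += len(X_c) * (diff @ diff.T)

# An explicit tolerance keeps rounding noise from being counted as an extra dimension
rank_SB = np.linalg.matrix_rank(SB, tol=1e-8)
print("rank(SB) =", rank_SB, "with", n_classes, "classes and", n_features, "features")
```

Because the weighted deviations of the class means from the overall mean sum to zero, SB can have at most *C*−1 independent directions, no matter how many features the data has.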

## Applications Of LDA:

Linear Discriminant Analysis (LDA) has various applications across different domains. Here are a few of its applications:

- In medical research, LDA can be applied to distinguish between different patient groups or to identify patterns in medical data for disease diagnosis.
- LDA can be utilized in speech recognition systems to model and classify speech signals based on discriminant features.
- LDA is employed in biometric identification systems, such as fingerprint recognition, where it helps in extracting discriminant features for accurate identification.
- LDA can be used in human-computer interaction applications for gesture recognition by finding discriminant features in sensor data.
- In finance, LDA can be applied to credit scoring or fraud detection by distinguishing between different risk or fraud categories based on financial features.
- LDA can be used in genomics to analyze gene expression data and classify samples into different biological conditions or disease states.
- In market research, LDA can assist in segmenting customers or products based on various attributes, aiding in targeted marketing strategies.

## Footnotes & Further Readings:

1. Alaa Tharwat, Linear Discriminant Analysis: A Detailed Tutorial, Department of Computer Science and Engineering, Frankfurt University of Applied Sciences, Frankfurt am Main, Germany
2. Cheng Li and Bingyu Wang, Fisher Linear Discriminant Analysis, August 31, 2014
3. Shireen Elhabian and Aly A. Farag, University of Louisville, CVIP Lab, September 2009
4. Same as 1
5. Same as 1