[In Depth] Singular Value Decomposition: Concepts And Applications

Everything you need to know about SVD for machine learning. From theory to applications

by Amritesh Kumar
December 7, 2023 - Updated on September 5, 2024
in Machine Learning
Reading Time: 39 mins read

The singular value decomposition (SVD) is a crucial concept in machine learning that underpins many of the algorithms you will use throughout your machine learning journey. It is not limited to ML or data science; it appears in many other fields too. At its core, SVD breaks a complex dataset down into simpler, more interpretable components, giving us deeper insight into its structure. It also lets you tackle problems such as image compression, text mining, and multidimensional data analysis.

That’s why it’s so important to understand. For machine learning, you don’t need to implement SVD from scratch in Python, because plenty of libraries already do it for you. Writing your own implementation can also become a complex project that distracts you from the real goal of learning how to use it. However, to use SVD well you do need to understand its basics, so that you can make the most of it and follow more advanced concepts such as PCA, which I will explain in another post. So, let’s get started and learn SVD.

Premium
Intermediate
1 Week to Learn
Singular Value Decomposition
Prerequisites
Linear Algebra Basics
Eigendecomposition
Spectral Decomposition
Python & Numpy
What You Will Learn
  • Concept of Singular Value Decomposition (SVD)
  • How to perform SVD on matrices
  • Low Rank Approximation
  • SVD & Low Rank Approximation Using Numpy
Tutorial Outcomes
By the end of this tutorial on Singular Value Decomposition (SVD), you’ll have a thorough understanding of how to decompose matrices into singular values and singular vectors, and you will have built a foundation for more advanced concepts like PCA.
This Tutorial Includes
Certificate of completion
Concepts & Explanation
Mathematical Derivations
Coding Examples

Table of Contents

  • What is SVD?
    • 1. Matrix U:
    • 2. Matrix Σ:
    • 3. Matrix V^T:
    • 4. Rank of A:
  • What Does Singular Value Decomposition Tell Us?
  • Theorem, Proofs, Examples Of SVD & Low-Rank Approximation
  • Applications of SVD
    • Singular Value Decomposition using Numpy
      • Low-Rank Approximation Of A Matrix
  • Footnotes And Sources

What is SVD?

You must have heard about eigendecomposition, which is explained in our notes on linear algebra, and how we use it to understand the underlying features of a matrix. The problem with eigendecomposition is that the matrix must be square, but in the real world our data rarely comes in square matrix form.

So we need a way to decompose non-square matrices and learn more about their underlying features. Singular value decomposition (SVD) is exactly that: a way to factor any matrix, which lets us work with such real-world data.

A formal definition: the singular value decomposition of a matrix A is the factorization of A into the product of three matrices, A = UΣV^T, where the columns of U and of V (whose transpose is V^T) are orthonormal and Σ is a diagonal matrix with non-negative real entries. Its diagonal entries σ1, …, σr are the singular values of A and satisfy σ1 ≥ σ2 ≥ ··· ≥ σr > 0.
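To make this concrete, here is a minimal NumPy sketch (the 3×2 matrix A is an arbitrary example of my own) showing that a non-square matrix has an SVD and that the three factors multiply back to A:

import numpy as np

# An arbitrary 3x2 (non-square) example matrix
A = np.array([[2.0, 0.0],
              [1.0, 3.0],
              [0.0, 4.0]])

# full_matrices=False returns the "thin" SVD: U is 3x2, s has 2 entries, Vt is 2x2
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(U.shape, s.shape, Vt.shape)           # (3, 2) (2,) (2, 2)
print(s)                                    # singular values, sorted in descending order
print(np.allclose(U @ np.diag(s) @ Vt, A))  # True: A = U Σ V^T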

[Figure: Singular Value Decomposition. Source1]

1. Matrix U:

  • U is an orthogonal matrix, meaning its columns are unit vectors (length 1) that are perpendicular to each other. This property is expressed as U^T U = I, where I is the identity matrix.
  • U represents an orthogonal transformation that reorients the coordinate system of the rows of A. It aligns the data with a new basis where the rows are positioned along the principal directions determined by VT.
  • The columns of U are the left singular vectors. They describe the relationships between rows in the original matrix A and the new coordinate system.

2. Matrix Σ:

  • Σ is a diagonal matrix, meaning all its elements outside the main diagonal are zero. The diagonal elements are the singular values of A, denoted σ1, σ2, …, σr (where r is the rank of A).
  • The singular values in Σ are the scaling factors by which the corresponding directions in U and V are stretched or shrunk during the transformation. Larger singular values indicate greater importance.
  • Geometrically, Σ stretches or shrinks the data along the principal axes determined by VT.

3. Matrix V^T:

  • Similar to U, the matrix V (and hence its transpose V^T) is orthogonal: its columns are unit vectors and V^T V = VV^T = I.
  • Geometrically, V^T reorients the coordinate system for the columns of A, aligning them with the directions (principal axes) that capture the most variance.
  • The rows of V^T (equivalently, the columns of V) are the right singular vectors. These vectors provide a new orthonormal basis for the space in which the data points (the rows of A) live, and they describe the directions along which the data is most spread out.

4. Rank of A:

  • The rank of A is the number of non-zero singular values in Σ. In other words, the rank of a matrix is the number of linearly independent rows or columns. It indicates the number of dimensions or directions in which the data has non-zero variance.
The matrix V^T (or V) represents a rotation (or reflection) of the data space to align it with the principal components. In other words, V^T rotates the data into a new coordinate system where the axes are aligned with the directions of maximum variance. The matrix Σ scales the data along the new axes defined by V^T. The diagonal entries in Σ (the singular values) indicate how much stretching or shrinking occurs along each axis. Larger singular values correspond to directions where the data has more variance. The matrix U represents another rotation (or reflection) of the data, this time in the transformed space. It aligns the rows of the original matrix A with the principal components after scaling.
[Figure: Geometrical interpretation of SVD. Image credit: Murad Shibili2]

What Does Singular Value Decomposition Tell Us?

We can deduce a few things about SVD based on our above definition:

  • The matrices U and V^T in the SVD are orthogonal. This orthogonality implies that the columns of U and of V form orthonormal bases. It also means that the transformations represented by U and V^T preserve lengths and angles between vectors.
  • The matrix Σ is diagonal, and its diagonal elements are the singular values of the original matrix. These singular values are non-negative and represent the importance or magnitude of the corresponding singular vectors. They are arranged in descending order, which means we can use them to compress data without losing much information.
  • The rank of the matrix equals the number of non-zero singular values in Σ. If the matrix has rank r, then only the first r singular values are non-zero and the rest are zero.
  • The singular values of a matrix A are the square roots of the eigenvalues of A^T A (or AA^T). This connection provides a link between SVD and the spectral properties of the matrix, that is, the properties related to its eigenvalues and eigenvectors (see the short check after this list).
  • The SVD provides the best low-rank approximation of a matrix. The best rank-k approximation is obtained by retaining the first k singular values and their corresponding singular vectors. This property is useful in data compression and noise reduction: SVD lets us find a matrix that needs far less storage than the original while preserving most of its information.
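To make the eigenvalue connection from the list above tangible, here is a small NumPy check (the matrix is again an arbitrary example of my own) that the singular values of A equal the square roots of the eigenvalues of A^T A:

import numpy as np

# Arbitrary 2x3 example matrix
A = np.array([[3.0, 1.0, 2.0],
              [0.0, 2.0, 1.0]])

# Singular values of A
s = np.linalg.svd(A, compute_uv=False)

# Eigenvalues of A^T A, sorted in descending order; round-off can leave tiny negatives, so clip at 0
eigvals = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
sqrt_eigvals = np.sqrt(np.clip(eigvals, 0, None))

print(s)                                      # singular values of A
print(sqrt_eigvals[:len(s)])                  # square roots of the eigenvalues of A^T A
print(np.allclose(s, sqrt_eigvals[:len(s)]))  # True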

Let’s build some intuition for the concepts above with a simple example:

To appreciate the utility of SVD, it’s helpful to start by thinking about what a matrix represents. We have already talked about this in our tutorial on mathematical intuition. In many applications, a matrix is a way to store data, where the rows and columns correspond to different variables or observations. For example, in image processing, a matrix might represent pixel intensities, while in other tasks, it might represent user ratings for a set of movies. However, not all data in a matrix is equally informative; some of it might be noise, redundancy, or irrelevant details.

Let’s consider a simple, concrete example: imagine a 2D dataset where each data point is represented by a vector in a 2D space, like coordinates on a graph. The data might look like a cloud of points, with some directions (or axes) showing more spread or variation than others. Suppose we have a set of points on a 2D plane, and we arrange these points into a matrix A, where each row of A represents a point, and each column represents a dimension (in this case, just two dimensions: x and y coordinates). The matrix A might look something like this:

A = [[1, 2], [3, 4], [5, 6]]

Here, each row [1, 2], [3, 4], [5, 6] corresponds to a point in the plane. The question is: what patterns or structures can we extract from this matrix? This is where SVD comes in. SVD decomposes the matrix A into three matrices: A = UΣV^T. U is an orthogonal matrix containing the left singular vectors, Σ is a diagonal matrix containing the singular values, and V^T is the transpose of the orthogonal matrix V containing the right singular vectors. But what does this mean intuitively?

In a 2D or 3D space, vectors are orthogonal if they are at right angles to each other. For example, in the 2D plane, the x-axis and y-axis are orthogonal because they meet at a 90-degree angle. Orthogonality in higher dimensions generalizes this idea: two vectors are orthogonal if their dot product is zero, meaning they are independent of each other in a geometric sense.

In the context of SVD, orthogonal matrices like U and V consist of vectors (or columns) that are mutually orthogonal. This orthogonality is crucial because it means that the singular vectors provide a set of independent directions that span the space where our data lives. These directions are like the principal axes of an ellipse that best describes the shape of our data cloud.

V^T captures the directions in the original space (the columns of A) that explain the most variance in the data. It provides the axes along which the data is most “spread out”; these are the principal directions, the eigenvectors of the data covariance matrix. Σ is a diagonal matrix whose diagonal entries are the singular values. These values tell us how much variance or information is contained along each direction identified by V^T. The larger the singular value, the more important that direction is; smaller singular values indicate directions that contribute less to the overall structure of the data. U tells us how to combine the principal directions given by V^T to reconstruct the original data. It provides the transformation that aligns the rows of A with these principal directions.

Now, let’s go back to our example matrix A and apply SVD to it. Rounding to two decimal places (and noting that different implementations may flip the signs of paired singular vectors), we obtain the following matrices:

U:
  - [-0.23, 0.88]
  - [-0.52, 0.24]
  - [-0.82, -0.40]

Σ:
  - [9.53, 0]
  - [0, 0.51]

V^T:
  - [-0.62, -0.78]
  - [-0.78, 0.62]

The rows of V^T represent the directions in the 2D space that capture the most variance in the data. These directions are orthogonal, meaning they are at right angles to each other, just like the principal axes of an ellipse. The values 9.53 and 0.51 in Σ represent how much “stretching” occurs along each of these directions. The larger singular value (9.53) indicates that most of the data’s variance lies along the first direction given by V^T; the smaller value (0.51) indicates far less variance along the second direction. The columns of U are the left singular vectors, which provide an orthonormal basis for the column space of A. They represent how the rows of A are oriented in the transformed space defined by the principal directions (from V^T).

Essentially, when reconstructing the original matrix A, U helps align the transformed data (from Σ and V^T) back to the original row space, while Σ scales the data along the principal directions and V^T provides the principal directions in the column space.

It can help to think of A as being constructed from these three matrices rather than decomposed into them, although keep in mind that this is only a mental aid, not how SVD is actually defined.
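If you want to reproduce the numbers above yourself (they are rounded to two decimals, and NumPy may flip the signs of paired singular vectors, which does not change the product), a quick check looks like this:

import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])

# Thin SVD of the example matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(U, 2))     # left singular vectors (as columns)
print(np.round(s, 2))     # singular values, roughly [9.53, 0.51]
print(np.round(Vt, 2))    # right singular vectors (as rows)

# Sanity check: the three factors multiply back to A
print(np.allclose(U @ np.diag(s) @ Vt, A))   # True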

This is what SVD is all about. We can exploit this structure for other applications. One such application is low-rank approximation.

By keeping only the most significant singular values and vectors, we can represent our data much more compactly. This is also the basis for many image and signal compression techniques. A matrix is considered low-rank if its rank r (the number of non-zero singular values) is much smaller than its dimensions. The matrix might have many rows and columns, but if its rank is much smaller, the data can be effectively described with fewer dimensions, which indicates redundancy or correlation among the data dimensions. The idea behind low-rank approximation is to approximate a matrix A by another matrix A′ of lower rank, capturing the most significant patterns while ignoring the less important details.

The low-rank approximation is achieved by truncating the SVD components. Specifically, by retaining only the top k singular values and their corresponding singular vectors, we can construct a matrix A′ that is a rank-k approximation of A. Mathematically, this can be expressed as A′ = U_k Σ_k V_k^T, where U_k consists of the first k columns of U, V_k^T consists of the first k rows of V^T, and Σ_k is the diagonal matrix containing only the top k singular values. The resulting matrix A′ is the best rank-k approximation of A in the sense of minimizing the Frobenius norm of the difference between A and A′.

This process of truncating the SVD is not arbitrary. The singular values are sorted in decreasing order, so the first few singular values capture the most variance in the data. By focusing on these dominant singular values, we retain the most significant features of the matrix while reducing its complexity. This reduction is the core idea behind dimensionality reduction techniques like Principal Component Analysis (PCA), which is fundamentally linked to SVD.
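Here is a short, generic sketch of that truncation (the helper name rank_k_approximation, the random test matrix, and the choice k = 2 are my own illustration, not something fixed by this tutorial):

import numpy as np

def rank_k_approximation(A, k):
    # Keep only the top k singular values and their singular vectors
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example: approximate a random 4x5 matrix with a rank-2 matrix
rng = np.random.default_rng(0)
A = rng.random((4, 5))
A2 = rank_k_approximation(A, 2)

print(np.linalg.matrix_rank(A2))        # 2
print(np.linalg.norm(A - A2, 'fro'))    # error contributed by the discarded singular values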

Imagine a high-resolution image stored as a matrix, where each matrix element is a pixel’s intensity. A high-resolution image might have millions of pixels, but not every pixel contains unique information: large areas of the image might be uniform, or patterns might repeat across different regions. By applying SVD and low-rank approximation, we can compress the image by retaining only the most critical information that defines its structure and discarding the less significant details. The result is a much smaller representation that still captures the essence of the original image. See the example below.

By approximating an image with a low-rank matrix, we can reduce the storage space required while preserving the visual quality.
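As a rough sketch of how this could look in code (assuming you have already loaded a grayscale image into a 2-D NumPy array; the random stand-in array and the choice k = 50 below are placeholders of my own):

import numpy as np

def compress_image(img, k):
    # Rank-k approximation of a grayscale image stored as a 2-D float array
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    # Storing U_k, the k singular values and Vt_k takes k*(m + n + 1) numbers instead of m*n
    m, n = img.shape
    stored_fraction = k * (m + n + 1) / (m * n)
    return approx, stored_fraction

# Stand-in for a real 512x512 grayscale image
img = np.random.default_rng(1).random((512, 512))
approx, fraction = compress_image(img, k=50)
print(f"stored fraction of the original size: {fraction:.2f}")   # about 0.20 here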

Theorem, Proofs, Examples Of SVD & Low-Rank Approximation

Please expand the embedded pages below to navigate through the theorem, proofs, and worked examples. Once you have grasped these, move ahead.

[Embedded pages 1–9: theorem, proofs, and examples of SVD and low-rank approximation]

Applications of SVD

Here is an example of how we can use SVD. Suppose you are trying to send a photograph captured by a satellite from space to Earth, and you want to transmit it in a way that takes less memory without losing the essential image content. We can compress the image to some extent by using SVD-based low-rank approximations before sending it.

Here is how the image looks when reconstructed with different numbers of singular values:

[Figure: The same image reconstructed using different numbers of singular values. Image Source3]

Singular Value Decomposition using Numpy

In this section, we will use Numpy to perform SVD on a matrix and see how we can further implement low-rank approximation to find another matrix that approximates the original one. You can use images as well.

# Importing NumPy
import numpy as np

# Creating a 3x4 matrix
a = np.array([[1, 2, 3, 4],
              [1, 1, 2, 3],
              [0, 1, 1, 0]])

# Calculate the rank of the matrix
np.linalg.matrix_rank(a)
# 3

# Perform SVD using NumPy
u, s, vh = np.linalg.svd(a, full_matrices=True)

# Output U matrix
u
# array([[ 0.8109,  0.0934, -0.5776],
#        [ 0.57  , -0.3493,  0.7437],
#        [ 0.1323,  0.9324,  0.3365]])

# Check that U is orthonormal
u.dot(u.T)
# array([[ 1.,  0.,  0.],
#        [ 0.,  1., -0.],
#        [ 0., -0.,  1.]])

# Output VH (V transpose) matrix
vh
# array([[ 0.2046,  0.3443,  0.5488,  0.7338],
#        [-0.2181,  0.6561,  0.438 , -0.5746],
#        [ 0.7598, -0.3431,  0.4167, -0.3625],
#        [-0.5774, -0.5774,  0.5774,  0.    ]])

# Check that VH is orthonormal
vh.dot(vh.T)
# array([[ 1.,  0., -0.,  0.],
#        [ 0.,  1.,  0.,  0.],
#        [-0.,  0.,  1.,  0.],
#        [ 0.,  0.,  0.,  1.]])

# Output the singular values
s
# array([6.7509, 1.1734, 0.2186])

# Make s into a 3x3 diagonal matrix
sd = np.diag(s)
# array([[6.7509, 0.    , 0.    ],
#        [0.    , 1.1734, 0.    ],
#        [0.    , 0.    , 0.2186]])

# Pad with a zero column so Sigma has the same 3x4 shape as a
b = np.zeros((3, 4))
b[:, :-1] = sd
sigma = b
sigma
# array([[6.7509, 0.    , 0.    , 0.    ],
#        [0.    , 1.1734, 0.    , 0.    ],
#        [0.    , 0.    , 0.2186, 0.    ]])

# Reconstruct the original matrix
print(np.dot(np.dot(u, sigma), vh))
# [[1. 2. 3. 4.]
#  [1. 1. 2. 3.]
#  [0. 1. 1. 0.]]

We padded Σ with a column of zeros so that it has the same 3×4 shape as the original matrix. Finally, we got our original matrix back!

Low-Rank Approximation Of A Matrix

Now let’s find a rank-2 approximation of the above matrix.

s = [6.7509, 1.1734, 0]  # third (smallest) singular value set to 0 for the rank-2 approximation
sd = np.diag(s)
sd
# array([[6.7509, 0.    , 0.    ],
#        [0.    , 1.1734, 0.    ],
#        [0.    , 0.    , 0.    ]])

# Pad with a zero column to match the 3x4 shape of a
b = np.zeros((3, 4))
b
# array([[0., 0., 0., 0.],
#        [0., 0., 0., 0.],
#        [0., 0., 0., 0.]])
b[:, :-1] = sd
sigma = b
sigma
# array([[6.7509, 0.    , 0.    , 0.    ],
#        [0.    , 1.1734, 0.    , 0.    ],
#        [0.    , 0.    , 0.    , 0.    ]])

# Reconstruct using only the top two singular values
rank2 = np.dot(np.dot(u, sigma), vh)
print(rank2)
# [[ 1.0959  1.9567  3.0526  3.9542]
#  [ 0.8764  1.0558  1.9322  3.0589]
#  [-0.0559  1.0252  0.9693  0.0267]]

# Check its rank
np.linalg.matrix_rank(rank2)
# 2

# Compare the output with matrix a
print(a)
# [[1 2 3 4]
#  [1 1 2 3]
#  [0 1 1 0]]
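As a quick sanity check (this addition reuses the a and rank2 arrays from the code above), the Frobenius-norm error of the rank-2 approximation should match the singular value we dropped, which is what the Eckart–Young result for best low-rank approximations predicts:

# Frobenius-norm distance between the original matrix and its rank-2 approximation
print(np.linalg.norm(a - rank2, 'fro'))           # roughly 0.2186

# It matches the smallest singular value of a, the one we replaced with zero
print(np.linalg.svd(a, compute_uv=False)[-1])     # roughly 0.2186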

The result above approximates the rank-3 matrix a with a rank-2 matrix! We can compare the output with matrix a. I hope this was helpful; for more details, check out the additional resources below. The main concept you should take from this tutorial is the role of the diagonal matrix Σ and how it captures the variance along each direction. This will be useful in the next tutorial on PCA. If you have any queries, please drop your questions here: Linear Algebra.

Footnotes And Sources

  1. Singular Value Decomposition, Andrew Lounsbury, Department of Mathematics, Tennessee Technological University [Link]
  2. In Vivo Dynamic Image Characterization of Brain Tumor Growth Using Singular Value Decomposition and Eigenvalues [Link]
  3. Singular Value Decomposition, Andrew Lounsbury, Department of Mathematics, Tennessee Technological University [Link]

Other sources and further readings:

  1. Singular Value Decomposition: Applications to Image Processing, Elizabeth A. Compton and Stacey L. Ernstberger, PhD [Link]
  2. The Extraordinary SVD, Carla D. Martin and Mason A. Porter [Link]
  3. Advanced SVD Applications [Link]
Tags: Low Rank Approximation, Matrix Factorization, SVD