What Is Quick Matrix Multiplication? – Nick Higham

September 13, 2022

178

The definition of matrix multiplication says that for $ntimes n$ matrices $A$ and $B$ , the product $C = AB$ is given by $c_{ij} = sum_{k=1}^n a_{ik}b_{kj}$ . Every component of $C$ is an inside product of a row of $A$ and a column of $B$ , so if this components is used then the price of forming $C$ is $n^2(2n-1)$ additions and multiplications, that’s, $O(n^3)$ operations. For over a century after the event of matrix algebra within the 1850s by Cayley, Sylvester and others, all strategies for matrix multiplication had been primarily based on this components and required $O(n^3)$ operations.

In 1969 Volker Strassen confirmed that when $n=2$ the product may be computed from the formulation

$notag begin{gathered} p_1 =(a_{11}+a_{22})(b_{11}+b_{22}), p_2=(a_{21}+a_{22})b_{11}, qquad p_3=a_{11}(b_{12}-b_{22}), p_4=a_{22}(b_{21}-b_{11}), qquad p_5=(a_{11}+a_{12})b_{22}, p_6=(a_{21}-a_{11})(b_{11}+b_{12}), qquad p_7=(a_{12}-a_{22})(b_{21}+b_{22}), noalign{smallskip} C = begin{bmatrix} p_1+p_4-p_5+p_7 & p_3+p_5 p_2+p_4 & p_1+p_3-p_2+p_6 end{bmatrix}. end{gathered}$

The analysis requires $7$ multiplications and $18$ additions as an alternative of $8$ multiplications and $4$ additions for the standard formulation.

At first sight, Strassen’s formulation might seem merely to be a curiosity. Nevertheless, the formulation don’t depend on commutativity so are legitimate when the $a_{ij}$ and $b_{ij}$ are matrices, by which case for big dimensions the saving of 1 multiplication tremendously outweighs the additional $14$ additions. Assuming $n$ is an influence of $2$ , we are able to partition $A$ and $B$ into 4 blocks of dimension $n/2$ , apply Strassen’s formulation for the multiplication, after which apply the identical formulation recursively on the half-sized matrix merchandise.

Allow us to look at the variety of multiplications for the recursive Strassen algorithm. Denote by $M(k)$ the variety of scalar multiplications required to multiply two $2^k times 2^k$ matrices. We’ve got $M(k) = 7M(k-1)$ , so

$M(k) = 7M(k-1) = 7^2M(k-2) = cdots = 7^k M(0) = 7^k.$

However $7^k = 2^{log_2{7^k}} = (2^k)^{log_2 7} = n^{log_2 7} = n^{2.807dots}$ . The variety of additions may be proven to be of the identical order of magnitude, so the algorithm requires $O(n^{log_27})=O(n^{2.807dots})$ operations.

Strassen’s work sparked curiosity to find matrix multiplication algorithms of even decrease complexity. Since there are $O(n^2)$ components of knowledge, which should every take part in at the very least one operation, the exponent of $n$ within the operation depend have to be at the very least $2$ .

The present document higher certain on the exponent is $2.37286$ , proved by Alman and Vassilevska Williams (2021) which improved on the earlier document of $2.37287$ , proved by Le Gall (2014) The next determine plots the very best higher certain for the exponent for matrix multiplication over time.

Within the strategies that obtain exponents decrease than 2.775, numerous intricate methods are used, primarily based on representing matrix multiplication by way of bilinear or trilinear varieties and their illustration as tensors having low rank. Laderman, Pan, and Sha (1993) clarify that for these strategies “very giant overhead constants are hidden within the ` $O$ ‘ notation”, and that the strategies “enhance on Strassen’s (and even the classical) algorithm just for immense numbers $N$ .”

Strassen’s methodology, when rigorously applied, may be sooner than standard matrix multiplication for cheap dimensions. In apply, one doesn’t recur all the way down to $1times 1$ matrices, however moderately makes use of standard multiplication as soon as $n_0times n_0$ matrices are reached, the place the parameter $n_0$ is tuned for the very best efficiency.

Strassen’s methodology has the downside that it satisfies a weaker type of rounding error certain that standard multiplication. For standard multiplication $C = AB$ of $Ainmathbb{R}^{mtimes n}$ and $Binmathbb{R}^{ntimes p}$ we now have the componentwise certain

$notag |C - widehat{C}| le gamma^{}_n |A| @ |B|, qquad(1)$

the place $gamma^{}_n = nu/(1-nu)$ and $u$ is the unit roundoff. For Strassen’s methodology we now have solely a normwise error certain. The next outcome makes use of the norm $|A| = max_{i,j} |a_{ij}|$ , which isn’t a constant norm.

Theorem 1 (Brent). Let $A,B in mathbb{R}^{ntimes n}$ , the place $n=2^k$ . Suppose that $C=AB$ is computed by Strassen’s methodology and that $n_0=2^r$ is the brink at which standard multiplication is used. The computed product $widehat{C}$ satisfies

$notag |C - widehat{C}| le left[ Bigl( displaystylefrac{n}{n_0} Bigr)^{log_2{12}} (n_0^2+5n_0) - 5n right] u |A|, |B| + O(u^2). qquad(2)$

With full recursion ( $n_0=1$ ) the fixed in (2) is $6 n^{log_2 12}-5n approx 6 n^{3.585}-5n$ , whereas with only one stage of recursion ( $n_0 = n/2$ ) it’s $3n^2+25n$ . These evaluate with $n^2u + O(u^2)$ for standard multiplication (obtained by taking norms in (1)). So the fixed for Strassen’s methodology grows at a sooner price than that for standard multiplication it doesn’t matter what the selection of $n_0$ .

The truth that Strassen’s methodology doesn’t fulfill a componentwise error certain is a big weak point of the strategy. Certainly Strassen’s methodology can not even precisely multiply by the identification matrix. The product

$notag C = begin{bmatrix} 1 & 0 0 & 1 end{bmatrix} begin{bmatrix} 1 & epsilon epsilon & epsilon^2 end{bmatrix}, quad 0 < epsilon ll 1$

is evaluated precisely in floating-point arithmetic by standard multiplication, however Strassen’s methodology computes

$c_{22} = 2(1+epsilon^2) + (epsilon-epsilon^2) - 1 - (1+epsilon).$

As a result of $c_{22}$ includes subterms of order unity, the error $c_{22} - widehat c_{22}$ will likely be of order $u$ . Thus the relative error $|c_{22} - widehat c_{22}| / |c_{22}| = O(u/epsilon^2) gg O(u)$ ,

One other weak point of Strassen’s methodology is that whereas the scaling $AB to (AD) (D^{-1}B)$ , the place $D$ is diagonal, leaves (1) unchanged, it may well alter (2) by an arbitrary quantity. Dumitrescu (1998) suggests computing $D_1(D_1^{-1}Acdot B D_2^{-1})D_2$ , the place the diagonal matrices $D_1$ and $D_2$ are chosen to equilibrate the rows of $A$ and the columns of $B$ within the $infty$ -norm; he reveals that this scaling can enhance the accuracy of the outcome. Additional investigations alongside these strains are made by Ballard et al. (2016).

Ought to one use Strassen’s methodology in apply, assuming that an implementation is obtainable that’s sooner than standard multiplication? Not if one wants a componentwise error certain, which ensures correct merchandise of matrices with nonnegative entries and ensures that the column scaling of $A$ and row scaling of $B$ has no impact on the error. But when a normwise error certain with a sooner rising fixed than for standard multiplication is suitable then the strategy is price contemplating.

Notes

For current work on high-performance implementation of Strassen’s methodology see Huang et al. (2016, 2020).

Theorem 1 is from an unpublished technical report of Brent (1970). A proof may be present in Higham (2002, §23.2.2).

For extra on quick matrix multiplication see Bini (2014) and Higham (2002, Chapter 23).

References

It is a minimal set of references, which comprise additional helpful references inside.

Josh Alman and Virginia Vassilevska Williams. A refined laser methodology and sooner matrix multiplication. In Proceedings of the 2021 ACM-SIAM Symposium on Discrete Algorithms (SODA), Society for Industrial and Utilized Arithmetic, January 2021, pages 522–539.
Gray Ballard, Austin R. Benson, Alex Druinsky, Benjamin Lipshitz, and Oded Schwartz. Enhancing the numerical stability of quick matrix multiplication. SIAM J. Matrix Anal. Appl, 37(4):1382–1418, 2016.
Benson, Alex Druinsky, Benjamin Lipshitz, and Oded Schwartz. Enhancing the numerical stability of quick matrix multiplication. SIAM J. Matrix Anal. Appl., 37(4):1382–1418, 2016.
Dario A. Bini. Quick matrix multiplication. In Handbook of Linear Algebra, Leslie Hogben, editor, second version, Chapman and Corridor/CRC, Boca Raton, FL, USA, 2014, pages 61.1–61.17.
Bogdan Dumitrescu. Enhancing and estimating the accuracy of Strassen’s algorithm. Numer. Math., 79:485–499, 1998.
Nicholas J. Higham, Accuracy and Stability of Numerical Algorithms, second version, Society for Industrial and Utilized Arithmetic, Philadelphia, PA, USA, 2002.
Jianyu Huang, Tyler M. Smith, Greg M. Henry, and Robert A. van de Geijn. Strassen’s algorithm reloaded. In SC16: Worldwide Convention for Excessive Efficiency Computing, Networking, Storage and Evaluation, IEEE, November 2016.
Jianyu Huang, Chenhan D. Yu, and Robert A. van de Geijn. Strassen’s algorithm reloaded on GPUs, ACM Trans. Math. Software program, 46(1):1:1–1:22, 2020.
Julian Laderman, Victor Pan, and Xuan-He Sha. On sensible algorithms for accelerated matrix multiplication. Linear Algebra Appl., 162–164:557–588, 1992.
François Le Gall. Powers of tensors and quick matrix multiplication. In Proceedings of the thirty ninth Worldwide Symposium on Symbolic and Algebraic Computation, 2014, pages 296–303.

This text is a part of the “What Is” sequence, out there from https://nhigham.com/class/what-is and in PDF kind from the GitHub repository https://github.com/higham/what-is.

Previous articlePowerShell HashTable – Every thing it is advisable to know — LazyAdmin

Next articleWebKit Options in Safari 16.0 | CSS-Methods

What Is Quick Matrix Multiplication? – Nick Higham

Notes

References

Startup Shorts: Flux Marine Redefines Marine Propulsion By means of Electrification » Startups, Accelerators, & Entrepreneurs

Spinoff of state ‘1’ in block ‘X/Y/Integrator’ at time 0.55 will not be finite » Man on Simulink

A Treacherous SVD » Cleve’s Nook: Cleve Moler on Arithmetic and Computing

LEAVE A REPLY Cancel reply

Most Popular

Rogier de Boevé’s Portfolio 2024

How a lot AI compute to match humanity’s collective mind compute? A mind-boggling comparability – Be on the Proper Facet of Change

Merge Type in C Program [Full Guide]

JavaScript Weekly Difficulty 698: July 25, 2024

Recent Comments

ABOUT US

POPULAR POSTS

Rogier de Boevé’s Portfolio 2024

How a lot AI compute to match humanity’s collective mind compute? A mind-boggling comparability – Be on the Proper Facet of Change

Merge Type in C Program [Full Guide]

POPULAR CATEGORY