Math for Data Science

Math for Data Science is a comprehensive treatment that is accessible to beginners while maintaining the precision and depth the subject demands.

Starting from datasets and the question of what it means to average, the text develops linear geometry, matrix decompositions, calculus, probability, and statistics in a unified geometric framework, culminating in machine learning. Concepts are motivated by data, theorems are proved from first principles, and proofs are accompanied by working Python code.

The approach is geometric throughout. Eigenvectors are maximum variance directions. The pseudoinverse is the "closest twice" solution. The singular value decomposition is rotation, scaling, rotation. Information and the cumulant-generating function are convex duals. Trainability — the question of whether gradient descent converges in machine learning — is characterized precisely in terms of the geometry of convex hulls.
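As a rough illustration of two of these geometric statements, the sketch below checks numerically that the top eigenvector of a dataset's variance matrix is a direction of maximum projected variance, and that the singular value decomposition factors that matrix into an orthogonal map, a scaling, and another orthogonal map. The dataset, the variable names, and the choice of NumPy are assumptions made for this example and are not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-D dataset: 500 points stretched along one axis, then rotated.
points = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])
angle = np.pi / 6
rotation = np.array([[np.cos(angle), -np.sin(angle)],
                     [np.sin(angle),  np.cos(angle)]])
data = points @ rotation.T
centered = data - data.mean(axis=0)

# Variance matrix of the centered dataset (dividing by N, an assumed convention).
Q = centered.T @ centered / len(centered)

# eigh returns eigenvalues in ascending order, so the last column
# of `eigenvectors` belongs to the largest eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(Q)
top = eigenvectors[:, -1]


def projected_variance(u):
    """Variance of the centered dataset projected onto the unit vector u."""
    return np.mean((centered @ u) ** 2)


# Scan unit vectors over a half circle and record the largest projected variance.
thetas = np.linspace(0.0, np.pi, 361)
candidates = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
best_scanned = max(projected_variance(u) for u in candidates)

print("variance along top eigenvector:", projected_variance(top))
print("largest eigenvalue:            ", eigenvalues[-1])
print("best variance found by scan:   ", best_scanned)

# SVD of the same matrix: orthogonal U, diagonal scaling s, orthogonal Vt.
# U and Vt may be rotations or reflections.
U, s, Vt = np.linalg.svd(Q)
reconstructed = U @ np.diag(s) @ Vt
```

The scan over angles is only a sanity check: no scanned direction should beat the top eigenvector, and the variance along that eigenvector should equal the largest eigenvalue up to rounding.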

Probability is developed from first principles, with the binomial, normal, and chi-squared distributions derived rather than assumed. The central limit theorem is proved via moment-generating functions, and entropy and information are treated with the same geometric care as the rest of the text.

Nine appendices cover the supporting mathematics — permutations and combinations, the binomial theorem, the exponential function, two-dimensional geometry, complex numbers, integration, asymptotics, existence of minimizers, and SQL — allowing instructors to tailor coverage to their audience.

Several results appear here for the first time: the trainability of logistic loss and its dichotomy with perceptron loss, matrix reduction as a unifying framework for EVD and SVD, and a four-way equivalence characterizing full-rank datasets in terms of linear algebra, span, hyperplanes, and convex geometry.

Designed for programs in Applied Mathematics, Business Analytics, Computer Science, Data Science, and Engineering, the text supports a one-semester course through the decomposition chapters, an advanced one-semester course through the machine learning chapters, or both courses taught simultaneously.