Math for Data Science

Code and supporting files at mathdatasciencebook.github.io
Includes nine appendices covering background material
Includes 472 exercises

⁂

Math for Data Science is a comprehensive treatment that is accessible to beginners without sacrificing either precision or depth.

Starting from datasets and the question of what it means to average, the text develops linear geometry, principal components, calculus, probability, and statistics in a unified geometric framework, culminating in machine learning. Concepts are motivated by data, theorems are proved from first principles, and proofs are accompanied by Python code.

Suitable for courses in applied mathematics, business analytics, computer science, data science, and engineering, the text supports a one-semester course through the principal components chapter, a one-semester course through the machine learning chapter, or both semesters taught simultaneously.

⁂

The approach is geometric throughout. Eigenvectors are maximum variance directions. The pseudoinverse is the "closest twice" solution. The singular value decomposition is rotation, scaling, rotation. Information and the cumulant-generating function are convex duals. Trainability — the question of whether gradient descent converges in machine learning — is characterized precisely in terms of the geometry of convex hulls, and SGD is carefully derived for strongly convex mean losses.
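Two of these geometric statements can be checked numerically. The following is a minimal sketch, not taken from the text: the dataset, the seed, the matrix A, and the helper projected_variance are all hypothetical choices made for illustration. It assumes "eigenvectors" refers to eigenvectors of the variance matrix of a centered dataset, and it reads "rotation" loosely as an orthogonal matrix, which may also include a reflection.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, chosen only for reproducibility

# Hypothetical dataset: 500 points in the plane, stretched along one axis, then rotated.
N = 500
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = (rng.normal(size=(N, 2)) * np.array([3.0, 0.5])) @ R.T
X = X - X.mean(axis=0)  # center the data at its mean

# Variance matrix of the centered data and its eigen-decomposition.
Q = X.T @ X / N
eigvals, eigvecs = np.linalg.eigh(Q)  # eigenvalues in ascending order
top = eigvecs[:, -1]                  # unit eigenvector for the largest eigenvalue


def projected_variance(u):
    """Variance of the centered data projected onto the unit vector u."""
    return float(np.mean((X @ u) ** 2))


# Compare the top eigenvector against many random unit directions.
angles = rng.uniform(0.0, 2.0 * np.pi, size=1000)
directions = np.column_stack([np.cos(angles), np.sin(angles)])
best_random = max(projected_variance(u) for u in directions)
top_variance = projected_variance(top)

# SVD of a 2x2 matrix: A = U diag(s) Vt, with U and Vt orthogonal and s nonnegative.
A = np.array([[2.0, 1.0],
              [0.0, 1.0]])
U, s, Vt = np.linalg.svd(A)
reconstructed = U @ np.diag(s) @ Vt

print("variance along top eigenvector:", top_variance)
print("best variance among random directions:", best_random)
print("A recovered from its three factors:", np.allclose(reconstructed, A))
```

The variance along the top eigenvector equals the largest eigenvalue of Q, and no sampled direction should exceed it. The three SVD factors multiply back to A, with the orthogonal factors on the outside and the scaling in the middle.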

Probability is developed from first principles, with the binomial, normal, and chi-squared distributions derived rather than assumed. The central limit theorem is proved via moment-generating functions, MLE consistency is derived using information and surprisal, and entropy and information are treated with the same geometric care as the rest of the text.

Nine appendices cover the supporting mathematics — permutations and combinations, the binomial theorem, the exponential, two-dimensional geometry, complex numbers, integration, asymptotics, existence of minimizers, and SQL — allowing instructors to tailor coverage to their audience.