Mathematics for AI 2

Times: 5 sessions

Format: On-demand

Presented by: AIC

(1) Purpose and content of the course

This course is a continuation of “Mathematics for AI (1),” which is offered in the first half of the spring (fall) semester. In course (1), students learned the basics of matrices; in course (2), the ways matrices are used in AI will be discussed in depth. Matrices appear not only in AI but in many other areas as well. For example, they are needed to solve extreme value problems for bivariate functions. Each session of this course covers a different topic, and each topic is very important in machine learning.

(2) Contents of each session

Session 1: Extreme value problems of multivariate functions (1)

When training AI models, we often try to maximize or minimize a function. The maximization/minimization problem is difficult, however, because it requires knowing how the function behaves everywhere. We therefore consider the extreme value problem, a slightly simplified version of the maximization/minimization problem. Extreme values can be found mechanically with differentiation, using only information about the function near a point, and this procedure is used constantly when training AI models.
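
For illustration, here is a minimal sketch of that mechanical procedure, assuming Python with SymPy (neither is prescribed by the course, and the function is a toy example): find the critical points of a bivariate function from its gradient, then classify them with the Hessian matrix.

import sympy as sp

x, y = sp.symbols("x y", real=True)
f = x**3 - 3*x*y + y**3                      # toy example function

# Critical points: where both partial derivatives vanish.
grad = [sp.diff(f, v) for v in (x, y)]
critical_points = sp.solve(grad, (x, y), dict=True)

# Classify each critical point with the Hessian (a matrix!).
H = sp.hessian(f, (x, y))
for pt in critical_points:
    H_at = H.subs(pt)
    det, fxx = H_at.det(), H_at[0, 0]
    if det > 0:
        kind = "local minimum" if fxx > 0 else "local maximum"
    elif det < 0:
        kind = "saddle point"
    else:
        kind = "inconclusive (second-derivative test fails)"
    print(pt, "->", kind)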

Session 2: Differentiation of vectors and matrices

Until now, high school mathematics and other courses have dealt with derivatives of real-valued functions; this time we will learn about derivatives with respect to vectors and matrices. This topic appears frequently in machine learning, so it is likely to come up again in your future studies. The lecture proceeds in the order of definitions of the operations, formulas, and proofs. At the end of the lecture, the least-squares estimator of the linear regression model will be derived using the methods learned.
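
As a preview of that final result, here is a minimal sketch, assuming Python with NumPy and invented toy data (the course prescribes neither), of the least-squares estimator beta_hat = (X^T X)^{-1} X^T y, which is obtained by setting the matrix derivative of the squared error ||y - X beta||^2 to zero.

import numpy as np

rng = np.random.default_rng(0)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # design matrix with intercept
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)           # noisy observations

# Solve the normal equations X^T X beta = X^T y (closed form from matrix calculus).
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)  # close to beta_true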

Session 3: Principal Component Analysis

Principal component analysis is a data analysis method from statistics. It gives a quantitative answer to the question of which viewpoint to take when looking at data containing a large amount of information. Since this topic is closely connected to eigenvalues, please review that part of “Mathematics for AI (1)” carefully.
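
A minimal sketch of the connection to eigenvalues, assuming Python with NumPy and toy data invented for illustration: PCA diagonalizes the sample covariance matrix, and the eigenvectors give the viewpoints along which the data varies the most.

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 5))               # 200 samples, 5 features (toy data)
centered = data - data.mean(axis=0)
cov = centered.T @ centered / (len(data) - 1)  # sample covariance matrix

# Eigenvectors = principal directions; eigenvalues = variance along them.
eigvals, eigvecs = np.linalg.eigh(cov)         # eigh: covariance matrix is symmetric
order = np.argsort(eigvals)[::-1]              # sort by variance, largest first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = centered @ eigvecs[:, :2]             # project onto the top two components
print(eigvals / eigvals.sum())                 # proportion of variance explained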

Session 4: Markov Chain 

A Markov chain is, simply put, a model in which the state at time t+1 depends only on the state at time t. Although the concept originated in probability theory, it has also been applied in the field of AI. In particular, the diffusion model, which underlies image generation models such as Midjourney and Stable Diffusion that have attracted much attention in recent years, is a generative model built on Markov chains.
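
A minimal sketch of the defining property, assuming Python with NumPy (the two-state chain is an invented toy example): the next state is drawn from a distribution that depends only on the current state, encoded in a transition matrix.

import numpy as np

rng = np.random.default_rng(0)
# P[i, j] = probability of moving from state i to state j.
P = np.array([[0.9, 0.1],
              [0.4, 0.6]])

state = 0
trajectory = [state]
for _ in range(10):
    state = rng.choice(2, p=P[state])       # depends only on the current state
    trajectory.append(int(state))
print(trajectory)

# The distribution after t steps is the initial distribution times P^t.
dist = np.array([1.0, 0.0]) @ np.linalg.matrix_power(P, 50)
print(dist)                                 # approaches the stationary distribution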

Session 5: Kullback-Leibler divergence

The Kullback-Leibler divergence measures the difference between two probability distributions, and because of this property it appears in a variety of fields, including statistics, probability theory, information theory, and statistical mechanics as well as machine learning. Within machine learning it appears especially frequently in the context of variational autoencoders, which are used in recent image generation models.

In this lecture, we will first review the definition of a probability distribution. Next, the definition and properties of the Kullback-Leibler divergence will be presented, together with its relation to similar concepts appearing in various fields.
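
A minimal sketch of the definition for discrete distributions, D(p || q) = sum_i p_i * log(p_i / q_i), assuming Python with NumPy and two invented distributions:

import numpy as np

def kl_divergence(p, q):
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0                            # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

p = [0.5, 0.3, 0.2]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q))                  # nonnegative, zero only when p equals q
print(kl_divergence(q, p))                  # generally differs: it is not symmetric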
