Summary of “Machine Learning by Andrew Ng”


I found this course through Zhihu, where lots of people recommend it as “the best way to start with Machine Learning”. I spent two weeks finishing it, and I agree: it's a great course! So here’s the link: It won’t take much of your time (about 50 hours is enough), but it will lead you to a new world.

Problems in this course

According to Wikipedia, Machine Learning problems fall into 5 subfields:
1. Classification: To assign inputs to known classes.
2. Regression: To estimate the relationships among variables.
3. Clustering: To divide inputs into groups. Unlike in classification, the groups are not known beforehand.
4. Density estimation: To find the distribution of inputs in some space.
5. Dimensionality reduction: To simplify inputs by mapping them into a lower-dimensional space.

This course covers all 5 of these topics.

Algorithms in this course

  1. Gradient Descent: A powerful algorithm for Classification Problems and (Linear) Regression Problems. It uses the derivative of the cost function to move the parameters, step by step, toward a minimum of that cost function.
  2. Stochastic Gradient Descent: A variant of Gradient Descent that updates the parameters using one training example at a time. On a large amount of data it’s much faster than Gradient Descent, but its updates are noisy, so it tends to wander around the minimum instead of settling exactly on it.
  3. Mini-Batch Gradient Descent: A variant of Gradient Descent that updates the parameters using a small batch of examples. A single iteration takes less time than in Gradient Descent but more than in Stochastic Gradient Descent, and its updates are less noisy than those of Stochastic Gradient Descent. You can regard this algorithm as a compromise between the original Gradient Descent and Stochastic Gradient Descent.
  4. Collaborative Filtering: An algorithm often used in Recommender Systems. It learns the item features and the user parameters at the same time, and its cost function can be minimized with Gradient Descent.
  5. Normal Equation: A great way to solve Linear Regression Problems. It computes the parameters that minimize the cost function directly, in closed form, with no iterations and no learning rate to choose. It requires the inverse of an $n \times n$ matrix, where $n$ is the number of features, which takes about $O(n^3)$ time. So this algorithm becomes slow on datasets with a very large number of features.
  6. Support Vector Machine (SVM): A powerful tool for Classification Problems and (Linear) Regression Problems. In this course, Andrew explains its application to classification problems, where it can be described as a Large Margin Classifier. Furthermore, the cost function of an SVM is convex, so the optimization won’t get trapped in a local optimum. Moreover, with the “Kernel trick”, it can fit nonlinear hypotheses well.
  7. Neural Network (Backpropagation): One of the most popular algorithms in Machine Learning. Neural networks are loosely modeled on the brain, so some people believe they are the most likely way to build strong AI. Backpropagation computes the derivatives of the cost function with respect to every weight, and an optimizer such as Gradient Descent then uses them to minimize the cost function. It’s easy to learn, and it performs well on many problems.
  8. K-Means Algorithm: This algorithm tries to find patterns in data by itself. It divides the data into groups that are not known beforehand. It’s useful in data analysis.
  9. (Multivariate) Gaussian Distribution Algorithm: An algorithm based on the Gaussian Distribution for solving Density Estimation Problems. Widely used in Anomaly Detection.
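
To make the contrast between Gradient Descent and the Normal Equation concrete, here is a minimal sketch in Python for linear regression with a single feature. It is not code from the course (the course exercises use Octave); the function names, the toy data, the learning rate `alpha` and the iteration count are all assumptions chosen for illustration.

```python
def gradient_descent(xs, ys, alpha=0.1, iterations=5000):
    """Batch gradient descent for the hypothesis h(x) = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        # Prediction error on every training example (the whole batch).
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the squared-error cost J = sum(err^2) / (2m).
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: step against the gradient.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


def normal_equation(xs, ys):
    """Solve (X^T X) theta = X^T y directly, where each row of X is [1, x]."""
    m = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # X^T X = [[m, sx], [sx, sxx]] is 2x2, so it can be inverted by hand.
    det = m * sxx - sx * sx
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (m * sxy - sx * sy) / det
    return theta0, theta1


# Toy data (made up for this sketch) generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

gd_theta = gradient_descent(xs, ys)
ne_theta = normal_equation(xs, ys)
print("gradient descent:", gd_theta)
print("normal equation: ", ne_theta)
```

Both functions should arrive at roughly the same parameters here. The difference is in how they get there: Gradient Descent needs a learning rate and many iterations, while the Normal Equation needs neither but has to invert a matrix whose size grows with the number of features.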

Useful Tricks

  1. Feature Scaling: Rescale the features to similar ranges so that algorithms work better. Widely used with Gradient Descent, which converges faster on scaled data, and with other algorithms.
  2. One-vs-All: This trick lets you turn a two-class classifier into a multi-class classifier with very little modification: train one classifier per class and pick the class whose classifier is most confident.
  3. Regularization: One of the most useful ways to address overfitting problems.
  4. Gradient Check: An easy numerical way to check whether your implementation of the cost function and its gradient is bug-free.
  5. Random Initialization: A necessary part of training a Neural Network, because it breaks the symmetry between units. Running the algorithm several times from different random initializations is also a good way to increase the chance of finding the global optimum rather than a local optimum.
  6. Train/Validation/Test set: A way to split your dataset. It’s used with almost every algorithm.
  7. Learning Curve: A good way to evaluate your algorithm. It can help you decide how to improve it.
  8. Precision/Recall/$F_1$ Score: A good way to evaluate your algorithm, especially when your dataset is skewed.
  9. Principal Component Analysis (PCA): A good way to compress your data. It reduces the number of features (dimensions), which can speed up your algorithm. It can also help you visualize your data.
  10. Ceiling Analysis: A way to analyze the pipeline of your Machine Learning system. It can help you decide which component is most worth optimizing.
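
Two of the tricks above are short enough to sketch in a few lines of Python. The sketch below is only an illustration under stated assumptions: the helper names, the toy cost function, the step size `eps` and the labels are all made up for this example and are not taken from the course.

```python
def numerical_gradient(cost, theta, eps=1e-4):
    """Gradient Check: approximate each partial derivative of `cost` at
    `theta` with a two-sided (central) difference."""
    grads = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grads.append((cost(plus) - cost(minus)) / (2 * eps))
    return grads


def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 score for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Gradient Check on a toy cost whose gradient is easy to derive by hand.
def toy_cost(theta):
    return theta[0] ** 2 + 3 * theta[0] * theta[1]


def toy_gradient(theta):
    return [2 * theta[0] + 3 * theta[1], 3 * theta[0]]


point = [1.0, 2.0]
approx = numerical_gradient(toy_cost, point)
exact = toy_gradient(point)
print("numerical:", approx, "analytic:", exact)

# Skewed labels: always predicting 0 is 70% accurate here, yet its F1 is 0.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
some_positive = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(precision_recall_f1(y_true, always_negative))
print(precision_recall_f1(y_true, some_positive))
```

If the numerical and analytic gradients disagree by more than a tiny amount, the gradient code most likely has a bug. The second half shows why accuracy alone is misleading on a skewed dataset: a classifier that never predicts the positive class still looks accurate, but its recall and $F_1$ score are zero.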

Important Ideas

  1. Build a naive system as fast as possible. Optimize your system later.
  2. Do analyze your system. Let the results of the analysis, not intuition, tell you what to do next.
