Developing successful machine learning applications requires a substantial amount of experience and state-of-the-art knowledge. Designing and implementing predictive models is often a slow "trial and error" process that becomes more efficient with the expertise of the machine learning engineers involved.
In this article, I want to share some lessons that machine learning researchers and practitioners have learned over the years: important issues to focus on and answers to common questions. These lessons are extremely useful when thinking about how to tackle your next machine learning problem.
The combination of representation, evaluation and optimization is what machine learning is all about. A classifier or a regressor must be represented in a formal language that a computer can handle. An evaluation function is then needed to distinguish good classifiers from bad ones. Finally, we need a method to search among the candidate models for the highest-scoring one. The choice of optimization technique is key to the efficiency of the learner, and it also helps determine which classifier is produced if the evaluation function has more than one optimum.
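To make the three components concrete, here is a minimal sketch on hypothetical toy data: the representation is a family of one-dimensional threshold rules, the evaluation function is plain accuracy, and the optimization is an exhaustive search over candidate thresholds.

```python
def predict(t, x):
    """Representation: the model family is the set of rules 'predict 1 iff x >= t'."""
    return 1 if x >= t else 0

def accuracy(t, data):
    """Evaluation: fraction of examples the rule classifies correctly."""
    return sum(predict(t, x) == y for x, y in data) / len(data)

# Toy training data: the label is 1 exactly when x >= 5.
data = [(x, 1 if x >= 5 else 0) for x in range(10)]

# Optimization: pick the highest-scoring threshold among the candidates.
best_t = max(range(11), key=lambda t: accuracy(t, data))
print(best_t, accuracy(best_t, data))  # -> 5 1.0
```

Real learners differ only in scale: the representation might be a neural network, the evaluation a loss function, and the optimization gradient descent, but the same three roles are always present.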
The fundamental goal of machine learning is to generalize beyond the examples in the training set. After all, no matter how much data we have, it is very unlikely that we will see those exact examples again in a production environment. The most common mistake among machine learning beginners is to evaluate on the training data and get a false impression of the predictive model's capabilities. If the chosen classifier is then tested on new data, it is often no better than random guessing. So be sure to hold out some of the data and test the final classifier on it.
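A tiny sketch of why training accuracy is misleading, on hypothetical toy data: a 1-nearest-neighbor model memorizes noisy training labels, so it scores perfectly on the data it has seen, yet those memorized noise labels cost it accuracy on held-out points.

```python
def nn_predict(train, x):
    """1-nearest-neighbor: predict the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def acc(train, data):
    return sum(nn_predict(train, x) == y for x, y in data) / len(data)

# True rule: label is 1 iff x >= 5, but two training labels are flipped (noise).
train = [(float(x), 1 if x >= 5 else 0) for x in range(10)]
train[3] = (3.0, 1)  # noisy label
train[7] = (7.0, 0)  # noisy label

# Held-out points near (but not at) the training points, labeled by the clean rule.
test = [(i + 0.1, 1 if i + 0.1 >= 5 else 0) for i in range(10)]

print(acc(train, train))  # 1.0 -- the model memorized the noise perfectly
print(acc(train, test))   # 0.8 -- the memorized noise hurts on unseen data
```

The gap between the two numbers is exactly what a held-out test set is designed to expose.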
It is no secret how time-consuming it is to gather, integrate, clean and pre-process data, and how much trial and error can go into feature design. Machine learning is not a one-time process of building a dataset and running a learner, but rather an iterative process of running the learner, analyzing the results, modifying the data and/or the learner, and repeating. Feature engineering is the more difficult part because it is domain-specific, while learners can be largely general-purpose and are available in well-known libraries. Good feature engineering often leads to better model performance because the information is better represented, while swapping between similar "cutting-edge" frameworks rarely boosts prediction accuracy.
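Here is a hedged toy illustration of that point, on hypothetical 2-D data: no threshold on the raw x coordinate can separate points near the origin from points far away, but a single engineered feature, the squared radius, makes the same simple model family separate them perfectly.

```python
def best_threshold_acc(values, labels):
    """Best accuracy of any rule 'predict 1 iff value <= t' (or its flipped version)."""
    best = 0.0
    for t in sorted(set(values)) + [float("inf")]:
        acc_le = sum((v <= t) == bool(y) for v, y in zip(values, labels)) / len(labels)
        best = max(best, acc_le, 1 - acc_le)
    return best

# Hypothetical 2-D points: label 1 for points near the origin, 0 for far ones.
points = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1),   # label 1 (inside)
          (3, 0), (-3, 0), (0, 3), (0, -3)]           # label 0 (outside)
labels = [1] * 5 + [0] * 4

raw_x = [p[0] for p in points]                  # raw coordinate feature
r2 = [p[0] ** 2 + p[1] ** 2 for p in points]    # engineered feature: squared radius

print(best_threshold_acc(raw_x, labels))  # about 0.67: x alone cannot separate
print(best_threshold_acc(r2, labels))     # 1.0: one threshold on r2 separates
```

The learner never changed; only the input representation did, which is the essence of feature engineering.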
You have probably heard the phrase "Every function can be represented, or approximated arbitrarily closely, using this representation". However, just because a function can be represented does not mean it can be learned. For example, a standard decision tree learner cannot learn trees with more leaves than training examples. Furthermore, if the hypothesis space has many local optima of the evaluation function, as is often the case, the learner may not find the true function even if it is representable. Given finite data, time and memory, standard learners can learn only a tiny subset of all possible functions, and these subsets are different for learners with different representations. Therefore the key question is not "Can it be represented?", to which the answer is often trivial, but "Can it be learned?" And it pays to try different learners (and possibly combine them).
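To see how local optima get in the way of the search, here is a toy sketch (not tied to any particular learner, and with a made-up loss function): gradient descent on a simple nonconvex loss ends up in different optima depending on where the search starts, so the better solution may never be found.

```python
def grad_descent(x, lr=0.01, steps=2000):
    """Minimize f(x) = (x^2 - 1)^2 + 0.2*x, a toy loss with two local minima."""
    for _ in range(steps):
        grad = 4 * x * (x ** 2 - 1) + 0.2  # derivative of f
        x -= lr * grad
    return x

x_left = grad_descent(-1.5)   # converges to the minimum near x = -1 (the global one)
x_right = grad_descent(1.5)   # converges to the worse local minimum near x = +1
print(x_left, x_right)        # two different answers from the same loss
```

Random restarts and trying multiple learners are the practical defenses against exactly this behavior.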
Consider a scenario in which you have engineered brilliant features, but the model is still not improving enough. There are two main choices at hand: design a better learning algorithm, or gather more data. As a rule of thumb, a dumb algorithm with lots and lots of data beats a clever one with modest amounts of it. Most machine learning models essentially work by grouping nearby examples into the same class; the key concept when designing clever models is the meaning of "nearby". With non-uniformly distributed data, different algorithms can produce different decision boundaries while still making the same predictions in the most common regions of the sample space.
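A hedged toy sketch of "more data beats a cleverer algorithm": the same simple 1-nearest-neighbor rule, trained on ten versus a hundred points from a hypothetical threshold concept, behaves very differently near the decision boundary. The extra data sharpens the learned notion of "nearby" exactly where it matters.

```python
def nn_predict(train, x):
    """1-nearest-neighbor: predict the label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def make_train(n):
    """n evenly spaced points labeled by the true rule: 1 iff x >= 0.5."""
    return [(i / n, 1 if i / n >= 0.5 else 0) for i in range(n)]

# Held-out points, several deliberately close to the true boundary at 0.5.
test = [(x, 1 if x >= 0.5 else 0) for x in (0.2, 0.46, 0.47, 0.48, 0.55, 0.8)]

for n in (10, 100):
    train = make_train(n)
    acc = sum(nn_predict(train, x) == y for x, y in test) / len(test)
    print(n, acc)  # 10 -> 0.5, 100 -> 1.0 on this toy test set
```

The algorithm never got smarter; the denser training set simply placed examples closer to every query point.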
If you found these five points useful, I encourage you to check out the excellent article by Professor Pedro Domingos of the University of Washington, "A Few Useful Things to Know about Machine Learning", which covers these lessons and several more in depth. See you in my next post!