When you start learning and working on machine learning problems, you usually assume that the most important skill to master is training the best-performing model. You learn and apply feature engineering tricks, feature selection techniques and evaluation metrics to pick that model. A huge amount of literature, educational material, practical exercises and excellent libraries offers insight into the high-level implementation of every machine learning model.
In fact, finding and training the best machine learning model is the easiest part of the process. In supervised problems in particular, you get instant feedback on held-out test data, which makes evaluating model performance straightforward. Machine learning as a discipline has reached a very high level of maturity. Except for cutting-edge innovations in artificial neural networks (ANNs) and deep learning, most capable computer scientists have ready access to machine learning tools that solve simple problems with little effort.
In reality, the work of a data science engineering team goes well beyond pandas, scikit-learn and fancy Spark pipelines. The real complexity lies in translating real-world problems into prediction tasks, building trust in the prediction model and checking how well a dataset represents real-world behaviour. Additionally, teams must think about how wrong predictions might affect users and how users will behave when presented with those predictions. Lastly, they must know how to debug the model when something goes wrong. Ask yourself: is your static model going to perform adequately in a dynamic, real-world environment?
Domain knowledge, interpretability, social impact and understanding the role of data are the main issues. However, machine learning education and research are still heavily focused on “finding the best model”, overlooking the role of the data, human-machine interaction and the complex ways predictions interact with the real world. Nevertheless, the machine learning community is slowly shifting from a narrow perspective focused on model performance and scientific research to a broader understanding of real-world applicability.
When working on your machine learning model, even if it is not the best model available, you will succeed if you consider these aspects: