Reduce Docker Image Size for Machine Learning
In my previous blog post, I proposed a way to easily run large-scale machine learning tasks in the cloud using Docker containers and Azure Batch. I also use this approach at work for some of my projects. One thing I have come to realize is that the size of the container image can grow very quickly as we add more functionality to the ML training task. Using open source tools such as scikit-learn, nltk, etc. brings additional dependencies into the container image. For example, some of us may use Miniconda, but it can easily add a few hundred MB to the Docker container image. The Ubuntu 16.04 base image is about 120MB, but very quickly I saw my container image size go beyond 1GB, then 3GB after installing some other tools. ...