How to start with Machine Learning?

This article is always a WIP, but I have been asked this by so many juniors that I thought it would be better to write it down, so I don't forget any details. I gave a session [slides] at BITS Goa on the same topic. I will primarily be summarizing that session here, minus much of its BITS-specific advice.

Why ML (brief)?

I think a lot of people can motivate this much better than me and, let's be honest, if you didn't know, you wouldn't be reading this. My very first motivation for getting into ML was to land a high-paying job without preparing for a traditional coding interview. Over time, though, I really started to like the field, and I am now much more into research because it's fun.

I think that will always be a central motivation for most people. There are other reasons as well. I personally think that an ML job is much more fun than an equally paying SWE job: I have seen several top engineers take up ML roles, but have not seen the reverse at all. In my opinion, it's close to the level of dynamic research you would get to do in a quant role, but is much more approachable (at least for MLE roles). From an academic standpoint, it is the hottest field right now, with the most funding, the most attention, etc. Authors of the AlphaFold paper recently won the Nobel Prize in Chemistry.

I also believe that entry-level ML positions will become much harder to get in the coming years (like SWE roles), so now is a great time to get into the field. Finally, I think the field is still ripe for harvest, i.e., it will continue to be relevant in the near future. Please read the Caveats section before anything else.

Courses

The following parts are much less subjective but are still my opinions. For an up-to-date resource list, I would point anyone to SAiDL's latest assignment. I knew a bit of Python when I started, but anyone can pick it up along the way; any standard Python course would be adequate. I started with two Coursera courses, which I would highly recommend: [ML Specialization] and [DL Specialization]. I finished the ML specialization in about 10 days, but I was doing the course full-time. I think it is a great introduction and touches the surface of several algorithms and applications. The DL specialization took much more time (about 3 months), but I was doing it alongside a full semester's course load. (I especially enjoyed writing the BPTT for an LSTM layer in numpy.) This course is more technical and assumes prior knowledge of linear algebra and statistics. For both of these courses, though, I would say that undergraduate-level courses in these topics would be sufficient. You can start earlier as well; it might just take more time to learn the material and finish the course.

Next, along with the DL specialization, I was also doing Stanford's cs231n. Note that this course overlaps a lot with the DL specialization, but I would still recommend it because each has some edge over the other. For example, the DL specialization has case studies, which are important (I was in fact asked about one of them in an interview), while cs231n's later lectures cover topics that are not part of the specialization. Just skip a lecture if you think you know the content. cs231n is also updated regularly, so its final assignment is a must-do. I would urge anyone following this to do all the optional parts of every assignment in the specialization and in cs231n. I also made detailed lecture notes, and would ask you to do the same.

After these 3 courses, you should be sufficiently proficient in ML implementation to take on a substantial project, e.g., through Kaggle. So the next path I propose is not mandatory. It did, however, help me establish a solid background and also specialize in what was then a niche.

Extra

I did two more courses from Stanford Online on YouTube. The first one is cs229 (fall 2017 version, taught by Andrew Ng). This is a huge superset of the ML specialization and is really math-heavy, especially the assignments. I am taking (or took) 10-715 at CMU (the hardest intro to ML course there), and cs229 has significant overlap with it. This course truly created a very strong foundation, and I would totally recommend it to anyone who loves math and wants to get into theory research. Implementation-wise, this course is much easier; the coding assignments were not as hard as cs231n's. Before any interview, I make sure to refer back to my notes from cs231n and cs229 for a quick revision.

Next, I wanted to specialize after these much more "general" courses, and picked graphs because they seemed complicated and fun. So I did cs224w (fall 2021 version, which is on YouTube). I think this course is easier than cs231n or cs229. However, only do this course if you would like to (or think you might) specialize in this area. Otherwise, Stanford's cs224n (NLP with DL) or another course that matches your future niche would be better.

Capstone Project

If you followed through with the courses section, you probably know as much ML as any MLE would know in theory, and now it's more about applying it. Doing Kaggle competitions or a substantial research-oriented project is really important; otherwise most of the skills you just developed will take a backseat, as real-world projects are much more about application than discovery. In my very biased opinion, I would once again point you to any SAiDL assignment question; solving even one end-to-end would be pretty good! We put a lot of effort into these questions.

All of these components would really make you ready for a real-world project, either through research or an internship. If done properly, all of this can be finished in about 6 months (this is also how long it took me), or less.

Caveats

There are some caveats to all this. While I do believe ML roles are more fun, most big MLE roles or even internships need an MS degree or equivalent experience. Research Scientist roles are even harder to break into and usually need a PhD (or equivalent experience). Talking about India specifically, most MLE roles are in Indian startups, though a few are opening up, e.g., in DeepMind's MLO team led by Prateek Jain, and I think even Meta is opening an office in Bangalore. While these may increase, as I pointed out, they will be insanely competitive and will get harder to get into. I do think that some quant firms take ML interns during their Bachelor's, but these are also not easy to get. On a positive note, there are a lot of opportunities outside of India, and startups (Indian or otherwise) are doing fantastic work, with roles that are much more accessible.

The situation for an academic is vastly different, though. If you plan on joining academia, an ML faculty position would be as hard (or as easy) to get as a CS faculty position, depending on the institute. I do not have any strong opinions on this part.