OK, so implementing simple 4x MSAA anti-aliasing was much easier than I expected, fortunately. In the following picture you can see the difference between rendering with anti-aliasing (left) and without it (right).
To achieve this, you have to fetch the value stored at each sample point of a texel, add those values together, and then divide the sum by the number of samples. In this case there are just four sample points per texel.
You can see here that we're adding up all the samples of a texel and then dividing the result by the number of samples to get the average. Very easy.
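To make the averaging step concrete, here is a minimal CPU-side sketch in C++ of what happens for a single texel. It is only an illustration, not the actual shader: the `Color` struct, `kSampleCount`, and `resolveTexel` are made-up names, and in a real OpenGL program the samples would come from a multisampled texture rather than an array.

```cpp
#include <array>
#include <cstddef>

// Hypothetical RGBA colour type, standing in for a vec4 in a shader.
struct Color {
    float r, g, b, a;
};

// 4x MSAA: four sample points per texel (assumed fixed for this sketch).
constexpr std::size_t kSampleCount = 4;

// Resolve one texel: add up all of its samples, then divide by the sample count.
Color resolveTexel(const std::array<Color, kSampleCount>& samples) {
    Color sum{0.0f, 0.0f, 0.0f, 0.0f};
    for (const Color& s : samples) {
        sum.r += s.r;
        sum.g += s.g;
        sum.b += s.b;
        sum.a += s.a;
    }
    const float inv = 1.0f / static_cast<float>(kSampleCount);
    return Color{sum.r * inv, sum.g * inv, sum.b * inv, sum.a * inv};
}
```

For example, a texel on the edge of a white triangle over a black background, with two samples covered and two not, resolves to mid-grey. That blending along edges is what softens the jagged steps.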
The next big section, which I am really looking forward to, is about lighting. I don't know whether I should first build something with everything I've learnt so far, to solidify all the concepts. Perhaps I will take a break from OpenGL altogether, who knows. In any case, I am glad I pushed myself to get this far :)