
A course by Andrej Karpathy on building neural networks, from scratch, in code.
We start with the basics of backpropagation and build up to modern deep neural networks, like GPT. In my opinion, language models are an excellent place to learn deep learning: even if your intention is to eventually move on to other areas like computer vision, most of what you learn will be immediately transferable. This is why we dive into and focus on language models.
Prerequisites: solid programming (Python), intro-level math (e.g. derivative, gaussian).
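For a sense of what "intro-level math" means here, the first idea the course builds on is the derivative of a loss with respect to a weight, computed by the chain rule and checked numerically. A toy sketch (not code from the course; the neuron, values, and loss are made up):

```python
import math

# A single sigmoid neuron: y = sigmoid(w*x + b), loss = (y - target)^2.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, x, b, target):
    return (sigmoid(w * x + b) - target) ** 2

w, x, b, target = 0.5, 1.5, -0.3, 1.0

# Analytic gradient via the chain rule:
# dL/dw = 2*(y - target) * y*(1 - y) * x
y = sigmoid(w * x + b)
analytic = 2 * (y - target) * y * (1 - y) * x

# Numerical check with a centered finite difference.
h = 1e-6
numeric = (loss(w + h, x, b, target) - loss(w - h, x, b, target)) / (2 * h)

print(analytic, numeric)  # the two should agree to several decimal places
```

If you can follow why the analytic and numeric values match, you have the math prerequisite covered.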
I’ve gone through this series of videos earlier this year.
In the past I’ve gone through many “educational resources” about deep neural networks - books, coursera courses (yeah, that one), a university class, the fastai course - but I don’t work with them at all in my day to day.
This series of videos was by far the best, most "intuition building", highest signal-to-noise-ratio, and least "annoying" content to get through. It could of course be that his way of teaching just clicks with me, but in general: very strong recommend. It's the primary resource I now recommend when someone wants to get into the lower-level details of DNNs.
Karpathy has a great intuitive style, but sometimes it's too dumbed down. If you come from an adjacent field it can drag a bit, but it's always entertaining.
>Karpathy has a great intuitive style, but sometimes it's too dumbed down
As someone who has tried some teaching in the past, it's basically impossible to teach to an audience with a wide array of experience and knowledge. I think you need to define your intended audience as narrowly as possible, teach them, and just accept that more knowledgeable folk may be bored and less knowledgeable folk may be lost.
When I was an instructor for courses like "Intro to Programming", this was definitely the case. The students ranged from "have never programmed before" to "I've been writing games in my spare time", but because it was a prerequisite for other courses, they all had to do it.
Teaching the class was a pain in the ass! What seemed to work was to do the intro stuff, and periodically throw a bone to the smartasses. Once I had them on my side, it became smooth sailing.
I think this is where LLM-assisted education is going to shine.
An LLM is the perfect tool to fill the little gaps that you need to fill to understand that one explanation that's almost at your level, but not quite.
I like Karpathy, we come from the same lineage and I am very proud of him for what he's accomplished, he's a very impressive guy.
As for deep learning, building deep learning architectures to find insights in perceptual data is one of my greatest joys. Right now I'm working on spatiotemporal data modeling to build prediction systems for urban planning, aimed at improving public transportation. I also build ML infrastructure and plan to release an app that deploys the model in the wild within event streams of transit systems.
It took me a month to master the basics, and I've spent a lot of time with online learning, with Deeplearning.ai and skills.google. Deeplearning.ai is OK, but I felt the concepts were a bit dated. The ML path at skills.google is excellent and gives a practical understanding of ML infrastructure, optimization, and how to work with GPUs and TPUs (which can be far faster than GPUs for some workloads).
But the best source of learning for me personally, and what makes me a confident practitioner, is the book by Francois Chollet, the creator of Keras. His book, "Deep Learning with Python", really removed any ambiguity I had about deep learning and AI in general. Francois is extremely generous in how he explains how deep learning works, against the backdrop of 70 years of deep learning research. He keeps it updated; the third edition came out in September 2025, and it's available online for free if you don't want to pay for it. He gives you the recipe for building GPT and diffusion models, but starts from the ground-floor basics of tensor operations and computation graphs. I would go through it again from start to finish; it is so well written and enjoyable to follow.
The most important lesson he discusses is that "deep learning is more of an art than a science". Getting something to work takes a good amount of practice, and why things work can't always be explained.
He includes notebooks with detailed code examples using TensorFlow, PyTorch, and JAX as backends.
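To illustrate the "computation graphs" idea that the book starts from, here is a toy scalar autograd sketch: each operation records its inputs, and backprop walks the graph in reverse applying the chain rule. It's an illustration in the spirit of Karpathy's micrograd, not code from the book or its notebooks:

```python
class Value:
    """A scalar node in a computation graph that supports backprop."""
    def __init__(self, data, children=()):
        self.data = data
        self.grad = 0.0
        self._children = children
        self._backward = lambda: None

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph, then apply the chain rule in reverse.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for c in v._children:
                    visit(c)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a          # c = 2*3 + 2 = 8.0
c.backward()
print(a.grad, b.grad)  # dc/da = b + 1 = 4.0, dc/db = a = 2.0
```

Frameworks like TensorFlow, PyTorch, and JAX do essentially this, but over tensors instead of scalars.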
Deep learning is a great skill to have. After reading this book, I can reproduce the systems described in scientific papers and deploy the models into production systems. I am very grateful to have these skills, and I encourage anyone with deep curiosity like me to go all in on deep learning.
The project you mentioned you are working on sounds interesting. Do you have more to share?
I'm curious how ML/AI is leveraged in the domain of public transport, and what it can offer compared to agent-based models.
The project I’m working on emulates a scientific abstract. I’m not a scientist by any means, but am adapting an abstract to the public transit system in NYC. I will publish the project on my website when it’s done. I think it’s a few weeks away. I built the dataset, now doing experimental model training. If I can get acceptable accuracy, I will deploy in a production system and build a UI.
Here is a scientific abstract that inspired me to start building this system -> https://arxiv.org/html/2510.03121
I am unfamiliar with agent based models, sorry I can’t offer any personal insight there, but I ran your question through Gemini and here is the AI response:
Based on the scientific abstract of the paper *"Real Time Headway Predictions in Urban Rail Systems and Implications for Service Control: A Deep Learning Approach"* (arXiv:2510.03121), agent-based models (ABMs) and deep learning (DL) approaches compare as follows:
### 1. Computational Efficiency and Real-Time Application
* *Deep Learning (DL):* The paper proposes a *ConvLSTM* (Convolutional Long Short-Term Memory) framework designed for high computational efficiency. It is specifically intended to provide real-time predictions, enabling dispatchers to evaluate operational decisions instantly.
* *Agent-Based Models (ABM):* While the paper does not use ABMs, it contrasts its DL approach with traditional *"computationally intensive simulations"* - a category that includes microscopic agent-based models. ABMs often require significant processing time to simulate individual train and passenger interactions, making them less suitable for immediate, real-time dispatching decisions during operations.
### 2. Modeling Methodology
* *Deep Learning (DL):* The approach is *data-driven*, learning spatiotemporal patterns and the propagation of train headways from historical datasets. It captures spatial dependencies (between stations) and temporal evolution (over time) through convolutional filters and memory states, without needing explicit rules for train behavior.
* *Agent-Based Models (ABM):* These are typically *rule-based and bottom-up*, modeling the movement of each train "agent" based on signaling rules, spacing, and train-following logic. While highly detailed, they require precise calibration of individual agent parameters.
### 3. Handling Operational Control
* *Deep Learning (DL):* A key innovation in this paper is the direct integration of *target terminal headways* (dispatcher decisions) as inputs. This allows the model to predict the downstream impacts of a specific control action (like holding a train) by processing it as a data feature.
* *Agent-Based Models (ABM):* To evaluate a dispatcher's decision in an ABM, the entire simulation must typically be re-run with new parameters for the affected agents, which is time-consuming and difficult to scale across an entire metro line in real time.
### 4. Use Case Scenarios
* *Deep Learning (DL):* Optimized for *proactive operational control* and real-time decision-making. It is most effective when large amounts of historical tracking data are available to train the spatiotemporal relationships.
* *Agent-Based Models (ABM):* Often preferred for *off-line evaluation* of complex infrastructure changes, bottleneck mitigation strategies, or microscopic safety analyses, where the "why" behind individual train behavior matters more than prediction speed.
I have lots of non-AI software experience but nothing with AI (apart from using LLMs like everyone else). Also I did an introductory university course in AI 20 years ago that I’ve completely forgotten.
Where do I get to if I go through this material?
Enough to build… what? Or contribute to…? Enough knowledge to have useful conversations about…? Enough knowledge to understand where … is useful and why?
Where are the limits? What do AI researchers have that this wouldn't give?
Strange question. If you don’t know why you need this, you probably don’t. It will be the same as with the introductory AI course you did 20 years ago.
Well, no ... For a start any "AI" course 20 years ago probably wouldn't have even mentioned neural nets, and certainly not as a mainstream technique.
A 20-year-old "AI" curriculum would have looked more like the 2nd edition of Russell & Norvig's "Artificial Intelligence: A Modern Approach".
https://github.com/yanshengjia/ml-road/blob/master/resources...
Karpathy's videos aren't an AI course (except in the modern sense of AI = LLMs), or a machine learning course, or even a neural network course for that matter (despite the title) - it's really just "From Zero to LLMs".
Neural nets were taught at my university in the late '90s. They were presented as the AI technique, which was however computationally infeasible at the time. Moreover, it was clearly stated that all the supporting ideas had been developed and researched 20 years prior, and that the field had basically stagnated because the hardware wasn't there.
I remember reading "neural network" articles from the late '80s and early '90s, which weren't just about ANNs but also other connectionist approaches like Kohonen's Self-Organizing Maps and Stephen Grossberg's Adaptive Resonance Theory (ART). I don't know how your university taught it, but back then this seemed more like futuristic brain-related stuff, not a practical "AI" technique.
My introductory course used that exact textbook and I still have it on my shelf :).
It has a chapter or two on NNs and even mentions backpropagation in the index, but the majority of the book focuses elsewhere.
Anyone who watches the videos and follows along will indeed come up to speed on the basics of neural nets, at least with respect to MLPs. It's an excellent introduction.
Sure, the basics of neural nets, but it reads more as a foundation leading to LLMs. He doesn't cover the zoo of ANN architectures such as ResNets, RNNs, LSTMs, GANs, and diffusion models, and he barely touches on regularization, optimization, etc., beyond mentioning BatchNorm and promising Adam in a later video.
It's a useful series of videos no doubt, but his goal is to strip things down to basics and show how an ANN like a Transformer can be built from the ground up without using all the tools/libraries that would actually be used in practice.
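For a flavor of what "from the ground up" means here, this is roughly the kind of bare MLP forward pass the videos build (the layer sizes and data below are made up for illustration; the videos then add backprop and training on top of something like this):

```python
import numpy as np

rng = np.random.default_rng(42)

def mlp_forward(x, params):
    """Forward pass of an MLP: tanh hidden layers, linear output."""
    h = x
    for w, b in params[:-1]:
        h = np.tanh(h @ w + b)
    w, b = params[-1]
    return h @ w + b

# Input of size 3, two hidden layers of 16 units, scalar output.
sizes = [3, 16, 16, 1]
params = [(rng.normal(0, 0.1, (m, n)), np.zeros(n))
          for m, n in zip(sizes, sizes[1:])]

x = rng.normal(size=(5, 3))   # a batch of 5 examples
y = mlp_forward(x, params)
print(y.shape)  # (5, 1)
```

Everything else in the series - embeddings, attention, BatchNorm - is layered onto this same matrix-multiply-plus-nonlinearity core.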
I think they meant the result, not the content, would be the same.