Interesting article to get a bit more knowledge about the field. I went quickly through some of the books cited, and I have the same feeling that they're not very practical. I also didn't find many practical books about LLVM.
I would like to read, in the future, about a typical day of a compiler engineer: what you usually do, and what the most enjoyable and annoying tasks are.
This is a personal puff piece. Her accomplishments are impressive and well deserved, but she needn't use the title 'Becoming a Compiler Engineer' as bait to draw people who want to understand how to write a compiler into a greatest-hits tour of her early-to-mid twenties.
The way to become a compiler engineer is, by definition, to try to write a compiler, for which the best course of action is to focus on learning how tokenizing, AST building, typechecking, and the various intermediate representations work.
I don't pretend to know what the best tutorial for this is, but I think this is a fairly good one:
https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/index...
This is for LLVM, but I think doing basic codegen from generic SSA is not a huge leap from this point if one wants to build an entire compiler from scratch.
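As a toy taste of the first two stages mentioned above (tokenizing and AST building), here is a hypothetical sketch in Python for simple arithmetic; it is not from the LLVM tutorial, and all names are mine:

```python
import re

# Matches either a run of digits (a number) or any single
# non-space character (an operator), skipping leading whitespace.
TOKEN_RE = re.compile(r"\s*(?:(\d+)|(\S))")

def tokenize(src):
    """Split source text into (kind, value) tokens."""
    tokens = []
    for num, op in TOKEN_RE.findall(src):
        tokens.append(("NUM", int(num)) if num else ("OP", op))
    return tokens

def parse_expr(tokens, pos=0):
    """Recursive descent: expr := term (('+'|'-') term)*."""
    node, pos = parse_term(tokens, pos)
    while pos < len(tokens) and tokens[pos][0] == "OP" and tokens[pos][1] in "+-":
        op = tokens[pos][1]
        rhs, pos = parse_term(tokens, pos + 1)
        node = (op, node, rhs)        # AST node: (operator, left, right)
    return node, pos

def parse_term(tokens, pos):
    """term := NUM ('*' term)? -- just enough to show precedence."""
    _, value = tokens[pos]
    node, pos = value, pos + 1
    if pos < len(tokens) and tokens[pos] == ("OP", "*"):
        rhs, pos = parse_term(tokens, pos + 1)
        node = ("*", node, rhs)
    return node, pos

def evaluate(node):
    """Walk the AST -- a stand-in for the later codegen stages."""
    if isinstance(node, int):
        return node
    op, lhs, rhs = node
    l, r = evaluate(lhs), evaluate(rhs)
    return l + r if op == "+" else l - r if op == "-" else l * r

ast, _ = parse_expr(tokenize("1 + 2 * 3"))
print(ast)            # ('+', 1, ('*', 2, 3)) -- '*' binds tighter than '+'
print(evaluate(ast))  # 7
```

A real frontend adds error reporting, a richer grammar, and a symbol table, but the tokenize-then-parse shape stays the same.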
Tangential, but since she mentions her book: "You Had Me At Hello World" is the cutest title for a nerd romance novel that I can imagine.
I'm thinking "et tu btrfs?"
Very interesting and informative!
I'm a bit shocked that it would take significant effort/creativity for an MIT grad with relevant course/project work to get a job in the niche
I would have thought the recruiting pipeline is kinda smooth
Although maybe it's a smaller niche than I think -- I imagine compiler engineers skew more senior. Maybe it's not a common first or second job
I graduated at the bottom of bear market (2001), and it was hard to get a job. But this seems a bit different
It's definitely a pretty small world, and to make things worse there are sub-niches between which there's certainly cross-pollination, but that's still a barrier to people looking to change jobs: frontend language semantics (where most PL papers focus) vs. middle- and back-end optimization and hardware support; AoT compilers vs. JITs; CPU targets vs. a blossoming array of accelerators, etc.
Beyond that, I've definitely interviewed people who seemed like they could have been smart + capable but who couldn't cut it when it came to systems programming questions. Even senior developers often struggle with things like memory layouts and hardware behavior.
> I'm a bit shocked that it would take significant effort/creativity for an MIT grad with relevant course/project work to get a job in the niche
That bit was heartbreaking to me too. I knew the economy was bad for new grads but if a double major from MIT in SF is struggling, then the economy is cooked.
Most (all?) of the compiler engineering jobs I've seen were about writing glue code for LLVM.
All the ones I've had, and most of the ones I've seen, were for bespoke compilers and toolchains for new hardware or specific languages.
I'm almost more interested in how a 20-something with no apparent prior pedigree lands a Simon and Schuster debut novel contract!
Step one: no engineering education, just get a job that a company calls engineering.
>In 2023, I graduated from MIT with a double major in math and computer science.
It's a bit sad seeing how much focus there is on using courses and books to learn about compilers.
> I’m not involved in any open-source projects, but they seem like a fantastic way of learning more about this field and also meeting people with shared interests. I did look into Carbon and Mojo but didn’t end up making contributions.
This sounds like the best way to learn and get involved with compilers, but something that's always been a barrier for me is just getting started in open source. Practical experience is far more valuable than taking classes, especially when you really need to know what you're doing on a real project versus following directions in a class. But open-source projects aren't usually designed to make contributing easy, and the learning curve is steep.
> So how the hell does anybody get a job?
> This is general advice for non-compilers people, too: Be resourceful and stand out. Get involved in open-source communities, leverage social media, make use of your university resources if you are still in school (even if that means starting a club that nobody attends, at least that demonstrates you’re trying). Meet people. There are reading groups (my friend Eric runs a systems group in NYC; I used to go all the time when it was held in Cambridge). I was seriously considering starting a compilers YouTube channel even though I’m awkward in front of the camera.
There's a lot of advice here and a lot of different ways to try to find a job, but if I were to take away anything from this, it's that the best way is to do a lot of different meaningful things. Applying to many jobs or grinding interview prep isn't very meaningful, whereas the most meaningful activities have value in themselves and often aren't oriented toward finding a job at all. You may find a job sooner if you prioritize the job search itself, just as you may get better grades by cramming for a test, but you'll probably get better outcomes by taking the short-term loss and optimizing for the long term.
Made an account to say thank you for sharing this post (and to Rona Wang for writing it)! I stumbled into an upcoming interview for a Compiler Engineer position and wasn't sure how to prepare for it. (The fact that I got this interview just goes to show how little people really know about compilers, if they're willing to take a chance on a normal C++ dev like me, hah.) I had absolutely NO idea where to even begin; I was just working through Crafting Interpreters[1], which I picked up at the end of my contract last week, but that's for building an interpreter, not a compiler.
...And honestly, it seems that I'm screwed: I'd need about six months of study to learn all this. What I'd do right now is finish Crafting Interpreters, then grab that other book on interpreters that was recommended here recently[2], written in Go, because I remember it had a follow-up book on compilers, and THEN start going through the technical material Rona suggested in the article.
And my interview is on Monday, so that's not happening. I have other, more general interviews that should pay better, so I'm not too upset. If only I hadn't been too lazy during my last position and had kept learning while working. If the stars align and I somehow get that Compiler Engineer position, I will certainly reach out to Rona, and thank you again, lalitkale, for sharing this post with HN!
[1] https://craftinginterpreters.com/
[2] https://interpreterbook.com/
In my dabbling with compilers I’ve found Andrew Appel’s books [0] to be invaluable for understanding backend (after parsing) compiler algorithms. It’s a bit dated but covers SSA and other still-relevant optimizations and is pretty readable.
There are three versions (C, ML, and Java). The language isn’t all that important; the algorithms are described in pseudo-code.
I also find the traditional Dragon Book to be somewhat helpful, but you can mostly skip the parsing/frontend sections.
[0] https://www.cs.princeton.edu/~appel/modern/java/
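For a taste of what the SSA chapters are about, here is a toy Python sketch of the core renaming idea for straight-line code (no branches, so no phi nodes). This is not Appel's algorithm, just an illustration with made-up names:

```python
# Each instruction is (dest, op, arg1, arg2); arguments are either
# integer constants or variable names. SSA form gives every
# assignment a fresh version of its destination, so each name is
# defined exactly once.
def to_ssa(instructions):
    version = {}                      # variable -> latest version number

    def use(v):
        # Rename a use to its current version; constants pass through.
        return f"{v}{version[v]}" if isinstance(v, str) else v

    out = []
    for dest, op, a, b in instructions:
        a, b = use(a), use(b)         # rename uses before the new def
        version[dest] = version.get(dest, -1) + 1
        out.append((f"{dest}{version[dest]}", op, a, b))
    return out

prog = [
    ("x", "add", 1, 2),       # x = 1 + 2
    ("x", "mul", "x", 3),     # x = x * 3   (reassignment)
    ("y", "add", "x", "x"),   # y = x + x
]
print(to_ssa(prog))
# [('x0', 'add', 1, 2), ('x1', 'mul', 'x0', 3), ('y0', 'add', 'x1', 'x1')]
```

With control flow joins you additionally need phi nodes and dominance frontiers, which is exactly the machinery Appel walks through.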
Great article. Here is a very simple test that I use to find very cracked compiler engineers on this site.
Just search for any of the words "Triton", "CUDA", "JAX", "SGLang", or "LLVM" (not LLM) in "Who wants to be hired?", and it filters almost everyone out, leaving 1 or 2 results.
Whereas if you search "JavaScript": 200+ results.
This tells me that there is little to no interest in compiler engineering here (and especially in startups) unless you are at a big tech company or at one of the biggest AI companies that use these technologies.
Of course, the barrier is meant to be high. But if a recruiter has to sift through 200+ CVs for a given technology (JavaScript), then your chances of being selected over the competition for a single job are vanishingly small.
I've said this before, and it holds every time: for compilers, open-source contributions to production-grade compiler projects, with links to commits, are the most straightforward differentiator and proof you can use to stand out from the rest.
Not many companies are willing to maintain a compiler... but LLMs will change that. An LLM can find bugs in the code when the "compiler guru" is out on vacation that day. And yes, you will still need a "compiler guru", who will use the LLM but operate at a much higher level.
I'm desperately looking forward to, like, 5-10 years from now when all the "LLMs are going to change everything!!1!" comments have all but completely abated (not unlike the blockchain stuff of ~10 years ago).
No, LLMs are not going to replace compiler engineers. Compilers are probably one of the areas least likely to profit from extensive LLM usage in the way you're thinking, because they are principally concerned with correctness, and LLMs cannot reason about whether something is correct; they can only predict whether their training data would be likely to claim that it is correct.
Additionally, each compiler differs significantly in the minute details. I simply wouldn't trust the output of an LLM to be correct, and the time wasted on determining whether it's correct is just not worth it.
Stop eating pre-chewed food. Think for yourself, and write your own code.
I bet you could use LLMs to turn stupid comments about LLMs into insightful comments that people want to read. I wonder if there’s a startup working on that?
I'm screenshotting this, let's see who's right.
Actually, your whole point about LLMs not being able to detect correctness is just demonstrably false if you play around with LLM agents a bit.
LLMs (or LLM-assisted coding), if successful, will more likely make the number of compilers go down, as LLMs are better with mainstream languages than with niche ones. Same effect as with frameworks: fewer languages, fewer compilers needed.
I mostly disagree.
First, LLMs should be happy to use made-up languages described in a couple thousand tokens without issues; you just need a good, LLM-friendly description with some examples, plus a compiler the model can iterate with and get feedback from.
Second, LLMs heavily reduce the ecosystem advantage. Before LLMs, the presence of libraries for common use cases (to save myself time) was one of the main deciding factors in my choice of language.
Now? The LLM will happily implement any utility or API client library I want, given the API I want. It may even be more thoroughly tested than the average open-source library.
Have you tried having an LLM write significant amounts of, say, F#? It's a real language with lots of documentation, definitely in the pre-training corpus, but I've never had much luck with even mid-sized problems in languages like it; that is, in languages where today's models would absolutely wipe the floor in JavaScript or Python.
I’m doing Zig and it’s fine, though not significant amounts yet. I just had to have it synthesize the latest release changelog (0.15) into a short summary.
To be clear, I mean specifically using Claude Code, with preloaded sample context and giving it the ability to call the compiler and iterate on it.
I’m sure one-shot results (like asking Claude via the web UI and verifying after one iteration) will go much worse. But if it has the compiler available and writes tests, it shouldn’t be an issue. It may take 2-3 more back-and-forths with the compiler, but that’s an extra couple of minutes, tops.
In general, even if working with Go (what I usually do), I will start each Claude Code session with tens of thousands of tokens of context from the code base, so it follows the (somewhat peculiar) existing code style / patterns, and understands what’s where.
Even best-in-class LLMs like GPT-5 or Sonnet 4.5 do noticeably worse in languages like C#, which is pretty mainstream but not on the level of TypeScript and Python, to the degree that I don't think they can reliably output production-level code without a crazy level of oversight.
And this is for generic backend stuff, like a CRUD server with a REST API; the same thing with an Express/Node backend works without trouble.
Humans can barely untangle F# code...
See, I'm coming from the understanding that language development is a dead end in the real world. Can you name a single language created after Zig or Rust? Even those haven't taken over much of the professional world. So when I say companies will maintain compilers, I mean DSLs (like Starlark or RSpec), application-specific languages (like CUDA), variations on existing languages (maybe C++ with some in-house rules baked in), and customer-facing config languages for advanced systems and SaaS applications.