Vibe coding leaves me with a very sour taste

16 August, 2025 48 min read
DiskSpace, software, programming, Elixir, package, release, NIF, GenAI, LLMs, Rust

Top-tier LLMs, Rust and Erlang NIFs; nifty, and night and day vs. C, but let me tell you about vibe coding…

After I submitted my blog post to Hacker News on using Grok 3 to generate C for an Erlang NIF with the help of code reviews by GPT-5 and Gemini 2.5 Flash, I received some interesting comments. Let’s review them before we dive deep into why vibe coding leaves me with a very sour taste, and why I will never engage in vibe coding again, if I can avoid it.

It was all unnecessary

“So all this arose because you didn’t read the docs and note that get_disk_info/1 immediately fetches the data when called? The every-30-minutes-by-default checks are for generating “disk usage is high” event conditions.” – juped

As it turns out, that’s correct. In my sleep-deprived state of mind, I only read the documentation of disksup and didn’t actually read the documentation of :disksup.get_disk_info/1. It clearly states that the function returns immediate data, i.e. fresh information on the state of the disk space there and then. It has nothing to do with the periodic interval configuration of disksup.
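
For reference, here is roughly what that looks like from Elixir. Treat the return shape as a sketch from memory rather than a documentation excerpt; it needs a reasonably recent Erlang/OTP and may differ slightly between releases:

  # :disksup lives in the :os_mon application, so that needs to be running.
  {:ok, _apps} = Application.ensure_all_started(:os_mon)

  # Fetches fresh data for the given mount point there and then,
  # regardless of disksup's periodic check interval.
  :disksup.get_disk_info(~c"/")
  #=> e.g. [{~c"/", total_kib, available_kib, capacity_percent}]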

This means that DiskSpace is actually superfluous, when you can use something already included in Erlang/OTP. Plus, it now requires having rustc (and cargo) installed to build this dependency, and that’s not something that my readers should need to worry about. Perhaps I should at some point look into precompiling, if that’s possible with Rustler.
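
For the record, precompilation is possible in that ecosystem via the rustler_precompiled package. A rough sketch of how it is typically wired in follows; the repository URL and environment variable are placeholders, not anything DiskSpace actually ships:

  defmodule DiskSpace do
    version = Mix.Project.config()[:version]

    use RustlerPrecompiled,
      otp_app: :disk_space,
      crate: "disk_space",
      # Placeholder URL; precompiled binaries would be attached to GitHub releases.
      base_url: "https://github.com/OWNER/disk_space/releases/download/v#{version}",
      # Escape hatch to force a local build instead of downloading a binary.
      force_build: System.get_env("DISK_SPACE_BUILD") in ["1", "true"],
      version: version
  end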

Oh well. This has been an experiment, anyway.

I should have used Claude Code (?)

“This was built copy pasting results from chats? Not using an ide or cli like Claude Code or Amp? Why such a manual process. This isn’t 2023…” – wordofx

I hear you. All the cool kids are using Claude Code or IDE extensions to have LLMs interact directly with the code, and/or to let LLMs run amok on a codebase, and apparently also on their PC.

First of all, would Claude Code be better than Grok 3 at writing C code, and/or better than Gemini 2.5 Flash and GPT-5 at reviewing C code and Makefiles?

Maybe; I don’t know, because I’ve never used Claude Code. As far as I know, it’s a paid-only service and even though there are ways to get started with some credit, I shudder at the idea of handing the reins to an LLM for coding. Even if it’s “the best”, how much better can it be, given that (so far) none of the top-tier LLMs really stands out from the others? What difference would it make, given that the C code that was generated was pretty terrible? More on that later.

Secondly, the later iterations of “The Process” described in yesterday’s post included feeding Grok, Gemini and GPT the errors of the automated build-and-test attempts on GitHub Actions, so that I could get the C source to compile on macOS and Windows too, besides Debian. How would Claude Code have helped with that? In that case a human (me) in the loop was necessary to shovel errors from the build logs to the LLMs and have them generate recommendations that I then shoveled back to Grok 3 for code fixes. Most of those “fixes” actually made things worse, until they didn’t.

Lastly, the manual process is tiresome and annoying after the first 5-6 iterations (more on that below). It does have one upside though: you get to see how the sausage is (badly) made. Otherwise, hey, just YOLO it and put your trust in GenAI completely–if, of course, you find a way (that may exist, I don’t care enough to look into it) to also automate the “shoveling” of build logs from GitHub Actions to the LLMs. By the time you get such a complex setup working, it might have been a better investment of your time to learn you some Rust.

I could/should have used better LLMs

“It’s interesting why the author used weaker models (like Grok 3 when 4 is available, and Gemini 2.5 Flash when Pro is), since the difference in coding quality between these models is significant, and results could be much better.” – SweetSoftPillow

Yeah, it’s true. I pay for SuperGrok, but given how slow Grok 4 used to be right after it launched, I had set Grok 3 as the default model. I also didn’t want to exhaust the (daily?) free allotment of Gemini 2.5 Pro tokens for that exercise. Plus, I ran out of free GPT-5 tokens shortly after I reached the state of v0.4.0 that compiled cleanly (memory leaks included) across the target OSs and Elixir/Erlang version combinations.

This comment nagged me though; could I have really gotten that much better code from Grok 4, and better code reviews from Gemini 2.5 Pro?

In the end, I found out that the answer is yes! More on that below.

C and LLMs are a bad match

“I would never ever let an LLM anywhere near C code. If you need help from LLM to write a NIF that performs basic C calls to the OS, you probably can’t check if it’s safe. I mean, it needs at least to pass valgrind.” – drumnerd

Here I also have to concur. That was one of the earliest comments on my Hacker News submission and made me try out some tools to analyze the disk_space.c file of version 0.4.0. I used splint, and the results were not pretty. Examples:

disk_space.c:111:14: Operands of == have incompatible types (unsigned char,
                        int): (data[i] & 0xE0) == 0xC0
disk_space.c:144:16: Only storage bin.ref_bin (type void *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:148:16: Only storage bin.data (type unsigned char *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:148:16: Only storage bin.ref_bin (type void *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:152:11: Null storage returned as non-null: NULL
disk_space.c:152:16: Only storage bin.data (type unsigned char *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:152:16: Only storage bin.ref_bin (type void *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:156:15: Only storage bin.data (type unsigned char *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:156:15: Only storage bin.ref_bin (type void *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:160:7: Operand of ! is non-boolean (int): !enif_is_list(env, term)
disk_space.c:161:10: Null storage returned as non-null: NULL
disk_space.c:161:15: Only storage bin.data (type unsigned char *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:161:15: Only storage bin.ref_bin (type void *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:166:10: Null storage returned as non-null: NULL
disk_space.c:166:15: Only storage bin.data (type unsigned char *) derived from
    variable declared in this scope is not released (memory leak)
disk_space.c:166:15: Only storage bin.ref_bin (type void *) derived from
    variable declared in this scope is not released (memory leak)

Plenty of memory leaks. Ouch. That made the next comment have more “bite” to it. I started thinking that releasing this had been a bad idea, a negligent action.

It didn’t actually work

"‘it mostly worked’ is just a more nuanced way of saying ‘it didn’t work’. Apparently the author did eventually get something working, but it is false to say that the LLMs produced a working project."flax

What is the definition of “a working project”? I cannot know what the poster meant by that. It’s obviously working, i.e. from within Elixir I get the map I want, with the data I intended to get.
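
To make “the map I want” concrete, the call looks something like the following. The function name and keys here are illustrative, written from memory, and not a reference for the package’s actual API:

  # Hypothetical usage sketch; the real function name and map keys may differ.
  {:ok, info} = DiskSpace.stat("/")
  # info is a map along the lines of %{total: ..., available: ..., used: ...}
  # for the filesystem that the given path lives on; that is what
  # "working" means here.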

When I added “it mostly worked” to the title, my intended meaning was:

  • “yeah it worked, but I had to do a lot of hand-holding” (in terms of using LLMs to generate C code), and
  • “it passes the tests but I cannot tell if the code has memory leaks” (in terms of the generated code quality).

But clearly, it didn’t actually work, when you consider that the generated C code does more than I wanted it to: memory leaks.

That made me pretty sure that releasing vibe-coded C to the wild had been a terrible idea–especially given that the whole exercise had been unnecessary to begin with.

I should have used Rust or Zig

“Why C instead of Rust or Zig? Rustler and Zigler exist. I feel like a Vibecoded NIF in C is the absolute last thing I would want to expose the BEAM to.”weatherlight

Well, I had to start somewhere, and that was the familiar Exqlite and its use of elixir_make and a Makefile to compile SQLite’s amalgamated C source. I had also never programmed in Rust or Zig before, though I had been reading the TigerBeetle articles as part of my (abandoned) tbapi REST API for it in Go many months ago, and I was seriously impressed by Zig.

As for Rust, I only know it as an (in-)famously difficult programming language with a very opinionated fanbase. However, since Rust is now part of the Linux kernel, it must have some merit.

Yeah yeah, OK “rustaceans”… I have heard all about its memory safety (if you use it correctly), so don’t come at me with your rusty pincers… I’m actually quite impressed by the ease of building cross-platform binaries. So much so that I might dabble in Rust–if I ever decide to look beyond my beloved Elixir (which I doubt)!

Grok 4 code, aided by GPT-5 and Gemini 2.5 Pro code and error reviews

Given all of the above, I thought I’d dedicate at most half a day to the next bout of this experiment: I’d have Grok 4 convert the C source into a lib.rs, then use another day’s free Gemini 2.5 Pro and GPT-5 tokens to review Grok 4’s code, supplemented by any (many) errors of the build-and-test workflow from GitHub Actions.

And that’s what I did. Grok 4 was capable of converting the C code into Rust code. As for whether the NIF would compile and work for the disk_space Elixir project: that was the next piece of the puzzle.

Figuring out Rustler

I looked online to see how to use Rustler to compile and load the NIF into the DiskSpace Elixir module. There is a lot of outdated information in various articles and blog posts about using Rustler, in terms of what you need to add to your Elixir project’s mix.exs. In particular, the line indicated below appears in many articles:

  def project do
    [app: :my_app,
     version: "0.1.0",
     compilers: [:rustler] ++ Mix.compilers, # this one
     rustler_crates: rustler_crates(),
     deps: deps()]
  end

  defp rustler_crates do
    [io: [
      path: "native/io",
      mode: (if Mix.env == :prod, do: :release, else: :debug),
    ]]
  end

This is how it used to work in older releases of Rustler. In the latest one, I was getting errors about a missing compile.rustler Mix task.

I looked into the documentation of the installed version of the Rustler dependency and saw that it’s much simpler nowadays. The standard example of NifIo also helped.
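
In short, the current approach boils down to a rustler dependency in mix.exs plus a use Rustler line in the Elixir module that wraps the NIF. Roughly like this, as a sketch; the crate name, version constraint and function stub are my own assumptions, not the project’s actual source:

  # In mix.exs, under deps: {:rustler, "~> 0.34", runtime: false} (or whatever the current version is).
  defmodule DiskSpace do
    use Rustler, otp_app: :disk_space, crate: "disk_space"

    # Stub that is replaced when the NIF loads; raises if it hasn't.
    def stat(_path), do: :erlang.nif_error(:nif_not_loaded)
  end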

I invoked mix rustler.new and then had to adapt the native/Cargo.toml file to also make lib.rs compile on Windows. Grok 4 did that in a single shot, but whether it actually worked (both the Rust code and the Cargo.toml) was still undetermined.

Right first time–but not on Windows

mix compile worked the first time around on Debian, with no compilation warnings even! mix test proved that the NIF also worked from within Elixir. Impressive!

The situation with macOS and Windows was still unclear. I had Grok 4 rewrite the build.yml file of the Build and Test workflow and I saw how much shorter and simpler it became, since it did not include installing Windows-specific dependencies anymore. One preparation step for compile-time dependencies for all three target OSs. Neat!

Pushing to the git repo triggered the workflow and the NIF compiled on macOS/arm64 and Ubuntu/amd64 for all combinations of Elixir and Erlang that the old C source supported. I was impressed once again.

I was not as impressed with the Windows build attempts.

Vibe coding once again

I started “The Process” again, this time only for the Windows-specific parts of lib.rs. The initial list of errors and warnings of the Windows builds was truly a sheet. Tons of issues, for both Elixir 1.17 and 1.18 (and the corresponding Erlang/OTP 26 and 27).

Grok 4 was capable of fixing the mistakes it had made in its first version, but after a while got stuck. It would fix issues with GetDiskFreeSpaceExW in the code, but forget to set the correct imports. It would introduce something called “HLOCAL”, then attempt to get it to work within the make_winapi_error_tuple function again and again. It would fix one thing and break another.

Again, I had Gemini (this time, 2.5 Pro) and GPT-5 (as the other day) review Grok 4’s Rust code and identify errors, without giving me code. I would shovel this feedback to Grok 4 by copying and pasting, ask it to implement these recommendations, push to GitHub, open all workflow tasks in separate tabs (otherwise GitHub could not show me the logs after the fact, for unknown reasons), then copy/paste the errors back to Gemini and GPT or Grok (depending on how serious the errors seemed), ad nauseam.

And I mean “ad nauseam” quite literally. It was nauseatingly tiresome, and disillusionment almost made me give up a few times. You only need to look at the GitHub Actions log to see the many failed attempts. All that took a few hours until finally Grok 4 reached a point at which all builds completed successfully.

Vibe coding for the last mile(s)

At that “basecamp”, the Rust code would build and the Elixir package would pass the tests successfully across all Elixir versions from 1.14 to 1.18 (and with the corresponding latest-supported Erlang/OTP version). This was a major boon to the whole Rust port performed by those three top-tier LLMs in collaboration.

That basecamp still had issues that Gemini and GPT identified, so I continued the code reviews by the two LLMs, this time asking Gemini and GPT “could this be considered production-grade?” and prompting Grok to “ensure that any changes you make DO NOT IN ANY CASE break the macOS and Linux builds”. I might have called Grok an idiot in there a few times; not sure whether it made a difference in terms of the result.

On the way to what’s now version 1.0.0 of the DiskSpace Elixir package, I reached 4 more such “camps”, each of which was followed by a series of abject failures by Grok at fixing things without breaking other things, until the next “camp”. Thankfully, it mostly left the macOS and Linux builds unscathed throughout this process.

I had kinda-timeboxed this second experiment to a few hours so that I would return to writing the book (which will now use disksup anyway), so once GPT and Gemini reached a point of recommending improvements that they considered minor, I considered their job done.

By that time, I had realized something. Vibe coding increasingly felt like something I had experienced before and escaped from, and avoided forevermore.

I really dislike vibe coding

I had experienced something like vibe coding during a small part of my 4 years as a development engineer, back in 2008 and 2009. After I started working as a development engineer for turbocharger components in 2008, I very quickly grew disenchanted with the role. Here’s what the typical trial-and-error design process involved:

  1. Taking an existing design of a shaft, a turbine blade, a turbocharger casing, etc.
  2. Simulating it with CFD (Computational Fluid Dynamics) on Numeca FINE/Turbo and/or FEA (Finite Element Analysis) on Abaqus/CAE to understand how it performs (thermodynamic performance, thermal expansion and Low-Cycle Fatigue during the duty cycle, resonant frequencies and High-Cycle Fatigue, etc.)
  3. Spending time coming up with changes to design parameters.
  4. Adapting the 3D CAD drawing on CATIA V5 accordingly.
  5. Converting the design to geometry input files (STEP files and other formats).
  6. Preparing and meshing the simulation model (finite element/volume type choice, discretization and local refinements, load cases, boundary conditions, etc.)
  7. Sometimes, adapting the simulation setup (simulation parameters, etc.)
  8. Running the simulation locally on the workstation (if expected to be short) or submitting the simulation job to an HPC cluster.
  9. Waiting for anything from 15 minutes (for short FEA), to 2-4 hours (for more complex FEA) to 18 (!) hours (for high-fidelity CFD).
  10. Reviewing the simulation logs for any simulation-run issues (convergence behavior, etc.)
  11. If issues prevented the calculation from completing successfully, go back to step 7 (ideally) or as far back as step 3 (worst-case scenario for a variant).
  12. If no issues appeared with the simulation, investigating the resulting multi-gigabyte files in the FEA/CFD suite’s visualization software.
  13. Post-processing the files to generate various images and graphs.
  14. Evaluating performance against the best-performing design variant so far.
  15. Spending time coming up with changes to design parameters (go back to step 3), or go to step 16.
  16. Reviewing the entire process and the design and documenting it.

That was a slow-going process, with the human (me) entirely in the loop and completely at the mercy of the key bottlenecks, which were:

  • The manual preparation of input files with a lot of error-prone copy-pasting of text between Notepad++ (what a great piece of software!) and pre-processing software.
  • The manual preparation or tweaking of the simulation setup.
  • The manual review of simulation logs, performing minor spot adaptations, and re-running the simulation as needed.
  • The manual post-processing and evaluation of the simulation results.
  • Most importantly, the ungodly amount of waiting time to get simulation results that could be used for a verdict on whether the design was better than before.

In my first role as a development engineer I was not designing turbine blades, and so I hadn’t been subjected to the torturous waiting time of CFD simulations. A few months later I was promoted to an R&D engineer within the turbine design team. Within one week after this change of role, I had grown so utterly frustrated with the slow pace of the process that I told my supervisor that this cannot be how work gets done. He told me that this is what the work is (“it is what it is”, suck it up!), and that I will get faster as I learn more about turbine design.

He wasn’t wrong, but I was still pissed off. Surely, the content of this job could not have been something that I had studied so hard for! Surely, I was not doomed to eternally be a machine operator, shoveling data back and forth between text files and software tools. Does this not resemble vibe-coding-by-copy/paste?

Not quite vibe coding

That short chat with my supervisor happened on a Friday afternoon. As soon as I boarded the tram from Baden to Zurich, I took out a notebook and wrote down the requirements for a semi-automated process, and what kind of Python (Python 2 at the time) code I’d need in order to automate some parts of the process.

The idea was to automate the most boring and human-error-prone parts of the process. Also, to get faster overall by vetting some designs using insanely faster low-fidelity evaluations of many more design variants, before committing to waiting a double-digit amount of hours to get the “proper” results from expensive high-fidelity simulation. My aspiration was to be able to evaluate hundreds or thousands more designs in a semi-automated manner, and get good-enough results so that I would understand better what’s going on.

Also, by weeding out unpromising or even useless design variants early on in the process, I would avoid spending expensive simulation time on variants that even with simplified simulation physics had no chance of approaching the targeted spec improvements to begin with. By a process of elimination, the variants that would remain would then be worthy of higher-fidelity simulations for a more nuanced understanding on my part, and better results for the component and the overall product.

I entered the office at 7:00 on the following Monday. By 12:30 I had a working prototype of a Particle Swarm Optimization (PSO) loop in Python. This is a robust but expensive algorithm that was only a good match for the problem when used with rather inexpensive function evaluations. It was crappy, hastily and enthusiastically written Python 2 code that orchestrated a few other tools, doing things such as:

  • writing input files for an in-house quasi-3D radial-turbine flow calculation program (in Fortran-77, originally developed in the early 1980s, perhaps even earlier) for mass flow, efficiency and Mach number along the blade’s profile at different meridional heights,
  • triggering multiple calculations in parallel with Python’s multiprocessing across the (IIRC) 4 or 8 cores of the Xeon workstation,
  • gathering results from each calculation and aggregating them, and then determining the best one from that generation, and
  • feeding the last generation’s data into the next iteration of the PSO loop.

My gets-the-job-done visualization was an Excel spreadsheet that periodically refreshed the geometry data and the results of the “global best” simulation thus far from a text file, and plotted them.

It worked so well! The results were great; I designed three new, high-performance “turbine trims” of a radial turbine in record time. This means taking an existing turbine design and trimming its meridional profile to get 10% less mass flow rate “steps” while maximizing the total-static thermodynamic efficiency, given that you cannot modify the actual blade shape, i.e. how it curves in 3D.

In fact, this had been such an eye-opener, that I spent the 3.5 years after that automating the heck out of everything I touched as an engineer.

I had found a happy place; programming, engineering, problem-solving, understanding, documenting, improving–day in and day out. I lived and breathed engineering optimization; so much so that a colleague started calling me “Mister Optimizer” in his characteristically endearing Swiss-German English accent.

Whatever I was tasked with designing/optimizing (radial turbine blades, turbine burst behavior, blade resonance modes against High-Cycle Fatigue, friction-welded transitions of turbocharger shafts to the turbine hub, threaded connections of the shaft to a compressor wheel, overall shaft mechanical loads, etc.), I would first set up lower-fidelity simulations, run parametric studies either locally or on the cluster, harvest the results and evaluate them in a semi-automated manner, select the best variants, etc.

Eventually, I reached a point where I was generating too much data to evaluate with heuristics, by eye or by using Excel, so I got into Machine Learning. I started building ANNs with libFANN, training them with early-stopping code I implemented in Python following a paper, building and querying regression models with Weka, clustering with k-means to identify “families” of designs, performing PCA (Principal Component Analysis) to find out the most impactful design parameters, executing Monte Carlo simulations with millions of data points for probabilistic design, and more.

This kind of work delivered great designs again and again, and even made it possible to identify hitherto-unseen design regions of multi-physics turbine-blade design; for example, one of the parametric designs revealed that some turbine designs have a largely indeterminate bursting mode–they might burst at the hub (then the entire turbine breaks into big pieces) or they might burst at the root of the turbine blade due to plastic deformation (like what can happen in airliner jet engines with axial turbofans). It brings me great joy to know that some of “my” turbines are still spinning in engines’ turbochargers worldwide.

Had I not engaged in this semi-automation of my work while always being in the loop for the most knowledge-enriching parts, I would have continued to be subject to the insanely slow speed of each iteration, severely limiting the rate at which I would be learning from the things I was developing.

To be frank, I would have eventually resigned and gone to do something more creative and intellectually demanding than shoveling data back and forth between pieces of software.

The equivalent of “true” vibe coding in the world of mechanical engineering

At the time, a colleague in a different team of the R&D department had been the flagbearer pushing for the deployment of a “fully-automated” optimization suite by one of the largest simulation software vendors across all teams.

It all looked so enticing! You would set up the optimization case, select the optimization algorithm, then it was supposedly “pushbutton optimization” from there on. The optimization suite would orchestrate the simulation software to perform automatic meshing and simulation preparation, and then run numerous simulations consuming per-core licenses for hours or days. Coincidentally, this was very beneficial for the simulation software vendor selling licenses with a per-core subscription model.

You just had to lean back and enjoy the pretty graphs. Why spend precious budget on engineers engineering things, when you can let the computer do everything automatically?

Does this remind you of a current situation by any chance? Does this not reek of vibe coding with Claude Code and other “agentic” systems?

Well, I can tell you that this did not go very far, despite my colleague pushing it for at least as long as I was there.

It had all been a bunch of starry-eyed promises of ever-growing automation and efficiency, for better results. Interestingly, my semi-automated approach kept delivering great results, while the fully-automated approach never really managed to escape my colleague’s team; in fact, it never escaped my colleague, and my colleague most likely didn’t escape the approach for a long time. The semi-automated, multi-fidelity approach with a human in the loop who continuously increases their competence was–oh Wunder!–not only consistently better-performing in terms of per-component performance, but also faster and cheaper overall. Fewer high-fidelity simulations means vastly fewer license-hours.

In the time span of 3.5 years I used the semi-automated approach to deliver a large number of designs that made it to the field. To my knowledge, the fully-automated software suite never escaped “pilot mode” as an investigation project.

To clarify: that was not a matter of my colleague’s competence; it was, I assume, due to the false promises of the software vendor that conveniently (for them) drove engineers to engage in all three wastes of Lean Product Development (courtesy of the late Dr. Allen Ward, may he rest in peace):

  1. Scatter
  2. Hand-off
  3. Wishful thinking

Especially that last one… the wishful thinking that the software vendor’s marketing engendered in its marks–I mean, customers, i.e. executives, and “champions” like my colleague–was intentionally manufactured and smartly deployed.

As it turns out, removing the human from the loop entirely in that manner turns a trained engineer into a machine operator; without agency, without understanding what is really going on, without considering the parameterization of your design problem and the Curse of Dimensionality, without even the possibility of understanding why some optimization algorithms are a bad fit for the task at hand, or why a multi-fidelity optimization strategy (at the time, not supported by the optimization suite) would make things better across the board.

Hey there trained engineer! Just click here, then click there; pick something from the dropdown. We have many optimization algorithms. Sure, pick an Evolutionary Algorithm for your 6-hours-per-variant simulation, why not! Laughing all the way to the bank…

And forget about Machine Learning being part of any such automated software; no no no… this was an optimization suite by a simulation software vendor. Machine Learning? LOL. You must be kidding! The more you calculate, the more licenses you use, the more people complain across teams about waiting for the license manager to free up per-core licenses from an increasingly-constrained supply, the higher the pressure to pay for more licenses.

Calculate more, you peasants!

Same old promises and worries, in a different cloth

Vibe coding gives me the same vibes as both the old process I had been subjected to and the fully-automated approach, at the same time.

At least when running through the old process you will gradually upskill. Seriously, with the number of hours you need to spend getting to understand the problem before setting up your “experiment” in an effective and efficient manner, you will learn more about the task at hand and about engineering design and optimization.

You will achieve better intuition and understanding about the tricky trade-offs between the parameterization of your design, the discretization of the simulation model, the choice of optimization algorithms, the no-go areas of the design space, the inputs vs. the outputs, the engineering technologies (materials, etc.), the impact of inputs to outputs…

You would have to be either stupid (unlikely) or wilfully ignorant and/or uninterested in learning, to not gradually start seeing patterns between all the decisions you make during the semi-automated process and the impact they have on the outcome of the design/development process.

“True” vibe coding, like the one espoused by those letting a Claude Code or some other “agentic” system run amok on a software-development problem, gives me the exact same vibes as the fully-automated engineering optimization approach.

Only, now, it’s not simulation hours or per-core licenses anymore that you keep consuming; it’s tokens.

Spend more tokens, you peasants!

So, I’ve seen this play out before.

When pursuing a fully-automated approach in the mechanical engineering R&D world, you become a mere machine operator of a “pushbutton optimization suite” and a useful idiot for the software vendor (who sells the shovels in the gold rush) to utilize more and more of the licenses the company has paid for.

In the same way, in agentic coding (because I would not call it software engineering, more on that below), when you hand the reins entirely to an “agentic” system and lean back and look at the beautiful results, you become a mere machine operator for a “pushbutton software coding suite”. You also become a useful idiot for the supply chain that starts at the agentic system’s vendor and spans all the way from there to the LLM vendor, eventually landing at Nvidia supplying the growing global thirst for compute capacity–and even farther back, at the fabs.

In case you are one of those people who are worried about the role of the software engineer becoming obsolete, you might have noticed that mechanical engineers still haven’t become obsolete, and won’t ever become obsolete; and not due to a lack of affinity between software automation and engineering work. In leading companies, engineering work changed to an insane degree between the 1990s and the 2000s–and it has kept changing all throughout the last 20 years, as Computer Science-adjacent topics have long entered non-software engineering curricula.

The better engineers among those who stayed in purely technical roles (nothing wrong with that, I grew to love my development-engineer job), kept evolving how they worked as these changes kept coming about.

Some branched out towards systems engineering, looking at the whole picture while having a deep understanding of the details.

Others transitioned towards managing engineers, a job that includes bringing juniors up to speed.

Yet others moved on to Machine Learning, which is seriously one of the best things you can do as a mechanical engineer; chemical engineers already had a better grasp of EDA and Data Mining, for example, and many found ML a natural expansion of their skills.

Some others (like me) branched out to business and to software, while keeping their engineering ethos alive: that of striving to understand and resolve trade-offs between aspects of a system while also understanding its failure modes, under constraints technical, operational, and business-related.

So no; my money is not on software engineers de-facto becoming obsolete. Many code-slingers will, though–but those were not software engineers to begin with.

Software engineers and those who think like engineers (thanks to experience and/or intellectual curiosity despite not having a formal education/degree in engineering) will have to adapt to new technologies and new conditions; new expectations too! I.e., like every other engineer, regardless of their engineering discipline (mechanical, electrical/electronics, chemical, civil, biomedical, …)

Much like it has been expected of you, if you are a mechanical engineer, to know how to competently, effectively and efficiently use a piece of 3D CAD software for the past 25 years (as one trivial example), so will it at some point be unheard-of not to know how to use LLMs competently, effectively, and efficiently–or whatever technology arrives over the next years, for that matter.

Why should the discipline and the job of a software engineer be any different?

So don’t get all riled up by the current rhetoric espoused by those with a self-benefitting incentive and agenda of convincing you that your only professional outlook is to become obsolete. They play their marketing and branding game, and get media outlets targeting the gen-pop to write impressive-sounding articles, usually written by people who have no clue about the nuances of the subject matter (not only of “AI” but also of the applications allegedly about to be revolutionized). Mealy-mouthed normies then regurgitate the breathless hype they’ve been fed, parroting someone else’s marketing talking points to portray themselves as “visionaries”, “thought leaders”, “experts” and “influencers”.

It all lands on your phone or computer screen, delivered by algorithms that target engagement instead of triggering critical thinking (actually, aiming to bypass it entirely).

Some choose (or let themselves be led) to believe that the sky is falling; that software engineers are doomed; that the job market for software engineers is perennially gone, dead, a dead-end occupation; and plenty of other doomsaying prophecies that trigger a fight-or-flight response. Either flight towards different career choices, instead of doubling down on becoming better at something valuable (the engineering mindset within software development), or fight, but in the wrong way: spending more tokens to get on the much-hyped bandwagon of agentic coding.

This is the same bandwagon that thousands of others are getting on, out of FOMO (Fear Of Missing Out). Because, after all, the HR drones list those newfangled tools as requirements now. Didn’t you know? Self-commoditization is in, somehow. Undifferentiation is a career strategy now. Otherwise, you’ll miss out on the hottest, latest trend/fad (/s) and the salary bump it will (maybe) command for a few years, until it too falls out of fashion.

Relax. It’s all a circus of pomp that distracts you from what remains evergreen: honing your craft, without letting yourself fall entirely behind in terms of the technologies you are competent in. Notably, doing so while also understanding the trade-offs that all these new technologies entail, because as I saw in this experiment and as countless people increasingly realize (judging by online posts), it’s far from all roses when it comes to agentic coding.

Plus, we seem to have entered a period of maturation of the LLM technology, and even the Sam Altmans (“AGI” snake-oil peddlers) of this world are starting to downplay previous claims about GPT becoming “smarter than a PhD” (not a very high or meaningful target, but to most normies who don’t understand, this sounds impressive), to much scoffing and ridicule, which they so rightly deserve after selling us hype for close to 3 years now.

So, no; becoming obsolete is not the only outlook. It’s a possible outcome though. And it becomes more likely, the more you buy into the hype and divert your attention from evergreen values.

By all means, do become competent in using LLMs as you see fit. Just don’t think that this is the final station of the technological journey. I am personally enthusiastic and optimistic about what may (will) come next, and at the same time I already suspect that it too will not be what the future hype will sell it as.

Such is life. Grow your competence, maintain your capacity for critical thinking, and remain vigilant and skeptical.

Is vibe coding engineering?

You may call vibe coding whatever you like, so that it “gives less Gen-Z”. Call it “automated code-generation”. Call it “LLM-assisted coding”. Call it “George” or “Maria”.

As long as the activity is dominated by pursuing full automation and “pushbutton solutions”, engineering it ain’t.

And do note: despite having always been a “STEM supremacist” (emphasis on the “E” and the “T”), I by no means proclaim that engineering is the end-all be-all thing that everyone should strive for, for everything.

No no no… Engineering is an expensive process that not all problems are worthy of. The “iron triangle” (time, cost, quality/scope) of something that must/could/should be developed must always include the cost of engineering, which depends on how you go about tackling the problem to be solved and accomplishing the task at hand.

The engineering is not for free; but hey, neither are the tokens (except if you run mid LLMs locally).

There is also a flip-side to the cost of engineering; a side-effect if you will, if you like functional programming.

The long-term benefit of pursuing anything with an engineer’s mindset and/or in an engineering process is that you increase your skill in understanding anything that you touch; from the physics and constraints that dominate turbine performance (e.g., mechanics and fluid mechanics, manufacturing and metal-casting tolerances) to the “physics” and constraints that dominate the performance and quality of the software you develop, including architecture, computational capacity, architectural fit, maintainability, future-readiness, business risks.

Most importantly: whether it fulfills the requirements posed on it by the system the component (mechanical or electrical or electronic or software, etc.) is a part of.

And, regardless of whether we are talking about turbines, radars, an API, a SaaS, its business model, or anything else, part of the cost-and-benefit balance of the engineering process is understanding better and better what it is you are developing. Rarely perfectly, but typically increasingly.

To reap these benefits always requires talking to others; hashing out requirements, specifications and designs; iterating on concepts; unveiling the hypotheses behind your “design”; understanding levels and trade-offs; trying out new things in a well-reasoned, non-YOLO/non-random manner; witnessing the impact of your decisions stacked together and attempting to understand them in isolation and (if possible) their aliasing, understanding what “sticks”, if it sticks, why it sticks–and, most importantly and often neglected: why not.

Touch grass

If you think that an LLM will be able to give you a reliable verdict on a requirement, a spec, a user story, a product concept, or a system architecture just because you managed to find “the perfect prompt”, go out and touch grass, really. Step out into the real world, away from the bubble that algorithms and marketeers have been trying to keep you in, unhealthily optimistic about the potential of this and any other technology and at the same time fearful of the future and what any technology might bring.

Into the fray!

Talk to people, be they colleagues, customers or other stakeholders. Feel unnerved and remain cool about the ambiguities and contradictions you’ll hear and see. Accept the uncertainty and try to reduce it gradually. You have noticed, I think, that LLMs also are not oracles of truth.

If anything, the recently uncovered biases and tendencies towards flattery, added to the perennial and unavoidable “hallucinations” (a euphemism for “stochastically-generated bullshit”), should have made you plenty comfortable with uncertainty, ambiguity, and all the annoyance of not being sure if the answers you get are straight or twisted by a hidden agenda, an algorithm, or people’s biases and preferences seeping into the responses you receive.

As an engineer (or a product manager thinking as an engineer) it’s your job to figure things out, address uncertainty and ambiguity and make decisions regardless of the noise inherent in anything involving fallible, “predictably irrational”, whimsical and unreliable human beings.

That does not negate the usefulness of LLMs or any other revolutionary technology, if you know how to use it properly. You can use LLMs to get up to speed with some aspects and the various dimensions of a problem or domain you are not or not entirely familiar with–with the caveat that in this case you should keep having that nagging, justifiable feeling that you cannot know if what you’re being told is “hallucinated” nonsense or partial truths, an incomplete and biased picture, etc.

As long as LLMs are not tapped into human brains to get immediate feedback, it is a pure pipe-dream that you will get reliable verdicts. And even if they were, they would be tapping into human beings. Good luck getting to the ground truth!

And if you think that what an LLM responds with must be close to the ground truth because, after all, they are consuming “the entirety of humanity’s knowledge and are even running out of training data”, hey, wake up. Reading an online recipe makes me a chef about as much as ingesting an insane corpus of text and being able to access the web (very useful, but imperfect) makes an LLM capable of providing reliable verdicts.

After all, what did you really expect? Do you think that the corpus itself is not a sample, biased not only by what’s been written (hardly a uniform sample), but also (as has already been documented) by those who train the model?

The Incredible Story of Deft… for AI

The world of software seems to me eternally a victim of its own fast-moving, explosive nature in the past decades and of its own lack of awareness of industrial history (or of how things take place in industries other than software).

Add to that the frothy valuations and insane hype pushed indirectly by Venture Capital firms betting on the casino of startups to justify said valuations in the hope of an “exit” or passing off a hot potato to some other sucker before it explodes, and you have a potent mix of hyperbole, fad-chasing, misplaced expectations, and ultimately disillusionment.

The software industry is not only repeating the same mistakes that other industries have learned painful and valuable lessons from. It’s even eagerly and cyclically repeating the same patterns that it itself went through earlier.

Remember Agile? No-code? Low-code? Microservices for everything? “More microservices than customers”? NoSQL? MongoDB? Deploying Kubernetes because it’s “the way it’s done” and drastically reducing your runway because you hired a CTO who’s turned the acquisition of Microsoft and AWS certificates into a part-time job, and will bring all that complexity into the business setup to justify… something? The scoff at PHP? The self-indulgent “ooohing” and “aaahing” over Rust? Design Patterns?

I could go on and on and on. The graveyard of hyped “miracle cures”, cargo-culted “obvious choices”, tribally-shaped echo-chamber opinions and “the only right way to do it” is growing larger every few years. Perhaps by the year, nowadays.

Big fat promises of increased productivity: faster, cheaper, better product at lower cost. Fewer, less-experienced developers pushing buttons without understanding, or even having the chance to gradually understand more of, what they are actually doing. After all, it will all be taken care of by some “pushbutton” system that does the heavy lifting for them.

And don’t get this confused with a false analogy of “but you were using 3D CAD software without ever having programmed a computational geometry kernel”.

The vast majority of mechanical engineers, as one example I’m familiar with, may not have (certainly have not) written such a complex piece of software (a task better left to mathematicians or mathematically-inclined engineers, by the way). But I can assure you that most have taken lectures on numerical computation, linear algebra, materials science, thermodynamics, and fluid dynamics, and perhaps have even written rudimentary FEA code, in order to understand what the software vendors’ big impressive piece of simulation software is doing with high computational efficiency behind the scenes.

But how many of the pushbutton vibe-coders nowadays understand what’s happening, what’s really going on behind the scenes? How many even understand or care about quantization, inference, context and context windows, context compression, etc., and the limits of the LLM technology in the applications they use them for? How many understand anything not only about the trade-offs behind the intricate agentic system they (or their starry-eyed employer) so eagerly pay tokens for, but also about the long-term compromises and sacrifices they blindly engage in, to their own professional and competency-related detriment?

I’m not gate-keeping here. I’m no expert either. I’m a 99% happy user of various LLMs, both proprietary and self-hosted, and have derived immense value from them, despite not being anywhere close to an expert. I’m a dabbler, like the vast majority of us. I’m no better.

You don’t need to go get a PhD in those matters to use LLMs and derive benefit from that use. But at least understand the basics of what you’re using, whether and how they can deliver more reliable results, and what you’re in for in the long run if you don’t maintain a critical view on what you’re being given as responses or results in general.

Or, you know, just YOLO it and gradually turn yourself into a replaceable cog, an interchangeable machine operator who is an expert “prompt engineer” (what a stupid, ridiculous inflation of the “engineer” term), but doesn’t understand half of what the agentic coding system gives you, or the compromises and sacrifices you increasingly unknowingly accept regarding the product/codebase you are vibe coding on.

It’s a meme by now: the clueless enthusiastic beginner who posts “I can code anything with Claude Code” on X, Bluesky or Linkedin–and then gets wrecked because of not understanding the fundamentals, such as authentication, securing a server, API keys and keeping them secret, etc., and instead relies on a myriad of subscriptions to SaaS and/or flashy startups that deliver “pushbutton experiences” and the illusion of competence.

This has been happening way before LLMs arrived on the scene. Conveniences and illusions delivered for a hefty monthly bill, be it $20 per month per developer at Vercel for the privilege of deploying a NextJS app (trivial on a VPS), or some managed database vendor to deploy PostgreSQL (trivial, unless you want to do clustering, which as an enthusiastic beginner you probably don’t even know what it is), or any of the agentic code-generation tools by vendors who will happily sell you those “streamlined experiences” and the illusion of “care-free” anything, and other marketing bromides that get your brain to switch off and self-commoditize with your eyes open, thinking that the alternative is difficult, prohibitively time-intensive, or even impossible–or worse, “not what you should be doing”. Note that the marketing is always promoting some things as “you should be doing this” while also dumping on viable alternatives, portraying them as “you should definitely not be doing that”.

They try to put you and keep you in The Cave, from which you must one day escape.

The two poles

It all sounds like a curmudgeonly rant to some, I know. But look at the polarized state of opinions about what is called “AI”.

On the one side, we have:

  • Enthusiastic beginners who think they found The Philosopher’s Stone; it will allegedly transmute the base metal of prompts into a golden product. They don’t even know what they need to know. Dangerous, but primarily only for their own potential of developing skills and competencies to make a living–and dangerous for their own wallet, or their employer’s budget.
  • Bombastic executives who want to please shareholders with their visionary “AI strategy” (that a consulting firm will happily bill them for, to deliver boilerplate slides and executive absolution) and proclaim that this will revolutionize their company or industry; exactly like what they said about the last fad-du-jour, like Agile, Lean, IoT, Industry 4.0, Cloud, no-code, low-code, blockchain, etc.
  • Self-aggrandizing middle managers who want to display that they are something more than an agency-less corporate drone and do exactly that: they drone on and on on Linkedin about things they most likely don’t understand; will never understand, except if they have spent or decide to invest years “in the trenches” getting acquainted with the “ground truth” of what any new “miracle cure” can actually do, and under which conditions–and what it can never do, regardless of what the hyperbole around them claims so confidently.
  • Skittish individual contributors who jump on the latest shiny new object to add something new to their CV (already loaded with certificates, GitHub repos, etc.), so that they can soon go negotiate a new position elsewhere. Fine–“don’t hate the player, hate the game”; but understand: if you have been stuck in tutorial-hell levels of competence and can never get to stick to something (a language, a framework, a domain) for longer than a few years to learn to think like an engineer, you are jeopardizing your own overall competency, as well as your resilience outside of an organization that might happen to want X years in $FRAMEWORK or M years of $SKILL. You are at the mercy of the gatekeepers (HR), who are famously clueless. If you are going to play that game or cannot afford not to play it, at least play it in the long run.
  • Self-flatulating consultants and coaches who somehow manage to change their expertise (and LinkedIn headline) insanely fast with every new wave of hyperbole. Wow, these people are seriously incredible (actually, literally in-credible–hey, perhaps that’s the meaning of LinkedIn’s in logo, LOL). ChatGPT launched end of 2022. By the time they were back in their office in 2023 after New Year’s holidays, they somehow managed to build enough understanding, expertise even, to rebrand themselves as “AI expert”. Amazing how inflation of something (words, but really anything) reduces that thing’s value. Groundbreaking insight, I know. Please award me the Nobel Prize of Economics.
  • “Me too” AI-wrapper startup “CEOs” who, alongside countless others doing similar things with “AI”, primarily transfer wealth from VC funds to cloud compute providers, LLM-chat and agentic-coding software vendors and, ultimately, whomever is making the chipsets. Back in the day everyone was making “the next Facebook”. This is today’s equivalent. Congrats; you made yet another AI wrapper-app that is calling upon the OpenAI API. Revolutionary. But at least you are now a “CEO” with startup experience; you probably never had a chance to build a viable, sustainable business; you just serve(d) as the usefully ambitious captain of a bet, and only burned through seed and other rounds of capital-raising, while benefitting from the “CEO salary” (the Expected Value of stock options in a risky startup is zero) and the clout of being in the increasingly-suspect startup ecosystem. Well played. Now watch, as OpenAI rolls out your startup’s entire value proposition as a feature, at vastly lower cost to your wished-for customer base… You just got Embraced, Extended, and Extinguished; this strategy worked well, and will keep working well. But OK, you’ll move on to something else. Some other “revolutionary” startup, most likely, where the cycle will begin anew. Chief Capital Burning Officer.
  • Gambling VC managers who enable the entire ecosystem, hoping to get that one great exit to offset the other gambles, while serving as useful funnels of capital between parties. Judging by their bets, and by what I read on the “vetting process” (which seems to reward “hustle” to fuel unprofitable growth as fast as possible, regardless of eventual business viability) I struggle to trust the process overall and to consider these bets as some kind of indication of long-term potential for a business they have bet on. For sure, there must be some VCs that know their stuff. But the vast majority? The proof is in the pudding. If you have to bet on multiple startups cut from the same cloth and chasing the same trendy fad, this doesn’t reek of confidence or intellect in placing bets, but of an attempt to diversify risk across many, very risky bets. The math sometimes checks out in aggregate. But a crazy amount of resources is wasted on nonsense in the process.
  • Hype-peddling software vendors (and their suppliers) profiting from all of the above. I already expressed my opinion on those, so I won’t reiterate it here.

My point here is not to blame any of those caricatured groups, but to understand that they are all part of a system. POSIWID. These caricatures are playing a game that serves the purpose of the system, but one that is not without its risks, both to themselves and to others (and to the system itself, and the systems it connects to), even though each caricature derives its own benefits, somehow–or they would not keep playing that game.

Look at the outcomes and you can figure out what the purpose of the system is in the world of software. I won’t break it down for you. But, wink wink, nudge nudge: if you think it’s to make it possible for anyone to build a software product that can dethrone incumbents, wake up. Part of the incumbent playbook is to corral potential contenders in ways that defuse the danger.

Boundless uninformed enthusiasm, bombast with aplomb, nihilistic self-aggrandization, skittishness and FOMO, portraying yourself as something you have no chance of being without the necessary elbow grease, imitation of others in the “in group”, incompetence hidden behind claims of uncertainty and bold bets, and hype driving the whole system. That’s all part of the human psyche and condition. It will always be there, especially in a system that incentivizes all that and has these elements work hand-in-hand to fulfill the system’s unspoken purpose–which, again, just to be clear, is not to make you competent enough to pose a risk to any incumbent among the list above and in the software industry.

And no, that’s not a “left vs. right” position. It’s one informed by knowing how incumbents play against contenders, and an argument in favor of market competition and more awareness and information transparency, not crony capitalism.

No surprises for me. Been there, seen that; see also: The Incredible Story of Deft.

On the other side we have everyone else. Who is that?

  • Anyone who has seen waves of hype and big promises come and go every 3 to 7 years.
  • Anyone who understands the hype cycle and its links to human nature and to agency issues within and across organizations.
  • Anyone who is enthusiastic about new technologies, but takes vendors’ and consultants’ promises with more than a few grains of salt.
  • Anyone who understands that LLMs are tools, and tools are not universally applicable.
  • Anyone who doesn’t want to become a mindless button-pushing drone, i.e. the exact kind of “worker” that historically, again and again, commoditizes themselves out of a living by chasing fads, neglecting to build competence, settling into the comfortably familiar world of patting themselves on the back as “visionary”, “expert” etc., and gradually de-skilling themselves while keeping up appearances of always being up to date on things they have no chance of even understanding, because that takes time and “friction with the material”, as we say in Greece.
  • Other.

Don’t become or remain a fool with a tool

All this long diatribe for me to give my unsolicited, subjective, undiluted opinion and realization stemming from this experiment:

I really dislike vibe coding.

Actually, during these past two days I came to hate it; more than once I got this disgusted knot in my throat. But “hate” is an expensive feeling, so let’s not inflate it. “Dislike” will do just fine.

  • I dislike that it mirrors the stupid and slow trial-and-error development process (in an industry that is not about software) that I strived so hard and managed to escape from almost 2 decades ago.

  • I dislike that the promised fully-automated “YOLO it and let the agent code” approach gets so many ooohs and aaahs, despite the fact that my experiments from the past two days clearly demonstrate that the technology is amazing in its own right and also not what the “one side” above portrays it as, or wishes it would be.

  • I dislike more than anything else that (some; many?) people once again seem to gravitate so eagerly towards taking the lazy path to outcomes, without spending time to understand the trade-offs, compromises and sacrifices their choices entail, not only for the outcome / the product, but for themselves too. And I would not care about that; you do you, if you are one of those people. But spare us the incessant vomit of regurgitated “thought leadership”.

Because it is one stance to knowingly admit “hey, I don’t have the time or interest to learn Rust, C, or OS internals to write a NIF, so I’ll let LLMs code it for me and see what I’ll get”, and an entirely different and long-term detrimental stance to say, proudly even, “lol bro, why do I need to even understand what I’m doing? Claude Code will write it for me!”

A fool with a tool is still a fool, even if the tool is cool and makes the vendor drool.

The technology of LLMs is amazing, especially if you (as I did years ago) played with NLP (NLTK and spaCy) and the early GPTs. At the same time, LLMs are an incredibly dangerous technology for those who will use them in the wrong way. The danger is primarily to themselves, but there are secondary, knock-on effects for everyone else who will be impacted by knee-jerk reactions to hyperbole, and by ill-considered optimism for something that fundamentally cannot work as promised.

Again, see this experiment with a super simple NIF, and read up on recent benchmarks and reports on LLMs’ evolution since 2023. We have already entered the “power efficiency per token” phase of the product category’s maturation towards the extraction of some profit, eventually, hopefully. And that’s not bad, given the preposterous amounts of energy expended every minute by people doing anything with LLMs, from virtuous learning and exploration, to eternal idiots using ChatGPT to read coffee grounds.

So, again, so that I’m not misunderstood: LLMs are an amazing technology.

My first computer having been an 8088, and having implemented a version of ELIZA in C 22 years ago, I feel as if I’ve been living in a science-fiction novel since ChatGPT launched.

As with any new technology, it deserves your time and energy (and critical thinking) for at least a cursory evaluation and, if you find it useful for your needs, a gradual deep-dive to become aware of what it can and cannot do, regardless of the noise surrounding it that claims it can do anything imaginable.

I am absolutely certain that it’s a good idea to learn how to use this technology competently. This includes when and for what not to use it, when to second-guess its output, and when to use it with caveats, which means understanding if you are in the “Danger Zone”, which I clearly (and, most importantly, knowingly) put myself into during the experiments of the past 2 days.

But, again, this “AGI” thing… sorry, this isn’t it. I remain unconvinced, and it’s a good idea for all of us to remain skeptical.

Rant over. Enjoy DiskSpace v1.0.0–even if it does something already done with :disksup.get_disk_info/1!

Will I vibe-code ever again?

Not if I can avoid it.

I’ve had great experiences for the past 2.5 years using LLMs as collaborators, as text summarizers, as sounding boards on things I know enough about to critically evaluate the responses, etc. I.e. in situations where the technology works well.

Two weeks ago, I used a Systematic Problem-Solving approach with Grok 3 to figure out the reason(s) behind, and possible fixes for, intermittent voltage drops on a PC’s PSU that caused the machine to briefly almost shut down. And together, stepping through the SPL process and experimenting methodically with the factors that could contribute to the issue, we figured it out. It was 1) dust on the motherboard’s VRMs causing something to overheat and, at the same time, 2) dust accumulated in the UPS’s casing causing that one to overheat. Two relays were briefly tripping at the same time.

So, use LLMs as collaborators, when you are competent, methodical, and clear about their limitations and their strengths. They are amazing. But still not “AGI” :P

Am I angry?

Not at all; I just enjoy writing in that tone. I might be yelling into the void or at the clouds, but it’s also cathartic.

I am, however, disillusioned. But I am also a bit wiser about vibe coding now. And that was the whole point of this experiment with the NIF, once I understood that DiskSpace doesn’t actually need to exist, and that Erlang/OTP has me covered.