I've contributed a few optimisations to some implementations in these benchmarks, but as I read the code of many other implementations (and some frameworks) I lost most of the trust I had in these benchmarks.
I knew that once a benchmark becomes famous, people start optimising for it or even gaming it, but I didn't realise how far that goes toward making the benchmarks meaningless. Some frameworks were just not production-ready, or had shortcuts added just for a benchmark case. Some implementations were supposed to use a framework, but the code was skewed in an unrealistic way. And sometimes the algorithm itself was different (IIRC, some implementation converted the "multiple SQL updates" requirement into a single complex UPDATE using CASE).
I would ignore the results in most cases, especially for emerging software, but the benchmarks at least suggested orders of magnitude in a few cases: e.g. the speed of JSON serialization in different languages, or that PHP Laravel was roughly twice as slow as PHP Symfony, which in turn could be twice as slow as Rails.
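The CASE trick described above can be sketched with the stdlib sqlite3 module (a toy reconstruction, not the actual benchmark code, which targeted PostgreSQL): the "updates" test intends one UPDATE per row, but a single CASE expression collapses all of them into one statement and one round trip.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE world (id INTEGER PRIMARY KEY, randomnumber INTEGER)")
conn.executemany("INSERT INTO world VALUES (?, ?)", [(i, 0) for i in range(1, 6)])

new_values = {1: 42, 3: 7, 5: 99}  # rows the "updates" test would touch

# What the test intends: one UPDATE per row, N statements (N round trips).
for row_id, value in new_values.items():
    conn.execute("UPDATE world SET randomnumber = ? WHERE id = ?", (value, row_id))

# The shortcut some implementations took: one statement via CASE.
ids = list(new_values)
placeholders = ", ".join("?" for _ in ids)
case = " ".join("WHEN ? THEN ?" for _ in ids)
sql = (f"UPDATE world SET randomnumber = CASE id {case} END "
       f"WHERE id IN ({placeholders})")
params = [p for row_id in ids for p in (row_id, new_values[row_id])] + ids
conn.execute(sql, params)

rows = dict(conn.execute("SELECT id, randomnumber FROM world WHERE id IN (1, 3, 5)"))
print(rows)
```

Both paths leave identical final state, which is exactly why the shortcut is hard to catch from the output alone; it just isn't the workload the test meant to measure.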
This was also my experience.
I really liked these benchmarks, and would check in with them from time to time.
No benchmark is perfect, but these ones cover such a wide variety of different languages and frameworks, it's a good resource for getting a rough idea of the kind of performance that a given stack is capable of.
I don't know much about TechEmpower the company; it seems to be a small consultancy, and maintaining this project probably takes not-insignificant resources from them.
The end of the project seems kind of unceremonious, but they don't owe anything to anyone.
Hopefully an active fork emerges.
It's cool in a 'how much can you tune it' kind of way, but has little practical value. Most sites would be tickled with a four-digit requests-per-second number, so does it matter whether your chosen framework does 50k/sec or 3 million/sec? Not really.
I think the biggest problem was it just had too many entries, most of which seem tuned to cheating benchmarks. Would probably be more valuable just choosing the top 3 by popularity from the top 15 languages or so.
> too many entries, most of which seem tuned to cheating benchmarks
Even for entries that didn't cheat, the code was sometimes unidiomatic in the sense that "real programmers can write Fortran in any language".
This[0] article articulates the issue by highlighting an ASP.NET implementation that was faster than more 'honest' Java/Go implementations primarily by not using ASP.NET features, skirting some philosophical line about what it means to use something.
For me, the more interesting discussion of whether a language/library is faster/leaner than another exists in actual idiomatic use. In some languages you are actively sweating over individual allocations; in some you're encouraged to allocate collections and immediately throw them away. Being highly concerned with memory and performance in the latter type of language happens, but is seldom the dominant approach in the larger ecosystem.
[0] https://dusted.codes/how-fast-is-really-aspnet-core
For anyone wondering, the ASP.NET Core benchmark applications appear to be largely the same.
However, it also appears that as of the last round (round 23), "aspnetcore" had fallen to 35th on the fortunes leaderboard. The code for that result really just uses Kestrel. It doesn't even import any of the usual ASP.NET Core NuGet packages, just what's provided by the web SDK. [0]
[0]: https://github.com/TechEmpower/FrameworkBenchmarks/blob/57d9...
> most of which seem tuned to cheating benchmarks
The fix would have been requiring tests to catch the cheating. There were suggestions but it didn't happen.
It was definitely possible to catch not having sent date headers (or caching them) etc.
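A minimal sketch of the kind of automated check being suggested, using only Python's standard library and a stand-in stdlib server rather than any real benchmark entry: spin up a server, make one request, and verify the response carries the Date header that HTTP requires and that some entries reportedly skipped or cached.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # send_response() emits Server and Date headers automatically;
        # a cheating implementation might omit Date or freeze its value.
        self.send_response(200)
        self.send_header("Content-Length", "2")
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):
        pass  # silence per-request logging

# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/")
resp = conn.getresponse()
resp.read()
has_date = resp.getheader("Date") is not None
server.shutdown()
print("Date header present:", has_date)
```

Pointed at a real entry's endpoint instead of the stand-in handler, and repeated across requests to compare Date values, a harness like this could have flagged missing or cached headers.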
I found a lot of value in these benchmarks and evangelized about them at my various employers over the years. Almost any enterprise is interested in lowering their cloud compute costs. Riddle me this: other than rotating out stale logs in cloud object storage or blocking malicious bandwidth drains from cloud CDN, what intervention lowers non-AI cloud costs more effectively than using a web service stack that requires dramatically fewer CPU and RAM resources while maintaining a high, error-free request rate?
A lot of hand-waving about hacks in the benchmarks, but many of these claims come from people who got their information secondhand (or worse). Actually reading the code of the top submissions in the techempower/FrameworkBenchmarks repo (organized neatly under the frameworks/ directory) yielded valuable insights for me:
* Pipelining SQL requests has a massive effect on RPS for web services that will access SQL databases
* A well-maintained HTTP/2 and HTTP/3 web server written in C, named h2o, is still relevant in 2026, even when it is used as a proxy that delegates business logic to simpler web service workers written in Rails or in Python 3 (via Gunicorn)
* For web services that write to a SQL database, the Axum (Rust) stack, now with a healthy ecosystem of middleware modules, may provide up to twice the RPS of the Spring (Java) stack (externally observed: at lower CPU and much lower RAM usage)
* Even frameworks written in JS (hyperexpress, just-js) or python (aiohttp) can vault into the realm of top-10 performers if they leverage OS-level asynchronous IO and SQL pipelining.
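The pipelining point in the first bullet is mostly latency arithmetic, which a toy asyncio model can illustrate (assumed fixed round-trip time, no real database involved): N sequential queries pay N round trips, while a pipelined batch pays roughly one.

```python
import asyncio

RTT = 0.02        # assumed network round-trip time in seconds (hypothetical)
NUM_QUERIES = 10  # queries issued while serving one HTTP request

async def sequential():
    # Classic request/response: send a query, wait a full round trip, repeat.
    for _ in range(NUM_QUERIES):
        await asyncio.sleep(RTT)

async def pipelined():
    # All queries written back-to-back; responses stream in after ~1 RTT.
    await asyncio.sleep(RTT)

async def timed(coro):
    loop = asyncio.get_running_loop()
    start = loop.time()
    await coro
    return loop.time() - start

async def main():
    seq = await timed(sequential())
    pipe = await timed(pipelined())
    print(f"sequential: {seq:.3f}s, pipelined: {pipe:.3f}s")
    return seq, pipe

seq, pipe = asyncio.run(main())
```

With these numbers the pipelined batch finishes roughly ten times sooner, which is the shape of the RPS gap the top database-backed entries exploited.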
I would place more value on a benchmark that uses the framework the way the docs describe, because that is what developers will actually use in the end. The micro-ultra-optimizations should be done inside the framework/library itself.
But what does it benchmark then? The performance of each framework with its defaults, or some heavily optimized piece of code for a given framework that squeezes out the best result possible? Are all the benchmarks across different frameworks then on par with each other? No. I think these benchmarks were heavily skewed, and a lot remained hidden behind those results.
I always liked these benchmarks, I've been following them since the earliest rounds.
One thing to note is how much things have improved over that time. Numbers that used to top the benchmarks would now be seen as "slow" compared to the top performers.
The other useful thing about these benchmarks is being able to easily justify the use of out-of-the-box ASP.NET Core.
For many languages, the best performers are custom frameworks and presumably have trade-offs versus better known frameworks.
For C# the best performing framework (at least for "fortunes") is aspnet-core.
That side-steps a lot of conversations that might otherwise drag us into "Should we use framework X or Y" and waste time evaluating things.
Are the benchmarks gamed? Yes, of course; the code might not even be recognisable as ASP.NET Core to me. But that doesn't really matter if I can use it as an authoritative source to fend off the "rewrite in Go" crowd, and it doesn't matter that it is gamed, because real-world load is many orders of magnitude less than what these benchmarks demonstrate is possible.
This text lacks information about why it is being sunset.
Maintaining something like this is probably a little bit stressful.
We all know some of us take our language and framework choices as seriously as religion. I wouldn't be surprised if there was a lawsuit involved.
Indeed. It's weird they write so much without addressing the elephant in the room.
So let's discuss it...
From the start I thought that the TechEmpower Benchmarks tested all the metrics the JVM is good at, and none of the ones it is bad at (mainly: memory usage, start-up time, container size). I got the idea back then that they were a JVM shop (I could not confirm this on their current website).
Lately the JVM contenders are no longer at the top. And the benchmark contains many contenders with highly optimized implementations that do not reflect real-life use.
Sad to see this. I had so much fun implementing an HTTP server (called httpbeast) from scratch to get as far up these benchmarks as possible.
I do agree with others here that it was possible to game them, but it still gave a good indication of the performance bracket a language was in (and you could check if interpreted languages were cheating via FFI pretty easily).
Feels like the end of an era.
My first thought is "good riddance". Not only were the benchmarks surely gamed by many frameworks, but my impression was that they didn't even really reflect any real-world application, which has plenty of I/O and compute. Moreover, ain't nobody receiving 1,000 (let alone 100k) rps.
Well done to the TechEmpower team for the work done.
Though the benchmarks were not exactly 100% accurate, they gave good estimates of how different frameworks perform in handling web tasks.
They also helped people move to simpler, lighter web frameworks that are more performant, and kind of helped usher in the typical 'Sinatra/Express' style of handlers for most web frameworks, e.g. .NET Core.
They also showed the performance hit of ORMs vs. raw SQL. So yeah, well done.
Engineering has, in a weird way, kind of moved on from web frameworks. Now AI just writes document.getElementById('longVariableName') JavaScript and straight SQL without complaining at all. The abstraction isn't as important as it used to be because AI doesn't mind typing.
> Now AI just writes document.getElementById('longVariableName') javascript and straight SQL without complaining at all
I got a newer model that bypasses all that. It takes out Wireshark and sends the bytes straight.
Would you know any alternative?
The primary alternatives are:
One, you don't need this. The vast majority of people working on the web are now so thoroughly over-served by their frameworks (especially given that benchmarks like this measured only the minimal overhead the frameworks could impose) that measuring your framework by how many nanoseconds per request it consumes (I think time per request is a more sensible measure than requests per time) is quintessential premature optimization. All that consulting a table like this does for the vast majority of people is pessimize their framework choice, by slanting them toward taking speed over features when in fact they are better served by taking features over speed.
Two, you are performance-bound, in which case these benchmarks still don't help very much, because you really just have to stub out your workload and run benchmarks yourself: you need to holistically analyze the performance of your framework, with your database, with any other APIs or libraries you use, to know what is going to be the globally best solution. Granted, not starting with a framework that struggles to attain 100 requests per second can help, but if you're in this position and you can't identify that sort of thing within minutes of scanning the documentation, you're boned anyhow. Such frameworks aren't really that common anymore.
This sort of benchmark ranges from barely positive in value to a significant hazard of being substantially negative if you aren't very, very careful how you use the information.
Framework qua framework choice doesn't matter much anymore. It's dominated by so, so many other considerations, as long as you don't take the real stinkers.
There are some; I've seen this one, which is new: https://mda2av.github.io/HttpArena/