Stop squashing your commits. You're squashing your AI too

5 points | by jannesblobel 165 days ago

5 comments

Hackbraten 165 days ago

I’m finding it difficult to agree with you without a concrete example.
How exactly would it help to have a commit that introduces a problem and then another one that fixes it? How does leaving in a bad refactor, failed attempt, or typo help the AI tool with anything?

- jannesblobel 165 days ago
  
  Think of a refactor where you tried one approach, rolled it back, then found the right fix. If you squash, all those failures vanish. With full history, an AI (or future you) can see the dead ends and spot patterns. I think that’s what Augment Code is doing with their Context Lineage idea: indexing the messy history so tools can explain how code evolved.
  https://www.augmentcode.com/blog/announcing-context-lineage
  
  - skydhash 165 days ago
    
    Today I downloaded the source code of a small utility to check its internals. You know what I was not interested in? The git history. Instead I just download the tarball from Debian.
    Version history is only interesting if you’re doing archeology. And I would prefer seeing a squashed commit that introduces a complete change instead of going back and forth to get the complete picture (anyone with such a messy history will introduce unrelated changes too).
    As for failures, put them in a tracker with an “abandoned” status.
    
    - jannesblobel 165 days ago
      
      > You know what I was not interested in? The git history.
      Sure, that makes sense, if you’re just interested in the internals, the history doesn’t matter. I get that.
      But what do you think about the idea of keeping two views of history? One that’s clean and human-readable, and another that preserves all the detailed commits. With the right filters, you could switch between the simple view and the full story.
      EDIT: By the way, I just want to discuss a theory/some thoughts here. There are always pros and cons, and perhaps my text is a little too harshly worded.
      
      - skydhash 165 days ago
        
        I’m dealing with a not-so-clean history at work, and it’s a lot of hassle and confusion. I’m always ready to reset and go with an alternative solution, but for me these abandoned branches are like scrap paper: good while you’re working on the task, worthless once you’re done. If an idea was really good, I’d create a patch or have a proper branch for it.
        One thing about code archeology is that you’re not really interested in the diff itself, but the commit description. Which is why an issue tracker can fit that role.
      - Disposal8433 164 days ago
        
        You need time to clean/reorder all those commits, and tools that don't exist yet to handle this double codebase in the hope that it may be useful in the future. Not worth it.
  - raw_anon_1111 165 days ago
    
    The issue is that once you pollute your context window with the “wrong” information even after you have guided the LLM to the right path, it is still more likely to go off the rails.
    https://research.trychroma.com/context-rot
arman_nocapro 162 days ago

Great analysis, but I think you're missing the forest for the trees here. The real issue isn't about "understanding project history" - it's about signal-to-noise ratio, plain and simple.
`raw_anon_1111` nailed it with the context rot reference. After working with LLMs daily for the past year, I've found that garbage in = garbage out, consistently. It's like working with that brilliant junior dev who can't see the big picture through all the implementation details.
You wouldn't dump your entire git history into a code review, would you? So why would you feed it to an LLM? `ManlyBread`'s "poison the context" is exactly right. Every token spent on explaining dead ends or reverted commits is a token wasted.
The solution isn't more data - it's better data. What we need are tools that create concise, high-signal context packages. Architecture diagrams, clean code, and clear requirements. Not the messy sausage-making that got us there.
This isn't just theory - I cut API costs by 40% when I started curating prompts instead of just dumping everything into context. The attention window is precious - use it wisely.
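
As a rough illustration of the "better data, not more data" point, here is a minimal Python sketch of one possible curation step: dropping reverted commits (and the reverts themselves) from a history before it goes into a prompt. The `Commit` type, the `curate_history` helper, and the sample history are hypothetical and not something anyone in the thread described; the only real-world convention it relies on is that `git revert` writes "This reverts commit <sha>." into the message body by default.

```python
import re
from dataclasses import dataclass

# git's default revert message body contains "This reverts commit <sha>."
REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})")


@dataclass(frozen=True)
class Commit:  # hypothetical stand-in for whatever your tooling returns
    sha: str
    message: str


def curate_history(commits):
    """Return commits minus revert commits and the commits they revert.

    Simplification (an assumption of this sketch): a revert of a revert
    is not treated specially; both are simply dropped.
    """
    reverted_prefixes = set()
    revert_shas = set()
    for c in commits:
        m = REVERT_RE.search(c.message)
        if m:
            reverted_prefixes.add(m.group(1))
            revert_shas.add(c.sha)

    def was_reverted(sha):
        # The message may carry an abbreviated hash, so match by prefix.
        return any(sha.startswith(p) for p in reverted_prefixes)

    return [
        c for c in commits
        if c.sha not in revert_shas and not was_reverted(c.sha)
    ]


# Made-up history: one dead end, its revert, and the fix that stuck.
history = [
    Commit("a1b2c3d", "Add retry wrapper around fetch"),
    Commit("b2c3d4e", "Try exponential backoff in fetch"),
    Commit("c3d4e5f", 'Revert "Try exponential backoff in fetch"\n\n'
                      "This reverts commit b2c3d4e."),
    Commit("d4e5f6a", "Use fixed retry budget in fetch"),
]

curated = curate_history(history)
print([c.sha for c in curated])  # ['a1b2c3d', 'd4e5f6a']
```

Whether discarding the dead ends actually improves results is exactly what this thread disagrees about; the sketch only shows that the filtering itself is cheap to do at prompt-building time without rewriting the repository.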
raw_anon_1111 165 days ago

LLMs go off the rails so easily when it comes to coding that I purposely arrange my sessions so they don’t have to digest too much at once.
I recently had it go off the rails on some greenfield work where I was clearly using MySQL with Python and in the middle of the session it started generating Postgres code using the Postgres driver and doing Postgres style upserts.
bjourne 163 days ago

I don't think the AI argument has merit, but I agree with your general sentiment. Squashing commits destroys part of the signal and makes software archaeology more difficult. There is huge value in a commit history that reflects how the software was actually made.
ManlyBread 164 days ago

It's just as easy to say that this will poison the context and produce worse results. Do you have any actual examples? Without an example, this sounds like software voodoo.