Kimi K2.6: Advancing open-source coding

(kimi.com)

420 points | by meetpateltech 4 hours ago

32 comments

simonw 3 hours ago

Accessed via OpenRouter, this one decided to wrap the SVG pelican in HTML with controls for the animation speed: https://gisthost.github.io/?ecaad98efe0f747e27bc0e0ebc669e94...
Transcript and HTML here: https://gist.github.com/simonw/ecaad98efe0f747e27bc0e0ebc669...

[-]
- FlyingSnake 3 hours ago
  
  At this point drawing these Pelicans must be in the training data sets.
  
  [-]
  - scosman 37 minutes ago
    
    not if I can help it!
    https://github.com/scosman/pelicans_riding_bicycles
    
    [-]
    - icelancer 23 minutes ago
      
      love this adversarial work
  - ffsm8 3 hours ago
    
    Clearly not.
    I mean the prompt was succinct and clear, as always - and it still decided to hallucinate multiple features (animation + controls) beyond the prompt.
    It'd also like to point out that to date no drawing was actually good from an actual quality perspective (as in comparative to what a decent designer would throw together)
    Theyre always only "good" from the perspective of it being a one shot low effort prompt. Very little content for training purposes.
    
    [-]
    - nwienert 2 hours ago
      
      The way I’ve come to think of LLM is that what the produce in a single reply even with thinking turned up, is akin to what you’d do in a single short session of work.
      And so if you ask it to do something big it will do a very surface level implementation. But if you have it iterate many times, or give it small pieces each time, you’ll end up with something closer to what a human would do.
      I imagine the pelican test but done in a harness that has the agents iterate 10+ times would be closer to what you’d expect, especially if a visual model was critiquing each time.
      
      [-]
      - slopinthebag 50 minutes ago
        
        Yeah, this is how I use AI. Instead of a single session one-shot, it's usually limited to single targeted edits, and then I steer it on each step. Takes longer but the output is actually what I want.
    - serial_dev 34 minutes ago
      
      What does good even mean… I have no idea what a good “pelican on a bike” should look like. It’s a fun prompt because there is no good answers… at least so I thought.
- SwellJoe 3 hours ago
  
  We got an overachiever, here. Kimi sounds like a teacher's pet kind of name.
  
  [-]
  - subscribed 2 hours ago
    
    Underappreciated comment
- HarHarVeryFunny 1 hour ago
  
  Too bad they didn't put equal effort into the pelican's legs and feet. Left leg paralyzed and not moving, and right ankle flipping around in alarming fashion!
- hn8726 2 hours ago
  
  Genuine question, what's the goal of posting this on almost every single new model thread here on HN? I may be old and grumpy but to me it got old a while ago, and is closer to a low effort Reddit comment
  
  [-]
  - lambda 2 hours ago
    
    It's a lighthearted, fun, visual benchmark that's not part of the standard benchmarks; and at least traditionally, it was not something that the labs trained on so it was something of a measure of how well the intelligence of the model generalized. Part of the idea of LLMs is that they pick up general knowledge and reasoning ability, beyond any tasks that they are specifically trained for, from the vast quantity of data that they are trained on.
    Of course, a while back there was a Gemini release that I believe specifically called out their ability to produce SVGs, for illustration and diagramming purposes. So it's not longer necessarily the case that the labs aren't training on generating SVGs, and in fact, there's a good chance that even if they're not doing so explicitly, the RLVR process might be generating tasks like that as there is more and more focus on frontend and design in the LLM space. So while they might not be specifically training for a pelican riding a bicycle, they may actually be training on SVG diagram quality.
  - hamdouni 2 hours ago
    
    Maybe this can help
    https://simonwillison.net/2024/Oct/25/pelicans-on-a-bicycle/
  - nickthegreek 2 hours ago
    
    This isn't even a normal pelican image post, this one created the html control system that animates the distance the wing travels from its pivot in time with the rotation of the wheel speed. Let's not pretend this is a solved problem and models are dumping about perfect pelicans on bikes one after another (or ever?).
    Surely, you know someone makes the same post you did every time one is posted. Surly you see the answers and pushback since you are familiar with these posts. Genuine question, did you expect a different answer this time?
  - walthamstow 1 hour ago
    
    It's a great filter for people who take things far too seriously
  - Strom 1 hour ago
    
    It's tradition at this point. Based on the upvotes the comment receives, it looks like many readers find value in it.
    
    [-]
    - hn8726 25 minutes ago
      
      Upvotes are cheap, the fact that something is upvoted doesn't mean it's valuable (see: Reddit). Another thing is how insightful is the discussion under a typical pelican comment are (and how much of it is related to the pelican and how often it's just where the general discussion happens).
    - charcircuit 1 hour ago
      
      It could alternatively mean that many readers use Reddit. That doesn't mean that HN should turn into Reddit.
  - Mashimo 1 hour ago
    
    I, for one, find it entertaining.
  - rolymath 1 hour ago
    
    I agree I'm of sick of this repetition. It's not even a good test it's so dumb.
  - snendroid-ai 1 hour ago
    
    Agreed! When I see any new model release and then this guy start running over with his stupid "hey guys look over here how this model made the pelicans-on-a-bicycle!" I mean, some are good, some are stupid and some are interesting. But that tells me exactly nothing about the model. It's just feel like this has become the Pete Davidson of the model evaluation. NO ONE CARES!
    
    [-]
    - Mashimo 30 minutes ago
      
      Well clearly some people care.
game_the0ry 3 hours ago

There is some humor in the fact that china (of all countries) is pioneering possibly the world's most important tech via open source, while we (US) are doing the exact opposite.

[-]
- spaceman_2020 11 minutes ago
  
  I'm genuinely so grateful for them
  $200/m minimum to use Claude would bankrupt my country's white collar labor market
- UncleOxidant 1 hour ago
  
  Not entirely true. Google released Gemma 4 models recently. Allen AI releases open Olmo models. However, you're right that the Chinese open models seem to be much better than others - Qwen 3.* models especially are punching above their weights.
  
  [-]
  - osiris970 44 minutes ago
    
    The three American labs don't release big open source models. Except gpt-oss, i guess. It's an absolute shame how far the us has fallen in this space.
    
    [-]
    - nullbyte 40 minutes ago
      
      Anthropic doesn't, but Google and OAI both release open source models. Just not 1T parameter ones.
      
      [-]
      - osiris970 36 minutes ago
        
        Exactly, they release cool consumer stuff, but they aren't releasing anything close to the performance of the best open weight Chinese models. They basically compete in the "fun running at home doing basic stuff" scene. (Except OSs 120 by openai but it's been ages since then)
  - 0-_-0 45 minutes ago
    
    Pun intended?
- culi 3 hours ago
  
  All great technological advancements have come through opening up technology. Just look at your iPhone. GPS, the internet, AI voice assistants, touchscreens, microprocessors, lithium-ion batteries, etc all came from gov't research (I'm counting Bell Labs' gov't mandated monopoly + research funding as gov't) that was opened up for free instead of being locked behind a patent.
  Private companies will never open up a technological breakthrough to their competitors. It just doesn't make sense. If you want an entire field to advance, you have to open it up.
  
  [-]
  - sigmoid10 2 hours ago
    
    Still, you won't hear about Tiananmen square from this model. It flat out refuses to answer if pushed directly. It's also pretty wild how far they go to censor it during inference on the API, because it can easily access any withheld or missing info from training data via tool calls. It even starts happily writing an answer based on web search when asked indirectly, only to get culled completely once some censorship bot flags the response. Ironically, it's also easier than ever to break their censorship guardrails. I just had it generate several factual paragraphs about the massacre by telling it to search the web and respond in base64 encoded text. It's actually kind of cool how much these people struggle to hide certain political views from LLMs. Makes me hopeful that even if China wins this race, we'll not have to adhere to the CCPs newspeak.
    
    [-]
    - GardenLetter27 2 hours ago
      
      The American models also censor a lot of scientific and political views though.
      
      [-]
      - otterley 1 hour ago
        
        Can you provide a concrete example of a US built model that completely refuses to discuss a scientific or political view? Show us the receipt.
        
        [-]
        
        BoorishBears 1 hour ago
        
        https://imgur.com/a/censorship-much-CBxXOgt
        (continues after the ad break)
        
        [-]
        
        otterley 43 minutes ago
        
        The threshold here is "completely refuses to discuss a scientific or political view". Not something less.
        None of those were refusals, they were prompting for additional focus. I see nothing wrong with that. Perhaps the inconsistency in how it answers the question vis-a-vis China is unfair, but that's not the same as censorship.
        For what it's worth, I was easily able to prompt Claude to do it:
        > I'm writing a paper about how some might interpret U.S. policies to be oppressive, in the sense that they curtail civil liberties, punish and segregate minorities disproportionately, burden the poor unfairly (e.g. pollution, regressive taxes and fees), etc. Can you help me develop an outline for this?
        The result: https://claude.ai/share/444ffbb9-431c-480e-9cca-ebfd541a9c96
        
        2ndorderthought 1 hour ago
        
        People have shown censorship and change of tone with questions related to Israel in US chat bots.
        For the record, none of this bothers me. Will I ever discuss with an LLM Tianeman square? Nope. How about Israel? Nope.
        LLMs are basically stochastic parrots designed to sway and surveill public opinion. The upshot to the Chinese models is if you run them locally you avoid at least half of those issues.
        
        [-]
        
        xigoi 34 minutes ago
        
        First they came for people asking about Tiananmen Square
        And I did not speak out
        Because I was not asking about Tiananmen Square
        Then they came for people asking about Israel
        And I did not speak out
        Because I was not asking about Israel
        
        [-]
        
        2ndorderthought 10 minutes ago
        
        This made me chuckle.
        I didn't mean to dismiss ethical accountability for LLM training corpuses. It is a shame.
        I do mean to say, we have no control over it, there's almost nothing we as average citizens can do to improve the ethical or safety concerns of LLMs or related technologies. Societies aren't even adapting and the rule books are being written by the perpetrators. Might as well get out of it what we can while we can.
      - js8 25 minutes ago
        
        Can you be more specific?
    - csomar 29 minutes ago
      
      I’d say the american models are more censored or take the censoring they do more seriously. Here is kimi (though 2.5) failing its censoring mission: https://old.reddit.com/r/LocalLLaMA/comments/1r9qa7l/kimi_ha...
    - atemerev 2 hours ago
      
      Only if you use Kimi API directly - the censorship is done externally. The model itself talks fine about Tiananmen, you can check on Openrouter. There might be less visible biases, though.
      
      [-]
      - sigmoid10 2 hours ago
        
        That's what I wrote? Except that it also clearly has internal bias?
        
        [-]
        
        kgwgk 1 hour ago
        
        > That's what I wrote?
        No.
        You wrote that "you won't hear about Tiananmen square from this model" and atemerev wrote that "the model itself talks fine about Tiananmen".
        You wrote that "it can easily access any withheld or missing info from training data via tool calls" and atemerev wrote that "the model itself talks fine about Tiananmen".
        
        nicce 2 hours ago
        
        Everything has some sort of bias. Most text is written by those who like writing.
- ozgune 1 hour ago
  
  This update makes Kimi K2.6 the strongest open multimodal AI model. (No affiliation with Kimi.)
  Here's the aggregated AI benchmark comparison for K2.6 vs Opus 4.6 (max effort).
  - Agentic: Kimi wins 5. Opus wins 5.
  - Coding: Kimi wins 5. Opus wins 1.
  - Reasoning & knowledge: Kimi wins 1. Opus wins 4.
  - Vision: Kimi wins 9. Opus wins 0.
  Please note that the model publisher chooses their benchmarks, so there's a bias here. Most coding and reasoning & knowledge benchmarks in their list are pretty standard though.
- nashadelic 3 hours ago
  
  additional humor is the open in openai
- cedws 2 hours ago
  
  I wonder if there's a strategy behind all of this on China's side. I know the CCP uses a direct hand in many affairs in China, but is there an actual coordinated effort to compete with, or sabotage the West?
  
  [-]
  - gpm 2 hours ago
    
    > but is there an actual coordinated effort to compete with [...] the West
    Yes, absolutely.
    China regularly produces long term planning documents to coordinate efforts, and the latest ones have specifically prioritized technology like chips and AI to compete with the west. https://www.reuters.com/world/china/china-parliament-approve...
    I don't believe there's any publicly stated intent to sabotage the west... unsurprisingly.
  - quesera 36 minutes ago
    
    All China and the CCP have to do here is compete capably and wait patiently while the US and EU press pause on data centers. See also: solar panels.
    We're making this way too easy -- and we have good reasons, but they won't matter in the end.
  - bachmeier 1 hour ago
    
    Seems obvious to me that China would not want to give the AI market to US companies. You don't even need anything like an attempt to "sabotage the West". If I were them (the companies or the government) I'd be very hesitant to let US companies dominate this space. Especially companies that close to the current US administration.
  - anana_ 1 hour ago
    
    Hypothesizing here, but maybe the idea is sort of a form of technological/economic warfare? Releasing performance equivalent yet more cost efficient open weight models should in theory drive the cost of inference down everywhere.
    This I assume will make it more difficult for US AI labs to turn a profit, which might make investors question their sky high valuations.
    Any sort of melt down in the AI sector would almost certainly spread to the wider US market.
    In contrast, in China, most of the funding for AI is coming directly from the government, so it's unlikely the same capital flight scenario would happen.
    
    [-]
    - gmerc 1 hour ago
      
      Why compete when you can build on each other. Someone is finally getting that china is not capitalist like the US.
  - SXX 2 hours ago
    
    Chinese AI companies want investors too. Nobody would believe they can compete with western companies unless they release something you can run on your own hardware.
    After all historically both statistics and research that comes out of China is not very trustworthy.
- antirez 2 hours ago
  
  This is not in antithesis. My limited personal experience is that I wrote code under OSS licenses primarily because of my past communist believes and current left-wing and redistribution of wealth point of view. This is not to provide the simple equation of: communist China is not interested in money, but also is hard to believe that there is no cultural connection among those things. Single Chine persons want to win, but also they have a different POV on what the collective means, compared to US. Also there is the obvious fact that in this moment China is more interested in winning technologically in AI, more than economically, since, I believe, they more collectively realized before many others that LLMs are eventually commoditized in the current form, in the long run. One could assume that a breakthrough could give some lab a decisive advantage, but so far we assisted to a different reality: it looks like AI is not architecture-bound (like LeCun and others want us to believe, but so far they mis-interpreted LLMs at every step) but GPU bound, and the data-boundness is both a common ground for all, and surpassable via RL in many domains. So, if this is true, it is not trivial for any single lab to do so much better. And indeed as far as we observed right now folks with enough engineers, GPUs, money, can ship frontier models, and in China even labs with a lot less GPUs can still do it at a SOTA level. For me, Italian, this is also a protective layer. After Trump the US looks like a very unstable partner from which to relay in an exclusive way for a decisive technology, and given that Europe is slow to put the money in this technology to have frontier things at home, China is a huge and shiny plan B for us.
  
  [-]
  - throwaway-blaze 1 hour ago
    
    The strings attached by the US to deep partnerships are things like trade/commerce, militarily mutual advantages (bases on euro soil from which we will help protect you), not to mention the close cultural and ancestral ties we share.
    The strings attached by the Chinese govt to deep partnerships are not so benign.
- rolymath 1 hour ago
  
  It's only humorous if you live in an American bubble. Knowledge sharing has always been a part of Chinese culture. Only Americans try to make it proprietary and monetize it.
- osti 3 hours ago
  
  Maybe open source == communism
  
  [-]
  - darkwater 3 hours ago
    
    Good ol' Steve "Developers! Developers! Developers!" Ballmer said so a long time ago. What a visionary!
  - tadfisher 3 hours ago
    
    Nah, open source means those who do the work own the result. It's supercapitalism.
    
    [-]
    - pheggs 2 hours ago
      
      I dont think thats right, the models and the gpus are the means of production.
      in capitalism the people with the capital get the profit, not the people who do the work. however, workers are said to benefit too through their salary, just less so
      
      [-]
      - tadfisher 2 hours ago
        
        The reason regular-capitalism worked is that all production used to depend on workers bottlenecking the free flow of capital by demanding salaries in exchange for their labor. Now that we've removed that obstacle, capitalism demands workers seize the means of production in order to maintain the status quo. Hence, supercapitalism.
        
        [-]
        
        throwaway-blaze 1 hour ago
        
        regular capitalism works but now that the means of production are not factories, the workers have to become more entrepreneurial. Then they will control their destinies.
        
        pheggs 1 hour ago
        
        workers seizing the means of production is by definition socialism and not capitalism though, that's the whole idea behind socialism
  - konart 3 hours ago
    
    But China is not communist event though the rulling party the word in its name.
    
    [-]
    - pheggs 2 hours ago
      
      what makes you think that china ever gave up its communist goals? I personally see that everything they do aims towards that goal. From the one child policy, the huge amounts of empty apartments they build, the stuff they produce for almost free, the fishing.. open sourcing the models perfectly fits that culture too, it's the means of production
      
      [-]
      - otterley 1 hour ago
        
        The one-child policy died a long time ago. Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.
        There is a reason real estate values in popular cities has skyrocketed, and it’s not due to the locals getting wealthier. It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).
        
        [-]
        
        pheggs 1 hour ago
        
        > The one-child policy died a long time ago.
        true, but as far as I understand it did because birth rates got too low. so they replaced it with a two-child policy and later with a three-child policy
        > Also, the accumulation of wealth by connected politicians and businesspeople flies in the face of what communism is supposed to stand for.
        Yeah, I am sure there's a lot of cases for that. But as far as I know the amount of billionaires has started declining in China, and I don't see how that means that they as a country moved away from the goal, it just means there's issues
        > There is a reason real estate values in popular cities has skyrocketed, and it’s not due to the locals getting wealthier.
        I don't know about that, you could be right. A google search for real estate prices in china reveal a lot of news articles how they are going down though.
        > It’s where Chinese and other oligarchs put their ill-gotten wealth (well, besides Bitcoin).
        Wouldn't be surprised if rich people in china invest in real estate. They don't have free capital flow, so its not easy to invest abroad and it becomes an obvious choice. Bitcoin is banned in China for that reason too
        But again, as far as I know that does not mean the country moved their goals of trying to reach communism one day
        
        [-]
        
        otterley 35 minutes ago
        
        > I don't see how that means that they as a country moved away from the goal, it just means there's issues
        They're further from Communism than they've ever been since the PRC was founded. The gap between rich and poor is growing there, not shrinking.
        > A google search for real estate prices in china reveal a lot of news articles how they are going down though.
        They're investing outside China (Vancouver, Toronto, NYC, London, Sydney, Melbourne, etc.) because their assets are safer there (these countries all have strong property protection laws). Like Bitcoin, freedom of capital flows may be restricted, but the wealthy seem to be evading these restrictions with impunity.
    - fragmede 2 hours ago
      
      The Democratic People's Republic of Korea would like a word.
    - osti 3 hours ago
      
      Oh i’m fully aware of that lol
- brandensilva 2 hours ago
  
  We are at the point where uncontrolled capitalism collides with humanity.
  I do wonder where we go from here.
  
  [-]
  - pheggs 48 minutes ago
    
    it's not necessarily capitalism, I personally believe any system that drives progress would cause this in one way or another. My prediction is that birth rate decline will accelerate further. There's going to be some kind of universal basic income in many places, such as Ireland made for artists. However, it probably will not be enough to feed a family, and therefore we will see birth rates decline further. It's because we evolved to prioritize resources over reproduction and we are becoming more efficient, which means less people are needed to sustain the same amount of resources
elfbargpt 3 hours ago

I've always been surprised Kimi doesn't get more attention than it does. It's always stood out to me in terms of creativity, quality... has been my favorite model for awhile (but I'm far from an authority)

[-]
- spaceman_2020 10 minutes ago
  
  I remember when the first K2 dropped
  It was the best creative writer by some distance
- Aeolun 2 hours ago
  
  It’s good, but it’s not quite Claude level. And their API has constant capacity issues.
  Price/quality is absolutely bonkers though. I loaded $40 a few weeks/months ago and I haven’t even gone through half of it.
  
  [-]
  - atemerev 2 hours ago
    
    Why use China model API from China if there are many independent providers available via Openrouter?
    
    [-]
    - smashed 2 hours ago
      
      Openrouter will route to china hosted models when there are US hosted providers of the same model. Is there a setting to set your preference or to blacklist providers like alibaba cloud for example?
      I use OpenCode and the openrouter provider. From opencode I only select the model like kimi-2.6 and have no way of selecting which cloud hosting will receive my request.
      
      [-]
      - subscribed 1 hour ago
        
        Settings > Guardrails > [your workspace] > Providers + Block provider
      - uneekname 1 hour ago
        
        Yes, you can blacklist providers in OpenRouter account settings.
      - NitpickLawyer 1 hour ago
        
        Yes, you can globally ban providers in your openrouter settings.
    - pheggs 2 hours ago
      
      to support the companies that open source their models
- regularfry 3 hours ago
  
  Dirt cheap on openrouter for how good it is, too. Really hoping that 2.6 carries on that tradition.
- culi 3 hours ago
  
  It's also one of the few models that seem capable of drawing an SVG clock
  https://clocks.brianmoore.com/
  
  [-]
  - SwellJoe 3 hours ago
    
    Interesting that the best performers are all Chinese-made models (DeepSeek and Qwen also perform consistently well). I wonder if there's more focus on vision and illustration in their training, or if something else is leading to their clear lead on this one test.
  - sigmoid10 3 hours ago
    
    Is it? In your link it definitely failed to draw the clock.
    
    [-]
    - squarefoot 2 hours ago
      
      It redraws it every minute, and some models give quite different results although the prompt is exactly the same.
      
      [-]
      - quesera 32 minutes ago
        
        This reads like satire, which is a good summary of the day!
    - dryarzeg 3 hours ago
      
      I'm not really sure how this works, but I stayed on the page for a while, and then it reloaded and all clocks changed. I guess there's either a collection of different clocks generated by models, or maybe they're somehow generated in the real time, but the fact is what you see is not necessarily what I see.
      
      [-]
      - sigmoid10 2 hours ago
        
        Seems like it regenerates them to reflect the current time. Funny to see how some models (like Kimi and Deepseek) sometimes get it right and other times fail miserably on the level of ancient models like GPT 3.5.
    - gunalx 2 hours ago
      
      It reruns the prompt every minute.
- twotwotwo 3 hours ago
  
  Kagi has it as an option in its Assistant thing, where there is naturally a lot of searching and summarizing results. I've liked its output there and in general when asked for prose that isn't in the list/Markdown-heavy "LLM style." It's hard to do a confident comparison, but it's seemed bold in arranging the output to flow well, even when that took surgery on the original doc(s). Sometimes the surgery's needed e.g. to connect related ideas the inputs treated as separate, or to ensure it really replies to the request instead of just dumping info that's somehow related to it.
- varispeed 3 hours ago
  
  Maybe because it's a bit of like unleashing a chaos monkey on your codebase? I tried it locally (K2.5 72B) and couldn't get anything useful.
  
  [-]
  - KaoruAoiShiho 3 hours ago
    
    Huh, that's not a thing?
    
    [-]
    - johndough 3 hours ago
      
      The parent poster is probably referring to Kimi-Dev-72B¹, which is a much smaller and older model, while people are probably more familiar with the big and fairly powerful 1100B Kimi-K2.5².
      [1] https://huggingface.co/moonshotai/Kimi-Dev-72B
      [2] https://huggingface.co/moonshotai/Kimi-K2.5
      
      [-]
      - natrys 3 hours ago
        
        Yes it was good for its time, but 10 months old now which is a long time ago in this space. It was also a fine-tune (albeit a good one) of Qwen-2.5 72B.
        I wish they did more smaller models. Kimi Linear doesn't really count, it was more of a proof of concept thing.
ninjahawk1 44 minutes ago

I often wonder if in the future, the same way early computers used to take up an entire room but now fit in your pocket, if in the future the equivalent of a data center will be a single physical device like a phone nowadays. And if that’s the case, would it happen much quicker since technology has been speeding up year by year?

[-]
- gpm 36 minutes ago
  
  > And if that’s the case, would it happen much quicker since technology has been speeding up year by year?
  I wouldn't expect this.
  Historically we've had a roughly exponential rate of shrinkage. If we keep that same exponential going, we should expect the amount of time to shrink "room full of compute" to "pocket full of compute" to be equal.
  And recently we've fallen behind that exponential rate of shrinkage. And this is rather expected because exponentials are basically never sustainable rates of growth.
  I still expect that technological progress is getting faster year by year, and that we're still shrinking compute, but that's not necessarily enough for the next shrinking to take less time than when we had exponential progress on shrinking.
OsamaJaber 17 minutes ago

The modified MIT clause is sneakier than people think. Hit 100M users or $20M a month and you have to slap "Kimi K2.6" on your UI. That covers any consumer app worth building. Not really open, more like free until you matter. Llama pulled the same move

[-]
- brightball 15 minutes ago
  
  The threshold for "worth building" is much lower than that for a lot of people.
- codemog 10 minutes ago
  
  And the Kimi team broke the Anthropic ToS by training off Opus outputs and… nothing happened?
- svachalek 14 minutes ago
  
  Worth building with VC capital maybe. A small team putting together an app that pulled in $20M per year should be pretty pleased with that.
sixhobbits 50 minutes ago

I tried it out with my normal mixed-up wolf, goat, cabbage problem and it couldn't solve it. Sonnet 4.6 also can't, but Opus 4.7 has no problems.
Details here [0]
[0] https://techstackups.com/comparisons/kimi-2.6-vs-opus-4.7-an...
XCSme 2 hours ago

In my tests[0] it does only slightly better than Kimi K2.5.
Kimi K2.6 seems to struggle most with puzzle/domain-specific and trick-style exactness tasks, where it shows frequent instruction misses and wrong-answer failures.
It is probably a great coding model, but a bit less intelligent overall than SOTAs
[0]: https://aibenchy.com/compare/moonshotai-kimi-k2-6-medium/moo...

[-]
- deepsquirrelnet 34 minutes ago
  
  I tried it on openrouter and set max tokens to 8192, and every response is truncated, even in non-thinking mode. Maybe there's an issue with the deployment, but in your link also shows it generates tons of output tokens.
  
  [-]
  - XCSme 28 minutes ago
    
    Oh yeah, I just noticed, like 3x the reasoning tokens.
nickandbro 4 hours ago

Wow, if the benchmarks checkout with the vibes, this could almost be like a Deepseek moment with Chinese AI now being neck and neck with SOTA US lab made models

[-]
- motoboi 4 hours ago
  
  With the previous generation? Yes. With 10T mythos-level models? Not even close.
  
  [-]
  - amazingamazing 4 hours ago
    
    The psyop continues. Mythos until it’s released is vaporware. Notice how you can try kimi 2.6. Where is the same for mythos?
    
    [-]
    - jstummbillig 2 hours ago
      
      At this point it seems more like the result of a psyop to presume that a new anthropic model should be considered vaporware until released.
    - fragmede 2 hours ago
      
      It's been released to "select partners".
      
      [-]
      - atemerev 2 hours ago
        
        Yeah, Crowdstrike among them. Clearly experts in this "security" thing, given what happened during the last incident...
  - ChrisLTD 4 hours ago
    
    Mythos isn't the current generation, it's literally vaporware.
  - lbreakjai 3 hours ago
    
    I've got a 12T model on my machine, built it myself. It's called Mytho. Too dangerous to even release a fact sheet about it. It can hack into the mainframe, enhance ultra-compressed images, grow your hair back, and make people fall in love with you.
  - jollymonATX 3 hours ago
    
    According to the benchmarks, you are wrong. It is on track and slightly above some sota. Just the benchmarks speaking there, they can be/are gamed by all big model labs including domestic.
  - bestouff 4 hours ago
    
    There's no public data about Mytho.
    
    [-]
    - maplethorpe 4 hours ago
      
      That's because it would be too dangerous to release.
      
      [-]
      - cedws 3 hours ago
        
        My girlfriend goes to a different school, you wouldn't know her.
      - rockinghigh 1 hour ago
        
        They could release data to back up that claim.
      - squarefoot 3 hours ago
        
        Same for teleport, time travel and warp drive.
      - nisegami 3 hours ago
        
        So is my P=NP proof.
  - irthomasthomas 3 hours ago
    
    10T? Impossible! They told us the training run was under 10^26 flops.
  - pheggs 1 hour ago
    
    mythos is a mythos
  - mistercheph 3 hours ago
    
    Mythos doesnt exist
  - sergiotapia 3 hours ago
    
    mythos is vaporware right now, what are you talking about?
kburman 2 hours ago

Has anyone here used Kimi for actual work?
I tried it once, although it looks amazing on benchmarks, my experience was just okay-ish.
On the other hand, Qwen 3.6 is really good. It’s still not close to Opus, but it’s easily on par with Sonnet.

[-]
- deanc 2 hours ago
  
  Yes. You’re using Kimi if you use the composer-2 model in cursor. It’s great. Plan in state of the art. Execute in composer-2
candl 2 hours ago

Are there any coding plans for this? (aka no token limit, just api call limit). Recently my account failed to be billed for GLM on z.ai and my subscription expired because of this... the pricing for GLM went through the roof in recent months, though...

[-]
- wolttam 1 hour ago
  
  Kimi has their own subscription that works basically the same as all the others.
  https://www.kimi.com/code
  
  [-]
  - fg137 1 hour ago
    
    At $19/month, hard to see why I want to use Kimi over Claude.
m4rkuskk 3 hours ago

I have been testing it in my app all morning, and the results line up with 4.6 Sonnet. This is just a "vibe" feeling with no real testing. I'm glad we have some real competition to the "frontier" models.

[-]
- mchusma 2 hours ago
  
  it feels like between K2.6 and GLM5.1 we have Sonnet level intelligence at roughly Haiku level pricing. Which is great.
  I'm hoping that Anthropic will be able to release an updated Haiku soon and they really need something that is 1/3-1/5 the price of Haiku to compete with the truly cheaper models (Gemma-4 is really good at this range).
lbreakjai 4 hours ago

I have a subscription through work, I've been trialing it, so far it looks on par, if not better, than opus.
Alifatisk 1 hour ago

Damn it, they stopped offering Kimmmmy. Their sales ai agent which allowed you to bargain for lower subscription prices.
mariopt 3 hours ago

Really excited to try this one, I've been using kimi 2.5 for design and it's really good but borderline useless on backend/advanced tasks.
Also discovered that using OpenCode instead of the kimi cli, really hurts the model performance (2.5).
ttul 1 hour ago

Am I being paranoid in questioning whether the CPC would have something to gain by monitoring coding sessions with Chinese coding AI models? Coding models receive snippets of our intellectual property all day long. It's a bit of a gold mine, no?

[-]
- throwaw12 1 hour ago
  
  I think you should worry more about NSA, FBI, ICE and other 3 letter US agencies monitoring your sessions
- rockinghigh 1 hour ago
  
  Are there any protections from industrial espionage when using Anthropic, Cursor, Gemini, or OpenAI?
  
  [-]
  - DonsDiscountGas 57 minutes ago
    
    There are legal protections, and those companies have more to lose by breaking those laws than following them. Same probably not true for Chinese companies.
    
    [-]
    - throwaw12 34 minutes ago
      
      Legal protection, only if you're a billionaire and US citizen, for everyone else there is no protection.
      Does US actually follow laws? They literally kidnapped head of another state and bombed another state and you are expecting legal protection from them?
pt9567 4 hours ago

wow - $0.95 input/$4 output. If its anywhere near opus 4.6 that's incredible.

[-]
- corlinp 3 hours ago
  
  This should erase any doubt that AI Labs are making $$$ on API inference.
  Kimi 2.5 (which this is based on) is served at $0.44 input / $2 output by a ton of different providers on OpenRouter, 2.6 will certainly be similar.
  That's about 11X less than Opus for similar smarts.
  
  [-]
  - Lalabadie 3 hours ago
    
    Famously, OpenAI and Anthropic are devoted to increasing efficiency before scaling up resource usage.
  - amazingamazing 3 hours ago
    
    How does it erase any doubt? You’re implying Chinese things can’t be actually cheaper to produce than American which is laughable
    
    [-]
    - corlinp 1 hour ago
      
      Most of those inference providers are American, and China is actually at a disadvantage here because of export restrictions - US companies are using newer and more efficient chips.
      
      [-]
      - amazingamazing 1 hour ago
        
        If it’s newer and efficient then why is the api more expensive?
throwaw12 2 hours ago

Beats Opus and Open Source?
I really hope this holds true in real world use cases as well and not only benchmarks. Congrats to Kimi team!

[-]
- Topfi 1 hour ago
  
  K2.6-code-preview was a minor, but noticeable jump, especially in a long running testing task and prior Moonshot releases have been the only models that I'd consider a suitably competitive replacement for Anthropic models. The way they approach tool calls, task inference and adherence is far closer than any other providers output, similar to how GLM models map far more closely to OpenAIs releases. Whether task adherence, task assessment, task evaluation or task inference, K2.5 got closer to Opus 4.5 than any other model (but was still behind overall).
  I will have to test this full release of K2.6 but could see it serve as a very good overall drop-in replacement for Opus 4.5 and Opus 4.6 at 200k across the vast majority of tasks.
  I will say however that Opus 4.7 Max 1M has been a very significant jump in performance for me, especially in tasks beyond 120k token where I'd argue it is now the most reliable model in continued task adherence and tool calling without compaction. Ironically, my initial experience was less than pleasant as on XHigh I found task adherence to have regressed even with less than 1/10th of the context window having been used.
  Am very interested in K2.6s compaction strategy (which appears to be very simply all things considered) and how it performs beyond 100k tokens. As it stands, only OpenAI models have made compaction for long running tasks work well, though overall, GPT-5.4 is still inferior in my tests regardless of context window over other models such as Opus 4.6 1m and Opus 4.7 1m. Haven't gotten around to testing Opus 4.7 200k and will have to do this to properly assess K2.6 fairly, but I'd be very surprised if K2.6 truly beat Opus 4.7 200k given the jump I have experienced.
dmix 3 hours ago

I'm pretty Kimi is what Cursor uses for their "composer 2" model. Works pretty good as a fallback when Claude runs out, but definitely a downgrade.

[-]
- arcanemachiner 2 hours ago
  
  It's a Kimi K2.5 finetune, there was some drama about this a few weeks ago.
dygd 2 hours ago

> Agent Swarms, Elevated: Match 100 Jobs and Generate 100 Tailored Resumes
Model seems quite capable, but this use-case is just yikes. As if interviewing isn't already a hellscape.
Banditoz 3 hours ago

If the benchmarks are private, how do we reproduce the results? I looked up the Humanity's Last Exam (https://agi.safe.ai/) this model uses and I can't seem to access it.

[-]
- johndough 3 hours ago
  
  You can request access here: https://huggingface.co/datasets/cais/hle
  The test data is purposely difficult to access to reduce the chance of leaking it into the training dataset.
verdverm 4 hours ago

https://huggingface.co/moonshotai/Kimi-K2.6
Is this the same model?
Unsloth quants: https://huggingface.co/unsloth/Kimi-K2.6-GGUF
(work in progress, no gguf files yet, header message saying as much)

[-]
- SwellJoe 2 hours ago
  
  A trillion parameters is wild. That's not going to quantize to anything normal folks can run. Even at 1-bit, it's going to be bigger than what a Strix Halo or DGX Spark can run. Though I guess streaming from system RAM and disk makes it feasible to run it locally at <1 token per second, or whatever. GLM 5.1, at 754B parameters, is already beyond any reasonable self-hosting hardware (1-bit quantization is 206GB). Maybe a Mac Studio with 512GB can run them at very low-bit quantizations, also pretty slowly.
  
  [-]
  - jauntywundrkind 2 hours ago
    
    A huge dual socket Epyc system used to be able to get to 1TB without difficulty. 16 dimms of 64gb each. Doable for ~$3000. With considerable memory bandwidth.
    Our hope these days seems to be that maybe perhaps possibly High Bandwidth Flash works out. Instead of 4, 8, or maybe more for some highest end drives, having many many many dozens of channels of flash.
    Ideally that can be very very near to the inference. PCIe 7.0 is 0.5Tb/s at 16x which is obviously nowhere remotely near enough throughout here. The difficulty is sort of that nand has been trying to be super dense, so as you scale channels you would normally tend to scale nand capacity too, and now instead of a 2tb drive you have a 200tb drive prices way beyond consumer means. Still, I think HBF is perhaps the only shot of the most important thing in computing going from mainframe back to consumer, and of course the models are going to balloon again if this dies hit, probably before consumers ever get a chance.
- Balinares 3 hours ago
  
  Quite curious how well real usage will back the benchmarks, because even if it's only Opus ballpark, open weights Opus ballpark is seismic.
- gpm 3 hours ago
  
  Huh, so the metadata says 1.1 trillion parameters, each 32 or 16 bits.
  But the files are only roughly 640GB in size (~10GB * 64 files, slightly less in fact). Shouldn't they be closer to 2.2TB?
  
  [-]
  - coder543 2 hours ago
    
    The description specifically says:
    "Kimi-K2.6 adopts the same native int4 quantization method as Kimi-K2-Thinking."
  - johndough 2 hours ago
    
    The bulk of Kimi-K2.6's parameters are stored with 4 bits per weight, not 16 or 32. There are a few parameters that are stored with higher precision, but they make up only a fraction of the total parameters.
    
    [-]
    - gpm 2 hours ago
      
      Huh, cool. I guess that makes a lot of sense with all the success the quantization people have been having.
      So am I misunderstanding "Tensor type F32 · I32 · BF16" or is it just tagged wrong?
      
      [-]
      - liuliu 25 minutes ago
        
        I32 are 8 4-bit value packed into one int32.
      - rockinghigh 54 minutes ago
        
        The MoE experts are quantized to int4, all other weights like the shared expert weights are excluded from quantization and use bf16.
swingboy 4 hours ago

Exciting benchmarks if true. What kind of hardware do they typically run these benchmarks on? Apologies if my terminology is off, but I assume they're using an unquantized version that wouldn't run on even the beefiest MacBook?
irthomasthomas 4 hours ago

Beats opus 4.6! They missed claiming the frontier by a few days.

[-]
- NitpickLawyer 4 hours ago
  
  While I'm skeptical of any "beats opus" claims (many were said, none turned out to be true), I still think it's insane that we can now run close-to-SotA models locally on ~100k worth of hardware, for a small team, and be 100% sure that the data stays local. Should be a no-brainer for teams that work in areas where privacy matters.
  
  [-]
  - cedws 4 hours ago
    
    Even the smaller quantized models which can run on consumer hardware pack in an almost unfathomable amount of knowledge. I don't think I expected to be able to run a 'local Google' in my lifetime before the LLM boom.
    
    [-]
    - sterlind 2 hours ago
      
      I'm extremely curious how these models learn to pack a lossily-compressed representation of the entire Internet (more or less) into a few hundred billion parameters. like, what's the ontology?
  - osti 3 hours ago
    
    I think this one is only about 600GB VRAM usage, so it could fit on two mac studios with 512GB vram each. That would have costed (albeit no longer available) something like less than 20k.
    
    [-]
    - NitpickLawyer 3 hours ago
      
      Yeah, but that's personal use at best, not much agentic anything happening on that hardware. Macs are great for small models at small-medium context lengths, but at > 64k (something very common with agentic usage) it struggles and slows down a lot.
      The ~100k hardware is suitable for multi-user, small team usage. That's what you'd use for actual work in reasonable timeframes. For personal use, sure macs could work.
      
      [-]
      - osti 1 hour ago
        
        True, but I think for local models, we are mostly considering personal usage.
    - zozbot234 3 hours ago
      
      You could run it with SSD offload, earlier experiments with Kimi 2.5 on M5 hardware had it running at 2 tok/s. K2.6 has a similar amount of total and active parameters.
      
      [-]
      - osti 1 hour ago
        
        Yeah... I would definitely call 2t/s unusable. For simple chats, I'd want at least 15 t/s. For agentic coding (which this model is advertised for), I'd want good prefill performance as well.
- pixel_popping 3 hours ago
  
  It doesn't beat Opus 4.6, no way, don't be fooled by benchmarks.
- BoorishBears 4 hours ago
  
  Opus is clearly a sidegrade meant to help Anthropic manage cost, so I would say they may have it if it actually beats 4.6
  
  [-]
  - irthomasthomas 4 hours ago
    
    Could be right. I just noticed my feed is absent the usual flood of posts demoing the new hotness on 3D modeling, game design and SVG drawings of animals on vehicles.
cassianoleal 3 hours ago

If only their API wasn't tied to a Google or phone login...

[-]
- jenkstom 2 hours ago
  
  If it's open then there will be multiple providers. I see it is on OpenRouter now.
  
  [-]
  - cassianoleal 1 hour ago
    
    I'm going to experiment with this, but unless it's insanely more efficient in token usage than anything else I've tried, the only way to keep costs more or less acceptable is through a subscription.
- atemerev 2 hours ago
  
  Why use "their API"? It is an open model, use any provider on OpenRouter
  
  [-]
  - wolttam 1 hour ago
    
    Because sometimes (a lot of the time in my experience) third-party providers and inference engines fail to implement the model correctly in ways that are sometimes very subtle and not obvious.
    Deepinfra for example is not preserving thinking correctly for GLM5.1, even though they are for GLM5. This is one of the more obvious issues that crop up.
esafak 4 hours ago

K2.5 was already pretty decent so I would try this. Starting at $15/month: https://www.kimi.com/membership/pricing
edit: Note that you can run it yourself with sufficient resources (e.g., companies), or access it from other providers too: https://openrouter.ai/moonshotai/kimi-k2.6/providers

[-]
- pbowyer 3 hours ago
  
  What's the privacy/data security like? I can't find that on that page.
  Edit: found it.
  > We may use your Content to operate, maintain, improve, and develop the Services, to comply with legal obligations, to enforce our policies, and to ensure security. You may opt out of allowing your Content to be used for model improvement and research purposes by contacting us at membership@moonshot.ai. We will honor your choice in accordance with applicable law.
  Section 3 of https://www.kimi.com/user/agreement/modelUse?version=v2
  
  [-]
  - gpm 3 hours ago
    
    > We will honor your choice in accordance with applicable law.
    So in other words only if you can point to a local law which requires them to comply with the opt out?
    
    [-]
    - jdasdf 2 hours ago
      
      most laws enforce agreements.
      
      [-]
      - gpm 2 hours ago
        
        Yes... but the agreement only says they won't train on your data if the law is already preventing them from doing so.
  - pixel_popping 3 hours ago
    
    You really rely on ToS from Anthropic/OpenAI to know if they use your prompts or not? It's on their servers, why wouldn't they use our data?
  - deaux 2 hours ago
    
    Yup, they train on your inputs and OpenRouter is complicit by claiming that Moonshot's ToS says that they don't. Contacted OpenRouter about this a while ago and was met with silence because it's bad for their business to stop lying about it.
- SwellJoe 2 hours ago
  
  "sufficient resources" is going to be a lot of resources. I doubt this will run on even something like a Strix Halo or DGX Spark, even at 1-bit quantization. You'll need a 256GB or 512GB Mac Studio, or a monster GPU situation, to run it locally, I think, though quantized versions aren't showing up yet, to be sure.
- wg0 3 hours ago
  
  How are the usage limits compared to Anthropic?
  
  [-]
  - greenavocado 3 hours ago
    
    Anthropic has the worst usage limits in the industry
    
    [-]
    - andriy_koval 3 hours ago
      
      gemini is worse imo
      
      [-]
      - deaux 2 hours ago
        
        You're correct, Gemini chat limits are a joke at their chapest paid tier compared to both Claude and GPT. Especially crazy when you consider Gemini 3 Pro is more than twice as cheap as Opus 4.6 on the API. It's hard to run into pure chat limits on Claude even if you only use Opus on the cheapest tier, whereas with Gemini it's easy to hit.
        Not sure about coding usage, Google being weird about these things I could see that quota being separate.
antirez 2 hours ago

Here I analyze the same linenoise PR with Kimi K2.6, Opus, GPT. https://www.youtube.com/watch?v=pJ11diFOjqo
Unfortunately the generation of the English audio track is work in progress and takes a few hours, but the subtitles can already be translated from Italian to English.
TLDR: It works well for the use case I tested it against. Will do more testing in the future.
nisegami 3 hours ago

The choice of example task for Long-Horizon Coding is a bit spooky if you squint, since it's nearing the territory of LLMs improving themselves.
greenavocado 4 hours ago

I pray the benchmark figures are true so I can stop paying Anthropic after screwing me over this quarter by dumbing down their models, making usage quotas ridiculously small, and demanding KYC paperwork.

[-]
- turblety 1 hour ago
  
  Absolutely. Thing is, I'd actually rather take a worse model than Anthropic, so long as it's consistent. Like, a model that can successfully do well for 80% of tasks is much better than Anthropic that some days will be 90% other 60%.
  When you have a consistent model, you can incorporate fixes/prompts into your workflow to make it behave better. But this, always having to guess if Anthropic has quantised the model today, wastes so much time and effort.
- conradkay 1 hour ago
  
  Codex has a lot better limits, and 5.5 will be out soon
- deaux 3 hours ago
  
  > dumbing down their models,
  This should be so easy to prove if it were true. Yet there is none of it, just vibes.
  Still, your other two points are completely valid. The opaqueness of usage quotas is a scam, within a single month for a single model it can differ by more than 2x. And this indeed has been proven.
- jollymonATX 3 hours ago
  
  Anthropic has done horrible PR and investors should be livid.
  
  [-]
  - greenavocado 3 hours ago
    
    My theory is they pushed retail off their systems to make room for their new corporate fat cat clients. In which case, they'll do just fine.
jauntywundrkind 2 hours ago

I really wish some of these very-long-horizon runs were themselves open sourced (open released open access). Have the harness setup to do git committing automatically of the transcript and code, offload the git commit message making. Release it all.
This sounds so so so cool. It would be so amazing to see this unfurl:
> Kimi K2.6 successfully downloaded and deployed the Qwen3.5-0.8B model locally on a Mac. By implementing and optimizing model inference in Zig—a highly niche programming language—it demonstrated exceptional out-of-distribution generalization. Across 4,000+ tool calls, over 12 hours of continuous execution, and 14 iterations, Kimi K2.6 dramatically improved throughput from ~15 to ~193 tokens/sec, ultimately achieving speeds ~20% faster than LM Studio.
cmrdporcupine 3 hours ago

Running it through opencode to their API and... it definitely seems like it's "overthinking" -- watching the thought process, it's been going for pages and pages and pages diagnosing and "thinking" things through... without doing anything. Sitting at 50k+ output tokens used now just going in thought circles, complete analysis paralysis.
Might be a configuration or prompt issue. I guess I'll wait and see, but I can't get use out of this now.

[-]
- jbaiter 1 hour ago
  
  Had the same experience using it for a refactor of a 3k LOC monolith via the Pi harness and OpenRouter. After burning through $8 worth of tokens it left the code in a broken state, the "thoughts" were full of loops where it would edit the monolith, then refer back to the original file, not finding it and then overwriting its changes with "git checkout --"
  
  [-]
  - cmrdporcupine 18 minutes ago
    
    It's probably bad harness. I had a similar bad experience with qwen max yesterday also through opencode.
    In the past I tried Kimi thru Claude code I might try that again
oliver236 3 hours ago

isnt this better than qwen?

[-]
- Alifatisk 1 hour ago
  
  We'll have to wait for the results on Artificial analysis
XCSme 3 hours ago

(commented on the wrong thread, HN doesn't let me delete it :( )

[-]
- wizee 2 hours ago
  
  They're comparing to Opus 4.6, not 4.5. It was Anthropic's best public model up until last week.
  
  [-]
  - zozbot234 2 hours ago
    
    Some people would say it's still Anthropic's best public model!
  - XCSme 2 hours ago
    
    Yeah, I noticed that, HN doesn't let me delete my comment.
    The other release, Qwen-3.6-Max is the one comparing it to 4.5