7 comments

  • whs 6 hours ago

    I signed up, feels like this is something that should've existed long ago.

    Your privacy policy isn't great for a privacy-focused provider, though. You shouldn't have the rights to use my personal information. The use of Google Tag Manager also doesn't inspire confidence, especially on LLM pages where you might "accidentally" install a user-monitoring script and have the prompts get logged. I'd suggest looking at how Kagi markets to privacy-conscious customers.

    • cofob_ 12 hours ago

      Cool!

      How are messages counted? For example, in Cursor, one request is 25 tool calls. Does 100 messages in a subscription here mean 100 tool calls or 100 requests each with 25 tool calls?

      When it comes to privacy, there are also some questions. The policy says that requests can only be used for debugging purposes, but it later mentions a license to use requests to improve the platform, which suggests they could be used for more than just debugging.

      • jml78 7 hours ago

        I currently use Cerebras for Qwen3. One of the things I like is its speed (though the TPM limit is rough). I'm curious: how fast is Qwen3 on your platform, and what quantization are you running for your models?

        • rationably 17 hours ago

          Do you plan to offer a high-quality FIM model in the bundle? It would be handy to perform autocompletion locally, say via Qwen3-Coder.

          • reissbaker 17 hours ago

            Interesting! Very open to the idea. What open-source fill-in-the-middle models are good right now? I've stayed on top of the open source primary coding LLMs, but haven't been following along for the open-source FIM ones.

          • logicprog 21 hours ago

            I was literally just wishing something like this existed; this is perfect! Do you do prompt caching?

            • reissbaker 21 hours ago

              Aw thanks! We don't currently, but from a cost perspective as a user it shouldn't matter much since it's all bundled into the same subscription (we rate-limit by requests, not by tokens — our request rate limits are set to "higher than the amount of messages per hour that Claude Code promises", haha). We might at some point just to save GPUs though!

              • logicprog 19 hours ago

                Yeah I wasn't worried so much about costs to me, as sustainability of your own prices — don't want to run into a "we're lowering quotas" situation like CC did :P

                • reissbaker 19 hours ago

                  Lol fair! I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5, which fits on a ~relatively small node compared to the rumored sizes of Anthropic's models. We can throw a lot of tokens at it before running into issues — it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.

                  • logicprog 18 hours ago

                    > I think we're safe for now; our most popular model (and my personal favorite coding model) is GLM-4.5,

                    That's funny, that's my favorite coding model as well!

                    > the rumored sizes of Anthropic's models

                    Yeah. I've long had a hypothesis that their models are, like, average-sized for a SOTA model, but fully dense, like that old Llama 3.1 405B model, and that's why their per-token inference costs are insane compared to the competition.

                    > it's kind of nice to launch without prompt caching, since it means if we're flying too close to the sun on tokens we still have some pretty large levers left to pull on the infra side before needing to do anything drastic with rate limits.

                    That makes sense.

                    I'm poor as dirt, and my job actually forbids AI code in the main codebase, so I can't justify even a $20-per-month subscription right now (especially when, for experimenting with agentic coding, Qwen Code is currently free, if shitty). But when or if it becomes financially responsible, you will be at the very top of my list.

          • ykjs 16 hours ago

            Can this be provided as an API?

            • reissbaker 16 hours ago

              Yes! We have a standard OpenAI-compatible API, and we don't restrict subscriptions from using it (unlike Anthropic, where API keys are billed differently unless you're using Claude Code directly, or in a tool that wraps Claude Code).
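Since the API is OpenAI-compatible, calling it directly is just a standard chat-completions request. A minimal sketch in TypeScript using plain `fetch` — the base URL and model string are from this thread, but the `SYNTHETIC_API_KEY` env var name and the exact request shape are assumptions based on the OpenAI convention:

```typescript
// Build a request for an OpenAI-compatible chat completions endpoint.
// Base URL and model string come from the thread above; the env var
// name SYNTHETIC_API_KEY is hypothetical.
function buildChatRequest(prompt: string) {
  return {
    url: "https://api.synthetic.new/v1/chat/completions",
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.SYNTHETIC_API_KEY}`,
      },
      body: JSON.stringify({
        model: "hf:zai-org/GLM-4.5",
        messages: [{ role: "user", content: prompt }],
      }),
    },
  };
}

// Sending it would then look like:
//   const { url, init } = buildChatRequest("Write a haiku about GPUs.");
//   const res = await fetch(url, init);
//   const data = await res.json();
//   console.log(data.choices[0].message.content);
```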

            • paool 13 hours ago

              how would I point to your API to use in a Mastra ai agent?

              • reissbaker 12 hours ago

                I'm not deeply familiar with Mastra, but reading their docs, it looks like they use the Vercel AI SDK — which is great, since Vercel's AI SDK can work with any OpenAI-compatible API, including ours. All you need to do is set a custom API base URL; in our case, that's https://api.synthetic.new/v1

                Then just plug in your Synthetic API key, and you should be able to use any supported model. For example, to use GLM-4.5, you'd pass the following model string: "hf:zai-org/GLM-4.5"

                The AI SDK docs are here for using custom base URLs: https://ai-sdk.dev/docs/ai-sdk-core/provider-management

                You can also join our Discord if you need help! https://synthetic.new/discord should redirect you to our Discord server :)