Great resource, and definitely a good place to take the next step. As I looked into the details, a natural question came up (based on software development experience): how do I evaluate the correctness of the output an LLM produces for given inputs? Clearly, unit tests with fixed input/output pairs won't help, so learning methods to evaluate iteratively as we develop would be very useful.
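One common workaround is to assert properties of the output rather than exact strings. Here's a minimal sketch in Python, assuming a hypothetical call_llm stand-in for whatever model call you use (not any particular library's API):

    import ast
    import json

    def call_llm(prompt: str) -> str:
        # Hypothetical placeholder: swap in your actual model call.
        raise NotImplementedError

    def check_json_output(prompt: str, required_keys: set) -> bool:
        # No golden string: the output just has to parse as JSON
        # and contain the keys we care about.
        out = call_llm(prompt)
        try:
            data = json.loads(out)
        except json.JSONDecodeError:
            return False
        return isinstance(data, dict) and required_keys <= set(data)

    def check_code_output(prompt: str) -> bool:
        # Generated Python must at least be syntactically valid.
        out = call_llm(prompt)
        try:
            ast.parse(out)
        except SyntaxError:
            return False
        return True

Because these checks are invariants rather than exact matches, they hold across nondeterministic outputs; running each prompt several times gives you a pass rate to track as you iterate.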
We (the Princeton SWE-bench team) have a ~100-line agent that does pretty well; you can read the code here: https://github.com/SWE-agent/mini-swe-agent
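For anyone wondering how an agent fits in that little code, the core is just a loop: the model emits a shell command, you run it, and the output becomes the next observation. A rough sketch of that kind of loop (the llm function here is a hypothetical placeholder, not mini-swe-agent's actual API):

    import subprocess

    def llm(messages):
        # Hypothetical placeholder: swap in your actual model call.
        raise NotImplementedError

    def agent_loop(task, max_steps=20):
        messages = [
            {"role": "system",
             "content": "Reply with one shell command per turn. Reply DONE when finished."},
            {"role": "user", "content": task},
        ]
        for _ in range(max_steps):
            action = llm(messages)
            messages.append({"role": "assistant", "content": action})
            if action.strip() == "DONE":
                return
            result = subprocess.run(action, shell=True, capture_output=True,
                                    text=True, timeout=60)
            # Truncate long command output so the context stays bounded.
            observation = (result.stdout + result.stderr)[-4000:]
            messages.append({"role": "user", "content": observation})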
This is pretty much a step-by-step guide for getting started with code: https://ampcode.com/how-to-build-an-agent
Thanks for sharing the article!
Two resources that I am currently learning from:
1. https://deepwiki.com/humanlayer/12-factor-agents/1-12-factor...
2. https://deepwiki.com/anthropics/claude-code/1-claude-code-ov...