> As proof, ABP with opus 4.6 as the driver scores 90.5% on the Online Mind2Web benchmark
And what does opus score with "regular" browser harnesses?
90% easy or 90% average?
90% average, with 85.51% on hard!
Nice! I'll take a look at this for my homelab. I was debating using crawl.cloudflare.com to try it out, since browser rendering was my next stretch goal.
https://huggingface.co/spaces/osunlp/Online_Mind2Web_Leaderb...
Hm, I can't see Opus 4.6 on there.
I tweeted at OSUNLP, and they're backed up on eval validation. In the meantime, here's the benchmark repo with the saved runs, plus instructions for running it locally: https://github.com/theredsix/abp-online-mind2web-results
How do you know when a page is "settled"?
Good question! ABP keeps a list of all same/parent/sibling network requests and waits for them to complete within a timeout. If the timeout hits, it still freezes the page and sends a screenshot back to the agent. There's also a browser_wait() the agent can call with a longer timeout to wait for network requests and DOM changes.
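A minimal Python sketch of the settle logic from that reply, under stated assumptions: "same/parent/sibling" is read as frame relationships relative to a target frame, and request start/finish events are assumed to arrive from some browser hook. `NetworkSettleTracker`, `settle_then_capture`, and the frame/request IDs are hypothetical names for illustration, not ABP's actual API. The point it shows is that the timeout bounds the wait, and the capture step runs whether or not the network went quiet.

```python
import asyncio
from typing import Dict, Optional, Set


class NetworkSettleTracker:
    """Hypothetical sketch: track in-flight requests from frames related to a
    target frame (same, parent, sibling) and wait for them to finish."""

    def __init__(self, frames: Dict[str, Optional[str]], target: str) -> None:
        self.frames = frames          # frame_id -> parent frame_id (None for top)
        self.target = target
        self.in_flight: Set[str] = set()
        self._idle = asyncio.Event()  # set whenever nothing relevant is pending
        self._idle.set()

    def is_relevant(self, frame_id: str) -> bool:
        parent = self.frames.get(self.target)
        if frame_id in (self.target, parent):
            return True  # same frame or its parent
        # Sibling (assumption): another frame that shares the target's parent.
        return parent is not None and self.frames.get(frame_id) == parent

    def on_request_started(self, request_id: str, frame_id: str) -> None:
        if self.is_relevant(frame_id):
            self.in_flight.add(request_id)
            self._idle.clear()

    def on_request_finished(self, request_id: str) -> None:
        self.in_flight.discard(request_id)
        if not self.in_flight:
            self._idle.set()

    async def wait_settled(self, timeout: float) -> bool:
        """True if every tracked request finished before the timeout."""
        try:
            await asyncio.wait_for(self._idle.wait(), timeout)
            return True
        except asyncio.TimeoutError:
            return False


async def settle_then_capture(tracker: NetworkSettleTracker, timeout: float) -> dict:
    settled = await tracker.wait_settled(timeout)
    # The page would be frozen and screenshotted either way; this sketch only
    # reports what was still pending (hypothetical return shape).
    return {"settled": settled, "pending": sorted(tracker.in_flight)}


async def main() -> None:
    # Frame tree: "top" has children "a" and "b"; "c" is a child of "a".
    frames = {"top": None, "a": "top", "b": "top", "c": "a"}
    tracker = NetworkSettleTracker(frames, target="a")
    tracker.on_request_started("r1", "a")    # same frame: tracked
    tracker.on_request_started("r2", "top")  # parent: tracked
    tracker.on_request_started("r3", "b")    # sibling: tracked
    tracker.on_request_started("r4", "c")    # child: not tracked in this sketch
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, tracker.on_request_finished, "r1")
    loop.call_later(0.02, tracker.on_request_finished, "r2")
    # "r3" never finishes, so the timeout path is taken.
    print(await settle_then_capture(tracker, timeout=0.1))


if __name__ == "__main__":
    asyncio.run(main())
```

Running it prints `{'settled': False, 'pending': ['r3']}`: the sibling request is still open when the timeout fires, but the capture step goes ahead anyway. Using an `asyncio.Event` that is set whenever the in-flight set empties avoids a polling loop, and the timeout is a single `wait_for`. Calling `browser_wait()` with a longer timeout would roughly correspond to passing a bigger `timeout` here, minus the DOM-change tracking this sketch leaves out.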
What about the load event or the "DOMContentLoaded" event?
Interesting. I wonder if this would help other projects too. One that comes to mind is ArchiveBox: I don't know if they still have the issue I'm thinking of, but its Chrome instances eventually (as the meme goes) consumed basically all available RAM. If freezing execution could stop that, it could be useful for more than just AI agents.
Yeah, I noticed CPU usage drops to near zero during the pause phase. You can also trigger a pause via REST/MCP, so a script can take advantage of this as well.
Love it! From first principles, this kind of answers the "do we really even need CDP?" question I always have in my head when building browser use...
Totally, I feel that CDP was designed for a different category of automations.
OP here, happy to answer any questions!
How does it compare with https://agent-browser.dev/ ? It would be great if you could add it to your table: https://github.com/theredsix/agent-browser-protocol?#compari...
agent-browser's biggest selling point is that it's a CLI wrapper around CDP/Puppeteer for context management, so it has mostly the same pros and cons as CDP in the table.
Updated the table!