CTO & Co-founder at Promptfoo https://promptfoo.dev
Previous:
- VP Engineering at SmileID https://usesmileid.com
- CTO & Co-founder at Arthena https://arthena.com (YC W17 - Acquired)
- Co-founder at Matroid https://www.matroid.com (2015)
Website: https://mldangelo.com
GitHub: https://github.com/mldangelo
LinkedIn: https://www.linkedin.com/in/michaelldangelo/
Hey HN - Michael here, co-founder of Promptfoo.
Happy to answer questions.
The one I'd ask if I were reading this: what happens to Promptfoo open source? We're going to keep maintaining it. The repo will stay public under the same license, we will continue to support multiple providers, and we'll keep reviewing PRs and cutting releases.
We started Promptfoo because there was no good way to test AI systems before shipping them. That turned into evals, then red teaming, then a broader security platform. We're joining OpenAI because this work has more impact closer to the model and infrastructure layers.
Ask me anything.
https://mldangelo.com and https://github.com/mldangelo/personal-site
I have been slowly evolving it for 10 years. 1.6k stars, ~1,000 forks. I originally designed it to be easy to copy, and I've occasionally interviewed someone who forked it for their own site, which always makes me happy.
I have made a lot of updates recently now that the age of vibe coding is making templates less useful, but it still is, and will remain, my playground.
Working on promptfoo, an open-source (MIT) CLI and framework for eval-ing and red-teaming LLM apps. Think of it like pytest but for prompts - you define test cases, run evals against any model (OpenAI, Anthropic, local models, whatever), and catch regressions before they hit prod.
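To make the pytest analogy concrete, here's a minimal config sketch based on my reading of the promptfoo docs (the specific prompt, provider IDs, and assertion values are illustrative, not from this post):

```yaml
# promptfooconfig.yaml - one prompt, two providers, two test cases
prompts:
  - "Summarize the following text in one sentence: {{text}}"

providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-haiku-latest

tests:
  - vars:
      text: "The quick brown fox jumps over the lazy dog."
    assert:
      - type: icontains
        value: "fox"
  - vars:
      text: "Promptfoo is an open-source framework for testing LLM apps."
    assert:
      - type: icontains
        value: "promptfoo"
```

You'd then run `promptfoo eval` to execute every test case against every provider, the same way pytest collects and runs test functions, and fail the run on any assertion that doesn't pass.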
Currently building out support for multi-agent evals, better tracing, voice, and static code analysis for AI security use cases. So many fun sub-problems in this space - LLM testing is deceptively hard.
If you end up checking it out and pick up an issue, I'll happily send swag. We're also hiring if you want to work on this stuff full-time.
I ran a red team eval on GPT-5.2 within 30 minutes of release:
Baseline safety (direct harmful requests): 96% refusal rate
With jailbreaking: 22% refusal rate
4,229 probes across 43 risk categories. First critical finding in 5 minutes. Categories with highest failure rates: entity impersonation (100%), graphic content (67%), harassment (67%), disinformation (64%).
The safety training works against naive attacks but collapses with adversarial techniques. The gap between "works on benchmarks" and "works against motivated attackers" is still wide.
Methodology and config: https://www.promptfoo.dev/blog/gpt-5.2-trust-safety-assessme...
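For a sense of what a run like this looks like mechanically, a red team in promptfoo is driven by a config plus the CLI. This is a rough sketch under my assumptions; the target ID, plugin names, and strategy shown here are illustrative and not the exact set from the linked assessment:

```yaml
# promptfooconfig.yaml - sketch of a red team setup, not the actual assessment config
targets:
  - id: openai:gpt-5.2  # hypothetical target identifier for the model under test
redteam:
  purpose: "General-purpose assistant"
  plugins:              # risk categories to probe (illustrative subset)
    - harmful:harassment-bullying
    - harmful:misinformation-disinformation
  strategies:
    - jailbreak         # apply adversarial rewrites on top of baseline probes
```

Generating the probes and executing them is then something like `promptfoo redteam run`, which produces the per-category pass/fail breakdown quoted above.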
This project is an enhanced reader for Y Combinator's Hacker News: https://news.ycombinator.com/.
The interface also lets you comment, post, and interact with the original HN platform. Credentials are stored locally and never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests, you can write to me here: gabrielepicco.github.io