I make software and web things, I run small businesses, and I live near Cambridge, UK.
If you’d like to discuss anything I’ve talked about on HN with me, you’re welcome to get in touch. I’m hn at firstname dash surname dot co dot uk.
If you’re interested in engaging me professionally or have a question about one of my businesses, genuine enquiries are also welcome at the same address.
Interesting, thanks. In the UUID example you mentioned, it seems the CodeQL model is missing some information about how FastAPI’s runtime validation works, and so it isn’t drawing correct inferences about the types. It doesn’t seem to have a general problem with tracking request parameters coming into Python web frameworks; in fact, the first thing that really impressed me about CodeQL was how accurate its reports were with some quite old Django code. But FastAPI puts much more emphasis on type annotations and on validating input against those types at runtime.
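To illustrate the difference (a minimal sketch; the endpoint is invented): in FastAPI, the type annotation on a parameter isn’t just a hint for the reader, it’s enforced by the framework at runtime before your handler ever runs, which is exactly the kind of validation step a model needs to know about:

    from fastapi import FastAPI

    app = FastAPI()

    @app.get("/items/{item_id}")
    async def read_item(item_id: int):
        # By the time we get here, FastAPI has already parsed and
        # validated item_id as an int; a request for /items/abc is
        # rejected with a 422 error before this function is called.
        return {"item_id": item_id}

A tool modelled on the older pattern, where request parameters arrive as raw strings that the application must validate itself, will treat item_id as untrusted text and miss that the int annotation already constrains it.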
I completely agree about the problem of someone deciding to turn these kinds of scanning tools on and then expecting they’ll Just Work. I do think the better tools can provide a lot of value, but they still involve trade-offs and no tool will get everything 100% right, so there will always be a need to review their output and make intelligent decisions about how to use it. Scanning tools that don’t provide a way to persistently mark a certain result as incorrect or to collect multiple instances of the same issue together tend to be particularly painful to work with.
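For what it’s worth, CodeQL does ship alert-suppression queries that recognise inline comments marking a specific finding as acknowledged, so the decision survives re-scans. A sketch, assuming I’m remembering the comment syntax and the py/sql-injection query ID correctly:

    def fetch_all(cursor, table_name):
        # table_name comes from a fixed internal allow-list, not from
        # user input, so this particular report is a false positive.
        query = "SELECT * FROM " + table_name
        return cursor.execute(query)  # lgtm[py/sql-injection]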
This is true, and customers do a lot of unfortunate things in the name of security theatre. Sometimes you just have to play the cards you’ve been dealt. However, educating them about why they’re wasting significant amounts of money paying you to deal with non-problems does sometimes work as a mutually beneficial alternative.
OK, but all I said before was that CodeQL’s approach, where it supplies a specific example to support each specific problem report, is inherently resistant to false positives.
Clearly it is still possible to generate a false positive if, for example, CodeQL’s algorithm thinks it has found a path through the code where unsanitised user data can be used dangerously, but in fact there was a sanitisation step along the way that it didn’t recognise. This is the kind of situation where the theoretical result that you can’t decide in general whether a semantic property of a program holds (Rice’s theorem) is felt in practical terms.
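As a concrete sketch of how that can happen (names invented for illustration): a home-grown sanitiser that the analyser has no model for leaves the taint path apparently intact, so the tool reports a problem the code has in fact already dealt with:

    import re

    def strip_to_identifier(value: str) -> str:
        # Home-grown sanitiser: keep only [A-Za-z0-9_], so the result
        # can no longer break out of the quoted SQL literal below.
        return re.sub(r"[^A-Za-z0-9_]", "", value)

    def find_user(cursor, username: str):
        safe = strip_to_identifier(username)
        # A taint tracker that doesn't recognise strip_to_identifier
        # as a sanitiser still sees a path from username to the query
        # and reports SQL injection, even though the data is clean.
        cursor.execute(f"SELECT * FROM users WHERE name = '{safe}'")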
It still seems much less likely that an algorithm required to produce a specific demonstration of the problem it claims to have found will generate a false positive. Compare that with the kind of naïve algorithms we were discussing before, which are based on a generic look-up table of software+version=vulnerability, with no attempt to determine whether there is actually a path to exploit that vulnerability in the real code.
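The naive approach amounts to little more than this (advisory data invented for illustration):

    # Toy version-matching scanner: it flags anything listed in the
    # advisory table, with no attempt to check whether the vulnerable
    # code path is actually reachable from the application.
    ADVISORIES = {
        ("examplelib", "1.2.3"): "CVE-XXXX-YYYY",  # invented entry
    }

    def scan(installed: dict[str, str]) -> list[str]:
        return [
            advisory
            for (pkg, version), advisory in ADVISORIES.items()
            if installed.get(pkg) == version
        ]

    # Reports a "vulnerability" even if the application never calls
    # the affected function in examplelib at all.
    print(scan({"examplelib": "1.2.3"}))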
If you replace a dependency that has a known vulnerability with a different dependency that does not, surely that is objectively an improvement in at least that specific respect? Of course we can’t guarantee that it didn’t introduce some other problem as well, but not fixing known problems because of hypothetical unknown problems that might or might not exist doesn’t seem like a great strategy.
This project is an enhanced reader for Y Combinator’s Hacker News: https://news.ycombinator.com/.
The interface also allows you to comment, post and interact with the original HN platform. Credentials are stored locally and are never sent to any server; you can check the source code here: https://github.com/GabrielePicco/hacker-news-rich.
For suggestions and feature requests, you can write to me here: gabrielepicco.github.io