Dailydave mailing list archives

Hacking the Edges of Knowledge: LLMs, Vulnerabilities, and the Quest for Understanding


From: Dave Aitel via Dailydave <dailydave () lists aitelfoundation org>
Date: Sat, 2 Nov 2024 14:08:18 -0400


It's impossible not to notice that we live in an age of technological
wonders, stretching back to the primitive hominids who dared to ask "Why?"
but also continually accelerating and pulling everything apart while it
does, in the exact same manner as the Universe at large. It is why all the
hackers you know are invested so heavily in Deep Learning right now, as if
someone got on a megaphone at Chaos Communication Camp and said "ACHTUNG.
DID YOU KNOW THAT YOU CAN CALCULATE UNIVERSAL TRUTHS BACKWARDS ACROSS A
MATRIX USING BASIC CALCULUS. VIELEN DANK UND AUF WIEDERSEHEN!".

Hackers are nothing if not obsessed with the theoretical edges of
computation, the way a 1939 Niels Bohr was obsessed with the
boundary between physics and philosophy, and with how to push that line as
far as possible using math, such that you could find complementary
pairs of truths lying about everywhere. One aspect of his work is that,
almost a century later, you can deter an adversary from attacking you by
threatening to end the world, and another aspect is that you can study
the stars. As a bureaucrat you would call this "dual-use", although
governments tend to have a heck of a lot of use for the former and almost
no use at all for the latter.

All of which is to say that while NO HAT <http://nohat.it> is a very good
conference, with a lot of great talks, the one I have enjoyed the most so
far is "LLMs FOR VULNERABILITY DISCOVERY: FINDING 0DAYS AT SCALE WITH A
CLICK OF A BUTTON <https://www.youtube.com/watch?v=Z5LMRS3AF1k>" (by
Marcello Salvati and Dan McInerney). This talk goes over the lessons they
learned developing Vulnhuntr <https://github.com/protectai/vulnhuntr>,
which finds real vulns in Python apps given just the source directory, and
those lessons are roughly as follows:


   1. Focus on Specific Tasks and Structured Outputs: LLMs can be
   unreliable when given open-ended or overly broad tasks. To mitigate
   hallucinations and ensure accurate results, it's crucial to provide highly
   specific instructions and enforce structured outputs. (aka, the naive
   metrics people are providing are probably not useful).
   2. Manage Context Windows Effectively: While larger context windows are
   beneficial, strategically feeding code to the LLM in smaller, relevant
   chunks, like focusing on the call chain from user input to output, is key
   to maximizing efficiency and accuracy. They did a great job here, and this
   is important even if you have a huge context window to play with (aka,
   Gemini).
   3. Leverage Existing Libraries for Code Parsing: Dynamically typed
   languages like Python present unique challenges for static analysis.
   Utilizing libraries like Jedi, which is designed for Python autocompletion,
   can significantly streamline the process of identifying and retrieving
   relevant code segments for the LLM. They themselves recommend a rewrite
   here using tree-sitter to handle C/C++, although I would probably
   personally have used an IDE plugin to handle Python (which also gives
   you debugger access).
   4. Prompt Engineering is Essential: The way you structure prompts has a
   huge impact on the LLM's performance. Clear instructions, well-defined
   tasks, and even the use of XML tags for clarity can make a significant
   difference in the LLM's ability to find vulnerabilities. But of course, in
   my experience, the better your LLM (larger really), the less prompt
   engineering matters. And when you are testing against multiple LLMs you
   don't want to introduce prompt engineering as a variable.
   5. Bypass Identification and Multi-Step Vulnerability Analysis: LLMs can
   be remarkably effective at identifying security bypass techniques and
   understanding complex, multi-step vulnerabilities that traditional static
   code analyzers might miss. There's a ton of work to be done in future
   analysis in how this happens and what the boundaries are.
   6. Avoid Over-Reliance on Fine-Tuning and RAG: While seemingly
   promising, fine-tuning LLMs on vulnerability datasets can lead to
   oversensitivity and an abundance of false positives. Similarly, retrieval
   augmented generation (RAG) may not be sufficiently precise for pinpointing
   the specific code snippets required for comprehensive analysis. Knowing
   that everyone is having problems with these techniques is actually
   useful, because it goes against the common understanding of how you
   would build something like this.
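To make lesson 1 concrete, here is a minimal sketch (not Vulnhuntr's actual code; the schema fields are illustrative) of forcing the model into a structured output and rejecting anything that does not parse, instead of trusting free-form prose:

```python
import json

# Hypothetical report schema for a vuln-hunting prompt; the field names
# here are made up for illustration, not taken from Vulnhuntr itself.
REQUIRED_KEYS = {"vuln_type", "file", "call_chain", "confidence"}

def parse_report(raw: str) -> dict:
    """Accept only well-formed JSON with the expected keys; anything
    else is treated as hallucinated chatter and rejected."""
    report = json.loads(raw)  # raises ValueError on non-JSON replies
    missing = REQUIRED_KEYS - report.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not 1 <= report["confidence"] <= 10:
        raise ValueError("confidence must be 1-10")
    return report

# A reply that follows the schema parses cleanly...
good = ('{"vuln_type": "SSRF", "file": "app/views.py", '
        '"call_chain": ["index", "fetch_url"], "confidence": 8}')
print(parse_report(good)["vuln_type"])

# ...while free-form prose is rejected instead of silently trusted.
try:
    parse_report("I think there might be an SSRF somewhere?")
except ValueError:
    print("rejected non-structured reply")
```

The point is that the validation step, not the prompt alone, is what keeps hallucinations out of your findings queue.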
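Lesson 2's call-chain chunking can be sketched with nothing but the standard-library ast module (Vulnhuntr uses Jedi for this; the toy source and entry point below are my own illustration): walk from the user-input handler and collect only the functions on that path, so that slice is all you feed the model.

```python
import ast

# A toy request handler as a string; in practice this comes from the repo.
SOURCE = '''
def handler(request):
    data = request.args["q"]
    return process(data)

def process(data):
    return render(data)

def render(data):
    return "<div>" + data + "</div>"

def unrelated():
    return 42
'''

def call_chain(source: str, entry: str) -> list[str]:
    """Collect the functions reachable from `entry`, breadth-first,
    so only that input-to-output slice goes into the context window."""
    tree = ast.parse(source)
    funcs = {n.name: n for n in ast.walk(tree)
             if isinstance(n, ast.FunctionDef)}
    chain, todo = [], [entry]
    while todo:
        name = todo.pop(0)
        if name in funcs and name not in chain:
            chain.append(name)
            # Queue every locally-defined function this one calls.
            for node in ast.walk(funcs[name]):
                if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                    todo.append(node.func.id)
    return chain

print(call_chain(SOURCE, "handler"))
```

Note how `unrelated` never makes it into the chunk: that exclusion is the entire efficiency win, even on a huge-context model like Gemini.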
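And for lesson 4, the XML-tag trick is just about keeping instructions, scope, and code unambiguously separated in the prompt. A minimal sketch, with made-up tag names (the real prompts are in the Vulnhuntr repo):

```python
def build_prompt(code: str, call_chain: list[str]) -> str:
    """Wrap each piece of context in its own XML tag so the model can
    tell instructions, analysis scope, and source code apart."""
    return (
        "<instructions>\n"
        "Analyze ONLY the functions listed in <scope> for injection "
        "vulnerabilities. Reply with a single JSON object and nothing else.\n"
        "</instructions>\n"
        f"<scope>{', '.join(call_chain)}</scope>\n"
        f"<code>\n{code}\n</code>"
    )

prompt = build_prompt("def handler(request): ...", ["handler", "process"])
print(prompt.splitlines()[0])
```

As noted above, the bigger the model, the less this kind of scaffolding matters; but if you are comparing across models, fixing the prompt structure removes it as a variable.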

At its core, vulnerability discovery is as much about understanding as it
is about finding flaws. To find a vulnerability, one has to unravel the
code, decipher its intent, and surface the assumptions of its creator. This
process mirrors the deeper question of what it means to truly understand
code—of seeing beyond syntax and function to grasp the logic, intention,
and potential points of failure embedded within. Like Bohr’s exploration of
complementary truths in physics, understanding code vulnerabilities
requires seeing both what the code does and what it could do under
different conditions. In this way, the act of discovering vulnerabilities
is itself a study in comprehension, one that goes beyond detection to touch
on the very nature of insight.

-dave

_______________________________________________
Dailydave mailing list -- dailydave () lists aitelfoundation org
To unsubscribe send an email to dailydave-leave () lists aitelfoundation org
