“The Last Ones” is, “a 32-step corporate network attack simulation spanning initial reconnaissance through to full network takeover, which AISI estimates to require humans 20 hours to complete.” The lines are the average performance across multiple runs (10 runs for Mythos, Opus 4.6, and GPT-5.4), with the “max” lines representing the best of each batch. Mythos was the only model to complete the task, in 3 out of its 10 attempts.
This chart suggests an interesting security economy: to harden a system we need to spend more tokens discovering exploits than attackers spend exploiting them.
El siguiente framing me hace hacerme muchas preguntas:
Code remains cheap, unless it needs to be secure. Even if costs go down as inference optimizations, unless models reach the point of diminishing security returns, you still need to buy more tokens than attackers do. The cost is fixed by the market value of an exploit.
🤯
Comentarios
No hay comentarios aun.
Inicia sesión para comentar.