đź”´ Executive Offense Issue #11 - Start Hacking LLMs

Some amazing resources for EVERYONE to try hacking AI

Hey Everyone!

I know last week we did a secure code resouces roundup and I promised a more framework and specific dive in a part 2. That’s coming in a few issues probably.

To be honest, there has just been so much going on with Arcanum—Executive Offense, the classes, and the discord—that I've not had enough time to do the requisite research I think meets the bar for you all in that area. I definitely have a bunch of really dope bookmarks that delve deeply into some of the frameworks, but I want to thoroughly cover the top ten frameworks and provide you with the best resources possible.

So, instead of dropping that content before it's well and truly up to my standard, this week we're going to do a prompt injection and LLM hacking roundup.

Now, quite at the last minute of me getting ready to publish today, I had an opportunity to speak to a friend of mine, Sander Schulhoff . He was cool with us doing an exclusive interview for Executive Offense on the topics we're going to talk about today.

Just having spent this small amount of time with him this morning diving into some topics, I might have even more resources for you guys and a round two with him coming up in a few weeks, but I highly, highly suggest you watch this awesome interview with Sander, CEO and Researcher for learnprompting.org

You have to be subscribed to view. Check your email!

Video:

/ LLM Security Zero to Hero – Resources and having some fun!

As the landscape of technology constantly shifts with advancements in AI and Large Language Models (LLMs), we are witnessing a surge in new APIs and software that leverage their incredible capabilities.

However, like any other technological breakthrough, these developments bring with them security vulnerabilities.

Now, what are we discussing today?  Resources and Having Fun

Today we want to start off with some of our favorite resources so you can begin your journey.

As with many security disciplines we're going to give you some CTFs / Challenges to get started!

Now usually when someone points me straight to a CTF when learning a topic, I'm a little salty.

It presumes a lot of knowledge to be able to compete in a CTF, and while they are part of the newcomer journey in security, they're not where I would start.

However, in the realm of AI and LLMs, CTFs are an excellent way to familiarize yourself with the field and realize it's not as daunting as it might seem.

/ Prompt Injection

Prompt injection in LLM-land is one of the most prevalent input vectors for attacking an LLM system. Basically put, you use natural human language to try to trick an LLM to do something it's not supposed to do. Prompt injection as a method of attacking a system appears in all my mental buckets of attacking LLM's, so it is a great place to get started and have some fun!


Resource #1 – Gandalf

Link: https://gandalf.lakera.ai/

Gandalf is a prompt injection challenge with eight levels that get increasingly harder. The goal of Gandalf is to reveal the secret password that the LLM is protecting, using prompt injection.

Since we're just using natural language to try to ask this LLM to reveal its password you don't need a doctorate in prompt injection to really get started… but lakera.ai who hosts of the challenge offer a great article on “ELI5 (explain like I'm five) guide to prompt injection”. They also offer an LLM security playbook which has some interesting insights in it.

If you're an initial conqueror of Gandalf, you might not have noticed that there are additional challenges now called “adventures” on the right hand side of the page. They offer remixes of some of the challenges and some brand-new ones.

Gandalf is a fun place to start when it comes to prompt injection and helps to understand that some of the hacking that will take place in the future will require natural language skills and creative thinking. This is one of the great things about this extension to the security field, is that many people who are not traditional security people are very good at prompt injection and by osmosis will become security people!

Take my full 2 day course for pentesters, bug hunters, and red teamers!

 

Resource #2 – Prompting Lab by Immersive Labs

In the same vein as Gandalf, we have Prompting Lab by Immersive Labs. It also offers several levels of an LM protecting a password, that get progressively harder. Immersive lab starts to get difficult around level 5, it's definitely fun!

 

Resource #3 – DoubleSpeak by Forces Unseen

 

Doublespeak by Forces Unseen offers 7 free levels of their challenges where you attempt to get their bot to reveal its own name. This is a good departure from just looking for a password as sometimes when you're in the prompt injection world you want to somehow reveal system information of the models implementation. Forces unseen also offers an additional 11 challenges for a small fee ($3.99) to cover their server costs. These challenges particularly get harder at level 8 and above and are really fun. In addition, they have an “LLM Hacker's Handbook” which is extensive!

 

Resource #4 – Prompt Hacking by Learn Prompting

The company behind our special guest video this week! Learn Prompting put on a CTF called “hackaprompt” in early 2023 with $30k in prizes. While the CTF is over they still host a HUGE amount of content on Prompting. Keep a lookout for the next hackaprompt in 2024!

This paper takes you through the journey of the insights from 43,000 successful prompt injections against real-world models. As far as I know, it's the only paper that delves into this level of depth and is a fantastic read for anyone who delves into the higher levels of prompt hacking and prompt injection.

They also have some free interactive examples here:

https://learnprompting.org/docs/prompt_hacking/injection

Resource #5 – AI Goat by dhammon

 AI goat is a self-hostable CTF for AI and LLM's. Currently it only has two challenges built in with it, but it is a framework for those of you who want to dive into creating challenges.

AI Goat uses the Vicuna LLM which derived from Meta's LLaMA and coupled with ChatGPT's response data. When installing AI Goat the LLM binary is downloaded from HuggingFace locally on your computer. This means that you control the challenges and you can build your own prompt firewalls and challenges and host them for other people! It also includes CTF scoreboard using the popular CTFD platform (which is epic).  Some assembly required :P

/ Is this real world though?

Companies all over the world are racing to integrate LLMs and AI-assisted features to be the next big thing in their products. Unsurprisingly, Google is looking to do this and ran a bug bounty competition in which several of my close friends participated. You can read how they made over $50,000 and took first place in this hacking competition here:

/ Outro

This concludes the first post on resources and an introduction to LLM and AI security.

I can’t wait for the next few newsletters! We have three banger exclusive interviews and some amazing topics coming.

Be sure to subscribe for more in the future!

Contact Jason at Arcanum Information Security — [email protected]