Hello Reader,
Two AI-related posts in a row might seem like a lot, but I’ve been incorporating AI into my research for the past couple of years, so it naturally comes up in my work. While I’m not quite ready to share what I’ve been doing recently with Cursor, an AI-powered IDE, I do want to show how I’ve used AI to create better datasets for forensic testing.
Creating engaging datasets for others to learn from can be a challenge, particularly when it comes to adding interesting elements for people to discover. In the past, I relied on downloading random free ebooks or generating large files with random data. However, I wondered if AI could improve this process. So, I tried this prompt:
“Create a PowerShell script that will generate a series of files with names related to AI technologies and themed around projects called ‘AI Tacos’ and ‘AI Burritos.’ The script should create 100 files with random data to vary their sizes. The files should use popular extensions and avoid numbers in the names. Instead, include at least 30 unique terms commonly used in research and development.”
The AI-generated script worked perfectly. When executed, it created 100 files in my specified directory, each with randomized content and varied file extensions. The result, as shown in the screenshot below, added depth and realism to my sample datasets.
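The PowerShell script the AI produced isn't reproduced in this post. If you want to try the idea without a prompt, here is a minimal Python sketch of the same approach. It is not the generated script. The term list, the extension list, the `AI_Tacos`/`AI_Burritos` naming pattern, the size range, and the `generate_files` function name are all my own illustrative assumptions. It requires Python 3.9+ for `random.Random.randbytes`.

```python
import random
from pathlib import Path

# Assumed R&D-style vocabulary (at least 30 unique terms, none containing digits).
TERMS = [
    "Prototype", "Roadmap", "Benchmark", "Dataset", "Pipeline", "Inference",
    "Training", "Validation", "Hypothesis", "Experiment", "Analysis", "Findings",
    "Specification", "Architecture", "Evaluation", "Baseline", "Metrics",
    "Optimization", "Deployment", "Iteration", "Requirements", "Feasibility",
    "Proposal", "Milestone", "Whitepaper", "Simulation", "Calibration",
    "Annotation", "Embedding", "Tokenizer", "Retrospective", "Sprint",
]
PROJECTS = ["AI_Tacos", "AI_Burritos"]
# Popular extensions chosen so no digits end up in names (no .mp3/.mp4, for example).
EXTENSIONS = ["docx", "xlsx", "pptx", "pdf", "txt", "csv", "json", "png", "jpg", "zip"]


def generate_files(out_dir, count=100, min_size=1024, max_size=256 * 1024, seed=None):
    """Create `count` uniquely named themed files filled with random bytes."""
    max_names = len(PROJECTS) * len(TERMS) * (len(TERMS) - 1) * len(EXTENSIONS)
    if count > max_names:
        raise ValueError(f"can only build {max_names} unique names")
    rng = random.Random(seed)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Draw project/term/extension combinations until we have enough unique names.
    names = set()
    while len(names) < count:
        first, second = rng.sample(TERMS, 2)
        names.add(f"{rng.choice(PROJECTS)}_{first}_{second}.{rng.choice(EXTENSIONS)}")

    paths = []
    # Sort so a given seed always produces the same names and sizes.
    for name in sorted(names):
        path = out / name
        # The content is random bytes, so a .docx here is not a real document.
        # Only the name and size look realistic.
        path.write_bytes(rng.randbytes(rng.randint(min_size, max_size)))
        paths.append(path)
    return paths
```

Calling `generate_files("ai_taco_dataset", seed=42)` writes 100 files into that folder. The `seed` argument lets you rebuild the same set of names and sizes later, which helps when you need to regenerate a test image.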
This kind of themed, random data is incredibly useful for making datasets feel more authentic. For example, I can create a new virtual hard drive (VHD), populate it with these files, and then focus on generating interactions that leave digital artifacts for forensic analysis.
If you’re looking to populate other types of content, AI can help there too. For instance, I gave it this prompt:
“Write 5 text messages between two friends named David Cowen and Matthew Seyer discussing what kind of tacos to have for lunch.”
It produced the following conversation:
David Cowen: Hey Matt, I'm thinking tacos for lunch today. You up for it?
Matthew Seyer: Absolutely, dude! I could eat tacos every day. Any idea what kind?
David Cowen: I'm torn between fish tacos and something spicy, like chipotle chicken. What do you think?
Matthew Seyer: Fish tacos would be great, but I’m kind of craving some good old barbacoa. How about we do both?
David Cowen: Perfect idea! Let’s split an order of each so we can try everything. I’ll meet you in 20.
I even took it a step further and used AI to craft detailed fictional personas. For example, I used this prompt:
“Create a background story for a fictional employee at a company called TacoCorp who has worked there for 10 years as an IT administrator. This person has been recruited by BurritoTown to steal secrets from TacoCorp. Provide motivations, names, and profiles for their boss, coworker, new boss at BurritoTown, and best friend who’s a local police officer. Include a CV, LinkedIn bio, and 10 social media posts detailing their life.”
The AI generated a detailed persona, including:
- Name: Oliver Delgado
- Email: [email protected]
- Social Media Handle: @OllieTacoTech
Motivations for Betrayal:
Oliver felt undervalued and underpaid after a decade at TacoCorp. Financial pressure and resentment toward his boss pushed him to accept BurritoTown’s lucrative offer to steal TacoCorp’s secrets.
Supporting Characters:
- Franklin Hoyt, a strict and demanding boss at TacoCorp.
- Samantha Mills, a friendly coworker unaware of Oliver’s frustrations.
- Grace Li, the ambitious new boss at BurritoTown.
- Daniel Morales, Oliver’s loyal best friend and a local detective.
The AI even provided Oliver’s CV, LinkedIn bio, social media posts, and a detailed headshot description for his persona.
This ability to quickly generate characters, scenarios, and realistic data allows me to focus on ensuring the technical accuracy of my work. AI tools are invaluable for filling in creative gaps and enhancing the realism of forensic datasets. Don’t underestimate how much they can help!