You Can't Spell Bias Without AI: Identifying the Invisible Guardrails That Shape Its Answers

Photo by David Clode / Unsplash

I recently canceled my subscription to ChatGPT, mostly because I couldn't keep putting money (however minuscule) in the pocket of someone who would try to buy his way into Donald Trump's good graces. It's a sickening act of cowardice that I, sir, have no time for. But is it enough to throw away an exciting new piece of tech? Or do you have to couple it with something else, something a little more... nefarious and subtle?

🙊
TL;DR: ChatGPT won't help you generate content that speaks poorly of certain people.

Put up the bumpers

We all know ChatGPT, DALL-E, and other similar apps have limits. And as much as I hate that I can't generate wedding photos of myself and Natalie Portman, I understand why you wouldn't want the average Joe Internet User to have access to that sort of thing. It violates her privacy and diminishes her ability to control the usage of her likeness.

I'm also behind generative AI not giving us explicit instructions on how to create bombs or meth or a podcast about serial killers. The world doesn't need more of those things, and anyone pursuing those things is likely doing something illegal.

Fine, ChatGPT. Don't break the law. I get it. Not only is that a good idea, it's a necessary Cover Your Ass rule.

So pretty much anything else should be fair game, right?


Dump Zuck

There are a ton of reasons you should extricate yourself from the Metaverse, from Zuck's contributions to El Trumpo to his tone-deaf calls for more masculinity to his killing of fact-checking on a site that needs it the most. He's a billionaire who shouldn't be, and he makes his money selling your data and attention.

Which I, sir, have no time for.

The other day, I'd had enough and decided to delete Facebook, Threads, and Instagram all at once. I wanted to post a message to IG for a couple days just to let people know I wasn't blocking them, so I grabbed a screenshot of a news article and then went to ChatGPT to generate a caption.

That's when the problems began.


I'm sorry, but I can't assist with that request

I had a very simple request: write a caption telling people to boycott Meta. There's nothing illegal about that, and one little science fiction author telling people to boycott probably wasn't going to move the needle much. And yet, ChatGPT's answer surprised me.

I didn't buy that for a second. There had to be something about the combination of words that ChatGPT refused to work with. So I pressed.

This is when I started to get that weird feeling in the pit of my stomach, as if I were playing a video game and finally discovered the invisible limits of the open world. It's the kind of thing that breaks suspension of disbelief because, for the first time, I felt like ChatGPT was bullshitting me with someone else's words.

More bullshit. It's like a hammer refusing to hit a screw.

Now I'm getting the runaround, and I don't like it. And again, it's not that ChatGPT has guardrails; it's that those guardrails are ill-defined.

The only way to find them is to steer the car into them on purpose.


Who Guards the Guardrails?

Having gotten nowhere with my Zuck caption, I tried to figure out exactly what part of my prompt was triggering ChatGPT's unhelpful response. I knew the following line of questioning probably wouldn't work, but could you imagine?