Filtering content with the OpenAI Moderations API
Importance of Moderation
Generative AI has become embedded in our digital ecosystem. Forbes recently surveyed over 600 business owners in the U.S. and found that a whopping 97% of them believe ChatGPT will help their business. Unsurprisingly, the leading use case for generative AI within a majority of the companies surveyed is generating communications and other content. While automated content generation has brought tremendous velocity to how quickly teams can deliver and iterate, the content filtering and moderation process remains manual for many teams.
Content moderation is time-consuming, requiring teams to closely analyze the output of LLMs and determine whether the content is permissible for distribution. With the introduction of the OpenAI moderation API, teams can now save hours of manual content analysis with a customizable content filter.
Tutorial
In this blog post, I will demonstrate how to use the content moderation API to flag inappropriate content. Flagging inappropriate content is critical, and we will be doing it on the server with a simple Node.js serverless Firebase function. While I am using Firebase and TypeScript in this tutorial, you should be able to use a backend of your choice, or even React Server Components. I'm doing this tutorial in a serverless function because my original use case for content moderation was my most recent React Native application, Trivi.ai.
Assumptions
Before we begin, the following assumptions are being made:
- If you are using Firebase, you have set up your serverless function and installed the necessary OpenAI npm dependencies (I will write a blog post on setting up the Firebase Admin SDK for serverless functions upon request)
- You have created your OpenAI API keys and stored them properly in your .env file or wherever you choose to safely store your keys (a sample .env is shown below)
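For example, if you go the .env route, the file might look something like this (these variable names are simply the ones the code below expects):

OPEN_API_KEY=sk-your-api-key
OPENAI_ORGANIZATION=org-your-organization-id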
Step 1: Connect to the OpenAI Client
Within your index file, import your OpenAI API keys from your secure location as well as the OpenAI module and instantiate your OpenAI client.
import * as dotenv from "dotenv";
import OpenAI from "openai";
import fetch from "node-fetch";

// Load environment variables before reading the API keys
dotenv.config();

const openai = new OpenAI({
  apiKey: process.env.OPEN_API_KEY,
  organization: process.env.OPENAI_ORGANIZATION,
});
Step 2: Create a method for checking content
In this method, we will pass content to the moderation API. The moderation API scores the content you pass it from 0 to 1 in each category, where scores closer to 1 mean the model is more confident the content falls into that category and is not suitable for distribution. The result is an array with one entry per input, and each entry contains a score for every category. The categories are as follows (an example of the response shape follows the list):
• harassment
• harassment/threatening
• hate
• hate/threatening
• self-harm
• self-harm/instructions
• self-harm/intent
• sexual
• sexual/minors
• violence
• violence/graphic
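For reference, here is roughly the shape of a moderation response, abbreviated to two categories; the scores are made up for illustration:

{
  "id": "modr-...",
  "model": "text-moderation-...",
  "results": [
    {
      "flagged": false,
      "categories": { "harassment": false, "hate/threatening": false, ... },
      "category_scores": { "harassment": 0.00021, "hate/threatening": 0.00001, ... }
    }
  ]
}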
For my application's use cases, I have determined a threshold of 0.25 for all categories. For your use cases, you can fine-tune this method by checking against your target categories; a per-category variant is sketched after the code below.
const checkContent = async (content: string) => {
  const moderationResponse = await openai.moderations.create({
    input: content,
  });
  if (moderationResponse.results && moderationResponse.results.length > 0) {
    const categoryScores = moderationResponse.results[0].category_scores;
    let isFlagged = false;
    console.log("Category Scores:");
    for (const [category, score] of Object.entries(categoryScores)) {
      console.log(`${category}: ${score}`);
      // Flag the content as soon as any category exceeds the threshold
      if (score > 0.25) {
        isFlagged = true;
        break;
      }
    }
    return {flagged: isFlagged};
  }
  // Fail closed: if the API returns no results, treat the content as flagged
  return {flagged: true};
};
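The checkContent method above treats every category the same. If you only care about certain categories, or want different tolerances per category, one way to do it is with a threshold map. This is a minimal sketch: the category choices and threshold values below are arbitrary examples, not recommendations.

// Hypothetical per-category thresholds; anything not listed falls back to the default
const thresholds: Record<string, number> = {
  "sexual/minors": 0.01, // near-zero tolerance
  "violence": 0.4, // more lenient
};
const DEFAULT_THRESHOLD = 0.25;

const checkContentByCategory = async (content: string) => {
  const moderationResponse = await openai.moderations.create({input: content});
  const result = moderationResponse.results?.[0];
  if (!result) {
    return {flagged: true}; // fail closed if the API returns nothing
  }
  for (const [category, score] of Object.entries(result.category_scores)) {
    const threshold = thresholds[category] ?? DEFAULT_THRESHOLD;
    if (score > threshold) {
      return {flagged: true, category};
    }
  }
  return {flagged: false};
};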
Step 3: Using our content filtering method
To leverage our content filtering method, we can call it in our main method and handle the output like so:
const checkResponse = await checkContent(content);
if (checkResponse.flagged) {
  throw new functions.https.HttpsError(
    "failed-precondition",
    `The content ${content} was flagged as inappropriate. Please try another category.`
  );
}
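Putting it all together, a callable Firebase function that moderates user input before running its main logic might look roughly like this. The exported name generateContent and the request shape are placeholders for your own:

import * as functions from "firebase-functions";

// Hypothetical callable function; moderate the input before doing any real work
export const generateContent = functions.https.onCall(async (data) => {
  const content: string = data.content;
  const checkResponse = await checkContent(content);
  if (checkResponse.flagged) {
    throw new functions.https.HttpsError(
      "failed-precondition",
      `The content ${content} was flagged as inappropriate. Please try another category.`
    );
  }
  // ...continue with your main logic here
  return {ok: true};
});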
And there you go!