OpenAI Moderation API: safer multimodal LLM apps with omni-moderation-latest (text + image)



The OpenAI Moderation API has always been the essential endpoint you wire into every production surface that accepts user-generated content. The big difference today is that moderation is no longer limited to text: the omni-moderation-latest model is a next-generation multimodal content moderation system, built on GPT-4o, that can classify text and images in a single request and gives you better tools for understanding why something was flagged.

This post is a follow-up to my earlier deep dive on the OpenAI moderation classifier: https://blog1.neuralengineer.org/llm-moderation-classifer-openai-moderation-api-fdb124c4536a

What moderation is (and is not)

Moderation answers one question: “Does this content appear to fall into one of the policy categories I care about?”

It is:

  • A fast, structured classifier for routing (allow, block, review, rate-limit, redact); a minimal routing sketch follows below.
  • A complement to your product policy (not a replacement).

It is not:

  • A substitute for product decisions (what you choose to allow is up to you).
  • A guarantee that content is safe, legal, or appropriate in every context.
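To make "routing" concrete, here's a minimal sketch of a decision layer you might put on top of a moderation result. The threshold and the action names are hypothetical placeholders, not anything the API defines:

def route(result) -> str:
    """Map one moderation result to a product action.

    REVIEW_THRESHOLD and the action strings are assumptions for
    illustration; tune them against your own policy and data.
    """
    REVIEW_THRESHOLD = 0.4  # assumed value, not an API constant
    scores = result.category_scores.model_dump()  # pydantic model -> dict
    if result.flagged:
        return "block"
    if max(scores.values()) >= REVIEW_THRESHOLD:
        return "review"  # borderline content goes to a human queue
    return "allow"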

Models: why omni-moderation-latest is the new default

Most teams should treat these as the practical options:

  • omni-moderation-latest: best default; supports multimodal inputs and the newest taxonomy.
  • omni-moderation-2024-09-26: pinned omni model for reproducibility.
  • text-moderation-latest / text-moderation-stable: text-only models; useful for legacy paths or when you cannot send images.

The rest of this post assumes you are using omni-moderation-latest.

What’s new in omni-moderation-latest compared to older integrations

If your mental model is “call moderation and check flagged,” you’ll want to update it:

  • Finer-grained self-harm: self-harm/intent vs self-harm/instructions lets you route ideation differently from how-to content (see the sketch after this list).
  • Illicit guidance buckets: illicit and illicit/violent catch instructions for wrongdoing; in schemas these can appear as nullable booleans.
  • Modality attribution: category_applied_input_types is the difference between "block everything" and "remove only the image that caused the problem."
  • Better multilingual coverage in practice: omni-moderation-latest is designed to handle many languages and mixed-language inputs more consistently than older text-only moderation setups. On OpenAI's internal multilingual evaluation across 40 languages, the new model improved 42% over the previous model and got better in 98% of the languages tested.
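A sketch of that finer-grained self-harm routing, assuming the snake_case category attributes the Python SDK exposes (the action strings are hypothetical placeholders):

def route_self_harm(result) -> str | None:
    """Route ideation differently from how-to content.

    Action names are illustrative, not API values.
    """
    cats = result.categories
    if cats.self_harm_instructions:
        return "block"                   # how-to content: remove outright
    if cats.self_harm_intent or cats.self_harm:
        return "show_support_resources"  # ideation: intervene, don't just delete
    return None                          # no self-harm signal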

Pricing and rate limits

At the time of writing, calling the Moderation endpoint is free, and usage is governed by model-specific rate limits.

Definitions:

  • RPM: requests per minute
  • RPD: requests per day
  • TPM: tokens per minute

Free tier limits for omni-moderation-latest:

  • 250 RPM
  • 5,000 RPD
  • 10,000 TPM

For the latest limits (and any pricing changes), see: https://platform.openai.com/docs/models/omni-moderation-latest
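If you burst past these limits, requests fail with HTTP 429. Here's a minimal backoff sketch using the Python SDK's RateLimitError; the retry count and delays are arbitrary choices, not recommendations:

import time

import openai
from openai import OpenAI

client = OpenAI()

def moderate_with_retry(text: str, max_retries: int = 3):
    """Call the moderation endpoint, backing off exponentially on 429s."""
    for attempt in range(max_retries):
        try:
            return client.moderations.create(
                model="omni-moderation-latest",
                input=text,
            )
        except openai.RateLimitError:
            time.sleep(2 ** attempt)  # 1s, 2s, 4s: arbitrary schedule
    raise RuntimeError("moderation still rate-limited after retries")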

Request formats: single text, batch text, or multimodal

All requests go to:

POST https://api.openai.com/v1/moderations

1) Single text string

{ "model": "omni-moderation-latest", "input": "Some text to check" }

2) Batch of text strings

{ "model": "omni-moderation-latest", "input": ["text A", "text B"] }

3) Multimodal input (text + images in one “document”)

{
  "model": "omni-moderation-latest",
  "input": [
    { "type": "text", "text": "caption or message" },
    { "type": "image_url", "image_url": { "url": "https://example.com/image.png" } }
  ]
}

That third shape is the key new capability: you can send the user’s caption (or chat message) and the image they uploaded as a single item, then decide actions based on the combined signal.
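One detail worth knowing about shape 2: the results array aligns index-for-index with the batch inputs. A quick sketch, assuming the openai Python SDK and an OPENAI_API_KEY in the environment (the sample texts are placeholders):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

texts = ["text A", "text B"]  # placeholder batch inputs

moderation = client.moderations.create(
    model="omni-moderation-latest",
    input=texts,
)

# results[i] corresponds to input[i]
for text, result in zip(texts, moderation.results):
    print(f"{text!r} -> flagged={result.flagged}")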

Categories: what you get back (and where image moderation applies)

The Moderation API returns a results array containing:

  • flagged: a coarse summary boolean
  • categories: a map of category → boolean
  • category_scores: a map of category → float score
  • category_applied_input_types: a map of category → list of input types (e.g., ["text"], ["image"], or ["text", "image"])

Categories that can apply to both text and image

These categories can be applied to both modalities:

  • sexual
  • self-harm, self-harm/intent, self-harm/instructions
  • violence, violence/graphic

Categories that are currently text-only

These categories are currently text-only:

  • hate, hate/threatening
  • harassment, harassment/threatening
  • sexual/minors
  • illicit, illicit/violent

Example: moderating text + image in one call

Here’s a minimal Python request that moderates a caption plus the uploaded image URL:

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY (and anything else in .env) into the environment

client = OpenAI()

# One "document": the user's caption plus the image they uploaded,
# moderated together in a single request.
moderation = client.moderations.create(
    model="omni-moderation-latest",
    input=[
        {"type": "text", "text": "Chhaava director Laxman Utekar recently discussed how shooting for one of the crucial scenes in the film led the production to take a 1.5-month-long break and the set being dismantled"},
        {"type": "image_url", "image_url": {"url": "https://images.mid-day.com/images/images/2025/feb/vickytorture_d.jpg"}},
    ],
)

r0 = moderation.results[0]  # one result per document sent
print("flagged:", r0.flagged)
print("categories:", r0.categories)
print("applied_input_types:", getattr(r0, "category_applied_input_types", None))

The response contains the following details:

  • flagged - Set to true if the model classifies the content as potentially harmful, false otherwise.

  • categories - A dictionary of per-category violation flags. For each category, the value is true if the model flags the corresponding category as violated, false otherwise.

  • category_scores - A dictionary of per-category scores denoting the model's confidence that the input violates OpenAI's policy for the category. Values are between 0 and 1, where higher values denote higher confidence.

  • category_applied_input_types - Which input types were flagged, for each category. For example, if both the image and text inputs to the model are flagged for violence/graphic, the violence/graphic property will be set to ["image", "text"]. This is only available on omni models.

Running the example above prints:

flagged: True
categories: Categories(harassment=False, harassment_threatening=False, hate=False, hate_threatening=False, illicit=False, illicit_violent=False, self_harm=False, self_harm_instructions=False, self_harm_intent=False, sexual=False, sexual_minors=False, violence=True, violence_graphic=False, harassment/threatening=False, hate/threatening=False, illicit/violent=False, self-harm/intent=False, self-harm/instructions=False, self-harm=False, sexual/minors=False, violence/graphic=False)
applied_input_types: CategoryAppliedInputTypes(harassment=['text'], harassment_threatening=['text'], hate=['text'], hate_threatening=['text'], illicit=['text'], illicit_violent=['text'], self_harm=['text', 'image'], self_harm_instructions=['text', 'image'], self_harm_intent=['text', 'image'], sexual=['text', 'image'], sexual_minors=['text'], violence=['text', 'image'], violence_graphic=['text', 'image'], harassment/threatening=['text'], hate/threatening=['text'], illicit/violent=['text'], self-harm/intent=['text', 'image'], self-harm/instructions=['text', 'image'], self-harm=['text', 'image'], sexual/minors=['text'], violence/graphic=['text', 'image'])

We can see that the violence category is flagged, and category_applied_input_types attributes the violation to both the text and the image. That attribution lets you act per modality (a minimal sketch follows this list):

  • If only the image triggered a category, you could remove the image but keep the user's text.
  • If only the text triggered a category, you can redact or block the text but keep the image.
  • If both triggered, you can apply a stricter action.
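Here's a minimal sketch of that per-modality routing, assuming the SDK's pydantic response objects (model_dump() converts them to plain dicts) and using hypothetical action names:

def modality_actions(result) -> set[str]:
    """Collect per-modality actions from one moderation result.

    The returned action names are illustrative placeholders.
    """
    flagged = result.categories.model_dump()
    applied = result.category_applied_input_types.model_dump()
    actions = set()
    for category, is_flagged in flagged.items():
        if not is_flagged:
            continue
        inputs = applied.get(category, [])
        if "image" in inputs:
            actions.add("remove_image")  # only the offending image goes
        if "text" in inputs:
            actions.add("redact_text")   # only the offending text goes
    return actions or {"allow"}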

Conclusion

omni-moderation-latest turns moderation from a text-only checkbox into a multimodal routing layer you can actually operate in production. The biggest wins are practical: you can classify images, you can attribute which modality caused a flag, and you get a taxonomy that’s easier to map to real product decisions.

Further reading

  • OpenAI moderation guide: https://platform.openai.com/docs/guides/moderation
  • OpenAI API reference: https://platform.openai.com/docs/api-reference/moderations

If you found this helpful, consider following my profile and signing up for the newsletter. Have thoughts or questions? Share them in the comments below.

References

  • https://blog1.neuralengineer.org/llm-moderation-classifer-openai-moderation-api-fdb124c4536a
  • https://platform.openai.com/docs/guides/moderation
  • https://platform.openai.com/docs/api-reference/moderations
  • https://openai.com/index/upgrading-the-moderation-api-with-our-new-multimodal-moderation-model/
