
Black Forest Labs’ FLUX.1 merges text-to-image generation with image editing in one model

Summary

With FLUX.1 Kontext, Black Forest Labs extends its text-to-image system to support both image generation and editing. The model enables fast, context-aware manipulation using a mix of text and image prompts, while preserving consistent styles and characters across multiple images.

Unlike earlier models, FLUX.1 Kontext lets users combine text and images as prompts to edit existing images, generate new scenes in the style of a reference image, or maintain character consistency across different outputs. A key feature is local image editing, where individual elements can be modified without altering the rest of the image. Style cues can also be provided via text to create scenes that match the look of a given reference.

One model for editing, generation and style transfer

FLUX.1 Kontext [pro] combines traditional text-to-image generation with step-by-step image editing. It handles both text and image prompts and, according to Black Forest Labs, runs up to ten times faster than comparable models. The system is designed to preserve the consistency of characters, styles, and objects across multiple edits, something that current tools like GPT-Image-1 or Midjourney often struggle with.

An image sequence created with FLUX.1 Kontext shows how the model preserves character consistency even as complex scenes change in response to new text prompts: the bird with the VR headset stays recognizable whether it’s at a bar, at the movies, or out shopping. | Image: BFL

An experimental version, FLUX.1 Kontext [max], targets users who need greater typographic precision, more reliable editing, and faster inference. The goal is to maximize prompt accuracy while delivering high performance.


To measure performance, Black Forest Labs ran its own tests using an in-house benchmark called KontextBench. The company reports that FLUX.1 Kontext [pro] led the field, especially in text editing and character retention. It also beat other current models in speed and in sticking to the user’s prompt.

According to BFL’s own benchmark, FLUX.1 Kontext also outperforms OpenAI’s GPT-Image-1 across a range of editing tasks. | Image: BFL

There are still some trade-offs, the company says: The model can introduce visible artifacts during longer editing chains, and sometimes fails to follow prompts accurately. Its limited world knowledge can also affect its ability to generate contextually accurate images.

Open version for research and new interfaces for testing

For research, BFL offers a smaller model called FLUX.1 Kontext [dev]. With 12 billion parameters, it’s intended for security testing and custom adaptations, and will be available through a private beta. Once released publicly, it will be distributed via partners such as Hugging Face.
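
For readers planning to try the [dev] weights once they appear on Hugging Face, the sketch below shows what local use could look like with the diffusers library. The pipeline class, repository ID, and call parameters are assumptions based on how existing FLUX pipelines are exposed and may differ from the final release.

```python
# Hypothetical sketch: running FLUX.1 Kontext [dev] locally via diffusers.
# Class name, repo ID, and arguments are assumptions; check the official
# model card on Hugging Face once the weights are publicly released.
import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev",  # assumed repo ID
    torch_dtype=torch.bfloat16,
).to("cuda")

# Edit an existing image with a text instruction while keeping the rest intact.
reference = load_image("bird_with_vr_headset.png")
edited = pipe(
    image=reference,
    prompt="Place the same bird at a supermarket checkout, keep its VR headset",
    guidance_scale=2.5,
).images[0]

edited.save("bird_supermarket.png")
```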

BFL has also launched the “FLUX Playground,” a web interface for testing the models in real time without any technical setup. It lets users explore capabilities, experiment with prompts, and validate use cases on the fly.

FLUX.1 Kontext models are also available through platforms like KreaAI, Freepik, Lightricks, OpenArt, and LeonardoAI, as well as infrastructure providers including FAL, Replicate, and DataCrunch.
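
As a rough illustration of how the hosted [pro] model might be called through one of these providers, here is a hedged sketch using Replicate’s Python client; the model slug and input field names are assumptions and should be checked against the provider’s documentation.

```python
# Hypothetical sketch: calling FLUX.1 Kontext [pro] through Replicate.
# The model slug and input keys are assumptions; consult the provider's
# docs for the actual parameters.
import replicate

output = replicate.run(
    "black-forest-labs/flux-kontext-pro",  # assumed model slug
    input={
        "prompt": "Same character, now sitting in a cinema eating popcorn",
        "input_image": open("bird_with_vr_headset.png", "rb"),  # assumed key
    },
)
print(output)  # typically a URL or file handle pointing to the edited image
```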
