VLM Image Editing Prompt Cookbook

Understanding VLM Editing

Vision-Language Models (VLMs) like Qwen-VL understand both images and text. For image editing, you describe the desired change in natural language, and the model modifies the image accordingly. The quality of the edit depends heavily on prompt specificity: vague prompts produce unpredictable results, while precise prompts give you much tighter control over what changes and what stays the same.

This cookbook provides tested prompt templates for common editing tasks, grouped by task and labeled by difficulty (Easy, Medium, Hard). Each template gives the prompt text, with bracketed placeholders for you to fill in, followed by usage notes and tips for best results.

Object Removal

Easy

Remove a single object from background

Remove the [object] from the image. Fill the area with the surrounding background seamlessly. Maintain original lighting and perspective.

Best for: Removing people from landscapes, objects from tabletops, signs from buildings.

Tip: Name the object specifically. "Remove the red car" works better than "remove the car" when multiple vehicles are present.
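
The bracketed placeholder can be filled programmatically when you run the same edit over many images. The following is a minimal sketch in plain Python; `fill_template` and `REMOVE_OBJECT` are hypothetical names introduced here for illustration and are not part of any VLM library.

```python
import re

# Template text copied from the "Remove a single object" recipe above.
REMOVE_OBJECT = (
    "Remove the [object] from the image. "
    "Fill the area with the surrounding background seamlessly. "
    "Maintain original lighting and perspective."
)


def fill_template(template: str, **values: str) -> str:
    """Replace each single-word [name] placeholder with the matching keyword argument.

    Hypothetical helper: raises KeyError if a placeholder has no value,
    so an unfilled "[object]" never reaches the model.
    """
    def substitute(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value supplied for placeholder [{key}]")
        return values[key]

    return re.sub(r"\[(\w+)\]", substitute, template)


# A specific name ("red car") rather than a generic one ("car"), as the tip suggests.
prompt = fill_template(REMOVE_OBJECT, object="red car")
print(prompt)
```

Failing loudly on a missing value is a deliberate choice: a prompt that still contains a literal placeholder is easy to miss in a batch run.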

Medium

Remove object with complex background

Remove the [object] located at [position description, e.g., 'center-left of the image']. Reconstruct the background behind it using the surrounding context: [describe expected background, e.g., 'continuation of the brick wall pattern']. Preserve shadows and reflections.

Tip: When the background behind the object is complex (patterns, textures), explicitly describe what should appear after removal.
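
If you already know roughly where the object sits (for example from a detector or a manual click), the position description can be derived instead of typed by hand. The sketch below assumes normalized coordinates with the origin at the top-left corner and a simple 3x3 grid; both `describe_position` and `removal_prompt` are hypothetical helpers, and the grid thresholds are an assumption rather than anything the model requires.

```python
def describe_position(x: float, y: float) -> str:
    """Map a normalized point (0..1, origin top-left) to a 3x3 grid phrase."""
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        raise ValueError("x and y must be normalized to the range 0..1")
    column = "left" if x < 1 / 3 else "right" if x > 2 / 3 else "center"
    row = "upper" if y < 1 / 3 else "lower" if y > 2 / 3 else "center"
    if row == "center" and column == "center":
        return "center of the image"
    return f"{row}-{column} of the image"


def removal_prompt(target: str, x: float, y: float, expected_background: str) -> str:
    """Fill the complex-background removal template from the recipe above."""
    return (
        f"Remove the {target} located at {describe_position(x, y)}. "
        "Reconstruct the background behind it using the surrounding context: "
        f"{expected_background}. Preserve shadows and reflections."
    )


print(removal_prompt("bicycle", 0.15, 0.5, "continuation of the brick wall pattern"))
```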

Style Transfer

Easy

Apply artistic style

Transform this photograph into a [style] painting. Maintain the composition and subject positioning. Apply [specific characteristics, e.g., 'thick impasto brushstrokes and vibrant warm colors'].

Styles: oil painting, watercolor, pencil sketch, anime, pixel art, Art Nouveau, Pop Art
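
To compare styles side by side, you can generate one prompt per style from the list above. This is a sketch only: `style_phrase` and `style_prompt` are hypothetical helpers, and the article choice ("a"/"an") and the rule that avoids producing "oil painting painting" are assumptions added here to keep the filled template grammatical.

```python
STYLES = [
    "oil painting", "watercolor", "pencil sketch", "anime",
    "pixel art", "Art Nouveau", "Pop Art",
]


def style_phrase(style: str) -> str:
    """Return the style with an article, e.g. 'an oil painting' or 'a watercolor painting'."""
    # Assumption: do not append "painting" when the style already names the medium.
    if style.lower().endswith(("painting", "sketch")):
        noun = style
    else:
        noun = f"{style} painting"
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{article} {noun}"


def style_prompt(style: str, characteristics: str) -> str:
    """Fill the artistic-style template from the recipe above."""
    return (
        f"Transform this photograph into {style_phrase(style)}. "
        "Maintain the composition and subject positioning. "
        f"Apply {characteristics}."
    )


# The characteristics string is the example from the template; in practice
# you would supply a description that suits each style.
example = "thick impasto brushstrokes and vibrant warm colors"
prompts = {style: style_prompt(style, example) for style in STYLES}
print(prompts["oil painting"])
```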

Hard

Partial style transfer (selective area)

Apply [style] only to the [area, e.g., 'background']. Keep the [foreground subject, e.g., 'person'] photorealistic. Create a natural transition between the styled and realistic areas.

Note: This is an advanced edit, and results may vary from run to run. It works best when the image has clear foreground/background separation.

Inpainting & Outpainting

Medium

Inpainting: Replace object with new content

Replace the [existing object] with a [new object]. Match the lighting direction from the [direction, e.g., 'upper left']. The new object should cast a shadow consistent with other objects in the scene.

Key: Always mention lighting direction and shadow consistency for realistic results.

Hard

Outpainting: Extend the image

Extend this image [direction, e.g., 'to the right by 50%']. Continue the scene naturally: [describe expected content, e.g., 'more of the beach with additional palm trees and the ocean horizon line maintaining the same angle']. Match the existing color palette and mood.
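
A phrase like "to the right by 50%" implies a concrete output size, which is worth computing up front so you can check the result. The arithmetic can be sketched as follows; `extended_canvas` is a hypothetical helper, and whether your model or pipeline needs an explicit padded canvas at all is an assumption that depends on the tool you use.

```python
def extended_canvas(width: int, height: int, direction: str, fraction: float):
    """Return (new_width, new_height, x_offset, y_offset) for an outpainting request.

    The offsets give where the original image's top-left corner lands
    on the enlarged canvas. fraction=0.5 means "extend by 50%".
    """
    if direction not in ("left", "right", "up", "down"):
        raise ValueError(f"unsupported direction: {direction!r}")
    if fraction <= 0:
        raise ValueError("fraction must be positive")
    extra_w = round(width * fraction) if direction in ("left", "right") else 0
    extra_h = round(height * fraction) if direction in ("up", "down") else 0
    # Extending left or up pushes the original content away from the origin.
    x_offset = extra_w if direction == "left" else 0
    y_offset = extra_h if direction == "up" else 0
    return width + extra_w, height + extra_h, x_offset, y_offset


print(extended_canvas(1024, 768, "right", 0.5))
print(extended_canvas(1024, 768, "left", 0.5))
```

For a 1024x768 source extended to the right by 50%, the canvas grows to 1536x768 and the original stays at the origin; extending to the left gives the same size but shifts the original 512 pixels to the right.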

Color & Lighting Adjustment

Easy

Time of day change

Change the lighting to [time, e.g., 'golden hour sunset']. Add warm orange tones to highlights and deep blue tones to shadows. Add long shadows pointing [direction]. Keep all objects and their positions unchanged.
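
The shadow direction in this template should agree with where the light comes from: shadows fall away from the light source. A small sketch that derives the shadow direction from the light position in the frame is shown below; the `OPPOSITE` table and `time_of_day_prompt` are hypothetical names, and the six supported directions are an arbitrary choice for illustration.

```python
# Shadows point away from the light source, so each direction maps to its opposite.
OPPOSITE = {
    "left": "right",
    "right": "left",
    "upper left": "lower right",
    "upper right": "lower left",
    "lower left": "upper right",
    "lower right": "upper left",
}


def time_of_day_prompt(time_of_day: str, light_from: str) -> str:
    """Fill the time-of-day template, deriving the shadow direction from the light position."""
    if light_from not in OPPOSITE:
        raise ValueError(f"unsupported light direction: {light_from!r}")
    shadow_direction = OPPOSITE[light_from]
    return (
        f"Change the lighting to {time_of_day}. "
        "Add warm orange tones to highlights and deep blue tones to shadows. "
        f"Add long shadows pointing {shadow_direction}. "
        "Keep all objects and their positions unchanged."
    )


# A low sun on the left of the frame gives shadows that point right.
print(time_of_day_prompt("golden hour sunset", "left"))
```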

Prompt Engineering Best Practices

Be specific rather than general: "elderly man with gray beard" beats "person"
Mention what to preserve: "Keep the original composition" prevents unwanted changes
Describe lighting explicitly: Direction, warmth, and intensity
Use spatial references: "center-left," "upper third," "foreground"
Iterate: Start with a simple edit, then refine with follow-up prompts
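
Several of these practices can be checked mechanically before a prompt is sent. The sketch below is a rough keyword heuristic, not a measure of prompt quality: `CHECKS` and `missing_practices` are hypothetical names, and the keyword lists are assumptions you would tune to your own templates.

```python
import re

# Each practice maps to a keyword pattern whose presence suggests it was followed.
CHECKS = {
    "preserve clause": r"\b(keep|preserve|maintain)\b",
    "lighting": r"\b(light|lighting|shadow|shadows)\b",
    "spatial reference": r"\b(left|right|center|upper|lower|foreground|background)\b",
}


def missing_practices(prompt: str) -> list:
    """Return the names of the practices the prompt does not appear to follow."""
    lowered = prompt.lower()
    return [name for name, pattern in CHECKS.items() if not re.search(pattern, lowered)]


vague = "Remove the car"
precise = (
    "Remove the red car at the center-left. "
    "Keep the original composition and match the lighting."
)
print(missing_practices(vague))
print(missing_practices(precise))
```

A check like this fits the iterate step: start with a simple prompt, see which practices it skips, and add those details in the follow-up.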

For academic researchers documenting image editing techniques, clear before/after figures are essential. SciDraw can generate comparison grids, attention maps, and architecture diagrams suitable for conference papers.
