Hands-on: AI Image Generation

Practical, step-by-step, hands-on guide to AI image generation using Stable Diffusion, img2img, inpainting, ControlNet, LoRA, DreamBooth, upscaling, prompt engineering, and production deployment (originally written in Hindi and English).

🛠️ Hands-on: AI Image Generation (Practical Guide)

This comprehensive practical guide walks you through the important aspects of AI image generation, from a local setup to production. It gives examples and tips at every step: setup, basic text-to-image pipelines, img2img and inpainting, layout conditioning with ControlNet, style fine-tuning, batch rendering, post-processing, and production serving. The code snippets in this article are written to work with Hugging Face tooling and the diffusers ecosystem.

1. What you will learn and who this is for

In this guide you will learn:

  • GPU environment setup and installation of the required packages
  • Text-to-image generation with Stable Diffusion
  • img2img, inpainting, and mask-based edits
  • Layout conditioning with ControlNet
  • Lightweight fine-tuning methods such as LoRA and DreamBooth
  • Upscaling, post-processing, and export workflows
  • Production serving, cost optimization, and safety measures

2. Environment setup and prerequisites

A GPU-enabled machine is needed for good results. Minimum prerequisites:

  • Python 3.9 or later
  • NVIDIA GPU and a compatible CUDA toolkit
  • Virtual environment or Conda environment
  • Disk space for model weights and datasets

2.1 Recommended packages

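# PyTorch wheels built against CUDA 11.8; pick the index URL that matches your driver
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers transformers accelerate safetensors pillow opencv-python
pip install "transformers[torch]"   # optional extra; quoted so shells such as zsh do not expand the brackets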
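
After installing, a quick sanity check can save time before downloading any model weights. The following is a minimal sketch that only uses the standard library to look for the packages listed above; the `REQUIRED` list and the helper names are assumptions made for this example.

```python
import importlib.util
import sys

# Import names of the packages installed above (pillow imports as PIL,
# opencv-python imports as cv2).
REQUIRED = ["torch", "diffusers", "transformers", "accelerate",
            "safetensors", "PIL", "cv2"]

def missing_packages(names):
    """Return the import names that cannot be found in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

def cuda_available():
    """Report whether PyTorch can see a CUDA device (False if torch is absent)."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("Missing packages:", missing_packages(REQUIRED) or "none")
    print("CUDA available:", cuda_available())
```

If CUDA is reported as unavailable, the usual suspects are a CPU-only PyTorch wheel or a driver that does not match the selected CUDA build.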
    

To use pretrained model ids from the Hugging Face Hub, keep a Hugging Face account and access token configured. Some models carry license restrictions, so make sure your intended use is permitted.

3. Quick Start: your first text-to-image run

The simplest approach is to use the Stable Diffusion pipeline. The code sketch below shows the minimal workflow for local inference.

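from diffusers import StableDiffusionPipeline
import torch
from PIL import Image  # used by the img2img and inpainting examples later

# Model id as used throughout this guide; any Stable Diffusion v1.x checkpoint you can access works
model_id = "runwayml/stable-diffusion-v1-5"
# fp16 weights roughly halve GPU memory use compared with fp32
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A cinematic portrait of a young woman, soft film lighting, 85mm lens, photorealistic"
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("output_basic.png")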
    

Important parameters:

  • guidance_scale: tradeoff between prompt adherence and creativity
  • num_inference_steps: more steps generally improve results, with diminishing returns and longer runtime
  • seed and generator: set a seed for deterministic results
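
One way to get a feel for these parameters is to render a small grid and compare the images side by side. The sketch below assumes `pipe` is the pipeline created in the quick start (or any callable with the same call signature); the `sweep` helper and its default values are illustrative, not recommended settings.

```python
from itertools import product

def sweep(pipe, prompt, scales=(5.0, 7.5, 10.0), steps=(20, 30, 50)):
    """Render one image per (guidance_scale, num_inference_steps) pair.

    `pipe` is any callable with the diffusers pipeline call signature,
    for example the StableDiffusionPipeline created in the quick start.
    """
    results = {}
    for scale, n_steps in product(scales, steps):
        out = pipe(prompt, guidance_scale=scale, num_inference_steps=n_steps)
        results[(scale, n_steps)] = out.images[0]
    return results

# Usage with the quick-start pipeline (assumes a GPU is available):
# grid = sweep(pipe, "A cinematic portrait of a young woman")
# for (scale, n_steps), img in grid.items():
#     img.save(f"sweep_gs{scale}_steps{n_steps}.png")
```

For a fair comparison, also fix the seed for every call so that only the swept parameters change between images.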

4. Reproducibility: seed and generator

For reproducible experiments, pass a generator to the pipeline and fix its seed.

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(prompt, generator=generator, guidance_scale=7.5).images[0]
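
When rendering several variations at once, it can help to give every image its own recorded seed so that a favourite can be regenerated later. This is a sketch under the assumption that the pipeline accepts a list of generators, one per image, which diffusers pipelines do when the list length matches the number of images; the helper names are made up for this example.

```python
def batch_seeds(base_seed, count):
    """Consecutive seeds, one per image, so each image can be re-rendered alone."""
    return [base_seed + i for i in range(count)]

def make_generators(seeds, device="cuda"):
    """One torch.Generator per seed (torch is imported lazily)."""
    import torch
    return [torch.Generator(device=device).manual_seed(s) for s in seeds]

# Usage with the quick-start pipeline (assumes a CUDA GPU):
# seeds = batch_seeds(42, 4)
# images = pipe(prompt, generator=make_generators(seeds),
#               num_images_per_prompt=len(seeds), guidance_scale=7.5).images
# for seed, img in zip(seeds, images):
#     img.save(f"seed_{seed}.png")
```

Results should be closely reproducible on the same hardware and library versions; small numeric differences between batched and single-image runs are possible.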
    

5. img2img: creative transformation from a source image

The img2img pipeline maps an existing image into latent space and then modifies it. It is very useful for stylization, retouching, and style transfer.

from diffusers import StableDiffusionImg2ImgPipeline
pipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

init_image = Image.open("source.jpg").convert("RGB")
prompt = "A watercolor painting version of the scene, soft colors, minimal brush strokes"
# Current diffusers releases take the source image as `image` (older releases used `init_image`)
result = pipe_img2img(prompt=prompt, image=init_image, strength=0.6, guidance_scale=7.0).images[0]
result.save("img2img_result.png")
    

The strength parameter controls how much of the init image is preserved: lower values preserve more of it, higher values transform it more.
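
Strength also changes how much work the pipeline does. To the best of our understanding, the diffusers img2img pipelines noise the source image part of the way and then run only about `int(num_inference_steps * strength)` denoising steps. The sketch below reproduces that arithmetic; treat it as an approximation of the library's behaviour, since internals can change between releases.

```python
def img2img_steps(num_inference_steps, strength):
    """Number of denoising steps img2img runs for a given strength.

    Mirrors the arithmetic used by the diffusers img2img pipelines at the
    time of writing: int(num_inference_steps * strength), capped at the total.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    return min(int(num_inference_steps * strength), num_inference_steps)

for s in (0.2, 0.6, 1.0):
    print(f"strength={s}: {img2img_steps(50, s)} of 50 steps")
```

A practical consequence: at low strength, very few steps actually run, so raising `num_inference_steps` can help when a subtle edit looks under-refined.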

6. Inpainting and mask-based edits

Inpainting replaces masked regions of an image according to a prompt. The white regions of the mask image are the ones that get replaced.

from diffusers import StableDiffusionInpaintPipeline
# Reuses model_id from the quick start; a dedicated inpainting checkpoint typically blends edits more cleanly
pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

init_image = Image.open("photo.png").convert("RGB")
mask = Image.open("mask.png").convert("L")  # white = area to replace, black = area to keep
# Describe what should appear in the masked area rather than giving an instruction
prompt = "A modern lamp on a wooden table, photorealistic"
result = pipe_inpaint(prompt=prompt, image=init_image, mask_image=mask, guidance_scale=7.5).images[0]
result.save("inpainted.png")
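
If no mask file exists yet, one can be drawn programmatically. The following is a minimal sketch that paints a white rectangle on a black canvas with Pillow; the helper names, the coordinates, and the optional feathering are assumptions for illustration.

```python
def clamp_box(box, size):
    """Clamp (left, top, right, bottom) to an image of size (width, height)."""
    left, top, right, bottom = box
    width, height = size
    left, right = sorted((max(0, min(left, width)), max(0, min(right, width))))
    top, bottom = sorted((max(0, min(top, height)), max(0, min(bottom, height))))
    return (left, top, right, bottom)

def make_rect_mask(size, box, feather=0):
    """Black 'L' mask with a white rectangle; optional Gaussian feathering."""
    from PIL import Image, ImageDraw, ImageFilter  # lazy import: needs pillow
    mask = Image.new("L", size, 0)                  # 0 = keep original pixels
    ImageDraw.Draw(mask).rectangle(clamp_box(box, size), fill=255)  # 255 = repaint
    if feather > 0:
        mask = mask.filter(ImageFilter.GaussianBlur(feather))
    return mask

# Usage (hypothetical coordinates):
# make_rect_mask(init_image.size, (120, 200, 360, 480), feather=8).save("mask.png")
```

A slightly feathered edge often hides the boundary between original and regenerated pixels better than a hard edge.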
    

7. Prompt engineering for images

Prompt design matters for getting better images. Components:

  • Subject description: who or what
  • Style tokens: photorealistic, watercolor, oil painting, cinematic
  • Camera and lighting: lens focal length, lighting conditions
  • Composition and mood
  • Negative prompts to exclude artifacts

7.1 Example prompts

Positive: "A highly detailed cyberpunk cityscape at night, neon reflections, cinematic atmosphere, 35mm lens"
Negative: "lowres, blurry, watermark, extra fingers, deformed"
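
The components listed above can be assembled programmatically, which keeps prompts consistent across a batch. This is a small sketch; `build_prompt` is a hypothetical helper, while `negative_prompt` is the pipeline argument that carries the negative prompt.

```python
def build_prompt(subject, style=(), camera=(), mood=()):
    """Join prompt components into one comma-separated prompt string."""
    parts = [subject, *style, *camera, *mood]
    return ", ".join(p.strip() for p in parts if p and p.strip())

NEGATIVE = "lowres, blurry, watermark, extra fingers, deformed"

prompt = build_prompt(
    "A highly detailed cyberpunk cityscape at night",
    style=["neon reflections", "cinematic atmosphere"],
    camera=["35mm lens"],
)
print(prompt)

# With the quick-start pipeline (assumes a GPU is available):
# image = pipe(prompt, negative_prompt=NEGATIVE, guidance_scale=7.5).images[0]
```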
    

8. Samplers and speed tradeoffs

Different sampler algorithms affect both quality and latency. DPM++ and Euler samplers give good results in relatively few steps. For performance, tune the step count together with the sampler.
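
In diffusers, samplers are called schedulers and can be swapped on an existing pipeline. The sketch below assumes a pipeline object like the one from the quick start; the short names in `SCHEDULER_CLASSES` are invented for this example, while the class names are diffusers scheduler classes.

```python
# Short names (hypothetical, chosen for this sketch) mapped to diffusers
# scheduler class names.
SCHEDULER_CLASSES = {
    "dpm++": "DPMSolverMultistepScheduler",
    "euler": "EulerDiscreteScheduler",
    "euler_a": "EulerAncestralDiscreteScheduler",
}

def set_scheduler(pipe, name):
    """Replace pipe.scheduler, reusing the current scheduler's config."""
    import diffusers  # lazy import: only needed when a real pipeline is passed
    scheduler_cls = getattr(diffusers, SCHEDULER_CLASSES[name])
    pipe.scheduler = scheduler_cls.from_config(pipe.scheduler.config)
    return pipe

# Usage with the quick-start pipeline (assumes a GPU is available):
# set_scheduler(pipe, "dpm++")
# image = pipe(prompt, num_inference_steps=20, guidance_scale=7.5).images[0]
```

A reasonable experiment is to render the same seed at 20 and 50 steps with each scheduler and keep the cheapest setting whose output is acceptable.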

9. ControlNet: structure and layout conditioning

With ControlNet you can guide the model using edge maps, poses, segmentation maps, or layout sketches. This is useful for posters and product design, where the composition has to be controlled.

# conceptual flow
# 1) prepare control image such as edges or layout
# 2) pass control image with prompt to ControlNet enabled pipeline
# 3) tune control strength to balance fidelity and creativity
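
The conceptual flow above might look as follows with diffusers and OpenCV. This is a sketch under assumptions: it uses Canny edges as the control signal and the publicly available `lllyasviel/sd-controlnet-canny` checkpoint, and the file name and prompt in the usage comment are hypothetical.

```python
def canny_control_image(path, low=100, high=200):
    """Turn a photo or sketch into a 3-channel Canny edge map (PIL image)."""
    import cv2                      # lazy imports: opencv-python, numpy, pillow
    import numpy as np
    from PIL import Image
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    edges = cv2.Canny(gray, low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))

def load_controlnet_pipeline(base_id, controlnet_id="lllyasviel/sd-controlnet-canny"):
    """Load a Stable Diffusion pipeline with a ControlNet attached."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_id, controlnet=controlnet, torch_dtype=torch.float16
    )
    return pipe.to("cuda")

# Usage (assumes a CUDA GPU; "layout.png" is a hypothetical input file):
# pipe_cn = load_controlnet_pipeline(model_id)
# control = canny_control_image("layout.png")
# image = pipe_cn("A minimalist product poster", image=control,
#                 controlnet_conditioning_scale=0.8).images[0]
```

Lowering `controlnet_conditioning_scale` gives the prompt more freedom; raising it makes the output follow the control image more closely.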
    

10. Fine-tuning: LoRA and DreamBooth overview

If you need repeatable generation of a specific subject or style, lightweight fine-tuning is useful. LoRA provides low-rank adapters, while DreamBooth is effective for personalizing a model to a particular subject.

  • LoRA advantages: low storage, quick training, shareable adapters
  • DreamBooth use case: personalized subject token generation from few examples
  • Textual Inversion for new token embeddings to represent unique concepts

10.1 Practical LoRA workflow sketch

# minimal conceptual steps
# 1) prepare dataset of style images with captions
# 2) attach LoRA adapters to UNet or text encoder
# 3) train adapters for few epochs with low learning rate
# 4) save adapter and load during inference with prompt token for style
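
The "low storage" advantage follows from simple arithmetic: for a dense layer with a `d_out x d_in` weight, a rank-`r` adapter trains only two thin matrices. The sketch below works through that count for an illustrative layer width (not taken from a specific model) and shows, in comments, how a saved adapter could be loaded with `load_lora_weights`; the adapter path and the style token are hypothetical.

```python
def lora_param_count(d_in, d_out, rank):
    """Trainable parameters for one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def full_param_count(d_in, d_out):
    """Parameters in the dense weight matrix that LoRA leaves frozen."""
    return d_in * d_out

d = 768    # illustrative layer width, not taken from a specific model
for rank in (4, 8, 16):
    ratio = lora_param_count(d, d, rank) / full_param_count(d, d)
    print(f"rank={rank}: {lora_param_count(d, d, rank)} trainable params "
          f"({ratio:.1%} of the dense layer)")

# Loading a trained adapter at inference (hypothetical local path and token):
# pipe.load_lora_weights("path/to/lora_adapter")
# image = pipe("a poster in <my-style> style").images[0]
```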
    

11. High resolution generation and upscaling

Posters and prints need high-DPI images. Strategies:

  • Two stage: generate base image then apply diffusion-based upscaler
  • Tile based generation with overlap and seam blending for very large canvases
  • Perceptual upscalers like Real-ESRGAN or diffusion upscalers for best fidelity

11.1 Tile generation outline

# tile based approach outline
# 1) split canvas into overlapping tiles
# 2) generate each tile with consistent prompt and seed schedule
# 3) blend overlaps using feathered alpha and color match
# 4) final global color correction
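
Steps 1 and 3 of the outline are plain geometry and can be sketched without any model. The helpers below are assumptions for illustration: they compute overlapping tile boxes and a linear feathering ramp that can be used as blend weights.

```python
def tile_starts(length, tile, overlap):
    """Start offsets of overlapping tiles that cover [0, length)."""
    if tile >= length:
        return [0]
    stride = tile - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than tile size")
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)      # last tile sits flush with the edge
    return starts

def tile_boxes(width, height, tile=512, overlap=64):
    """(left, top, right, bottom) boxes for every tile on the canvas."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_starts(height, tile, overlap)
            for x in tile_starts(width, tile, overlap)]

def feather_weights(size, overlap):
    """1-D linear ramp: rises over `overlap` pixels, flat in the middle, then falls."""
    return [min(1.0, (i + 1) / (overlap + 1), (size - i) / (overlap + 1))
            for i in range(size)]

print(tile_boxes(1024, 512))
```

To blend, multiply each tile by the outer product of its horizontal and vertical ramps, accumulate weighted pixels and weights on the full canvas, and divide at the end. Normalizing by the accumulated weight keeps canvas edges at full brightness.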
    

12. Post-processing, typography, and layout overlays

To overlay a headline and CTA on a generated image, use vector fonts and follow padding guidelines. Use PIL, cairo, or design tool integrations for the final composition.

from PIL import Image, ImageDraw, ImageFont
img = Image.open("generated.png")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("OpenSans-Bold.ttf", 120)
draw.text((100,100), "EVENT TITLE", font=font, fill=(255,255,255))
img.save("poster_final.png")
    

13. Metadata, provenance, and reproducibility

Store metadata with every generated asset: prompt, negative prompt, seed, model id, LoRA id, ControlNet inputs, timestamp, and license details. This is important for audits and legal compliance.
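
A simple way to do this is a JSON sidecar file written next to each image. The following is a minimal sketch using only the standard library; the `save_metadata` helper and the field names in the usage comment are illustrative, not a fixed schema.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

def save_metadata(image_path, **fields):
    """Write <image>.json next to the image and return the sidecar path."""
    record = {"image": Path(image_path).name,
              "timestamp": datetime.now(timezone.utc).isoformat(),
              **fields}
    sidecar = Path(image_path).with_suffix(".json")
    sidecar.write_text(json.dumps(record, indent=2, ensure_ascii=False),
                       encoding="utf-8")
    return sidecar

# Field names below are illustrative; use whatever schema your audit needs.
# save_metadata("output_basic.png", prompt=prompt, negative_prompt=None,
#               seed=42, model_id=model_id, lora_id=None,
#               guidance_scale=7.5, num_inference_steps=50)
```

Keeping the sidecar in the same object-storage prefix as the image makes it easy to move or delete both together.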

14. Batch rendering and pipeline automation

Batch rendering is essential for campaigns. Implement a queuing system and a worker pool architecture to scale rendering jobs, and use cloud spot instances for cost efficiency.

14.1 Example job queue sketch

# conceptual components
# - producer: creates job with prompt and metadata
# - queue: stores jobs (e.g., Redis, SQS)
# - workers: pop jobs and run rendering on GPU nodes
# - result store: save image and metadata to object storage
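
The same components can be prototyped in a single process before moving to Redis or SQS. The sketch below is an assumption-laden stand-in: it uses the standard library's `queue` and threads, and `fake_render` replaces the real GPU call so the flow can be exercised anywhere.

```python
import queue
import threading

def run_jobs(jobs, render, num_workers=2):
    """Process job dicts with a pool of worker threads; return results by job id."""
    pending = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:                 # sentinel: no more work
                pending.task_done()
                return
            try:
                output, status = render(job), "ok"
            except Exception as exc:        # keep the pool alive on a bad job
                output, status = repr(exc), "failed"
            with lock:
                results[job["id"]] = {"status": status, "output": output}
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:                        # producer side
        pending.put(job)
    for _ in threads:
        pending.put(None)
    pending.join()
    for t in threads:
        t.join()
    return results

def fake_render(job):
    """Stand-in for a GPU render call; returns the path it would write."""
    return f"renders/{job['id']}.png"

jobs = [{"id": f"job-{i}", "prompt": "poster", "seed": i} for i in range(4)]
print(run_jobs(jobs, fake_render))
```

In a real deployment each worker would typically be a separate process pinned to one GPU, with the pipeline loaded once at start-up rather than per job.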
    

15. Safety, bias, and copyright considerations

Implement safety layers in production. Checks:

  • NSFW detectors and profanity filters
  • Artist style policy: avoid prompts that request direct imitation of living artists when restricted
  • Remove or redact PII in any user provided images or text
  • Log flagged outputs and route them to human review before public release
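
As a first layer, prompts can be screened before any GPU time is spent. The following is a deliberately simple sketch: the blocklist terms and the `route` decision format are placeholders, and a production system would combine this with image-level detectors and human review.

```python
import re

# Illustrative terms only; a production list would be maintained separately.
BLOCKED_TERMS = {"nsfw", "gore"}

def check_prompt(prompt, blocked=BLOCKED_TERMS):
    """Return (allowed, matched_terms) using whole-word, case-insensitive matching."""
    words = set(re.findall(r"[a-z0-9']+", prompt.lower()))
    matched = sorted(words & set(blocked))
    return (not matched, matched)

def route(prompt):
    """Decide what happens to a request: render it or hold it for human review."""
    allowed, matched = check_prompt(prompt)
    return {"action": "render" if allowed else "review", "matched": matched}

print(route("A cinematic portrait, soft film lighting"))
print(route("nsfw portrait"))
```

Word-level matching avoids flagging harmless words that merely contain a blocked substring, at the cost of missing obfuscated spellings.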

16. Cost optimization strategies

  • Use lightweight student models for previews and large models for final renders
  • Cache commonly requested prompts and seeds
  • Use asynchronous workers and batch inference
  • Quantize models and use mixed precision for inference
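
Caching works because a render is determined by its inputs: the same prompt, seed, model, and parameters should give the same image. The sketch below shows one possible cache key and an in-memory cache; the class and function names are assumptions, and the dict would be replaced by object storage in production.

```python
import hashlib
import json

def cache_key(prompt, seed, model_id, **params):
    """Stable hash of everything that determines the rendered image."""
    payload = {"prompt": " ".join(prompt.split()), "seed": seed,
               "model_id": model_id, "params": params}
    blob = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()

class RenderCache:
    """In-memory cache; swap the dict for object storage in production."""

    def __init__(self):
        self._store = {}
        self.hits = 0

    def get_or_render(self, render, prompt, seed, model_id, **params):
        """Return the cached result, or call render(...) once and store it."""
        key = cache_key(prompt, seed, model_id, **params)
        if key in self._store:
            self.hits += 1
        else:
            self._store[key] = render(prompt, seed, model_id, **params)
        return self._store[key]
```

Requests without a fixed seed cannot be cached this way, so a service that wants cache hits usually assigns a seed itself when the caller omits one.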

17. Monitoring and observability

Track metrics: job latency, failure rates, GPU utilization, flag rates, and user satisfaction. Maintain logs for prompts, ControlNet inputs, and post-processing steps.

18. Troubleshooting common problems

  • Out of memory: reduce batch size, use fp16, or offload to CPU
  • Poor composition: improve the prompt with composition hints and a ControlNet layout
  • Artifacts: increase steps, tune guidance, use negative prompts
  • Seams in tiled images: increase overlap and improve blending
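
For the out-of-memory case, diffusers pipelines expose several switches that trade speed for memory. The helper below is a sketch that turns on whichever of them the given pipeline object supports; the helper name is an assumption, and CPU offload additionally requires the accelerate package.

```python
def apply_memory_savers(pipe, offload=True):
    """Enable the memory-saving switches that the pipeline object supports."""
    applied = []
    names = ["enable_attention_slicing", "enable_vae_slicing"]
    if offload:
        names.append("enable_model_cpu_offload")   # needs the accelerate package
    for name in names:
        method = getattr(pipe, name, None)
        if callable(method):
            method()
            applied.append(name)
    return applied

# Usage (load the pipeline without .to("cuda") when CPU offload is enabled):
# pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
# print(apply_memory_savers(pipe))
```

Each switch costs some throughput, so it is worth enabling them one at a time and keeping only what is needed to fit in memory.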

19. Example end to end notebook structure

  1. Environment setup and model download
  2. Simple text-to-image generation examples
  3. img2img and inpainting experiments
  4. ControlNet layout conditioning demo
  5. LoRA quick training demo and inference with the adapter
  6. Upscaling and final overlay composition
  7. Metadata save and job cataloging

20. Next steps and project ideas

Some ideas for follow-on projects:

  • Interactive web editor for prompt tweaking and layout control
  • Poster generator pipeline with templates and batch rendering
  • Personalized avatar generator using DreamBooth
  • Campaign generator that produces consistent visual identity across formats

© Generative AI with Images. Keep model versions, prompt templates, and metadata under version control for reproducibility and auditability.