🛠️ Hands-on: AI Image Generation — Practical Guide

This practical guide walks you through the key aspects of AI image generation, from a local setup to production. It covers setup, basic text-to-image pipelines, img2img and inpainting, layout conditioning with ControlNet, style fine-tuning, batch rendering, post-processing, and production serving, with examples and tips at each step. The code snippets are written to work with Hugging Face tooling and the diffusers ecosystem.

1. What you will learn and who this is for

In this guide you will learn:

GPU environment setup and required package installation
Text-to-image generation with Stable Diffusion
img2img, inpainting, and mask-based edits
Layout conditioning with ControlNet
Lightweight fine-tuning methods such as LoRA and DreamBooth
Upscaling, post-processing, and export workflows
Production serving, cost optimization, and safety measures

2. Environment setup and prerequisites

A GPU-enabled machine is needed for good results. Minimum prerequisites:

Python 3.9 or later
An NVIDIA GPU and a compatible CUDA toolkit
A virtual environment or Conda environment
Disk space for model weights and datasets

2.1 Recommended packages

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers transformers accelerate safetensors pillow opencv-python
pip install "transformers[torch]"   # optional: transformers with its torch extras (quoted so shells like zsh do not expand the brackets)

To use pretrained model ids from the Hugging Face Hub, keep a Hugging Face account and access token configured. Some models carry license restrictions; make sure your intended use is permitted.

3. Quick Start: your first text-to-image run

The simplest route is the Stable Diffusion pipeline. The code sketch below shows a minimal workflow for local inference.

from diffusers import StableDiffusionPipeline
import torch
from PIL import Image

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A cinematic portrait of a young woman, soft film lighting, 85mm lens, photorealistic"
image = pipe(prompt, guidance_scale=7.5, num_inference_steps=50).images[0]
image.save("output_basic.png")

Important parameters:

guidance_scale: trade-off between prompt adherence and creativity
num_inference_steps: more steps usually improve results, but with diminishing returns
seed and generator: set a seed for deterministic results

4. Reproducibility: seed and generator

For reproducible experiments, pass a generator with a fixed seed.

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(prompt, generator=generator, guidance_scale=7.5).images[0]

5. img2img: creative transformation from a source image

The img2img pipeline maps an existing image into latent space and then modifies it. It is useful for stylization, retouching, and style transfer.

from diffusers import StableDiffusionImg2ImgPipeline
pipe_img2img = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

init_image = Image.open("source.jpg").convert("RGB")
prompt = "A watercolor painting version of the scene, soft colors, minimal brush strokes"
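# current diffusers releases take the source image as `image`; the older `init_image` keyword was removed
result = pipe_img2img(prompt=prompt, image=init_image, strength=0.6, guidance_scale=7.0).images[0]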
result.save("img2img_result.png")

The strength parameter controls how much of the init image is preserved: a lower value preserves more of it, a higher value transforms it more.
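To make the effect of strength concrete: in the diffusers img2img pipelines, strength also decides how many of the scheduled denoising steps actually run, roughly int(num_inference_steps * strength). The helper below is a hypothetical illustration of that arithmetic, not a library function.

```python
def effective_img2img_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps an img2img call actually runs."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0 and 1")
    # Low strength adds little noise to the source, so fewer steps are needed to remove it.
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.25, 0.5, 1.0):
    print(f"strength={s}: {effective_img2img_steps(40, s)} of 40 steps")
```

A practical consequence: at low strength, raise num_inference_steps if the result looks under-refined, since only a fraction of those steps are executed.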

6. Inpainting and mask-based edits

Inpainting replaces masked regions according to a prompt. White regions of the mask image are replaced; black regions are kept.

from diffusers import StableDiffusionInpaintPipeline
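# note: a checkpoint trained for inpainting usually gives cleaner fills than a base text-to-image model
pipe_inpaint = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")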

init_image = Image.open("photo.png").convert("RGB")
mask = Image.open("mask.png").convert("L")  # white area to replace
prompt = "Replace masked area with a modern lamp on a wooden table, photorealistic"
result = pipe_inpaint(prompt=prompt, image=init_image, mask_image=mask, guidance_scale=7.5).images[0]
result.save("inpainted.png")

7. Prompt engineering for images

Prompt design matters for image quality. Components:

Subject description: who or what
Style tokens: photorealistic, watercolor, oil painting, cinematic
Camera and lighting: lens focal length, lighting conditions
Composition and mood
Negative prompts to exclude artifacts

7.1 Example prompts

Positive: "A highly detailed cyberpunk cityscape at night, neon reflections, cinematic atmosphere, 35mm lens"
Negative: "lowres, blurry, watermark, extra fingers, deformed"
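The components above can be assembled programmatically, which keeps prompts consistent across a batch. This is a minimal sketch: build_prompt is a hypothetical helper, while negative_prompt is the keyword the Stable Diffusion pipelines in diffusers accept for the exclusion list.

```python
def build_prompt(subject, style=None, camera=None, mood=None):
    """Join the non-empty prompt components into one comma-separated string."""
    parts = [subject, style, camera, mood]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    subject="A highly detailed cyberpunk cityscape at night",
    style="neon reflections",
    camera="35mm lens",
    mood="cinematic atmosphere",
)
negative_prompt = ", ".join(["lowres", "blurry", "watermark", "extra fingers", "deformed"])

print(prompt)
print(negative_prompt)

# With a loaded pipeline (the `pipe` object from the quick start), both strings go in together:
# image = pipe(prompt, negative_prompt=negative_prompt, guidance_scale=7.5).images[0]
```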

8. Samplers and speed trade-offs

Different sampler algorithms affect quality and latency. DPM++ and Euler samplers give good results in fewer steps. For performance, tune the step count against your latency budget.
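In diffusers, the sampler is the pipeline's scheduler, and it can be swapped without reloading the model. The sketch below assumes an already loaded pipeline; use_fast_scheduler and the short names "dpm++" and "euler" are hypothetical, while DPMSolverMultistepScheduler and EulerDiscreteScheduler are the diffusers classes being selected.

```python
SUPPORTED_SCHEDULERS = ("dpm++", "euler")


def use_fast_scheduler(pipe, name="dpm++"):
    """Swap the pipeline's scheduler in place, reusing its existing config."""
    if name not in SUPPORTED_SCHEDULERS:
        raise ValueError(f"unknown scheduler {name!r}, expected one of {SUPPORTED_SCHEDULERS}")
    # Imported lazily so the helper can be defined without diffusers installed.
    from diffusers import DPMSolverMultistepScheduler, EulerDiscreteScheduler

    classes = {"dpm++": DPMSolverMultistepScheduler, "euler": EulerDiscreteScheduler}
    pipe.scheduler = classes[name].from_config(pipe.scheduler.config)
    return pipe


# Usage, assuming `pipe` and `prompt` from the quick start:
# use_fast_scheduler(pipe, "dpm++")
# image = pipe(prompt, num_inference_steps=25).images[0]
```

The step count of 25 is only a starting point; compare outputs at a fixed seed before settling on a value.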

9. ControlNet: structure and layout conditioning

With ControlNet you can guide the model with edge maps, poses, segmentation maps, or layout sketches. This is useful for posters and product design, where the composition must be controlled.

# conceptual flow
# 1) prepare control image such as edges or layout
# 2) pass control image with prompt to ControlNet enabled pipeline
# 3) tune control strength to balance fidelity and creativity
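The conceptual flow above could look roughly like this with diffusers and OpenCV. Treat it as a sketch under assumptions: the checkpoint id "lllyasviel/sd-controlnet-canny", the Canny thresholds, the file name, and both helper names are illustrative choices, not something the guide prescribes.

```python
def canny_control_image(path, low=100, high=200):
    """Step 1: turn a source image into a 3-channel Canny edge map (PIL image)."""
    import cv2
    import numpy as np
    from PIL import Image

    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(path)
    edges = cv2.Canny(gray, low, high)
    return Image.fromarray(np.stack([edges] * 3, axis=-1))


def load_controlnet_pipeline(base_model_id, controlnet_id="lllyasviel/sd-controlnet-canny"):
    """Step 2: build a ControlNet-enabled Stable Diffusion pipeline on the GPU."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    controlnet = ControlNetModel.from_pretrained(controlnet_id, torch_dtype=torch.float16)
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        base_model_id, controlnet=controlnet, torch_dtype=torch.float16
    )
    return pipe.to("cuda")


# Step 3: lower controlnet_conditioning_scale for more creative freedom, raise it for fidelity.
# control = canny_control_image("layout_sketch.png")
# pipe_cn = load_controlnet_pipeline(model_id)
# image = pipe_cn(prompt, image=control, controlnet_conditioning_scale=0.8).images[0]
```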

10. Fine-tuning: LoRA and DreamBooth overview

If you need repeatable generation of a specific subject or style, lightweight fine-tuning helps. LoRA provides low-rank adapters, while DreamBooth is effective for embedding a personal subject.

LoRA advantages: low storage, quick training, shareable adapters
DreamBooth use case: personalized subject token generation from few examples
Textual Inversion for new token embeddings to represent unique concepts
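The "low storage" advantage of LoRA follows from simple arithmetic: instead of updating a full weight matrix of shape d_out x d_in, LoRA trains two thin matrices of rank r whose product has the same shape. The layer size below (768 x 768) and the rank are illustrative assumptions.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full matrix parameters, LoRA adapter parameters) for one linear layer."""
    full = d_out * d_in
    adapter = rank * (d_out + d_in)  # B is d_out x rank, A is rank x d_in
    return full, adapter


full, adapter = lora_param_counts(768, 768, rank=8)
print(f"full: {full}, LoRA: {adapter}, ratio: {adapter / full:.2%}")
```

At rank 8 the adapter for this layer holds about 2% of the parameters of the full matrix, which is why adapters are small enough to share.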

10.1 Practical LoRA workflow sketch

# minimal conceptual steps
# 1) prepare dataset of style images with captions
# 2) attach LoRA adapters to UNet or text encoder
# 3) train adapters for few epochs with low learning rate
# 4) save adapter and load during inference with prompt token for style
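For step 4, the inference side is short. The sketch below assumes a pipeline that supports load_lora_weights (as the Stable Diffusion pipelines in diffusers do) and a hypothetical adapter directory; apply_style_lora is an illustrative helper name.

```python
def apply_style_lora(pipe, adapter_path, scale=0.8):
    """Load a saved LoRA adapter into an existing pipeline and return the call kwargs."""
    pipe.load_lora_weights(adapter_path)
    # The adapter strength is passed per call; 0 ignores the adapter, 1 applies it fully.
    return {"cross_attention_kwargs": {"scale": scale}}


# Usage, assuming `pipe` from the quick start and a trained adapter on disk:
# lora_kwargs = apply_style_lora(pipe, "adapters/my_style", scale=0.8)
# image = pipe("a lighthouse at dusk, in my_style", **lora_kwargs).images[0]
```

The trigger token ("my_style" here) is whatever caption token the adapter was trained with.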

11. High resolution generation and upscaling

Posters and prints need high-DPI images. Strategies:

Two stage: generate base image then apply diffusion-based upscaler
Tile based generation with overlap and seam blending for very large canvases
Perceptual upscalers such as Real-ESRGAN, or diffusion upscalers, for best fidelity

11.1 Tile generation outline

# tile based approach outline
# 1) split canvas into overlapping tiles
# 2) generate each tile with consistent prompt and seed schedule
# 3) blend overlaps using feathered alpha and color match
# 4) final global color correction
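Steps 1 and 3 of the outline are mostly bookkeeping, so they can be sketched without any image library. The function names, the 512-pixel tile, and the 64-pixel overlap are assumptions for illustration; the generation call itself is left out.

```python
def axis_starts(length, tile, overlap):
    """Start offsets of tiles along one axis so the whole length is covered."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile size")
    if tile >= length:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge (its overlap may be larger)
    return starts


def tile_boxes(width, height, tile=512, overlap=64):
    """Overlapping (left, top, right, bottom) boxes covering a width x height canvas."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in axis_starts(height, tile, overlap)
        for x in axis_starts(width, tile, overlap)
    ]


def feather_weight(offset, overlap):
    """Linear blend weight of the incoming tile: 0 where the overlap starts, 1 where it ends."""
    return min(max(offset / overlap, 0.0), 1.0)


boxes = tile_boxes(1024, 512)
print(boxes)
print(feather_weight(32, 64))
```

Each box would be rendered with the same prompt, and pixels inside an overlap are mixed as weight * new_tile + (1 - weight) * existing_canvas.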

12. Post-processing, typography, and layout overlays

To overlay a headline and CTA on a generated image, use vector fonts and follow padding guidelines. Use PIL, cairo, or design-tool integrations for the final composition.

from PIL import Image, ImageDraw, ImageFont
img = Image.open("generated.png")
draw = ImageDraw.Draw(img)
font = ImageFont.truetype("OpenSans-Bold.ttf", 120)
draw.text((100,100), "EVENT TITLE", font=font, fill=(255,255,255))
img.save("poster_final.png")

13. Metadata, provenance, and reproducibility

Store metadata with every generated asset: prompt, negative prompt, seed, model id, LoRA id, ControlNet inputs, timestamp, and license details. This matters for audits and legal compliance.
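One simple way to keep this metadata attached to an asset is a JSON sidecar file next to the image. The sketch below uses only the standard library; the GenerationRecord fields mirror the list above, and the class and function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


@dataclass
class GenerationRecord:
    prompt: str
    negative_prompt: str
    seed: int
    model_id: str
    lora_id: Optional[str] = None
    controlnet_inputs: list = field(default_factory=list)
    license: str = ""
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


def save_sidecar(image_path, record):
    """Write the record as <image name>.json next to the image and return that path."""
    sidecar = Path(image_path).with_suffix(".json")
    sidecar.write_text(json.dumps(asdict(record), indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar


# Usage:
# record = GenerationRecord(prompt=prompt, negative_prompt="lowres, blurry", seed=42, model_id=model_id)
# save_sidecar("output_basic.png", record)
```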

14. Batch rendering and pipeline automation

Campaigns need batch rendering. Implement a queuing system and a worker-pool architecture to scale rendering jobs, and use cloud spot instances for cost efficiency.

14.1 Example job queue sketch

# conceptual components
# - producer: creates job with prompt and metadata
# - queue: stores jobs (e.g., Redis, SQS)
# - workers: pop jobs and run rendering on GPU nodes
# - result store: save image and metadata to object storage
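The same four components can be prototyped in a single process before moving to Redis or SQS. This is a sketch with the standard library only: an in-memory queue stands in for the broker, a dict stands in for object storage, and render is a placeholder for the real GPU call.

```python
import queue
import threading


def render(job):
    """Placeholder for the GPU rendering call; returns a fake result path."""
    return f"renders/{job['id']}.png"


def worker(jobs, results):
    """Pop jobs until a None sentinel arrives, storing the image path and metadata."""
    while True:
        job = jobs.get()
        if job is None:
            jobs.task_done()
            break
        try:
            results[job["id"]] = {"image": render(job), "metadata": job}
        except Exception as exc:  # record the failure instead of killing the worker
            results[job["id"]] = {"error": str(exc), "metadata": job}
        finally:
            jobs.task_done()


def run_batch(job_list, num_workers=2):
    """Producer side: enqueue the jobs, start the workers, wait, and return the results."""
    jobs, results = queue.Queue(), {}
    threads = [threading.Thread(target=worker, args=(jobs, results)) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in job_list:
        jobs.put(job)
    for _ in threads:
        jobs.put(None)  # one shutdown sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
    return results


results = run_batch([{"id": "a", "prompt": "poster 1"}, {"id": "b", "prompt": "poster 2"}])
print(sorted(results))
```

In a real deployment each worker would own one GPU and hold its pipeline in memory across jobs, since model loading dominates the cost of short jobs.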

15. Safety, bias, and copyright considerations

Implement safety layers in production. Checks:

NSFW detectors and profanity filters
Artist style policy: avoid prompts that request direct imitation of living artists when restricted
Remove or redact PII in any user provided images or text
Log and human review flagged outputs before public release
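As one concrete piece of such a layer: the Stable Diffusion pipeline output in diffusers carries an nsfw_content_detected list alongside the images when the safety checker is enabled. The helper below is a hypothetical sketch that routes flagged outputs to a review list instead of releasing them.

```python
def split_flagged(images, nsfw_flags):
    """Split outputs into (safe, needs_review) using per-image boolean flags."""
    if nsfw_flags is None:
        # No checker result available: be conservative and review everything.
        return [], list(images)
    safe, review = [], []
    for image, flagged in zip(images, nsfw_flags):
        (review if flagged else safe).append(image)
    return safe, review


# Usage, assuming `pipe` and `prompt` from the quick start:
# out = pipe(prompt, num_images_per_prompt=4)
# safe, review = split_flagged(out.images, out.nsfw_content_detected)
```

An automated flag is only a first filter; the review list still needs the human check described above.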

16. Cost optimization strategies

Use lightweight student models for previews and large models for final renders
Cache commonly requested prompts and seeds
Use asynchronous workers and batch inference
Quantize models and use mixed precision for inference
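Caching works because a fixed seed makes a render repeatable for the same model and settings. The sketch below derives a cache key from those parameters; cache_key, cached_render, and the in-memory dict are hypothetical stand-ins for whatever store you use.

```python
import hashlib
import json


def cache_key(model_id, prompt, negative_prompt="", seed=0, steps=50, guidance_scale=7.5):
    """Stable SHA-256 key over every parameter that affects the rendered image."""
    payload = json.dumps(
        {
            "model_id": model_id,
            "prompt": prompt,
            "negative_prompt": negative_prompt,
            "seed": seed,
            "steps": steps,
            "guidance_scale": guidance_scale,
        },
        sort_keys=True,
        ensure_ascii=False,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


_cache = {}


def cached_render(render_fn, **params):
    """Call render_fn(**params) only if this exact parameter set has not been rendered yet."""
    key = cache_key(**params)
    if key not in _cache:
        _cache[key] = render_fn(**params)
    return _cache[key]
```

Requests without an explicit seed should bypass the cache, otherwise users asking for variations would keep receiving the same image.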

17. Monitoring and observability

Track metrics: job latency, failure rates, GPU utilization, flag rates, and user satisfaction. Maintain logs for prompts, ControlNet inputs, and post-processing steps.

18. Troubleshooting common problems

Out of memory: reduce batch size, use fp16, or offload to CPU
Poor composition: improve the prompt with composition hints and a ControlNet layout
Artifacts: increase steps, tune guidance, use negative prompts
Seams in tiled images: increase overlap and improve blending
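For the out-of-memory case, diffusers pipelines expose switches that trade some speed for lower peak GPU memory. The wrapper below is a hypothetical convenience; enable_attention_slicing and enable_model_cpu_offload are pipeline methods, and the offload option needs the accelerate package.

```python
def apply_memory_savers(pipe, offload=True):
    """Turn on memory-saving options on an existing diffusers pipeline."""
    # Compute attention in slices rather than all at once.
    pipe.enable_attention_slicing()
    if offload:
        # Keep submodels on the CPU and move each to the GPU only while it runs.
        # When using this, build the pipeline without calling .to("cuda") first.
        pipe.enable_model_cpu_offload()
    return pipe


# Usage, assuming `model_id` and `prompt` from the quick start:
# pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
# apply_memory_savers(pipe)
# image = pipe(prompt).images[0]
```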

19. Example end to end notebook structure

Environment setup and model download
Simple text-to-image generation examples
img2img and inpainting experiments
ControlNet layout conditioning demo
LoRA quick training demo and inference with the adapter
Upscaling and final overlay composition
Metadata save and job cataloging

20. Next steps and project ideas

Some project ideas to take this further:

Interactive web editor for prompt tweaking and layout control
Poster generator pipeline with templates and batch rendering
Personalized avatar generator using DreamBooth
Campaign generator that produces consistent visual identity across formats