Model Switching in SteadyText
SteadyText v2.0.0+ supports dynamic model switching with the Gemma-3n model family, allowing you to use different model sizes without restarting your application.
Overview
The model switching feature enables you to:
- Use different models for different tasks: Choose smaller models for speed or larger models for quality
- Switch models at runtime: No need to restart your application
- Maintain deterministic outputs: Each model produces consistent results
- Cache multiple models: Models are cached after first load for efficiency
Usage Methods
1. Using Size Parameter (New!)
The simplest way to choose a model based on your needs:
from steadytext import generate
# Quick, lightweight tasks
text = generate("Simple task", size="small")  # Uses Gemma-3n-2B (default)
# Complex tasks where quality matters more than speed
text = generate("Complex analysis", size="large")  # Uses Gemma-3n-4B
2. Using the Model Registry
Each size value resolves to a named entry in the model registry:
from steadytext import generate
# Use a smaller, faster model
text = generate("Explain machine learning", size="small") # Gemma-3n-2B
# Use a larger, more capable model
text = generate("Write a detailed essay", size="large") # Gemma-3n-4B
Available models in the registry (v2.0.0+):
Model Name | Size | Use Case | Size Parameter |
---|---|---|---|
gemma-3n-2b | 2B | Default, fast tasks | small |
gemma-3n-4b | 4B | High quality, complex tasks | large |
Note: SteadyText v2.0.0+ focuses on the Gemma-3n model family. Previous versions (v1.x) supported Qwen models, which are now deprecated.
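The table above can be read as a lookup from size value to registry name. The following is a minimal sketch of that mapping, built only from the two rows listed. `SIZE_TO_MODEL` and `model_for_size` are hypothetical names used for illustration and are not part of the SteadyText API.

```python
# Hypothetical lookup mirroring the registry table; not SteadyText's own code.
SIZE_TO_MODEL = {
    "small": "gemma-3n-2b",  # 2B: default, fast tasks
    "large": "gemma-3n-4b",  # 4B: high quality, complex tasks
}


def model_for_size(size: str = "small") -> str:
    """Return the registry name listed for a size value, or raise ValueError."""
    try:
        return SIZE_TO_MODEL[size]
    except KeyError:
        valid = ", ".join(sorted(SIZE_TO_MODEL))
        raise ValueError(f"unknown size {size!r}; expected one of: {valid}") from None


print(model_for_size("large"))  # gemma-3n-4b
```

Validating the size up front like this can catch a typo such as "medium" before any model download is attempted.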
3. Using Custom Models
Specify any GGUF model from Hugging Face:
from steadytext import generate
# Use a custom model
text = generate(
    "Create a Python function",
    model_repo="ggml-org/gemma-3n-E4B-it-GGUF",
    model_filename="gemma-3n-E4B-it-Q8_0.gguf",
)
4. Using Environment Variables
Set default models via environment variables:
# Use small model by default
export STEADYTEXT_DEFAULT_SIZE="small"
# Or specify custom model (advanced)
export STEADYTEXT_GENERATION_MODEL_REPO="ggml-org/gemma-3n-E2B-it-GGUF"
export STEADYTEXT_GENERATION_MODEL_FILENAME="gemma-3n-E2B-it-Q8_0.gguf"
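Application code sometimes needs to know which size will be in effect. The sketch below resolves it from an explicit argument, then the environment variable, then a built-in default. The variable name comes from the section above. The precedence order and the `effective_size` helper are assumptions made for illustration, and SteadyText's own resolution logic may differ.

```python
import os

# Variable name taken from the section above; the precedence order
# (argument, then environment, then "small") is an assumption.
ENV_VAR = "STEADYTEXT_DEFAULT_SIZE"
VALID_SIZES = ("small", "large")


def effective_size(explicit=None, environ=None):
    """Pick a size: explicit argument first, then the env var, then 'small'."""
    env = os.environ if environ is None else environ
    size = explicit or env.get(ENV_VAR) or "small"
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return size


print(effective_size(environ={"STEADYTEXT_DEFAULT_SIZE": "large"}))  # large
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.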
Streaming Generation
Model switching works with streaming generation too:
from steadytext import generate_iter
# Stream with a specific model size
for token in generate_iter("Tell me a story", size="large"):
    print(token, end="", flush=True)
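When the streamed text is also needed as a whole (for logging or caching, say), the chunks can be accumulated while they are printed. This sketch assumes only that the iterator yields string chunks, as the loop above does. `stream_and_collect` is a hypothetical helper and works with any iterable of strings.

```python
import sys
from typing import Iterable


def stream_and_collect(tokens: Iterable[str], out=sys.stdout) -> str:
    """Write each chunk to `out` as it arrives and return the joined text."""
    parts = []
    for token in tokens:
        out.write(token)
        out.flush()
        parts.append(token)
    return "".join(parts)


# Stand-in for generate_iter("Tell me a story", size="large")
text = stream_and_collect(iter(["Once ", "upon ", "a ", "time"]))
```

With SteadyText installed, the stand-in iterator would be replaced by the `generate_iter(...)` call shown above.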
Model Selection Guide
For Speed (2B model)
- Use cases: Chat responses, simple completions, real-time applications
- Recommended: gemma-3n-2b (size="small")
- Trade-off: Faster generation, simpler outputs
For Quality (4B model)
- Use cases: Complex reasoning, detailed content, creative writing
- Recommended: gemma-3n-4b (size="large")
- Trade-off: Best quality, slower generation
Performance Considerations
- First Load: The first use of a model downloads it (if not cached) and loads it into memory
- Model Caching: Once loaded, models remain in memory for fast switching
- Memory Usage: Each loaded model uses RAM - consider your available resources
- Determinism: Each model returns the same output for the same prompt and seed; outputs are not expected to match across different models
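The first-load cost and the determinism claim can both be observed with a small harness. This is a sketch, not part of SteadyText: `check_determinism` is a hypothetical helper that takes the generation function as an argument, so it can be tried with a stub before a model is downloaded.

```python
import time


def check_determinism(generate_fn, prompt, size="small", runs=3):
    """Call generate_fn repeatedly; return (all_outputs_equal, seconds_per_call)."""
    outputs, timings = [], []
    for _ in range(runs):
        start = time.perf_counter()
        outputs.append(generate_fn(prompt, size=size))
        timings.append(time.perf_counter() - start)
    return all(o == outputs[0] for o in outputs), timings


# Usage with SteadyText (requires the package and a downloaded model):
#   from steadytext import generate
#   same, timings = check_determinism(generate, "Explain caching", size="small")
#   timings[0] includes the first load; later entries reflect the cached model.
```

If the first timing is much larger than the rest, that gap is the download and load cost described under First Load.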
Examples
Adaptive Model Selection
from steadytext import generate
def smart_generate(prompt, complexity="medium"):
    """Use different models based on task complexity."""
    if complexity == "low":
        # Use fast model for simple tasks
        return generate(prompt, size="small")
    else:
        # Use high-quality model for complex tasks
        return generate(prompt, size="large")
A/B Testing Models
from steadytext import generate
prompts = ["Explain quantum computing", "Write a haiku", "Solve 2+2"]
for prompt in prompts:
    print(f"\nPrompt: {prompt}")
    # Test with small model
    small = generate(prompt, size="small")
    print(f"Small model: {small[:100]}...")
    # Test with large model
    large = generate(prompt, size="large")
    print(f"Large model: {large[:100]}...")
Troubleshooting
Model Not Found
If a model download fails, you'll get deterministic fallback text. Check:
- Internet connection
- Hugging Face availability
- Model name spelling
Out of Memory
Large models require significant RAM. Solutions:
- Use smaller quantized models
- Clear model cache with clear_model_cache()
- Use one model at a time
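One way to approximate "one model at a time" is to group pending work by size, so each model is needed for a single stretch, and to release it before moving on. The sketch below is an illustration under assumptions: `run_grouped_by_size` is a hypothetical helper, and both the generation function and the cache-clearing function are passed in as arguments. This section does not show the import path for `clear_model_cache()`, and whether clearing the cache frees memory immediately is not confirmed here.

```python
from collections import defaultdict


def run_grouped_by_size(jobs, generate_fn, clear_fn=None):
    """Run (prompt, size) jobs one size at a time; return results in input order."""
    by_size = defaultdict(list)
    for index, (prompt, size) in enumerate(jobs):
        by_size[size].append((index, prompt))

    results = [None] * len(jobs)
    for size, items in by_size.items():
        for index, prompt in items:
            results[index] = generate_fn(prompt, size=size)
        if clear_fn is not None:
            clear_fn()  # called once after each size group finishes
    return results
```

Results come back in the original job order even though execution is reordered, so callers do not need to track the grouping themselves.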
Slow First Load
Initial model loading takes time due to:
- Downloading (first time only)
- Loading into memory
- Model initialization
Subsequent uses are much faster as models are cached.