API Reference¶
Complete documentation for SteadyText's Python API and command-line interface.
Overview¶
SteadyText provides a simple, consistent API for deterministic AI operations:
generate()
- Deterministic text generation with customizable seedsgenerate_iter()
- Streaming text generation with token-by-token outputembed()
- Deterministic embeddings for semantic search and similaritypreload_models()
- Pre-load models for better performance- Daemon mode - Persistent model serving for 160x faster responses
- CLI tools - Command-line interface for all operations
All functions are designed to never fail - they return deterministic fallbacks when models can't be loaded.
Quick Reference¶
import steadytext
# Text generation with custom seed
text = steadytext.generate("your prompt", seed=42)
text = steadytext.generate("your prompt", seed=123) # Different output
# Return log probabilities
text, logprobs = steadytext.generate("prompt", return_logprobs=True)
# Streaming generation with custom seed
for token in steadytext.generate_iter("prompt", seed=456):
print(token, end="", flush=True)
# Embeddings with custom seed
vector = steadytext.embed("text to embed", seed=789)
vectors = steadytext.embed(["multiple", "texts"], seed=789)
# Model management
steadytext.preload_models(verbose=True)
cache_dir = steadytext.get_model_cache_dir()
# Daemon usage (for better performance)
from steadytext.daemon import use_daemon
with use_daemon():
text = steadytext.generate("fast generation")
vec = steadytext.embed("fast embedding")
# Cache management
from steadytext import get_cache_manager
cache_manager = get_cache_manager()
stats = cache_manager.get_cache_stats()
cache_manager.clear_all_caches()
Detailed Documentation¶
Core APIs¶
- Text Generation - Complete guide to
generate()
andgenerate_iter()
- Basic usage and parameters
- Custom seed support for variations
- Streaming generation
- Advanced patterns and pipelines
- Error handling and edge cases
-
Integration examples
-
Embeddings - Complete guide to
embed()
function - Creating embeddings with seeds
- Vector operations and similarity
- Batch processing
- Advanced use cases
- Performance optimization
Command Line Interface¶
- CLI Reference - Complete command-line documentation
- Text generation commands
- Embedding operations
- Model management
- Daemon control
- Index operations
-
Real-world examples
-
Vector Operations - Vector math and operations
- Similarity calculations
- Distance metrics
- Vector arithmetic
- Search operations
API Signatures¶
Text Generation¶
def generate(
prompt: str,
max_new_tokens: Optional[int] = None,
return_logprobs: bool = False,
eos_string: str = "[EOS]",
model: Optional[str] = None,
model_repo: Optional[str] = None,
model_filename: Optional[str] = None,
size: Optional[str] = None,
seed: int = DEFAULT_SEED,
schema: Optional[Union[Dict[str, Any], type, object]] = None,
regex: Optional[str] = None,
choices: Optional[List[str]] = None,
response_format: Optional[Dict[str, Any]] = None,
) -> Union[str, Tuple[str, Optional[Dict[str, Any]]]]
Streaming Generation¶
def generate_iter(
prompt: str,
max_new_tokens: Optional[int] = None,
eos_string: str = "[EOS]",
include_logprobs: bool = False,
model: Optional[str] = None,
model_repo: Optional[str] = None,
model_filename: Optional[str] = None,
size: Optional[str] = None,
seed: int = DEFAULT_SEED
) -> Iterator[Union[str, Tuple[str, Optional[Dict[str, Any]]]]]
Embeddings¶
Utilities¶
def preload_models(verbose: bool = False) -> None
def get_model_cache_dir() -> Path
def get_cache_manager() -> CacheManager
Constants¶
Core Constants¶
steadytext.DEFAULT_SEED = 42 # Default seed for all operations
steadytext.GENERATION_MAX_NEW_TOKENS = 512 # Default max tokens for generation
steadytext.EMBEDDING_DIMENSION = 1024 # Embedding vector dimensions
Model Constants¶
# Current models (v2.0.0+)
DEFAULT_GENERATION_MODEL = "gemma-3n-2b"
DEFAULT_EMBEDDING_MODEL = "qwen3-embedding"
DEFAULT_RERANKING_MODEL = "qwen3-reranker-4b"
# Model sizes
MODEL_SIZES = {
"small": "gemma-3n-2b", # 2.0GB
"large": "gemma-3n-4b" # 4.2GB
}
Environment Variables¶
Cache Configuration¶
# Cache backend selection (sqlite, d1, memory)
STEADYTEXT_CACHE_BACKEND=sqlite # Default
# Generation cache settings
STEADYTEXT_GENERATION_CACHE_CAPACITY=256 # Maximum cache entries
STEADYTEXT_GENERATION_CACHE_MAX_SIZE_MB=50.0 # Maximum cache file size
# Embedding cache settings
STEADYTEXT_EMBEDDING_CACHE_CAPACITY=512 # Maximum cache entries
STEADYTEXT_EMBEDDING_CACHE_MAX_SIZE_MB=100.0 # Maximum cache file size
# D1 backend configuration (when CACHE_BACKEND=d1)
STEADYTEXT_D1_API_URL=https://your-worker.workers.dev
STEADYTEXT_D1_API_KEY=your-api-key
STEADYTEXT_D1_BATCH_SIZE=50
# Disable caching entirely (not recommended)
STEADYTEXT_DISABLE_CACHE=1
For detailed cache backend documentation, see Cache Backends Guide.
Daemon Configuration¶
# Disable daemon usage globally
STEADYTEXT_DISABLE_DAEMON=1
# Custom daemon settings
STEADYTEXT_DAEMON_HOST=127.0.0.1
STEADYTEXT_DAEMON_PORT=5557
Development/Testing¶
# Allow model downloads (for testing)
STEADYTEXT_ALLOW_MODEL_DOWNLOADS=true
# Use fallback models for compatibility
STEADYTEXT_USE_FALLBACK_MODEL=true
# Set default seed globally
STEADYTEXT_DEFAULT_SEED=42
# Python hash seed (for reproducibility)
PYTHONHASHSEED=0
Model Paths¶
# Custom model cache directory
STEADYTEXT_MODEL_DIR=/path/to/models
# Skip model verification
STEADYTEXT_SKIP_MODEL_VERIFICATION=1
Error Handling¶
SteadyText uses a "never fail" design philosophy with v2.1.0+ updates:
Deterministic Behavior
- Text generation: Returns
None
when models unavailable (v2.1.0+) - Embeddings: Returns
None
when models unavailable (v2.1.0+) - Streaming: Returns empty iterator when models unavailable
- No exceptions: Functions handle errors gracefully
- Seed support: All fallbacks respect custom seeds
Breaking Changes in v2.1.0
The deterministic fallback behavior has been disabled. Functions now return None
instead of generating fallback text/embeddings when models are unavailable.
Thread Safety¶
All functions are thread-safe and support concurrent usage:
- Singleton models: Models loaded once with thread-safe locks
- Thread-safe caches: All caches use proper locking mechanisms
- Concurrent calls: Multiple threads can call functions simultaneously
- Daemon mode: ZeroMQ handles concurrent requests automatically
Example of concurrent usage:
import concurrent.futures
import steadytext
def process_prompt(prompt, seed):
return steadytext.generate(prompt, seed=seed)
# Process multiple prompts concurrently
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
prompts = ["prompt1", "prompt2", "prompt3", "prompt4"]
seeds = [100, 200, 300, 400]
futures = [executor.submit(process_prompt, p, s)
for p, s in zip(prompts, seeds)]
results = [f.result() for f in futures]
Performance Notes¶
Startup Performance¶
- First call: Downloads models if needed (~2.6GB total)
- Model loading: 2-3 seconds on first use
- Daemon mode: Eliminates model loading overhead
- Preloading: Use
preload_models()
to load at startup
Runtime Performance¶
- Generation speed: ~50-100 tokens/second
- Embedding speed: ~100-500 embeddings/second
- Cache hits: <0.01 seconds for cached results
- Memory usage: ~2.6GB for all models loaded
Optimization Tips¶
- Use daemon mode for production deployments
- Preload models at application startup
- Warm up cache with common prompts
- Use consistent seeds for better cache efficiency
- Batch operations when possible
- Monitor cache stats to tune capacity
Version Compatibility¶
Model Versions¶
Each major version uses fixed models:
- v2.0.0+: Gemma-3n models (generation), Qwen3 (embeddings)
- v1.x: Older model versions (deprecated)
API Stability¶
- Stable APIs:
generate()
,embed()
,generate_iter()
- Seed parameter: Added in all APIs for v2.0.0+
- Daemon mode: Stable since v1.3.0
- Cache system: Centralized since v1.3.3
Best Practices¶
Production Usage
- Always specify seeds for reproducible results
- Use daemon mode for better performance
- Configure caches based on usage patterns
- Handle None returns appropriately (v2.1.0+)
- Monitor performance with cache statistics
- Test with models unavailable to ensure robustness
- Use environment variables for configuration
- Implement proper error handling for production
- Batch similar operations for efficiency
- Document your seed choices for reproducibility