API Reference¶

Complete documentation for SteadyText's Python API and command-line interface.

Overview¶

SteadyText provides a simple, consistent API for deterministic AI operations:

generate() - Deterministic text generation with customizable seeds
generate_iter() - Streaming text generation with token-by-token output
embed() - Deterministic embeddings for semantic search and similarity
preload_models() - Pre-load models for better performance
Daemon mode - Persistent model serving for 160x faster responses
CLI tools - Command-line interface for all operations

All functions are designed to never fail - they return deterministic fallbacks when models can't be loaded.

Quick Reference¶

import steadytext

# Text generation with custom seed
text = steadytext.generate("your prompt", seed=42)
text = steadytext.generate("your prompt", seed=123)  # Different output

# Return log probabilities
text, logprobs = steadytext.generate("prompt", return_logprobs=True)

# Streaming generation with custom seed
for token in steadytext.generate_iter("prompt", seed=456):
    print(token, end="", flush=True)

# Embeddings with custom seed
vector = steadytext.embed("text to embed", seed=789)
vectors = steadytext.embed(["multiple", "texts"], seed=789)

# Model management
steadytext.preload_models(verbose=True)
cache_dir = steadytext.get_model_cache_dir()

# Daemon usage (for better performance)
from steadytext.daemon import use_daemon
with use_daemon():
    text = steadytext.generate("fast generation")
    vec = steadytext.embed("fast embedding")

# Cache management
from steadytext import get_cache_manager
cache_manager = get_cache_manager()
stats = cache_manager.get_cache_stats()
cache_manager.clear_all_caches()

Detailed Documentation¶

Core APIs¶

Text Generation - Complete guide to generate() and generate_iter()
Basic usage and parameters
Custom seed support for variations
Streaming generation
Advanced patterns and pipelines
Error handling and edge cases
Integration examples
Embeddings - Complete guide to embed() function
Creating embeddings with seeds
Vector operations and similarity
Batch processing
Advanced use cases
Performance optimization

Command Line Interface¶

CLI Reference - Complete command-line documentation
Text generation commands
Embedding operations
Model management
Daemon control
Index operations
Real-world examples
Vector Operations - Vector math and operations
Similarity calculations
Distance metrics
Vector arithmetic
Search operations

API Signatures¶

Text Generation¶

def generate(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    return_logprobs: bool = False,
    eos_string: str = "[EOS]",
    model: Optional[str] = None,
    model_repo: Optional[str] = None,
    model_filename: Optional[str] = None,
    size: Optional[str] = None,
    seed: int = DEFAULT_SEED,
    schema: Optional[Union[Dict[str, Any], type, object]] = None,
    regex: Optional[str] = None,
    choices: Optional[List[str]] = None,
    response_format: Optional[Dict[str, Any]] = None,
) -> Union[str, Tuple[str, Optional[Dict[str, Any]]]]

Streaming Generation¶

def generate_iter(
    prompt: str,
    max_new_tokens: Optional[int] = None,
    eos_string: str = "[EOS]",
    include_logprobs: bool = False,
    model: Optional[str] = None,
    model_repo: Optional[str] = None,
    model_filename: Optional[str] = None,
    size: Optional[str] = None,
    seed: int = DEFAULT_SEED
) -> Iterator[Union[str, Tuple[str, Optional[Dict[str, Any]]]]]

Embeddings¶

def embed(
    text_input: Union[str, List[str]], 
    seed: int = DEFAULT_SEED
) -> np.ndarray

Utilities¶

def preload_models(verbose: bool = False) -> None
def get_model_cache_dir() -> Path
def get_cache_manager() -> CacheManager

Constants¶

Core Constants¶

steadytext.DEFAULT_SEED = 42              # Default seed for all operations
steadytext.GENERATION_MAX_NEW_TOKENS = 512  # Default max tokens for generation
steadytext.EMBEDDING_DIMENSION = 1024      # Embedding vector dimensions

Model Constants¶

# Current models (v2.0.0+)
DEFAULT_GENERATION_MODEL = "gemma-3n-2b"
DEFAULT_EMBEDDING_MODEL = "qwen3-embedding"
DEFAULT_RERANKING_MODEL = "qwen3-reranker-4b"

# Model sizes
MODEL_SIZES = {
    "small": "gemma-3n-2b",  # 2.0GB
    "large": "gemma-3n-4b"   # 4.2GB
}

Environment Variables¶

Cache Configuration¶

# Cache backend selection (sqlite, d1, memory)
STEADYTEXT_CACHE_BACKEND=sqlite  # Default

# Generation cache settings
STEADYTEXT_GENERATION_CACHE_CAPACITY=256      # Maximum cache entries
STEADYTEXT_GENERATION_CACHE_MAX_SIZE_MB=50.0  # Maximum cache file size

# Embedding cache settings
STEADYTEXT_EMBEDDING_CACHE_CAPACITY=512       # Maximum cache entries
STEADYTEXT_EMBEDDING_CACHE_MAX_SIZE_MB=100.0  # Maximum cache file size

# D1 backend configuration (when CACHE_BACKEND=d1)
STEADYTEXT_D1_API_URL=https://your-worker.workers.dev
STEADYTEXT_D1_API_KEY=your-api-key
STEADYTEXT_D1_BATCH_SIZE=50

# Disable caching entirely (not recommended)
STEADYTEXT_DISABLE_CACHE=1

For detailed cache backend documentation, see Cache Backends Guide.

Daemon Configuration¶

# Disable daemon usage globally
STEADYTEXT_DISABLE_DAEMON=1

# Custom daemon settings
STEADYTEXT_DAEMON_HOST=127.0.0.1
STEADYTEXT_DAEMON_PORT=5557

Development/Testing¶

# Allow model downloads (for testing)
STEADYTEXT_ALLOW_MODEL_DOWNLOADS=true

# Use fallback models for compatibility
STEADYTEXT_USE_FALLBACK_MODEL=true

# Set default seed globally
STEADYTEXT_DEFAULT_SEED=42

# Python hash seed (for reproducibility)
PYTHONHASHSEED=0

Model Paths¶

# Custom model cache directory
STEADYTEXT_MODEL_DIR=/path/to/models

# Skip model verification
STEADYTEXT_SKIP_MODEL_VERIFICATION=1

Error Handling¶

SteadyText uses a "never fail" design philosophy with v2.1.0+ updates:

Deterministic Behavior

Text generation: Returns None when models unavailable (v2.1.0+)
Embeddings: Returns None when models unavailable (v2.1.0+)
Streaming: Returns empty iterator when models unavailable
No exceptions: Functions handle errors gracefully
Seed support: All fallbacks respect custom seeds

Breaking Changes in v2.1.0

The deterministic fallback behavior has been disabled. Functions now return None instead of generating fallback text/embeddings when models are unavailable.

Thread Safety¶

All functions are thread-safe and support concurrent usage:

Singleton models: Models loaded once with thread-safe locks
Thread-safe caches: All caches use proper locking mechanisms
Concurrent calls: Multiple threads can call functions simultaneously
Daemon mode: ZeroMQ handles concurrent requests automatically

Example of concurrent usage:

import concurrent.futures
import steadytext

def process_prompt(prompt, seed):
    return steadytext.generate(prompt, seed=seed)

# Process multiple prompts concurrently
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    prompts = ["prompt1", "prompt2", "prompt3", "prompt4"]
    seeds = [100, 200, 300, 400]

    futures = [executor.submit(process_prompt, p, s) 
               for p, s in zip(prompts, seeds)]

    results = [f.result() for f in futures]

Performance Notes¶

Startup Performance¶

First call: Downloads models if needed (~2.6GB total)
Model loading: 2-3 seconds on first use
Daemon mode: Eliminates model loading overhead
Preloading: Use preload_models() to load at startup

Runtime Performance¶

Generation speed: ~50-100 tokens/second
Embedding speed: ~100-500 embeddings/second
Cache hits: <0.01 seconds for cached results
Memory usage: ~2.6GB for all models loaded

Optimization Tips¶

Use daemon mode for production deployments
Preload models at application startup
Warm up cache with common prompts
Use consistent seeds for better cache efficiency
Batch operations when possible
Monitor cache stats to tune capacity

Version Compatibility¶

Model Versions¶

Each major version uses fixed models:

v2.0.0+: Gemma-3n models (generation), Qwen3 (embeddings)
v1.x: Older model versions (deprecated)

API Stability¶

Stable APIs: generate(), embed(), generate_iter()
Seed parameter: Added in all APIs for v2.0.0+
Daemon mode: Stable since v1.3.0
Cache system: Centralized since v1.3.3

Best Practices¶

Production Usage

Always specify seeds for reproducible results
Use daemon mode for better performance
Configure caches based on usage patterns
Handle None returns appropriately (v2.1.0+)
Monitor performance with cache statistics
Test with models unavailable to ensure robustness
Use environment variables for configuration
Implement proper error handling for production
Batch similar operations for efficiency
Document your seed choices for reproducibility