Core Concepts¶
Understanding the fundamental principles behind SteadyText's deterministic AI.
Determinism in AI¶
What is Deterministic AI?¶
Traditional AI models are non-deterministic: they produce different outputs for the same input due to:
- Random sampling during text generation
- Floating-point arithmetic variations
- Model initialization differences
- Hardware and software variations
SteadyText makes AI deterministic: identical inputs always produce identical outputs, like a hash function.
How SteadyText Achieves Determinism¶
- Fixed Seeds: All randomness uses a consistent seed (default: 42)
- Greedy Decoding: Always selects the highest probability token
- Quantized Models: 8-bit quantization ensures numerical consistency
- Aggressive Caching: Deterministic outputs enable perfect caching
# Traditional AI - unpredictable
result1 = ai_generate("Hello") # "Hi there!"
result2 = ai_generate("Hello") # "Hello! How can I help?"
assert result1 == result2 # FAILS!
# SteadyText - deterministic
result1 = steadytext.generate("Hello") # Always same output
result2 = steadytext.generate("Hello") # Exact same output
assert result1 == result2 # Always passes!
Seeds and Reproducibility¶
Understanding Seeds¶
Seeds control the random number generation in AI models. SteadyText uses seeds to ensure reproducibility:
# Same seed = same output
text1 = steadytext.generate("Write a poem", seed=123)
text2 = steadytext.generate("Write a poem", seed=123)
assert text1 == text2 # Always true
# Different seed = different output
text3 = steadytext.generate("Write a poem", seed=456)
assert text1 != text3 # Different results
When to Use Custom Seeds¶
- A/B Testing: Generate variations with different seeds
- Research: Document seeds for reproducible experiments
- Testing: Use consistent seeds across test runs
- Content Variation: Create multiple versions of content
# Generate 3 variations for A/B testing
variations = []
for seed in [100, 200, 300]:
    variant = steadytext.generate("Product description", seed=seed)
    variations.append(variant)
Temperature Parameter¶
What is Temperature?¶
Temperature controls the randomness in text generation:
- Temperature = 0.0 (default): Fully deterministic; always picks the highest-probability token
- Temperature = 0.1-0.5: Low randomness, mostly coherent
- Temperature = 0.6-1.0: Balanced creativity
- Temperature > 1.0: High creativity, more unpredictable
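The effect of temperature can be sketched with a toy softmax. This is a generic illustration of how temperature reshapes a token distribution, not SteadyText's internal sampler; `apply_temperature` is a hypothetical helper:

```python
import numpy as np

def apply_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Turn raw logits into a probability distribution at a given temperature."""
    if temperature == 0.0:
        # Greedy decoding: all probability mass on the argmax token.
        p = np.zeros_like(logits)
        p[np.argmax(logits)] = 1.0
        return p
    z = logits / temperature          # low temp sharpens, high temp flattens
    e = np.exp(z - z.max())           # subtract max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])
greedy = apply_temperature(logits, 0.0)   # mass entirely on the top token
warm = apply_temperature(logits, 0.5)     # sharp, near-greedy distribution
hot = apply_temperature(logits, 2.0)      # flatter, more random-looking
```

Note that even at high temperature, sampling from this distribution with a fixed seed is still reproducible, which is why seed + temperature pairs give identical output.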
Temperature with Seeds¶
Even with temperature > 0, the same seed + temperature combination produces identical output:
# Same seed + temperature = reproducible randomness
creative1 = steadytext.generate("Story", seed=42, temperature=0.8)
creative2 = steadytext.generate("Story", seed=42, temperature=0.8)
assert creative1 == creative2 # Still deterministic!
Caching System¶
How Caching Works¶
SteadyText's caching leverages determinism for perfect cache hits:
- Cache Key: Generated from prompt + seed + parameters
- Frecency Algorithm: Balances frequency and recency
- Persistent Storage: SQLite database for durability
- Shared Cache: Daemon and direct access use same cache
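The key idea is that determinism makes prompt + seed + parameters a complete description of the output, so they can be hashed into a cache key. A minimal sketch of that idea, assuming a simple hash-based key (the real key format is internal to SteadyText; `cache_key` here is a hypothetical helper):

```python
import hashlib
import json

def cache_key(prompt: str, seed: int = 42, **params) -> str:
    # Serialize all inputs canonically so equal inputs always hash the same.
    payload = json.dumps(
        {"prompt": prompt, "seed": seed, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

k1 = cache_key("Hello", seed=42)
k2 = cache_key("Hello", seed=42)  # identical inputs -> identical key -> cache hit
k3 = cache_key("Hello", seed=7)   # different seed -> different key
```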
Cache Backends¶
- SQLite (default): Thread-safe local storage
- D1: Cloudflare's distributed SQLite
- Memory: In-memory for testing
Cache Configuration¶
# Generation cache
export STEADYTEXT_GENERATION_CACHE_CAPACITY=512
export STEADYTEXT_GENERATION_CACHE_MAX_SIZE_MB=100
# Embedding cache
export STEADYTEXT_EMBEDDING_CACHE_CAPACITY=1024
export STEADYTEXT_EMBEDDING_CACHE_MAX_SIZE_MB=200
Model Architecture¶
Generation Models¶
SteadyText uses Qwen3 models for text generation:
- Small (default): Qwen3-4B - Fast, efficient
- Large: Qwen3-30B - Higher quality for complex tasks
- Mini: Gemma-270M - For CI/testing only
Embedding Model¶
- Jina v4: State-of-the-art retrieval embeddings
- 2048 dimensions: Truncated to 1024 for compatibility
- L2 normalized: Unit vectors for cosine similarity
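Because the embeddings are L2-normalized, cosine similarity reduces to a plain dot product. A small numpy sketch of why (generic math, not SteadyText-specific code):

```python
import numpy as np

def l2_normalize(v: np.ndarray) -> np.ndarray:
    """Scale a vector to unit length; cosine similarity then equals the dot product."""
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

a = l2_normalize(np.array([3.0, 4.0]))  # -> (0.6, 0.8), unit length
b = l2_normalize(np.array([4.0, 3.0]))  # -> (0.8, 0.6), unit length
cos = float(np.dot(a, b))               # dot product of unit vectors == cosine
```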
Reranking Model¶
- Qwen3-Reranker-4B: Binary relevance scoring
- Task-aware: Customizable with task descriptions
- Fallback: Simple word overlap when unavailable
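A plausible sketch of what a word-overlap fallback looks like. The exact scoring SteadyText uses is not specified here; `overlap_score` is a hypothetical illustration of the general technique:

```python
def overlap_score(query: str, doc: str) -> float:
    """Fraction of query words that also appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

docs = ["python is deterministic", "bananas are yellow"]
# Rank documents by overlap with the query, highest first.
ranked = sorted(
    docs,
    key=lambda d: overlap_score("deterministic python", d),
    reverse=True,
)
```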
Daemon Architecture¶
What is the Daemon?¶
The daemon is a persistent background process that keeps models loaded in memory.
Benefits¶
- 160x faster first request: No model loading time
- Lower memory usage: Single model instance
- Shared cache: Consistent across all clients
Usage¶
# Start daemon (optional but recommended)
st daemon start
# Python automatically uses daemon if available
text = steadytext.generate("Hello") # Fast with daemon
Local-First Design¶
Why Local Models?¶
SteadyText runs entirely on your infrastructure:
- No API keys: Self-contained system
- No network calls: Everything runs locally
- Data privacy: Your data never leaves your servers
- Predictable costs: No per-token charges
- Offline capable: Works without internet
Trade-offs¶
Local models provide:
- ✅ Perfect determinism
- ✅ Data privacy
- ✅ Predictable performance
- ❌ Smaller model sizes than cloud APIs
- ❌ Manual model management
Structured Generation¶
What is Structured Generation?¶
Force model output to conform to specific formats:
- JSON schemas: Generate valid JSON
- Regular expressions: Match patterns
- Multiple choice: Select from options
How It Works¶
SteadyText converts constraints to GBNF grammars that guide generation:
from pydantic import BaseModel
class User(BaseModel):
    name: str
    age: int
# Model output guaranteed to match schema
result = steadytext.generate("Create user Alice age 30", schema=User)
# Output: <json-output>{"name": "Alice", "age": 30}</json-output>
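If you need the parsed JSON out of the tagged output shown above, a small helper can strip the wrapper and validate it. This assumes the `<json-output>...</json-output>` wrapper shown in the example; `extract_json` is illustrative, not part of the SteadyText API:

```python
import json
import re

def extract_json(raw: str) -> dict:
    """Pull the JSON payload out of a <json-output> wrapper, if present."""
    m = re.search(r"<json-output>(.*?)</json-output>", raw, re.DOTALL)
    return json.loads(m.group(1)) if m else json.loads(raw)

data = extract_json('<json-output>{"name": "Alice", "age": 30}</json-output>')
```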
Best Practices¶
For Testing¶
# Use consistent seed in tests
TEST_SEED = 42
def test_feature():
    expected = steadytext.generate("Expected output", seed=TEST_SEED)
    actual = my_function()
    assert actual == expected
For Production¶
# Use caching effectively
result = steadytext.generate(prompt) # First call: generates
result = steadytext.generate(prompt) # Second call: from cache
# Start daemon for performance
# Run: st daemon start
For Development¶
# Use mini models for fast iteration
export STEADYTEXT_USE_MINI_MODELS=true
# Enable model downloads in CI
export STEADYTEXT_ALLOW_MODEL_DOWNLOADS=true
Common Patterns¶
Content Variations¶
# Generate multiple versions
for i in range(3):
    variant = steadytext.generate("Product description", seed=100 + i)
    print(f"Version {i+1}: {variant}")
Reproducible Research¶
# Document seeds for reproducibility
EXPERIMENT_SEED = 12345
results = []
for prompt in experiments:
    result = steadytext.generate(prompt, seed=EXPERIMENT_SEED)
    results.append(result)
# Save seed with results for reproducibility
Semantic Search¶
import numpy as np

# Create embeddings for similarity search
query_vec = steadytext.embed("search query")
doc_vecs = [steadytext.embed(doc) for doc in documents]
# Find most similar
similarities = [np.dot(query_vec, doc_vec) for doc_vec in doc_vecs]
best_match = documents[np.argmax(similarities)]
Next Steps¶
- Quick Start Guide - Get running in minutes
- API Reference - Complete function documentation
- Configuration Reference - All configuration options
- Examples - Real-world usage patterns