Prompt
Max tokens
Temperature · 0.7
Top-p · 0.90
Output
Output will appear here once a model is trained and loaded.
0%
Training in Progress — Space 1
⏱ ETA: calculating…
⚡ — steps/min
📉 PPL: —
Progress
0%
0 / — steps
Train Loss
—
val: —
Time Elapsed
—
remaining: —
Learning Rate
—
— steps/min
Configuration
This space's number (1–8). Set once in Settings.
Merge every 2,000–5,000 steps for best quality.
Path to merged.pt from a previous merge round.
Live Log
Not running
Waiting for log data…
Live Status
No spaces training yet. Start training first.
Loss Chart
Checkpoints
None yet.
Merge Weights
How it works
1. All 8 spaces finish a training round and save latest_space{N}.pt.
2. Click Merge below. Weights are averaged, weighted by how many steps each space trained.
3. Download merged.pt and upload it to all 8 spaces.
4. Start the next round with that merged checkpoint. The model compounds every round.
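The step-weighted averaging described in the steps above can be sketched as follows. This is a minimal illustration, not the app's actual merge code: `merge_weighted` is a hypothetical name, and plain Python lists stand in for the tensors a real latest_space{N}.pt checkpoint would hold.

```python
from typing import Dict, List

# Parameter name -> flat list of values (an assumed stand-in for a tensor).
Weights = Dict[str, List[float]]


def merge_weighted(checkpoints: List[Weights], steps: List[int]) -> Weights:
    """Average parameters across spaces, weighting each space by its step count."""
    if not checkpoints or len(checkpoints) != len(steps):
        raise ValueError("need exactly one step count per checkpoint")
    total = sum(steps)
    if total <= 0:
        raise ValueError("at least one space must have trained")

    merged: Weights = {}
    for name, reference in checkpoints[0].items():
        acc = [0.0] * len(reference)
        for ckpt, n in zip(checkpoints, steps):
            share = n / total  # this space's fraction of all trained steps
            for i, value in enumerate(ckpt[name]):
                acc[i] += share * value
        merged[name] = acc
    return merged


if __name__ == "__main__":
    space_a = {"layer.weight": [1.0, 2.0]}
    space_b = {"layer.weight": [3.0, 6.0]}
    # Space B trained three times as long, so it gets three times the weight.
    print(merge_weighted([space_a, space_b], steps=[1000, 3000]))
```

A space that trained for more steps pulls the merged parameters further toward its own values, which is why a space that stalled early contributes little to the result.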
Run Merge
Merge every 2,000–5,000 steps for best results.
Waiting 50,000 steps before merging lets the spaces drift apart, so the averaged result will be weaker than with frequent, smaller merges.
Use --max_steps 5000 and run many rounds.
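To make the arithmetic concrete, the sketch below splits a total step budget into rounds of at most 5,000 steps, with a merge after each round. The helper name `plan_rounds` is hypothetical and only illustrates the schedule; it is not part of the app.

```python
from typing import List


def plan_rounds(total_steps: int, max_steps: int = 5000) -> List[int]:
    """Split a step budget into rounds of at most max_steps, one merge per round."""
    if total_steps < 0 or max_steps <= 0:
        raise ValueError("total_steps must be >= 0 and max_steps must be > 0")
    full_rounds, remainder = divmod(total_steps, max_steps)
    return [max_steps] * full_rounds + ([remainder] if remainder else [])


if __name__ == "__main__":
    rounds = plan_rounds(50_000)
    # A 50,000-step budget becomes 10 rounds of 5,000 steps.
    print(len(rounds), "rounds:", rounds)
```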
Training Datasets
19 datasets across 3 pillars — Reasoning (35%) · Coding (40%) · English (25%)
🧠 Logical Reasoning
35%
7 datasets
💻 Coding
40%
8 datasets
📝 English
25%
4 datasets
🧠 Logical Reasoning — 35%
Chain-of-Thought Math
openai/gsm8k
8.5k grade school math problems with step-by-step reasoning chains. Trains the model to break problems down before solving.
Competition Math
lighteval/MATH
12.5k hard competition problems (AMC, AIME, MATH). Forces multi-step logical deduction with full solution writeups.
Word Problems
microsoft/orca-math-word-problems-200k
200k math word problems with detailed reasoning. Broad coverage of arithmetic, algebra, and applied math scenarios.
Science MCQ
allenai/ai2_arc (Challenge)
Science questions requiring genuine understanding. Tests generalised reasoning beyond arithmetic — eliminates surface-level pattern matching.
Commonsense
Rowan/hellaswag
70k activity descriptions requiring commonsense completion. Critical for understanding intent and context in user prompts.
NEW · Logic Puzzles
KbsdJames/Omni-MATH
Olympiad-level math with worked solutions. Expands coverage of abstract and proof-based reasoning far beyond GSM8k.
NEW · Structured Reasoning
nvidia/OpenMathInstruct-2
14M math problem-solution pairs generated with step-level verification. The broadest coverage of reasoning styles in this dataset mix.
💻 Coding — 40%
Multi-Language Code
bigcode/the-stack-smol
Real code across Python, JS, Go, Rust, C++, Java and more. The broadest code foundation in the training mix.
Code Instruction
nampdn-ai/tiny-codes
Natural-language spec → code pairs. Teaches the model to translate requirements into working implementations.
Competitive Programming
deepmind/code_contests
Codeforces, AtCoder, and CodeChef problems with accepted solutions. The hardest algorithmic reasoning in the mix.
Code Alpaca
sahil2801/CodeAlpaca-20k
20k instruction-following triplets: prompt → explanation → code. Teaches structured code generation with explanations.
Code Search (Python)
code_search_net · python
Docstring ↔ function pairs. Teaches the critical mapping between English descriptions and Python implementations.
Code Search (JS)
code_search_net · javascript
Same docstring ↔ function pairs for JavaScript. Expands code-English mapping to the web's most common language.
NEW · SQL & Data
b-mc2/sql-create-context
78k SQL query generation examples. Adds structured data querying — a critical skill gap in code-only models.
NEW · Code Reviews
microsoft/CodeReviewer
Real GitHub PR reviews with code diffs and feedback. Teaches the model to critique and improve existing code, not just write new code.
📝 English & Communication — 25%
Encyclopedia
wikimedia/wikipedia (EN)
Broad factual English with encyclopedic precision. Grounds the model in real-world knowledge and formal writing style.
Technical Q&A
HuggingFaceH4/stack-exchange-preferences
StackExchange technical discussions. The exact register used to ask coding questions and explain solutions — bridges English and code.
Instruction Following
Open-Orca/OpenOrca
GPT-4-level instruction-following pairs. The single most important dataset for making the model communicate well and follow prompts precisely.
Complex Instructions
WizardLM/WizardLM_evol_instruct_70k
70k evolved instructions of increasing complexity. Teaches the model to handle multi-step, nuanced requests without losing context.
Total: 19 datasets · ~50M+ training examples
Datasets marked NEW were added in this v3 update. Weight bars show relative sampling probability (max = 4.0 / 4.0). The interleave is seeded per space, so all 8 spaces see different data shards and the merged model draws on up to 8× as much distinct data as a single space.
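One way to picture the per-space seeded interleave is the sketch below. It samples a pillar for each training example using the 35/40/25 split, with a random generator seeded from the space number. The seed formula and the name `pillar_stream` are assumptions made for illustration; the app's real interleave operates on the individual datasets and may derive its seeds differently.

```python
import random
from itertools import islice
from typing import Iterator

# Pillar sampling probabilities as listed on this page.
PILLAR_WEIGHTS = {"reasoning": 0.35, "coding": 0.40, "english": 0.25}


def pillar_stream(space_id: int, base_seed: int = 0) -> Iterator[str]:
    """Yield an endless, reproducible sequence of pillar names for one space."""
    if not 1 <= space_id <= 8:
        raise ValueError("space_id must be between 1 and 8")
    # Assumed seed derivation: a distinct seed per space, identical across restarts.
    rng = random.Random(base_seed * 1000 + space_id)
    names = list(PILLAR_WEIGHTS)
    weights = list(PILLAR_WEIGHTS.values())
    while True:
        yield rng.choices(names, weights=weights, k=1)[0]


if __name__ == "__main__":
    for space in (1, 2):
        print(space, list(islice(pillar_stream(space), 10)))
```

Because each space has its own seed, the same space replays the same order after a restart, while different spaces draw different sequences.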
Settings
Space ID
Which of the 8 spaces is this?
Select this space's number. Each of your 8 HuggingFace spaces must have a different number.
1
2
3
4
5
6
7
8
Three ways to set the Space ID:
1. From this UI: select a number above and click Save. Persists across restarts.
2. HuggingFace environment variable (recommended): in each space go to Settings → Variables → New variable and add
CAI_SPACE_NUM = 1 (through 8).
The server reads it on startup automatically.
3. Dockerfile CMD: edit the last line to:
CMD ["sh","-c","CAI_SPACE_NUM=1 python3 server.py"]
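For the two environment-based options above, a startup check along the following lines would read and validate the variable. This is a sketch under assumptions: `read_space_num` is a hypothetical helper, and the actual logic in server.py, including how it falls back to the value saved from the UI, is not shown on this page.

```python
import os
from typing import Mapping, Optional


def read_space_num(env: Optional[Mapping[str, str]] = None) -> Optional[int]:
    """Return the space number from CAI_SPACE_NUM, or None if it is not set."""
    env = os.environ if env is None else env
    raw = env.get("CAI_SPACE_NUM")
    if raw is None or not raw.strip():
        return None  # caller may fall back to a value saved from the UI
    try:
        num = int(raw.strip())
    except ValueError:
        raise ValueError(f"CAI_SPACE_NUM must be an integer from 1 to 8, got {raw!r}") from None
    if not 1 <= num <= 8:
        raise ValueError(f"CAI_SPACE_NUM must be between 1 and 8, got {num}")
    return num


if __name__ == "__main__":
    print("Space ID from environment:", read_space_num())
```

Rejecting out-of-range values at startup helps avoid two spaces silently sharing a number, which would give them the same data shards.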
Current Configuration
Loading…