Coding AI v3

Prompt

Max tokens

Temperature · 0.7

Top-p · 0.90

Output

Output will appear here once a model is trained and loaded.

Training in Progress — Space 1

⏱ ETA: calculating…

⚡ — steps/min

📉 PPL: —

Progress

0 / — steps

Train Loss

—

val: —

Time Elapsed

—

remaining: —

Learning Rate

—

— steps/min

Configuration

Space ID

This space's number (1–8). Set once in Settings.

Model size

Steps per round

Merge every 2,000–5,000 steps for best quality.

Resume from merged (optional)

Path to merged.pt from a previous merge round.

Live Log

Not running

Waiting for log data…

Live Status

No spaces training yet. Start training first.

Loss Chart

Checkpoints

None yet.

Merge Weights

How it works

1. All 8 spaces finish a training round and save latest_space{N}.pt.

2. Click Run Merge below. Weights are averaged, weighted by how many steps each space trained.

3. Download merged.pt and upload it to all 8 spaces.

4. Start the next round with that merged checkpoint. Each round builds on the previous merge.
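The step-weighted average described above can be sketched as follows. This is a minimal illustration, not the app's actual merge code: the function name merge_weighted and the flat name-to-value checkpoint layout are assumptions, and plain floats stand in for the tensors a real latest_space{N}.pt would hold.

```python
def merge_weighted(state_dicts, steps):
    """Average parameter dicts, weighting each by the steps its space trained.

    Hypothetical helper: values can be floats (as here) or any type that
    supports scalar multiplication and addition, such as tensors.
    """
    if not state_dicts or len(state_dicts) != len(steps):
        raise ValueError("need one step count per checkpoint")
    total = sum(steps)
    if total <= 0:
        raise ValueError("total step count must be positive")

    merged = {}
    for name in state_dicts[0]:
        acc = None
        for sd, n in zip(state_dicts, steps):
            term = sd[name] * (n / total)  # this space's share of the average
            acc = term if acc is None else acc + term
        merged[name] = acc
    return merged


# Two toy "spaces": the second trained three times as many steps.
space1 = {"w": 1.0, "b": 0.0}
space2 = {"w": 3.0, "b": 4.0}
merged = merge_weighted([space1, space2], steps=[1000, 3000])
print(merged)  # {'w': 2.5, 'b': 3.0}
```

A space that trained more steps pulls the merged value toward its own weights, which is why the second space dominates in this example.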

Run Merge

Merge every 2,000–5,000 steps for best results. Waiting 50,000 steps before merging causes the spaces to drift apart — the averaged result will be weaker than frequent, smaller merges. Use --max_steps 5000 and run many rounds.

Training Datasets

19 datasets across 3 pillars — Reasoning (35%) · Coding (40%) · English (25%)

🧠 Logical Reasoning

35%

7 datasets

💻 Coding

40%

8 datasets

📝 English

25%

4 datasets

🧠 Logical Reasoning — 35%

Chain-of-Thought Math

openai/gsm8k

8.5k grade school math problems with step-by-step reasoning chains. Trains the model to break problems down before solving.

Competition Math

lighteval/MATH

12.5k hard competition problems drawn from contests such as AMC 10, AMC 12, and AIME. Forces multi-step logical deduction with full solution writeups.

Word Problems

microsoft/orca-math-word-problems-200k

200k math word problems with detailed reasoning. Broad coverage of arithmetic, algebra, and applied math scenarios.

Science MCQ

allenai/ai2_arc (Challenge)

Science questions requiring genuine understanding. Tests generalised reasoning beyond arithmetic and discourages surface-level pattern matching.

Commonsense

Rowan/hellaswag

70k activity descriptions requiring commonsense completion. Critical for understanding intent and context in user prompts.

NEW · Logic Puzzles

KbsdJames/Omni-MATH

Olympiad-level math with worked solutions. Expands coverage of abstract and proof-based reasoning far beyond GSM8k.

NEW · Structured Reasoning

nvidia/OpenMathInstruct-2

14M math problem-solution pairs generated with step-level verification. Best coverage of reasoning diversity in the training mix.

💻 Coding — 40%

Multi-Language Code

bigcode/the-stack-smol

Real code across Python, JS, Go, Rust, C++, Java and more. The broadest code foundation in the training mix.

Code Instruction

nampdn-ai/tiny-codes

Natural-language spec → code pairs. Teaches the model to translate requirements into working implementations.

Competitive Programming

deepmind/code_contests

Problems from Codeforces, AtCoder, CodeChef and similar contest sites, with accepted solutions. Hardest algorithmic reasoning in the mix.

Code Alpaca

sahil2801/CodeAlpaca-20k

20k instruction-following examples, each with an instruction, an optional input, and an output. Teaches structured code generation from natural-language instructions.

Code Search (Python)

code_search_net · python

Docstring ↔ function pairs. Teaches the critical mapping between English descriptions and Python implementations.

Code Search (JS)

code_search_net · javascript

Same docstring ↔ function pairs for JavaScript. Expands code-English mapping to the web's most common language.

NEW · SQL & Data

b-mc2/sql-create-context

78k SQL query generation examples. Adds structured data querying — a critical skill gap in code-only models.

NEW · Code Reviews

microsoft/CodeReviewer

Real GitHub PR reviews with code diffs and feedback. Teaches the model to critique and improve existing code, not just write new code.

📝 English & Communication — 25%

Encyclopedia

wikimedia/wikipedia (EN)

Broad factual English with encyclopedic precision. Grounds the model in real-world knowledge and formal writing style.

Technical Q&A

HuggingFaceH4/stack-exchange-preferences

StackExchange technical discussions. The exact register used to ask coding questions and explain solutions — bridges English and code.

Instruction Following

Open-Orca/OpenOrca

Instruction-following pairs with GPT-4 and GPT-3.5 completions. The single most important dataset for making the model communicate well and follow prompts precisely.

Complex Instructions

WizardLM/WizardLM_evol_instruct_70k

70k evolved instructions of increasing complexity. Teaches the model to handle multi-step, nuanced requests without losing context.

Total: 19 datasets · ~50M+ training examples

Datasets marked NEW were added in the v3 update. Weight bars show relative sampling probability (max = 4.0 / 4.0). The interleave is seeded per-space so all 8 spaces see different data shards. After merging, the model has been exposed to up to roughly 8× as much distinct data as a single space would see.
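The per-space seeding can be pictured with a small sketch using only the standard library. It samples at the pillar level with the 35/40/25 shares listed above. The function name sample_pillars, the base_seed parameter, and the seed formula are assumptions for illustration, and the real trainer may interleave individual datasets differently.

```python
import random

# Pillar shares from the dataset summary; the keys are illustrative labels.
PILLAR_WEIGHTS = {"reasoning": 0.35, "coding": 0.40, "english": 0.25}


def sample_pillars(space_id, n, base_seed=0):
    """Draw n pillar names for one space using a space-specific seed.

    base_seed and the seed formula are assumptions for this sketch.
    """
    if not 1 <= space_id <= 8:
        raise ValueError("space_id must be between 1 and 8")
    rng = random.Random(base_seed * 8 + space_id)  # distinct seed per space
    names = list(PILLAR_WEIGHTS)
    weights = list(PILLAR_WEIGHTS.values())
    return rng.choices(names, weights=weights, k=n)


# Same space, same seed: the draw order is reproducible across restarts.
assert sample_pillars(1, 1000) == sample_pillars(1, 1000)

# Each space gets its own seed, so the sequences are drawn independently.
for space_id in (1, 2):
    print(space_id, sample_pillars(space_id, 5))
```

Keeping the seed a pure function of the space number means a restarted space resumes the same sampling stream, while the spaces still draw independently of one another.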

Settings

Space ID

Which of the 8 spaces is this?

Select this space's number. Each of your 8 HuggingFace spaces must have a different number.

Three ways to set the Space ID:

1. From this UI — select a number above and click Save. Persists across restarts.

2. HuggingFace environment variable (recommended) — in each space go to Settings → Variables → New variable and add CAI_SPACE_NUM = 1 (through 8). The server reads it on startup automatically.

3. Dockerfile CMD — edit the last line to: CMD ["sh","-c","CAI_SPACE_NUM=1 python3 server.py"]
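For options 2 and 3, the server has to read CAI_SPACE_NUM on startup. A minimal sketch of such a lookup is shown below. The helper name read_space_num, the range check, and the fallback to a default are assumptions, since the actual logic in server.py is not shown here.

```python
import os


def read_space_num(env=None, default=None):
    """Return the space number from CAI_SPACE_NUM, validated to 1-8.

    Hypothetical helper; returning `default` when the variable is unset
    is an assumption about how a UI-saved value might take over.
    """
    env = os.environ if env is None else env
    raw = env.get("CAI_SPACE_NUM")
    if raw is None or not raw.strip():
        return default
    try:
        num = int(raw.strip())
    except ValueError:
        raise ValueError(f"CAI_SPACE_NUM must be an integer, got {raw!r}")
    if not 1 <= num <= 8:
        raise ValueError(f"CAI_SPACE_NUM must be between 1 and 8, got {num}")
    return num


print(read_space_num({"CAI_SPACE_NUM": "3"}))  # 3
print(read_space_num({}))                      # None
```

Failing loudly on an out-of-range value helps catch two spaces accidentally sharing a number before a training round starts.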

Current Configuration

Loading…