Symbolic piano continuation

GPT-2 Piano Generation on Apple Silicon

A small machine-learning music project that trains a GPT-2-style model to continue symbolic piano MIDI.

  • PyTorch
  • MIDI
  • GPT-2
  • Miditok
  • Apple Silicon

Overview

What this project is

A GPT-2-style piano model

GPT-2 Piano MPS 12k is a symbolic music generator. Instead of natural-language prompts, it reads short piano MIDI prompts, turns them into token sequences, and generates a continuation.

Built as a training workflow

The repo includes scripts for splitting MIDI data, augmenting training files, tokenizing with Miditok REMI, training the model, generating samples, and comparing checkpoints.

Designed for a local machine

Training targets Apple Silicon MPS. Large artifacts such as datasets and checkpoints stay local, so the public repo keeps the code and one tracked example MIDI.

Stack: PyTorch, Hugging Face Transformers, Miditok (REMI), pretty_midi, NumPy, Apple Silicon (MPS).
Not text-to-music

This project does not take written prompts. It continues short MIDI or token prompts, so the input is musical context rather than natural language.

Demo

The model generates a continuation from a short piano MIDI prompt. This repo includes one tracked sample output at examples/generated-example.mid.

No MP3 file is included, so this page links to the MIDI directly. Browser MIDI playback may depend on the user's system or installed software.

Pipeline

How it works

The workflow keeps the music symbolic from start to finish: MIDI files become REMI tokens, the transformer learns token continuations, and generated tokens are written back to MIDI.

01. Raw MIDI: start with piano MIDI files stored locally (data/raw/source_midis).
02. Split data: create train, validation, and test splits (prepare_12k_split.py).
03. Transpose training files: add bounded piano transpositions for augmentation (augment_train_transpose.py).
04. Tokenize with REMI: convert MIDI into compact token arrays (tokenize_12k_augmented.py).
05. Train GPT-2 model: train a transformer on 2048-token windows (train_gpt2_piano_12k.py).
06. Compare checkpoints: run prompt profiles over saved epochs (generation_pipeline.py).
07. Generate MIDI: save generated continuations as MIDI files (generate_piano_sample.py).
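The bounded transposition in step 03 can be sketched in plain Python. This is a minimal sketch over (pitch, start, duration, velocity) tuples; the actual augment_train_transpose.py operates on MIDI files, and its shift bound may differ.

```python
# Sketch of bounded-transposition augmentation (step 03).
# Notes are (pitch, start, duration, velocity) tuples.

PIANO_MIN, PIANO_MAX = 21, 108  # A0..C8 on a standard 88-key piano

def transpose(notes, semitones):
    """Shift every pitch; return None if any note leaves the keyboard."""
    shifted = [(p + semitones, s, d, v) for p, s, d, v in notes]
    if all(PIANO_MIN <= p <= PIANO_MAX for p, _, _, _ in shifted):
        return shifted
    return None  # this transposition would fall off the piano

def augment(notes, max_shift=3):
    """Return the original plus every valid transposition within the bound."""
    out = [notes]
    for k in range(-max_shift, max_shift + 1):
        if k == 0:
            continue
        t = transpose(notes, k)
        if t is not None:
            out.append(t)
    return out
```

Keeping shifts bounded preserves the piano register of the training data while still multiplying the number of training files.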

Model

The current run uses a compact GPT-2-style configuration for symbolic music tokens. It is prompt-conditioned by MIDI or token prefixes, not by text.

  • 12 layers
  • 12 attention heads
  • 2048-token context
  • 423-token vocabulary
  • 768 embedding size
  • MPS (Apple Silicon) training
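From these sizes, a rough parameter count can be worked out by hand. This is a back-of-the-envelope sketch assuming the standard GPT-2 layout (learned position embeddings, tied LM head), not a figure reported by the training script.

```python
# Rough GPT-2 parameter count from the configuration listed above.
vocab, ctx, d, layers = 423, 2048, 768, 12

embeddings = vocab * d + ctx * d                  # token + position tables
attn = d * 3 * d + 3 * d + d * d + d              # qkv + output projection (w + b)
mlp = d * 4 * d + 4 * d + 4 * d * d + d           # two linear layers (w + b)
norms = 4 * d                                     # two layernorms (w + b)
per_layer = attn + mlp + norms
total = embeddings + layers * per_layer + 2 * d   # plus final layernorm

print(f"~{total / 1e6:.1f}M parameters")
```

The tiny 423-token vocabulary keeps the embedding table small, so nearly all of the capacity sits in the 12 transformer blocks.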

MIDI prompt → REMI tokens (Bar, Position, Pitch, Velocity, Duration) → GPT-2 transformer (embedding, 12 layers, LM head) → generated tokens (Position, Pitch, Duration, Bar) → MIDI output

Checkpoint comparison

Comparing saved runs

The repo includes a small batch script for comparing checkpoints by generating MIDI continuations from the same prompt profiles. Checkpoint files are local only and are not included in the repo.

epoch_02: local best, val_loss 1.429826. Saved as checkpoints/best on the local training machine.

epoch_04: later comparison, val_loss 1.531063. Used for side-by-side continuation checks against the earlier checkpoint.

Batch generation: generation_pipeline.py writes generated MIDI files, token arrays, metadata JSON, and a pipeline manifest.
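The shape of such a batch run can be sketched as a loop over checkpoints and prompt profiles. The checkpoint names come from the section above; the profile names and manifest layout here are hypothetical, not generation_pipeline.py's actual format.

```python
# Illustrative shape of a batch comparison run: every checkpoint is
# sampled with every prompt profile, and the results are recorded
# in a manifest for side-by-side listening.
import json
from itertools import product

checkpoints = ["epoch_02", "epoch_04"]    # local-only checkpoint dirs
profiles = ["calm_intro", "dense_runs"]   # hypothetical prompt profiles

def run_profile(checkpoint, profile):
    """Placeholder for generate-and-save; returns the output path."""
    return f"outputs/{checkpoint}/{profile}.mid"

manifest = [
    {"checkpoint": c, "profile": p, "midi": run_profile(c, p)}
    for c, p in product(checkpoints, profiles)
]
print(json.dumps(manifest, indent=2))
```

Because every checkpoint sees the same prompts, differences in the output MIDI can be attributed to the checkpoint rather than the prompt.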

Local use

Run it locally

The code is set up for local experiments. Training and generation need local MIDI data and local checkpoints.

Smoke check

python3 -m pip install -r requirements-smoke.txt
python3 -m unittest tests.test_dependency_smoke

Training pipeline

python3 scripts/prepare_12k_split.py
python3 scripts/augment_train_transpose.py
python3 scripts/tokenize_12k_augmented.py
python3 scripts/train_gpt2_piano_12k.py --train-from-scratch

Generate after training

python3 scripts/generate_piano_sample.py \
  --checkpoint checkpoints/best

Side experiment

Melody Intensity Editor

This side experiment explores controlling the density and energy of generated phrases with an intensity value from 0.0 to 1.0.

Example at intensity 0.45 (between soft/sparse and loud/dense): a balanced phrase with 5 preview notes active and a velocity target of 64/127.
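A mapping from the intensity value to concrete generation controls might look like this. The velocity range and density curve below are illustrative assumptions, not the editor's actual mapping.

```python
# Sketch: map an intensity value in [0, 1] to generation controls.
# The endpoint values (velocity 32..112, 2..12 notes per bar) are
# illustrative assumptions chosen for this example.

def intensity_controls(intensity):
    """Linearly interpolate velocity and note density from intensity."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    velocity = round(32 + intensity * (112 - 32))   # soft ~32 .. loud ~112
    notes_per_bar = round(2 + intensity * 10)       # sparse .. dense
    return {"velocity": velocity, "notes_per_bar": notes_per_bar}

print(intensity_controls(0.45))
```

A single scalar like this is easy to expose as a slider, at the cost of coupling loudness and density to one control.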

Notes

Limitations

Local artifacts

Datasets and checkpoints are not included in the repo.

Prompt sensitivity

Output quality depends on the MIDI prompt and the checkpoint being sampled.

MIDI continuation

This is MIDI continuation, not text-to-music generation from written prompts.

Experimental control

The Melody Intensity Editor is still a side experiment.

Next step

Train. Compare. Generate.