Symbolic piano continuation

GPT-2 Piano Generation on Apple Silicon

A small machine-learning music project that trains a GPT-2-style model to continue symbolic piano MIDI.

  • PyTorch
  • MIDI
  • GPT-2
  • Miditok
  • Apple Silicon

Overview

What this project is

A GPT-2-style piano model

GPT-2 Piano MPS 12k is a symbolic music generator. Instead of natural-language prompts, it reads short piano MIDI prompts, turns them into token sequences, and generates a continuation.

Built as a training workflow

The repo includes scripts for splitting MIDI data, augmenting training files, tokenizing with Miditok REMI, training the model, generating samples, and comparing checkpoints.

Designed for a local machine

Training targets Apple Silicon MPS. Large artifacts such as datasets and checkpoints stay local, so the public repo keeps the code and one tracked example MIDI.

Stack: PyTorch, Hugging Face Transformers, Miditok (REMI), pretty_midi, NumPy, Apple Silicon (MPS).
Not text-to-music

This project does not take written prompts. It continues short MIDI or token prompts, so the input is musical context rather than natural language.

Demo

The model generates a continuation from a short piano MIDI prompt. This repo includes one tracked sample output at examples/generated-example.mid.

No MP3 file is included, so this page links to the MIDI directly. Browser MIDI playback may depend on the user's system or installed software.

Pipeline

How it works

The workflow keeps the music symbolic from start to finish: MIDI files become REMI tokens, the transformer learns token continuations, and generated tokens are written back to MIDI.

01. Raw MIDI: start with piano MIDI files stored locally (data/raw/source_midis).
02. Split data: create train, validation, and test splits (prepare_12k_split.py).
03. Transpose training files: add bounded piano transpositions for augmentation (augment_train_transpose.py).
04. Tokenize with REMI: convert MIDI into compact token arrays (tokenize_12k_augmented.py).
05. Train GPT-2 model: train a transformer on 2048-token windows (train_gpt2_piano_12k.py).
06. Compare checkpoints: run prompt profiles over saved epochs (generation_pipeline.py).
07. Generate MIDI: save generated continuations as MIDI files (generate_piano_sample.py).
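The bounded transposition in step 03 can be sketched in plain Python. This is a minimal sketch over (pitch, start, duration, velocity) tuples; the actual augment_train_transpose.py operates on MIDI files, and its shift bound may differ.

```python
# Sketch of bounded-transposition augmentation (step 03).
# Notes are (pitch, start, duration, velocity) tuples.

PIANO_MIN, PIANO_MAX = 21, 108  # A0..C8 on a standard 88-key piano

def transpose(notes, semitones):
    """Shift every pitch; return None if any note leaves the keyboard."""
    shifted = [(p + semitones, s, d, v) for p, s, d, v in notes]
    if all(PIANO_MIN <= p <= PIANO_MAX for p, _, _, _ in shifted):
        return shifted
    return None  # this transposition would fall off the piano

def augment(notes, max_shift=3):
    """Return the original plus every valid transposition within the bound."""
    out = [notes]
    for k in range(-max_shift, max_shift + 1):
        if k == 0:
            continue
        t = transpose(notes, k)
        if t is not None:
            out.append(t)
    return out
```

Keeping shifts bounded preserves the piano register of the training data while still multiplying the number of training files.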

Model

The current run uses a compact GPT-2-style configuration for symbolic music tokens. It is prompt-conditioned by MIDI or token prefixes, not by text.

  • 12 layers
  • 12 attention heads
  • 2048-token context
  • 423-token vocabulary
  • 768 embedding size
  • MPS (Apple Silicon) training
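From these sizes, a rough parameter count can be worked out by hand. This is a back-of-the-envelope sketch assuming the standard GPT-2 layout (learned position embeddings, tied LM head), not a figure reported by the training script.

```python
# Rough GPT-2 parameter count from the configuration listed above.
vocab, ctx, d, layers = 423, 2048, 768, 12

embeddings = vocab * d + ctx * d                  # token + position tables
attn = d * 3 * d + 3 * d + d * d + d              # qkv + output projection (w + b)
mlp = d * 4 * d + 4 * d + 4 * d * d + d           # two linear layers (w + b)
norms = 4 * d                                     # two layernorms (w + b)
per_layer = attn + mlp + norms
total = embeddings + layers * per_layer + 2 * d   # plus final layernorm

print(f"~{total / 1e6:.1f}M parameters")
```

The tiny 423-token vocabulary keeps the embedding table small, so nearly all of the capacity sits in the 12 transformer blocks.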

MIDI prompt → REMI tokens (Bar, Position, Pitch, Velocity, Duration) → GPT-2 transformer (embedding, 12 layers, LM head) → generated tokens (Position, Pitch, Duration, Bar) → MIDI output

Checkpoint comparison

Comparing saved runs

The repo includes a small batch script for comparing checkpoints by generating MIDI continuations from the same prompt profiles. Checkpoint files are local only and are not included in the repo.

epoch_02: local best, val_loss 1.429826. Saved as checkpoints/best on the local training machine.

epoch_04: later comparison, val_loss 1.531063. Used for side-by-side continuation checks against the earlier checkpoint.

Batch generation: generation_pipeline.py writes generated MIDI files, token arrays, metadata JSON, and a pipeline manifest.
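The shape of such a batch run can be sketched as a loop over checkpoints and prompt profiles. The checkpoint names come from the section above; the profile names and manifest layout here are hypothetical, not generation_pipeline.py's actual format.

```python
# Illustrative shape of a batch comparison run: every checkpoint is
# sampled with every prompt profile, and the results are recorded
# in a manifest for side-by-side listening.
import json
from itertools import product

checkpoints = ["epoch_02", "epoch_04"]    # local-only checkpoint dirs
profiles = ["calm_intro", "dense_runs"]   # hypothetical prompt profiles

def run_profile(checkpoint, profile):
    """Placeholder for generate-and-save; returns the output path."""
    return f"outputs/{checkpoint}/{profile}.mid"

manifest = [
    {"checkpoint": c, "profile": p, "midi": run_profile(c, p)}
    for c, p in product(checkpoints, profiles)
]
print(json.dumps(manifest, indent=2))
```

Because every checkpoint sees the same prompts, differences in the output MIDI can be attributed to the checkpoint rather than the prompt.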

Local use

Run it locally

The code is set up for local experiments. Training and generation need local MIDI data and local checkpoints.

Smoke check

python3 -m pip install -r requirements-smoke.txt
python3 -m unittest tests.test_dependency_smoke

Training pipeline

python3 scripts/prepare_12k_split.py
python3 scripts/augment_train_transpose.py
python3 scripts/tokenize_12k_augmented.py
python3 scripts/train_gpt2_piano_12k.py --train-from-scratch

Generate after training

python3 scripts/generate_piano_sample.py \
  --checkpoint checkpoints/best

Side experiment

Melody Intensity Editor

This side experiment explores controlling the density and energy of generated phrases with an intensity value from 0.0 to 1.0.

Example at intensity 0.45 (between soft/sparse and loud/dense): a balanced phrase with 5 preview notes active and a velocity target of 64/127.
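A mapping from the intensity value to concrete generation controls might look like this. The velocity range and density curve below are illustrative assumptions, not the editor's actual mapping.

```python
# Sketch: map an intensity value in [0, 1] to generation controls.
# The endpoint values (velocity 32..112, 2..12 notes per bar) are
# illustrative assumptions chosen for this example.

def intensity_controls(intensity):
    """Linearly interpolate velocity and note density from intensity."""
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    velocity = round(32 + intensity * (112 - 32))   # soft ~32 .. loud ~112
    notes_per_bar = round(2 + intensity * 10)       # sparse .. dense
    return {"velocity": velocity, "notes_per_bar": notes_per_bar}

print(intensity_controls(0.45))
```

A single scalar like this is easy to expose as a slider, at the cost of coupling loudness and density to one control.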

Notes

Limitations

Local artifacts

Datasets and checkpoints are not included in the repo.

Prompt sensitivity

Output quality depends on the MIDI prompt and the checkpoint being sampled.

MIDI continuation

This is MIDI continuation, not text-to-music generation from written prompts.

Experimental control

The Melody Intensity Editor is still a side experiment.

Next step

Train. Compare. Generate.