Transparency

How we score your singing

Every number you see in IntonationAI comes from a specific measurement on your recorded audio. This page explains what each one is, how the calculation works, and — just as importantly — what it cannot tell you. No black boxes.

Instrument calibration (future)

Same analyser core, different strictness later

Pitch and rhythm math is shared across instruments, but a fretboard and a piano key don’t need the same “green zone” cents tolerance as the human voice. We keep a small scoring profile registry on the server (see the scoring profiles service in the backend) so guitar and piano can tighten tolerances and relabel breath proxies without forking the detector.
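As a rough illustration of what a scoring profile registry could look like, here is a minimal sketch. The class name, field names, tolerance values, and the "Sustain" relabel are all hypothetical; they are not taken from the actual backend service.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoringProfile:
    instrument: str
    green_zone_cents: float  # hypothetical tolerance, not the real value
    breath_label: str        # hypothetical display name for the breath proxies


# Illustrative registry; every number here is an assumption.
PROFILES = {
    "voice": ScoringProfile("voice", 25.0, "Breath support"),
    "guitar": ScoringProfile("guitar", 10.0, "Sustain"),
    "piano": ScoringProfile("piano", 5.0, "Sustain"),
}


def get_profile(instrument: str) -> ScoringProfile:
    """Look up a profile, falling back to the voice profile."""
    return PROFILES.get(instrument, PROFILES["voice"])


def in_green_zone(cents_off: float, instrument: str) -> bool:
    """Same deviation, different verdict depending on the profile."""
    return abs(cents_off) <= get_profile(instrument).green_zone_cents
```

With these made-up values, a note 20 cents off would count as green for voice but not for piano, while the detector that produced the 20 cents stays the same.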

Pitch

How close your notes are to where you meant them to sit

We track the fundamental frequency of your voice using pYIN — a probabilistic pitch detector that runs frame-by-frame across your recording and then smooths the result with a hidden Markov model so brief mispronunciations or consonants don’t drag the number around.

The detector covers E2 (82 Hz) to F6 (1400 Hz), which spans bass low through soprano whistle threshold. The earlier version of IntonationAI capped pitch detection at 400 Hz — which silently dropped every note above G4 and made the app useless for sopranos, altos, and belters. That’s fixed.

Cents deviation is measured against equal temperament (the 12-tone tuning most Western pop, jazz, and musical theatre use). If you’re singing in a system that isn’t equal-tempered — just intonation, maqam, gamelan — our pitch numbers will flag you as “off” when you’re actually perfectly in tune for the system you’re in. That’s a real limit we don’t currently handle.
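The cents calculation against equal temperament can be sketched as follows. This is the standard formula, assuming an A4 = 440 Hz reference (the reference pitch and the function name are assumptions, not details confirmed above).

```python
import math

A4_HZ = 440.0  # assumed reference pitch
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F",
              "F#", "G", "G#", "A", "A#", "B"]


def nearest_equal_tempered(freq_hz: float):
    """Return (note name, cents deviation) for the nearest 12-TET note."""
    midi = 69 + 12 * math.log2(freq_hz / A4_HZ)   # fractional MIDI number
    nearest = round(midi)                          # snap to nearest semitone
    target_hz = A4_HZ * 2 ** ((nearest - 69) / 12)
    cents = 1200 * math.log2(freq_hz / target_hz)  # positive means sharp
    name = NOTE_NAMES[nearest % 12] + str(nearest // 12 - 1)
    return name, cents


# A pure (just-intonation) major third above A4 is 5/4 * 440 = 550 Hz.
name, cents = nearest_equal_tempered(550.0)
print(name, round(cents, 1))
```

The 550 Hz example is a just major third above A4. Measured against equal temperament it comes out roughly 14 cents flat of C#5, which is exactly the kind of "off" reading described above for a singer who is in tune within a different system.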

Mauch & Dixon, “pYIN: A fundamental frequency estimator using probabilistic threshold distributions,” ICASSP 2014.

Rhythm

How close your note onsets are to the beat

We detect the onset of every note in your recording and measure its distance to the nearest expected beat on the tempo grid. The result is reported as a mean absolute deviation in milliseconds, not a 0-to-100 score. “28 ms off the beat” is a number you can actually feel; “rhythm accuracy: 0.74” was always meaningless.

Under 50 ms is “in the pocket.” Past 150 ms is loose enough that a human listener would hear it. The tempo grid comes from your exercise’s nominal tempo so we can measure rhythm without a backing track playing.
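A minimal sketch of the rhythm measurement, assuming onsets are already detected and given in seconds, and that the grid starts at a known time. The function names, the `grid_start_s` parameter, and the label for the 50 to 150 ms band are assumptions for illustration.

```python
def onset_deviations_ms(onsets_s, tempo_bpm, grid_start_s=0.0):
    """Signed distance (ms) from each onset to its nearest grid beat.

    Positive values are late, negative values are early.
    """
    beat_s = 60.0 / tempo_bpm
    deviations = []
    for t in onsets_s:
        nearest_beat = round((t - grid_start_s) / beat_s)
        beat_time = grid_start_s + nearest_beat * beat_s
        deviations.append((t - beat_time) * 1000.0)
    return deviations


def mean_abs_deviation_ms(deviations_ms):
    """Mean absolute deviation in milliseconds."""
    if not deviations_ms:
        raise ValueError("no onsets detected")
    return sum(abs(d) for d in deviations_ms) / len(deviations_ms)


def describe(mad_ms):
    """Apply the two thresholds quoted above; the middle label is made up."""
    if mad_ms < 50.0:
        return "in the pocket"
    if mad_ms > 150.0:
        return "audibly loose"
    return "in between"


devs = onset_deviations_ms([0.02, 0.53, 0.99, 1.54], tempo_bpm=120)
print(mean_abs_deviation_ms(devs), describe(mean_abs_deviation_ms(devs)))
```

At 120 BPM the beats fall every 0.5 s, so the four onsets in the example are 20, 30, 10, and 40 ms away from the grid, for a mean absolute deviation of about 25 ms. Keeping the signed values around is what would let a report say "late" or "early" rather than only how far off.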

The older version of IntonationAI reported rhythm as one of three fixed values (0.6, 0.8, 1.0) based only on whether any onset was detected and whether the estimated tempo fell in a reasonable range. That was “the mic heard something” dressed up as a score. Gone.

Breath support

How well you carried the phrase on a single breath

Breath support is hard to measure with a microphone alone — a vocal coach in a room with you can feel your ribs and watch your shoulders; we can’t. What we can do is measure three things that correlate with good support and report each separately rather than averaging them into a single opaque number:

Duration ratio: what fraction of the phrase was actually voiced (vs. silence, cut-offs, or breathless patches).
Decay slope: how steeply the volume trailed off across the last quarter of a sustained note. A steep decay suggests the support gave out before the note ended.
Release smoothness: how much jitter is in the last ~150 ms of phonation. Smooth release is controlled support; noisy release is running out of air.
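The three proxies above could be computed along these lines. This is a sketch under stated assumptions: the inputs are per-frame voiced flags, per-frame level in dB, and per-frame pitch in Hz; the decay slope is fitted by least squares; and "jitter" is taken to mean frame-to-frame pitch change in cents. None of those choices are confirmed as the app's actual method.

```python
import math


def duration_ratio(voiced_flags):
    """Fraction of frames in the phrase that were voiced."""
    return sum(1 for v in voiced_flags if v) / len(voiced_flags)


def decay_slope_db_per_s(level_db, frame_s):
    """Least-squares slope of level over the last quarter of a note.

    Needs at least two frames. More negative means a steeper fade.
    """
    k = max(2, len(level_db) // 4)
    tail = level_db[-k:]
    xs = [i * frame_s for i in range(len(tail))]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(tail) / len(tail)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, tail))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


def release_jitter_cents(f0_hz, frame_s, window_s=0.15):
    """Mean absolute frame-to-frame pitch step (cents) in the final window."""
    k = max(2, round(window_s / frame_s))
    tail = f0_hz[-k:]
    steps = [abs(1200.0 * math.log2(b / a)) for a, b in zip(tail, tail[1:])]
    return sum(steps) / len(steps)
```

Each function returns its own number, which matches the idea of reporting the three proxies separately instead of folding them into one score.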

These are proxies for breath support, not measurements of it. A singer with perfect breath mechanics but a microphone placed farther away than usual will still see artifacts. A singer using compression/dynamics processing on their mic will see the wrong numbers entirely. Trust the trend over time, not any single session.

Titze, “Voice training and therapy with a semi-occluded vocal tract,” J. Speech Lang. Hear. Res., 2006.

Vibrato & pitch stability

Whether your tone is held or trembling

We compute how much your pitch wobbles from frame to frame. A steady vibrato (3-7 Hz modulation with consistent depth) is reported as vibrato present. A tone that drifts unpredictably registers as low pitch stability — usually a sign of tension or uncertain support, not of musical intent.

Like the breath metric, this is acoustic only: we cannot tell intentional vibrato from an unsteady tone we happen to hear as vibrato-shaped. Use the number as a prompt, not a verdict.
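One simple way to estimate a modulation rate from a pitch contour is to count how often the contour crosses its own mean. The sketch below uses that zero-crossing approach as a stand-in; it is not necessarily how the analyser does it, and the minimum depth threshold is a made-up value. The input is assumed to be a per-frame pitch contour already converted to cents.

```python
import math


def vibrato_rate_hz(cents, frame_s):
    """Estimate modulation rate from mean crossings (two per cycle)."""
    mean = sum(cents) / len(cents)
    centered = [c - mean for c in cents]
    crossings = sum(
        1 for a, b in zip(centered, centered[1:]) if (a < 0) != (b < 0)
    )
    duration_s = (len(cents) - 1) * frame_s
    return crossings / 2.0 / duration_s


def has_vibrato(cents, frame_s, lo_hz=3.0, hi_hz=7.0, min_depth_cents=20.0):
    """True if the rate is in the 3-7 Hz band and the wobble is deep enough.

    min_depth_cents is a hypothetical threshold for illustration.
    """
    depth = (max(cents) - min(cents)) / 2.0
    rate = vibrato_rate_hz(cents, frame_s)
    return lo_hz <= rate <= hi_hz and depth >= min_depth_cents


# Synthetic 5.5 Hz vibrato, 50 cents deep, 10 ms frames, 2 seconds.
frame_s = 0.01
wobble = [50.0 * math.sin(2 * math.pi * 5.5 * i * frame_s + 0.3)
          for i in range(200)]
# A slow one-way drift of 200 cents over the same duration.
drift = [-100.0 + i for i in range(200)]
print(has_vibrato(wobble, frame_s), has_vibrato(drift, frame_s))
```

The sketch checks rate and depth only. It does not check that the depth stays consistent across the note, and like any acoustic measure it cannot distinguish intent from accident.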

The coach’s commentary

What Coach Joy actually sees when she writes feedback

When you finish a warm-up exercise, the raw numbers above are packaged into a short prompt and sent to a large language model. The prompt explicitly tells the model to write in Knowledge-of-Performance style — describe the action the singer should take next, not the score they received. Motor-learning research (Wulf & Mornell 2008) consistently finds that beginners learn complex movements faster from KP feedback than from scores.

The prompt also forbids the coach from confusing “head voice” with “falsetto,” from telling you to “push” or “reach for” notes at the top of your range, and from commenting on posture or body alignment — she can’t see you, and pretending she can would be a lie.

The commentary is also converted to speech through Google Cloud’s text-to-speech so you hear Coach Joy say it in the same warm voice every time.

What we do NOT measure

Things a real vocal coach knows that we cannot see

The honest list of things IntonationAI is blind to, because a microphone cannot capture them:

Jaw, tongue, and soft-palate position. A coach in a room can see your tongue shape and tell you to drop your jaw. We only hear the resulting sound.
Posture and alignment. Slouching, locked knees, craned neck — all invisible to us.
Subglottal pressure. The actual physiological driver of breath support. We can only measure what the mic hears, which is a lossy proxy.
Laryngeal tension. We can sometimes hear it (a strained upper register) but not always.
Microtonal systems. Our pitch measurement assumes 12-tone equal temperament; if you sing in a different system, the cents number will mislead you.

None of this means IntonationAI can’t help — it absolutely can help you practice more, catch pitch excursions you didn’t notice, and hold yourself accountable to a daily warm-up habit. It means: for anything involving your body, a real coach is still worth finding.

Your self-assessment matters

Before we show you the score, we ask how the take felt to you. If you rate a take “👎” but the measurements score it high, Coach Joy acknowledges that you’re being hard on yourself before giving the technical feedback. If you rate it “👍” but the measurements flag a problem, she gently names one specific thing to listen for next time.
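The branching described above can be sketched as a small function that picks how the feedback should be framed. The function name, the "up"/"down" rating values, and the returned labels are hypothetical; how the real prompt encodes this is not specified here.

```python
def feedback_framing(self_rating: str, measurements_ok: bool) -> str:
    """Choose a framing for the coach's feedback.

    self_rating is "up" or "down" (hypothetical encoding of the thumbs).
    measurements_ok is True when the measured take scored well.
    """
    if self_rating == "down" and measurements_ok:
        # Singer is being hard on themselves: acknowledge that first.
        return "acknowledge_self_criticism"
    if self_rating == "up" and not measurements_ok:
        # Singer felt good but something was flagged: name one thing.
        return "name_one_thing_to_listen_for"
    # Ear and measurements agree: no special framing needed.
    return "standard"
```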

Over time, your self-assessment alongside the objective numbers lets us calibrate better — and lets you notice when your ear and the measurements start to agree.
