MIDI Files
The Standard MIDI File (SMF) is the official format for storing sequence data so that any MIDI device can read it.
The following is an extract from the MIDI File Specification:
0 - Introduction
This document outlines the specification for MIDI Files. The purpose of MIDI Files is to provide a way of interchanging time-stamped MIDI data between different programs on the same or different computers. One of the primary design goals is compact representation, which makes it very appropriate for a disk-based file format, but which might make it inappropriate for storing in memory for quick access by a sequencer program. (It can be easily converted to a quickly-accessible format on the fly as files are read in or written out.) It is not intended to replace the normal file format of any program, though it could be used for this purpose if desired.
MIDI Files contain one or more MIDI streams, with time information for each event. Song, sequence, and track structures, tempo and time signature information, are all supported. Track names and other descriptive information may be stored with the MIDI data. This format supports multiple tracks and multiple sequences so that if the user of a program which supports multiple tracks intends to move a file to another one, this format can allow that to happen.
This spec defines the 8-bit binary data stream used in the file. The data can be stored in a binary file, nibbilized, 7-bit-ized for efficient MIDI transmission, converted to Hex ASCII, or translated symbolically to a printable text file. This spec addresses what's in the 8-bit stream. It does not address how a MIDI File will be transmitted over MIDI. It is the general feeling that a MIDI transmission protocol will be developed for files in general and MIDI Files will use this scheme.
1 - Sequences, Tracks, Chunks: File Block Structure
CONVENTIONS
In this document, bit 0 means the least significant bit of a byte, and bit 7 is the most significant.
Some numbers in MIDI Files are represented in a form called VARIABLE-LENGTH QUANTITY. These numbers are represented 7 bits per byte, most significant bits first. All bytes except the last have bit 7 set, and the last byte has bit 7 clear. If the number is between 0 and 127, it is thus represented exactly as one byte.
Here are some examples of numbers represented as variable-length quantities:
Number (hex)    Representation (hex)
00000000        00
00000040        40
0000007F        7F
00000080        81 00
00002000        C0 00
00003FFF        FF 7F
00004000        81 80 00
00100000        C0 80 00
001FFFFF        FF FF 7F
00200000        81 80 80 00
08000000        C0 80 80 00
0FFFFFFF        FF FF FF 7F
The largest number which is allowed is 0FFFFFFF so that the variable-length representations must fit in 32 bits in a routine to write variable-length numbers. Theoretically, larger numbers are possible, but 2 x 10^8 96ths of a beat at a fast tempo of 500 beats per minute is four days, long enough for any delta-time!
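The encoding described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the specification; the function names encode_vlq and decode_vlq are hypothetical, and the 0FFFFFFF limit is enforced only on the encoding side.

```python
def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a MIDI variable-length quantity."""
    if not 0 <= value <= 0x0FFFFFFF:
        raise ValueError("value must be in the range 0..0x0FFFFFFF")
    groups = [value & 0x7F]                    # last byte: bit 7 clear
    value >>= 7
    while value > 0:
        groups.append((value & 0x7F) | 0x80)   # earlier bytes: bit 7 set
        value >>= 7
    return bytes(reversed(groups))             # most significant group first


def decode_vlq(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode a variable-length quantity starting at data[pos].

    Returns (value, position of the first byte after the quantity).
    """
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if (byte & 0x80) == 0:                 # bit 7 clear marks the last byte
            return value, pos
```

Returning the new position from decode_vlq lets a reader step through a track without knowing in advance how many bytes each delta-time occupies.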
FILES
To any file system, a MIDI File is simply a series of 8-bit bytes. On the Macintosh, this byte stream is stored in the data fork of a file (with file type 'MIDI'), or on the Clipboard (with data type 'MIDI'). Most other computers store 8-bit byte streams in files -- naming or storage conventions for those computers will be defined as required.
CHUNKS
MIDI Files are made up of -chunks-. Each chunk has a 4-character type and a 32-bit length, which is the number of bytes in the chunk. This structure allows future chunk types to be designed which may easily be ignored if encountered by a program written before the chunk type is introduced. Your programs should EXPECT alien chunks and treat them as if they weren't there.
Each chunk begins with a 4-character ASCII type. It is followed by a 32-bit length, most significant byte first (a length of 6 is stored as 00 00 00 06). This length refers to the number of bytes of data which follow: the eight bytes of type and length are not included. Therefore, a chunk with a length of 6 would actually occupy 14 bytes in the disk file.
This chunk architecture is similar to that used by Electronic Arts' IFF format, and the chunks described herein could easily be placed in an IFF file. The MIDI File itself is not an IFF file: it contains no nested chunks, and chunks are not constrained to be an even number of bytes long.
Converting it to an IFF file is as easy as padding odd length chunks, and sticking the whole thing inside a FORM chunk.
MIDI Files contain two types of chunks: header chunks and track chunks. A -header- chunk provides a minimal amount of information pertaining to the entire MIDI file. A -track- chunk contains a sequential stream of MIDI data which may contain information for up to 16 MIDI channels. The concepts of multiple tracks, multiple MIDI outputs, patterns, sequences, and songs may all be implemented using several track chunks.
A MIDI File always starts with a header chunk, and is followed by one or more track chunks.
MThd <length of header data>
<header data>
MTrk <length of track data>
<track data>
MTrk <length of track data>
<track data>
. . .
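As a rough sketch of the chunk layout just described, the following Python walks a MIDI File held in memory and skips chunk types it does not know. The names iter_chunks and known_chunks are hypothetical, and the whole file is assumed to fit in one bytes object.

```python
import struct
from typing import Iterator


def iter_chunks(data: bytes) -> Iterator[tuple[bytes, bytes]]:
    """Yield (chunk_type, chunk_data) for each chunk in a MIDI File image."""
    pos = 0
    while pos + 8 <= len(data):
        chunk_type = data[pos:pos + 4]
        # 32-bit length, most significant byte first; it excludes the
        # eight bytes of type and length.
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if len(body) < length:
            raise ValueError("truncated chunk")
        yield chunk_type, body
        pos += 8 + length


def known_chunks(data: bytes) -> list[tuple[bytes, bytes]]:
    """Keep header and track chunks; treat alien chunks as if absent."""
    return [(t, b) for t, b in iter_chunks(data) if t in (b"MThd", b"MTrk")]
```

Because every chunk carries its own length, an unknown chunk can be stepped over without understanding its contents.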
2 - Chunk Descriptions
HEADER CHUNKS
The header chunk at the beginning of the file specifies some basic information about the data in the file. Here's the syntax of the complete chunk:
<Header Chunk> = <chunk type><length><format><ntrks><division>
As described above, <chunk type> is the four ASCII characters 'MThd'; <length> is a 32-bit representation of the number 6 (high byte first).
The data section contains three 16-bit words, stored most-significant byte first.
The first word, <format>, specifies the overall organization of the file.
Only three values of <format> are specified:
0-the file contains a single multi-channel track
1-the file contains one or more simultaneous tracks (or MIDI outputs) of a sequence
2-the file contains one or more sequentially independent single-track patterns
More information about these formats is provided below.
The next word, <ntrks>, is the number of track chunks in the file. It will always be 1 for a format 0 file.
The third word, <division>, specifies the meaning of the delta-times. It has two formats, one for metrical time, and one for time-code-based time:
| bit 15 | bits 14 - 8           | bits 7 - 0      |
|   0    | ticks per quarter-note                  |
|   1    | negative SMPTE format | ticks per frame |
If bit 15 of <division> is zero, the bits 14 thru 0 represent the number of delta time "ticks" which make up a quarter-note. For instance, if division is 96, then a time interval of an eighth-note between two events in the file would be 48.
If bit 15 of <division> is a one, delta times in a file correspond to subdivisions of a second, in a way consistent with SMPTE and MIDI Time Code. Bits 14 thru 8 contain one of the four values -24, -25, -29, or -30, corresponding to the four standard SMPTE and MIDI Time Code formats (-29 corresponds to 30 drop frame), and represent the number of frames per second. These negative numbers are stored in two's complement form. The second byte (stored positive) is the resolution within a frame: typical values may be 4 (MIDI Time Code resolution), 8, 10, 80 (bit resolution), or 100. This stream allows exact specifications of time-code-based tracks, but also allows millisecond-based tracks by specifying 25 frames/sec and a resolution of 40 units per frame. If the events in a file are stored with a bit resolution of thirty-frame time code, the division word would be E250 hex.
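The header layout and the two meanings of <division> can be illustrated with a short Python sketch. The function name parse_header and the dictionary keys are hypothetical; the sketch reads only the first six data bytes and ignores any extra header data that a future version might add.

```python
import struct


def parse_header(body: bytes) -> dict:
    """Decode the data section of an MThd chunk (at least 6 bytes)."""
    if len(body) < 6:
        raise ValueError("MThd data must be at least 6 bytes")
    # Three 16-bit words, most significant byte first.
    fmt, ntrks, division = struct.unpack(">HHH", body[:6])
    info = {"format": fmt, "ntrks": ntrks}
    if (division & 0x8000) == 0:
        # Bit 15 clear: bits 14-0 are ticks per quarter-note.
        info["ticks_per_quarter_note"] = division & 0x7FFF
    else:
        # Bit 15 set: the high byte is a negative SMPTE format in two's
        # complement, the low byte is the number of ticks per frame.
        info["smpte_format"] = (division >> 8) - 0x100   # e.g. 0xE2 -> -30
        info["ticks_per_frame"] = division & 0xFF
    return info
```

For the E250 hex example above, this yields a SMPTE format of -30 and 80 ticks per frame.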
FORMATS 0, 1, AND 2
A Format 0 file has a header chunk followed by one track chunk. It is the most interchangeable representation of data. It is very useful for a simple single-track player in a program which needs to make synthesizers make sounds, but which is primarily concerned with something else such as mixers or sound effect boxes. It is very desirable to be able to produce such a format, even if your program is track-based, in order to work with these simple programs. On the other hand, perhaps someone will write a format conversion from format 1 to format 0 which might be so easy to use in some setting that it would save you the trouble of putting it into your program.
A Format 1 or 2 file has a header chunk followed by one or more track chunks. Programs which support several simultaneous tracks should be able to save and read data in format 1, a vertically one-dimensional form, that is, as a collection of tracks. Programs which support several independent patterns should be able to save and read data in format 2, a horizontally one-dimensional form. Providing these minimum capabilities will ensure maximum interchangeability.
In a MIDI system with a computer and a SMPTE synchronizer which uses Song Pointer and Timing Clock, tempo maps (which describe the tempo throughout the track, and may also include time signature information, so that the bar number may be derived) are generally created on the computer. To use them with the synchronizer, it is necessary to transfer them from the computer.
To make it easy for the synchronizer to extract this data from a MIDI File, tempo information should always be stored in the first MTrk chunk. For a format 0 file, the tempo will be scattered through the track and the tempo map reader should ignore the intervening events; for a format 1 file, the tempo map must be stored as the first track. It is polite to a tempo map reader to offer your user the ability to make a format 0 file with just the tempo, unless you can use format 1.
All MIDI Files should specify tempo and time signature. If they don't, the time signature is assumed to be 4/4, and the tempo 120 beats per minute. In format 0, these meta-events should occur at least at the beginning of the single multi-channel track. In format 1, these meta-events should be contained in the first track. In format 2, each of the temporally independent patterns should contain at least initial time signature and tempo information.
We may decide to define other format IDs to support other structures. A program encountering an unknown format ID may still read other MTrk chunks it finds from the file, as format 1 or 2, if its user can make sense of them and arrange them into some other structure if appropriate. Also, more parameters may be added to the MThd chunk in the future: it is important to read and honor the length, even if it is longer than 6.
TRACK CHUNKS
The track chunks (type MTrk) are where actual song data is stored. Each track chunk is simply a stream of MIDI events (and non-MIDI events), preceded by delta-time values. The format for Track Chunks (described below) is exactly the same for all three formats (0, 1, and 2: see "Header Chunk" above) of MIDI Files.
Here is the syntax of an MTrk chunk (the + means "one or more": at least one MTrk event must be present):
<Track Chunk> = <chunk type><length><MTrk event>+
The syntax of an MTrk event is very simple:
<MTrk event> = <delta-time><event>
<delta-time> is stored as a variable-length quantity. It represents the amount of time before the following event. If the first event in a track occurs at the very beginning of a track, or if two events occur simultaneously, a delta-time of zero is used. Delta-times are always present. (Not storing delta-times of 0 requires at least two bytes for any other value, and most delta-times aren't zero.) Delta-time is in some fraction of a beat (or a second, for recording a track with SMPTE times), as specified in the header chunk.
<event> = <MIDI event> | <sysex event> | <meta-event>
<MIDI event> is any MIDI channel message. Running status is used: status bytes of MIDI channel messages may be omitted if the preceding event is a MIDI channel message with the same status. The first event in each MTrk chunk must specify status. Delta-time is not considered an event itself: it is an integral part of the syntax for an MTrk event. Notice that running status occurs across delta-times.
<sysex event> is used to specify a MIDI system exclusive message, either as one unit or in packets, or as an "escape" to specify any arbitrary bytes to be transmitted. A normal complete system exclusive message is stored in a MIDI File in this way:
F0 <length> <bytes to be transmitted after F0>
The length is stored as a variable-length quantity. It specifies the number of bytes which follow it, not including the F0 or the length itself. For instance, the transmitted message F0 43 12 00 07 F7 would be stored in a MIDI File as F0 05 43 12 00 07 F7. It is required to include the F7 at the end so that the reader of the MIDI File knows that it has read the entire message.
Another form of sysex event is provided which does not imply that an F0 should be transmitted. This may be used as an "escape" to provide for the transmission of things which would not otherwise be legal, including system real time messages, song pointer or select, MIDI Time Code, etc. This uses the F7 code:
F7 <length> <all bytes to be transmitted>
Unfortunately, some synthesizer manufacturers specify that their system exclusive messages are to be transmitted as little packets. Each packet is only part of an entire syntactical system exclusive message, but the times they are transmitted are important. Examples of this are the bytes sent in a CZ patch dump, or the FB-01's "system exclusive mode" in which micro tonal data can be transmitted. The F0 and F7 sysex events may be used together to break up syntactically complete system exclusive messages into timed packets.
An F0 sysex event is used for the first packet in a series -- it is a message in which the F0 should be transmitted. An F7 sysex event is used for the remainder of the packets, which do not begin with F0. (Of course, the F7 is not considered part of the system exclusive message).
A syntactic system exclusive message must always end with an F7, even if the real-life device didn't send one, so that you know when you've reached the end of an entire sysex message without looking ahead to the next event in the MIDI File. If it's stored in one complete F0 sysex event, the last byte must be an F7. There also must not be any transmittable MIDI events in between the packets of a multi-packet system exclusive message. This principle is illustrated in the paragraph below.
Here is an example of a multi-packet system exclusive message in a MIDI File. Suppose the bytes F0 43 12 00 were to be sent, followed by a 200-tick delay, followed by the bytes 43 12 00 43 12 00, followed by a 100-tick delay, followed by the bytes 43 12 00 F7. This would be stored in the MIDI File as:
F0 03 43 12 00
81 48                       200-tick delta time
F7 06 43 12 00 43 12 00
64                          100-tick delta time
F7 04 43 12 00 F7
When reading a MIDI File, and an F7 sysex event is encountered without a preceding F0 sysex event to start a multi-packet system exclusive message sequence, it should be presumed that the F7 event is being used as an "escape". In this case, it is not necessary that it end with an F7, unless it is desired that the F7 be transmitted.
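The storage rules for complete and multi-packet system exclusive messages can be sketched as follows. This is an illustrative Python helper, not part of the specification: the names encode_vlq and sysex_packets are hypothetical, and unlike the listing above the sketch also writes the delta-time that precedes the first packet.

```python
def encode_vlq(value: int) -> bytes:
    """Encode a non-negative integer as a variable-length quantity."""
    groups = [value & 0x7F]
    value >>= 7
    while value > 0:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def sysex_packets(packets: list[tuple[int, bytes]]) -> bytes:
    """Encode the timed packets of one system exclusive message.

    Each packet is (delta_ticks, transmitted_bytes). The first packet
    must start with F0 and the last packet must end with F7.
    """
    if not packets or packets[0][1][:1] != b"\xF0":
        raise ValueError("first packet must start with F0")
    if packets[-1][1][-1:] != b"\xF7":
        raise ValueError("last packet must end with F7")
    out = bytearray()
    for index, (delta, payload) in enumerate(packets):
        out += encode_vlq(delta)
        if index == 0:
            # F0 event: the F0 is the event code and is not counted
            # in the length.
            out += b"\xF0" + encode_vlq(len(payload) - 1) + payload[1:]
        else:
            # F7 event: every stored byte is transmitted as-is.
            out += b"\xF7" + encode_vlq(len(payload)) + payload
    return bytes(out)
```

A single-element list produces the ordinary one-unit form, with the F7 terminator as the last data byte.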
<meta-event> specifies non-MIDI information useful to this format or to sequencers, with this syntax:
FF <type> <length> <bytes>
All meta-events begin with FF, then have an event type byte (which is always less than 128), and then have the length of the data stored as a variable-length quantity, and then the data itself. If there is no data, the length is 0. As with chunks, future meta-events may be designed which may not be known to existing programs, so programs must properly ignore meta-events which they do not recognize, and indeed should expect to see them. Programs must never ignore the length of a meta-event which they do not recognize, and they shouldn't be surprised if it's bigger than expected. If so, they must ignore everything past what they know about.
However, they must not add anything of their own to the end of the meta-event. Sysex events and meta-events cancel any running status which was in effect. Running status does not apply to and may not be used for these messages.
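To show how delta-times, running status, sysex events, and meta-events fit together, here is a minimal Python sketch of a track reader. The name parse_track and the tuple layout (delta, kind, code, payload) are hypothetical choices for illustration; the sketch does no validation beyond what is needed to step through the data.

```python
def parse_track(data: bytes) -> list[tuple[int, str, int, bytes]]:
    """Split MTrk chunk data into (delta, kind, code, payload) tuples."""

    def read_vlq(pos: int) -> tuple[int, int]:
        value = 0
        while True:
            byte = data[pos]
            pos += 1
            value = (value << 7) | (byte & 0x7F)
            if not (byte & 0x80):
                return value, pos

    events = []
    pos = 0
    running_status = None
    while pos < len(data):
        delta, pos = read_vlq(pos)
        first = data[pos]
        if first == 0xFF:                       # meta-event
            meta_type = data[pos + 1]
            length, pos = read_vlq(pos + 2)
            events.append((delta, "meta", meta_type, data[pos:pos + length]))
            pos += length                       # honor the length, known type or not
            running_status = None               # meta-events cancel running status
        elif first in (0xF0, 0xF7):             # sysex event
            length, pos = read_vlq(pos + 1)
            events.append((delta, "sysex", first, data[pos:pos + length]))
            pos += length
            running_status = None               # sysex events cancel running status
        elif first >= 0xF0:
            raise ValueError(f"unexpected status byte {first:#04x} in track")
        else:                                   # MIDI channel message
            if first & 0x80:
                running_status = first
                pos += 1
            elif running_status is None:
                raise ValueError("data byte found with no running status")
            # Program change (Cn) and channel pressure (Dn) carry one
            # data byte; the other channel messages carry two.
            count = 1 if (running_status & 0xF0) in (0xC0, 0xD0) else 2
            events.append((delta, "midi", running_status, data[pos:pos + count]))
            pos += count
    return events
```

Unknown meta-event types pass through untouched because the reader always skips exactly the number of bytes given by the length.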
3 - Meta-Events
A few meta-events are defined herein. It is not required for every program to support every meta-event.
In the syntax descriptions for each of the meta-events a set of conventions is used to describe parameters of the events. The FF which begins each event, the type of each event, and the lengths of events which do not have a variable amount of data are given directly in hexadecimal. A notation such as dd or se, which consists of two lower-case letters, mnemonically represents an 8-bit value. Four identical lower-case letters such as wwww mnemonically refer to a 16-bit value, stored most-significant-byte first.
Six identical lower-case letters such as tttttt refer to a 24-bit value, stored most-significant-byte first. The notation len refers to the length portion of the meta-event syntax, that is, a number, stored as a variable-length quantity, which specifies how many data bytes follow it in the meta-event. The notations text and data refer to however many bytes of (possibly text) data were just specified by the length.
In general, meta-events in a track which occur at the same time may occur in any order. If a copyright event is used, it should be placed as early as possible in the file, so it will be noticed easily. Sequence Number and Sequence/Track Name events, if present, must appear at time 0. An end-of-track event must occur as the last event in the track. Meta-events initially defined include:
FF 00 02 ss ss Sequence Number
This optional event, which must occur at the beginning of a track, before any nonzero delta-times, and before any transmittable MIDI events, specifies the number of a sequence. The two data bytes ss ss are that number, which corresponds to the MIDI Cue message. In a format 2 MIDI File, it is used to identify each "pattern" so that a "song" sequence can use the Cue message to refer to the patterns. If the ss ss numbers are omitted (ie, length byte = 0 instead of 2), the sequences' locations in order in the file are used as defaults. In a format 0 or 1 MIDI File, which contains only one sequence, this number should be contained in the first (or only) track. If transfer of several multi-track sequences is required, this must be done as a group of format 1 files, each with a different sequence number.
FF 01 len text Text Event
Any amount of text describing anything. It is a good idea to put a text event right at the beginning of a track, with the name of the track, a description of its intended orchestration, and any other information which the user wants to put there. Text events may also occur at other times in a track, to be used as lyrics, or descriptions of cue points.
The text in this event should be printable ASCII characters for maximum interchange. However, other character codes using the high-order bit may be used for interchange of files between different programs on the same computer which supports an extended character set. Programs on a computer which does not support non-ASCII characters should ignore those characters.
Meta-event types 01 through 0F are reserved for various types of text events, each of which meets the specification of text events (above) but is used for a different purpose:
FF 02 len text Copyright Notice
Contains a copyright notice as printable ASCII text. The notice should contain the characters (C), the year of the copyright, and the owner of the copyright. If several pieces of music are in the same MIDI File, all of the copyright notices should be placed together in this event so that it will be at the beginning of the file. This event should be the first event in the track chunk, at time 0.
FF 03 len text Sequence/Track Name
If in a format 0 track, or the first track in a format 1 file, the name of the sequence. Otherwise, the name of the track.
FF 04 len text Instrument Name
A description of the type of instrumentation to be used in that track. May be used with the MIDI Channel Prefix meta-event to specify which MIDI channel the description applies to, or the channel may be specified as text in the event itself.
FF 05 len text Lyric
A lyric to be sung. Generally, each syllable will be a separate lyric event which begins at the event's time.
FF 06 len text Marker
Normally in a format 0 track, or the first track in a format 1 file. The name of that point in the sequence, such as a rehearsal letter or section name ("First Verse", etc.)
FF 07 len text Cue Point
A description of something happening on a film or video screen or stage at that point in the musical score ("Car crashes into house", "curtain opens", "she slaps his face", etc.)
FF 20 01 cc MIDI Channel Prefix
The MIDI channel (0-15) contained in this event may be used to associate a MIDI channel with all events which follow, including System Exclusive and meta-events. This channel is "effective" until the next normal MIDI event (which contains a channel) or the next MIDI Channel Prefix meta-event. If MIDI channels refer to "tracks", this message may help jam several tracks into a format 0 file, keeping their non-MIDI data associated with a track. This capability is also present in Yamaha's ESEQ file format.
FF 2F 00 End of Track
This event is not optional. It is included so that an exact ending point may be specified for the track, so that it has an exact length, which is necessary for tracks which are looped or concatenated.
FF 51 03 tttttt Set Tempo (in microseconds per MIDI quarter-note)
This event indicates a tempo change. Another way of putting "microseconds per quarter-note" is "24ths of a microsecond per MIDI clock". Representing tempos as time per beat instead of beat per time allows absolutely exact long-term synchronization with a time-based sync protocol such as SMPTE time code or MIDI time code. The accuracy provided by this tempo resolution allows a four-minute piece at 120 beats per minute to be accurate within 500 usec at the end of the piece. Ideally, these events should only occur where MIDI clocks would be located -- this convention is intended to guarantee, or at least increase the likelihood of, compatibility with other synchronization devices so that a time signature/tempo map stored in this format may easily be transferred to another device.
BPM
Normally, musicians express tempo as "the number of quarter notes in every minute (ie, time period)". This is the inverse of the way that the MIDI file format expresses it.
When musicians refer to a "beat" in terms of tempo, they are referring to a quarter note (ie, a quarter note is always 1 beat when talking about tempo, regardless of the time signature. Yes, it's a bit confusing to non-musicians that the time signature's "beat" may not be the same thing as the tempo's "beat" -- it won't be unless the time signature's beat also happens to be a quarter note. But that's the traditional definition of BPM tempo). To a musician, tempo is therefore always "how many quarter notes happen during every minute". Musicians refer to this measurement as BPM (ie, Beats Per Minute). So a tempo of 100 BPM means that a musician must be able to play 100 steady quarter notes, one right after the other, in one minute. That's how "fast" the "musical tempo" is at 100 BPM. It's very important that you understand the concept of how a musician expresses "musical tempo" (ie, BPM) in order to properly present tempo settings to a musician, and yet be able to relate it to how the MIDI file format expresses tempo.
To convert the MIDI file format's tempo (ie, the 3 bytes that specify the amount of microseconds per quarter note) to BPM:
BPM = 60,000,000/(tt tt tt)
For example, a tempo of 120 BPM = 07 A1 20 microseconds per quarter note.
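The conversion in both directions can be sketched in Python as follows. The function names are hypothetical, and rounding to the nearest microsecond when going from BPM to the stored value is an assumption of this sketch rather than something the specification prescribes.

```python
def tempo_to_bpm(tempo_bytes: bytes) -> float:
    """Convert the three data bytes tt tt tt of a Set Tempo event to BPM."""
    micros_per_quarter = int.from_bytes(tempo_bytes, "big")
    return 60_000_000 / micros_per_quarter


def bpm_to_tempo_event(bpm: float) -> bytes:
    """Build a complete Set Tempo meta-event (FF 51 03 tt tt tt)."""
    micros_per_quarter = round(60_000_000 / bpm)
    if not 0 < micros_per_quarter <= 0xFFFFFF:
        raise ValueError("tempo does not fit in 24 bits")
    return b"\xFF\x51\x03" + micros_per_quarter.to_bytes(3, "big")
```

A fractional tempo such as 100.3 BPM is simply rounded to a whole number of microseconds per quarter note before it is stored.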
So why does the MIDI file format use "time per quarter note" instead of "quarter notes per time" to specify its tempo? Well, it's easier to specify more precise tempos with the former. With BPM, sometimes you have to deal with fractional tempos (for example, 100.3 BPM) if you want to allow a finer resolution to the tempo. Using microseconds to express tempo offers plenty of resolution.
Also, SMPTE is a time-based protocol (ie, it's based upon seconds, minutes, and hours, rather than a musical tempo). Therefore it's easier to relate the MIDI file's tempo to SMPTE timing if you express it as microseconds. Many musical devices now use SMPTE to sync their playback.
PPQN Clock
A sequencer typically uses some internal hardware timer counting off steady time (ie, microseconds perhaps) to generate a software "PPQN clock" that counts off the timebase (Division) "ticks". In this way, the time upon which an event occurs can be expressed to the musician in terms of a musical bar:beat:PPQN-tick rather than how many microseconds from the start of the playback. Remember that musicians always think in terms of a beat, not the passage of seconds, minutes, etc.
As mentioned, the microsecond tempo value tells you how long each one of your sequencer's "quarter notes" should be. From here, you can figure out how long each one of your sequencer's PPQN clocks should be by dividing that microsecond value by your MIDI file's Division. For example, if your MIDI file's Division is 96 PPQN, then that means that each of your sequencer's PPQN clock ticks at the above tempo should be 500,000 / 96 (or 5,208.3) microseconds long (ie, there should be 5,208.3 microseconds between each PPQN clock tick in order to yield a tempo of 120 BPM at 96 PPQN. And there should always be 96 of these clock ticks in each quarter note, 48 ticks in each eighth note, 24 ticks in each sixteenth, etc).
Note that you can have any timebase at any tempo. For example, you can have a 96 PPQN file playing at 100 BPM just as you can have a 192 PPQN file playing at 100 BPM. You can also have a 96 PPQN file playing at either 100 BPM or 120 BPM. Timebase and tempo are two entirely separate quantities. Of course, they both are needed when you setup your hardware timer (ie, when you set how many microseconds are in each PPQN tick). And of course, at slower tempos, your PPQN clock tick is going to be longer than at faster tempos.
MIDI Clock
MIDI clock bytes are sent over MIDI, in order to sync the playback of 2 devices (ie, one device is generating MIDI clocks at its current tempo which it internally counts off, and the other device is syncing its playback to the receipt of these bytes). Unlike with SMPTE frames, MIDI clock bytes are sent at a rate related to the musical tempo.
Since there are 24 MIDI Clocks in every quarter note, the length of a MIDI Clock (ie, the time between each MIDI Clock message) is the microsecond tempo divided by 24. In the above example, that would be 500,000/24, or 20,833.3 microseconds in every MIDI Clock. Alternately, you can relate this to your timebase (ie, PPQN clock). If you have 96 PPQN, then that means that a MIDI Clock byte must occur every 96 / 24 (ie, 4) PPQN clocks.
SMPTE
SMPTE counts off the passage of time in terms of seconds, minutes, and hours (ie, the way that non-musicians count time). It also breaks down the seconds into smaller units called "frames". The movie industry created SMPTE, and they adopted 4 different frame rates. You can divide a second into 24, 25, 29 (ie, 30 drop frame), or 30 frames. Later on, even finer resolution was needed by musical devices, and so each frame was broken down into "subframes".
So, SMPTE is not directly related to musical tempo. SMPTE time doesn't vary with "musical tempo".
Many devices use SMPTE to sync their playback. If you need to synchronize with such a device, then you may need to deal with SMPTE timing. Of course, you're probably still going to have to maintain some sort of PPQN clock, based upon the passing SMPTE subframes, so that the user can adjust the tempo of the playback in terms of BPM, and can consider the time of each event in terms of bar:beat:tick. But since SMPTE doesn't directly relate to musical tempo, you have to interpolate (ie, calculate) your PPQN clocks from the passing of subframes/frames/seconds/minutes/hours (just as we previously calculated the PPQN clock from a hardware timer counting off microseconds).
Let's take the easy example of 25 Frames and 40 SubFrames. As previously mentioned in the discussion of Division, this is analogous to millisecond based timing because you have 1,000 SMPTE subframes per second. (You have 25 frames per second. Each frame is divided up into 40 subframes, and you therefore have 25 * 40 subframes per second. And remember that 1,000 milliseconds are also in every second). Every millisecond therefore means that another subframe has passed (and vice versa). Every time you count off 40 subframes, a SMPTE frame has passed (and vice versa). Etc.
Let's assume you desire 96 PPQN and a tempo of 500,000 microseconds. Considering that with 25-40 Frame-SubFrame SMPTE timing 1 millisecond = 1 subframe (and remember that 1 millisecond = 1,000 microseconds), there should be 500,000 / 1,000 (ie, 500) subframes per quarter note. Since you have 96 PPQN in every quarter note, then every PPQN ends up being 500 / 96 subframes long, or 5.2083 milliseconds (ie, there's how we end up with that 5,208.3 microseconds PPQN clock tick just as we did above in discussing PPQN clock). And since 1 millisecond = 1 subframe, every PPQN clock tick also equals 5.2083 subframes at the above tempo and timebase.
Conclusions
BPM = 60,000,000/MicroTempo
MicrosPerPPQN = MicroTempo/TimeBase
MicrosPerMIDIClock = MicroTempo/24
PPQNPerMIDIClock = TimeBase/24
MicrosPerSubFrame = 1,000,000/(Frames * SubFrames)
SubFramesPerQuarterNote = MicroTempo/MicrosPerSubFrame
SubFramesPerPPQN = SubFramesPerQuarterNote/TimeBase
MicrosPerPPQN = SubFramesPerPPQN * MicrosPerSubFrame
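These relationships can be checked numerically with a small Python sketch. The function names are hypothetical, and the sketch assumes that MicrosPerSubFrame means the duration of one subframe, that is, one second (1,000,000 microseconds) divided by the number of subframes per second (Frames * SubFrames).

```python
def micros_per_ppqn(micro_tempo: float, timebase: int) -> float:
    """Length of one PPQN clock tick in microseconds."""
    return micro_tempo / timebase


def micros_per_midi_clock(micro_tempo: float) -> float:
    """Length of one MIDI Clock (24 per quarter note) in microseconds."""
    return micro_tempo / 24


def ppqn_per_midi_clock(timebase: int) -> float:
    """Number of PPQN clock ticks between MIDI Clock bytes."""
    return timebase / 24


def micros_per_subframe(frames: int, subframes: int) -> float:
    """Duration of one SMPTE subframe in microseconds (assumed meaning)."""
    return 1_000_000 / (frames * subframes)


def subframes_per_ppqn(micro_tempo: float, timebase: int,
                       frames: int, subframes: int) -> float:
    """Number of subframes that pass during one PPQN clock tick."""
    per_quarter = micro_tempo / micros_per_subframe(frames, subframes)
    return per_quarter / timebase
```

With a tempo of 500,000 microseconds, 96 PPQN, and 25 frames of 40 subframes, both routes give the same tick length: 500,000 / 96 directly, or 5.2083 subframes of 1,000 microseconds each.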
FF 54 05 hr mn se fr ff SMPTE Offset
This event, if present, designates the SMPTE time at which the track chunk is supposed to start. It should be present at the beginning of the track, that is, before any nonzero delta-times, and before any transmittable MIDI events. The hour must be encoded with the SMPTE format, just as it is in MIDI Time Code. In a format 1 file, the SMPTE Offset must be stored with the tempo map, and has no meaning in any of the other tracks. The ff field contains fractional frames, in 100ths of a frame, even in SMPTE-based tracks which specify a different frame subdivision for delta-times.
FF 58 04 nn dd cc bb Time Signature
The time signature is expressed as four numbers. nn and dd represent the numerator and denominator of the time signature as it would be notated. The denominator is a negative power of two: 2 represents a quarter-note, 3 represents an eighth-note, etc. The cc parameter expresses the number of MIDI clocks in a metronome click. The bb parameter expresses the number of notated 32nd-notes in a MIDI quarter-note (24 MIDI clocks). This was added because there are already multiple programs which allow a user to specify that what MIDI thinks of as a quarter-note (24 clocks) is to be notated as, or related to in terms of, something else.
Therefore, the complete event for 6/8 time, where the metronome clicks every three eighth-notes, but there are 24 clocks per quarter-note, 72 to the bar, would be (in hex):
FF 58 04 06 03 24 08
That is, 6/8 time (8 is 2 to the 3rd power, so this is 06 03), 36 MIDI clocks per dotted-quarter (24 hex!), and eight notated 32nd-notes per quarter-note.
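As a small illustration of how the four parameters are read back, the following Python sketch decodes the data bytes of a Time Signature event. The function name and dictionary keys are hypothetical.

```python
def decode_time_signature(data: bytes) -> dict:
    """Decode the four data bytes nn dd cc bb of a Time Signature event."""
    nn, dd, cc, bb = data                 # raises ValueError unless 4 bytes
    return {
        "numerator": nn,
        "denominator": 2 ** dd,           # dd is stored as a power of two
        "clocks_per_click": cc,           # MIDI clocks per metronome click
        "notated_32nds_per_quarter": bb,  # per MIDI quarter-note (24 clocks)
    }
```

Applied to the data bytes 06 03 24 08 from the event above, this gives 6/8 time with a metronome click every 36 MIDI clocks.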
FF 59 02 sf mi Key Signature
sf = -7: 7 flats
sf = -1: 1 flat
sf = 0: key of C
sf = 1: 1 sharp
sf = 7: 7 sharps
mi = 0: major key
mi = 1: minor key
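A sketch of decoding the two data bytes into a conventional key name is given below. The tables of key names follow the standard circle of fifths and are an addition for illustration, not part of the specification; the function name is hypothetical. Note that sf is a signed byte, so values of 80 hex and above represent flats.

```python
# Key names indexed by sf + 7, from 7 flats (index 0) to 7 sharps (index 14).
MAJOR_KEYS = ["Cb", "Gb", "Db", "Ab", "Eb", "Bb", "F",
              "C", "G", "D", "A", "E", "B", "F#", "C#"]
MINOR_KEYS = ["Ab", "Eb", "Bb", "F", "C", "G", "D",
              "A", "E", "B", "F#", "C#", "G#", "D#", "A#"]


def decode_key_signature(data: bytes) -> str:
    """Decode the data bytes sf mi of a Key Signature event."""
    sf, mi = data
    if sf >= 0x80:
        sf -= 0x100                       # reinterpret as a signed byte
    if not -7 <= sf <= 7 or mi not in (0, 1):
        raise ValueError("invalid key signature")
    names = MINOR_KEYS if mi else MAJOR_KEYS
    return f"{names[sf + 7]} {'minor' if mi else 'major'}"
```

For instance, the data bytes FD 01 (three flats, minor) decode to C minor.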
FF 7F len data Sequencer Specific Meta-Event
Special requirements for particular sequencers may use this event type: the first byte or bytes of data is a manufacturer ID (these are one byte, or if the first byte is 00, three bytes). As with MIDI System Exclusive, manufacturers who define something using this meta-event should publish it so that others may know how to use it. After all, this is an interchange format. This type of event may be used by a sequencer which elects to use this as its only file format; sequencers with their established feature-specific formats should probably stick to the standard features when using this format.
4 - Program Fragments and Example MIDI Files
Here are some of the routines to read and write variable-length numbers in MIDI Files. These routines are in Pascal, and use read and write, which read and write single 8-bit characters from/to the files infile and outfile.
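procedure WriteVarLen (value: cardinal);
var buffer: cardinal;
    b: byte;
begin
  { Pack the value into buffer, 7 bits per byte, with the least
    significant group in the low byte and bit 7 set on every
    group except that one. }
  buffer := value and $7f;
  value := value shr 7;
  while (value > 0) do begin
    buffer := buffer shl 8;
    buffer := buffer or $80;
    buffer := buffer + (value and $7f);
    value := value shr 7;
  end;
  { Write the bytes out, low byte first; stop after the byte
    whose bit 7 is clear. }
  while (TRUE) do begin
    b := buffer and $ff;
    write(outfile, b);
    if (buffer and $80) <> 0 then
      buffer := buffer shr 8
    else
      break;
  end;
end;

function ReadVarLen: cardinal;
var value: cardinal;
    c: byte;
begin
  read(infile, c);
  value := c;
  if (value and $80) <> 0 then begin
    value := value and $7f;
    { Keep appending 7-bit groups until a byte with bit 7 clear. }
    repeat
      read(infile, c);
      value := (value shl 7) + (c and $7f);
    until (c and $80) = 0;
  end;
  ReadVarLen := value;
end;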
As an example, MIDI Files for the following excerpt are shown below. First, a format 0 file is shown, with all information intermingled; then, a format 1 file is shown with all data separated into four tracks: one for tempo and time signature, and three for the notes. A resolution of 96 "ticks" per quarter note is used. A time signature of 4/4 and a tempo of 120, though implied, are explicitly stated.
[Score excerpt in three staves: Channel 1, Preset 5; Channel 2, Preset 46; Channel 3, Preset 70.]
The contents of the MIDI stream represented by this example are broken down here:
Delta Time (decimal) | Event Code (hex) | Other Bytes (decimal) | Comment
0 | FF 58 | 04 04 02 24 08 | 4 bytes; 4/4 time; 24 MIDI clocks/click, 8 32nd notes/24 MIDI clocks
0 | FF 51 | 03 500000 | 3 bytes: 500,000 usec/quarter note
0 | C0 | 5 | Ch.1 Program Change 5
0 | C1 | 46 | Ch.2 Program Change 46
0 | C2 | 70 | Ch.3 Program Change 70
0 | 92 | 48 96 | Ch.3 Note On C2, forte
0 | 92 | 60 96 | Ch.3 Note On C3, forte
96 | 91 | 67 64 | Ch.2 Note On G3, mezzo-forte
96 | 90 | 76 32 | Ch.1 Note On E4, piano
192 | 82 | 48 64 | Ch.3 Note Off C2, standard
0 | 82 | 60 64 | Ch.3 Note Off C3, standard
0 | 81 | 67 64 | Ch.2 Note Off G3, standard
0 | 80 | 76 64 | Ch.1 Note Off E4, standard
0 | FF 2F | 00 | Track End
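The timing figures in this breakdown can be checked with a little arithmetic. The Python sketch below is illustrative only (the function names are assumptions); it converts the tempo of 500,000 usec per quarter note into beats per minute, and delta-times in ticks into seconds at the example's resolution of 96 ticks per quarter note.

```python
def bpm_from_tempo(usec_per_quarter: int) -> float:
    """Quarter notes per minute for a tempo given in microseconds per quarter note."""
    return 60_000_000 / usec_per_quarter


def ticks_to_seconds(ticks: int, usec_per_quarter: int, ticks_per_quarter: int) -> float:
    """Duration of a tick count under a constant tempo."""
    return ticks * usec_per_quarter / (ticks_per_quarter * 1_000_000)


# The tempo bytes 07 A1 20 from the FF 51 event are a 24-bit big-endian number.
tempo = int.from_bytes(bytes.fromhex("07 A1 20"), "big")
print(tempo)                                  # microseconds per quarter note
print(bpm_from_tempo(tempo))                  # beats per minute
print(ticks_to_seconds(96, tempo, 96))        # one quarter note, in seconds
print(ticks_to_seconds(96 + 96 + 192, tempo, 96))  # whole excerpt, in seconds
```

The delta-times in the table sum to 384 ticks, i.e. four quarter notes, which at this tempo is two seconds.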
The entire format 0 MIDI file contents in hex follow. First, the header chunk:
4D 54 68 64 | MThd
00 00 00 06 | Chunk length
00 00 | Format 0
00 01 | One track
00 60 | 96 per quarter-note
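A header chunk like this one is simple to check in code. The following Python sketch is one possible reading routine, not a reference implementation; the name `parse_header_chunk` is an assumption. It relies on the chunk ID MThd being the ASCII bytes 4D 54 68 64 and on all multi-byte numbers being stored most significant byte first. It does not interpret SMPTE-style division values (those with bit 15 set).

```python
import struct


def parse_header_chunk(data: bytes) -> tuple:
    """Return (format, number of tracks, division) from an MThd chunk."""
    chunk_id, length = struct.unpack(">4sI", data[:8])   # 4-char ID, 32-bit length
    if chunk_id != b"MThd":
        raise ValueError("not a MIDI header chunk")
    if length < 6:
        raise ValueError("header chunk is too short")
    file_format, ntrks, division = struct.unpack(">HHH", data[8:14])
    return file_format, ntrks, division


header = bytes.fromhex("4D 54 68 64 00 00 00 06 00 00 00 01 00 60")
print(parse_header_chunk(header))  # (0, 1, 96)
```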
Then the track chunk. Its header followed by the events (notice the running status is used in places):
4D 54 72 6B | MTrk
00 00 00 3B | Chunk length (59)

Delta Time | Event | Comments
00 | FF 58 04 04 02 18 08 | Time signature
00 | FF 51 03 07 A1 20 | Tempo
00 | C0 05 |
00 | C1 2E |
00 | C2 46 |
00 | 92 30 60 |
00 | 3C 60 | Running status
60 | 91 43 40 |
60 | 90 4C 20 |
81 40 | 82 30 40 | Two byte delta time
00 | 3C 40 | Running status
00 | 81 43 40 |
00 | 80 4C 40 |
00 | FF 2F 00 | End of track
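To show how delta-times, meta-events and running status fit together, here is a minimal Python sketch that walks through the 59 data bytes of this track chunk. It is an illustration under stated assumptions, not a complete reader: the names `read_var_len` and `parse_track_events` and the tuple layout of the results are invented for this example, and error handling is kept to a minimum.

```python
def read_var_len(data: bytes, pos: int) -> tuple:
    """Read a variable-length quantity starting at pos; return (value, new pos)."""
    value = 0
    while True:
        byte = data[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:            # bit 7 clear marks the last byte
            return value, pos


def parse_track_events(data: bytes) -> list:
    """Return a list of (delta, kind, status or meta type, data bytes) tuples."""
    events = []
    pos = 0
    running_status = None
    while pos < len(data):
        delta, pos = read_var_len(data, pos)
        status = data[pos]
        if status == 0xFF:             # meta-event: FF type length data
            meta_type = data[pos + 1]
            length, pos = read_var_len(data, pos + 2)
            events.append((delta, "meta", meta_type, data[pos:pos + length]))
            pos += length
            running_status = None      # meta-events cancel running status
        elif status in (0xF0, 0xF7):   # sysex event: F0/F7 length data
            length, pos = read_var_len(data, pos + 1)
            events.append((delta, "sysex", status, data[pos:pos + length]))
            pos += length
            running_status = None
        elif status >= 0xF0:
            raise ValueError("unexpected status byte in track data")
        else:
            if status & 0x80:          # a new status byte
                running_status = status
                pos += 1
            elif running_status is None:
                raise ValueError("data byte with no running status in effect")
            # Program change (Cx) and channel after-touch (Dx) take one data byte.
            count = 1 if (running_status & 0xF0) in (0xC0, 0xD0) else 2
            events.append((delta, "midi", running_status, data[pos:pos + count]))
            pos += count
    return events


track = bytes.fromhex(
    "00 FF 58 04 04 02 18 08 "
    "00 FF 51 03 07 A1 20 "
    "00 C0 05 00 C1 2E 00 C2 46 "
    "00 92 30 60 00 3C 60 "
    "60 91 43 40 60 90 4C 20 "
    "81 40 82 30 40 00 3C 40 "
    "00 81 43 40 00 80 4C 40 "
    "00 FF 2F 00"
)
events = parse_track_events(track)
for delta, kind, code, payload in events:
    print(delta, kind, format(code, "02X"), payload.hex(" "))
```

The events that omit their status byte (3C 60 and 3C 40) are reported with the status carried over from the preceding event, 92 and 82 respectively.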
A format 1 representation of the file is slightly different. Its header chunk:
4D 54 68 64 | MThd
00 00 00 06 | Chunk length
00 01 | Format 1
00 04 | Four tracks
00 60 | 96 per quarter-note
First, the track chunk for the time signature/tempo track. Its header, followed by the events:
4D 54 72 6B | MTrk
00 00 00 14 | Chunk length (20)

Delta Time | Event | Comments
00 | FF 58 04 04 02 18 08 | Time signature
00 | FF 51 03 07 A1 20 | Tempo
83 00 | FF 2F 00 | End of track
Then, the track chunk for the first music track. The MIDI convention for note on/off running status is used in this example:
4D 54 72 6B | MTrk
00 00 00 10 | Chunk length (16)

Delta Time | Event | Comments
00 | C0 05 |
81 40 | 90 4C 20 |
81 40 | 4C 00 | Running status
00 | FF 2F 00 |
Then, the track chunk for the second music track:
4D 54 72 6B | MTrk
00 00 00 0F | Chunk length (15)

Delta Time | Event | Comments
00 | C1 2E |
60 | 91 43 40 |
82 20 | 43 00 | Running status
00 | FF 2F 00 | End of track
Then, the track chunk for the third music track:
4D 54 72 6B | MTrk
00 00 00 15 | Chunk length (21)

Delta Time | Event | Comments
00 | C2 46 |
00 | 92 30 60 |
00 | 3C 60 | Running status
83 00 | 30 00 | Two byte delta time, running status
00 | 3C 00 | Running status
00 | FF 2F 00 | End of track
Appendix A
1. MIDI Event Commands
Each command byte has 2 parts. The left nibble (4 bits) contains the actual command, and the right nibble contains the MIDI channel number on which the command will be executed. There are 16 MIDI channels, and 8 MIDI commands (the command nibble must have a msb of 1).
In the following table, x indicates the midi channel number. Note that all data bytes will be <128 (msb set to 0).
Hex | Binary | Data | Description
8x | 1000xxxx | nn vv | Note off (key is released); nn=note number, vv=velocity
9x | 1001xxxx | nn vv | Note on (key is pressed); nn=note number, vv=velocity
Ax | 1010xxxx | nn vv | Key after-touch; nn=note number, vv=pressure value
Bx | 1011xxxx | cc vv | Control Change; cc=controller number, vv=new value. The highest controller numbers are reserved for specific purposes. See Channel Mode Messages
Cx | 1100xxxx | pp | Program (patch) change; pp=new program number
Dx | 1101xxxx | vv | Channel after-touch; vv=pressure value
Ex | 1110xxxx | bb tt | Pitch wheel change (2000H is normal or no change); bb=bottom (least sig) 7 bits of value, tt=top (most sig) 7 bits of value
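Splitting a command byte into its two nibbles, and joining the two 7-bit pitch wheel data bytes, might be sketched in Python like this. The names `split_status` and `pitch_wheel_value` are assumptions made for the example. Note that the channel nibble is returned as 0 to 15, so the value 2 corresponds to what the examples above call channel 3.

```python
COMMANDS = {
    0x8: "Note off",
    0x9: "Note on",
    0xA: "Key after-touch",
    0xB: "Control change",
    0xC: "Program change",
    0xD: "Channel after-touch",
    0xE: "Pitch wheel change",
}


def split_status(status: int) -> tuple:
    """Split a channel command byte into (command name, channel nibble 0..15)."""
    if not 0x80 <= status <= 0xEF:
        raise ValueError("not a channel command byte")
    return COMMANDS[status >> 4], status & 0x0F


def pitch_wheel_value(bb: int, tt: int) -> int:
    """Combine the bottom (bb) and top (tt) 7-bit data bytes into a 14-bit value."""
    return (tt << 7) | bb


print(split_status(0x92))                  # ('Note on', 2)
print(hex(pitch_wheel_value(0x00, 0x40)))  # 0x2000, the "no change" position
```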
Channel Mode Messages (See also Control Change, above)
This is the same code as the Control Change (above), but implements Mode control by using reserved controller numbers. The numbers are:
Local Control.
When Local Control is Off, all devices on a given channel will respond only to data received over MIDI. Played data, etc. will be ignored. Local Control On restores the functions of the normal controllers.
c = 122, v = 0: Local Control Off
c = 122, v = 127: Local Control On
All Notes Off.
When an All Notes Off is received, all oscillators will turn off.
c = 123, v = 0: All Notes Off
Mode Commands (See text for description of actual mode commands.)
c = 124, v = 0: Omni Mode Off
c = 125, v = 0: Omni Mode On
c = 126, v = M: Mono Mode On (Poly Off) where M is the number of channels (Omni Off) or 0 (Omni On)
c = 127, v = 0: Poly Mode On (Mono Off)
(Note: These four messages also cause All Notes Off.)
System Common Messages
Hex | Binary | Data | Description
F0 | 11110000 | 0iiiiiii 0ddddddd ... 0ddddddd 11110111 | System Exclusive. This message makes up for all that MIDI doesn't support. (iiiiiii) is a seven-bit Manufacturer's I.D. code. If the device recognizes the I.D. code as its own, it will listen to the rest of the message (ddddddd). Otherwise, the message will be ignored. System Exclusive is used to send bulk dumps such as patch parameters and other non-spec data. (Note: Real-Time messages ONLY may be interleaved with a System Exclusive.)
F1 | 11110001 | | Undefined
F2 | 11110010 | 0lllllll 0mmmmmmm | Song Position Pointer. This is an internal 14 bit register that holds the number of MIDI beats (1 beat=six MIDI clocks) since the start of the song. l is the LSB, m the MSB.
F3 | 11110011 | 0sssssss | Song Select. The Song Select specifies which sequence or song is to be played.
F4 | 11110100 | | Undefined
F5 | 11110101 | | Undefined
F6 | 11110110 | | Tune Request. Upon receiving a Tune Request, all analog synthesizers should tune their oscillators.
F7 | 11110111 | | End of Exclusive. Used to terminate a System Exclusive dump (see above).
The following table lists system messages which control the entire system. These have no MIDI channel number. (They generally apply only when controlling a MIDI keyboard, etc.)
System Real-Time Messages
Hex | Binary | Data | Description
F8 | 11111000 | | Timing Clock. Sent 24 times per quarter note when synchronization is required (see text).
F9 | 11111001 | | Undefined
FA | 11111010 | | Start. Start current sequence. (This message will be followed with Timing Clocks.)
FB | 11111011 | | Continue. Continue a stopped sequence where left off.
FC | 11111100 | | Stop. Stop a sequence.
FD | 11111101 | | Undefined
FE | 11111110 | | Active Sensing. Use of this message is optional. When initially sent, the receiver will expect to receive another Active Sensing message each 300ms (max), or it will assume that the connection has been terminated. At termination, the receiver will turn off all voices and return to normal (non-active sensing) operation.
FF | 11111111 | | Reset. Reset all receivers in the system to power-up status. This should be used sparingly, preferably under manual control. In particular, it should not be sent on power-up.
The following table lists the numbers corresponding to notes for use in note on and note off commands.
Note Numbers (each cell is decimal/hexadecimal)
Oct | C | C# | D | D# | E | F | F# | G | G# | A | A# | B
0 | 0/00 | 1/01 | 2/02 | 3/03 | 4/04 | 5/05 | 6/06 | 7/07 | 8/08 | 9/09 | 10/0A | 11/0B
1 | 12/0C | 13/0D | 14/0E | 15/0F | 16/10 | 17/11 | 18/12 | 19/13 | 20/14 | 21/15 | 22/16 | 23/17
2 | 24/18 | 25/19 | 26/1A | 27/1B | 28/1C | 29/1D | 30/1E | 31/1F | 32/20 | 33/21 | 34/22 | 35/23
3 | 36/24 | 37/25 | 38/26 | 39/27 | 40/28 | 41/29 | 42/2A | 43/2B | 44/2C | 45/2D | 46/2E | 47/2F
4 | 48/30 | 49/31 | 50/32 | 51/33 | 52/34 | 53/35 | 54/36 | 55/37 | 56/38 | 57/39 | 58/3A | 59/3B
5 | 60/3C | 61/3D | 62/3E | 63/3F | 64/40 | 65/41 | 66/42 | 67/43 | 68/44 | 69/45 | 70/46 | 71/47
6 | 72/48 | 73/49 | 74/4A | 75/4B | 76/4C | 77/4D | 78/4E | 79/4F | 80/50 | 81/51 | 82/52 | 83/53
7 | 84/54 | 85/55 | 86/56 | 87/57 | 88/58 | 89/59 | 90/5A | 91/5B | 92/5C | 93/5D | 94/5E | 95/5F
8 | 96/60 | 97/61 | 98/62 | 99/63 | 100/64 | 101/65 | 102/66 | 103/67 | 104/68 | 105/69 | 106/6A | 107/6B
9 | 108/6C | 109/6D | 110/6E | 111/6F | 112/70 | 113/71 | 114/72 | 115/73 | 116/74 | 117/75 | 118/76 | 119/77
10 | 120/78 | 121/79 | 122/7A | 123/7B | 124/7C | 125/7D | 126/7E | 127/7F | | | |
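The table follows a regular pattern: the octave is the note number divided by 12, and the remainder selects the note within the octave. A small Python sketch of that rule follows; the function name `note_name` is an assumption. The octave numbers are those of this table, which differ from the octave labels used in the comments of the example file earlier (where note 60 is called C3).

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def note_name(number: int) -> str:
    """Name a MIDI note number using the octave numbering of the table above."""
    if not 0 <= number <= 127:
        raise ValueError("MIDI note numbers range from 0 to 127")
    octave, index = divmod(number, 12)
    return f"{NOTE_NAMES[index]}{octave}"


print(note_name(60))    # C5 in this table's numbering
print(note_name(0x2E))  # A#3
print(note_name(127))   # G10, the highest note
```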
RMID Files
The method of saving data in chunks (ie, where the data is preceded by an 8 byte header consisting of a 4 char ID and a 32-bit size field) is the basis for Interchange File Format. You should now read the article About Interchange File Format for background information.
As mentioned, MIDI File format is a "broken" IFF. It lacks a file header at the start of the file. One bad thing about this is that a standard IFF parsing routine will choke on a MIDI file (because it will expect the first 12 bytes to be the group ID, filesize, and type ID fields). In order to fix the MIDI File format so that it strictly adheres to IFF, Microsoft wrapped the MIDI file in a RIFF container, and thereby came up with the RMID format. An RMID file begins with the group ID (4 ascii chars) of 'R', 'I', 'F', 'F', followed by the 32-bit filesize field, and then the type ID of 'R', 'M', 'I', 'D'. The complete MIDI file (ie, the MThd and MTrk chunks) then follows inside a single RIFF chunk whose ID is 'd', 'a', 't', 'a', preceded by that chunk's own 32-bit size field. If you chop off the first 20 bytes of an RMID file (the 12-byte RIFF header plus the 8-byte 'data' chunk header), then you end up with a standard MIDI file.
Note that chunks within a MIDI file are not padded out (with an extra 0 byte) to an even number of bytes. I don't know whether the RMID format corrects this aberration of the MIDI file format too.