::| SUBSYNTH

low-level audio synthesis subsystem. modular analog synthesizer in software

 : oss.notes : 

Notes from the OSS programming guide found at:

http://www.opensound.com/pguide/oss.pdf
OSS (formerly called VoxWare)
- c api
- device driver for sound cards and other sound devices under unix
- derived from original linux sound driver
- runs on more than a dozen OSs, supporting many sound cards/devices.

sound cards
- have several devices or ports that produce or record sound
- digitized voice device /dev/dsp
  - codec
  - pcm
  - dsp
  - adc/dac
- mixer device
  - handles I/O volume levels.
- synthesizer device
  - plays music
  - sound effects
  - 2 kinds typically
    - Yamaha FM synth chip
      - OPL-2, two operator
      - OPL-3, four operator
    - wave table
      - plays back pre recorded samples
      - very realistic
  - MIDI interface
    - standard
    - serial interface 
    - 31.25 kbps
    - designed to work with on-stage equipment
      - synths
      - keyboards
      - stage props
      - lighting controllers
    - communicate through a MIDI cable  
- joystick port
  - not controlled by OSS
- CD-ROM interface
  - not controlled by OSS
  

programming OSS
- c headers: in /usr/lib/oss/include (compile with -I/usr/lib/oss/include)
- OSS provides/supports these devices:
  - /dev/mixer
    - access to built-in mixer circuit on sound card
    - adjust playback and recording levels for sound channels
      - linein
      - master
      - cd
      - wav
      - synth
      - mic
    - supports several mixers on the same system
      - named /dev/mixer0, /dev/mixer1
      - /dev/mixer is just a sym link to one of them (usually /dev/mixer0)
  
  - /dev/sndstat
    - for diagnostic purposes
    - human readable data
    - info for all ports and devices detected by OSS
    - cat /dev/sndstat
    
  - /dev/dsp and /dev/audio
    - main device files for digitized voice apps
    - data written here is played with the 
      DAC/PCM/DSP device on the sound card
    - data read here comes from the input source (default microphone)
    - difference?
      - /dev/audio is logarithmic (mu-law), provided for SunOS compatibility
      - /dev/dsp is 8-bit unsigned linear
    - use ioctl to select encoding method (affects both)
    - several are supported: 
      - /dev/dsp0 /dev/dsp1
      - /dev/dsp is a symlink to one of them.
      - /dev/audio works similarly
  
  - /dev/sequencer
    - electronic music
    - sound effects (i.e. in games)
    - access synth located on sound cards, or external music synth 
      connected to MIDI port.
    - allows control of up to 15 synth chips and up to 16 MIDI ports
    - allows more precise control than /dev/music
    
  - /dev/music
    - similar to dev sequencer
    - easier than /dev/sequencer
    - handles synth and MIDI devices in the same way
    - enables device independent programming
    - based purely on MIDI protocols
    
  - /dev/midi
    - low level interface to MIDI bus ports
    - similar to TTY (character terminal) devices in raw mode
    - not intended for realtime use (timing not guaranteed)
    - everything sent to the MIDI port immediately
    - useful for MIDI SysEx and sample librarian applications (data dumps)
    - /dev/midi00, /dev/midi01, 
    - /dev/midi is a sym link to one of the others.
    
  - /dev/dmfm
    - low level register access to FM chip
  
  - /dev/dmmidi
    - raw interface to MIDI devices
    - direct TTY-like access to the MIDI bus
    - for special applications
    
    
    
- portability considerations
  - gui
  - endian issues
    - when system and audio device differ
    - e.g. a big endian RISC system with a little endian PCI card
  - OSS hides device specific features behind its API
  - the API is based on universal properties of sound and music, not hardware
  - design your application so the user can select the /dev/ file
    - default /dev/dsp0 file might be broken
    - /dev/dsp1 might be preferred by the user
  - don't assume undocumented values
    - need to query default values
    - i.e. /dev/sequencer timer is usually 100hz, but can be 60 or 1024hz
  - don't open a device twice
  - not all sound cards have a mixer... :)
    - older cards
    - unsupported cards
    - some high end digital only cards
  - not all mixers have a master volume control
    - your app should query the available channels from the driver
  - don't use a device before checking that it exists.

- mixer programming
  - check if it exists: ioctl will fail and set errno to ENXIO if no mixer
  - based on channels.  
  - each channel represents the physical slider
  - values vary between 0 (off) and 100 (max vol)
  - most channels are stereo (set them separately for balance)
  - usually only one input source is available for recording
  - query the capabilities with ioctl
    - see what channels are actually present
      - channels vary from card to card.
      - the API defines around 30 possible channels
      
      
- general sound background info
  - Wavetable Synthesis
    - a method of sound generation that uses digital sound samples stored 
      in memory

  - FM synthesis
    - frequency modulation synthesis
    - procedural method of sound generation
    - uses wave generators/modulators sometimes in combination to produce 
      sound.
    - small memory footprint

  - operator
    - waveform oscillator used to produce sound in FM synthesis
      more operators produce more realistic sounds

  - voice 
    - an independent sound generator

  - sequencer
    - device (hardware or software) which controls (sequences) the playing
      of notes on a music synthesizer

  - patch
    - the device settings for a sound generator (i.e. piano)

  - db (decibel)
    - unit to measure the volume of sound.  the scale is logarithmic since the
      human ear has a logarithmic response
       
      
- audio programming
  - sound is stored as a sequence of samples taken from an 
    audio signal at constant time intervals.
  - each sample represents the volume of the signal at the moment 
    it was measured
  - each sample requires one or more bytes of storage.
  - the number of bytes in a sample depends on channels (mono/stereo) and format
    (8 or 16 bits)
  - the time interval between each sample gives the sampling rate
    - expressed in samples per second (hertz)
    - common: 8khz (telephone) to 48khz (DAT tape) to 96khz (DVD audio)
  - hardware
    - ADC (analog to digital converter)
    - DAC (Digital to analog converter)
    - codec (contains both ADC and DAC)
      - often referred to as a "dsp"
  
  - fundamental parameters that affect sound quality
     - samp rate
       - expressed in samples per second or hertz
       - limits the highest frequency that can be stored (Nyquist Frequency)
       - Nyquist's Sampling Theorem states: 
         - highest freq that can be reproduced is at most half of the samp freq
         - e.g. at a 44 khz samp rate, the highest freq is 22 khz
     - format / samp size
       - expressed in bits
       - affects the dynamic range
       - i.e. 8bit gives a range of 48db,  16bit gives 96db
    
  - in practice
    - record and play using the standard system calls
       - open
       - close
       - read
       - write
       - default params are usually very low (speech quality)
    - change device parameters with ioctl
    - all dev files support read/write, but some devices can't record
    - most devices can work in half duplex mode (O_RDONLY or O_WRONLY)
      - record and play but not at the same time
    - some devices work in full duplex  (O_RDWR)
    - simplest way to record audio is to use UNIX commands
      - cat /dev/dsp > recorded.file.raw
    - devices are exclusive, if in use they return EBUSY
    - include:
      - <unistd.h>
      - <fcntl.h>
      - <sys/ioctl.h>
      - <sys/soundcard.h>
    - mandatory data:
      - int audio_filedescriptor;
      - unsigned char audio_buffer[BUF_SIZE];
    - for real time performance, keep the buffer short.  1024 - 4096 is 
      a good range.  Choose a value that is a multiple of your sample size.
      - samp_size = channels * (bits per sample / 8)
    
    - three parameters are needed to describe sampled audio data.
      - sample format (num of bits)
      - num of channels (mono, stereo)
      - samp rate (speed, sampling frequency)
      - for stereo data
        - two samples for each time slot.
        - left channel sample is always stored before the right channel sample
        - this extends for more than 2 samples
    - the device must be reset before setting them:  ioctl(fd, SNDCTL_DSP_RESET, 0)
    
    - OSS programming:
       - OSS supports multichannels (i.e. more than 2)
         - professional multichannel audio devices
         - 16 or more mono channels (8 stereo pairs)
         - how to encode multiple channels:
           - interleaved (like with stereo sound data)
           - multiple /dev/dsp devices
            - mixed (dsp1 in 2 channel mode, dsp2 in mono, dsp3 in quad, etc...)
       - to access the mixer
         - don't access it directly through /dev/mixer
            - can't guarantee that it is associated with the /dev/dsp you need
         - instead access the /dev/dsp's ioctl mixer settings.

      
    - for quality sound (avoid clicking)
      - when using a single audio buffer, you cannot refill it while it
        is being played, so there is an audible click between buffers
        - the click is your audio hardware starving.
        - forces your application to do only one thing at a time to minimize popping
      - to fix, use the double buffering method
        - 2 buffers, 
          - one is being read while...
          - the other is being written to
          - then swap; repeat for as long as the device is in use.

    - buffer overruns
      - too much data (results in data loss)

    - buffer underrun
      - not enough data

    - improving latency (for generated sound with realtime requirements)
      - use smaller buffer (fragment) size to improve latency
        - useful to sync sound with the graphics.
      - use select
      - if writing an effect processor (read/write simultaneously) 
        - use full duplex, 
        - or use two devices together...
     
     
   

     
- MIDI
  - what is it
    - Musical Instrument Digital Interface
    - a hardware level interface
    - and a communication protocol between devices connected by that interface
    - doesn't produce the audio, it controls some kind of external synthesizer
      which performs the sound generation
  - hardware details
    - interface
      - asynchronous
      - serial 
      - byte-oriented
      - similar to RS-232 (serial port)
    - transfer rate is 31250 bits per second
    - MIDI cables
      - connect devices 
      - 5-pin DIN connector at each end
      - carry data in one direction only (one cable per direction)
    - MIDI ports (called MPU401 device - developed by Roland corp.)
      - there are dedicated (professional) MIDI only cards, without audio
      - dumb serial port
      - no sound generation capabilities
      - used to connect to an external MIDI device using MIDI cabling
    - MIDI external device 
      - MIDI keyboard
      - rack mounted tone generator w/out keyboard
      - audio mixer
      - flamethrower, washing machine, etc...
    - connection
      - possible to have unlimited number of devices
        - through daisy chaining
        - MIDI multiplexers (like a Y-cable, but more expensive)
        - one command on the MIDI cable may be processed by unlimited number 
          of devices.  each of them can react to the command as they wish.
        
  - protocol
    - MIDI simply sends and receives bytes
    - you don't care what device is attached, you just see a MIDI port
      - it is even possible that there is nothing connected to the port.
    - each port has 16 possible channels
    - no sound transmitted, only control messages
      - instrument change messages
      - trigger, release (key was pressed)
      - note number (which key)
      - velocity (how hard)
    
  - what is a synthesizer?
    - a synth is a tone generator.
    - hardware tone generation/mixing
      - internal to a computer
        - mounted directly to sound card or motherboard.
        - Yamaha OPL2/3 FM synth
          - OPL2 was used in the adlib (late 80's)
          - OPL3 was used in sound blaster pro
          - OPL4 is FM combined with wave table (OPL3 compatible)
        - Gravis UltraSound (GUS)
          - first wave table sound card on the market
          - 32 simultaneous voices
          - wave samples stored in a table in card's internal RAM
          - 512k to 8MB
          - limited memory, so the application needed to manage patch loading
            and caching.
          - defacto API supported by many wave table device drivers
        - Emu8000 
          - the chip in the SoundBlaster 32/64/AWE cards.
          - similar to GUS, but provides the GM patch set in ROM
          - patch load/cache is not necessary (but possible)
      - or external
        - Roland Sound Canvas
        - etc...
      - a sound chip can have capabilities beyond what MIDI defines
        - but caps are unavailable if you're only using MIDI to control it
    - software tone generation/mixing
      - www.fruityloops.com
      - www.propellerheads.se
      - SoftOSS
        - software based wavetable engine by 4front technologies.
        - does mixing in software
        - can use any audio card (without wave table) to play MIDI with wave 
          table quality.
    - usually what is connected to a MIDI port device
      - allows standard interface to the synth
      - allows portable code, etc...
    
  - MIDI file format
    - file extension .mid
    - contain only control data, no instrument data (unlike MOD format)
      - instrument timbres are assigned by the playing system.
   
  - MIDI in OSS
    - use /dev/midi, or /dev/midi00, /dev/midi01, /dev/midi02
    - data is sent ASAP (midi data is queued)
    - queue can hold about 60 bytes 
    - you can use Midilib which comes with OSS to read MIDI files.
    - uses the MIDI 1.0 spec
    - follows General MIDI (GM) and Yamaha XG specifications
    - see also www.midi.org, and www.ysba.com

  - MIDI
    - is a highly real time process
    - timing errors can be very noticeable to an experienced listener
    - /dev/music and /dev/sequencer can hold enough data for several seconds.
      - simply need to write more data before the queue becomes empty.
    - queuing is the central part of the API
      - queue for playback and for recording
      - playback happens in the background (async)
      - queue is non-blocking (application never waits)
      - queue is organized as a stream of events
        - events are records of 8 or 4 bytes
          - command code
          - parameter data
        - two main types of event:
          - timing
            - time stamps, included before input events
            - used to delay playback
          - input
            - e.g. play a note, volume change, etc...
            - when there are no timing events, the device tries to play all
              inputs as fast as possible (simultaneously)
      - events are processed in the order written to the device (FIFO queue)
      - for realtime events, you can send an immediate event ahead of the queue.
        - good for playing real time events (live performance, etc..)

 
   - MIDI Instruments
     - emulate acoustic and artificial sounds
     - MIDI devices are generally multitimbral
       - can emulate more than one instrument
       - to change instrument, send the MIDI port a "program change" message
         - programs (instruments) numbered between 0 and 127 (7 bit addressing)
     - modern devices support the GM (general midi) specification
       - GM maps instruments to defined program numbers
         - i.e. piano is 0
       - numbering starts at 0, some books list them starting at 1 (wrong)
     - usually support other device specific numbering schemes.
     - the MIDI device implements these instruments (timbres)
       usually using:
       - procedural methods such as FM
       - explicit (canned) methods like wave table (which uses recorded samples)
   
   - MIDI Notes
     - playing notes is the main task
     - there are 2 messages in MIDI
       - note on
         - this msg signals a key press
          - it contains info about the key that was pressed  (controls inst pitch)
         - it also contains velocity (controls inst volume and envelope)
       - note off
         - this msg signals a key release
         - after this msg, the sound will decay according to the instrument 
           characteristics
       - each message signals the note number (0 to 127)
         - number of the key on the keyboard.
         - middle C is 60
         
   - Voices and Channels
     - to play a note, the device usually needs one or more voices
     - some notes use many voices (for layering)
     - the number of possible voices played at one time is limited by the device
       - 9 with OPL2
       - 18 with OPL3
       - currently most support 30 or 32
       - future trend is to support 64 or 128
       

