PHYSICS

SOUND is defined as any disturbance that travels through an elastic medium such as air, ground, or water to be heard by the human ear. When a body vibrates, or moves back and forth (see vibration), the oscillation causes a periodic disturbance of the surrounding air or other medium that radiates outward in straight lines in the form of a pressure wave. The effect these waves produce upon the ear is perceived as sound. From the point of view of physics, sound is considered to be the waves of vibratory motion themselves, whether or not they are heard by the human ear.

The audible frequency range is considered to be between 16 Hz and 20,000 Hz, although humans are most sensitive to frequencies between 0.2 and 3 kHz. Within this range they can determine the direction a sound comes from; outside it, they cannot tell whether the sound came from the right or the left. The human ear is also considered able to register intensity levels in the range from 0 to 130 dB. Sound travels through air at a speed of approximately 343 m/s (at 20 °C and 1 atm).
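The temperature condition attached to the speed of sound can be illustrated with a short sketch. The function below uses the common ideal-gas approximation for dry air, v = 331.3 * sqrt(1 + T/273.15) m/s with T in degrees Celsius. The formula, its constants, and the name `speed_of_sound` are assumptions brought in for illustration; they are not part of the text above.

```python
import math

# Assumption: ideal-gas approximation for dry air at about 1 atm.
# 331.3 m/s is the commonly quoted speed of sound at 0 deg C.
SPEED_AT_0_C = 331.3


def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in dry air (m/s) at temp_c degrees Celsius."""
    if temp_c <= -273.15:
        raise ValueError("temperature must be above absolute zero")
    return SPEED_AT_0_C * math.sqrt(1.0 + temp_c / 273.15)


if __name__ == "__main__":
    print(round(speed_of_sound(20.0), 1))  # 343.2
```

Under this approximation the value at 20 °C comes out close to the figure quoted above, and the speed rises by roughly 0.6 m/s per degree near room temperature.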

Musical sounds are distinguished from noises in that they are composed of regular, uniform vibrations, while noises are irregular and disordered vibrations. Composers, however, frequently use noises as well as musical sounds. One musical tone is distinguished from another on the basis of pitch; intensity, or loudness; and quality, or timbre. Pitch describes how high or low a tone is and depends upon the rapidity with which a sounding body vibrates, i.e., upon the frequency of vibration. The higher the frequency of vibration, the higher the tone; the pitch of a siren gets higher and higher as the frequency of vibration increases. The apparent change in the pitch of a sound as a source approaches or moves away from an observer is described by the Doppler effect. The intensity or loudness of a sound depends upon the extent to which the sounding body vibrates, i.e., the amplitude of vibration. A sound is louder as the amplitude of vibration is greater, and the intensity decreases as the distance from the source increases. Loudness is measured in units called decibels (a measure of sound intensity as a function of power ratio, with the difference in decibels between two sounds being given by $\mathrm{dB} = 10 \log_{10}(P_1/P_2)$, where $P_1$ and $P_2$ are the power levels of the two sounds). The sound waves given off by different vibrating bodies differ in quality, or timbre. A note from a saxophone, for instance, differs from a note of the same pitch and intensity produced by a violin or a xylophone; similarly vibrating reeds, columns of air, and strings all differ. Quality is dependent on the number and relative intensity of overtones produced by the vibrating body (see harmonic), and these in turn depend upon the nature of the vibrating body.
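The decibel relation above is easy to try out numerically. The following is a minimal sketch of that formula and its inverse; the function names `db_difference` and `power_ratio` are hypothetical and chosen only for this illustration.

```python
import math


def db_difference(p1: float, p2: float) -> float:
    """Difference in decibels between two power levels: 10 * log10(p1 / p2)."""
    if p1 <= 0 or p2 <= 0:
        raise ValueError("power levels must be positive")
    return 10.0 * math.log10(p1 / p2)


def power_ratio(db: float) -> float:
    """Inverse relation: the power ratio p1 / p2 for a given decibel difference."""
    return 10.0 ** (db / 10.0)


if __name__ == "__main__":
    print(db_difference(100.0, 1.0))           # 20.0
    print(round(db_difference(2.0, 1.0), 2))   # 3.01 (doubling the power)
    print(power_ratio(130.0))                  # a ratio of 10**13
```

Because the scale is logarithmic, a span of 130 dB corresponds to a power ratio of ten trillion to one, while doubling the power adds only about 3 dB.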

*"When we hear a sound, the sound is first turned into mechanical impulses by a membrane connected to three small bones in the [middle] ear. The tympanic membrane, as it is called, and the three small bones - the hammer, anvil, and stirrup - lead the vibrations to a spiral-shaped, liquid-filled organ called the cochlea. The cochlea contains many very fine hairs, which are connected to nerves leading to the hearing center in the brain. (Small hairs in different locations [depth] of the cochlea are stimulated at different frequencies.) The actual perception and understanding of the audible information takes place in the hearing center.

An interesting phenomenon of the human sense of hearing is masking. When a loud tone at a specific frequency stimulates the hairs of the cochlea, the frequencies close to the first powerful tone are not heard if they are less powerful. This is also called the "masking effect." Say there is a powerful frequency at 1.2 kHz. Even though there are many other frequencies present and close to this dominant tone, it masks them, and our hearing does not perceive, for instance, the tone at 1.1 kHz, which is 18 dB weaker. The powerful 1.2 kHz tone cannot, however, mask the tone at 2 kHz, which is also 18 dB lower, as it is relatively far from the 1.2 kHz tone. The 2 kHz tone also has a masking effect on nearby frequencies. The masking effects of the 1.2 and 2 kHz tones are added so that a fair curve stretches between the frequencies, masking everything lying below it (i.e., we hear nothing).

Another phenomenon is temporal masking. There is also a masking effect over time at powerful transients (a shift of 30-40 dB). If one hears the shot of a gun, for instance, it is not possible to hear anything just after the shot. Interestingly enough, it is also not possible to hear anything just before. This is called pre- and post-masking. Pre-masking is of a short duration, 2-5 ms; post-masking can last up to 100 ms. In this period of time, very little other than the transient that causes the masking will be perceived."*
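The two kinds of masking described in the quotation can be mimicked with a toy model. The sketch below is only an illustration under stated assumptions: the 10 dB offset, the 25 dB-per-octave slope, and the use of the highest single curve as the combined threshold are hypothetical choices made so that the 1.2 kHz example behaves as described, and they are not values from the quoted text. The 5 ms and 100 ms limits are the upper figures given in the quotation.

```python
import math

# Hypothetical toy parameters (assumptions, not from the quoted text).
MASK_OFFSET_DB = 10.0            # threshold sits this far below the masker level
MASK_SLOPE_DB_PER_OCTAVE = 25.0  # threshold drop per octave away from the masker

# Upper limits taken from the quoted text.
PRE_MASK_MS = 5.0
POST_MASK_MS = 100.0


def masking_threshold_db(freq_hz, masker_hz, masker_db):
    """Toy masking threshold at freq_hz produced by one masker tone."""
    octaves_away = abs(math.log2(freq_hz / masker_hz))
    return masker_db - MASK_OFFSET_DB - MASK_SLOPE_DB_PER_OCTAVE * octaves_away


def is_frequency_masked(freq_hz, level_db, maskers):
    """True if a tone lies below the combined threshold of all maskers.

    maskers is a list of (frequency_hz, level_db) pairs. As a simplification,
    the combined curve is taken as the highest individual threshold.
    """
    threshold = max(masking_threshold_db(freq_hz, f, l) for f, l in maskers)
    return level_db < threshold


def is_temporally_masked(offset_ms):
    """True if a quiet sound falls in the pre- or post-masking window.

    offset_ms is the time relative to the transient; negative means before it.
    """
    return -PRE_MASK_MS <= offset_ms <= POST_MASK_MS


if __name__ == "__main__":
    loud_tone = [(1200.0, 60.0)]  # 1.2 kHz masker at an arbitrary 60 dB
    print(is_frequency_masked(1100.0, 42.0, loud_tone))  # True: close, 18 dB weaker
    print(is_frequency_masked(2000.0, 42.0, loud_tone))  # False: too far away
    print(is_temporally_masked(-3.0))    # True: within pre-masking
    print(is_temporally_masked(150.0))   # False: after post-masking ends
```

Real hearing models use measured, asymmetric masking curves that depend on level and frequency; this sketch only shows the shape of the reasoning.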


You can find more information here. *Quoted passages are from "ATM & MPEG-2, INTEGRATING DIGITAL VIDEO INTO BROADBAND NETWORKS" by Michael Orzessek and Peter Sommer.