Intercom Microphone & Audio Guide
A technical guide to audio specification for IP intercoms and door stations. Covers microphone types (MEMS, electret, beamforming arrays), echo cancellation and noise suppression, outdoor audio hardening, SIP integration, PoE microphone options, and the spec-sheet details that determine whether a door station delivers intelligible speech in a real commercial environment.
In This Guide
- Microphone Types
- DSP: Echo Cancellation, Noise Suppression, AGC
- Outdoor Audio Hardening
- Featured Intercoms & Door Stations
- SIP Integration & Network Audio
- PoE Microphones & Powered Audio
- Featured Paging & Speaker Systems
- Microphone Placement & Installation
- Common Mistakes
- Quick Comparison: Microphone Tiers
- Related Resources
The difference between a door station that delivers clear speech on the first try and one that requires repeat calls is almost entirely in the audio subsystem — not the camera, not the network, not the VoIP stack. Audio hardware and DSP processing decisions made at specification time determine whether a system works in a noisy urban entrance or fails to pass a speech intelligibility test.
Microphone Types
MEMS Microphones
Micro-electromechanical system (MEMS) microphones dominate modern IP intercoms. They are small (under 5 mm square), consistent in sensitivity from unit to unit, and free of the position-related variation seen in electret capsules. Digital MEMS microphones output PDM (pulse density modulation) directly to the DSP, eliminating an analog signal chain that would otherwise pick up noise. Modern door stations typically use 2-4 MEMS microphones in a beamforming array for directional pickup.
Electret Condenser
Traditional analog microphone, still common in legacy intercoms. Lower cost per unit than digital MEMS. Subject to sensitivity drift over 5-10 years and moisture intrusion in outdoor deployments. Adequate for single-microphone indoor stations; less suitable for beamforming or for outdoor use without aggressive sealing.
Beamforming Microphone Array
Two or more microphones processed by DSP to focus pickup on the talker's direction and reject off-axis noise. A 2-mic array gives basic front/side rejection. A 4-mic array enables directional tracking, where the beam follows the talker as they move. Beamforming significantly improves intelligibility in noisy environments (street-facing door stations, loading docks, industrial floors), but tracking requires the DSP to update the beam direction fast enough to keep pace with the talker's movement.
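As a rough illustration of how an array focuses on one direction, the sketch below implements delay-and-sum beamforming, one classic technique. It is not a statement of what any particular door station does. The linear array geometry, the microphone spacing, the function names, and the 343 m/s speed of sound are all assumptions chosen for the example.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, roughly dry air at 20 °C


def steering_delays(mic_positions_m, angle_deg):
    """Per-microphone delays (seconds) that time-align a plane wave arriving
    from angle_deg on a linear array (0 = straight ahead of the array)."""
    theta = math.radians(angle_deg)
    # Arrival time of the wavefront at each mic, relative to position 0.
    arrivals = [-x * math.sin(theta) / SPEED_OF_SOUND_M_S for x in mic_positions_m]
    # Delay every channel so it lines up with the latest arrival.
    latest = max(arrivals)
    return [latest - a for a in arrivals]


def delay_and_sum(channels, delays_s, sample_rate_hz):
    """Shift each channel by its delay (rounded to whole samples) and average."""
    shifts = [round(d * sample_rate_hz) for d in delays_s]
    length = len(channels[0])
    out = []
    for n in range(length):
        total = 0.0
        for channel, shift in zip(channels, shifts):
            idx = n - shift
            if 0 <= idx < length:
                total += channel[idx]
        out.append(total / len(channels))
    return out
```

Sound from the steered direction adds coherently after alignment, while off-axis sound adds out of phase and is attenuated. A tracking array would recompute the delays as the estimated talker angle changes.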
Boundary & Pressure-Zone Microphones
Flat microphones designed to mount on a wall or ceiling surface, using the boundary to reinforce sound pickup. Used in conference intercoms and large-room paging pickup. Less common for door stations but occasionally used for interior vestibule applications. Wide pickup pattern picks up everything — including noise — unless paired with aggressive DSP.
DSP: Echo Cancellation, Noise Suppression, AGC
Every IP intercom includes some level of digital signal processing. The quality of the DSP is often more important than the microphone itself. Three DSP functions are critical:
Acoustic Echo Cancellation (AEC)
Any full-duplex intercom plays speaker audio and picks up microphone audio simultaneously. Without AEC, the speaker output leaks into the microphone and creates feedback or echo for the remote talker. Modern AEC uses adaptive filters with double-talk detection; the filter's tail length sets how much delayed echo it can model. Spec sheets reference AEC tail length: 64 ms is adequate for small enclosures, 128-200 ms for wall-mounted stations, and 256+ ms for industrial environments. Full-duplex performance depends primarily on AEC quality.
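To make the adaptive-filter idea concrete, here is a minimal sketch of a normalized LMS (NLMS) echo canceller, a common textbook approach. It assumes a time-domain FIR filter with one tap per sample of tail length and has no double-talk detection. Real products may use frequency-domain or proprietary designs. The function names and the step size `mu` are illustrative assumptions.

```python
def tail_taps(tail_ms, sample_rate_hz):
    """Number of FIR taps a time-domain filter needs to span the tail length."""
    return round(tail_ms * sample_rate_hz / 1000)


def nlms_echo_canceller(far_end, mic, taps, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the speaker echo from the mic signal.

    far_end: samples sent to the loudspeaker.
    mic:     samples captured by the microphone (echo plus any near-end speech).
    Returns the residual signal after echo removal.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate
        # Normalizing by input energy keeps the step stable across signal levels.
        norm = sum(h * h for h in history) + eps
        step = mu * error / norm
        weights = [w + step * h for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

Under these assumptions, a 128 ms tail at a 16 kHz sample rate needs `tail_taps(128, 16000)` = 2048 taps, which is why longer tails cost more processing.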
Noise Suppression (NS)
Suppresses steady-state background noise (HVAC, traffic, machine hum) without distorting speech. Modern NS uses spectral subtraction or machine-learning models trained on speech separation. Adjustable aggressiveness (typically 3-5 levels). Over-aggressive NS cuts into speech; under-aggressive leaves noise. For outdoor deployments, specify aggressive noise suppression with manual tuning capability.
Automatic Gain Control (AGC)
Compensates for varying distance between the talker and microphone by raising gain on distant talkers and reducing gain on close talkers. Standard on all modern intercoms. Spec sheets rarely reveal AGC tuning detail, but its effectiveness determines whether a shouting near-talker and a soft distant-talker both sound consistent to the remote party.
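The behavior described above can be sketched as a simple feed-forward AGC: track the signal envelope, then scale toward a target level with a gain ceiling. This is only an illustration. The target level, gain limit, and attack/release coefficients are assumed values, and vendor implementations are generally not published.

```python
def simple_agc(samples, target_level=0.1, max_gain=20.0, attack=0.2, release=0.01):
    """Scale samples so the tracked envelope approaches target_level.

    attack/release are per-sample smoothing coefficients (assumed values):
    the envelope rises quickly on loud input and falls slowly afterwards.
    max_gain stops the AGC from amplifying background noise without limit.
    """
    envelope = 0.0
    out = []
    for s in samples:
        level = abs(s)
        coeff = attack if level > envelope else release
        envelope += coeff * (level - envelope)
        gain = min(max_gain, target_level / max(envelope, 1e-9))
        out.append(s * gain)
    return out
```

With these settings, a steady soft input at amplitude 0.01 and a steady loud input at 0.8 both settle near the same 0.1 output amplitude, which is the consistency the remote party hears. The first few samples overshoot while the envelope catches up.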
Wideband Audio & Codec Choice
Wideband codecs carry more of the speech band than G.711: G.722 carries 50-7000 Hz versus the 300-3400 Hz narrowband of G.711, and Opus supports wideband and wider modes. G.722 delivers substantially improved intelligibility at the same 64 kbit/s bitrate as G.711. Modern SIP door stations support wideband natively. Verify that your VoIP platform or PBX accepts the wideband codec; transcoding to narrowband at a gateway negates the benefit.
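For network planning, the per-call bandwidth can be estimated from the codec bitrate plus packet headers. The sketch below assumes RTP over UDP over IPv4 (12 + 8 + 20 header bytes), a 20 ms packetization interval, and no SRTP or Ethernet overhead. The 32 kbit/s Opus figure in the usage note is just an example setting, since Opus bitrate is configurable.

```python
RTP_UDP_IPV4_HEADER_BYTES = 12 + 8 + 20  # RTP + UDP + IPv4, no options


def rtp_stream_kbps(codec_bitrate_kbps, ptime_ms=20):
    """One-way IP-layer bandwidth of an RTP stream in kbit/s."""
    payload_bytes = codec_bitrate_kbps * 1000 / 8 * ptime_ms / 1000
    packets_per_second = 1000 / ptime_ms
    total_bytes = payload_bytes + RTP_UDP_IPV4_HEADER_BYTES
    return total_bytes * 8 * packets_per_second / 1000
```

Under these assumptions, G.711 and G.722 at 64 kbit/s both come to `rtp_stream_kbps(64)` = 80 kbit/s per direction, so moving to G.722 costs no extra bandwidth. Opus at an example 32 kbit/s comes to 48 kbit/s.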
Outdoor Audio Hardening
Wind Noise Rejection
Wind on an unprotected microphone produces low-frequency roar that saturates the input. Outdoor intercoms use a mechanical wind shield (foam, mesh, or a labyrinthine port) plus DSP high-pass filtering. Spec sheets reference wind noise rejection performance in dB. At least 15 dB of rejection is needed for urban street-facing deployments.
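The DSP half of that approach can be illustrated with a first-order high-pass filter, which attenuates low-frequency rumble while passing most of the speech band. This is a minimal sketch. The 150 Hz cutoff used in the usage note is an assumed value for illustration, and production firmware typically uses steeper, tuned filters.

```python
import math


def high_pass(samples, cutoff_hz, sample_rate_hz):
    """First-order (RC-style) high-pass filter.

    Frequencies well below cutoff_hz are attenuated; those well above
    pass nearly unchanged.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = []
    prev_in = 0.0
    prev_out = 0.0
    for x in samples:
        y = alpha * (prev_out + x - prev_in)
        out.append(y)
        prev_in, prev_out = x, y
    return out
```

For example, `high_pass(frame, cutoff_hz=150, sample_rate_hz=16000)` removes a constant (DC) offset entirely over time and leaves high-frequency content almost untouched. Filtering cannot undo clipping, so the mechanical wind shield still has to keep the input out of saturation.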
Water Ingress
IP65 or higher is the minimum for outdoor intercoms. IP66 handles powerful water jets. IP67 handles brief immersion. Verify the microphone port is specifically sealed — some IP66-rated housings have an unsealed microphone port relying on the port's labyrinth to block water. Inspection of the manufacturer's mechanical spec is more reliable than the IP number alone.
Temperature Operating Range
Outdoor intercoms face -30 to +70°C in extreme climates. Verify that the published operating temperature range covers your lowest winter and highest summer exposure. Sealed intercoms with integrated heaters are available for arctic installations; the heater power draw listed on the spec sheet affects the PoE class required.
Vandal Resistance (IK)
IK08 handles tampering (5 J impact). IK10 handles deliberate attack with tools (20 J). Public-facing and low-mount door stations in urban environments benefit from IK10 as standard. A reinforced microphone grille prevents screwdriver penetration that would defeat IP66 sealing.
Featured Intercoms & Door Stations
IP intercoms, SIP door stations, and related audio endpoints for commercial entry, multi-tenant, and secured-area applications. Verify SIP platform compatibility and PoE class before deployment.
SIP Integration & Network Audio
SIP Registration Modes
SIP door stations register to a PBX (on-premises like 3CX, Asterisk-based FreePBX, or cloud like RingCentral, 8x8) or work peer-to-peer between stations without a PBX. On-premises PBX integration enables routing to any IP phone in the organization, after-hours call trees, and recording. Verify the specific PBX is on the manufacturer's tested list.
SRTP & TLS Security
Modern SIP door stations support SRTP (encrypted audio stream) and TLS (encrypted signaling). Required for deployments that carry SIP traffic over the public internet or across untrusted networks. Verify certificate management — self-signed certificates are acceptable for closed networks; CA-signed certificates are needed for cross-organization SIP.
Multicast for Paging
For one-to-many paging across multiple paging speakers, multicast audio reduces network load versus per-endpoint unicast. Most commercial door station manufacturers support multicast paging alongside SIP. Confirm your network switches support IGMP snooping to contain multicast traffic.
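The load difference is simple arithmetic, sketched below. The function name is hypothetical. The example assumes an 80 kbit/s stream, which is roughly G.711 at 20 ms packetization including IP/UDP/RTP headers.

```python
def paging_source_load_kbps(stream_kbps, speaker_count, multicast):
    """Bandwidth leaving the paging source for one page.

    Unicast sends a separate copy to every speaker. Multicast sends one
    copy and lets the switches replicate it to ports that joined the group.
    """
    if speaker_count < 0:
        raise ValueError("speaker_count must be non-negative")
    if multicast:
        return stream_kbps if speaker_count > 0 else 0
    return stream_kbps * speaker_count
```

With 50 speakers and the assumed 80 kbit/s stream, unicast puts 4000 kbit/s on the source uplink while multicast puts 80 kbit/s. Without IGMP snooping, switches typically flood multicast to every port, which is why the switch check above matters.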
VLAN & QoS
Audio traffic is sensitive to packet loss and jitter. Place intercoms on a voice VLAN with DSCP 46 (EF) marking. Confirm switch QoS policies prioritize the voice VLAN. Mixing intercom traffic with general data traffic produces dropped words during congestion — a problem that is hard to diagnose after go-live.
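On the endpoint side, DSCP marking means setting the upper six bits of the IP header's TOS/Traffic Class byte. The sketch below shows the bit arithmetic and how an application could request the marking on a UDP socket using the standard `IP_TOS` socket option. The helper names are illustrative. Some operating systems ignore or restrict this option, and switches only honor the marking if configured to trust it.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, the usual marking for voice media


def dscp_to_tos(dscp):
    """Convert a 6-bit DSCP value to the 8-bit TOS byte (DSCP in the top 6 bits)."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0-63")
    return dscp << 2


def open_marked_udp_socket(dscp=DSCP_EF):
    """Create an IPv4 UDP socket that asks the OS to mark outgoing packets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

DSCP 46 corresponds to a TOS byte of 184 (0xB8), which is the value to look for in a packet capture when checking that the marking survives across the network.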
PoE Microphones & Powered Audio
PoE Power Classes
802.3af (15.4 W at the switch port) powers most indoor SIP intercoms. 802.3at (30 W) is needed for door stations with integrated cameras, heaters, or high-output speakers. 802.3bt (60 W and above) is for specialty stations with IR illumination and heaters. Confirm the switch delivers the required class at the port and that its total PoE budget covers every connected device; an oversubscribed PoE switch may downgrade a port to a lower class silently. Use the PoE Power Budget Calculator.
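A first-pass budget check can be sketched as follows. It is a simplified illustration and not the PoE Power Budget Calculator referenced above. It assumes the switch reserves the full per-port maximum for each standard, which is a conservative worst case. The dictionary keys and function name are hypothetical.

```python
# Maximum power sourced at the switch port for each standard (watts).
PSE_PORT_WATTS = {
    "802.3af": 15.4,
    "802.3at": 30.0,
    "802.3bt-type3": 60.0,
}


def poe_budget_check(device_standards, switch_budget_w):
    """Return (required_watts, fits) for a list of per-device PoE standards.

    Assumes worst-case allocation: each port reserves its standard's maximum.
    """
    required = sum(PSE_PORT_WATTS[std] for std in device_standards)
    return required, required <= switch_budget_w
```

For example, four 802.3af intercoms plus two 802.3at heated door stations need about 121.6 W under this assumption, so a switch with a 120 W PoE budget would be oversubscribed even though it has enough ports.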
PoE-Powered Ceiling Microphones
For conference intercom pickup or wide-area audio (e.g., building-wide intercom paging), PoE-powered ceiling microphones capture audio at distributed positions. Specify a microphone with on-device DSP (not just a passive capsule streaming raw audio), because the network bandwidth penalty of unprocessed streams multiplied across a large deployment is substantial.
Speakers & Amplified Output
Door station speakers are integrated, but external paging speakers range from small ceiling units (2-6W) to public address horns (20-100W). PoE-powered speakers in the 2-30W range are common; higher-power external speakers need AC power or dedicated amplifiers. See paging speakers.
Powered Intercom Accessories
Secondary devices, such as additional stations, relay boards for door strike control, and talk-back stations, may require their own power. Plan PoE or local power for each one, and confirm whether the intercom platform supports a single station with multiple accessories or whether each accessory needs a separate station license.
Featured Paging & Speaker Systems
Paging speakers, horns, and amplified audio endpoints for building-wide paging, emergency notification, and background music. Pair with SIP door stations for unified audio.
Microphone Placement & Installation
Height & Angle for Door Stations
Mount door stations at 120-150 cm from the ground so an average adult's mouth is within 30 cm of the microphone. Too high produces distant, echoey audio. Too low catches foot traffic noise. For wheelchair-accessible installations, the button operating height is typically 110 cm; the microphone should be within pickup range at that height.
Proximity to Reflective Surfaces
Large glass storefronts, polished stone lobbies, and metal panel cladding create audio reflections that echo back into the microphone. Position the door station at least 1 m from large reflective surfaces, or specify DSP tuned for reflective environments.
Wind Direction & Airflow
Outdoor positions exposed to constant wind (vestibule entrances, roof-access doors) need additional wind protection. Mount with prevailing wind hitting the side of the enclosure rather than the front, or add supplemental external wind shield if the manufacturer offers one.
Multi-Station Building Coverage
Buildings with multiple entrances, loading docks, or secure zones need multiple intercoms. Use a common SIP platform with extension-based routing so calls from any station reach the correct answering position. Plan extension numbering scheme early. See the Intercom & Door Station Buying Guide.
Common Mistakes
- Specifying on camera alone. A door station with a 4K camera and a single electret microphone fails the intelligibility test in a noisy street-facing position. Audio specification is the primary driver of user experience.
- Ignoring AEC tail length. Short AEC tail (under 128 ms) produces echo in wall-mounted installations. Specify at least 128 ms for indoor stations and 200+ ms for outdoor.
- Using narrowband codec. G.711 narrowband loses intelligibility in noisy environments. Specify G.722 or Opus wideband with end-to-end support.
- Skipping QoS configuration. Intercoms sharing a data VLAN with other traffic experience audio glitches during peak load. Configure voice VLAN with DSCP 46 before go-live.
- Mounting too high. A door station at 180 cm (above head height for most users) puts the microphone 30+ cm from most users' mouths and produces distant audio. 120-150 cm is the correct range.
- Forgetting heater power. Cold-climate installations need heated stations. Heater power consumption pushes PoE class from 802.3af to 802.3at or bt.
- No plan for after-hours calls. A door station registered only to the main PBX extension rings unanswered after hours. Specify a call tree with an answering service or after-hours fallback.
- Ignoring multicast for paging. One-to-many paging over unicast scales poorly. Use multicast with IGMP snooping on the switches.
Quick Comparison: Microphone Tiers
| Specification | Budget | Mid-Range | Premium |
|---|---|---|---|
| Microphone | Single electret | 2-mic MEMS array | 4-mic beamforming array |
| AEC Tail | 64 ms | 128 ms | 256 ms |
| Noise Suppression | Basic spectral | Advanced spectral | ML-based |
| Wideband Codec | None (G.711 only) | G.722 | Opus + G.722 |
| IP / IK Rating | IP54 / IK07 | IP65 / IK08 | IP66+ / IK10 |
| PoE Class | 802.3af | 802.3af/at | 802.3at/bt |
| SIP / TLS Security | SIP only (unencrypted) | SIP + SRTP | SIP + SRTP + TLS |
| Best For | Indoor quiet | Standard commercial | Outdoor noisy |
Ready to Specify Intercom Audio?
Share your installation environment (indoor/outdoor, noise level, climate), SIP platform, and station count. We will recommend a microphone class, DSP tier, PoE class, and mount plan matched to your intelligibility target.







