|
Network Working Group Request for Comments: 4184 Category: Standards Track |
B. Link T. Hager Dolby Laboratories J. Flaks Microsoft Corporation October 2005 |
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.
Copyright © The Internet Society (2005).
This document describes an RTP payload format for transporting audio data using the AC-3 audio compression standard. AC-3 is a high quality, multichannel audio coding system that is used for United States HDTV, DVD, cable television, satellite television and other media. The RTP payload format presented in this document includes support for data fragmentation.
AC-3 [ATSC] is a high-quality audio codec (audio coding format) designed to encode multiple channels of audio into a low bit-rate format. AC-3 achieves its large compression ratios via encoding a multiplicity of channels as a single entity. Dolby Digital, which is a branded version of AC-3, encodes up to 5.1 channels of audio.
AC-3 has been adopted as an audio compression scheme for many consumer and professional applications. It is a mandatory audio codec for DVD-video, Advanced Television Standards Committee (ATSC) digital terrestrial television and Digital Living Network Alliance (DLNA) home networking, as well as an optional multichannel audio format for DVD-audio.
There is a need to stream AC-3 data over IP networks. The Internet Real Time Protocol (RTP) provides a mechanism for stream
synchronization and hence serves as the best transport solution for AC-3, which is primarily used in audio-for-video applications. Applications for streaming AC-3 include streaming movies from a home media server to a display, video on demand, and multichannel Internet radio.
Section 2 gives a brief overview of the AC-3 algorithm. Section 3 specifies values for fields in the RTP header, while Section 4 specifies the AC-3 payload format. Section 5 discusses media types and SDP usage. Security considerations are covered in Section 6, congestion control in Section 7, and IANA considerations in Section 8. References are given in Sections 9 and 10.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119 [RFC2119].
AC-3 can deliver up to 5.1 channels of audio at data rates approximately equal to half of one PCM channel [ATSC], [1994AC3], [1996AC3]. The ".1" refers to a band-limited, optional, low- frequency effects (LFE) channel. AC-3 was designed for signals sampled at rates of 32, 44.1, or 48 kHz. Data rates can vary between 32 kbps and 640 kbps, depending on the number of channels and the desired quality.
AC-3 exploits psycho-acoustic phenomena that cause a significant fraction of the information contained in a typical audio signal to be inaudible. Substantial data reduction occurs via the removal of inaudible information contained in an audio stream. Source coding techniques are further used to reduce the data rate.
Like most perceptual coders, AC-3 operates in the frequency domain. A 512-point TDAC transform is taken with 50% overlap, providing 256 new frequency samples. Frequency samples are then converted to exponents and mantissas. Exponents are differentially encoded. Mantissas are allocated a varying number of bits depending on the audibility of the associated spectral components. Audibility is determined via a masking curve. Bits for mantissas are allocated from a global bit pool.
AC-3 bit streams are organized into synchronization frames. Each AC-3 frame contains a Synchronization Information (SI) field, a Bit Stream Information (BSI) field, and 6 audio blocks (ABs) that each represent 256 PCM samples for all channels. The frame ends with an
optional auxiliary data field (AUX) and an error correction field (CRC). The entire frame represents the time duration of 1536 PCM samples across all coded channels [ATSC]. AC-3 encodes audio sampled at 32 kHz, 44.1 kHz, and 48 kHz. From Annex A, Part 2, of [ATSC], the time duration of an AC-3 frame varies with the sampling rate as follows:
Sampling rate Frame duration
_____________________________________
48 kHz 32 ms
44.1 kHz approx. 34.83 ms
32 kHz 48 ms
Figure 1 shows the AC-3 frame format.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |SI |BSI| AB0 | AB1 | AB2 | AB3 | AB4 | AB5 |AUX|CRC| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 1. AC-3 Frame Format
The Synchronization Information field contains information needed to acquire and maintain codec synchronization. The Bit Stream Information field contains parameters that describe the coded audio service [ATSC]. Each audio block contains fields that indicate the use of various coding tools: block switching, dither, coupling, and exponent strategy. They also contain metadata, optionally used to enhance the playback, such as dynamic range control. Finally, the exponents and bit allocation data needed to decode the mantissas into audio data, and the mantissas themselves, are included. The placement of these fields in an AC-3 audio block is shown in Figure 2. The fields shown in this figure are described in detail in [ATSC]. Note that field sizes vary depending on the coded data.
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Block |Dither |Dynamic |Coupling |Coupling |Exponent | | Switch |Flags |Range Ctrl |Strategy |Coordinates |Strategy | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Exponents | Bit Allocation | Mantissas | | | Parameters | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 2. AC-3 Audio Block Format
This payload format is defined for AC-3, as defined in the main part and Annex D of [ATSC]. It is not defined for E-AC-3, as defined in Annex E of [ATSC], and it MUST NOT be used to carry E-AC-3.
According to [RFC2736], RTP payload formats should contain an integral number of application data units (ADUs). The AC-3 frame corresponds to an ADU, in the context of this payload format. Each RTP payload MUST start with the two-byte payload header followed by an integral number of complete AC-3 frames or by a single fragment of an AC-3 frame.
If an AC-3 frame exceeds the MTU for a network, it SHOULD be fragmented for transmission within an RTP packet. Section 4.2 provides guidelines for creating frame fragments.
There is a two-octet Payload Header at the beginning of each payload.
Each AC-3 RTP payload MUST begin with the payload header shown in Figure 3.
0 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MBZ | FT| NF |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 3. AC-3 RTP Payload Header
0 - One or more complete frames.
1 - Initial fragment of frame which includes the first 5/8ths of
the frame. (See Section 4.2.)
2 - Initial fragment of frame, which does not include the first
5/8ths of the frame.
3 - Fragment of frame other than initial fragment. (Note that M
bit in RTP header is set for final fragment).
Figure 4 shows the full AC-3 RTP payload format.
+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+
| Payload | Frame | Frame | | Frame |
| Header | (1) | (2) | | (n) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+- .. +-+-+-+-+
Figure 4. Full AC-3 RTP payload
When receiving AC-3 payloads with FT = 0 and more than a single frame (NF > 1), a receiver needs to use the "frmsizecod" field in the Synchronization Information (syncinfo) block in each AC-3 frame to determine the frame's length. That way a receiver can determine the boundary of the next frame. Note that the frame length may vary from frame to frame.
The size of an AC-3 frame depends on the sample rate of the audio and the data rate used by the encoder (which are indicated in the "Synchronization Information" header in the AC-3 frame). The size of a frame, for a given sample rate and data rate, is specified in Table 5.18 ("Frame Size Code Table") of [ATSC]. This table shows that AC-3 frames range in size from a minimum of 128 bytes to a maximum of 3840 bytes. If the size of an AC-3 frame exceeds the MTU size, the frame SHOULD be fragmented at the RTP level. The fragmentation MAY be performed at any byte boundary in the frame. RTP packets containing fragments of the same AC-3 frame SHALL be sent in consecutive order, from first to last fragment. This enables a receiver to assemble the fragments in correct order.
When an AC-3 frame is fragmented, it MAY be fragmented such that at least the first 5/8ths of the frame data is in the first fragment. This provides greater resilience to packet loss. This initial portion of any frame is guaranteed to contain the data necessary to decode the first two blocks of the frame. Any frame fragments other than those containing the first 5/8ths of frame data are only decodable once the complete frame is received. The 5/8ths point of the frame is defined in Table 7.34 ("5/8_frame Size Table") of [ATSC].
This registration uses the template defined in [DRAFT-FREED] and follows RFC 3555 [RFC3555].
rate: The RTP timestamp clock rate that is equal to the audio sampling rate. Permitted rates are 32000, 44100, and 48000.
channels: From a sender, the maximum number of channels present in the AC3 stream. From a receiver, the maximum number of output channels the receiver will deliver. This MUST be a number between 1 and 6. The LFE (".1") channel MUST be counted as one channel. Note that the channel order used in AC-3 differs from the channel order scheme in [RFC3551]. The AC-3 channel order scheme can be found in Table 5.8 of [ATSC].
ptime: See RFC 2327 [RFC2327].
maxptime: See RFC 3267 [RFC3267].
This media type is framed (see section 4.8 in [DRAFT-FREED]) and contains binary data.
See Section 6 of this document.
None
This payload format specification and see [ATSC].
Multichannel audio compression of audio and audio for video.
Magic number(s):
The first two octets of an AC-3 frame are always the
synchronization word, which has the hex value 0x0B77.
Brian Link <bdl@dolby.com>
IETF AVT working group.
COMMON
This media type depends on RTP framing, and hence is only defined for transfer via RTP [RFC3550]. Transport within other framing protocols is not defined at this time.
Author/Change controller:
IETF Audio/Video Transport Working Group delegated from the IESG.
The information carried in the MIME media type specification has a specific mapping to fields in the Session Description Protocol (SDP) [RFC2327], which is commonly used to describe RTP sessions. When SDP is used to specify sessions employing AC-3, the mapping is as follows:
An example of the SDP data for AC-3:
m=audio 49111 RTP/AVP 100
a=rtpmap:100 ac3/48000/6
Certain considerations are needed when SDP is used to perform offer/answer exchanges [RFC3264].
The payload format described in this document is subject to the security considerations defined in the RTP specification [RFC3550] and in any applicable RTP profile (e.g., [RFC3551]). To protect the user's privacy and any copyrighted material, confidentiality protection would have to be applied. To also protect against modification by intermediate entities and ensure the authenticity of the stream, integrity protection and authentication would be required. Confidentiality, integrity protection, and authentication have to be provided by a mechanism external to this payload format, e.g., SRTP [RFC3711].
The AC-3 format is designed so that the validity of data frames can determined by decoders. A decoder that encounters a malformed frame discards the malformed data and conceals the errors in the audio output until a valid frame is detected and decoded. This is expected to prevent crashes and other abnormal decoder behavior in response to errors or attacks.
The general congestion control considerations for transporting RTP data apply to AC-3 audio over RTP as well. See the RTP specification [RFC3550] and any applicable RTP profile (e.g., [RFC3551]).
AC-3 encoders may use a range of bit rates to encode audio data, so it is possible to adapt network bandwidth by adjusting the encoder bit rate in real time or by having multiple copies of content encoded at different bit rates. Additionally, packing more frames in each RTP payload can reduce the number of packets sent and hence the overhead from IP/UDP/RTP headers, at the expense of increased delay and reduced robustness against packet losses.
A new media subtype has been assigned for AC-3; see Section 5.1.
[RFC2119] Bradner, S., "Key Words for use in RFCs to Indicate
Requirement Levels", RFC 2119, March 1997.
[ATSC] U.S. Advanced Television Systems Committee (ATSC),
"ATSC Standard: Digital Audio Compression (AC-3),
Revision B," Doc A/52B, June 2005.
[RFC2327] Handley, M. and V. Jacobson, "SDP: Session Description
Protocol", RFC 2327, April 1998.
[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V.
Jacobson, "RTP: A Transport Protocol for Real-Time
Applications", STD 64, RFC 3550, July 2003.
[RFC3264] Rosenberg, J. and H. Schulzrinne, "An Offer/Answer
Model with Session Description Protocol (SDP)", RFC
3264, June 2002.
[RFC3267] Sjoberg, J., Westerlund, M., Lakaniemi, A., and Q. Xie,
"Real-Time Transport Protocol (RTP) Payload Format and
File Storage Format for the Adaptive Multi-Rate (AMR)
and Adaptive Multi-Rate Wideband (AMR-WB) Audio
Codecs", RFC 3267, June 2002.
[RFC3555] Casner, S. and P. Hoschka, "MIME Type Registration of
RTP Payload Formats", RFC 3555, July 2003.
[RFC2736] Handley, M. and C. Perkins, "Guidelines for Writers of
RTP Payload Format Specifications", BCP 36, RFC 2736,
December 1999.
[RFC3551] Schulzrinne, H. and S. Casner, "RTP Profile for Audio
and Video Conferences with Minimal Control", STD 65,
RFC 3551, July 2003.
[1994AC3] Todd, C., et al., "AC-3: Flexible Perceptual Coding for
Audio Transmission and Storage," Preprint 3796,
Presented at the 96th Convention of the Audio
Engineering Society, May 1994.
[1996AC3] Fielder, L., et al., "AC-2 and AC-3: Low-Complexity
Transform-Based Audio Coding," Collected Papers on
Digital Audio Bit-Rate Reduction, pp. 54-72, Audio
Engineering Society, September 1996.
[RFC3711] Baugher, M., et al., "The Secure Real-time Transport
Protocol (SRTP)", RFC 3711, March 2004.
[DRAFT-FREED] Freed, N. and Klensin, J., "Media Type Specifications and Registration Procedures", Work in Progress, April 2005.
Brian Link
Dolby Laboratories
100 Potrero Ave
San Francisco, CA 94103
Phone: +1 415 558 0200
EMail: bdl@dolby.com
Todd Hager
Dolby Laboratories
100 Potrero Ave
San Francisco, CA 94103
Phone: +1 415 558 0136
EMail: thh@dolby.com
Jason Flaks
Microsoft Corporation
1 Microsoft Way
Redmond, WA 98052
Phone: +1 425 722 2543
EMail: jasonfl@microsoft.com
Copyright © The Internet Society (2005).
This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.
This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.
Copies of IPR disclosures made to the IETF Secretariat and any
assurances of licenses to be made available, or the result of an
attempt made to obtain a general license or permission for the use of
such proprietary rights by implementers or users of this
specification can be obtained from the IETF on-line IPR repository at
http://www.ietf.org/ipr.
The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf- ipr@ietf.org.
Funding for the RFC Editor function is currently provided by the Internet Society.