Audio/Video Transport WG                                    T. Schierl
Internet Draft                                          Fraunhofer HHI
Intended status: Standards track
Expires: July 2009                                    January 15, 2009

RTP NTP header extension for decoding order recovery in layered codecs
draft-schierl-avt-rtp-ntp-for-layered-codecs-00.txt

Status of this Memo

This Internet-Draft is submitted to IETF in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on July 15, 2009.

Copyright Notice

Copyright (c) 2009 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Schierl                  Expires July 15, 2009                [Page 1]
Internet-Draft   RTP synchronization for layered codecs   January 2009

Abstract

This memo describes an RTP header extension mechanism to be used with timestamp-based decoding order recovery of RTP flows containing layered codecs. The header extension may be most useful in the presence of clock skew as well as for early decoding order recovery.
The RTP header extension is based on [RFC5285] and extends the RTP header by the lower 56 bits of the NTP timestamp corresponding to the RTP timestamp of the same packet, as defined in [RFC3550] for the RTCP sender reports. This memo further gives guidance on how decoding order is recovered in RTP flows using the NTP timestamp information when parts of a layered, multi-view, or multi-description coded media stream are transported in different RTP flows.

Table of Contents

1. Introduction
2. Conventions
3. RTP NTP header extension for timestamp-based decoding order recovery
4. Usage for timestamp-based decoding order recovery in layered codecs
5. Signaling the RTP NTP header extension for timestamp-based decoding order recovery
6. Security Considerations
7. IANA Considerations
8. References
   8.1. Normative References
   8.2. Informative References
9. Changes Log
Author's Addresses

1. Introduction

This memo specifies the RTP NTP header extension for timestamp-based decoding order recovery, based on [RFC5285], for RTP [RFC3550] flows containing layered codecs. The header extension extends the RTP header by the lower 56 bits of the NTP timestamp corresponding to the RTP timestamp of the same packet, as defined in [RFC3550] for the RTCP sender reports.
The NTP timestamps included within the RTP header extension may be used for decoding order recovery of RTP flows containing layered codecs. One option for decoding order recovery in layered codecs is to use the NTP (sample presentation) timestamps to reorder media data contained in different RTP flows into sample decoding order. Such a timestamp-based reordering process is described in Section 4 of this memo and has been included as NI-T mode in [I-D.draft-ietf-avt-rtp-svc]. The optional header extension mechanism may also be used for other purposes that are outside the scope of this memo.

2. Conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, RFC 2119 [RFC2119].

3. RTP NTP header extension for timestamp-based decoding order recovery

The RTP header extension mechanism defined in [RFC5285] can be used to carry an OPTIONAL wall clock timestamp in NTP timestamp format [RFC1305] in RTP data packets. The RTP NTP header extension for timestamp-based decoding order recovery defined in this memo carries the lower 24 bits of the Seconds field and all 32 bits of the Fraction field of an NTP format timestamp. If such a part of an NTP timestamp is included, it MUST correspond to the same time instant as the RTP timestamp in the packet's header, and MUST be derived from the same clock used to generate the NTP format timestamps included in RTCP SR packets.

The format of the RTP NTP header extension for timestamp-based decoding order recovery (in the following: NTP header extension) is shown below.
    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|1|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |RTP
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+Hdr
   |           synchronization source (SSRC) identifier            |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |       0xBE    |    0xDE       |           length=2            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | Hd-ID |  L=6  |   NTP timestamp format - Seconds (bit 8-31)   |Extn
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+Hdr
   |          NTP timestamp format - Fraction (bit 0-31)           |
   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |                         payload data                          |RTP
   |                             ....                              |Data
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 1: RTP NTP header extension for timestamp-based decoding order recovery

Insertion of the RTP NTP header extension for timestamp-based decoding order recovery:

The NTP header extension MAY be used with a layered, multi-description, or multi-view codec, to provide exact matching of NTP timestamps between layers, descriptions, or views in different RTP flows for timestamp-based decoding order recovery. If the NTP header extension is inserted for timestamp-based decoding order recovery in a packet of a particular sampling time instance, the NTP header extension SHALL be included at least once in each of the RTP flows of the same media for this sampling time instance and the inserted NTP header extensions SHALL contain the same NTP timestamp. The frequency of inserting NTP header extensions in the RTP flows is up to the sender.
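As an illustration of the layout in Figure 1, the following Python sketch packs and parses the 12-byte extension block (the 0xBEDE marker, length=2, and a single element with L=6). It is a sketch under assumptions, not part of this specification: the function and constant names are hypothetical, and any ID value passed in is arbitrary, because Hd-ID is still to be defined in this memo and would be negotiated as described in [RFC5285].

```python
import struct
from typing import Tuple

BEDE_MARKER = 0xBEDE      # one-byte-header marker from [RFC5285]
EXT_LENGTH_WORDS = 2      # "length=2" in Figure 1: two 32-bit words follow
ELEMENT_LEN_FIELD = 6     # "L=6" in Figure 1: 7 data bytes, encoded as 7 - 1


def pack_ntp_extension(ext_id: int, ntp_seconds: int, ntp_fraction: int) -> bytes:
    """Build the extension block of Figure 1 (hypothetical helper).

    ext_id stands in for Hd-ID, which this memo leaves TBD.
    """
    if not 1 <= ext_id <= 14:
        raise ValueError("one-byte header IDs range from 1 to 14")
    seconds_low24 = ntp_seconds & 0xFFFFFF  # Seconds, bits 8-31
    first_word = (ext_id << 28) | (ELEMENT_LEN_FIELD << 24) | seconds_low24
    return struct.pack("!HHII", BEDE_MARKER, EXT_LENGTH_WORDS,
                       first_word, ntp_fraction & 0xFFFFFFFF)


def unpack_ntp_extension(block: bytes) -> Tuple[int, int, int]:
    """Return (ext_id, seconds_low24, fraction) parsed from the block."""
    marker, length, first_word, fraction = struct.unpack("!HHII", block[:12])
    if marker != BEDE_MARKER or length != EXT_LENGTH_WORDS:
        raise ValueError("not the extension layout shown in Figure 1")
    if (first_word >> 24) & 0xF != ELEMENT_LEN_FIELD:
        raise ValueError("unexpected element length field")
    return first_word >> 28, first_word & 0xFFFFFF, fraction
```

A sender following the insertion rule above would call pack_ntp_extension() with the same Seconds and Fraction values in every RTP flow for the given sampling time instance.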
If the decoding order of RTP flows is given by any means (e.g., by the mechanism defined in [I-D.draft-ietf-mmusic-decoding-dependency]), the NTP timestamp provided by the header extension MAY be used to collect data of the same sample from the RTP flows forming the sample decoding order. Section 4 gives further details about the timestamp-based decoding order recovery.

Note: The NTP header extension insertion as described above allows the receiver to find the corresponding sample of the layered media or parts thereof in all the RTP flows at the point of the NTP header extension insertion. This guarantees that any clock skew present in the NTP timestamp generation process based on RTCP sender reports is avoided; thus, this approach allows directly comparing NTP timestamps of the RTP flows. Furthermore, this approach solves the possible problem of clock skew identified for the NI-T mode as defined in [I-D.draft-ietf-avt-rtp-svc]. Such an NTP header extension insertion is only effective for clock skew elimination if it is applied in all RTP flows of the layered media at the same time. This may require the insertion of extra packets in some of the RTP flows, since in layered video codecs not all sampling instances may be present in all the flows. If such a header extension is included in all flows at a sampling time instance, the NTP timestamps for samples that follow the NTP header extension insertion point in decoding order can be constructed using the RTP timestamps and the identical reference NTP timestamps in the NTP header extensions of all RTP flows. It should be noted that the frequency of inserting the NTP header extension is crucial in the presence of clock skew, since the points of insertion may be the only points at which a receiver can start the decoding order recovery.
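The reconstruction mentioned in the note above can be sketched in Python as follows: given the reference NTP value and the RTP timestamp of the packet that carried the NTP header extension, the NTP value of another packet in the same flow is derived from its RTP timestamp. This is only an illustrative sketch. All names are hypothetical, the 56-bit value is handled as a single integer (24 Seconds bits followed by 32 Fraction bits), and the RTP clock rate is assumed to be known from the payload format and to be the same in all flows being compared.

```python
NTP56_MASK = (1 << 56) - 1


def ntp56_from_extension(seconds_low24: int, fraction: int) -> int:
    """Combine the two extension fields into one 56-bit integer."""
    return ((seconds_low24 & 0xFFFFFF) << 32) | (fraction & 0xFFFFFFFF)


def ntp56_for_rtp_timestamp(ref_ntp56: int, ref_rtp_ts: int,
                            rtp_ts: int, clock_rate: int) -> int:
    """Derive the 56-bit NTP value of a packet from a reference point.

    ref_ntp56 and ref_rtp_ts come from the packet that carried the
    NTP header extension; rtp_ts belongs to the packet of interest.
    """
    # Signed RTP distance with 32-bit wraparound. It can be negative,
    # because decoding order may differ from presentation order.
    ticks = (rtp_ts - ref_rtp_ts) & 0xFFFFFFFF
    if ticks >= 1 << 31:
        ticks -= 1 << 32
    # Convert ticks to NTP fraction units (2**-32 seconds).
    offset = (ticks << 32) // clock_rate
    return (ref_ntp56 + offset) & NTP56_MASK
```

Because every flow starts from an identical reference NTP value, samples with the same sampling time yield identical results in all flows, which is what the exact timestamp matching in Section 4 relies on.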
If the NTP header extension is included, regular RTCP SR packets MUST be sent to provide backwards compatibility with receivers that synchronize RTP flows according to [RFC3550]. The sender reports are also required to obtain the upper 8 bits of the Seconds field of the NTP format timestamp, which are not included in the NTP header extension.

[Ed. Note: TBD - define Hd-ID]

4. Usage for timestamp-based decoding order recovery in layered codecs

If parts or complete samples of a layered codec are transported as different RTP flows in different RTP streams and/or as different RTP sessions, a decoding order recovery process is typically required to reorder the samples or parts of samples received. Such a mechanism may be based on the NTP presentation timestamp, which can be derived from the RTP timestamp using the NTP wall clock provided in the RTCP sender report packets [RFC3550]. The header extension defined in this memo allows the receiver to tune in before the reception of such a sender report if the header extension is provided earlier in the RTP flow. It may also be the only way to allow correct decoding order recovery based on exact matching of NTP timestamps when clock skew is present in the timestamps used for generating RTCP sender report packets.

Since for layered video codecs such as SVC [I-D.draft-ietf-avt-rtp-svc] the decoding order is typically not equal to the presentation order of the media samples, media samples or parts of media samples cannot simply be ordered according to the presentation timestamp order.
For this reason, if media samples or parts of media samples of a layered, multi-view, or multi-description codec are transported in different RTP flows, the following rules SHOULD be kept for sending such flows:

Note: The following rules are typically kept for layered audio codecs, which allows using the same algorithm for decoding order recovery of audio samples.

o The decoding order of media samples or parts of the media samples transported in different RTP flows SHOULD be derivable by any means. This can be accomplished, e.g., by using the mechanisms defined in [I-D.draft-ietf-mmusic-decoding-dependency] if the sample data or parts of the sample data are transported in different RTP sessions, or by any other means.

o Following the decoding order of RTP flows as described above, an RTP flow containing sample data that must be accessed and/or decoded before decoding sample data of a second RTP flow is called a lower RTP flow with respect to the second RTP flow. The second RTP flow, whose decoding process requires accessing and/or decoding the sample data of the lower RTP flow, is called the higher RTP flow. The lowest RTP flow is the flow that does not require the presence of any other data. For each two RTP flows, the following rules SHOULD be true in order to allow decoding order recovery based on matching NTP timestamps present in the different RTP flows:

   - The order of the RTP samples within an RTP flow is equal to the decoding order.

   - A higher RTP flow contains all the same sampling instances of the lower RTP flow. A higher RTP flow may contain additional sampling instances.

Note: In some cases, it may be required to add packets in higher RTP flows in order to satisfy the second rule above. This may be achieved by placing empty RTP packets (containing padding data only) or by other payload means as, e.g.
the Empty NAL unit packet as defined in [I-D.draft-ietf-avt-rtp-svc]. If a packet must be inserted to satisfy the above rule, the NTP timestamp of such an inserted packet must be set equal to the NTP timestamp of a packet of the access unit present in any lower RTP flow and the lowest RTP flow. This is easy to accomplish if the packet can be inserted at the time of the RTP stream generation, since the media timestamp (NTP timestamp) must be the same for the inserted packet and the packet of the corresponding sample. If there is no knowledge of the media time at RTP flow generation, or if the RTP flows are not generated at the same instance, the insertion can also be applied later in the transmission process. In this case, the RTP timestamp of the inserted packet, which yields the matching NTP timestamp, can be calculated as follows.

Assume that a packet A2 of an access unit with RTP timestamp TS_A2 is present in the lowest RTP flow A, and that no packet of that access unit is present in RTP flow B, as shown in Figure 2. Thus, a packet B2 must be inserted into flow B following the rule above. The most recent RTCP sender report in flow A carries the NTP timestamp NTP_A and the RTP timestamp TS_A. The sender report in flow B with a lower NTP timestamp than NTP_A carries the NTP timestamp NTP_B and the RTP timestamp TS_B.

   RTP flow B:..B0........B1........(B2)......................
   RTCP flow B:......SR(NTP_B,TS_B)............................

   RTP flow A:..A0........A1........A2........................
   RTCP flow A:..................SR(NTP_A,TS_A)................

   -----------------|--x------|-----x---|------------------------>
                                                          NTP time
   --------------------+<---------->+<->+------------------------>
                             t1      t2             RTP TS(B) time

Figure 2 Example calculation of RTP timestamp for packet insertion in a higher RTP flow

The vertical bars ("|") in the NTP timeline in the figure above indicate that sample data is present in at least one of the flows. The "x" marks indicate the times of the sender reports. The RTP timestamp timeline for flow B, shown right below the NTP timeline, indicates two time segments, t1 and t2. t1 is the time difference between the sender reports of the two flows, expressed in RTP timestamp clock ticks, and t2 is the time difference from the flow A sender report to the A2 packet, again expressed in RTP timestamp clock ticks. The sum of these differences is added to the RTP timestamp of the sender report from flow B in order to derive the correct RTP timestamp for the inserted packet B2. In other words:

   TS_B2 = TS_B + t1 + t2

Let toRTP() be a function that calculates the RTP time difference (in clock ticks of the used clock) given an NTP timestamp difference, and effRTPdiff() be a function that calculates the effective difference ts1 - ts2 between two RTP timestamps, including wraparounds:

   effRTPdiff( ts1, ts2 ):

      if( ts1 >= ts2 ) then
         effRTPdiff := ts1 - ts2
      else
         effRTPdiff := (4294967296 + ts1) - ts2

We have:

   t1 = toRTP(NTP_A - NTP_B) and t2 = effRTPdiff(TS_A2, TS_A)

Hence, the RTP timestamp TS_B2 for the inserted packet B2 can be calculated as follows:

   TS_B2 = TS_B + toRTP(NTP_A - NTP_B) + effRTPdiff(TS_A2, TS_A)

[Ed. Note: TBD - Add similar text as shown above on inserting NTP timestamps in NTP header extensions.]
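The calculation of TS_B2 can be sketched in Python as shown below. The sketch rests on assumptions that are not stated in this memo: NTP_A and NTP_B are given as 64-bit NTP timestamps (in units of 2**-32 seconds), both flows use the same RTP clock rate, toRTP() rounds to the nearest tick, and the result is reduced modulo 2**32 so that it fits the RTP timestamp field. Here eff_rtp_diff() computes (ts1 - ts2) modulo 2**32, so that t2 is the forward distance from TS_A to TS_A2. The function names are hypothetical.

```python
RTP_TS_MODULUS = 1 << 32  # RTP timestamps are 32-bit and wrap around


def to_rtp(ntp_diff: int, clock_rate: int) -> int:
    """Convert a difference of 64-bit NTP timestamps into RTP ticks.

    ntp_diff is in units of 2**-32 s; rounding to the nearest tick
    is an assumption of this sketch.
    """
    return (ntp_diff * clock_rate + (1 << 31)) >> 32


def eff_rtp_diff(ts1: int, ts2: int) -> int:
    """Effective difference ts1 - ts2, including wraparound."""
    return (ts1 - ts2) % RTP_TS_MODULUS


def inserted_rtp_timestamp(ts_b: int, ntp_a: int, ntp_b: int,
                           ts_a2: int, ts_a: int, clock_rate: int) -> int:
    """TS_B2 = TS_B + toRTP(NTP_A - NTP_B) + effRTPdiff(TS_A2, TS_A)."""
    t1 = to_rtp(ntp_a - ntp_b, clock_rate)  # distance between the two SRs
    t2 = eff_rtp_diff(ts_a2, ts_a)          # flow A SR to packet A2
    return (ts_b + t1 + t2) % RTP_TS_MODULUS
```

For example, with a 90 kHz clock, sender reports 2.5 seconds apart, and A2 located 32 ticks after the flow A sender report, the inserted packet B2 receives the timestamp TS_B + 225000 + 32.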
The above rules allow the receiver to process the data of the RTP flows as follows:

o Go through all received RTP flows starting with the highest RTP flow, and aggregate the sample data or parts of the sample data with the same NTP timestamp in the order of RTP flows, starting from the lowest RTP flow up to the highest RTP flow received, into the sample with the NTP timestamp present in the highest RTP flow. The NTP timestamps MAY be derived using RTCP sender reports or MAY be directly taken from the NTP header extension. The order of RTP flows may, e.g., be indicated by mechanisms as defined in [I-D.draft-ietf-mmusic-decoding-dependency] or by any other implicit or explicit means. Repeat the aforementioned process for each different NTP timestamp present in the highest RTP flow.

Informative example:

The example shown in Figure 3 refers to three RTP flows A, B, and C containing a layered, a multi-view, or a multi-description media stream. In the example, the dependency signaling as defined in [I-D.draft-ietf-mmusic-decoding-dependency] indicates that flow A is the lowest RTP flow, B is the first higher RTP flow and depends on A, and C is the second higher RTP flow relative to flow A and depends on A and B. A picture coding prediction structure is used that results in samples present in higher flows but not present in all lower flows. Flow A has the lowest frame rate, and flows B and C have the same but higher frame rate. The figure shows parts of video samples contained in RTP packets, which are stored in the de-jittering buffer at the receiver for de-packetization. The parts of the video samples are already re-ordered according to their RTP sequence number order. The figure indicates for the received sample parts the decoding order within the flows, as well as the associated media (NTP) timestamps ("TS[..]").
Parts of the same video sample share the same media timestamp TS, which is shown at the bottom of the figure. Note that the timestamps are not in increasing order since, in this example, the decoding order is different from the output/display order.

The process first proceeds to the sample parts associated with the first media timestamp TS[1] present in the highest flow C and removes/ignores all sample parts that precede (in decoding order) the sample parts with TS[1] in each of the de-jittering buffers of RTP flows A, B, and C. Then, starting from flow C, the first media timestamp available in decoding order (TS[1]) is selected, and sample parts starting from RTP flow A, followed by flows B and C, are placed in the order of the RTP flow dependency as indicated by mechanisms defined in [I-D.draft-ietf-mmusic-decoding-dependency] (in the example for TS[1]: first flow B and then flow C) into the video sample AU(TS[1]) associated with media timestamp TS[1]. Then the next media timestamp TS[3] in order of appearance in the highest RTP flow C is processed, and the process described above is repeated. Note that there may be video samples with no sample parts present, e.g., in the lowest RTP flow A (see, e.g., TS[1]). With TS[8], the first video sample with sample parts present in all the RTP flows appears in the buffers.

   C: ------------(1)----(2)---(3)---(4)----(5)--(6)---(7)----(8)----
        |     |    |      |     |     |      |    |     |      |
   B: -(1)---(2)--(3)----(4)---(5)---(6)----(7)--(8)---(9)---(10)----
        |     |                 |     |                 |      |
   A: -------(1)---------------(2)---(3)---------------(4)----(5)----
   ---------------------------------------------------decoding order-->
   TS: [4]   [2]  [1]    [3]   [8]   [6]    [5]  [7]   [12]  [10]

Figure 3 Example of decoding order recovery with multiple RTP flows.
Key:

A, B, C - RTP flows

Integer values in "()" - Video sample / part of video sample decoding order within the RTP flow

"|" - indicates corresponding samples / parts of sample of the same video sample AU(TS[..]) in the RTP flows

Integer values in "[]" - media timestamp TS, sampling time as derived, e.g., from the NTP timestamp associated with the video sample AU(TS[..]), consisting of sample parts in the flows above.

5. Signaling the RTP NTP header extension for timestamp-based decoding order recovery

The use of the NTP header extension MUST be signaled as defined in [RFC5285].

[Ed. Note: TBD - URI, Hd-ID for the NTP header extension needs to be defined, e.g. URI: "a=extmap:Hd-ID urn:ietf:params:rtp-hdrext:ntp-lay"]

6. Security Considerations

TBD.

7. IANA Considerations

TBD.

8. References

8.1. Normative References

[RFC1305] Mills, D., "Network Time Protocol (Version 3) Specification, Implementation and Analysis", RFC 1305, March 1992.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3550] Schulzrinne, H., Casner, S., Frederick, R., and V. Jacobson, "RTP: A Transport Protocol for Real-Time Applications", STD 64, RFC 3550, July 2003.

[RFC5285] Singer, D. and H. Desineni, "A General Mechanism for RTP Header Extensions", RFC 5285, July 2008.

8.2. Informative References

[I-D.draft-ietf-avt-rtp-svc] Wenger, S., Wang, Y.-K., Schierl, T., and A. Eleftheriadis, "RTP payload format for SVC video", draft-ietf-avt-rtp-svc-16 (work in progress), December 2008.

[I-D.draft-ietf-mmusic-decoding-dependency] Schierl, T. and S. Wenger, "Signaling media decoding dependency in Session Description Protocol (SDP)", draft-ietf-mmusic-decoding-dependency-05 (work in progress), November 2008.

9. Changes Log

Initial version 00

15 January 2009: Initial version

Author's Addresses

Thomas Schierl
Fraunhofer HHI
Einsteinufer 37
D-10587 Berlin
Germany

Phone: +49-30-31002-227
EMail: thomas.schierl@hhi.fraunhofer.de