Identifiers for WebRTC's Statistics API

trackIdentifier of type DOMString

The value of the MediaStreamTrack's id attribute.

mid of type DOMString

If the RTCRtpTransceiver owning this stream has a mid value that is not null, this is that value, otherwise this member MUST NOT be present.

remoteId of type DOMString

The remoteId is used for looking up the remote RTCRemoteOutboundRtpStreamStats object for the same SSRC.

framesDecoded of type unsigned long

MUST NOT exist for audio. It represents the total number of frames correctly decoded for this RTP stream, i.e., frames that would be displayed if no frames are dropped.

keyFramesDecoded of type unsigned long

MUST NOT exist for audio. It represents the total number of key frames, such as key frames in VP8 [RFC6386] or IDR-frames in H.264 [RFC6184], successfully decoded for this RTP media stream. This is a subset of framesDecoded. framesDecoded - keyFramesDecoded gives you the number of delta frames decoded.

framesRendered of type unsigned long

MUST NOT exist for audio. It represents the total number of frames that have been rendered. It is incremented just after a frame has been rendered.

framesDropped of type unsigned long

MUST NOT exist for audio. The total number of frames dropped prior to decode or dropped because the frame missed its display deadline for this receiver's track. The measurement begins when the receiver is created and is a cumulative metric as defined in Appendix A (g) of [RFC7004].

frameWidth of type unsigned long

MUST NOT exist for audio. Represents the width of the last decoded frame. Before the first frame is decoded this member MUST NOT exist.

frameHeight of type unsigned long

MUST NOT exist for audio. Represents the height of the last decoded frame. Before the first frame is decoded this member MUST NOT exist.

framesPerSecond of type double

MUST NOT exist for audio. The number of decoded frames in the last second.

qpSum of type unsigned long long

MUST NOT exist for audio. The sum of the QP values of frames decoded by this receiver. The count of frames is in framesDecoded.

The definition of QP value depends on the codec; for VP8, the QP value is the value carried in the frame header as the syntax element y_ac_qi, and defined in [RFC6386] section 19.2. Its range is 0..127.

Note that the QP value is only an indication of quantizer values used; many formats have ways to vary the quantizer value within the frame.
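As a rough illustration of how qpSum and framesDecoded combine, the sketch below computes the average QP over the interval between two stats snapshots. The function name averageQp and the snapshot objects are hypothetical; only the member names qpSum and framesDecoded come from the definitions above.

```javascript
// Hypothetical helper: `prev` and `curr` are assumed to be two inbound-rtp
// video stats snapshots taken at different times.
function averageQp(prev, curr) {
  const frames = curr.framesDecoded - prev.framesDecoded;
  // No frames decoded in the interval: the average is undefined.
  if (!(frames > 0)) return undefined;
  return (curr.qpSum - prev.qpSum) / frames;
}
```

Because both counters are cumulative, passing a snapshot of zeros as prev yields the average over the lifetime of the stream.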

totalDecodeTime of type double

MUST NOT exist for audio. Total number of seconds that have been spent decoding the framesDecoded frames of this stream. The average decode time can be calculated by dividing this value by framesDecoded. The time it takes to decode one frame is the time passed between feeding the decoder a frame and the decoder returning decoded data for that frame.
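The average decode time described above can be sketched as follows. The helper name averageDecodeTimeMs is hypothetical, and computing the average over the difference between two snapshots (rather than over the whole stream) is an assumption that relies on both members being cumulative.

```javascript
// Hypothetical helper: average decode time per frame, in milliseconds,
// between two inbound-rtp video stats snapshots.
function averageDecodeTimeMs(prev, curr) {
  const frames = curr.framesDecoded - prev.framesDecoded;
  if (!(frames > 0)) return undefined;
  // totalDecodeTime is expressed in seconds; convert to milliseconds.
  return ((curr.totalDecodeTime - prev.totalDecodeTime) / frames) * 1000;
}
```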

totalInterFrameDelay of type double

MUST NOT exist for audio. Sum of the interframe delays in seconds between consecutively rendered frames, recorded just after a frame has been rendered. The interframe delay variance can be calculated from totalInterFrameDelay, totalSquaredInterFrameDelay, and framesRendered according to the formula: (totalSquaredInterFrameDelay - totalInterFrameDelay^2 / framesRendered) / framesRendered.
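The variance formula above translates directly into code. This is a minimal sketch: the function name interFrameDelayVariance is hypothetical, and the stats argument is assumed to be an inbound-rtp video stats object carrying the three members named in the formula.

```javascript
// Hypothetical helper implementing
// (totalSquaredInterFrameDelay - totalInterFrameDelay^2 / framesRendered) / framesRendered.
function interFrameDelayVariance(stats) {
  const sum = stats.totalInterFrameDelay;
  const sumSq = stats.totalSquaredInterFrameDelay;
  const n = stats.framesRendered;
  // Avoid dividing by zero before the first frame has been rendered.
  if (!(n > 0)) return undefined;
  return (sumSq - (sum * sum) / n) / n;
}
```

Taking Math.sqrt of the result gives the standard deviation in seconds, which is often easier to compare against the frame interval.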

totalSquaredInterFrameDelay of type double

MUST NOT exist for audio. Sum of the squared interframe delays in seconds between consecutively rendered frames, recorded just after a frame has been rendered. See totalInterFrameDelay for details on how to calculate the interframe delay variance.

pauseCount of type unsigned long

MUST NOT exist for audio. Count the total number of video pauses experienced by this receiver. Video is considered to be paused if time passed since last rendered frame exceeds 5 seconds. pauseCount is incremented when a frame is rendered after such a pause.

totalPausesDuration of type double

MUST NOT exist for audio. Total duration of pauses (for definition of pause see pauseCount), in seconds. This value is updated when a frame is rendered.

freezeCount of type unsigned long

MUST NOT exist for audio. Count the total number of video freezes experienced by this receiver. It is a freeze if the frame duration, which is the time interval between two consecutively rendered frames, is equal to or exceeds Max(3 * avg_frame_duration_ms, avg_frame_duration_ms + 150), where avg_frame_duration_ms is the linear average of the durations of the last 30 rendered frames.
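The freeze rule above can be sketched over a list of frame durations. The helper countFreezes is hypothetical, and several details are assumptions because the definition does not specify them: the average is taken over the frames rendered before the one being tested, a shorter window is used until 30 durations are available, and a frozen frame's duration is added to the window afterwards.

```javascript
// Hypothetical helper: counts freezes in a sequence of frame durations (ms).
function countFreezes(frameDurationsMs, windowSize = 30) {
  const recent = []; // durations of the most recently rendered frames
  let freezes = 0;
  for (const duration of frameDurationsMs) {
    if (recent.length > 0) {
      const avg = recent.reduce((a, b) => a + b, 0) / recent.length;
      // Freeze threshold: Max(3 * avg, avg + 150), durations in milliseconds.
      const threshold = Math.max(3 * avg, avg + 150);
      if (duration >= threshold) freezes += 1;
    }
    recent.push(duration);
    if (recent.length > windowSize) recent.shift();
  }
  return freezes;
}
```

The avg + 150 term dominates while the average frame duration is below 75 ms, so at typical frame rates a freeze is a frame that stays on screen at least 150 ms longer than average.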

totalFreezesDuration of type double

MUST NOT exist for audio. Total duration of rendered frames which are considered as frozen (for definition of freeze see freezeCount), in seconds. This value is updated when a frame is rendered.

lastPacketReceivedTimestamp of type DOMHighResTimeStamp

Represents the timestamp at which the last packet was received for this SSRC. This differs from timestamp, which represents the time at which the statistics were generated by the local endpoint.

headerBytesReceived of type unsigned long long

Total number of RTP header and padding bytes received for this SSRC. This includes retransmissions. This does not include the size of transport layer headers such as IP or UDP. headerBytesReceived + bytesReceived equals the number of bytes received as payload over the transport.

packetsDiscarded of type unsigned long long

The cumulative number of RTP packets discarded by the jitter buffer due to late or early-arrival, i.e., these packets are not played out. RTP packets discarded due to packet duplication are not reported in this metric [XRBLOCK-STATS]. Calculated as defined in [RFC7002] section 3.2 and Appendix A.a.

fecBytesReceived of type unsigned long long

Total number of RTP FEC bytes received for this SSRC, only including payload bytes. This is a subset of bytesReceived. If a FEC mechanism that uses a different ssrc was negotiated, FEC packets are sent over a separate SSRC but are still accounted for here.

fecPacketsReceived of type unsigned long long

Total number of RTP FEC packets received for this SSRC. If a FEC mechanism that uses a different ssrc was negotiated, FEC packets are sent over a separate SSRC but are still accounted for here. This counter can also be incremented when receiving FEC packets in-band with media packets (e.g., with Opus).

fecPacketsDiscarded of type unsigned long long

Total number of RTP FEC packets received for this SSRC where the error correction payload was discarded by the application. This may happen 1. if all the source packets protected by the FEC packet were received or already recovered by a separate FEC packet, or 2. if the FEC packet arrived late, i.e., outside the recovery window, and the lost RTP packets have already been skipped during playout. This is a subset of fecPacketsReceived.

bytesReceived of type unsigned long long

Total number of bytes received for this SSRC. This includes retransmissions. Calculated as defined in [RFC3550] section 6.4.1.

firCount of type unsigned long

MUST NOT exist for audio. Count the total number of Full Intra Request (FIR) packets, as defined in [RFC5104] section 4.3.1, sent by this receiver. Does not count the RTCP FIR indicated in [RFC2032] which was deprecated by [RFC4587].

pliCount of type unsigned long

MUST NOT exist for audio. Count the total number of Picture Loss Indication (PLI) packets, as defined in [RFC4585] section 6.3.1, sent by this receiver.

totalProcessingDelay of type double

It is the sum of the time, in seconds, each audio sample or video frame takes from the time the first RTP packet is received (reception timestamp) to the time the corresponding sample or frame is decoded (decoded timestamp). At this point the audio sample or video frame is ready for playout by the MediaStreamTrack. Typically, ready for playout here means after the audio sample or video frame is fully decoded by the decoder.

Given the complexities involved, the time of arrival or the reception timestamp is measured as close to the network layer as possible and the decoded timestamp is measured as soon as the complete sample or frame is decoded.

In the case of audio, several samples are received in the same RTP packet, so all samples will share the same reception timestamp but have different decoded timestamps. In the case of video, the frame is received over several RTP packets; in this case the earliest timestamp containing the frame is counted as the reception timestamp, and the decoded timestamp corresponds to when the complete frame is decoded.

This metric is not incremented for frames that are not decoded, i.e., framesDropped. The average processing delay can be calculated by dividing totalProcessingDelay by framesDecoded for video (or, for audio, by totalSamplesDecoded from the provisional stats spec).

nackCount of type unsigned long

Count the total number of Negative ACKnowledgement (NACK) packets, as defined in [RFC4585] section 6.2.1, sent by this receiver.

estimatedPlayoutTimestamp of type DOMHighResTimeStamp

This is the estimated playout time of this receiver's track. The playout time is the NTP timestamp of the last playable audio sample or video frame that has a known timestamp (from an RTCP SR packet mapping RTP timestamps to NTP timestamps), extrapolated with the time elapsed since it was ready to be played out. This is the "current time" of the track in NTP clock time of the sender and can be present even if there is no audio currently playing.

This can be useful for estimating how much audio and video is out of sync for two tracks from the same source, audioInboundRtpStats.estimatedPlayoutTimestamp - videoInboundRtpStats.estimatedPlayoutTimestamp.
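The subtraction above can be wrapped in a small helper. This is a sketch under assumptions: the name avSyncOffsetMs is hypothetical, and both arguments are assumed to be inbound-rtp stats objects for an audio and a video track from the same source.

```javascript
// Hypothetical helper: estimated audio/video playout offset in milliseconds.
function avSyncOffsetMs(audioInboundRtpStats, videoInboundRtpStats) {
  const audio = audioInboundRtpStats.estimatedPlayoutTimestamp;
  const video = videoInboundRtpStats.estimatedPlayoutTimestamp;
  // The member can be absent, for example before any timestamp mapping is known.
  if (audio === undefined || video === undefined) return undefined;
  // Positive: audio playout is ahead of video; negative: audio is behind.
  return audio - video;
}
```

Both values are DOMHighResTimeStamp values, so the difference is in milliseconds.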

jitterBufferDelay of type double

The purpose of the jitter buffer is to recombine RTP packets into frames (in the case of video) and have smooth playout. The model described here assumes that the samples or frames are still compressed and have not yet been decoded. It is the sum of the time, in seconds, each audio sample or video frame takes from the time the first packet is received by the jitter buffer (ingest timestamp) to the time it exits the jitter buffer (emit timestamp). In the case of audio, several samples belong to the same RTP packet, hence they will have the same ingest timestamp but different jitter buffer emit timestamps. In the case of video, the frame may be received over several RTP packets, hence the ingest timestamp is that of the earliest packet of the frame that entered the jitter buffer and the emit timestamp is when the whole frame exits the jitter buffer. This metric increases upon samples or frames exiting, having completed their time in the buffer (and incrementing jitterBufferEmittedCount). The average jitter buffer delay can be calculated by dividing jitterBufferDelay by jitterBufferEmittedCount.

jitterBufferTargetDelay of type double

This value is increased by the target jitter buffer delay every time a sample is emitted by the jitter buffer. The added target is the target delay, in seconds, at the time that the sample was emitted from the jitter buffer. To get the average target delay, divide by jitterBufferEmittedCount.

jitterBufferEmittedCount of type unsigned long long

The total number of audio samples or video frames that have come out of the jitter buffer (increasing jitterBufferDelay).
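The two averages described for jitterBufferDelay and jitterBufferTargetDelay share the same denominator, so they can be computed together. In this sketch the helper name jitterBufferAverages is hypothetical, and using the difference between two snapshots instead of the lifetime totals is an assumption that relies on all three members being cumulative.

```javascript
// Hypothetical helper: average actual and target jitter buffer delay, in
// seconds, for samples or frames emitted between two stats snapshots.
function jitterBufferAverages(prev, curr) {
  const emitted = curr.jitterBufferEmittedCount - prev.jitterBufferEmittedCount;
  // Nothing left the jitter buffer in this interval.
  if (!(emitted > 0)) return undefined;
  return {
    averageDelay: (curr.jitterBufferDelay - prev.jitterBufferDelay) / emitted,
    averageTargetDelay:
      (curr.jitterBufferTargetDelay - prev.jitterBufferTargetDelay) / emitted,
  };
}
```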

jitterBufferMinimumDelay of type double

There are various reasons why the jitter buffer delay might be increased to a higher value, such as to achieve AV synchronization or because a jitterBufferTarget was set on a RTCRtpReceiver. When using one of these mechanisms, it can be useful to keep track of the minimal jitter buffer delay that could have been achieved, so WebRTC clients can track the amount of additional delay that is being added.

This metric works the same way as jitterBufferTargetDelay, except that it is not affected by external mechanisms that increase the jitter buffer target delay, such as jitterBufferTarget, AV sync, or any other mechanisms. This metric is purely based on the network characteristics such as jitter and packet loss, and can be seen as the minimum obtainable jitter buffer delay if no external factors would affect it. The metric is updated every time jitterBufferEmittedCount is updated.
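To track the amount of additional delay mentioned above, one option is to compare the average target delay with the average minimum delay. The sketch below is an illustration only: the helper name averageAddedJitterBufferDelay is hypothetical, and treating the difference of the two averages as the added delay is an assumption rather than a definition from this document.

```javascript
// Hypothetical helper: average extra target delay, in seconds, added on top
// of the network-driven minimum (for example by AV sync or jitterBufferTarget).
function averageAddedJitterBufferDelay(stats) {
  const emitted = stats.jitterBufferEmittedCount;
  if (!(emitted > 0)) return undefined;
  return (stats.jitterBufferTargetDelay - stats.jitterBufferMinimumDelay) / emitted;
}
```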

totalSamplesReceived of type unsigned long long

MUST NOT exist for video. The total number of samples that have been received on this RTP stream. This includes concealedSamples.

concealedSamples of type unsigned long long

MUST NOT exist for video. The total number of samples that are concealed samples. A concealed sample is a sample that was replaced with synthesized samples generated locally before being played out. Examples of samples that have to be concealed are samples from lost packets (reported in packetsLost) or samples from packets that arrive too late to be played out (reported in packetsDiscarded).

silentConcealedSamples of type unsigned long long

MUST NOT exist for video. The total number of concealed samples inserted that are "silent". Playing out silent samples results in silence or comfort noise. This is a subset of concealedSamples.

concealmentEvents of type unsigned long long

MUST NOT exist for video. The number of concealment events. This counter increases every time a concealed sample is synthesized after a non-concealed sample. That is, multiple consecutive concealed samples will increase the concealedSamples count multiple times but count as a single concealment event.
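Since concealedSamples is included in totalSamplesReceived and silentConcealedSamples is a subset of concealedSamples, a few ratios follow from the counters above. These derived values are not defined in this document; the helper name concealmentSummary and the names of the returned fields are hypothetical.

```javascript
// Hypothetical helper: derived concealment ratios for an inbound-rtp audio
// stats object. Each ratio is undefined when its denominator is zero.
function concealmentSummary(stats) {
  const total = stats.totalSamplesReceived;
  const concealed = stats.concealedSamples;
  const silent = stats.silentConcealedSamples;
  const events = stats.concealmentEvents;
  return {
    // Fraction of all received samples that were concealed.
    concealedRatio: total > 0 ? concealed / total : undefined,
    // Fraction of concealed samples that were silent.
    silentShare: concealed > 0 ? silent / concealed : undefined,
    // Average number of concealed samples per concealment event.
    samplesPerEvent: events > 0 ? concealed / events : undefined,
  };
}
```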

insertedSamplesForDeceleration of type unsigned long long

MUST NOT exist for video. When playout is slowed down, this counter is increased by the difference between the number of samples received and the number of samples played out. If playout is slowed down by inserting samples, this will be the number of inserted samples.

removedSamplesForAcceleration of type unsigned long long

MUST NOT exist for video. When playout is sped up, this counter is increased by the difference between the number of samples received and the number of samples played out. If speedup is achieved by removing samples, this will be the count of samples removed.

audioLevel of type double

MUST NOT exist for video. Represents the audio level of the receiving track. For audio levels of tracks attached locally, see RTCAudioSourceStats instead.

The value is between 0..1 (linear), where 1.0 represents 0 dBov, 0 represents silence, and 0.5 represents approximately 6 dBSPL change in the sound pressure level from 0 dBov.

The audioLevel is averaged over some small interval, using the algorithm described under totalAudioEnergy. The interval used is implementation-defined.
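The linear audioLevel scale can be converted to decibels relative to overload with the usual amplitude formula, 20 * log10(level). The helper name audioLevelToDbov is hypothetical; the conversion itself is standard and agrees with the statement that 0.5 is roughly 6 dB below 0 dBov.

```javascript
// Hypothetical helper: convert a linear audioLevel in 0..1 to dBov.
function audioLevelToDbov(audioLevel) {
  // 1.0 maps to 0 dBov; 0 (silence) maps to -Infinity.
  return 20 * Math.log10(audioLevel);
}
```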

totalAudioEnergy of type double

MUST NOT exist for video. Represents the audio energy of the receiving track. For audio energy of tracks attached locally, see RTCAudioSourceStats instead.

This value MUST be computed as follows: for each audio sample that is received (and thus counted by totalSamplesReceived), add the sample's value divided by the highest-intensity encodable value, squared and then multiplied by the duration of the sample in seconds. In other words, duration * Math.pow(energy/maxEnergy, 2).

This can be used to obtain a root mean square (RMS) value that uses the same units as audioLevel, as defined in [RFC6464]. It can be converted to these units using the formula Math.sqrt(totalAudioEnergy/totalSamplesDuration). This calculation can also be performed using the differences between the values of two different getStats() calls, in order to compute the average audio level over any desired time interval. In other words, do Math.sqrt((energy2 - energy1)/(duration2 - duration1)).

For example, if a 10ms packet of audio is produced with an RMS of 0.5 (out of 1.0), this should add 0.5 * 0.5 * 0.01 = 0.0025 to totalAudioEnergy. If another 10ms packet with an RMS of 0.1 is received, this should similarly add 0.0001 to totalAudioEnergy. Then, Math.sqrt(totalAudioEnergy/totalSamplesDuration) becomes Math.sqrt(0.0026/0.02) = 0.36, which is the same value that would be obtained by doing an RMS calculation over the contiguous 20ms segment of audio.
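The worked example above can be reproduced in a few lines. This is a sketch, not the user agent's implementation: the helpers accumulateAudioEnergy and averageAudioLevel are hypothetical, and energy is accumulated per chunk from its RMS value, as in the example, rather than per individual sample.

```javascript
// Hypothetical helper: accumulate energy and duration for audio chunks, each
// described by its RMS level (0..1) and its duration in seconds.
function accumulateAudioEnergy(chunks) {
  let totalAudioEnergy = 0;
  let totalSamplesDuration = 0;
  for (const { rms, durationSeconds } of chunks) {
    totalAudioEnergy += durationSeconds * Math.pow(rms, 2);
    totalSamplesDuration += durationSeconds;
  }
  return { totalAudioEnergy, totalSamplesDuration };
}

// Hypothetical helper: average audio level between two stats snapshots,
// Math.sqrt((energy2 - energy1) / (duration2 - duration1)).
function averageAudioLevel(prev, curr) {
  const duration = curr.totalSamplesDuration - prev.totalSamplesDuration;
  if (!(duration > 0)) return undefined;
  return Math.sqrt((curr.totalAudioEnergy - prev.totalAudioEnergy) / duration);
}

const start = { totalAudioEnergy: 0, totalSamplesDuration: 0 };
const after20ms = accumulateAudioEnergy([
  { rms: 0.5, durationSeconds: 0.01 },
  { rms: 0.1, durationSeconds: 0.01 },
]);
const level = averageAudioLevel(start, after20ms); // approximately 0.36
```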

If multiple audio channels are used, the audio energy of a sample refers to the highest energy of any channel.

totalSamplesDuration of type double

MUST NOT exist for video. Represents the audio duration of the receiving track. For audio durations of tracks attached locally, see RTCAudioSourceStats instead.

Represents the total duration in seconds of all samples that have been received (and thus counted by totalSamplesReceived). Can be used with totalAudioEnergy to compute an average audio level over different intervals.

framesReceived of type unsigned long

MUST NOT exist for audio. Represents the total number of complete frames received on this RTP stream. This metric is incremented when the complete frame is received.

decoderImplementation of type DOMString

MUST NOT exist unless exposing hardware is allowed. (This is a fingerprinting vector.)

MUST NOT exist for audio. Identifies the decoder implementation used. This is useful for diagnosing interoperability issues.

playoutId of type DOMString

MUST NOT exist for video. If audio playout is happening, this is used to look up the corresponding RTCAudioPlayoutStats.

powerEfficientDecoder of type boolean

MUST NOT exist unless exposing hardware is allowed. (This is a fingerprinting vector.)

MUST NOT exist for audio. Whether the decoder currently used is considered power efficient by the user agent. This SHOULD reflect if the configuration results in hardware acceleration, but the user agent MAY take other information into account when deciding if the configuration is considered power efficient.

framesAssembledFromMultiplePackets of type unsigned long

MUST NOT exist for audio. It represents the total number of frames correctly decoded for this RTP stream that consist of more than one RTP packet. For such frames the totalAssemblyTime is incremented. The average frame assembly time can be calculated by dividing totalAssemblyTime by framesAssembledFromMultiplePackets.

totalAssemblyTime of type double

MUST NOT exist for audio. The sum of the time, in seconds, each video frame takes from the time the first RTP packet is received (reception timestamp) to the time the last RTP packet of a frame is received. Only incremented for frames consisting of more than one RTP packet.

Given the complexities involved, the time of arrival or the reception timestamp is measured as close to the network layer as possible. This metric is not incremented for frames that are not decoded, i.e., framesDropped or frames that fail decoding for other reasons (if any). Only incremented for frames consisting of more than one RTP packet.

retransmittedPacketsReceived of type unsigned long long

The total number of retransmitted packets that were received for this SSRC. This is a subset of packetsReceived. If RTX is not negotiated, retransmitted packets cannot be identified and this member MUST NOT exist.

retransmittedBytesReceived of type unsigned long long

The total number of retransmitted bytes that were received for this SSRC, only including payload bytes. This is a subset of bytesReceived. If RTX is not negotiated, retransmitted packets cannot be identified and this member MUST NOT exist.

rtxSsrc of type unsigned long

If RTX is negotiated for retransmissions on a separate RTP stream, this is the SSRC of the RTX stream that is associated with this stream's ssrc. If RTX is not negotiated, this value MUST NOT be present.

fecSsrc of type unsigned long

If a FEC mechanism that uses a separate RTP stream is negotiated, this is the SSRC of the FEC stream that is associated with this stream's ssrc. If FEC is not negotiated or uses the same RTP stream, this value MUST NOT be present.

totalCorruptionProbability of type double

MUST NOT exist for audio. Represents the cumulative sum of all corruption probability measurements that have been made for this SSRC, see corruptionMeasurements regarding when this attribute SHOULD be present.

Each measurement added to totalCorruptionProbability MUST be in the range [0.0, 1.0], where a value of 0.0 indicates the system has estimated there is no or negligible corruption present in the processed frame. Similarly a value of 1.0 indicates there is almost certainly a corruption visible in the processed frame. A value in between those two indicates there is likely some corruption visible, but it could for instance have a low magnitude or be present only in a small portion of the frame.

Note

The corruption likelihood values are estimates - not guarantees. Even if the estimate is 0.0, there could be corruptions present (i.e. it's a false negative) for instance if only a very small area of the frame is affected. Similarly, even if the estimate is 1.0 there might not be a corruption present (i.e. it's a false positive) for instance if there are macroblocks with a QP far higher than the frame average. Just like there are edge cases for e.g. PSNR measurements, these metrics should primarily be used as a basis for statistical analysis rather than be used as an absolute truth on a per-frame basis.

totalSquaredCorruptionProbability of type double

MUST NOT exist for audio. Represents the cumulative sum of all corruption probability measurements squared that have been made for this SSRC, see corruptionMeasurements regarding when this attribute SHOULD be present.

corruptionMeasurements of type unsigned long long

MUST NOT exist for audio. When the user agent is able to make a corruption probability measurement, this counter is incremented for each such measurement and totalCorruptionProbability and totalSquaredCorruptionProbability are aggregated with this measurement and measurement squared respectively. If the corruption-detection header extension is present in the RTP packets, corruption probability measurements MUST be present.
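The three corruption counters allow a mean and a variance of the per-measurement probability to be derived. The text above does not spell out a formula, so the sketch below is an assumption modelled on the interframe delay variance calculation; the helper name corruptionProbabilityStats is hypothetical.

```javascript
// Hypothetical helper: mean and population variance of the corruption
// probability measurements reported in an inbound-rtp video stats object.
function corruptionProbabilityStats(stats) {
  const n = stats.corruptionMeasurements;
  // No measurements have been made yet.
  if (!(n > 0)) return undefined;
  const mean = stats.totalCorruptionProbability / n;
  // E[x^2] - (E[x])^2, clamped at 0 to absorb floating-point rounding.
  const variance = Math.max(
    0,
    stats.totalSquaredCorruptionProbability / n - mean * mean
  );
  return { mean, variance };
}
```

In line with the note above, these values are better suited to statistical analysis over many frames than to judging any single frame.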