Media Source Extensions for Audio

Dale Curtis

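Introduction

Media Source Extensions (MSE) provide extended buffering and playback control for the HTML5 <audio> and <video> elements. While originally developed to facilitate Dynamic Adaptive Streaming over HTTP (DASH) based video players, below we'll see how they can be used for audio; specifically, for gapless playback.

You've likely listened to a music album where songs flowed seamlessly across tracks; you may even be listening to one right now. Artists create these gapless playback experiences both as an artistic choice and as an artifact of vinyl records and CDs, where audio was written as one continuous stream. Unfortunately, due to the way modern audio codecs like MP3 and AAC work, this seamless aural experience is often lost today.

We'll get into the details of why below, but for now let's start with a demonstration. Below is the first thirty seconds of the excellent Sintel chopped into five separate MP3 files and reassembled using MSE. The red lines indicate gaps introduced during the creation (encoding) of each MP3; you'll hear glitches at these points.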

Demo

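Yuck! That's not a great experience; we can do better. With a little more work, using the exact same MP3 files as in the demo above, we can use MSE to remove those annoying gaps. The green lines in the next demo indicate where the files have been joined and the gaps removed. On Chrome 38+ this will play back seamlessly!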

Demo

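There are a variety of ways to create gapless content. For the purposes of this demo, we'll focus on the type of files a normal user might have lying around, where each file has been encoded separately without regard for the audio segments before or after it.

Basic setup

First, let's backtrack and cover the basic setup of a MediaSource instance. Media Source Extensions, as the name implies, are just extensions to the existing media elements. Below, we assign an Object URL, representing our MediaSource instance, to the source attribute of an audio element, just like you would set a standard URL.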

var audio = document.createElement('audio');
var mediaSource = new MediaSource();
var SEGMENTS = 5;

mediaSource.addEventListener('sourceopen', function () {
  var sourceBuffer = mediaSource.addSourceBuffer('audio/mpeg');

  function onAudioLoaded(data, index) {
    // Append the ArrayBuffer data into our new SourceBuffer.
    sourceBuffer.appendBuffer(data);
  }

  // Retrieve an audio segment via XHR.  For simplicity, we're retrieving the
  // entire segment at once, but we could also retrieve it in chunks and append
  // each chunk separately.  MSE will take care of assembling the pieces.
  GET('sintel/sintel_0.mp3', function (data) {
    onAudioLoaded(data, 0);
  });
});

audio.src = URL.createObjectURL(mediaSource);

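Once the MediaSource object is connected, it performs some initialization and eventually fires a sourceopen event, at which point we can create a SourceBuffer. In the example above, we're creating an audio/mpeg one, which is able to parse and decode our MP3 segments; there are several other types available.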
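The snippet above calls a GET() helper that this article never defines. Here is a minimal sketch of what such a helper might look like, assuming it only needs to fetch a URL as an ArrayBuffer and hand the result to a callback. The error handling is an illustrative choice, not something taken from the demo's source.

```javascript
// Hypothetical helper: fetches `url` with XMLHttpRequest and passes the
// response body to `callback` as an ArrayBuffer, the kind of data that
// appendBuffer() accepts.
function GET(url, callback) {
  var xhr = new XMLHttpRequest();
  xhr.open('GET', url, true);
  xhr.responseType = 'arraybuffer';
  xhr.onload = function () {
    if (xhr.status >= 200 && xhr.status < 300) {
      callback(xhr.response);
    } else {
      console.error('GET ' + url + ' failed with status ' + xhr.status);
    }
  };
  xhr.onerror = function () {
    console.error('GET ' + url + ' failed: network error');
  };
  xhr.send();
}
```

With something like this in place, the call to GET('sintel/sintel_0.mp3', ...) in the example receives the whole segment at once, matching the comment about retrieving the entire segment in one request.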

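Anomalous waveforms

We'll come back to the code in a moment, but let's now look more closely at the file we've just appended, specifically at the end of it. Below is a graph of the last 3000 samples, averaged across both channels, from the sintel_0.mp3 track. Each pixel on the red line is a floating point sample in the range [-1.0, 1.0].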

Figure: MP3 gap

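What's with all those zero (silent) samples? They're actually compression artifacts introduced during encoding. Almost every encoder introduces some type of padding. In this case, LAME added exactly 576 padding samples to the end of the file.

In addition to the padding at the end, each file also had padding added to the beginning. If we peek ahead at the sintel_1.mp3 track, we'll see that another 576 samples of padding exist at the front. The amount of padding varies by encoder and content, but we know the exact values from metadata included within each file.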

Figure: MP3 gap end
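To get a feel for how much silence 576 samples represents, a sample count can be converted to time by dividing by the sample rate. The sketch below assumes a 44.1 kHz sample rate purely for illustration; the real rate comes from the file itself, and the names used here are hypothetical.

```javascript
// Converts a number of audio samples to seconds at the given sample rate.
function samplesToSeconds(sampleCount, sampleRate) {
  return sampleCount / sampleRate;
}

// Assumption for illustration only: 44.1 kHz audio.
var ASSUMED_SAMPLE_RATE = 44100;

// Padding at one edge of a file, using the 576 samples quoted above.
var paddingSeconds = samplesToSeconds(576, ASSUMED_SAMPLE_RATE);

// At a join, the end padding of one file meets the front padding of the next.
var joinSeconds = 2 * paddingSeconds;

console.log((paddingSeconds * 1000).toFixed(2) + ' ms per padded edge');
// → 13.06 ms per padded edge
console.log((joinSeconds * 1000).toFixed(2) + ' ms of padding at each join');
// → 26.12 ms of padding at each join
```

Under that assumption, each join carries roughly 26 ms of padding.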

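The sections of silence at the beginning and end of each file are what cause the glitches between segments in the previous demo. To achieve gapless playback, we need to remove these sections of silence. Luckily, this is easily done with MediaSource. Below, we'll modify our onAudioLoaded() method to use an append window and a timestamp offset to remove this silence.

Example code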

function onAudioLoaded(data, index) {
  // Parsing gapless metadata is unfortunately nontrivial and a bit messy, so
  // we'll gloss over it here; see the appendix for details.
  // ParseGaplessData() will return a dictionary with two elements:
  //
  //    audioDuration: Duration in seconds of all non-padding audio.
  //    frontPaddingDuration: Duration in seconds of the front padding.
  //
  var gaplessMetadata = ParseGaplessData(data);

  // Each appended segment must be appended relative to the next.  To avoid any
  // overlaps, we'll use the end timestamp of the last append as the starting
  // point for our next append or zero if we haven't appended anything yet.
  var appendTime = index > 0 ? sourceBuffer.buffered.end(0) : 0;

  // Simply put, an append window allows you to trim off audio (or video) frames
  // which fall outside of a specified time range.  Here, we'll use the end of
  // our last append as the start of our append window and the end of the real
  // audio data for this segment as the end of our append window.
  sourceBuffer.appendWindowStart = appendTime;
  sourceBuffer.appendWindowEnd = appendTime + gaplessMetadata.audioDuration;

  // The timestampOffset field essentially tells MediaSource where in the media
  // timeline the data given to appendBuffer() should be placed.  I.e., if the
  // timestampOffset is 1 second, the appended data will start 1 second into
  // playback.
  //
  // MediaSource requires that the media timeline starts from time zero, so we
  // need to ensure that the data left after filtering by the append window
  // starts at time zero.  We'll do this by shifting all of the padding we want
  // to discard before our append time (and thus, before our append window).
  sourceBuffer.timestampOffset =
    appendTime - gaplessMetadata.frontPaddingDuration;

  // When appendBuffer() completes, it will fire an updateend event signaling
  // that it's okay to append another segment of media.  Here, we'll chain the
  // append for the next segment to the completion of our current append.
  if (index == 0) {
    sourceBuffer.addEventListener('updateend', function () {
      if (++index < SEGMENTS) {
        GET('sintel/sintel_' + index + '.mp3', function (data) {
          onAudioLoaded(data, index);
        });
      } else {
        // We've loaded all available segments, so tell MediaSource there are no
        // more buffers which will be appended.
        mediaSource.endOfStream();
        URL.revokeObjectURL(audio.src);
      }
    });
  }

  // appendBuffer() will now use the timestamp offset and append window settings
  // to filter and timestamp the data we're appending.
  //
  // Note: While this demo uses very little memory, more complex use cases need
  // to be careful about memory usage or garbage collection may remove ranges of
  // media in unexpected places.
  sourceBuffer.appendBuffer(data);
}
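To make the interplay of the append window and the timestamp offset concrete, the following sketch computes the values onAudioLoaded() would assign for a sequence of segments. It is a simplified model under two assumptions: the metadata values are made up, and each append is taken to end exactly at appendWindowEnd, which idealizes what sourceBuffer.buffered.end(0) would report in a browser. planAppends() is a hypothetical name.

```javascript
// Given per-segment gapless metadata (durations in seconds), returns the
// appendWindowStart, appendWindowEnd, and timestampOffset for each append.
function planAppends(segments) {
  var appendTime = 0;
  return segments.map(function (meta) {
    var plan = {
      appendWindowStart: appendTime,
      appendWindowEnd: appendTime + meta.audioDuration,
      // Shift the front padding to just before the append window so it is
      // trimmed and the real audio starts at appendTime.
      timestampOffset: appendTime - meta.frontPaddingDuration
    };
    // Idealized: the next segment starts where this window ends.
    appendTime = plan.appendWindowEnd;
    return plan;
  });
}

// Hypothetical metadata: 6.5 s of real audio with 0.0125 s of front padding.
var plans = planAppends([
  { audioDuration: 6.5, frontPaddingDuration: 0.0125 },
  { audioDuration: 6.5, frontPaddingDuration: 0.0125 }
]);
console.log(plans);
```

Note that the first segment ends up with a slightly negative timestampOffset. This is what pushes its front padding before time zero, where the append window discards it.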

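A seamless waveform

Let's see what our shiny new code has accomplished by taking another look at the waveform after we've applied our append windows. Below, you can see that the silent section at the end of sintel_0.mp3 (in red) and the silent section at the beginning of sintel_1.mp3 (in blue) have been removed, leaving us with a seamless transition between segments.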

Figure: MP3 mid

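Conclusion

With that, we've stitched all five segments seamlessly into one and have reached the end of our demo. Before we go, you may have noticed that our onAudioLoaded() method has no consideration for containers or codecs. That means all of these techniques will work irrespective of the container or codec type. Below you can replay the original demo with DASH-ready fragmented MP4 instead of MP3.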

Demo
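Since onAudioLoaded() is container-agnostic, the main thing that changes between the MP3 and MP4 variants is the type string passed to addSourceBuffer(). The sketch below picks that string from a file extension. 'audio/mpeg' comes from the example earlier in this article; the MP4 codec string assumes AAC-LC audio and is an assumption rather than something stated in the article, as are the helper names.

```javascript
// Assumed mapping from file extension to SourceBuffer type string.
// The MP4 entry assumes AAC-LC ("mp4a.40.2"); adjust for other codecs.
var TYPE_BY_EXTENSION = {
  mp3: 'audio/mpeg',
  mp4: 'audio/mp4; codecs="mp4a.40.2"'
};

// Returns the type string for a segment URL, or null if it is unknown.
function sourceBufferTypeFor(url) {
  var extension = url.split('.').pop().toLowerCase();
  return Object.prototype.hasOwnProperty.call(TYPE_BY_EXTENSION, extension)
    ? TYPE_BY_EXTENSION[extension]
    : null;
}

// In a browser, support would typically be confirmed before use:
//   var type = sourceBufferTypeFor('sintel/sintel_0.mp4');
//   if (type && MediaSource.isTypeSupported(type)) {
//     var sourceBuffer = mediaSource.addSourceBuffer(type);
//   }
console.log(sourceBufferTypeFor('sintel/sintel_0.mp3'));
// → audio/mpeg
```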

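If you'd like to know more, check the appendices below for a deeper look at gapless content creation and metadata parsing. You can also explore gapless.js for a closer look at the code powering this demo.

Thanks for reading!

Appendix A: Creating gapless content

Creating gapless content can be hard to get right. Below we'll walk through creating the Sintel media used in this demo. To start, you'll need a copy of the lossless FLAC soundtrack for Sintel; for posterity, the SHA1 is included below. For tools, you'll need FFmpeg, MP4Box, LAME, and an OSX installation with afconvert.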

    unzip Jan_Morgenstern-Sintel-FLAC.zip
    sha1sum 1-Snow_Fight.flac
    # 0535ca207ccba70d538f7324916a3f1a3d550194  1-Snow_Fight.flac

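First, we'll split out the first 31.5 seconds of the 1-Snow_Fight.flac track. We also want to add a 2.5 second fade out starting at 28 seconds in, to avoid any clicks once playback finishes. Using the FFmpeg command line below, we can accomplish all of this and put the results in sintel.flac.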

    ffmpeg -i 1-Snow_Fight.flac -t 31.5 -af "afade=t=out:st=28:d=2.5" sintel.flac

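Next, we'll split the file into 5 wave files of up to 6.5 seconds each; it's easiest to use wave since almost every encoder supports ingesting it. Again, we can do this precisely with FFmpeg, after which we'll have sintel_0.wav, sintel_1.wav, sintel_2.wav, sintel_3.wav, and sintel_4.wav.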

    ffmpeg -i sintel.flac -acodec pcm_f32le -map 0 -f segment \
           -segment_list out.list -segment_time 6.5 sintel_%d.wav
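Because 31.5 seconds is not a whole multiple of 6.5 seconds, the last file is shorter than the others. The sketch below works through the nominal arithmetic only; it does not model how FFmpeg's segmenter picks exact cut points, and segmentBoundaries() is a hypothetical helper.

```javascript
// Splits a duration into consecutive segments of at most `segmentSeconds`.
function segmentBoundaries(totalSeconds, segmentSeconds) {
  var boundaries = [];
  for (var start = 0; start < totalSeconds; start += segmentSeconds) {
    var end = Math.min(start + segmentSeconds, totalSeconds);
    boundaries.push({ start: start, end: end, duration: end - start });
  }
  return boundaries;
}

var segments = segmentBoundaries(31.5, 6.5);
segments.forEach(function (segment, i) {
  console.log('sintel_' + i + '.wav: ' + segment.start + ' s to ' +
              segment.end + ' s (' + segment.duration + ' s)');
});
// The last line printed is:
// sintel_4.wav: 26 s to 31.5 s (5.5 s)
```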

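Next, let's create the MP3 files. LAME has several options for creating gapless content. If you're in control of the content, you might consider using --nogap with a batch encoding of all files to avoid padding between segments altogether. For the purposes of this demo, though, we want that padding, so we'll use a standard high quality VBR encoding of the wave files.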

    lame -V=2 sintel_0.wav sintel_0.mp3
    lame -V=2 sintel_1.wav sintel_1.mp3
    lame -V=2 sintel_2.wav sintel_2.mp3
    lame -V=2 sintel_3.wav sintel_3.mp3
    lame -V=2 sintel_4.wav sintel_4.mp3

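That's all that's necessary to create the MP3 files. Now let's cover the creation of the fragmented MP4 files. We'll follow Apple's directions for creating media that is mastered for iTunes. Below, we'll convert the wave files into intermediate CAF files, per the instructions, before encoding them as AAC in an MP4 container using the recommended parameters.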

    afconvert sintel_0.wav sintel_0_intermediate.caf -d 0 -f caff \
              --soundcheck-generate
    afconvert sintel_1.wav sintel_1_intermediate.caf -d 0 -f caff \
              --soundcheck-generate
    afconvert sintel_2.wav sintel_2_intermediate.caf -d 0 -f caff \
              --soundcheck-generate
    afconvert sintel_3.wav sintel_3_intermediate.caf -d 0 -f caff \
              --soundcheck-generate
    afconvert sintel_4.wav sintel_4_intermediate.caf -d 0 -f caff \
              --soundcheck-generate
    afconvert sintel_0_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \
              -b 256000 -q 127 -s 2 sintel_0.m4a
    afconvert sintel_1_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \
              -b 256000 -q 127 -s 2 sintel_1.m4a
    afconvert sintel_2_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \
              -b 256000 -q 127 -s 2 sintel_2.m4a
    afconvert sintel_3_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \
              -b 256000 -q 127 -s 2 sintel_3.m4a
    afconvert sintel_4_intermediate.caf -d aac -f m4af -u pgcm 2 --soundcheck-read \
              -b 256000 -q 127 -s 2 sintel_4.m4a

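We now have several M4A files, which we need to fragment appropriately before they can be used with MediaSource. For our purposes, we'll use a fragment size of one second. MP4Box will write out each fragmented MP4 as sintel_#_dashinit.mp4, along with an MPEG-DASH manifest (sintel_#_dash.mpd) that can be discarded.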

    MP4Box -dash 1000 sintel_0.m4a && mv sintel_0_dashinit.mp4 sintel_0.mp4
    MP4Box -dash 1000 sintel_1.m4a && mv sintel_1_dashinit.mp4 sintel_1.mp4
    MP4Box -dash 1000 sintel_2.m4a && mv sintel_2_dashinit.mp4 sintel_2.mp4
    MP4Box -dash 1000 sintel_3.m4a && mv sintel_3_dashinit.mp4 sintel_3.mp4
    MP4Box -dash 1000 sintel_4.m4a && mv sintel_4_dashinit.mp4 sintel_4.mp4
    rm sintel_{0,1,2,3,4}_dash.mpd

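That's it! We now have fragmented MP4 and MP3 files with the correct metadata necessary for gapless playback. See Appendix B for more details on just what that metadata looks like.

Appendix B: Parsing gapless metadata

Just like creating gapless content, parsing the gapless metadata can be tricky, since there's no standard method for storage. Below we'll cover how the two most popular encoders, LAME and iTunes, store their gapless metadata. Let's start by setting up some helper methods and an outline for the ParseGaplessData() used above.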

    // Since most MP3 encoders store the gapless metadata in binary, we'll need a
    // method for turning bytes into integers.  Note: This doesn't work for values
    // larger than 2^30 since we'll overflow the signed integer type when shifting.
    function ReadInt(buffer) {
      var result = buffer.charCodeAt(0);
      for (var i = 1; i < buffer.length; ++i) {
        result <<= 8;
        result += buffer.charCodeAt(i);
      }
      return result;
    }

    function ParseGaplessData(arrayBuffer) {
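      // Gapless data is generally within the first 512 bytes, so limit parsing.
      // Map each byte to one character (code 0-255) so that charCodeAt() in
      // ReadInt() returns the raw byte value; decoding the bytes as UTF-8
      // would corrupt any byte above 0x7F and shift the string offsets.
      var byteStr = String.fromCharCode.apply(
          null, new Uint8Array(arrayBuffer.slice(0, 512)));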

      var frontPadding = 0, endPadding = 0, realSamples = 0;

      // ... we'll fill this in as we go below.

We'll cover Apple's iTunes metadata format first, since it's the easiest to parse and explain. Within MP3 and M4A files, iTunes (and afconvert) write a short section in ASCII like so:

    iTunSMPB[ 26 bytes ]0000000 00000840 000001C0 0000000000046E00

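This is written inside an ID3 tag within the MP3 container and within a metadata atom inside the MP4 container. For our purposes, we can ignore the first 0000000 token. The next three tokens are the front padding, the end padding, and the total non-padding sample count. Dividing each of these by the sample rate of the audio gives us the duration of each.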

// iTunes encodes the gapless data as hex strings like so:
//
//    'iTunSMPB[ 26 bytes ]0000000 00000840 000001C0 0000000000046E00'
//    'iTunSMPB[ 26 bytes ]####### frontpad  endpad    real samples'
//
// The approach here elides the complexity of actually parsing MP4 atoms. It
// may not work for all files without some tweaks.
var iTunesDataIndex = byteStr.indexOf('iTunSMPB');
if (iTunesDataIndex != -1) {
  var frontPaddingIndex = iTunesDataIndex + 34;
  frontPadding = parseInt(byteStr.substr(frontPaddingIndex, 8), 16);

  var endPaddingIndex = frontPaddingIndex + 9;
  endPadding = parseInt(byteStr.substr(endPaddingIndex, 8), 16);

  var sampleCountIndex = endPaddingIndex + 9;
  realSamples = parseInt(byteStr.substr(sampleCountIndex, 16), 16);
}
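As a worked example, the sketch below decodes the hex tokens from the sample iTunSMPB string shown above. It is a standalone illustration rather than part of ParseGaplessData(): it takes only the token portion of the string, and the 44.1 kHz sample rate is an assumption made for the sake of the arithmetic.

```javascript
// Parses the space-separated hex tokens that follow the iTunSMPB marker.
// `tokenString` is assumed to begin with the first (ignored) token.
function parseITunSMPBTokens(tokenString, sampleRate) {
  var tokens = tokenString.trim().split(/\s+/);
  var frontPadding = parseInt(tokens[1], 16);
  var endPadding = parseInt(tokens[2], 16);
  var realSamples = parseInt(tokens[3], 16);
  return {
    frontPadding: frontPadding,
    endPadding: endPadding,
    realSamples: realSamples,
    audioDuration: realSamples / sampleRate,
    frontPaddingDuration: frontPadding / sampleRate
  };
}

// Sample rate is an assumption; the token string is the one shown above.
var parsed = parseITunSMPBTokens(
    '0000000 00000840 000001C0 0000000000046E00', 44100);
console.log(parsed.frontPadding, parsed.endPadding, parsed.realSamples);
// → 2112 448 290304
```

Splitting on whitespace avoids hard-coded offsets between tokens, at the cost of assuming the tokens are always space-separated.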

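On the flip side, most open source MP3 encoders store the gapless metadata within a special Xing header placed inside a silent MPEG frame (it's silent so that decoders which don't understand the Xing header will simply play silence). Sadly, this tag is not always present and has a number of optional fields. For the purposes of this demo, we have control over the media, but in practice some additional sanity checks will be required to know when gapless metadata is actually available.

First, we'll parse the total sample count. For simplicity we'll read this from the Xing header, but it could be constructed from the normal MPEG audio header. Xing headers can be marked by either a Xing or an Info tag. Exactly 4 bytes after this tag, there are 32 bits representing the total number of frames in the file; multiplying this value by the number of samples per frame gives us the total samples in the file.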

    // Xing padding is encoded as 24bits within the header.  Note: This code will
    // only work for Layer3 Version 1 and Layer2 MP3 files with XING frame counts
    // and gapless information.  See the following document for more details:
    // http://www.codeproject.com/Articles/8295/MPEG-Audio-Frame-Header
    var xingDataIndex = byteStr.indexOf('Xing');
    if (xingDataIndex == -1) xingDataIndex = byteStr.indexOf('Info');
    if (xingDataIndex != -1) {
      // See section 2.3.1 in the link above for the specifics on parsing the Xing
      // frame count.
      var frameCountIndex = xingDataIndex + 8;
      var frameCount = ReadInt(byteStr.substr(frameCountIndex, 4));

      // For Layer3 Version 1 and Layer2 there are 1152 samples per frame.  See
      // section 2.1.5 in the link above for more details.
      var paddedSamples = frameCount * 1152;

      // ... we'll cover this below.

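Now that we have the total number of samples, we can move on to reading out the number of padding samples. Depending on your encoder, this may be written under a LAME or Lavf tag nested in the Xing header. Exactly 17 bytes after this header, there are 3 bytes representing the front and end padding in 12 bits each, respectively.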

        xingDataIndex = byteStr.indexOf('LAME');
        if (xingDataIndex == -1) xingDataIndex = byteStr.indexOf('Lavf');
        if (xingDataIndex != -1) {
          // See http://gabriel.mp3-tech.org/mp3infotag.html#delays for details of
          // how this information is encoded and parsed.
          var gaplessDataIndex = xingDataIndex + 21;
          var gaplessBits = ReadInt(byteStr.substr(gaplessDataIndex, 3));

          // Upper 12 bits are the front padding, lower are the end padding.
          frontPadding = gaplessBits >> 12;
          endPadding = gaplessBits & 0xFFF;
        }

        realSamples = paddedSamples - (frontPadding + endPadding);
      }

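      // SECONDS_PER_SAMPLE is not defined in this article; it is assumed to be
      // 1 / sampleRate for the media (for example, 1 / 44100 for 44.1 kHz audio).
      return {
        audioDuration: realSamples * SECONDS_PER_SAMPLE,
        frontPaddingDuration: frontPadding * SECONDS_PER_SAMPLE
      };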
    }
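The bit manipulation in the Xing path is easier to follow with concrete numbers. The sketch below reads a frame count and a packed 24-bit padding field from a small byte array. The byte layout is contrived for illustration (in a real file these fields are not adjacent), and readUintBE() is a hypothetical byte-based counterpart to ReadInt().

```javascript
// Reads `length` bytes starting at `offset` as a big-endian unsigned integer.
// Multiplying instead of shifting avoids overflowing a signed 32-bit value.
function readUintBE(bytes, offset, length) {
  var result = 0;
  for (var i = 0; i < length; ++i) {
    result = result * 256 + bytes[offset + i];
  }
  return result;
}

// Contrived bytes: a 32-bit frame count of 252, followed by the 24-bit
// padding field 0x240240 (0x240 = 576 front padding, 0x240 = 576 end padding).
var bytes = new Uint8Array([0x00, 0x00, 0x00, 0xFC, 0x24, 0x02, 0x40]);

var frameCount = readUintBE(bytes, 0, 4);
var gaplessBits = readUintBE(bytes, 4, 3);

// Upper 12 bits are the front padding, lower 12 bits are the end padding.
var frontPadding = gaplessBits >> 12;
var endPadding = gaplessBits & 0xFFF;

// 1152 samples per frame, as in the code above.
var paddedSamples = frameCount * 1152;
var realSamples = paddedSamples - (frontPadding + endPadding);

console.log(frameCount, frontPadding, endPadding, realSamples);
// → 252 576 576 289152
```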

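With that, we have a complete function for parsing the vast majority of gapless content. Edge cases certainly abound, though, so caution is recommended before using similar code in production.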

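Appendix C: On garbage collection

Memory belonging to SourceBuffer instances is actively garbage collected according to content type, platform-specific limits, and the current play position. In Chrome, memory is first reclaimed from already played buffers. However, if memory usage exceeds platform-specific limits, memory is also removed from unplayed buffers.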

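When playback reaches a gap in the timeline due to reclaimed memory, it may glitch if the gap is small enough or stall completely if the gap is too large. Neither is a great user experience, so it's important to avoid appending too much data at once and to manually remove ranges from the media timeline that are no longer necessary.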
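One way to notice such gaps is to inspect the buffered ranges of a SourceBuffer. The sketch below lists the gaps between consecutive ranges of any TimeRanges-like object, meaning anything that exposes length, start(i), and end(i). findGaps() is a hypothetical helper and not part of MSE.

```javascript
// Returns the gaps between consecutive buffered ranges, in seconds.
function findGaps(buffered) {
  var gaps = [];
  for (var i = 1; i < buffered.length; ++i) {
    gaps.push({ start: buffered.end(i - 1), end: buffered.start(i) });
  }
  return gaps;
}

// In a browser this might be called as:
//   var gaps = findGaps(sourceBuffer.buffered);
//   if (gaps.length > 0) { /* re-append the missing data */ }
```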

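Ranges can be removed via the remove() method on each SourceBuffer, which takes a [start, end] range in seconds. Similar to appendBuffer(), each remove() fires an updateend event once it completes. Other removes or appends should not be issued until the event fires.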
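A minimal sketch of manual removal might look like the following. It trims everything that ends more than a chosen number of seconds behind the current playback position, and it skips the call while an update is in progress. The function name and the keepBehindSeconds parameter are hypothetical; only updating, buffered, and remove() come from the SourceBuffer interface.

```javascript
// Removes buffered media older than `keepBehindSeconds` before `currentTime`.
// Returns true if a remove() was issued, false otherwise. After a true
// result, wait for the SourceBuffer's updateend event before issuing any
// other append or remove.
function removePlayedMedia(sourceBuffer, currentTime, keepBehindSeconds) {
  if (sourceBuffer.updating || sourceBuffer.buffered.length === 0) {
    return false;
  }
  var start = sourceBuffer.buffered.start(0);
  var end = currentTime - keepBehindSeconds;
  if (end <= start) {
    return false;  // Nothing old enough to remove.
  }
  sourceBuffer.remove(start, end);
  return true;
}

// In a browser this might be called as:
//   removePlayedMedia(sourceBuffer, audio.currentTime, 10);
```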

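On desktop Chrome, you can keep approximately 12 megabytes of audio content and 150 megabytes of video content in memory at once. You should not rely on these values across browsers or platforms; for example, they are most certainly not representative of mobile devices.

Garbage collection only impacts data added to SourceBuffers; there are no limits on how much data you can keep buffered in JavaScript variables. You may also reappend the same data in the same position if necessary.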
