Metal

(Not finished reading yet.)

2014 603 Overview

  • Dramatically reduced overhead
  • Unified graphics and compute
  • Precompiled shaders
  • Efficient multithreading
  • Designed for A7

10x more draw calls

Draw Call

Each draw call requires its own state vector:

  • shader
  • state
  • textures
  • render target…

For the CPU, changing the state vector is very expensive (each change has to be translated into hardware commands).

Metal Design

  • Thinnest possible API
  • Modern GPU features
  • Do expensive tasks less often
  • Predictable performance
  • Explicit command submission
  • Optimized for CPU behavior

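Why GPU programming is expensive

State validation

  • Confirming that API usage is valid
  • Encoding API state into hardware state

Shader compilation

  • Generating shader machine code at run time
  • Interactions between state and shaders

Sending work to the GPU

  • Managing resource residency
  • Batching commands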

When               Frequency                 Before Metal         After Metal
Application build  “Never”                   -                    Shader compilation
Content loading    Rare                      -                    State validation
Draw time          1000s of times per frame  Shader compilation   Start work on GPU
                                             State validation
                                             Start work on GPU
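
To illustrate moving state validation from draw time to content-loading time, here is a minimal Swift sketch. It assumes the present-day Swift names of the Metal API (the 2014 session showed Objective-C), and `LoadedState`, `loadState`, and `encodeDraw` are hypothetical names chosen for this example. Depth and sampler state are validated once into immutable objects, so the per-draw cost is just binding them.

```swift
import Metal

// Hypothetical container for state objects built once at load time.
struct LoadedState {
    let depthState: MTLDepthStencilState
    let sampler: MTLSamplerState
}

// Content-loading time (rare): descriptors are validated and baked
// into immutable state objects.
func loadState(device: MTLDevice) -> LoadedState? {
    let depthDescriptor = MTLDepthStencilDescriptor()
    depthDescriptor.depthCompareFunction = .less
    depthDescriptor.isDepthWriteEnabled = true

    let samplerDescriptor = MTLSamplerDescriptor()
    samplerDescriptor.minFilter = .linear
    samplerDescriptor.magFilter = .linear

    guard let depthState = device.makeDepthStencilState(descriptor: depthDescriptor),
          let sampler = device.makeSamplerState(descriptor: samplerDescriptor)
    else { return nil }
    return LoadedState(depthState: depthState, sampler: sampler)
}

// Draw time (thousands of times per frame): only bind the prebuilt objects.
func encodeDraw(encoder: MTLRenderCommandEncoder, state: LoadedState) {
    encoder.setDepthStencilState(state.depthState)
    encoder.setFragmentSamplerState(state.sampler, index: 0)
    // Draw calls would follow here.
}
```

The same split applies to shaders: they are compiled when the application is built, and the pipeline object that combines them with render state is created at load time.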

Objects          Purpose
Device           The GPU
Command Queue    Serial sequence of command buffers
Command Buffer   Contains GPU hardware commands
Command Encoder  Translates API commands to GPU hardware commands
State            Framebuffer configuration, blend, depth, samplers, etc.
Code             Shaders
Resources        Textures and Data Buffers (vertices, constants, etc.)

Command submission model

A command encoder translates API commands into hardware commands.

The hardware commands are stored in a command buffer.

Three types of command encoder:

  • Render (graphics rendering), Compute (data-parallel computations), Blit (GPU-accelerated resource copy operations)
  • Different types can be interleaved in a single command buffer
  • This avoids implicit, expensive state save and restore operations
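
The interleaving of encoder types can be sketched as follows. This is a minimal example assuming the current Swift Metal API; `encodeMixedPasses`, the buffer sizes, and the 64x64 offscreen texture are hypothetical choices made only for illustration. Only one encoder is active on a command buffer at a time, so each one ends before the next begins.

```swift
import Metal

// Hypothetical helper: one command buffer carrying blit, render, and compute passes.
func encodeMixedPasses(device: MTLDevice, queue: MTLCommandQueue) -> MTLCommandBuffer? {
    guard let commandBuffer = queue.makeCommandBuffer(),
          let source = device.makeBuffer(length: 1024, options: .storageModeShared),
          let destination = device.makeBuffer(length: 1024, options: .storageModeShared)
    else { return nil }

    // Blit: GPU-accelerated copy between two buffers.
    if let blit = commandBuffer.makeBlitCommandEncoder() {
        blit.copy(from: source, sourceOffset: 0,
                  to: destination, destinationOffset: 0, size: 1024)
        blit.endEncoding()
    }

    // Render: a pass that only clears an offscreen texture (no draw calls).
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .bgra8Unorm, width: 64, height: 64, mipmapped: false)
    textureDescriptor.usage = [.renderTarget]
    textureDescriptor.storageMode = .private
    guard let target = device.makeTexture(descriptor: textureDescriptor) else { return nil }

    let pass = MTLRenderPassDescriptor()
    pass.colorAttachments[0].texture = target
    pass.colorAttachments[0].loadAction = .clear
    pass.colorAttachments[0].storeAction = .store
    pass.colorAttachments[0].clearColor = MTLClearColor(red: 0, green: 0, blue: 0, alpha: 1)
    if let render = commandBuffer.makeRenderCommandEncoder(descriptor: pass) {
        render.endEncoding()
    }

    // Compute: a real pass would set a pipeline state and dispatch here.
    if let compute = commandBuffer.makeComputeCommandEncoder() {
        compute.endEncoding()
    }

    return commandBuffer
}
```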

Explicit command buffer construction and submission

  • App creates many lightweight command buffers
  • App controls command buffer submission
  • Metal signals app when command buffers finish execution
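
A minimal sketch of explicit submission, assuming the current Swift Metal API; `submitFrame` and the "Frame" label are hypothetical names for this example. The app creates the command buffer, registers a completion handler before committing, and decides when to call `commit()`.

```swift
import Metal

// Hypothetical helper: build, register for completion, and submit one command buffer.
func submitFrame(queue: MTLCommandQueue) -> MTLCommandBuffer? {
    guard let commandBuffer = queue.makeCommandBuffer() else { return nil }
    commandBuffer.label = "Frame"

    // Encoders would record their commands here.

    // The handler must be added before commit(); Metal calls it
    // once the GPU has finished executing this command buffer.
    commandBuffer.addCompletedHandler { finished in
        if let error = finished.error {
            print("Command buffer failed: \(error)")
        } else {
            print("Command buffer finished")
        }
    }

    // Nothing reaches the GPU until the app explicitly commits.
    commandBuffer.commit()
    return commandBuffer
}
```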

Command encoders generate commands immediately

  • No deferred state validation
  • Direct call to driver

Multithreaded command encoding

  • Multiple command buffers can be encoded in parallel
  • App decides execution order
  • Very efficient implementation to ensure scalable performance
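
One way to sketch parallel encoding with an app-chosen execution order, assuming the current Swift Metal API: `enqueue()` reserves each command buffer's place in the queue up front, after which the buffers can be encoded and committed from any thread. `encodeInParallel` is a hypothetical helper, and the empty blit pass stands in for real work.

```swift
import Metal
import Dispatch

// Hypothetical helper: fix the execution order first, then encode concurrently.
func encodeInParallel(queue: MTLCommandQueue, passCount: Int) -> [MTLCommandBuffer] {
    let buffers = (0..<passCount).compactMap { _ in queue.makeCommandBuffer() }

    // The app decides execution order: buffers run in the order they are enqueued,
    // regardless of which one is committed first.
    for buffer in buffers {
        buffer.enqueue()
    }

    // Each command buffer is encoded on its own thread.
    DispatchQueue.concurrentPerform(iterations: buffers.count) { index in
        let buffer = buffers[index]
        if let blit = buffer.makeBlitCommandEncoder() {
            // Placeholder pass; real encoding work would go here.
            blit.endEncoding()
        }
        buffer.commit()
    }
    return buffers
}
```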

P94 to end


2014 604 Fundamentals

  1. Get the Device
  2. Create a CommandQueue
  3. Create Resources (Buffers and Textures)
  4. Create RenderPipelines
  5. Create a View
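
The five steps above can be sketched in Swift roughly as follows, assuming the current Swift Metal API. Several details are assumptions for illustration only: the `Renderer` struct, the shader names `vertex_main` and `fragment_main`, the triangle data, and the texture size. The shaders are compiled from an inline source string purely to keep the sketch self-contained; an app would normally load a library compiled at build time. A `CAMetalLayer` stands in for the view's backing layer.

```swift
import Metal
import QuartzCore

// Hypothetical shader source, inlined only so the sketch is self-contained.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;
vertex float4 vertex_main(const device float4 *positions [[buffer(0)]],
                          uint vid [[vertex_id]]) {
    return positions[vid];
}
fragment half4 fragment_main() {
    return half4(1.0, 0.0, 0.0, 1.0);
}
"""

// Hypothetical container for the objects created during setup.
struct Renderer {
    let device: MTLDevice
    let queue: MTLCommandQueue
    let vertexBuffer: MTLBuffer
    let texture: MTLTexture
    let pipeline: MTLRenderPipelineState
    let layer: CAMetalLayer
}

func makeRenderer() throws -> Renderer? {
    // 1. Get the device, 2. create a command queue.
    guard let device = MTLCreateSystemDefaultDevice(),
          let queue = device.makeCommandQueue() else { return nil }

    // 3. Create resources: a vertex buffer (one triangle) and a texture.
    let vertices: [Float] = [0, 1, 0, 1,  -1, -1, 0, 1,  1, -1, 0, 1]
    let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
        pixelFormat: .rgba8Unorm, width: 256, height: 256, mipmapped: false)
    guard let vertexBuffer = device.makeBuffer(
              bytes: vertices,
              length: vertices.count * MemoryLayout<Float>.stride,
              options: .storageModeShared),
          let texture = device.makeTexture(descriptor: textureDescriptor)
    else { return nil }

    // 4. Create a render pipeline from the vertex and fragment functions.
    let library = try device.makeLibrary(source: shaderSource, options: nil)
    let descriptor = MTLRenderPipelineDescriptor()
    descriptor.vertexFunction = library.makeFunction(name: "vertex_main")
    descriptor.fragmentFunction = library.makeFunction(name: "fragment_main")
    descriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    let pipeline = try device.makeRenderPipelineState(descriptor: descriptor)

    // 5. Create the view's Metal layer, matching the pipeline's pixel format.
    let layer = CAMetalLayer()
    layer.device = device
    layer.pixelFormat = .bgra8Unorm

    return Renderer(device: device, queue: queue, vertexBuffer: vertexBuffer,
                    texture: texture, pipeline: pipeline, layer: layer)
}
```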