Metal
2015-12-01
(Not finished reading yet.)
2014 603 Overview
- Dramatically reduced overhead
- Unified graphics and compute
- Precompiled shaders
- Efficient multithreading
- Designed for A7
10x more draw calls
Draw Call
Each draw call needs its own state vector:
- shader
- state
- textures
- render target…
For the CPU, changing the state vector is very expensive, because the new state has to be translated into hardware commands.
Metal Design
- Thinnest possible API
- Modern GPU features
- Do expensive tasks less often
- Predictable performance
- Explicit command submission
- Optimized for CPU behavior
Why GPU programming is expensive
State validation
- Confirm that API usage is valid
- Encode API state into hardware state
Shader compilation
- Generate shader machine code at run time
- Interactions between state and shaders
Sending work to the GPU
- Manage resource residency
- Batch commands
When | Frequency | Before Metal | After Metal |
---|---|---|---|
Application build | “Never” | | Shader compilation |
Content loading | Rare | | State validation |
Draw time | 1000s of times per frame | Shader compilation, state validation, start work on GPU | Start work on GPU |
Objects | Purpose |
---|---|
Device | The GPU |
Command Queue | Serial sequence of command buffers |
Command Buffer | Contains GPU hardware commands |
Command Encoder | Translates API commands to GPU hardware commands |
State | Framebuffer configuration, blend, depth, samplers, etc. |
Code | Shaders |
Resources | Textures and Data Buffers (vertices, constants, etc.) |
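Command submission model
Command encoders translate API commands into hardware commands
Hardware commands are stored in command buffers
Three types of command encoder: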
- Render(Graphics rendering), Compute(Data parallel computations), Blit(GPU-accelerated resource copy operations)
- Can interleave different types into single command buffer
- Avoids implicit expensive state save and restore operations
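A minimal Swift sketch of interleaving two encoder types in one command buffer, assuming a Metal-capable GPU and the current Swift names of the Metal API. The kernel `double_values` and the sample data are hypothetical, and the kernel is compiled from source at run time only to keep the example self-contained; an app would normally load precompiled shaders.

```swift
import Metal

// Assumption: a Metal-capable GPU is present.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else {
    fatalError("Metal is not available on this system")
}

// Hypothetical kernel, compiled from source only for self-containment.
let kernelSource = """
#include <metal_stdlib>
using namespace metal;
kernel void double_values(device float *data [[buffer(0)]],
                          uint i [[thread_position_in_grid]]) {
    data[i] = data[i] * 2.0f;
}
"""

let input: [Float] = [1, 2, 3, 4]
let byteCount = input.count * MemoryLayout<Float>.stride
var result: [Float] = []

do {
    let library = try device.makeLibrary(source: kernelSource, options: nil)
    guard let function = library.makeFunction(name: "double_values"),
          let source = device.makeBuffer(bytes: input, length: byteCount,
                                         options: .storageModeShared),
          let destination = device.makeBuffer(length: byteCount,
                                              options: .storageModeShared),
          let commandBuffer = queue.makeCommandBuffer() else {
        fatalError("Could not create Metal objects")
    }
    let pipeline = try device.makeComputePipelineState(function: function)

    // Blit encoder: GPU-side copy from source to destination.
    guard let blit = commandBuffer.makeBlitCommandEncoder() else {
        fatalError("Could not create blit encoder")
    }
    blit.copy(from: source, sourceOffset: 0,
              to: destination, destinationOffset: 0, size: byteCount)
    blit.endEncoding()

    // Compute encoder in the same command buffer: doubles each copied value.
    guard let compute = commandBuffer.makeComputeCommandEncoder() else {
        fatalError("Could not create compute encoder")
    }
    compute.setComputePipelineState(pipeline)
    compute.setBuffer(destination, offset: 0, index: 0)
    compute.dispatchThreadgroups(
        MTLSize(width: 1, height: 1, depth: 1),
        threadsPerThreadgroup: MTLSize(width: input.count, height: 1, depth: 1))
    compute.endEncoding()

    commandBuffer.commit()
    commandBuffer.waitUntilCompleted()

    // Read the shared buffer back on the CPU.
    let pointer = destination.contents().bindMemory(to: Float.self,
                                                    capacity: input.count)
    result = Array(UnsafeBufferPointer(start: pointer, count: input.count))
    print(result)
} catch {
    fatalError("Metal setup failed: \(error)")
}
```

Only one encoder is active on a command buffer at a time, so each encoder ends with `endEncoding()` before the next one is created.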
Explicit command buffer construction and submission
App creates many lightweight command buffers
App controls command buffer submission
Metal signals app when command buffers finish execution
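As a rough sketch of explicit submission, the Swift snippet below builds one command buffer, registers a completion handler, and commits it. It assumes a Metal-capable GPU; the 16-byte buffer and the fill value are placeholders chosen for illustration.

```swift
import Metal
import Dispatch

// Assumption: a Metal-capable GPU is present.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let buffer = device.makeBuffer(length: 16, options: .storageModeShared),
      let commandBuffer = queue.makeCommandBuffer(),
      let blit = commandBuffer.makeBlitCommandEncoder() else {
    fatalError("Metal is not available on this system")
}

// Encode a small piece of GPU work: fill the buffer with 0xFF.
blit.fill(buffer: buffer, range: 0..<16, value: 0xFF)
blit.endEncoding()

// Metal calls the handler once the GPU has finished this command buffer.
let finished = DispatchSemaphore(value: 0)
commandBuffer.addCompletedHandler { _ in
    finished.signal()
}

// Nothing is sent to the GPU until the app commits explicitly.
commandBuffer.commit()
finished.wait()

print(commandBuffer.status == .completed)
```

The handler has to be added before `commit()`. The semaphore is only there so that a command-line run waits for the GPU; an app would usually continue with other work instead of blocking.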
Command encoders generate commands immediately
No deferred state validation
Direct call to driver
Multithreaded command encoding
Multiple command buffers can be encoded in parallel
App decides execution order
Very efficient implementation to ensure scalable performance
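One way to sketch parallel encoding in Swift: the app fixes the execution order up front with `enqueue()`, then encodes and commits each command buffer from a different thread. This assumes a Metal-capable GPU; `workerCount`, `chunkSize`, and the fill values are made up for the example.

```swift
import Metal
import Dispatch

let workerCount = 4
let chunkSize = 16  // bytes per worker; kept a multiple of 4 for blit fills

// Assumption: a Metal-capable GPU is present.
guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue(),
      let target = device.makeBuffer(length: workerCount * chunkSize,
                                     options: .storageModeShared) else {
    fatalError("Metal is not available on this system")
}

// Create one command buffer per worker.
let commandBuffers: [MTLCommandBuffer] = (0..<workerCount).compactMap { _ in
    queue.makeCommandBuffer()
}

// The app decides execution order: enqueue() reserves each buffer's place
// in the queue before any encoding has happened.
for commandBuffer in commandBuffers {
    commandBuffer.enqueue()
}

// Encode in parallel; each thread touches only its own command buffer.
DispatchQueue.concurrentPerform(iterations: commandBuffers.count) { index in
    let commandBuffer = commandBuffers[index]
    if let blit = commandBuffer.makeBlitCommandEncoder() {
        let start = index * chunkSize
        blit.fill(buffer: target, range: start..<(start + chunkSize),
                  value: UInt8(index + 1))
        blit.endEncoding()
    }
    // Commits may arrive in any order; execution follows the enqueue order.
    commandBuffer.commit()
}

for commandBuffer in commandBuffers {
    commandBuffer.waitUntilCompleted()
}

// Inspect the first byte written by each worker.
let bytes = target.contents().bindMemory(to: UInt8.self,
                                         capacity: workerCount * chunkSize)
let firstByteOfEachChunk = (0..<commandBuffers.count).map { bytes[$0 * chunkSize] }
print(firstByteOfEachChunk)
```

A command buffer and its encoder are used from one thread at a time here, while the command queue is shared between threads.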
P94~End
2014 604 Fundamentals
- Get the Device
- Create a CommandQueue
- Create Resources (Buffers and Textures)
- Create RenderPipelines
- Create a View
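The first four steps above could look roughly like this in Swift, assuming a Metal-capable GPU. The shader names `vertex_main` and `fragment_main`, the triangle data, and the pixel formats are illustrative choices, and the shaders are compiled from source only so the sketch stands alone. Creating the view is left out because it depends on the platform's UI framework.

```swift
import Metal

// 1. Get the device (assumption: a Metal-capable GPU is present).
guard let device = MTLCreateSystemDefaultDevice() else {
    fatalError("Metal is not available on this system")
}

// 2. Create a command queue.
guard let commandQueue = device.makeCommandQueue() else {
    fatalError("Could not create a command queue")
}

// 3. Create resources: a vertex buffer (one triangle, float4 positions)
//    and a small 2D texture.
let positions: [Float] = [
     0.0,  0.5, 0.0, 1.0,
    -0.5, -0.5, 0.0, 1.0,
     0.5, -0.5, 0.0, 1.0,
]
let vertexBuffer = device.makeBuffer(
    bytes: positions,
    length: positions.count * MemoryLayout<Float>.stride,
    options: .storageModeShared)

let textureDescriptor = MTLTextureDescriptor.texture2DDescriptor(
    pixelFormat: .rgba8Unorm, width: 4, height: 4, mipmapped: false)
let texture = device.makeTexture(descriptor: textureDescriptor)

// 4. Create a render pipeline from hypothetical shaders.
let shaderSource = """
#include <metal_stdlib>
using namespace metal;
vertex float4 vertex_main(const device float4 *positions [[buffer(0)]],
                          uint vid [[vertex_id]]) {
    return positions[vid];
}
fragment float4 fragment_main() {
    return float4(1.0, 0.0, 0.0, 1.0);
}
"""

var pipelineState: MTLRenderPipelineState?
do {
    let library = try device.makeLibrary(source: shaderSource, options: nil)
    let pipelineDescriptor = MTLRenderPipelineDescriptor()
    pipelineDescriptor.vertexFunction = library.makeFunction(name: "vertex_main")
    pipelineDescriptor.fragmentFunction = library.makeFunction(name: "fragment_main")
    // Assumed drawable format; it must match the view's layer in a real app.
    pipelineDescriptor.colorAttachments[0].pixelFormat = .bgra8Unorm
    pipelineState = try device.makeRenderPipelineState(descriptor: pipelineDescriptor)
} catch {
    print("Pipeline creation failed: \(error)")
}

// 5. Create a view: omitted here (platform-specific).
print(commandQueue.device.name)
```

Pipeline creation is the expensive step, so it belongs in setup code and not in the per-frame draw path.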