[Chapter 18] Clarification regarding the threadsPerGroup

harmansmith · June 7, 2022, 8:32am

Hello!
At the beginning of the chapter, in the code around the supportsFeatureSet(.iOS_GPUFamily4_v1) check, it says:
let threadsPerGroup = MTLSize(width: 1, height: 1, depth: 1)
Isn’t this a mistake? Why don’t we take threadExecutionWidth into account?

caroline · June 7, 2022, 9:29am

Hi @harmansmith - please could you give me more context? I can’t find

let threadsPerGroup = MTLSize(width: 1, height: 1, depth: 1)

Maybe copy and paste a paragraph so I can track it down please?

harmansmith · June 7, 2022, 10:14am

Yes, sure. It’s in the starter project for this chapter, in struct FlockingPass, inside func draw(in view: MTKView, commandBuffer: MTLCommandBuffer), under the // render boids comment.

caroline · June 7, 2022, 12:45pm

Thank you - I see you meant in the starter project!

In the starter code, there are three given ways of execution:

iOS GPU family 4, which supports non-uniform thread groups
iOS before GPU family 4, which does not support non-uniform thread groups
macOS, which before M1 did not support non-uniform thread groups

M1 on macOS is not catered for here, but I think the iOS code path that supports non-uniform thread groups would work on it.
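As a minimal sketch, the choice between these paths can be modelled in plain Swift so it runs without a GPU. `DispatchPath` and `chooseDispatchPath(supportsNonUniformThreadgroups:)` are hypothetical names, not part of the starter project; in a real app the flag would come from querying the `MTLDevice` (for example with `supportsFamily(_:)`), and each case corresponds to a different `MTLComputeCommandEncoder` dispatch call.

```swift
// Hypothetical model of the starter project's execution paths.
// In a real app the Boolean would come from the MTLDevice;
// here it is passed in so the logic can be exercised without a GPU.
enum DispatchPath: Equatable {
    case nonUniformThreadgroups   // dispatchThreads(_:threadsPerThreadgroup:); Metal shrinks the edge groups
    case uniformThreadgroups      // dispatchThreadgroups(_:threadsPerThreadgroup:); caller sizes the grid
}

func chooseDispatchPath(supportsNonUniformThreadgroups: Bool) -> DispatchPath {
    supportsNonUniformThreadgroups ? .nonUniformThreadgroups : .uniformThreadgroups
}

// iOS GPU family 4: non-uniform thread groups are available.
let iOSFamily4 = chooseDispatchPath(supportsNonUniformThreadgroups: true)
// iOS before family 4, and Intel macOS: fall back to uniform thread groups.
let olderGPU = chooseDispatchPath(supportsNonUniformThreadgroups: false)
print(iOSFamily4, olderGPU)
```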

This is an image from chapter 16, demonstrating non-uniform thread groups.


The width, height and depth here are all 1 so that no thread runs past the end of the data in the kernel function: with one thread per group, the grid can be sized to exactly one thread per boid. I suspect that 1 is not optimal, but it works here.
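To see why a group size of 1 is safe, here is a small CPU-side model in Swift of a uniform launch, where the grid is a whole number of equally sized thread groups. The helpers `threadsLaunched` and `runKernel` are hypothetical and involve no Metal. The total thread count is rounded up to a multiple of the group size, so a size above 1 can launch extra threads that a bounds check in the kernel has to skip; with a size of 1 nothing is left over.

```swift
// Hypothetical CPU model of a uniform (dispatchThreadgroups-style) launch.
// Returns how many threads run in total for `count` items.
func threadsLaunched(count: Int, threadsPerGroup: Int) -> Int {
    let groups = (count + threadsPerGroup - 1) / threadsPerGroup  // round up
    return groups * threadsPerGroup
}

// Simulates the kernel body: each thread id updates one boid, but only
// if the id is inside the buffer. Returns how many threads were skipped.
func runKernel(boids: inout [Int], threadsPerGroup: Int) -> Int {
    let launched = threadsLaunched(count: boids.count, threadsPerGroup: threadsPerGroup)
    var skipped = 0
    for id in 0..<launched {
        // Without this guard, extra threads would index past the end.
        guard id < boids.count else { skipped += 1; continue }
        boids[id] += 1
    }
    return skipped
}

var boids = Array(repeating: 0, count: 100)
print(runKernel(boids: &boids, threadsPerGroup: 1))   // 0
print(runKernel(boids: &boids, threadsPerGroup: 32))  // 28
```

The `guard` plays the role of an early return in the kernel when the thread position is not less than the boid count; once the group size is larger than 1, that check is what keeps the extra threads harmless.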

If you’re only supporting recent Apple GPUs, then you should use the longer iOS code that calculates the groups per grid.
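For the uniform path, a common alternative to a width of 1 is to base the group width on the pipeline state’s `threadExecutionWidth`, capped by `maxTotalThreadsPerThreadgroup`, and round the group count up. This sketch only does the arithmetic in plain Swift: `Size` is a hypothetical stand-in for `MTLSize`, and `uniformDispatchSizes` is a hypothetical helper, not necessarily what the book’s longer iOS code does.

```swift
// Hypothetical stand-in for MTLSize so the arithmetic runs without Metal.
struct Size: Equatable {
    var width: Int, height: Int, depth: Int
}

// 1D sizing based on the execution width. `executionWidth` and
// `maxThreadsPerGroup` mirror the pipeline state's threadExecutionWidth
// and maxTotalThreadsPerThreadgroup properties.
func uniformDispatchSizes(count: Int, executionWidth: Int, maxThreadsPerGroup: Int)
    -> (threadsPerGroup: Size, groupsPerGrid: Size) {
    let width = min(executionWidth, maxThreadsPerGroup)
    let groups = (count + width - 1) / width   // round up so every boid is covered
    return (Size(width: width, height: 1, depth: 1),
            Size(width: groups, height: 1, depth: 1))
}

let sizes = uniformDispatchSizes(count: 100, executionWidth: 32, maxThreadsPerGroup: 1024)
print(sizes.threadsPerGroup.width, sizes.groupsPerGrid.width)  // 32 4
```

Because the last group can be partly empty (4 × 32 = 128 threads for 100 boids here), the kernel still needs a bounds check on this path.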

By the way, running that non-uniform thread group code on my Intel Mac causes catastrophic memory errors:

[Screenshot 2022-06-07 at 10.33.49 pm]