Encode Compute Commands on GPU

Hello -

I have set a challenge for myself that I believe could be solved by encoding an indirect command buffer with compute commands. The metal specification says there is a compute_command type that works like render_command. Oddly, the compute command is not recognized by the compiler, and I do not see the type in the header where the specification says it should be. Does anyone know if this has been removed or depreciated?

For a short background on my challenge, I am trying to leverage the GPU to process what would normally be a two-dimensional array on the CPU. My first attempt was to use a single compute function with a 2D thread group.

The problem with this solution is that the data structure is not perfectly square like a texture would be. The structure would result in something like the following:

0 1 2 3
1 X X -- --
2 X -- -- --
3 X X X X

Where X represents threads where there is a valid subscript for both parts of the “array”. Since the whole group is dispatched at once, this solution leads to a crash because there are threads that will try to execute with at least one subscript that is out of bounds.

To resolve, I tried to return early if there was an access to a null. I am inexperienced with C++ so perhaps there is an easy way to do this, but I was unsuccessful. Plus, that solution could result in a lot of unused threads.

Alternatively, on the CPU side, I could loop through and dispatch a 1D thread group, but I believe that eliminates the usefulness of the GPU in this case. And the same issue if I made a loop on the GPU side.

compute_command is not available on macOS but is available on iOS. If your target is set to macOS, then the project won’t compile.

I don’t really know how to deal with your specific problem. If I were going to do it in 2d, I think I would fill the gaps with zeros or nulls and make the arrays standard.

You’d probably have to test whether this is more efficient than flat-mapping (removing the nulls) and dispatching a 1d thread group.

Thank you for the info on the compute command. Incidentally, I tried your approach of flattening the data, and it works wonderfully. I still plan to experiment with the GPU compute encoder, but for this particular issue, flat data is the way to go.

1 Like

Quite possibly, but if it’s not documented, I wouldn’t depend upon it