When to call useResource:usage:

Hello -

I was hopeful someone might be able to help me out with an argument buffer question that has me stumped. I do not understand when you have to call useResource:usage: on an argument buffer, or useHeap: on a heap.

I was experimenting with using an MTLHeap object to back a mesh buffer. A pointer to this mesh is encoded into an argument buffer.

If everything worked the way I expected, I would call useHeap on my buffer heap and useResource on my argument buffer. However, this approach does not work. If instead I call useResource on the model that holds the mesh, things do work as expected. In fact, I discovered I do not have to call useHeap on the buffer heap at all, nor do I have to call useResource on my argument buffer.
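To make that concrete, here is roughly what I’m comparing (just a sketch; meshHeap, argumentBuffer, and meshVertexBuffer are placeholder names for my own objects, not code from the book):

```swift
import Metal

// Sketch of the two approaches, assuming a render encoder whose shader
// reads the mesh through a pointer encoded in an argument buffer.
// All parameter names are placeholders for my own project's objects.
func makeResident(
  renderEncoder: MTLRenderCommandEncoder,
  meshHeap: MTLHeap,          // heap backing the mesh buffer
  argumentBuffer: MTLBuffer,  // buffer holding the encoded pointer
  meshVertexBuffer: MTLBuffer // the heap-allocated mesh buffer itself
) {
  // What I expected to be sufficient:
  renderEncoder.useHeap(meshHeap)
  renderEncoder.useResource(argumentBuffer, usage: .read)

  // What actually works for me: declare the mesh buffer itself.
  renderEncoder.useResource(meshVertexBuffer, usage: .read)
}
```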

I confirmed that the buffers are indeed allocated from the heap (not from the device) using a GPU frame capture.

What am I missing?


@lducot2

Do you have the most recent edition of Metal by Tutorials? Chapter 15 talks about argument buffers and indirect resources.

In the early part of the chapter, you put textures into an argument buffer. You then run the code without useResource. In the good old days, it would render with errors, so you could look at the error on the GPU debugger.

On my M1, Xcode crashes horribly, leaving an app window that won’t go away until I reboot.

The argument buffer points to the textures, but the textures don’t go to the GPU until you useResource on them.

The memory that the textures use might not be overwritten before the next frame, so the resource may still happen to be there. But you can’t rely on that; you must call useResource every frame to make sure it’s resident.
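In code, that per-frame call looks something like this (a sketch only; the texture array name is a placeholder, not the book’s exact code):

```swift
import Metal

// The argument buffer only stores references to the textures, so every
// frame the render encoder still has to be told to make them resident.
func declareTextureResidency(
  renderEncoder: MTLRenderCommandEncoder,
  textures: [MTLTexture]
) {
  for texture in textures {
    renderEncoder.useResource(texture, usage: .read)
  }
}
```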

Do you have a minimal project that I can look at?

Thanks, Caroline!

Unfortunately, I do not have a minimal project, but I was able to reproduce the issue using the sample code in Chapter 15. I may have figured out what’s going on, but first, here are the steps to reproduce using the Chapter 15 final project (line numbers refer to Renderer.swift):

  1. Comment out line 273, useResource(icb, usage: .write). The project builds and runs.
  2. Comment out line 274, useResource(modelsBuffer, usage: .read). The project builds and runs.
  3. Comment out line 276, useHeap(heap). The project builds and runs.
  4. Comment out line 281. Now we get a blue screen.

The project works fine as long as you do not comment out useResource:usage: on the resources we encoded into the argument buffer. This is also consistent with what I am seeing in my project.

I think what might be happening is that we are supposed to call useResource either on the render encoder (instead of the compute encoder), or we are supposed to call useResource on the resources that are encoded into the argument buffer and not on the argument buffer itself.

I’m not sure which, but in my project, when I call useHeap on my buffer heap on the compute encoder, no vertices are sent to the vertex function. But when I call useHeap on the render encoder, everything works fine.
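Roughly, this is the difference I’m seeing (a sketch of my own setup with placeholder names, not the book’s code):

```swift
import Metal

// Sketch of the two placements I'm comparing. The encoders and the heap
// are placeholders for my own project's objects.
func declareHeapResidency(
  computeEncoder: MTLComputeCommandEncoder,
  renderEncoder: MTLRenderCommandEncoder,
  meshHeap: MTLHeap
) {
  // Calling useHeap only on the compute encoder that encodes the
  // indirect commands: no vertices reach the vertex function.
  computeEncoder.useHeap(meshHeap)

  // Calling useHeap on the render encoder, whose vertex function reads
  // the heap-backed mesh buffers: everything renders fine.
  renderEncoder.useHeap(meshHeap)
}
```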

Thoughts?

For sure it’s not the argument buffer that you should useResource on. The textures are the resources. And possibly that’s why the icbBuffer doesn’t have to be used.

  1. Commenting out line 273 gave me a GPU error in the debug console the first time I ran it, but not on subsequent runs, so that error may have been irrelevant.

  2. Commenting out the line 276 conditional gives this error in the debug console: "Execution of the command buffer was aborted due to an error during execution. Ignored (for causing prior/excessive GPU errors) (IOAF code 4)", but the app does still run. I wouldn’t ignore that debug error, though.

I’d have to look into why lines 273 and 274 appear to make no difference. I would think they are there for a reason, but I can’t remember why 🙂. It’s also possible that I thought they should be there but they don’t need to be!

As for calling useHeap on the compute encoder, that has no bearing on the vertex function. Vertex functions are only relevant to render encoders; compute encoders use kernel functions.

Good findings 👏

The new debugger shows redundant bindings, so I will have to investigate why they are redundant.

TY!

I would love to hear what you find out on the redundant bindings issue. I’ve been investigating that myself. As best I can tell, for something like function uniforms that you would normally set once outside of your main draw loop, the ICB creates an entire command set for each thread that runs. I don’t see a way to avoid this. I was thinking of using a 2D grid, or maybe dividing the models up into threadgroups, to avoid the redundancy, but I have not tried it yet. If I get to it, I’ll report my findings.
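For context, the encoding pass I’m describing dispatches one thread per model, roughly like this (a sketch of my setup with placeholder names, not the book’s exact code):

```swift
import Metal

// One GPU thread per model; each thread encodes its own complete command
// set into the ICB, which is where I think the re-binding of the
// "set once" uniforms comes from.
func dispatchEncodePass(
  computeEncoder: MTLComputeCommandEncoder,
  pipelineState: MTLComputePipelineState,
  modelCount: Int
) {
  computeEncoder.setComputePipelineState(pipelineState)
  let threadsPerGrid = MTLSize(width: modelCount, height: 1, depth: 1)
  let width = max(1, min(pipelineState.threadExecutionWidth, modelCount))
  let threadsPerThreadgroup = MTLSize(width: width, height: 1, depth: 1)
  computeEncoder.dispatchThreads(
    threadsPerGrid,
    threadsPerThreadgroup: threadsPerThreadgroup)
}
```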


In case anyone stumbles upon this in the future, I did a few tests trying to figure out where the redundant buffer bindings are coming from. I didn’t figure it out, but here’s what I found did not work:

  • If you try to use an if statement to ensure that the uniforms are only encoded once, you still get redundant bindings. I placed everything that only needed a single encoding inside an if so that only the first thread encoded those commands. I think it cut out one or two bindings, but not all of them.

  • Only one of the two draw calls results in redundant bindings. Most of the time in my samples, the last draw call had the buffer that was bound multiple times.

  • If you simplify the code and just encode 1 buffer, you still get the same number of redundant bindings.

  • The redundant buffer is always the vertex buffer or the index buffer. In most cases, it is the vertex buffer. I assume this is because the index buffer is only involved in the draw call, but sometimes I would see only the vertex buffer as redundant.

  • The number of models only affects how many redundancies you get; it’s always a multiple of the redundancies you get with two models. (Of course, there are no redundancies if you render one and only one model.) In addition, the order in which you render the models does not seem to matter.

  • The optimize command buffer blit pass doesn’t seem to have any effect. You get the same number of redundant bindings with or without the blit. In addition, if you do a reset blit before the compute pass, there’s no discernible difference.

Well, that’s all I’ve got on this problem. I thought it might be under-utilizing some threads or a SIMD group, but I have no way of telling whether that’s just a coincidence, so I won’t go into that here.

My only parting thought is that the code in the book is correct, but maybe the GPU debugger is misinterpreting something.
