
Render a large collection of a custom type with only one draw call

Hi there again!

When I initialise the app, I create a buffer large enough to hold 25K triangles. After updating the triangles by accessing the buffer directly (through bound memory), I need to render the buffer:

1.- My first approach is to extract the vertices and do only one non-indexed draw call:

  private func extractVertices() -> [float3] {
    var vertices: [float3] = []
    guard let triangles = Emitter.trianglesPointer else { return vertices }
    for i in 0..<Emitter.trianglesCount {
      let triangle = triangles[i]
      // Only active triangles contribute vertices.
      if triangle.active == 1 {
        vertices.append(triangle.vertex0)
        vertices.append(triangle.vertex1)
        vertices.append(triangle.vertex2)
      }
    }
    return vertices
  }

But extractVertices() becomes a bottleneck.

2.- The second approach is to send the buffer with the triangles directly and do one indexed draw call per triangle:

    let triangles = Emitter.triangles.contents().bindMemory(to: Triangle.self, capacity: Emitter.trianglesCount)
    for i in 0..<Emitter.trianglesCount {
        // Skip inactive triangles.
        if triangles[i].active == 1 {
            renderEncoder.setVertexBuffer(Emitter.triangles, offset: MemoryLayout<Triangle>.stride * i, index: 0)
            renderEncoder.drawIndexedPrimitives(type: .triangle,
                                                indexCount: 3,
                                                indexType: .uint16,
                                                indexBuffer: emitter.triangleIndiciesBuffer,
                                                indexBufferOffset: 0)
        }
    }

But that makes 25K draw calls… so another bottleneck!!!

Is there a way to render the buffer with the 25K triangles using only one draw call?

Triangle looks like this:

typedef struct {
  simd_float3 vertex0;
  simd_float3 vertex1;
  simd_float3 vertex2;
  simd_float3 direction;
  float velocity;
  int age;
  int mainAge;
  int subdivisionLevel;
  TriangleType type;
  TriangleMainState mainSubdState;
  float distanceToCam;
  int active;
} Triangle;

Thanks in advance!!!

Both of those methods would be extremely slow. The first because you’re dynamically allocating memory as the array grows, and the second because of the number of draw calls.

Is there a reason why you can’t do what you want to in the buffer?

For example, in chapter 13 “Instancing & Procedural Generation”, you create a buffer up front, bound to a structure format, and then you update in updateBuffer and draw the instances just after the update.
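
As a rough sketch of how that could look here (not the book's code): each triangle slot in the buffer is treated as one instance of a three-vertex draw, so the whole buffer goes out in a single call. The names `cornerPosition` and `encodeTriangles`, the trimmed-down `Triangle`, and the use of buffer index 0 are assumptions made for illustration. `cornerPosition` is a CPU-side model of the lookup a vertex function might do; collapsing inactive triangles to a single point is one possible way to skip them without a per-triangle branch on the CPU.

```swift
#if canImport(Metal)
import Metal
#endif

// Hypothetical, trimmed-down stand-in for the Triangle struct in the question.
struct Triangle {
    var vertex0: SIMD3<Float>
    var vertex1: SIMD3<Float>
    var vertex2: SIMD3<Float>
    var active: Int32
}

/// CPU-side model of the per-vertex lookup in an instanced draw:
/// `instanceID` selects the triangle, `vertexID` (0, 1 or 2) selects its corner.
/// Inactive triangles return the same point for all three corners,
/// which yields a zero-area triangle that covers no pixels.
func cornerPosition(of triangles: [Triangle], vertexID: Int, instanceID: Int) -> SIMD3<Float> {
    let triangle = triangles[instanceID]
    guard triangle.active == 1 else { return .zero }
    switch vertexID {
    case 0: return triangle.vertex0
    case 1: return triangle.vertex1
    default: return triangle.vertex2
    }
}

#if canImport(Metal)
/// Encodes every triangle slot in `triangleBuffer` with one instanced draw call.
/// Assumes the vertex function reads the triangles from buffer index 0.
func encodeTriangles(with renderEncoder: MTLRenderCommandEncoder,
                     triangleBuffer: MTLBuffer,
                     triangleCount: Int) {
    renderEncoder.setVertexBuffer(triangleBuffer, offset: 0, index: 0)
    // 3 vertices per instance, one instance per triangle in the buffer.
    renderEncoder.drawPrimitives(type: .triangle,
                                 vertexStart: 0,
                                 vertexCount: 3,
                                 instanceCount: triangleCount)
}
#endif
```

On the GPU side, the vertex function would receive the two indices through the `[[vertex_id]]` and `[[instance_id]]` attributes and do the same lookup against the triangle buffer, so no vertex extraction or index buffer is needed on the CPU.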


Yeah! Instancing is the solution, because I have the vertex_id and instance_id to get the vertex position! Plus the performance…

Thank you very much, Caroline!
