Why is my Metal code on my iOS Device angry at me?

I hope someone here can give me some insight, because I am at my wits' end.

I have been trying to learn Metal for the past couple of months. In the process, I came across some examples and articles on sorting networks and decided to try to implement one in Metal.

Now the problem is this: if I run the code on my Mac, everything is fine. But if I run the same code on my iOS device (an iPad Pro with LiDAR), I either get all sorts of errors I do not understand, or the sorted data comes back corrupted.

Typical error:

2021-02-17 12:13:11.218394-0500 METAL_ISSUE[97650:6709092] [GPUDebug] Invalid device load executing kernel function "bitonicSort" encoder: "0", dispatch: 0, at offset 384
file:///Users/staque/Development/OTHER/METAL_ISSUE/METAL_ISSUE/Shaders.metal:77:40 - bitonicSort()
<MTLBufferArgument: 0x28006d200>
        Name = floats 
        Type = MTLArgumentTypeBuffer 
        Access = MTLArgumentAccessReadWrite 
        LocationIndex = 0 
        IsActive = 1 
        ArrayLength = 1 
        TypeInfo = 
            DataType = MTLDataTypePointer 
            ElementType = MTLDataTypeFloat  
            Access = MTLArgumentAccessReadWrite 
            Alignment = 4 
            DataSize = 4
        Alignment = 4 
        DataSize = 4 
        DataType = MTLDataTypeFloat
buffer: "<unknown>"

You can pretty much drop the code below into Xcode's default Metal Game template app.

Shader (slightly modified to carry an index alongside each float, because I want to know the original position of each sorted value):

/*
 Based on code from https://github.com/tgymnich/MetalSort
 Rewritten to make it more understandable.
 */
kernel void bitonicSort(device float 	*floats 	[[ buffer(0) ]],
						device int		*uInts 		[[ buffer(1) ]],
						constant int 	&p 			[[ buffer(2) ]],
						constant int 	&q 			[[ buffer(3) ]],
						uint 			gid 		[[ thread_position_in_grid ]])
{

	int pMinusQ = p-q;
	int distance = 1 << pMinusQ;
	uint gidShiftedByP = gid >> p;

	// True: Increasing / False: Decreasing
	bool direction = (gidShiftedByP & 2) == 0;
	uint gidDistance = (gid & distance);
	bool isGidDistanceZero = (gidDistance == 0);

	uint gidPlusDistance = (gid | distance);

	bool isLowerIndexGreaterThanHigher = (floats[gid] > floats[gidPlusDistance]);

	if (isGidDistanceZero && isLowerIndexGreaterThanHigher == direction) {
		float temp = floats[gid];
		floats[gid] = floats[gidPlusDistance];
		floats[gidPlusDistance] = temp;

		int temp2 = uInts[gid];
		uInts[gid] = uInts[gidPlusDistance];
		uInts[gidPlusDistance] = temp2;
	}

}
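To tell whether a bad result comes from the kernel logic or from the host-side setup, it can help to run the same compare-and-swap passes on the CPU and compare the output. The following is only a sketch under assumptions: `bitonicPass` and `cpuBitonicSort` are hypothetical helper names that are not part of the project, and the loop over `gid` stands in for exactly one GPU thread per element.

```swift
// CPU sketch of the bitonicSort kernel above. Hypothetical helpers, not project code.
// Each iteration of the `gid` loop plays the role of one GPU thread.
func bitonicPass(_ floats: inout [Float], _ indices: inout [UInt32], p: Int, q: Int) {
    let distance = 1 << (p - q)
    for gid in 0..<floats.count {
        // Same direction test as the kernel: true means increasing order.
        let ascending = ((gid >> p) & 2) == 0
        let partner = gid | distance
        // Only the lower element of each pair performs the compare-and-swap.
        if (gid & distance) == 0 && (floats[gid] > floats[partner]) == ascending {
            floats.swapAt(gid, partner)
            indices.swapAt(gid, partner)
        }
    }
}

func cpuBitonicSort(_ floats: inout [Float], _ indices: inout [UInt32]) {
    let n = floats.count
    precondition(n > 0 && (n & (n - 1)) == 0, "count must be a power of 2")
    precondition(indices.count == n, "both arrays must have the same length")
    let logn = n.trailingZeroBitCount
    for p in 0..<logn {
        for q in 0...p {
            bitonicPass(&floats, &indices, p: p, q: q)
        }
    }
}

// Usage: the same reversed input that the host code builds.
var referenceFloats: [Float] = (0..<1024).reversed().map { Float($0) }
var referenceIndices: [UInt32] = (0..<1024).reversed().map { UInt32($0) }
cpuBitonicSort(&referenceFloats, &referenceIndices)
```

If the CPU pass sorts correctly while the device run does not, that would point away from the compare-and-swap logic and toward how the kernel is being launched.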

The call.

	func runSort() {
		let device = MTLCreateSystemDefaultDevice()!
		let commandQueue = device.makeCommandQueue()!
		let library = device.makeDefaultLibrary()!
		let sortFunction = library.makeFunction(name: "bitonicSort")!
		let pipeline = try! device.makeComputePipelineState(function: sortFunction)

		let setRange = 0..<1024
		var floatData = [Float]()
		var uintData = [UInt32]()

		// Build the Float and index data backward to form the worst-case scenario for sorting.

		for value in stride(from: Float(setRange.upperBound-1), to: Float(setRange.lowerBound-1), by: -1.0) {
			floatData.append(value)
		}

		for value in stride(from: setRange.upperBound-1, to: setRange.lowerBound-1, by: -1) {
			uintData.append(UInt32(value))
		}

		print(floatData)
		print("")
		print(uintData)

		guard let logn = Int(exactly: log2(Double(floatData.count))) else {
			fatalError("data.count is not a power of 2")
		}

		for p in 0..<logn {
			for q in 0..<p+1 {
				let floatDataBuffer = device.makeBuffer(bytes: &floatData,
														length: MemoryLayout<Float>.stride * floatData.count,
														options: [.storageModeShared])!
				floatDataBuffer.label = "floatDataBuffer"

				let uintDataBuffer = device.makeBuffer(bytes: &uintData,
													   length: MemoryLayout<UInt32>.stride * uintData.count,
													   options: [.storageModeShared])!
				uintDataBuffer.label = "uintDataBuffer"

				let threadgroupsPerGrid = MTLSize(width: floatData.count, height: 1, depth: 1)
				let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)

				// The kernel declares p and q as 32-bit `int`, so pass 32-bit values.
				var n1 = Int32(p)
				var n2 = Int32(q)

				let commandBuffer = commandQueue.makeCommandBuffer()!
				let encoder = commandBuffer.makeComputeCommandEncoder()!

				encoder.setComputePipelineState(pipeline)
				encoder.setBuffer(floatDataBuffer, offset: 0, index: 0)
				encoder.setBuffer(uintDataBuffer, offset: 0, index: 1)
				encoder.setBytes(&n1, length: MemoryLayout<Int32>.stride, index: 2)
				encoder.setBytes(&n2, length: MemoryLayout<Int32>.stride, index: 3)
				encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
				encoder.endEncoding()

				commandBuffer.commit()
				commandBuffer.waitUntilCompleted()

				let dataPointer = floatDataBuffer.contents().assumingMemoryBound(to: Float.self)

				let dataBufferPointer = UnsafeMutableBufferPointer(start: dataPointer, count: floatData.count)

				floatData = Array.init(dataBufferPointer)

				let dataPointer2 = uintDataBuffer.contents().assumingMemoryBound(to: UInt32.self)

				let dataBufferPointer2 = UnsafeMutableBufferPointer(start: dataPointer2, count: uintData.count)

				uintData = Array.init(dataBufferPointer2)
			}
		}

		print(floatData)
		print("")
		print(uintData)
	}

If anyone has a clue what I should be doing I am all ears, because I need help.

Thanks in advance.

Stan

This project appears to work on my devices where yours didn’t. Not sure why. I created the project gradually to see what wouldn’t work, and it just kept on working.

Let me know if you find out why :smiley:

Test3.zip (38.1 KB)
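One thing that may be worth checking in the posted `runSort()`, although nothing in this thread confirms it as the cause: `dispatchThreadgroups(_:threadsPerThreadgroup:)` takes the number of threadgroups as its first argument, and the posted code passes `floatData.count` there. With 1024 elements, that launches 1024 threadgroups of `threadExecutionWidth` threads each, so most values of `gid` fall past the end of the 1024-element buffers. The arithmetic can be sketched as follows, where `threadgroupCount` is a hypothetical helper and 32 is an assumed stand-in for `pipeline.threadExecutionWidth`:

```swift
// Hypothetical helper: how many threadgroups are needed to cover `elementCount`
// threads when each threadgroup holds `threadsPerGroup` threads (rounded up).
func threadgroupCount(elementCount: Int, threadsPerGroup: Int) -> Int {
    precondition(elementCount >= 0 && threadsPerGroup > 0)
    return (elementCount + threadsPerGroup - 1) / threadsPerGroup
}

let elementCount = 1024
let threadsPerGroup = 32   // assumption: stand-in for pipeline.threadExecutionWidth

// As posted: the element count is used as the number of threadgroups.
let threadsLaunchedAsPosted = elementCount * threadsPerGroup

// Sketch of an adjustment: divide, so the grid has one thread per element.
let groups = threadgroupCount(elementCount: elementCount, threadsPerGroup: threadsPerGroup)
let threadsLaunchedAdjusted = groups * threadsPerGroup

print(threadsLaunchedAsPosted, threadsLaunchedAdjusted)   // prints "32768 1024"
```

In `runSort()` this would correspond to building `threadgroupsPerGrid` from `groups` instead of from `floatData.count`. If the element count were not a multiple of the threadgroup width, the rounded-up grid would still contain a few extra threads, so the kernel would also need its own bounds check on `gid`.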
