I hope someone here can give me some insight, because I am at my wits' end.
I have been trying to learn Metal for the past couple of months. In the process, I came across examples of and articles about sorting networks and decided to try implementing them in Metal.
The problem is that if I run the code on my Mac, everything is fine. But if I run the same code on my iDevice (iPad Pro with LiDAR), I get all sorts of errors I do not understand, or the sorted data is corrupted and all wrong.
Typical error:
2021-02-17 12:13:11.218394-0500 METAL_ISSUE[97650:6709092] [GPUDebug] Invalid device load executing kernel function "bitonicSort" encoder: "0", dispatch: 0, at offset 384
file:///Users/staque/Development/OTHER/METAL_ISSUE/METAL_ISSUE/Shaders.metal:77:40 - bitonicSort()
<MTLBufferArgument: 0x28006d200>
Name = floats
Type = MTLArgumentTypeBuffer
Access = MTLArgumentAccessReadWrite
LocationIndex = 0
IsActive = 1
ArrayLength = 1
TypeInfo =
DataType = MTLDataTypePointer
ElementType = MTLDataTypeFloat
Access = MTLArgumentAccessReadWrite
Alignment = 4
DataSize = 4
Alignment = 4
DataSize = 4
DataType = MTLDataTypeFloat
buffer: "<unknown>"
You can pretty much drop these into the default Xcode Metal Game app.
Shader (slightly modified to also track the indexes of the floats, because I want to know the index of each sorted value):
/*
 Based on https://github.com/tgymnich/MetalSort
 Rewritten to make it more understandable.
 */
kernel void bitonicSort(device float *floats [[ buffer(0) ]],
                        device int *uInts [[ buffer(1) ]],
                        constant int &p [[ buffer(2) ]],
                        constant int &q [[ buffer(3) ]],
                        uint gid [[ thread_position_in_grid ]])
{
    int pMinusQ = p-q;
    int distance = 1 << pMinusQ;
    uint gidShiftedByP = gid >> p;
    // True: Increasing / False: Decreasing
    bool direction = (gidShiftedByP & 2) == 0;
    uint gidDistance = (gid & distance);
    bool isGidDistanceZero = (gidDistance == 0);
    uint gidPlusDistance = (gid | distance);
    bool isLowerIndexGreaterThanHigher = (floats[gid] > floats[gidPlusDistance]);
    if (isGidDistanceZero && isLowerIndexGreaterThanHigher == direction) {
        // Swap the pair of values
        float temp = floats[gid];
        floats[gid] = floats[gidPlusDistance];
        floats[gidPlusDistance] = temp;
        // Swap the matching pair of indexes
        int temp2 = uInts[gid];
        uInts[gid] = uInts[gidPlusDistance];
        uInts[gidPlusDistance] = temp2;
    }
}
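For comparison, a CPU-side reference of the same network can help tell whether the device output or the algorithm itself is at fault. This is only a minimal sketch, assuming the kernel's index logic is exactly as shown above. `bitonicStepCPU` and `bitonicSortCPU` are hypothetical helper names that are not part of the project, and unlike the call code in this post, the index array here starts out as the original positions 0, 1, 2, ...

```swift
/// Hypothetical CPU reference for one (p, q) pass of the kernel above.
/// Each pair (gid, gid | distance) is disjoint, so a sequential loop
/// gives the same result as the parallel kernel.
func bitonicStepCPU(_ values: inout [Float], _ indices: inout [UInt32], p: Int, q: Int) {
    let distance = 1 << (p - q)
    for gid in 0..<values.count {
        // Only the lower element of each pair does the compare-and-swap.
        guard (gid & distance) == 0 else { continue }
        let partner = gid | distance
        // true: increasing / false: decreasing, as in the kernel.
        let ascending = ((gid >> p) & 2) == 0
        if (values[gid] > values[partner]) == ascending {
            values.swapAt(gid, partner)
            indices.swapAt(gid, partner)
        }
    }
}

/// Hypothetical driver: runs every (p, q) pass and returns the sorted values
/// together with the original position of each value.
func bitonicSortCPU(_ input: [Float]) -> (values: [Float], indices: [UInt32]) {
    precondition(input.count > 0 && (input.count & (input.count - 1)) == 0,
                 "input.count must be a power of 2")
    var values = input
    var indices = (0..<input.count).map { UInt32($0) }
    let logn = input.count.trailingZeroBitCount
    for p in 0..<logn {
        for q in 0...p {
            bitonicStepCPU(&values, &indices, p: p, q: q)
        }
    }
    return (values, indices)
}

let result = bitonicSortCPU([3, 2, 1, 0])
print(result.values)   // [0.0, 1.0, 2.0, 3.0]
print(result.indices)  // [3, 2, 1, 0]
```

Running the same input through this and through the GPU path, pass by pass, should show the first (p, q) step at which the two diverge.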
The call.
import Metal

func runSort() {
    let device = MTLCreateSystemDefaultDevice()!
    let commandQueue = device.makeCommandQueue()!
    let library = device.makeDefaultLibrary()!
    let sortFunction = library.makeFunction(name: "bitonicSort")!
    let pipeline = try! device.makeComputePipelineState(function: sortFunction)
    let setRange = 0..<1024
    var floatData = [Float]()
    var uintData = [UInt32]()
    // Build the Float and index data backward to form the worst-case scenario for sorting.
    for value in stride(from: Float(setRange.upperBound-1), to: Float(setRange.lowerBound-1), by: -1.0) {
        floatData.append(value)
    }
    for value in stride(from: setRange.upperBound-1, to: setRange.lowerBound-1, by: -1) {
        uintData.append(UInt32(value))
    }
    print(floatData)
    print("")
    print(uintData)
    guard let logn = Int(exactly: log2(Double(floatData.count))) else {
        fatalError("data.count is not a power of 2")
    }
    for p in 0..<logn {
        for q in 0..<p+1 {
            // Upload the current state of both arrays for this (p, q) pass.
            let floatDataBuffer = device.makeBuffer(bytes: &floatData,
                                                    length: MemoryLayout<Float>.stride * floatData.count,
                                                    options: [.storageModeShared])!
            floatDataBuffer.label = "floatDataBuffer"
            let uintDataBuffer = device.makeBuffer(bytes: &uintData,
                                                   length: MemoryLayout<UInt32>.stride * uintData.count,
                                                   options: [.storageModeShared])!
            uintDataBuffer.label = "uintDataBuffer"
            let threadgroupsPerGrid = MTLSize(width: floatData.count, height: 1, depth: 1)
            let threadsPerThreadgroup = MTLSize(width: pipeline.threadExecutionWidth, height: 1, depth: 1)
            var n1 = p
            var n2 = q
            let commandBuffer = commandQueue.makeCommandBuffer()!
            let encoder = commandBuffer.makeComputeCommandEncoder()!
            encoder.setComputePipelineState(pipeline)
            encoder.setBuffer(floatDataBuffer, offset: 0, index: 0)
            encoder.setBuffer(uintDataBuffer, offset: 0, index: 1)
            encoder.setBytes(&n1, length: MemoryLayout<Float>.stride, index: 2)
            encoder.setBytes(&n2, length: MemoryLayout<UInt32>.stride, index: 3)
            encoder.dispatchThreadgroups(threadgroupsPerGrid, threadsPerThreadgroup: threadsPerThreadgroup)
            encoder.endEncoding()
            commandBuffer.commit()
            commandBuffer.waitUntilCompleted()
            // Copy the results back so the next pass starts from the updated data.
            let dataPointer = floatDataBuffer.contents().assumingMemoryBound(to: Float.self)
            let dataBufferPointer = UnsafeMutableBufferPointer(start: dataPointer, count: floatData.count)
            floatData = Array.init(dataBufferPointer)
            let dataPointer2 = uintDataBuffer.contents().assumingMemoryBound(to: UInt32.self)
            let dataBufferPointer2 = UnsafeMutableBufferPointer(start: dataPointer2, count: uintData.count)
            uintData = Array.init(dataBufferPointer2)
        }
    }
    print(floatData)
    print("")
    print(uintData)
}
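One thing that may be worth checking in the call above is the total number of threads it launches. `dispatchThreadgroups(_:threadsPerThreadgroup:)` takes a count of threadgroups, so the grid holds threadgroups times threads-per-threadgroup threads. Here is a minimal sketch of that arithmetic in plain Swift. The helper names are hypothetical, and the width of 32 is only an assumed stand-in for `pipeline.threadExecutionWidth`, which varies by device.

```swift
/// Hypothetical helper: total threads launched by a 1-D dispatchThreadgroups call.
func totalThreads(threadgroups: Int, threadsPerThreadgroup: Int) -> Int {
    return threadgroups * threadsPerThreadgroup
}

/// Hypothetical helper: threadgroups needed so that at least `elementCount`
/// threads run (integer division rounded up).
func threadgroupsNeeded(elementCount: Int, threadsPerThreadgroup: Int) -> Int {
    return (elementCount + threadsPerThreadgroup - 1) / threadsPerThreadgroup
}

let elementCount = 1024   // floatData.count in the call above
let assumedWidth = 32     // assumption for illustration only

// The call above passes elementCount as the number of threadgroups.
let launched = totalThreads(threadgroups: elementCount, threadsPerThreadgroup: assumedWidth)
let needed = threadgroupsNeeded(elementCount: elementCount, threadsPerThreadgroup: assumedWidth)
print(launched, needed)   // 32768 32
```

If the launched count exceeds the element count, threads whose `gid` lies past the end of the buffers would read and write out of bounds unless the kernel guards against it, which would be consistent with an invalid device load report.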
If anyone has a clue what I should be doing I am all ears, because I need help.
Thanks in advance.
Stan