Face Detection Tutorial Using the Vision Framework for iOS | raywenderlich.com

toojuice · March 11, 2019, 1:16pm

In this tutorial, you’ll learn how to use Vision to detect faces and facial features and overlay the results on the camera feed in real time.

This is a companion discussion topic for the original entry at https://www.raywenderlich.com/1163620-face-detection-tutorial-using-the-vision-framework-for-ios

rextremodtt · March 13, 2019, 3:50pm

I’m trying to modify this to work with the back camera, but the results aren’t reliable. Both the position and the size of the rect are wrong. Is there anything that has to be done to work with the back camera rather than the front camera?

shogunkaramazov · March 13, 2019, 5:34pm

@toojuice Can you please help with this when you get a chance? Thank you - much appreciated! :]

toojuice · March 14, 2019, 8:00pm

Hi @rextremodtt,

OK, I see what’s going on here. I think the difference is that the front-facing camera mirrors the image in order to show users the same thing they see in a mirror. If they saw themselves as others do, it would be disorienting… Humans aren’t as symmetrical as we think :]

So the calculations used in convert(rect:) worked for the bounding box with the mirrored front camera. However, the back camera doesn’t mirror the image, so they no longer work. To fix that function, replace it with:

func convert(rect: CGRect) -> CGRect {
    // 1
    let opposite = rect.origin + rect.size.cgPoint
    let origin = previewLayer.layerPointConverted(fromCaptureDevicePoint: rect.origin)

    // 2
    let opp = previewLayer.layerPointConverted(fromCaptureDevicePoint: opposite)

    // 3
    let size = (opp - origin).cgSize
    return CGRect(origin: origin, size: size)
}

Instead of converting the size directly, it:

1. Calculates the location of the opposite corner to the origin of the rectangle.
2. Converts this opposite corner to its location in the previewLayer.
3. Calculates the size by subtracting the two points.

This should now work for both the front and the back cameras.
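To see why converting both corners holds up with or without mirroring, here is a minimal, self-contained sketch. The small CGPoint/CGSize helpers are assumptions standing in for whatever the tutorial project provides, the convertPoint closure is a hypothetical stand-in for previewLayer.layerPointConverted(fromCaptureDevicePoint:), and the call to standardized is an optional extra that is not part of the function above.

```swift
import CoreGraphics

// Assumed helpers: the snippet above relies on similar extensions.
extension CGSize {
    var cgPoint: CGPoint { CGPoint(x: width, y: height) }
}

extension CGPoint {
    var cgSize: CGSize { CGSize(width: x, height: y) }

    static func + (lhs: CGPoint, rhs: CGPoint) -> CGPoint {
        CGPoint(x: lhs.x + rhs.x, y: lhs.y + rhs.y)
    }

    static func - (lhs: CGPoint, rhs: CGPoint) -> CGPoint {
        CGPoint(x: lhs.x - rhs.x, y: lhs.y - rhs.y)
    }
}

// convertPoint is a hypothetical stand-in for the preview layer's
// point conversion, so the sketch runs without a camera session.
func convert(rect: CGRect, using convertPoint: (CGPoint) -> CGPoint) -> CGRect {
    // 1. Find the corner opposite the origin, still in normalized coordinates.
    let opposite = rect.origin + rect.size.cgPoint

    // 2. Convert both corners independently.
    let origin = convertPoint(rect.origin)
    let opp = convertPoint(opposite)

    // 3. The size is the difference between the converted corners.
    //    It can be negative when the conversion mirrors an axis, so the
    //    rect is standardized here (an addition for this sketch).
    return CGRect(origin: origin, size: (opp - origin).cgSize).standardized
}
```

Passing a closure that flips x (mirrored) or leaves it alone (not mirrored) yields a rect of the same size in both cases, only at a mirrored position.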

You’ll notice that the direction of the lasers will also be wrong for the back camera… again due to the mirroring. Here, you should just change the comparison in this line:

let focusX = (yaw.doubleValue < 0.0) ? -100.0 : maxX + 100.0

Make it a greater-than comparison (>) for the back camera.
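One way to keep both cameras working is to pick the comparison at runtime. This is only a sketch: laserFocusX and the usingBackCamera flag are hypothetical names, not part of the tutorial project, and yaw is passed as a plain Double instead of an NSNumber.

```swift
// Returns the x position the lasers should aim at.
// usingBackCamera is a hypothetical flag that the app would set
// when it configures the capture session.
func laserFocusX(yaw: Double, maxX: Double, usingBackCamera: Bool) -> Double {
    // The front camera is mirrored, so a negative yaw points left.
    // The back camera is not mirrored, so the comparison flips.
    let pointsLeft = usingBackCamera ? (yaw > 0.0) : (yaw < 0.0)
    return pointsLeft ? -100.0 : maxX + 100.0
}
```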

leojoseph · March 15, 2019, 11:38am

Hi, I am not able to see any result on my iPad running iOS 11: neither the face outline nor the lasers. Please help.

shogunkaramazov · March 15, 2019, 2:43pm

@toojuice Do you have any feedback about this? Thank you - much appreciated! :]

toojuice · March 17, 2019, 8:01pm

Hi @leojoseph

I don’t have an iPad with iOS 11 on it, but it works on my iPad running 12.1.4.

The project is currently set up for iPhone, but if you change it to Universal, it looks fine on an iPad, too.

Which iPad are you using?

kwalkerk · June 6, 2019, 5:23pm

Now that I can show and identify faces, how can I identify particular faces? Any tutorial for that?

Thanks

shogunkaramazov · July 9, 2019, 7:17pm

@toojuice Can you please help with this when you get a chance? Thank you - much appreciated! :]

toojuice · July 10, 2019, 9:59am

@kwalkerk What you’re looking for is face recognition, instead of face detection. Apple does not have a public API for that. You would need to look into using Machine Learning to do something like this.

For instance, FaceNet is a model used to compare the similarity of two faces. Given a database of known faces, you can compare the image of an unknown face to each one using FaceNet to determine if there is a match or not. You might be able to find a CoreML version of FaceNet, but I’m not entirely sure, as I’ve never looked myself.
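As a rough illustration of the comparison step only, the sketch below assumes some model has already turned each face image into a fixed-length embedding vector, and matches an unknown face to the closest known one by Euclidean distance. The function names, the dictionary of known faces, and the threshold value are all hypothetical; a real threshold would have to be tuned for the specific model.

```swift
import Foundation

// Euclidean distance between two embeddings of equal length.
func euclideanDistance(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "Embeddings must have the same length")
    var sum: Float = 0
    for (x, y) in zip(a, b) {
        let d = x - y
        sum += d * d
    }
    return sum.squareRoot()
}

// Returns the name of the closest known face, or nil when even the
// closest one is farther away than the (hypothetical) threshold.
func bestMatch(for unknown: [Float],
               in known: [String: [Float]],
               threshold: Float) -> String? {
    var best: (name: String, distance: Float)? = nil
    for (name, embedding) in known {
        let distance = euclideanDistance(unknown, embedding)
        if best == nil || distance < best!.distance {
            best = (name, distance)
        }
    }
    guard let match = best, match.distance <= threshold else { return nil }
    return match.name
}
```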

Hope this helps.

miraco · July 10, 2019, 2:30pm

@toojuice Thanks for this wonderful tutorial, I learned a lot from it. I have a question. I tested the app on an iPhone 6s and noticed that when I move my face, the facial landmarks move accordingly, but there is a delay. Snapchat’s bear ears always follow my facial movements with no delay. Can you shed some light on why there’s a delay in our app but not in Snapchat? What might Snapchat be doing to decrease the time it takes for the face map to be redrawn after it moves?

toojuice · July 10, 2019, 5:02pm

Hi @miraco,

Good question! Let me look into this a little bit.

Out of curiosity, do you see a delay when you just detect the face (as opposed to the face landmarks)?

miraco · July 14, 2019, 2:17am

Hi @toojuice,

Glad to see your message! I didn’t receive a notification of your reply so I wasn’t able to answer you earlier.

I have been working on this issue in the last couple of days. I haven’t solved the issue, but I want to share what I have found out.

Before sharing my discoveries, I want to answer your question. I see the delay in the face landmarks, not so much in the boundingBox showing the detected face, if that’s what you mean. When I move my head, the face landmarks don’t keep up with my face movement. Here’s a video clip I recorded. You can see that it takes a second for the face landmarks to move to the right position on my face. When I use Snapchat filters, the bear ears and fake glasses added to my face follow immediately when my face moves. Here’s a video clip showing how the bear ears and glasses stay at their correct position on my face all the time. I am using an iPhone 6s.

I think the delay comes from the amount of time the sequenceHandler takes to analyze each sampleBuffer. Running the sequenceHandler takes about 0.1 seconds on average, during which about 2-3 sample buffers are dropped.

Here is the code I used to measure the elapsed time and count the dropped sample buffers.

func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }

    let detectFaceRequest = VNDetectFaceLandmarksRequest(completionHandler: detectedFace)

    do {
        // Get the start time of the sequenceHandler.
        let start = Date()

        // Run the sequenceHandler.
        try sequenceHandler.perform(
            [detectFaceRequest],
            on: imageBuffer,
            orientation: .leftMirrored)

        // Get the end time of the sequenceHandler and calculate the time elapsed.
        let end = Date()
        print("A frame is analyzed, with elapsed time : \(end.timeIntervalSince(start))")
    } catch {
        print(error.localizedDescription)
    }
}

func captureOutput(_ output: AVCaptureOutput, didDrop sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    print("a frame is dropped!")
}

When I run the code above, I get the following results:

A frame is analyzed, with elapsed time : 0.099357008934021
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.08487498760223389
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.08759593963623047
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.08887004852294922
a frame is dropped!
A frame is analyzed, with elapsed time : 0.0864570140838623
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.09023606777191162
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.10215997695922852
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.09466695785522461
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.08662497997283936
a frame is dropped!
A frame is analyzed, with elapsed time : 0.09468793869018555
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.10828900337219238
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.12512493133544922
a frame is dropped!
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.11516201496124268
a frame is dropped!
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.11460399627685547
a frame is dropped!
a frame is dropped!
A frame is analyzed, with elapsed time : 0.11211204528808594
a frame is dropped!
a frame is dropped!
a frame is dropped!
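As a rough cross-check of this log, the number of frames the camera delivers while one analysis is running is simply the processing time multiplied by the frame rate. The sketch below assumes a 30 fps capture session, which is not stated anywhere in this thread, and framesArriving is a hypothetical helper name.

```swift
// Number of frames the camera delivers while one frame is being analyzed.
// frameRate is an assumption; check the actual session configuration.
func framesArriving(during processingTime: Double, frameRate: Double) -> Double {
    return processingTime * frameRate
}

// With roughly 0.09 to 0.12 seconds per analysis at an assumed 30 fps,
// between two and four frames arrive during each analysis.
let low = framesArriving(during: 0.09, frameRate: 30)
let high = framesArriving(during: 0.12, frameRate: 30)
print("Frames arriving per analysis: \(low) to \(high)")
```

Under that assumption the estimate lines up with the two to three dropped frames seen between analyzed frames.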

Running updateFaceView takes about 0.004 seconds, which is very short.
I currently don’t know what to do next to solve this problem, so I’ll be happy to try anything you might suggest!

Best regards,

toojuice · July 14, 2019, 5:42am

Hi @miraco!

Great debugging work! This is good information. Can you try something for me? In captureOutput, find this line:

try sequenceHandler.perform(requests, on: imageBuffer, orientation: .leftMirrored)

And add the following line right above it, so it looks like this:

// new line
let sequenceHandler = VNSequenceRequestHandler()

// same line as before
try sequenceHandler.perform(requests, on: imageBuffer, orientation: .leftMirrored)

This will create a new VNSequenceRequestHandler each time instead of reusing the old one. In a new (not yet released) tutorial I wrote, I seem to remember there was some sort of issue or performance hit with reusing the VNSequenceRequestHandler, so I opted to just recreate it each time.

When I profile the app using Instruments on my iPhone XS, prior to the change, the perform was taking up 39% of all runtime and after the change it was only 29%. This may be an even bigger improvement on older phones.

Let me know if that helps. If not, we can debug further.

miraco · July 14, 2019, 6:59pm

Hi @toojuice!

I just tried this method. Unfortunately, it didn’t improve the performance on the iPhone 6s I’m using. When you ran this app on the iPhone XS, were the face landmarks moving smoothly?

I just found a GitHub page where people are discussing this issue. Someone mentioned using the analysis result of the previous frame for the analysis of future frames. In the Apple Developer Documentation, the overview section of VNDetectFaceLandmarksRequest says:

By default, a face landmarks request first locates all faces in the input image, then analyzes each to detect facial features.

If you’ve already located all the faces in an image, or want to detect landmarks in only a subset of the faces in the image, set the inputFaceObservations property to an array of VNFaceObservation objects representing the faces you want to analyze.

I am guessing if I use the location of the face from the first frame to analyze the second frame, and so on, the performance should be improved.

That’s what I am going to do next, and I’ll share the result once I have any!
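A minimal sketch of that idea might look like the following. FaceLandmarkTracker and its members are hypothetical names, and whether this actually reduces the delay on an iPhone 6s is untested here. One known risk is that bounding boxes from an earlier frame go stale when the face moves quickly, so a real app would probably want to fall back to full detection every so often.

```swift
import Vision

// Hypothetical helper that remembers the faces found in the previous frame
// and passes them to the next landmarks request.
final class FaceLandmarkTracker {
    private var lastFaces: [VNFaceObservation] = []

    func makeRequest(completion: @escaping ([VNFaceObservation]) -> Void) -> VNDetectFaceLandmarksRequest {
        let request = VNDetectFaceLandmarksRequest { [weak self] request, _ in
            let faces = request.results as? [VNFaceObservation] ?? []
            // Remember this frame's faces for the next request.
            self?.lastFaces = faces
            completion(faces)
        }

        // With no earlier result, leave the property unset so Vision
        // locates the faces itself (the default behavior).
        if !lastFaces.isEmpty {
            request.inputFaceObservations = lastFaces
        }
        return request
    }
}
```

In captureOutput, the request returned by makeRequest would be passed to sequenceHandler.perform in place of a freshly created VNDetectFaceLandmarksRequest.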

snappy14u · August 7, 2019, 10:32pm

Hello @toojuice ,

This app is excellent. Thank you so much for this tutorial. I would like to extend the functionality of the app so that a user can create and store a video with facial landmarks. I have already written code to create and store a video on your phone using the face detection app and I wanted to know if you have any article links that would help me to store the facial landmarks along with the video? Any help with this would be greatly appreciated.

toojuice · August 8, 2019, 8:37am

Hi @snappy14u

That is a good question. This site has a tutorial that could help. The only issue is that it is fairly old and might be out of date. It might be enough to get you started, though:

https://www.raywenderlich.com/2734-avfoundation-tutorial-adding-overlays-and-animations-to-videos

bhavi · August 23, 2019, 11:40am

Hello @toojuice ,

This app is excellent. Thank you so much for this tutorial. I would like to extend the functionality of the app so that users can find out their face type or shape, for example rectangular, oval, or circular. Could you point me to where I can learn how to do this? Do I need to do some coordinate calculations to recognise the shape? Any help with this would be greatly appreciated.

Thanks.

toojuice · August 23, 2019, 1:04pm

Hi @bhavi,

Thanks for the kind words!

For the shape of the face, you could take the landmarks for just the face outline and close the path. Then you could set the fill for the path to be a particular color and see the shape of the face.
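A minimal sketch of closing such an outline, assuming the outline points have already been converted to layer coordinates (closedOutlinePath is a hypothetical helper name):

```swift
import CoreGraphics

// Builds a closed path through the given outline points.
// The points are assumed to be in the coordinate space of the layer
// that will draw the path.
func closedOutlinePath(through points: [CGPoint]) -> CGPath {
    let path = CGMutablePath()
    // A closed shape needs at least three points.
    guard points.count > 2 else { return path }
    path.addLines(between: points)
    // Connect the last point back to the first one.
    path.closeSubpath()
    return path
}
```

The resulting path could be assigned to a CAShapeLayer with a fillColor set. As far as I know, the face contour landmark traces the jaw line rather than the whole head, so closing it would draw a straight edge across the top of the face.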

Is this what you mean?

Have a nice day,
Yono

konsdor · September 16, 2019, 4:27pm

Hi @toojuice!

Thank you for the nice tutorial. Is there any way to modify your code to add smile detection?
I am trying to use CIDetector.

Thanks