Machine Learning: Taking pictures to train an image classifier

I am going to take my own pictures to train a model, following the steps in Machine Learning by Tutorials.

Is there a guide on how to take those images? For example, suppose you want to classify flowers and you have roses. Is there a best way to make the characteristics of the object clearer to the model first, and then start adding more complicated images?

Should I take pictures of roses on a white background first and then move on to more complex scenes?

I ask because almost everything the model learns is pixel based. Would starting with white backgrounds and then adding more complicated ones actually help, or would it make things harder, since a white background is not a natural setting for a rose?


@clapollo @audrey @hollance Can you please help with this when you get a chance? Thank you - much appreciated! :]

You should take pictures that are representative of the pictures your users will take when using the app.

If you only train the model on roses on a white background, then the model will only learn what a rose on a white background is. It may not understand that a picture of a rose in a garden is also supposed to be a rose.

Ideally, you would have many pictures of each type of flower, taken against many different backgrounds (including white ones if you have them). This way the model learns what pixel patterns really are a rose and what parts of the image are not roses.
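One way to check whether a dataset has this kind of variety is to count how many backgrounds each class appears on. The following is only a minimal sketch: it assumes you have noted a background tag for every image yourself, and the function name and the sample data are hypothetical.

```python
def single_background_labels(samples):
    """Return the labels that appear on only one background.

    samples: list of (label, background) pairs, one per training image.
    The background tags are assumed to be recorded by hand.
    """
    backgrounds_by_label = {}
    for label, background in samples:
        backgrounds_by_label.setdefault(label, set()).add(background)
    return sorted(
        label for label, seen in backgrounds_by_label.items() if len(seen) == 1
    )


# Hypothetical dataset: roses only on white, tulips on two backgrounds.
samples = [("rose", "white")] * 3 + [("tulip", "white"), ("tulip", "garden")]
risky = single_background_labels(samples)
print(risky)
```

Any label reported here is one the model may learn together with its background rather than on its own.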


Thanks for the fast reply.

In that case, you need a huge number of images if you want to train a model where the background can vary.

I am trying to train a model on hand gestures for sign language. I am starting by teaching the model to identify just two signs: ONE (a hand with the index finger raised) and TWO (a hand with the index and middle fingers raised).

I created 700 images for each gesture, all taken in the same position (meaning the same background).

After training the model on these images, it predicted gesture ONE about 90% of the time. I am using the live video example from your book.

After that, I took more pictures of gesture TWO, thinking the images were not good enough or that there were too few, and ended up with 1400 images of gesture TWO and 700 of gesture ONE. (I know an unbalanced dataset goes against good training practice, but I thought I would give it a try.)
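On the imbalance itself: a common way to compensate for a 1400 versus 700 split is to weight each class inversely to its frequency. The snippet below is a minimal sketch of that arithmetic only. Whether the training tool used in the book accepts per-class weights is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return one weight per class so every class contributes equally overall."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    # weight = total / (number of classes * images in this class)
    return {label: total / (n_classes * count) for label, count in counts.items()}


# The counts from this post: 700 images of ONE, 1400 images of TWO.
labels = ["ONE"] * 700 + ["TWO"] * 1400
weights = inverse_frequency_weights(labels)
print(weights)
```

With these counts, each image of ONE gets twice the weight of an image of TWO. If the tool has no weighting option, randomly dropping 700 of the TWO images gives a similar balance.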

The output was approximately the same.

Just to confirm: I am testing with the same iPhone and the same background that I used for the training images.

The result is gesture ONE 85-90% of the time.

Any suggestions on how I could solve this kind of problem, so I can then move on to harder things and make my life more difficult :smiley:?

Is transfer learning the correct approach for this project?

I could share the images, since they are not confidential, so you could point out where I am going wrong.

Also, are there other forums where I should post my questions, in case this one is too detailed for the RW Forums?

Thanks a lot in advance.

@hollance Do you have any feedback about this? Thank you - much appreciated! :]

There are unfortunately a lot of things that could be going wrong here…

First off, 700 images for each gesture should be plenty, although it would be useful if they had many different backgrounds, lighting conditions, etc.

With so many training images, it’s a little strange that your classifier isn’t giving you the expected results.

It would be useful for me to see at least some of these images so I can get a better idea of what you’re dealing with.

As an example of why having a good dataset is important, check out this notebook:

This learns a model that can detect where the fingernails are in a photo of a hand. I took the data from that repo and trained my own version of this model. I can confirm that it works really well on images from the training set and the validation set. (You can also see this in that notebook.)

Does it work well on arbitrary images of hands? Not at all.

Note that the background in all these training images is blue and that only the top portion of the fingers is ever shown.

On an image where the background is not blue, no nails are detected. On an image with a blue background where the whole hand is visible, the model thinks the knuckles are the nails.

Note how there are no knuckles visible anywhere in the training images. So the model sees things it has never seen before and does not understand what to do with them.
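A random train/validation split hides this problem, because both halves share the same backgrounds and framing. One way to expose it earlier is to hold out whole groups of images (for example, every image taken against one background) and evaluate only on those. The following is a rough sketch, not the method used in that notebook. The tuple layout, the group tags, and the function name are all assumptions.

```python
import random


def split_by_group(samples, holdout_fraction=0.25, seed=0):
    """Split so that no group appears in both the training and the test set.

    samples: list of (path, label, group) tuples, where group is a tag such
    as a background or a recording session (assumed to be recorded by hand).
    """
    groups = sorted({group for _, _, group in samples})
    random.Random(seed).shuffle(groups)
    n_holdout = max(1, round(len(groups) * holdout_fraction))
    holdout = set(groups[:n_holdout])
    train = [s for s in samples if s[2] not in holdout]
    test = [s for s in samples if s[2] in holdout]
    return train, test


# Hypothetical data: 40 images, 4 backgrounds, both labels on every background.
samples = [
    (f"img_{i}.jpg", "ONE" if (i // 4) % 2 == 0 else "TWO", f"background_{i % 4}")
    for i in range(40)
]
train, test = split_by_group(samples)
print(len(train), len(test))
```

If accuracy on the held-out group is much lower than on a random validation split, the model is probably relying on the background or framing rather than on the object.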

That’s why it’s important that you train the model on the same sort of images that users will be using it on. My initial guess is that there is some fundamental difference between your training images and the images from the live video, but I’d need to see some examples to make sure.
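A quick first check for such a difference is to compare simple statistics of the two sets, such as aspect ratio and average brightness. This is only a minimal sketch: it assumes the images have already been decoded into 2D lists of grayscale values from 0 to 255 (the decoding step is left out), and the function names are hypothetical.

```python
from statistics import mean


def summarize(images):
    """Average aspect ratio and brightness of a list of grayscale images.

    Each image is a 2D list of 0-255 values with rows of equal length.
    """
    aspect = [len(img[0]) / len(img) for img in images]  # width / height
    brightness = [mean(mean(row) for row in img) for img in images]
    return {"aspect_ratio": mean(aspect), "brightness": mean(brightness)}


def compare(train_images, live_frames):
    """Return (train value, live value, absolute difference) per statistic."""
    a, b = summarize(train_images), summarize(live_frames)
    return {key: (a[key], b[key], abs(a[key] - b[key])) for key in a}


# Hypothetical data: bright portrait photos versus darker landscape frames.
train_images = [[[200] * 3 for _ in range(4)]]  # 3 wide, 4 high
live_frames = [[[100] * 4 for _ in range(3)]]   # 4 wide, 3 high
report = compare(train_images, live_frames)
print(report)
```

A large gap in either number (for instance portrait photos versus landscape video frames) hints that the live frames are cropped, rotated, or exposed differently from the training photos. Statistics this simple will not catch every mismatch, so looking at a few examples side by side is still worthwhile.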
