Contributing to the coffee beans image dataset

This topic is a linked part of a larger work: “Open source optical coffee sorter project”.

Contents

1. Introduction

2. Capturing the images

3. Enhancing the images

4. Splitting the images

5. Archiving the images


1. Introduction

This document shows our process for adding images to our open source coffee beans dataset, which we use to train the decision algorithms of our optical coffee sorter.

Everyone can create and contribute images, and all contributions are valuable because coffee beans differ widely by variety and cultivation conditions. A smartphone with a camera is enough to contribute. The image capture mode of our optical coffee sorter is more efficient, but the sorter is not yet finished, and not every contributor will have one.

2. Capturing the images

3. Enhancing the images

  • If you have pictures that are overexposed: use “Colors → Levels” and for channel “Value” map the visible part of the histogram (in logarithmic display mode) to target values “0 … 245”. For the source values, you can set the white point so it includes all white tones with high histogram values, mapping the even whiter ones (that occur rarely) to pure white this way.

  • If you have pictures that have a lot of stray light (a washed-out, low-contrast appearance): same as before, but map to “0 … 255”, since these pictures do not have to appear darker overall.

  • If you have pictures that have the wrong color temperature (white balance) setting: open the “Colors → Levels” dialog and use the “gray point” button to select a point on the white background. This will adjust the gamma values for each color channel differently so that the color of the selected pixel appears neutral gray.

  • Alternative if the pictures are overexposed: use the “Colors → Auto → Normalize” filter. If necessary use “Brightness -30, Contrast +10” after that.

  • If the pictures are underexposed: set “Brightness +70, Contrast +20”. Often, though, underexposed pictures cannot be properly rescued.
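To make the “Levels” mapping above more concrete, here is a minimal sketch of what a linear levels adjustment does to brightness values. It is only an illustration under stated assumptions: the function name `apply_levels` and its parameters are hypothetical, gamma is assumed to be 1, and GIMP's actual implementation (which works on the “Value” channel at its own internal precision) may differ in detail.

```python
def apply_levels(values, black_point, white_point, out_low=0, out_high=245):
    """Linearly remap brightness values (0-255), similar to a basic Levels adjustment.

    Hypothetical helper for illustration only; assumes gamma = 1.
    Inputs below black_point or above white_point are clipped first,
    so rare "even whiter" pixels all end up at out_high.
    """
    if white_point <= black_point:
        raise ValueError("white_point must be greater than black_point")
    scale = (out_high - out_low) / (white_point - black_point)
    result = []
    for v in values:
        clipped = min(max(v, black_point), white_point)
        result.append(round(out_low + (clipped - black_point) * scale))
    return result


# Overexposed picture: source range 0..230 mapped to target range 0..245.
print(apply_levels([0, 128, 230, 255], black_point=0, white_point=230))

# Stray-light picture: visible range 10..200 stretched to the full 0..255.
print(apply_levels([10, 200], black_point=10, white_point=200, out_high=255))
```

In the first call, the white point of 230 plays the role of the source value chosen so that all common white tones are included; anything brighter is clipped to the same output value.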

4. Splitting the images

Photographing coffee beans one at a time would not be efficient, so each photo you take shows a collection of beans. Take separate photos for collections of good beans and collections of bad beans. Once you have finished taking photos, you need to separate the individual beans from those photos.

The steps for splitting the images into individual beans are as follows:

4.1. Renaming:

  • Keep the good and bad images in separate folders. Example: Set01-good and Set01-bad
  • Rename each image. Example: Set01-good.01, Set01-bad.01, and so on.
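If a folder contains many images, the renaming can be scripted. The following is a minimal sketch, not part of the project's tooling: the function name `rename_set` is hypothetical, and it assumes that the folder is already named like the set (for example Set01-good), that images should be numbered in alphabetical order of their current names, and that two digits are enough.

```python
from pathlib import Path


def rename_set(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Rename all images in `folder` to <folder name>.<NN><extension>.

    Hypothetical helper: numbering follows the alphabetical order of the
    current file names. Files with other extensions are left untouched.
    Stops with an error rather than overwriting an existing file.
    """
    folder = Path(folder)
    images = sorted(
        (p for p in folder.iterdir()
         if p.is_file() and p.suffix.lower() in extensions),
        key=lambda p: p.name,
    )
    new_names = []
    for index, path in enumerate(images, start=1):
        target = folder / f"{folder.name}.{index:02d}{path.suffix.lower()}"
        if target != path:
            if target.exists():
                raise FileExistsError(f"Refusing to overwrite {target}")
            path.rename(target)
        new_names.append(target.name)
    return new_names


# Example (assumes a folder "Set01-good" exists in the current directory):
# print(rename_set("Set01-good"))
```

Try it on a copy of the folder first, since renaming cannot be undone automatically.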

4.2. Install GIMP

4.3. Set up the export_paths_to_png.py script

  • Download this file
  • Copy the downloaded file to “C:\Program Files\GIMP 2\lib\gimp\2.0\plug-ins”

4.4. Set up the set_paths_visibility.py script

4.5. Configure keyboard shortcuts in GIMP to speed up this process

  • Open GIMP → Edit → Preferences → Interface
  • In the “Keyboard Shortcuts” section, click “Configure Keyboard Shortcuts”
  • Search for “Selection to path” and “Set paths visibility”
  • Click once on “Disabled” next to “Selection to path” (not “Selection to path (advanced)”), press E to assign it, and click “Reassign Shortcut”
  • Do the same for “Set paths visibility” and assign Ctrl + A, then click Close → OK

4.6. Create “paths” for individual coffee beans in an image

  • Use the rectangular selection tool and set its tool options to Fixed: Size
  • Create a selection with one bean in its center.
  • Press E to create a new path.
  • Occasionally press Ctrl + A to show the “Set paths visibility” dialog and use it to show all paths. This way you can see which beans you have not yet processed.
  • Select one path in the Paths dialog, press Return, rename it to a two-digit number (01, 02, 03 etc.), press Return to confirm and an arrow to select the next one. This is the fastest way to work through the whole list to rename all.

4.7. Use the export_paths_to_png.py script on an image of coffee beans to create 220×220 px images in PNG format of individual beans

  • Select “Filters → Paths → Export paths to PNG”.
  • In the dialog, always select the same output folder for the PNG export images of all source images.
  • In the dialog, select Export Paths: All paths.
  • In the dialog, set the prefix to “Set01-good.01.” (or whatever the part of your source image filename before the file extension is, followed by a dot).
  • Click “OK”. The result will be files Set01-good.01.01.png, Set01-good.01.02.png etc., for all your paths.
  • Also save the full image in GIMP’s .xcf.bz2 format by choosing “File → Save as …” in GIMP. Example: Set01-good.01.xcf.bz2 for image Set01-good.01.jpg
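The export step above can be pictured as cutting a 220×220 px box around each bean and naming the result after the prefix and the path name. The sketch below only illustrates that arithmetic and naming scheme; it is not the export_paths_to_png.py script. The function names `crop_box` and `export_name` are hypothetical, and shifting the box to stay inside the image is an assumption, not confirmed behavior of the plug-in.

```python
def crop_box(center_x, center_y, image_width, image_height, size=220):
    """Return (left, top, right, bottom) of a size x size box around a bean.

    Hypothetical helper: the box is centered on the given pixel and, as an
    assumption, shifted inward if it would cross the image border.
    """
    if image_width < size or image_height < size:
        raise ValueError("image is smaller than the crop size")
    left = min(max(center_x - size // 2, 0), image_width - size)
    top = min(max(center_y - size // 2, 0), image_height - size)
    return (left, top, left + size, top + size)


def export_name(prefix, path_name):
    """Build the PNG file name from the dialog prefix and the path name."""
    return f"{prefix}{path_name}.png"


# A bean centered at (500, 400) in a 4000 x 3000 px photo:
print(crop_box(500, 400, 4000, 3000))
# Second path of the first "good" image of set 01:
print(export_name("Set01-good.01.", "02"))
```

Because the prefix already ends with a dot, two-digit path names such as 01 and 02 produce the file names shown in the steps above.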

4.8. Repeat the last two steps for all images of coffee beans for which the process has not yet been done

5. Archiving the images


I will start taking the required pics next week; I’m just waiting for a better camera. But how are you guys coming along with the design of the machine?

Hi, Anu and fellow members,

Thanks for sharing the scripts and the image set! Appreciate it!

I am a coffee roaster in California. I am new to the community and just introduced myself last week. I have been hand-sorting coffee beans and would like to contribute to the open source community and build a bean sorter.

I have some questions regarding the image sets:

For different types of beans, a good-bean image might not hold true for both types. For example, a bean with a slightly yellowish look may be perfectly good for a dry-processed Ethiopia, but it would not be good for a wet-processed Kenya.

Wondering what you would recommend to handle this type of situation? Do we need to maintain different sets of images of good and bad beans for different types of beans?

Or is there any way we could still share some of the image sets, such as stones, bug bites, etc.?

Thanks!

Regards,
Wen


Yes, I think different image sets would be the way to go. Think of the image set as the “configuration” of the optical sorter. Optical sorters from industry also need configuration, typically by entering parameters. We want to go for image classifiers, so our configuration would be example images. You would sort maybe 1-2 kg of beans by hand first, set the sorter to configuration mode, feed it the good beans (and tell it these are the good ones), feed it the bad beans (and tell it), and then set it to sorting mode and let it sort.

The image set we have so far is of semi-natural processed Arabica from upper Gorkha, Nepal. It’s meant as a test dataset to prove that image classifiers work at all.

If you want, you could contribute your own datasets. It’s quite some work to create them before you have a sorter, though. However, we have now made a script that automatically extracts individual bean images from pictures of ~50 beans, so the above instructions using GIMP plugins are now outdated.

What I missed initially was detailed instructions for manual sorting, including illustrative images of the damage types you describe. If we had an open source document about this, it would give us a better idea of the details of the job this machine should do. It would also show what possible expansions could be: you mentioned sorting roasted beans in the other thread, and I once thought about the possibility of sorting beans optically by probable flavor attributes.
