Generate Dataset
The Mothbot_CreateDataset.py script creates a user-friendly, editable dataset for you. It uses a program called FiftyOne to build the user interface.
Inputs
We generally edit only one night of data at a time.
Pre-processing Thumbnails
The first time you run the CreateDataset script on a night’s data, it first needs to create a thumbnail for each creature it detected.
This can take a while, and the terminal will show a progress bar.
These thumbnail patch images are stored in a folder called “patches” alongside that night’s data.
When you run the script again on the same night, it should go much faster, because it won’t have to create those thumbnails again.
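The reason the second run is faster can be sketched as a simple "skip what already exists" check. This is only an illustration, not the actual code in Mothbot_CreateDataset.py: the function name, the detection IDs, and the one-JPEG-per-detection naming scheme are all assumptions made for this example.

```python
from pathlib import Path


def patches_to_create(detection_ids, patches_dir):
    """Return the detection IDs that do not yet have a thumbnail on disk."""
    patches_dir = Path(patches_dir)
    missing = []
    for det_id in detection_ids:
        # Hypothetical naming scheme: one JPEG per detection, named by its ID.
        if not (patches_dir / f"{det_id}.jpg").exists():
            missing.append(det_id)
    return missing
```

On a first run the “patches” folder is empty (or absent), so every ID comes back and every thumbnail is made. On later runs the list is empty and the slow step is skipped.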
Results
After the processing completes, a couple of things will happen.
Datasets stored to disk
First, the script will save three files to that night’s folder.
- samples.json and metadata.json
  - These store a consolidated set of all the automatically generated samples.
- a .csv file with the export date
  - This is a convenience file that gives you an easy way to look at all the data, one detection per line, in a format that platforms like GBIF accept.
Dataset Opens in Web Browser
Your computer will then also launch an interface in your web browser. This is still reading your data locally (nothing is in the “cloud”), so you don’t need an internet connection.
Using the Interface
The interface lets you filter your detections by their identifications. This can let you see how good the automated detections were.
The most important part of this interface is that you can edit the tags on these datasets to:
- Correct any mislabels
- Note any errors (e.g. a raindrop mis-identified as an insect)
- Provide deeper (more specific) labels
Editing Tags
When the interface first opens, you will probably see a view something like this:
It is already automatically sorted by image size, with the smallest detections shown first. This is because most errors tend to happen on the really small insects.
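As a rough picture of what “smallest first” means, here is a minimal sketch that sorts detections by patch area. It assumes “image size” means width times height in pixels; the dictionary keys and the sample IDs are made up for this example and are not the real dataset fields.

```python
def sort_smallest_first(detections):
    """Sort detections by patch area in pixels, smallest first."""
    return sorted(detections, key=lambda d: d["width"] * d["height"])


# Hypothetical detections, for illustration only.
detections = [
    {"id": "moth_01", "width": 120, "height": 90},
    {"id": "gnat_07", "width": 14, "height": 11},
    {"id": "beetle_03", "width": 60, "height": 45},
]

ordered = sort_smallest_first(detections)
print([d["id"] for d in ordered])  # ['gnat_07', 'beetle_03', 'moth_01']
```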
On the left side of the interface, you can filter detections. Click on “Sample Tags.” You can see that on this night, we detected about 5,000 arthropods!
You can type in the filter area to select particular taxa, for instance Lepidoptera. Note that for now, this filter may be case sensitive, i.e. “Lepi…” works, but “lepi…” does not.
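The case sensitivity is easy to trip over, so here is a small sketch of the behavior. This is not how FiftyOne implements its filter box; it only assumes the filter does a plain substring match, and shows the case-insensitive alternative you could use in your own scripts.

```python
def filter_tags(tags, query, case_sensitive=True):
    """Return the tags that contain the query string."""
    if case_sensitive:
        return [t for t in tags if query in t]
    lowered = query.lower()
    return [t for t in tags if lowered in t.lower()]


tags = ["KINGDOM_Animalia", "ORDER_Lepidoptera", "ORDER_Orthoptera"]

print(filter_tags(tags, "Lepi"))                        # ['ORDER_Lepidoptera']
print(filter_tags(tags, "lepi"))                        # []
print(filter_tags(tags, "lepi", case_sensitive=False))  # ['ORDER_Lepidoptera']
```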
Now the interface will show you only things that have been categorized as Lepidoptera:
You can click the checkbox to toggle showing all the ID tags on a sample too:
Changing Tags
You can select a set of samples. For instance, these grasshoppers were categorized incorrectly:
You can click the checkbox in each sample, OR you can hold SHIFT and click to select a range.
Now we need to change the tags because these are not Lepidoptera. While those are selected, click the “Tag” button.
Now, scroll through the tags, and UNCHECK the erroneous tags. (That is, it is still KINGDOM_Animalia, but not ORDER_Lepidoptera.)
Next, we find the correct classification for these. I don’t know what family these crickets are, but I am pretty sure they are ORDER_Orthoptera. Then hit “Apply.”
Now if we change our view to “Orthoptera,” we can see our re-classified crickets there!
Keep doing this for ALL incorrect labels!
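If it helps to think about what the “Tag” dialog is doing to each selected sample, here is a minimal sketch using plain Python lists. The `retag` helper is hypothetical and only illustrates the logic: the wrong tag is dropped, the tags that are still true are kept, and the corrected tag is added.

```python
def retag(tags, remove, add):
    """Return a new tag list with `remove` dropped and `add` appended (no duplicates)."""
    kept = [t for t in tags if t not in remove]
    for t in add:
        if t not in kept:
            kept.append(t)
    return kept


# Tags on one mislabeled sample before the correction.
sample_tags = ["KINGDOM_Animalia", "ORDER_Lepidoptera", "IDby_Bioclip"]

fixed = retag(sample_tags, remove={"ORDER_Lepidoptera"}, add=["ORDER_Orthoptera"])
print(fixed)  # ['KINGDOM_Animalia', 'IDby_Bioclip', 'ORDER_Orthoptera']
```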
Save the Corrected Dataset
We want to make sure all these edits do not disappear, so there are a couple of things we need to do to save this work.
Correct IDby Tag
By default, the creatures were all identified by BioCLIP, so every sample carries the “IDby_Bioclip” tag. But now you have gone and verified all the IDs, so we need to change this tag. First, click on just this tag. It should select all the samples in your dataset.
Then we are going to create a new tag showing who identified them. Make sure that no samples are selected (this makes any change apply to ALL the samples). Click the “Tag” button at the top, and type in your new IDby tag. Use your own name. For instance, I write: “IDby_Quitmeyer”. Hit “add…” Then click “Apply.”
Now we need to remove the “IDby_Bioclip” tag from all of them. This is easy. Again, just make sure that no samples are selected, and click the “Tag” button. Now uncheck the checkbox next to “IDby_Bioclip.” This will remove the tag from all these samples.
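The same swap can in principle be done from a script instead of the interface. The sketch below uses FiftyOne's `match_tags`, `count`, `tag_samples`, and `untag_samples` methods on a dataset that you have already loaded; the helper name `swap_idby_tag` and the dataset name in the usage comment are made up for this example, and the interface steps above remain the documented workflow.

```python
def swap_idby_tag(dataset, old_tag, new_tag):
    """Replace old_tag with new_tag on every sample that carries old_tag.

    `dataset` is expected to be a FiftyOne dataset or view.
    Returns the number of samples that were re-tagged.
    """
    view = dataset.match_tags(old_tag)  # samples currently tagged with old_tag
    count = view.count()
    view.tag_samples(new_tag)           # add the new tag first...
    view.untag_samples(old_tag)         # ...then remove the old one
    return count


# Example usage (requires FiftyOne; "my_night" is a hypothetical dataset name):
# import fiftyone as fo
# dataset = fo.load_dataset("my_night")
# swap_idby_tag(dataset, "IDby_Bioclip", "IDby_Quitmeyer")
```

The new tag is added before the old one is removed, so no sample is ever left without an IDby tag partway through.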
Export the Dataset
Now we need to save this corrected dataset. Click the “Browse Operations” button.
Select “Export Samples.”
Now make sure to select:
- Entire Dataset
- Labels only
- FiftyOne Dataset
Finally, we need to enter a file path for the new folder where this will be saved. If you are on Mac or Linux, this is easy. Just paste a file path.
If you are on Windows, it’s a little trickier because of the way Windows writes file paths.
If you copy a Windows path from a Windows panel, it will look something like this:
C:\Users\andre\Desktop\Mothbox data\PEA_PeaPorch_AdeptTurca_2024-09-01\2024-09-01\QuitmeyerID
However, if you paste that into FiftyOne, there’s a bug where it cannot handle a Windows-style file path.
So you can paste in a file path, plus the name of a new folder you want to create to store your new dataset in (for instance, my new folder is “QuitmeyerID”), but then you need to change every “\” to “/”:
C:/Users/andre/Desktop/Mothbox data/PEA_PeaPorch_AdeptTurca_2024-09-01/2024-09-01/QuitmeyerID
And now you can see it will let you click “execute.”
Now you have a new folder with your new data in it!
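If you do this often, you don’t have to swap the slashes by hand. Here is a minimal sketch using Python’s standard pathlib module to turn a copied Windows path into the forward-slash version that you can paste into the export dialog. The function name `to_forward_slashes` is just a name chosen for this example.

```python
from pathlib import PureWindowsPath


def to_forward_slashes(windows_path):
    """Convert a Windows-style path string to one that uses forward slashes."""
    return PureWindowsPath(windows_path).as_posix()


# The raw string (r"...") keeps the backslashes exactly as copied from Windows.
raw = r"C:\Users\andre\Desktop\Mothbox data\PEA_PeaPorch_AdeptTurca_2024-09-01\2024-09-01\QuitmeyerID"

print(to_forward_slashes(raw))
# C:/Users/andre/Desktop/Mothbox data/PEA_PeaPorch_AdeptTurca_2024-09-01/2024-09-01/QuitmeyerID
```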
Export a CSV file of your new Dataset
Finally, if you want a new CSV file of this corrected data, there’s just one more script to run! Open Mothbot_ConvertDatasettoCSV.py. Change the input path to your new dataset folder.
Hit the Run button.
And now you have a new CSV file of all your data in that folder too!
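To give a feel for what “one detection per line” looks like, here is a small sketch that turns tag lists into CSV rows. It is not the code in Mothbot_ConvertDatasettoCSV.py and the real file has its own columns. It only assumes that tags follow the PREFIX_Value pattern seen above (KINGDOM_Animalia, ORDER_Orthoptera, IDby_Quitmeyer); the helper name and the column list are chosen for this example.

```python
import csv
import io


def tags_to_row(tags):
    """Split tags such as 'ORDER_Lepidoptera' into {'ORDER': 'Lepidoptera'}."""
    row = {}
    for tag in tags:
        prefix, sep, value = tag.partition("_")
        if sep:  # skip any tag that has no underscore
            row[prefix] = value
    return row


# Hypothetical corrected samples, each represented only by its tag list.
samples = [
    ["KINGDOM_Animalia", "ORDER_Orthoptera", "IDby_Quitmeyer"],
    ["KINGDOM_Animalia", "ORDER_Lepidoptera", "IDby_Quitmeyer"],
]

columns = ["KINGDOM", "ORDER", "IDby"]
buffer = io.StringIO()  # stands in for a real .csv file
writer = csv.DictWriter(buffer, fieldnames=columns, restval="", extrasaction="ignore")
writer.writeheader()
for tags in samples:
    writer.writerow(tags_to_row(tags))

print(buffer.getvalue())
```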