Organizing Data

Our data comes in from the field to a server and is organized like this.

.
└── Country/ (Country it is operated in)
    └── Deployment_YYYY-MM-DD/  (Deployment Folder)
        ├── YYYY-MM-01/ (first nightly folder of a deployment)
        ├── YYYY-MM-02/ (second...)
        └── YYYY-MM-03/
            ├── DEVICE_YYYY-MM-DD-HH-MM-SS.jpg  (Raw Image collected)
            ├── DEVICE_YYYY-MM-DD-HH-MM-SS.json (YOLO detection with auto-ID)
            └── DEVICE_YYYY-MM-DD-HH-MM-SS_metadata.json

Deployment

Each “deployment” is a device left out in the field somewhere. The deployment has a unique name like this:

AREA_SITE_DEVICE_YYYY-MM-DD

The name has four parts:

- “Area”: the broad area that a specific field agent tends to work in, like “MtTotumas”.
- “Site”: a human-readable name for the specific place where you left the Mothbox, like “TreeNearLodge”.
- “Device”: the unique name that the Mothbox calls itself, like “FuerteFrog”. These names are derived from the internal serial number of the Raspberry Pi on the Mothbox, combined with a list we made of Spanish and English verbs, nouns, and adjectives.
- Date stamp: the first day the Mothbox was left out in the field, like 2024-04-30. The format is YYYY-MM-DD.
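The naming scheme above can be split back into its parts with a few lines of Python. This is a minimal sketch, not part of the Mothbox scripts: the `Deployment` type and `parse_deployment_name` function are hypothetical names, and the sketch assumes that the area, site, and device names never contain an underscore themselves.

```python
from datetime import date, datetime
from typing import NamedTuple


class Deployment(NamedTuple):
    """The four parts of a deployment name (hypothetical helper type)."""
    area: str
    site: str
    device: str
    start: date


def parse_deployment_name(name: str) -> Deployment:
    """Split 'AREA_SITE_DEVICE_YYYY-MM-DD' into its four parts.

    Assumes none of AREA, SITE, or DEVICE contains an underscore.
    """
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(
            f"expected 4 underscore-separated parts, got {len(parts)}: {name!r}"
        )
    area, site, device, stamp = parts
    # strptime raises ValueError if the date stamp is not YYYY-MM-DD.
    start = datetime.strptime(stamp, "%Y-%m-%d").date()
    return Deployment(area, site, device, start)


example = parse_deployment_name("MtTotumas_TreeNearLodge_FuerteFrog_2024-04-30")
print(example.area, example.site, example.device, example.start)
```

If a name does not have exactly four parts, or the date stamp is malformed, the function raises `ValueError` instead of guessing.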

Nightly Folders

A deployment usually has several nights. Each night is collected in its own folder. The nightly folders are automatically created by the Mothbox and have a basic format:

YYYY-MM-DD

A special note about Mothbox “nights”: since most of our data collection happens at night, each nightly folder covers the period from 12:00 PM (noon) of one day until 11:59 AM of the next day. In this way, images captured at, for instance, 3 AM are considered part of the same night that started 8 hours earlier, at 7 PM the preceding day. This is somewhat similar to the Ethiopian time system, which also starts its day at a time other than midnight.
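The noon-to-noon rule can be expressed as a 12-hour shift: subtract 12 hours from the capture time and take the calendar date of the result. The sketch below is illustrative only. `night_folder` is a hypothetical helper, and it assumes that the folder is named after the date on which the night begins and that timestamps are in the device's local time.

```python
from datetime import datetime, timedelta


def night_folder(captured: datetime) -> str:
    """Return the YYYY-MM-DD nightly folder name for a capture time.

    Assumption: a "night" runs from 12:00 PM to 11:59 AM the next day and
    is named after the day it starts, so shifting back 12 hours maps every
    capture onto the right calendar date.
    """
    return (captured - timedelta(hours=12)).strftime("%Y-%m-%d")


# 7 PM on 30 April and 3 AM on 1 May belong to the same night.
print(night_folder(datetime(2024, 4, 30, 19, 0, 0)))
print(night_folder(datetime(2024, 5, 1, 3, 0, 0)))
# Noon on 1 May starts the next night.
print(night_folder(datetime(2024, 5, 1, 12, 0, 0)))
```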

Samples

Each data “sample” consists of a set of grouped files.

DEVICE_YYYY-MM-DD-HH-MM-SS.jpg  (Raw Image collected)
DEVICE_YYYY-MM-DD-HH-MM-SS_metadata.json
DEVICE_YYYY-MM-DD-HH-MM-SS.json (Labels: Detection Data like YOLO detection and BioCLIP ID)
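Because the three files share one stem, the companion files and the capture time can be derived from the image path alone. The following is a rough sketch using only the standard library. `sample_paths` and `capture_time` are hypothetical helpers, not functions from the Mothbox code, and the example path is made up.

```python
from datetime import datetime
from pathlib import Path


def sample_paths(image: Path) -> dict:
    """Return the three files that make up one sample, given the raw image."""
    stem = image.stem  # e.g. "FuerteFrog_2024-05-01-03-00-00"
    return {
        "image": image,
        "metadata": image.with_name(f"{stem}_metadata.json"),
        "detections": image.with_name(f"{stem}.json"),
    }


def capture_time(image: Path) -> datetime:
    """Parse the YYYY-MM-DD-HH-MM-SS stamp that follows the device name."""
    # The stamp contains no underscore, so split on the last one.
    _device, _, stamp = image.stem.rpartition("_")
    return datetime.strptime(stamp, "%Y-%m-%d-%H-%M-%S")


photo = Path("2024-04-30/FuerteFrog_2024-05-01-03-00-00.jpg")
for kind, path in sample_paths(photo).items():
    print(kind, path.name)
print(capture_time(photo))
```

The functions only build paths; they do not check that the files exist, so a caller would still need `Path.exists()` before opening the metadata or detection file.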

Raw Photo

The “raw” photos we capture show insects on a white background.

Metadata

Next, we create a metadata file for each raw photo. This contains information about the sampling, such as:

- GPS: [lat,lon]
- Person who collected it
- Land Use Type
- Type of Mothbox Deployed
- Any additional Data

Detection Data

Finally, the data about individual insects is stored in another .json file that has the same name as the original raw photo. This detection data is created by several scripts. First, a script (Mothbox_Detect.py) uses a trained YOLO model to detect where there might be interesting creatures present in the image. Its output can be visualized in a program like X-AnyLabeling.

Then we feed all those detections, in another pass, to a different script called Mothbox_ID.py, which uses BioCLIP to automatically ID the different creatures detected. It labels each detection with the taxon it predicts the creature to be.
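Once the labels are written, a quick tally per image gives a first look at what was found. The sketch below is an assumption-heavy illustration: it supposes that the detection file follows the X-AnyLabeling-style layout, with a top-level "shapes" list whose entries each carry a "label" string. Those key names are not confirmed by this document, so check them against a real detection file first. `count_labels` and `count_labels_in_file` are hypothetical helpers.

```python
import json
from collections import Counter
from pathlib import Path


def count_labels(detections: dict) -> Counter:
    """Tally detection labels in one parsed detection file.

    Assumed layout (not confirmed): {"shapes": [{"label": "..."}, ...]}
    """
    shapes = detections.get("shapes", [])
    return Counter(shape.get("label", "unlabeled") for shape in shapes)


def count_labels_in_file(path: Path) -> Counter:
    """Load a detection .json file from disk and tally its labels."""
    with path.open(encoding="utf-8") as fh:
        return count_labels(json.load(fh))


# Made-up example data, only to show the shape of the result.
sample = {
    "shapes": [
        {"label": "Lepidoptera"},
        {"label": "Lepidoptera"},
        {"label": "Coleoptera"},
    ]
}
print(count_labels(sample).most_common())
```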

Database Editing

Finally, the remaining scripts help you open this data in database visualization and editing systems like Voxel51's FiftyOne.