Auto-filtering validation in citizen science biodiversity monitoring: a case study
Keywords: citizen science, data validation, machine learning, biodiversity, automatic filters
Abstract. Data quality is a primary concern for researchers working on citizen science projects. The data collected by citizen science participants are heterogeneous and must therefore be validated. Several validation approaches exist, depending on the theme and objective of the project, but the most common is expert review. While expert validation is essential in citizen science projects, relying on it as the only validation approach places a heavy burden on the experts. In addition, volunteers can lose motivation to contribute if they receive no feedback on their submissions. This project introduces an automatic filtering mechanism for a biodiversity citizen science project. Its goals are, first, to use an available historical database of local species to filter out unusual species and, second, to use machine learning and image recognition techniques to verify that the observation image corresponds to the reported species. Submissions that do not pass the automatic filtering are flagged as unusual and go through expert review. The objective is, on the one hand, to simplify the experts’ validation task and, on the other hand, to increase participants’ motivation by giving them real-time feedback on their submissions. Finally, the flagged observations are classified as valid, valid but uncommon, or invalid, and the observation outliers (rare species) can be identified for each specific region.
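The two-stage filter described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the `Observation` fields, the `HISTORICAL` lookup, the function names, and the 0.8 confidence threshold are all assumptions made for illustration, and the image classifier is assumed to have run upstream and supplied a predicted label with a confidence score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    species: str            # species name reported by the participant
    region: str             # region where the observation was made
    predicted_species: str  # label from an image classifier (assumed to run upstream)
    confidence: float       # classifier confidence in [0, 1]


# Hypothetical stand-in for the historical database:
# species previously recorded in each region.
HISTORICAL = {
    "region_a": {"Parus major", "Erithacus rubecula"},
    "region_b": {"Parus major"},
}


def history_check(obs, historical=HISTORICAL):
    """Return True if the reported species is already on record for the region."""
    return obs.species in historical.get(obs.region, set())


def image_check(obs, min_confidence=0.8):
    """Return True if the classifier agrees with the reported species.

    The 0.8 threshold is an illustrative assumption, not a value from the study.
    """
    return obs.predicted_species == obs.species and obs.confidence >= min_confidence


def auto_filter(obs):
    """Run both filters; any failure flags the observation for expert review."""
    reasons = []
    if not history_check(obs):
        reasons.append("species not in historical records for this region")
    if not image_check(obs):
        reasons.append("image does not match the reported species")
    status = "flagged" if reasons else "accepted"
    return status, reasons
```

In this sketch, the list of reasons returned with a flagged observation could serve both as real-time feedback to the participant and as context for the expert, who would then assign one of the three final labels (valid, valid but uncommon, invalid).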