Environment, taxonomy, and socioeconomics predict non-imperilment in freshwater fishes

February 16, 2026

Abstract

Freshwater fishes are among the most threatened taxa, yet conservation assessments remain incomplete for many species. Freshwater fishes provide essential ecosystem services such as food security, recreational opportunities, and cultural significance. Despite heavy alterations to freshwater ecosystems, the reasons for species’ sensitivity and resistance to imperilment are unclear. To address this need, we develop a machine learning framework to predict global imperilment status for 10,631 freshwater fish species using a comprehensive set of environmental, socioeconomic, and intrinsic species-level predictors. Using updated IUCN Red List data, we train and validate Random Forest classifiers to distinguish imperiled (Vulnerable, Endangered, Critically Endangered) from non-imperiled species. We examine the relative influence of 52 variables derived from 12 global sources describing extrinsic environmental and socioeconomic factors and intrinsic species-specific characteristics. Our models achieve higher accuracy for non-imperiled species (90.1%) compared to imperiled species (81.8%), reflecting the greater heterogeneity of threats and conditions driving imperilment. Across models, key predictors include habitat variables, taxonomic order, hydrological characteristics, and disturbance indicators, underscoring the interplay between ecology, geography, and human pressures. This integrative, reproducible approach demonstrates the utility of machine learning for guiding proactive conservation and provides a scalable framework for global biodiversity risk assessment.

Introduction

Biodiversity conservation is often focused on taxa that are most at-risk1,2. These species have often declined to levels that have reduced their capacity to recover, and thus require conservation actions that are difficult and very expensive to implement3,4. Although there is general agreement that at-risk species deserve immediate attention, conservation activities can also be effectively focused on species that have yet to be listed as imperiled. Addressing non-imperiled species more proactively can offer management solutions that involve fewer regulatory and logistical constraints2,5. In practice, both approaches are complementary, and thus it is important not only to recognize species at risk, but also to identify intrinsic and extrinsic characteristics that make species less vulnerable to extinction.

Understanding a species’ status with respect to extinction risk is a necessary, yet insufficient step toward identifying appropriate conservation interventions. Species status assessments typically focus on geographic ranges, population trends, species traits, and threats such as loss of habitat, overexploitation, and impacts of invasive species6,7,8,9. In most cases, the drivers of imperiled status can be traced back to human influences10,11. Species status assessments that collectively address socioeconomic, environmental, and intrinsic (e.g., species traits, taxonomic) factors can provide a more comprehensive evaluation of drivers, improved status predictions, and a clearer understanding of effective solutions for addressing threats to biodiversity10.

The global biodiversity crisis increasingly affects freshwater fishes, with a rapidly expanding list of species at risk of extinction from a complex and interacting array of threats12,13. Widespread habitat degradation and loss are among the most important threats. For example, approximately two-thirds of the world’s largest rivers have already been impounded14, and many more dams are being planned globally15, posing challenges for species movement, ecosystem functioning, and habitat connectivity16. In addition, biological invasions17,18 threaten freshwater fishes through competition19, predation20,21, disease22, and hybridization23. These stressors further interact with climate change to threaten species globally24. Consequently, nearly one-third of freshwater fishes currently face the threat of extinction25.

As the most imperiled and diverse vertebrate group globally12, freshwater fishes present a critical case for expanding conservation tools. We examine whether complementary data, currently absent from formal status assessments, could reveal broader patterns of imperilment across 10,631 species worldwide. A range of globally consistent indicators of environmental, socioeconomic, and intrinsic species-level factors, excluding variables directly used as listing criteria (e.g., population size and trend, range area), are analyzed as predictors of conservation status (Table 1). We also examine the degree to which these factors are able to accurately predict distributions of fish species believed to be at risk or species not recognized as such (non-imperiled). The current conservation status of freshwater fishes used in this study is based on the International Union for Conservation of Nature (IUCN) Red List designations26.

Table 1 Global source datasets, predictor categories, and variables included in machine-learning random forest models to predict freshwater fish imperilment status

In this work, we apply machine-learning classification models to predict the global conservation status of freshwater fishes, integrating 52 variables from 12 international sources, and quantify the relative contributions of extrinsic environmental and socioeconomic factors and intrinsic species-specific characteristics (Table 1). We correct for geographic range-size bias by area-weighting correlated (R² > 0.3) spatial variables (e.g., count of dams, human population; see “Methods”). We initially tuned an ordinal forest model to predict individual IUCN Red List categories, but high rates of misclassification between adjacent categories prompted a shift to a conventional random forest framework, reframing the analysis as a binary classification of species as either imperiled or non-imperiled. Species classified by the IUCN as Near Threatened, or of Least Concern, were considered non-imperiled, whereas species classified as Vulnerable, Endangered, and Critically Endangered were considered imperiled (Fig. 1). Our findings underscore the critical role of intact habitats and fish taxonomy in predicting non-imperilment across species.

Fig. 1: Conservation status of freshwater fishes.

Global distribution of conservation status (panels a, c, e) and introductions of freshwater fishes (panel g) in the International Union for Conservation of Nature (IUCN) Red List dataset as of 2024 (v2) at 100-km grid cells26. Box-and-whisker plots to the right (panels b, d, f, h) show the mean (center line), interquartile ranges (box), 5th and 95th percentiles (whiskers), and outliers (points) in the number of species for cells within each continent. Note that the panels represent only assessed species in the spatial dataset of the IUCN Red List, excluding Data Deficient (DD) species.

Results and discussion

Model results indicated that extrinsic environment and socioeconomics were most strongly associated with conservation status (Fig. 2), whereas intrinsic species-level factors contributed less than 10% to model assignment. Overall, non-imperiled species tended to occur in regions with greater water availability, moderate impoundment density, minimal habitat alteration, low human footprint, stable gross domestic product, and relatively few habitat types per unit area (Figs. 3,4). Generally, extreme values for environmental and socioeconomic factors were associated with imperiled species. This was observed even for species predominantly associated with forested lands. These findings are consistent with previous studies suggesting an association between species imperilment and habitat specificity27,28. In many cases, relationships of predictors with conservation status were non-linear (Figs. 3, 4).

Fig. 2: Summary results and variable importance.

a Contributions of predictor categories to accuracy of imperilment assignment (random forest models). PR-AUC is the area under the precision-recall curve. b Confusion matrix of global accuracy (88%); misclassification was asymmetric, as imperiled species were misclassified more frequently than non-imperiled species. c Accuracy contributions from the 20 most important individual predictors, colored by category. IUCN refers to the International Union for Conservation of Nature, and GDP to Gross Domestic Product. The color and hash marks for the specific predictors (c) correspond to the broad categories (color: Environment, blue; Socioeconomic, orange; and Species, yellow) and sub-categories (hash marks, as labelled) within (a).

Fig. 3: Partial dependence on socioeconomic predictors.

Partial dependence plots (panels a, d, g, j) show relationships between the predicted probability of imperilment and the four most important socioeconomic variables, colored on the scale in maps (panels b, e, h, k) with the spatial distribution of each variable by grid cell. Box-and-whisker plots (panels c, f, i, l) show the mean (center line), interquartile ranges (box), and 1.5x interquartile ranges (whiskers) for cells within each continent (outliers not shown). Grey represents mapped areas with no value for the mapped variable (panels e, h; e.g., no protected areas or no species data, due to lack of range). Unknown attributes (panel d) are a sum of the NA values for non-spatial predictors from FishBase v4/2024 (refs. 39, 67) and IUCN Red List 2024 v2 (ref. 26) for a given species (representative of knowledge gaps). Socioeconomic source data (panels b, h, k) were derived from publicly available datasets: World Bank, Global Terrestrial Human Footprint, and the World Database on Protected Areas (WDPA)46,49,50.

Fig. 4: Partial dependence on environmental predictors.

Partial dependence plots (panels a, d, g, j) of predicted imperilment probability and four of the most important environmental variables, colored on the scale in maps (panels b, e, h, k), with the distribution of variables by grid cell. Box-and-whisker plots (panels c, f, i, l) show the mean (center line), interquartile ranges (box), and 1.5x interquartile ranges (whiskers) for cells within each continent (outliers not shown). Value ranges differ between partial dependence plots and map color scales because of the spatial scale of measurement represented (i.e., range-based in partial dependence plots vs. grid-based maps). Mean temp. wet. quarter = mean temperature of the wettest quarter; Avg = average. Environmental source data were derived from publicly available datasets: WorldClim v2.1, Global River Classification (GloRiC), and Copernicus Global Land Service43,44,68.

Classification by status

The model predicted conservation status with high accuracy (balanced classification accuracy: 88%), yet performance was lower for imperiled species (81.8%) than for non-imperiled species (90.1%), as indicated by misclassification rates in the test dataset (Fig. 2b). This mismatch suggests that conditions associated with non-imperilment are more consistent and better represented than those driving imperilment. This may be unsurprising if fish imperilment is partly influenced by specific or narrow habitat requirements, consumptive values to humans, or other unique traits contributing to their conservation status. Influential predictors of imperilment may thus provide less consistent targets for conservation than predictors associated with non-imperilment. Further, the IUCN Red List represents a global species status, whereas within the extent of a species range, more local threats could be important (e.g., ref. 29) and would not be detected with an analysis focused on whole species’ ranges. Our analysis is a first step towards systematically evaluating influences and general drivers of imperilment on a global extent. It integrates knowledge of species distributions, other data to inform listings, and predictors to better understand global patterns of imperilment.
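As a minimal sketch of how per-class, balanced, and overall accuracy relate to a binary confusion matrix, the following Python function computes all four quantities. The counts are hypothetical placeholders chosen for illustration and are not the study's confusion matrix; the function and argument names are likewise assumptions.

```python
def accuracy_summary(tp, fn, tn, fp):
    """Summarize a binary confusion matrix.

    tp / fn: imperiled species classified correctly / incorrectly.
    tn / fp: non-imperiled species classified correctly / incorrectly.
    Returns (imperiled accuracy, non-imperiled accuracy,
             balanced accuracy, overall accuracy).
    """
    imperiled_acc = tp / (tp + fn)
    non_imperiled_acc = tn / (tn + fp)
    balanced = (imperiled_acc + non_imperiled_acc) / 2
    overall = (tp + tn) / (tp + fn + tn + fp)
    return imperiled_acc, non_imperiled_acc, balanced, overall


# Hypothetical test-set counts (illustration only, not study results).
imperiled_acc, non_imperiled_acc, balanced, overall = accuracy_summary(
    tp=80, fn=20, tn=270, fp=30
)
```

When classes are imbalanced, overall accuracy is pulled toward the accuracy of the larger class, whereas balanced accuracy weights both classes equally.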

The strongest and most distinctive predictor of the binary random forest model was hydro-geomorphic diversity, where more habitat types per unit area in the species range showed a positive non-linear dependence with imperilment probability (Fig. 2). Higher environmental heterogeneity (low homogeneity) may proxy compromised connectivity among habitat patches, a described driver of population decline in stream fishes30. Different measures of water availability, such as stream power and permanent water cover, were among the most influential factors, consistent with their influence on habitat and its relationships with other predictors (Fig. 3). Taxonomic order was the second most influential variable, likely due to the fact that fishes within the same order share many traits and thus may respond more similarly to environmental stressors (Fig. 2). Taxonomic influence has been reported in previous analyses of the conservation status of small-bodied freshwater fishes, though only hypothesized to occur throughout IUCN Red List designations31. Future efforts could determine whether these results are due to common stressors or unavoidable subjectivities inherent in conservation assessment frameworks, despite rigorous implementation32.

Whereas intrinsic factors had the least cumulative influence on predictions (predominantly in the form of taxonomy), the level of human knowledge on species traits showed high predictive importance (Fig. 2c Knowledge gaps). Species with either scarce or ample information are more likely to be classified as imperiled, supporting the influence of risk aversion in conservation decisions33. Previous modeling efforts have described the importance of human knowledge in discriminating between data-sufficient and data-deficient species in the IUCN Red List34. In our analysis, the influence of information gaps may be even more relevant as we excluded data-deficient species and examined the role of missing knowledge in real assessment decisions. These results should be interpreted with caution, given that a high proportion of assessed species are missing numerous traits and environmental attributes (e.g., 48% missing ≥30 attributes). Whereas the criteria of the IUCN evaluation framework are not directly represented in these datasets, assessments across species may be inconsistent, in part due to information biases35. Results of the preliminary ordinal forest model support at least some influence of the latter, evidenced by high confusion of sub-classifications within imperiled and non-imperiled groupings (Table S1). Furthermore, species assessments exhibit biases with respect to geography and taxonomy, with economically developed regions and early described species having a greater representation35, which is likely reflected in our findings. To address these alternative explanations, future efforts could focus on attempting to disentangle whether high economic productivity results in higher environmental stress, fewer conservation efforts, or whether economic growth alters perceptions of biodiversity loss.

Next steps

Future models may be further improved by increasing standardization of listing criteria and the inclusion of more species, with a focus on imperiled species and poorly assessed regions. However, a similar analysis on a prior Red List dataset for 2020 (with 5725 instead of the present 10,631 species) revealed comparable patterns, except for the importance of taxonomic order. A comparison of our model findings with species not yet evaluated by IUCN might provide information about taxa and regions that would benefit most from additional effort for classification. Including additional predictor variables, as well as temporally explicit data, higher-resolution environmental models, and trajectories from historical conditions, might improve model performance. Future models could be modified to predict classification histories or future status using historical or scenario datasets, as well as to predict the conservation status of data-deficient36 and yet-to-be-assessed species.

In conclusion, our global analysis of the conservation status of freshwater fish species offers insights into the main factors driving their imperilment or non-imperilment. Our findings underscore the critical role of habitat connectivity, taxonomy, water availability, and low-to-moderate human disturbances in predicting non-imperilment across species. Moreover, the higher model error in identifying imperiled species suggests higher idiosyncrasy in imperilment processes than those observed for secure populations. These findings suggest proactive protection strategies may be more efficient and consistent than reactive conservation approaches, emphasizing the gains that targeted and forward-looking conservation initiatives may create by safeguarding global freshwater fish biodiversity.

Methods

Sources and categorization of imperilment and predictor variables

The dataset developed for this study involved the compilation and processing of 12 global data sources (Table 1). Predictive data sources were selected based on their offering of information relevant to one or more of three broad categories: Environmental, Socioeconomic, and Intrinsic. Each of the initial 122 candidate predictive variables in our dataset was also classified into one of 13 sub-categories (i.e., Habitat, Climate, Hydrology, Economy, Development, Footprint, Threats, Impoundments, Taxonomy, Physiology, Life-history, Knowledge, Conservation), based on their domain. A predictor representing knowledge gaps was calculated using unknown attributes, the number of predictive variables with NA values, for each species. For the present study, we only used one response variable (i.e., the latest IUCN Red List conservation status), provided as binary (imperiled and non-imperiled) and ordinal (five classes) responses for random forest and ordinal forest models, respectively. For a more detailed description of each variable in the conflated dataset, see Table S1.
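The knowledge-gap predictor described above can be sketched as a simple count of missing values per species. The predictor names and the example record below are hypothetical and serve only to illustrate the calculation; treating absent keys as missing is an assumption of this sketch.

```python
def count_unknown_attributes(record, predictors):
    """Count predictors whose value is missing for one species.

    A value counts as missing if the key is absent, the value is None,
    or the value is a float NaN (the Python analogue of NA used here).
    """
    def is_missing(value):
        # NaN is the only float that is not equal to itself.
        return value is None or (isinstance(value, float) and value != value)

    return sum(is_missing(record.get(name)) for name in predictors)


# Hypothetical predictor names and species record (illustration only).
predictors = ["max_length_cm", "trophic_level", "ph_range", "migratory"]
species = {"max_length_cm": 12.5, "trophic_level": None, "ph_range": float("nan")}
unknown = count_unknown_attributes(species, predictors)
```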

Intrinsic and species response data

We accessed species conservation data from the IUCN Red List of Threatened Species26 (2024-v2) using the IUCN Red List API: http://apiv3.iucnredlist.org. Data included the latest conservation assessments, identified threats, and necessary management actions for species recovery. Original threat classes (15) were binned into five larger categories, namely: Human Development, Exploitation of Natural Resources, Dams and Reservoirs, Invasive Species, and Natural Disasters. Management actions from IUCN were also reclassified into broader classes: Habitat Restoration, Species Control, and Social Policy. Additionally, we used species ranges available through IUCN’s Red List spatial database (https://www.iucnredlist.org/resources/spatial-data-download). As a standard requirement, these fish ranges are delineated based on HydroBASINS37 polygons (basins, sub-basins, and large water bodies38). The original dataset of fish ranges was filtered based on uncertainty and origin and combined into one merged polygon representing the present, native range of each species (n = 14,666). This species list was then filtered to match the list of species available through FishBase (n = 14,128). We used the rfishbase package39 to obtain the species, stock, ecology, and swimming information for each freshwater fish species in the IUCN database based on scientific name (May 20, 2025; v04/2025), removing duplicate entries. We then calculated ranges for attributes reported as minimum and maximum values (e.g., pH, dH).

To complement the spatial information provided in species ranges, we accessed occurrence locations available through the Global Biodiversity Information Facility (GBIF)40 via the rgbif package in R (as of May 20, 2025). To ensure the precision and quality of spatial coordinates, we queried records from 1990 to 2024 and cleaned them using all filters available in the package CoordinateCleaner41, dropping duplicates, all records coinciding with country capitals, centroids, or scientific collections, and coordinates falling outside each species range. This resulted in 407,627 clean occurrences for 8943 species, which were then used to calculate elevation values (min., max., and mean) using the package elevatr42 to access AWS (Amazon Web Services) Open Data Terrain Tiles at 100-m pixel size.

Environmental data

Non-human environmental data were retrieved from various public repositories, including datasets hosted by Copernicus, WorldClim, and HydroSHEDS. For the characterization of fish ranges based on climatic data, we used WorldClim v2.1 (ref. 43), a pixel-based bioclimatic model providing 19 variables representing trends in ecologically meaningful climatic conditions, which is often used in species distribution modeling. Datasets for the period 1970–2000 were used as representations of contemporary climatic conditions. For each species, we obtained hydro-geomorphic habitat metrics through the Global River Classification dataset (GloRiC)44, offering an elevation-derived global hydrography dataset populated with various value-added attributes. Fish ranges were used to summarize GloRiC’s reach data, such as counts of unique hydrologic, climatic, and geomorphic reach types as measures of hydro-geomorphic heterogeneity, and mean estimated annual discharge. We used 100-m resolution land cover data from Copernicus Global Land Service (Collection 3, Epoch 2015), derived from PROBA-V satellite observations. We quantified the mean fraction of fish ranges covered by trees, shrubs, grass, crops, bare, snow, and water cover classes.

Socioeconomic data

The spatial interaction between species ranges and anthropic variables was assessed based on six main datasets: WorldPop45, WorldBank46, the Global Dam Watch47, HydroWASTE48, Global Terrestrial Human Footprint49, and the World Database on Protected Areas50. WorldPop develops publicly available datasets of spatial demography, including several indicators of human population dynamics and development. For each species range, we averaged WorldPop’s yearly population density values for years 2000 and 202045. Gross Domestic Product (GDP) and GDP per capita in current US dollars were obtained from DataBank, the public database and analysis tool made available by The World Bank46. These yearly time series data were assigned to spatially explicit country polygons, as offered by the high-resolution ESRI’s Online Service Feature Layer World Countries51. We collected human footprint information for each species based on the Global Terrestrial Human Footprint map made available by Venter et al.49 for the years 1993 and 2009. This dataset provides a single index value compositing various measures of human disturbance over the landscape, including built infrastructure, population density, and agricultural areas, among others. We accessed the Global Dam Watch (GDW) database, a curated source of spatial data on dams and reservoirs47. We merged three different datasets for contemporary global data on small dams and reservoirs: the Global geOreferenced Database of Dams (GOODD)47, Global River Obstruction Database (GROD)52, and the Joint Research Centre Data (JRC)53. GOODD, GROD and JRC contain additional records for smaller dams and reservoirs not captured in other GDW datasets, including all impoundments identifiable on Google Earth imagery, totaling 33,495 data points47,52,53. We also used the Global Reservoir and Dam Database (GRanD)54, a curated product of a global collaborative effort mapping existing dams higher than 15 meters and reservoirs bigger than 0.1 cubic kilometers. Its updated version, published in 2019, contains coordinates and accompanying data for 7,424 dams.

We evaluated interactions between fish ranges and wastewater treatment plants using HydroWASTE data48. HydroWASTE provides the location and characteristics of 58,502 wastewater treatment plants around the globe, ensuring spatial consistency with the HydroSHEDS network used in this study. We quantified the number of treatment plants within each species range, and two additional metrics: the sum of population served and the sum of river discharge treated.

To quantify the prevalence of protected areas within species ranges, we accessed the World Database on Protected Areas50 as the most comprehensive and updated source of global protected areas for conservation. Two conservation-related variables were calculated for each species: protected percent and a count of UNESCO’s Ramsar Sites within each species range.

Characterization of fish ranges

The environmental and socioeconomic characterization of fish ranges involved several steps and was scripted in Python 3.11 (ref. 55) using different packages depending on data formats. For pre-processing of IUCN ranges and most vector-based calculations, we used GIS software (ArcGIS Pro 3.4; ref. 51) functionalities via the arcpy library, while raster-based data sources were processed using the exactextract package56 to help speed up computations. To focus on freshwater ecosystems, we removed marine regions for marine and estuarine-associated fishes. Fish ranges were then used to summarize underlying spatial data by averaging, adding, counting, or getting ranges, depending on variable types (e.g., continuous, discrete, categorical). All raster data, including climatic variables, land cover classes, and human footprint indexes, were summarized based on the overlapping pixels with fish ranges.

The spatial interaction of fish ranges with dams and reservoirs was captured by calculating the total number of dams, impounded drainage areas, average flows (log), power generation capacity, and reservoir surface areas within the range boundaries of each species. Because most of IUCN’s fish ranges consist of large watersheds, calculating the elevation of species’ habitats from them would likely give misleading results, as fishes often use only a few environments within those watersheds. To address this limitation, we used elevation ranges (mean, max., min.) obtained from GBIF occurrence locations, as described above. Lastly, to convert country-level socioeconomic metrics to a range scale, we used a weighted average of the proportional overlap between countries and fish ranges.
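The conversion of country-level metrics to the range scale can be illustrated with a short sketch of an overlap-weighted average. The function name, the country codes, and the choice to skip countries without data are assumptions made for illustration; the overlap areas would come from a GIS intersection of country polygons with each fish range.

```python
def range_weighted_mean(overlap_area_by_country, metric_by_country):
    """Average a country-level metric over a species range.

    Each country's value is weighted by the area of its overlap with the
    range. Countries lacking a metric value are skipped (an assumption of
    this sketch), and None is returned if no overlapping country has data.
    """
    weighted_sum = 0.0
    total_area = 0.0
    for country, area in overlap_area_by_country.items():
        value = metric_by_country.get(country)
        if value is None:
            continue
        weighted_sum += area * value
        total_area += area
    if total_area == 0:
        return None
    return weighted_sum / total_area


# Hypothetical overlap areas (km^2) and metric values (illustration only).
overlap = {"A": 300.0, "B": 100.0}
metric = {"A": 2.0, "B": 6.0}
range_value = range_weighted_mean(overlap, metric)
```

In this example, country A covers three-quarters of the range, so the range-level value (3.0) lies closer to A's value than to B's.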

Machine-learning models

Random and ordinal forest algorithms are commonly used in ecology because of their advantages, particularly in dealing with non-linear relationships and interactions57. Previous modeling studies have demonstrated their superior performance in classification tasks involving large datasets, including global-scale predictions of IUCN Red List status36,58,59.

Our initial approach using ordinal forests was implemented using the cforest algorithm in the party package60; however, high misclassification rates among adjacent conservation classes (e.g., Least Concern and Vulnerable; see Supplementary Information Table S2) suggested a higher predictability of two response groups, which we refer to here as imperiled (Critically Endangered, Endangered, and Vulnerable) and non-imperiled (Near Threatened and Least Concern). Accordingly, we constructed a binary random forest model, calling the randomForest package61 through caret62 for repeated cross-validation and hyperparameter tuning.
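The binary recoding of Red List categories described above can be sketched as follows. The two-letter codes are the standard IUCN abbreviations; the function name and the handling of other categories (returned as None, for example Data Deficient) are assumptions of this illustration.

```python
# Standard IUCN Red List category codes grouped as in the text.
IMPERILED = {"CR", "EN", "VU"}       # Critically Endangered, Endangered, Vulnerable
NON_IMPERILED = {"NT", "LC"}         # Near Threatened, Least Concern


def to_binary_status(category):
    """Map an IUCN category code to the binary response used for modeling.

    Returns "imperiled", "non-imperiled", or None for categories outside
    the two groups (e.g., DD), which this sketch treats as excluded.
    """
    code = category.strip().upper()
    if code in IMPERILED:
        return "imperiled"
    if code in NON_IMPERILED:
        return "non-imperiled"
    return None
```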

Variable selection and standardization

The development of random forests was preceded by a variable selection process based on biological relevance, interpretability, and correlation of predictors and response variables. Variables that were expected to contribute directly to the assignment of imperilment (i.e., range area; perimeter) or that contained comparable information (i.e., CITES code; historical listing status; population trend; vulnerability) were removed. We also removed categorical variables with too many levels (i.e., higher than 53) for random forest (i.e., genus, subfamily and family), and those shared by nearly all species (i.e., class).

To reduce the influence of species range area in our numerical analysis, variables expected to scale with range area (i.e., number of sympatric species, introduced species, dams, area of lakes, blocked discharge, dam catchment area, reservoir surface area, population relying on wastewater treatment plants, and waste discharge) were standardized by dividing by range area, and the correlations of the uncorrected and corrected variables with range area were then compared. The version with the lowest correlation was retained. This reduced all correlations with range area to less than 30%.
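A minimal sketch of this selection step is given below: a variable is divided by range area, and whichever version (raw or per-area) is less correlated with range area is kept. The use of Pearson correlation, the function names, and the example values are assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def choose_version(values, range_area):
    """Return ("raw" | "per_area", values) for the version of a variable
    that has the lower absolute correlation with range area."""
    per_area = [v / a for v, a in zip(values, range_area)]
    r_raw = abs(pearson(values, range_area))
    r_per_area = abs(pearson(per_area, range_area))
    if r_per_area < r_raw:
        return "per_area", per_area
    return "raw", values


# Hypothetical range areas and dam counts for four species.
range_area = [10.0, 20.0, 40.0, 80.0]
dam_count = [5.0, 11.0, 19.0, 42.0]
label, retained = choose_version(dam_count, range_area)
```

Here the raw dam count grows almost proportionally with range area, so the per-area version is retained.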

Finally, other highly correlated variables (|r| > 0.75) were identified and filtered to remove redundancy, retaining the variables least correlated with the rest of the dataset: latest and earliest GDP were removed, and GDP slope was retained; latest and earliest GDP per capita were removed, and GDP per capita slope was retained; east and west limits were 95% correlated, and east limit was retained; count of geomorphic classes and sum of stream network in km were 98% correlated, and count of geomorphic classes was retained; several historical WorldClim variables were highly correlated (85–98%), so mean diurnal temperature range, maximum temperature of the warmest month, mean temperature of the warmest quarter, mean temperature of the coldest quarter, annual temperature range, annual precipitation, and precipitation seasonality were removed; mode of classes of physio-climatic sub-classification was 93% correlated with minimum temperature, and minimum temperature was retained; human footprint indices for 1993 and 2009 were 97% correlated, and human footprint for 2009 was retained; the earliest and latest population densities were 88% correlated, and the most recent population density was retained; wastewater population served over range area and wastewater discharge over range area were 89% correlated, and wastewater discharge over range area was retained.
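One way to express such a redundancy filter is sketched below. It is an assumed greedy procedure, not necessarily the one used in the study: while any pair of variables exceeds the threshold, the member of the most correlated pair with the higher mean absolute correlation to all other retained variables is dropped. Variable names and values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def filter_correlated(columns, threshold=0.75):
    """Greedily drop variables until no pair has |r| above the threshold.

    columns: dict mapping variable name -> list of numeric values.
    Returns the list of retained variable names.
    """
    kept = list(columns)
    while True:
        corr = {(a, b): abs(pearson(columns[a], columns[b]))
                for a in kept for b in kept if a != b}
        pairs = [pair for pair, r in corr.items() if r > threshold]
        if not pairs:
            return kept
        a, b = max(pairs, key=lambda pair: corr[pair])

        def mean_corr(name):
            others = [corr[(name, other)] for other in kept if other != name]
            return sum(others) / len(others)

        # Drop the member more correlated with the rest of the dataset.
        kept.remove(a if mean_corr(a) >= mean_corr(b) else b)


# Hypothetical values for three variables across five species.
columns = {
    "footprint_1993": [1.0, 2.0, 3.0, 4.0, 5.0],
    "footprint_2009": [1.1, 2.0, 3.2, 3.9, 5.1],
    "elevation": [5.0, 1.0, 4.0, 2.0, 3.0],
}
kept = filter_correlated(columns)
```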

Model construction and weighting

Random forest models were configured and tuned using the caret package62 in R. The random forest model was weighted 3:1 (imperiled:non-imperiled) to account for the greater representation of non-imperiled species, and na.roughfix from the randomForest package61 was used to compensate for gaps in data. For numeric variables, this replaces missing data (NAs) with column medians, while for factor variables, it replaces NAs with the most frequent levels observed, breaking ties at random. We constructed the random forest model with 1500 trees, as we expected more trees to improve performance and were not appreciably limited by run time63. Each tree used approximately 63.2% of the data, with the remaining 36.8% used to assess predictive performance based on various metrics. Model selection prioritized macro-averaged mean absolute classification error as a class-unbiased metric for ordinal regression64, and the area under the precision-recall curve (PR-AUC) as an unbiased metric for the performance of binary random forest models65.
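The imputation rule described above (median for numeric variables, most frequent level for factors) can be illustrated with a small Python analogue. This is a sketch of the rule, not the randomForest implementation: missing values are represented as None, and ties between equally frequent levels are broken alphabetically here for reproducibility, whereas na.roughfix breaks them at random.

```python
from collections import Counter
from statistics import median


def rough_fix(column):
    """Fill missing values (None) in one column.

    Numeric columns are filled with the median of observed values;
    other columns are filled with the most frequent observed level
    (alphabetical tie-break, an assumption of this sketch).
    """
    observed = [v for v in column if v is not None]
    if not observed:
        return list(column)  # nothing to impute from
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
    )
    if numeric:
        fill = median(observed)
    else:
        counts = Counter(observed)
        top = max(counts.values())
        fill = sorted(level for level, n in counts.items() if n == top)[0]
    return [fill if v is None else v for v in column]


# Hypothetical columns with gaps (illustration only).
lengths = rough_fix([1.0, None, 3.0, 10.0])
habitats = rough_fix(["benthic", None, "pelagic", "benthic"])
```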

Partial dependence tables and plots were generated for all input predictors using the pdp package66 in R and scaled to assignment probability (e.g., a value of 0.5 corresponded to an even chance of assignment to either category, 0.9 corresponded to a 90% probability of assignment to imperiled). All tuning performance metrics, final model outputs, and partial dependence plots are available in Tables S3–S5.
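The idea behind a partial dependence curve can be sketched in a few lines: for each grid value of one predictor, that predictor is set to the grid value in every record and the model's predicted probabilities are averaged. The toy model, predictor names, and records below are hypothetical stand-ins and do not reproduce the pdp package or the fitted random forest.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average predicted probability over all rows for each grid value.

    predict: function taking a record (dict) and returning a probability.
    rows: list of records; feature: name of the predictor to vary.
    Returns a list of (grid value, mean prediction) pairs.
    """
    curve = []
    for value in grid:
        predictions = [predict({**row, feature: value}) for row in rows]
        curve.append((value, sum(predictions) / len(predictions)))
    return curve


def toy_model(row):
    """Hypothetical stand-in for a fitted classifier's imperilment probability."""
    p = 0.2 + 0.05 * row["habitat_types"] + (0.1 if row["dammed"] else 0.0)
    return min(max(p, 0.0), 1.0)


# Hypothetical records (illustration only).
rows = [
    {"habitat_types": 1, "dammed": False},
    {"habitat_types": 3, "dammed": True},
]
curve = partial_dependence(toy_model, rows, "habitat_types", [0, 2, 4])
```

Each point on the curve is read on the probability scale, so a value of 0.5 would correspond to an even chance of assignment to either category.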

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data used in this study were sourced from publicly available datasets: IUCN Red List v2024.2 (ref. 26), FishBase v4/2024 (refs. 39, 67), Global Biodiversity Information Facility40, WorldClim43, Global River Classification44, Copernicus Global Land Service68, Global Dam Watch47, HydroWASTE48, Global Terrestrial Human Footprint49, WorldPop45, WorldBank46, and the World Database on Protected Areas50. No additional datasets were generated in this study. Data questions can be addressed to J. Andres Olivos (andres.olivos@oregonstate.edu).

Code availability

All processing scripts and output classification models are publicly available at https://github.com/AndresOlivos/freshwater_fish_imperilment_classification and are archived at https://doi.org/10.5281/zenodo.17674411.

References

  1. Betts, J. et al. A framework for evaluating the impact of the IUCN Red List of threatened species. Conserv. Biol. 34, 632–643 (2020).

  2. Possingham, H. P. et al. Limits to the use of threatened species lists. Trends Ecol. Evol. 17, 503–507 (2002).

  3. Hanski, I., Moilanen, A. & Gyllenberg, M. Minimum viable metapopulation size. Am. Nat. 147, 527–541 (1996).

  4. Kuussaari, M. et al. Extinction debt: a challenge for biodiversity conservation. Trends Ecol. Evol. 24, 564–571 (2009).

  5. Donlan, C. J. Proactive Strategies for Protecting Species: Pre-Listing Conservation and the Endangered Species Act. (Univ of California Press, 2015).

  6. Pearson, R. G. et al. Life history and spatial traits predict extinction risk due to climate change. Nat. Clim. Change 4, 217–221 (2014).

  7. Davidson, A. D. et al. Drivers and hotspots of extinction risk in marine mammals. Proc. Natl. Acad. Sci. 109, 3395–3400 (2012).

  8. Ducatez, S. & Shine, R. Drivers of extinction risk in terrestrial vertebrates. Conserv. Lett. 10, 186–194 (2017).

  9. Betts, M. G. et al. Global forest loss disproportionately erodes biodiversity in intact landscapes. Nature 547, 441–444 (2017).

  10. Cinner, J. E. et al. Bright spots among the world’s coral reefs. Nature 535, 416–419 (2016).

  11. Bennett, N. J. et al. Conservation social science: understanding and integrating human dimensions to improve conservation. Biol. Conserv. 205, 93–108 (2017).

  12. Dudgeon, D. Multiple threats imperil freshwater biodiversity in the Anthropocene. Curr. Biol. 29, R960–R967 (2019).

  13. Carpenter, S. R., Stanley, E. H. & Vander Zanden, M. J. State of the world’s freshwater ecosystems: physical, chemical, and biological changes. Annu. Rev. Environ. Resour. 36, 75–99 (2011).

  14. Grill, G. et al. Mapping the world’s free-flowing rivers. Nature 569, 215–221 (2019).

  15. Barbarossa, V. et al. Impacts of current and future large dams on the geographic range connectivity of freshwater fish worldwide. Proc. Natl. Acad. Sci. 117, 3648–3655 (2020).

  16. Belletti, B. et al. More than one million barriers fragment Europe’s rivers. Nature 588, 436–441 (2020).

  17. Gherardi, F. Biological Invaders in Inland Waters: Profiles, Distribution, and Threats. (Springer, 2007).

  18. Rahel, F. J. Biogeographic barriers, connectivity and homogenization of freshwater faunas: it’s a small world after all. Freshw. Biol. 52, 696–710 (2007).

  19. Dominguez Almela, V., South, J. & Britton, J. R. Predicting the competitive interactions and trophic niche consequences of a globally invasive fish with threatened native species. J. Anim. Ecol. 90, 2651–2662 (2021).

  20. Clark, K. H. et al. Freshwater unionid mussels threatened by predation of Round Goby (Neogobius melanostomus). Sci. Rep. 12, 12859 (2022).

  21. Alofs, K. M. & Jackson, D. A. Meta-analysis suggests biotic resistance in freshwater environments is driven by consumption rather than competition. Ecology 95, 3259–3270 (2014).

  22. Poulin, R., Paterson, R. A., Townsend, C. R., Tompkins, D. M. & Kelly, D. W. Biological invasions and the dynamics of endemic diseases in freshwater ecosystems. Freshw. Biol. 56, 676–688 (2011).

  23. Perry, W. L., Lodge, D. M. & Feder, J. L. Importance of hybridization between indigenous and nonindigenous freshwater species: an overlooked threat to North American biodiversity. Syst. Biol. 51, 255–275 (2002).

  24. Barbarossa, V. et al. Threats of global warming to the world’s freshwater fishes. Nat. Commun. 12, 1701 (2021).

  25. Hughes, K. The World’s Forgotten Fishes. (World Wide Fund for Nature (WWF), 2021).

  26. International Union for Conservation of Nature (IUCN). The IUCN Red List of Threatened Species. Version 2024-2. (2024).

  27. Bury, G. W., Flitcroft, R., Nelson, M. D., Arismendi, I. & Brooks, E. B. Forest-associated fishes of the conterminous United States. Water 13, 2528 (2021).

  28. Keinath, D. A. et al. A global analysis of traits predicting species sensitivity to habitat fragmentation. Glob. Ecol. Biogeogr. 26, 115–127 (2017).

  29. United States. The Endangered Species Act, as amended by Public Law 97–304, 16 U.S.C. §§ 1531–1544.

  30. Fuller, M. R., Doyle, M. W. & Strayer, D. L. Causes and consequences of habitat fragmentation in river networks. Ann. N. Y. Acad. Sci. 1355, 31–51 (2015).

  31. Kopf, R. K., Shaw, C. & Humphries, P. Trait-based prediction of extinction risk of small-bodied freshwater fishes. Conserv. Biol. 31, 581–591 (2017).

  32. Biedenweg, K., Trimbach, D., Delie, J. & Schwarz, B. Using cognitive mapping to understand conservation planning. Conserv. Biol. 34, 1364–1372 (2020).

  33. Tulloch, A. I. T. et al. Effect of risk aversion on prioritizing conservation projects. Conserv. Biol. 29, 513–524 (2015).

  34. Cazalis, V. et al. Prioritizing the reassessment of data-deficient species on the IUCN Red List. Conserv. Biol. 37, e14139 (2023).

  35. Hughes, A. C. et al. Sampling biases shape our view of the natural world. Ecography 44, 1259–1269 (2021).

  36. Wieringa, J. G. Comparing predictions of IUCN Red List categories from machine learning and other methods for bats. J. Mammal. 103, 528–539 (2022).

  37. Lehner, B. & Grill, G. Global river hydrography and network routing: baseline data and new approaches to study the world’s large river systems. Hydrol. Process. 27, 2171–2186 (2013).

  38. IUCN SSC Red List Technical Working Group. Mapping Standards and Data Quality for the IUCN Red List Spatial Data. Version 1.19 (2021).

  39. Froese, R. & Pauly, D. FishBase. World Wide Web electronic publication, version (04/2025). (2025).

  40. Derived dataset GBIF.org (28 December 2025) Filtered export of GBIF occurrence data https://doi.org/10.15468/dd.pu934q.

  41. Zizka, A. et al. CoordinateCleaner: standardized cleaning of occurrence records from biological collection databases. Methods Ecol. Evol. 10, 744–751 (2019).

  42. Hollister, J. elevatr: access elevation data from various APIs. CRAN Contrib. Packag. (2017).

  43. Fick, S. E. & Hijmans, R. J. WorldClim 2: new 1-km spatial resolution climate surfaces for global land areas. Int. J. Climatol. 37, 4302–4315 (2017).

  44. Ouellet Dallaire, C., Lehner, B., Sayre, R. & Thieme, M. A multidisciplinary framework to derive global river reach classifications at high spatial resolution. Environ. Res. Lett. 14, 024003 (2019).

  45. Lloyd, C. T. et al. Global spatio-temporally harmonised datasets for producing high-resolution gridded population distribution datasets. Big Earth Data 3, 108–139 (2019).

  46. The World Bank. World Development Indicators. World Bank Open Data Catalog. (Washington, D.C., United States. 2020).

  47. Lehner, B. et al. The Global Dam Watch database of river barrier and reservoir information for large-scale applications. Sci. Data 11, 1069 (2024).

  48. Ehalt Macedo, H. et al. Distribution and characteristics of wastewater treatment plants within the global river network. Earth Syst. Sci. Data 14, 559–577 (2022).

  49. Venter, O. et al. Global terrestrial Human Footprint maps for 1993 and 2009. Sci. Data 3, 1–10 (2016).

  50. UNEP-WCMC & IUCN. Protected planet: the world database on protected areas. UNEP-WCMC Camb. UK (2025).

  51. Esri Inc. ArcGIS Pro (Version 3.4). (2024).

  52. Yang, X. et al. Mapping flow-obstructing structures on global rivers. Water Resour. Res. 58, e2021WR030386 (2022).

  53. De Felice, M. Hydropower information for power system modelling: the JRC-EFAS-Hydropower dataset. (2020).

  54. Lehner, B. et al. High-resolution mapping of the world’s reservoirs and dams for sustainable river-flow management. Front. Ecol. Environ. 9, 494–502 (2011).

  55. Van Rossum, G. Python programming language. in USENIX annual technical conference. 41, 1–36 (Santa Clara, CA, 2007).

  56. Baston, D. exactextractr: Fast Extraction from Raster Datasets using Polygons. R package version 0.10.0, https://github.com/isciences/exactextractr, https://isciences.gitlab.io/exactextractr/ (2024).

  57. Cano-Barbacil, C. et al. Key factors explaining critical swimming speed in freshwater fish: a review and statistical analysis for Iberian species. Sci. Rep. 10, 18947 (2020).

  58. Soares, N., Gonçalves, J. F., Vasconcelos, R. & Ribeiro, R. P. Combining Multiple Data Sources to Predict IUCN Conservation Status of Reptiles. in Advances in Intelligent Data Analysis XX (eds Bouadi, T., Fromont, E. & Hüllermeier, E.) 302–314 https://doi.org/10.1007/978-3-031-01333-1_24 (Springer International Publishing, Cham, 2022).

  59. Henry, E. G. et al. Modelling the probability of meeting IUCN Red List criteria to support reassessments. Glob. Change Biol. 30, e17119 (2024).

  60. Hothorn, T., Hornik, K., Strobl, C., Zeileis, A. & Hothorn, M. T. Package ‘party’. Package Ref. Man. Party Version 09 16, 37 (2015).


  61. Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002).


  62. Kuhn, M. et al. Package ‘caret’. R J 223, 48 (2020).


  63. Probst, P. & Boulesteix, A.-L. To tune or not to tune the number of trees in a random forest. J. Mach. Learn. Res. 18, 1–18 (2018).

  64. Baccianella, S., Esuli, A. & Sebastiani, F. Evaluation measures for ordinal regression. In 2009 Ninth International Conference on Intelligent Systems Design and Applications. 283–287 https://doi.org/10.1109/ISDA.2009.230 (2009).

  65. Sofaer, H. R., Hoeting, J. A. & Jarnevich, C. S. The area under the precision-recall curve as a performance metric for rare binary events. Methods Ecol. Evol. 10, 565–577 (2019).

  66. Greenwell, B. M. pdp: an R package for constructing partial dependence plots. R Journal 9, 421 (2017).


  67. Boettiger, C., Chamberlain, S., Lang, D. T., Wainwright, P. & Boettiger, M. C. Package ‘rfishbase’. (2023).

  68. Buchhorn, M. et al. Copernicus global land service: land cover 100 m: collection 3: epoch 2019: Globe. https://doi.org/10.5281/zenodo.3939050 (2020).

Acknowledgements

IA was supported, in part, by the Oregon Agricultural Experiment Station with funding from the Hatch Act capacity funding program, award numbers NI25HFPXXXXXG022 and/or NI25HMFPXXXXG029, from the USDA National Institute of Food and Agriculture. In-kind support was provided by the US Geological Survey, Maine Cooperative Fish and Wildlife Research Unit. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Author information

Contributions

C.A.M.: conceptualization, methodology, formal analysis, data curation, writing – original draft. J.A.O.: conceptualization, methodology, formal analysis, data curation, writing – original draft, visualization. I.A.: conceptualization, methodology, writing – review & editing, visualization. E.G.B.: conceptualization, methodology, writing – review & editing. S.L.J.: conceptualization, methodology, writing – review & editing. J.D.: conceptualization, methodology, writing – review & editing.

Corresponding author

Correspondence to Christina A. Murphy.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Rosalia Maglietta and Guohuan Su for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Murphy, C.A., Olivos, J.A., Arismendi, I. et al. Environment, taxonomy, and socioeconomics predict non-imperilment in freshwater fishes. Nat Commun 17, 1661 (2026). https://doi.org/10.1038/s41467-025-68154-w
