Data collection from open-access databases
A total of 2 920 AIV hemagglutinin (HA) gene sequences deposited up to 04 February 2021 were retrieved using the keywords “environment”, “water”, “sewage”, “wastewater”, “surface_water”, and “drinking_water” from four open-access databases: the NCBI Influenza Virus Database (https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi), the GISAID EpiFlu™ Influenza database (https://www.gisaid.org/), the Influenza Research Database (IRD, https://www.fludb.org/), and the OpenFlu database (http://openflu.vital-it.ch/). Sequences from non-water samples detected in animal environments, as well as records with incomplete subtype data, were removed. The resulting dataset was then de-replicated by Accession Number to remove identical records (Table S1). The remaining 234 HA gene sequences constituted the final dataset used for data analysis.
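The curation steps described above (removal of non-water records, removal of records with incomplete subtype data, and de-replication by Accession Number) can be illustrated with a minimal Python sketch. This is not the procedure actually used in this study; the record fields (`accession`, `subtype`, `is_water_sample`), the `curate` function, the rule that a complete subtype has the form HxNy, and the case-insensitive accession matching are all assumptions made for illustration, and the accession values are placeholders.

```python
import re

# Assumption: a "complete" subtype names both HA and NA types, e.g. "H5N1".
COMPLETE_SUBTYPE = re.compile(r"^H\d{1,2}N\d{1,2}$")


def curate(records):
    """Filter and de-replicate sequence records (hypothetical schema).

    Each record is a dict with the illustrative keys
    'accession', 'subtype', and 'is_water_sample'.
    """
    seen = set()
    kept = []
    for rec in records:
        # Step 1: drop records not derived from water samples.
        if not rec["is_water_sample"]:
            continue
        # Step 2: drop records with incomplete subtype data.
        if not COMPLETE_SUBTYPE.match(rec["subtype"].strip()):
            continue
        # Step 3: de-replicate on a normalised accession number
        # (whitespace-stripped, upper-cased; an assumption of this sketch).
        key = rec["accession"].strip().upper()
        if key in seen:
            continue
        seen.add(key)
        kept.append(rec)
    return kept


# Placeholder records, not real database entries.
example_records = [
    {"accession": "ACC0001", "subtype": "H5N1", "is_water_sample": True},
    {"accession": "acc0001 ", "subtype": "H5N1", "is_water_sample": True},   # duplicate
    {"accession": "ACC0002", "subtype": "H3", "is_water_sample": True},      # incomplete subtype
    {"accession": "ACC0003", "subtype": "H9N2", "is_water_sample": False},   # non-water sample
    {"accession": "ACC0004", "subtype": "H7N9", "is_water_sample": True},
]

curated = curate(example_records)
```

In this sketch the first occurrence of each accession is retained, so the order in which the database exports are concatenated determines which copy of a duplicated record is kept.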