Data collection from open-access databases
A total of 2 920 AIV hemagglutinin
(HA) gene sequences were retrieved up to 04 February 2021 using the
keywords: “environment”, “water”, “sewage”, “wastewater”,
“surface_water”, and “drinking_water” from the NCBI Influenza
Virus Database
(https://www.ncbi.nlm.nih.gov/genomes/FLU/Database/nph-select.cgi),
the GISAID EpiFlu™ Influenza database
(https://www.gisaid.org/), the
Influenza Research Database (IRD, https://www.fludb.org/), and the
OpenFlu database
(http://openflu.vital-it.ch/).
All sequences from non-water samples collected in animal environments,
as well as sequences with incomplete subtype data, were removed. The
resulting dataset was de-duplicated by accession number to remove
identical records (Table S1). The remaining 234 HA gene sequences
formed the final dataset used for data analysis.
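
The filtering and de-duplication steps described above can be illustrated with a minimal Python sketch. It is not the procedure actually used to build the dataset. The record fields (`accession`, `sample_type`, `subtype`), the set of water-related sample types, and the rule that a complete subtype has the form HxNy are all assumptions made for illustration, and the toy records are invented.

```python
import re

# Assumption: sample types treated as water samples (taken from the search keywords,
# excluding the generic "environment" term).
WATER_TERMS = {"water", "sewage", "wastewater", "surface_water", "drinking_water"}

# Assumption: a "complete" subtype names both HA and NA, e.g. "H5N1".
FULL_SUBTYPE = re.compile(r"^H\d{1,2}N\d{1,2}$")


def filter_and_deduplicate(records):
    """Keep water-sample records with a complete subtype, one per accession number."""
    seen = set()
    kept = []
    for rec in records:
        if rec["sample_type"].lower() not in WATER_TERMS:
            continue  # non-water sample
        if not FULL_SUBTYPE.match(rec["subtype"]):
            continue  # incomplete subtype data
        if rec["accession"] in seen:
            continue  # duplicate accession number
        seen.add(rec["accession"])
        kept.append(rec)
    return kept


# Invented toy records (hypothetical accession numbers).
toy_records = [
    {"accession": "ACC001", "sample_type": "surface_water", "subtype": "H5N1"},
    {"accession": "ACC001", "sample_type": "surface_water", "subtype": "H5N1"},
    {"accession": "ACC002", "sample_type": "feces", "subtype": "H7N9"},
    {"accession": "ACC003", "sample_type": "water", "subtype": "H5"},
    {"accession": "ACC004", "sample_type": "Sewage", "subtype": "H9N2"},
]

final = filter_and_deduplicate(toy_records)
print([rec["accession"] for rec in final])  # ['ACC001', 'ACC004']
```

In this sketch the first occurrence of each accession number is retained; any other tie-breaking rule would serve equally well, since records sharing an accession number are assumed to be identical.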