Fugang Ren1, Jing Huang2 & Yongqing Yang3*
1College of Medical Technology, Chengdu University of Traditional Chinese Medicine, Chengdu 611137, China
2Department of Vocal Performance, Sichuan Conservatory of Music, Chengdu 610021, China
3Department of Vocal Performance, Sichuan Conservatory of Music, Chengdu 610021, China
*Correspondence to: Dr. Yongqing Yang, College of Life Science, Chongqing Normal University, Chongqing 401331, China.
Copyright © 2022 Dr. Yongqing Yang, et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The identification of species by short orthologous DNA sequences is known as DNA barcoding. In this study, we identified and analyzed the top-cited articles in the field of DNA barcodes based on the Science Citation Index Expanded. The parameters examined include journals, publication years, Web of Science categories, authors, institutions, countries, and citation life cycle curves. The Y-index was applied to evaluate the publication properties of the authors of top-cited articles. Three citation indicators were used to analyze the top-cited articles: citations in 2018, total citations, and citations in the publication year. The top three journals were Molecular Ecology Resources, Proceedings of the National Academy of Sciences of the United States of America, and Plos One. The three most active authors in DNA barcode research were P.D.N. Hebert, M. Hajibabaei, and M.A. Smith. The University of Guelph in Canada was the most productive institution in DNA barcode research. A small group of core countries dominated the top-cited publications in this research field. The keywords “barcodes”, “identification”, “taxonomy”, “evolution”, “diversity”, and “mitochondrial” appeared among the top keywords in all three types of keyword analysis: words in titles, author keywords, and KeyWords Plus.
A DNA barcode is defined as one or a few relatively short DNA sequences taken from a standardized portion of the genome. The term DNA barcode was first applied in 1993 by Arnot et al., who employed the circumsporozoite gene as a barcode to distinguish isolates of Plasmodium falciparum. In 2003, Hebert et al. first proposed DNA barcoding as a way to identify species, and they established the mitochondrial gene cytochrome c oxidase I (COI) as the core of a global bio-identification system for animals. Since then, the DNA barcoding approach has been successfully used to identify algae [3,4], fungi [5,6], land plants [7-9], and many animal groups, such as nematodes, insects, spiders, fish [13,14], birds, and mammals. DNA barcoding is now widely regarded as a leading tool for species taxonomy and identification. In particular, owing to its universality and versatility, it is becoming an increasingly powerful tool for other branches of biology such as conservation biology, ecology, medicine, pharmaceuticals, food science, and systems biology [17,18]. Research on DNA barcodes has made significant progress, and in recent decades many research articles have been published by authors from all over the world in a large number of journals. More than three hundred reviews related to DNA barcodes have been published to provide readers with different kinds of information. However, the vast majority of these studies focus on just one subject, for instance the application of DNA barcodes to species identification [19-25], conservation biology [26,27], food science, ecology, or pharmacy [25,29], and few papers consider the whole field using bibliometric methods. Bibliometric studies are an important approach to comprehensively evaluating the development of a research area and have been widely conducted in many natural and social sciences [30-35].
In this study, we focus only on the bibliometric analysis of DNA barcodes based on highly cited articles. Highly cited articles generally represent the most significant developments in a specific field, and thus provide substantial and valuable insight into which authors, articles, and topics drive the research field over time. Previous bibliometric studies have revealed the characteristics of highly cited articles on specific topics, such as ecosystem services, biomass, and electrocardiogram. However, to our knowledge, no study has attempted to analyze the highly cited articles in the field of DNA barcode research.
Therefore, the objective of this paper is to make a comprehensive study of highly cited papers in the DNA barcode field in the Web of Science (WoS) database with respect to the most influential scientists, institutions, and countries. Based on these results, we reveal the profile of scientific advancement and give a historic perspective on scientific progress, thus helping researchers better understand the high-impact research in the DNA barcode field.
Initially, the search returned a total of 9,711 documents comprising 13 document types: article, review, meeting abstract, editorial material, proceedings paper, letter, correction, early access, news item, book chapter, data paper, reprint, and retracted publication. Only the document type “article” was accepted for this study.
Two additional filters were employed to select the top-cited articles: TC2019 and the “front page”. TC2019 is the total number of times an article was cited from its publication until September 24, 2019. Articles with TC2019 ≥ 100 were regarded as top-cited articles.
The other filter, the “front page”, was used to retain only articles with the indicated keywords (DNA barcod*) on their “front page”, that is, in the article title, abstract, or author keywords. Articles found only by KeyWords Plus were removed. Finally, 217 articles (2.48% of the 8,759 total articles) were selected as top-cited articles. These records were downloaded into spreadsheet software, and additional coding was applied manually in Microsoft Excel 2016.
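The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Record` fields, the regular expression standing in for the wildcard query DNA barcod*, and the sample records are all hypothetical, and the study itself applied the filters to Web of Science exports in Microsoft Excel 2016.

```python
import re
from dataclasses import dataclass, field

# Hypothetical stand-in for the wildcard query "DNA barcod*"
# (matches "DNA barcode", "DNA barcodes", "DNA barcoding", ...).
FRONT_PAGE_PATTERN = re.compile(r"\bDNA\s+barcod\w*", re.IGNORECASE)


@dataclass
class Record:
    """One bibliographic record; field names are illustrative, not WoS tags."""
    title: str
    abstract: str
    doc_type: str
    tc2019: int  # total citations from publication to the search date
    author_keywords: list = field(default_factory=list)
    keywords_plus: list = field(default_factory=list)


def on_front_page(rec: Record) -> bool:
    """True if the query term occurs in title, abstract, or author keywords.

    KeyWords Plus is deliberately ignored, so records matched only there fail.
    """
    front = " ".join([rec.title, rec.abstract, *rec.author_keywords])
    return bool(FRONT_PAGE_PATTERN.search(front))


def select_top_cited(records, threshold=100):
    """Keep articles with TC2019 >= threshold that pass the front-page filter."""
    return [
        r for r in records
        if r.doc_type == "Article" and r.tc2019 >= threshold and on_front_page(r)
    ]


# Made-up sample records for demonstration only.
sample = [
    Record("DNA barcoding of fishes", "", "Article", 250),
    Record("A survey of fishes", "", "Article", 300, keywords_plus=["DNA barcodes"]),
    Record("DNA barcodes in practice", "", "Review", 500),
    Record("DNA barcoding of moths", "", "Article", 40),
]
top = select_top_cited(sample)
```

Only the first sample record survives: the second matches through KeyWords Plus alone, the third is a review, and the fourth falls below the citation threshold.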
The country information downloaded from WoS needed to be organized before analysis. Articles originating from England, Scotland, Northern Ireland, and Wales were reclassified as being from the United Kingdom (UK).
VOSviewer software (version 1.6.11, www.vosviewer.com/) was employed to map the co-citation of authors and journals, as well as the co-occurrence of author keywords. Co-citation means that two authors, journals, or references are cited together in a citing document. In these maps, each node represents one entity (author, journal, or keyword). A larger node implies that the entity is more important within the knowledge domain. Nodes sharing the same color belong to one cluster. The links between nodes represent co-citation or co-occurrence relationships; a shorter distance and/or a thicker link between nodes reveals a stronger relation, which is also represented numerically as relative link strength. The total link strength attribute indicates the total strength of the co-citation or co-occurrence links of a given item with other items.
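As a minimal sketch of this cleaning step, assuming each article's countries are available as a list of plain strings (the function names are hypothetical), the four UK nations can be merged as follows:

```python
# The four constituent nations that are merged into one country label.
UK_NATIONS = {"England", "Scotland", "Northern Ireland", "Wales"}


def normalize_country(name: str) -> str:
    """Map England, Scotland, Northern Ireland, and Wales to 'UK'."""
    name = name.strip()
    return "UK" if name in UK_NATIONS else name


def article_countries(raw_countries):
    """Return the distinct countries of one article, in first-seen order."""
    return list(dict.fromkeys(normalize_country(c) for c in raw_countries))


countries = article_countries(["England", "Canada", "Scotland"])
```

Deduplicating after normalization matters here: an article with authors from England and Scotland would otherwise look like an internationally collaborative article.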
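To make the co-citation and total link strength concepts concrete, the following sketch counts co-cited pairs from reference lists. It is not VOSviewer's implementation: it uses simple full counting, whereas the non-integer link strengths reported in this study suggest that a fractional or normalized counting scheme was used. The item labels are placeholders.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of items is cited together.

    reference_lists holds one list of cited items (authors, journals, or
    references) per citing document.
    """
    pairs = Counter()
    for refs in reference_lists:
        # Each unordered pair is counted once per citing document.
        for a, b in combinations(sorted(set(refs)), 2):
            pairs[(a, b)] += 1
    return pairs


def total_link_strength(pairs):
    """Sum the strengths of all links attached to each item."""
    strength = Counter()
    for (a, b), weight in pairs.items():
        strength[a] += weight
        strength[b] += weight
    return strength


# Placeholder data: three citing documents and their cited journals.
citing_docs = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
pairs = cocitation_counts(citing_docs)
strength = total_link_strength(pairs)
```

In this toy example, item B is co-cited with A twice and with C twice, so it has the highest total link strength.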
Results and Discussion
A total of 217 articles published between 2000 and 2016 were selected as top-cited articles (articles that received at least 100 citations from their time of publication to September 2019). Figure 1 shows the number of top-cited articles and citations per publication by year; most of the top-cited articles were published between 2005 and 2012, with the largest number published in 2009. There were no top-cited articles in the years 2001, 2002, 2017, 2018, and 2019.
In all, 1557 authorships contributed to the 217 top-cited articles, and the average number of authors per top-cited article was 7.2. Small author groups appeared in some categories, for example averages of 1 author in Environmental Sciences and 3 in Biodiversity Conservation, whereas large author groups were found in Mycology, with an average of 27.8. In total, 11,481 references were cited by the 217 top-cited articles. Top-cited articles used an average of 13.2 pages to publish their research findings. Furthermore, there was wide variation among articles in pages (from 2 to 379) and in references (from 5 to 547).
These top-cited articles were published in 73 journals. Only one top-cited article was published by each of 32 (63%) journals, two articles were published by each of 11 (15%) journals, and 120 (55.3%) top-cited articles were published by nine core journals (Table 1). The impact factor (IF2019) of the top nine journals ranged from 2.776 to 10.266.
TP: total number of articles; IF2019: impact factor in 2019
The journal Molecular Ecology Resources (IF2019 = 7.049) published the most top-cited articles, with 28 articles (12.9% of 217 articles), followed by Proceedings of the National Academy of Sciences of the United States of America (IF2019 = 9.580) with 21 articles (9.7%), Plos One (IF2019 = 2.776) with 19 articles (8.8%), Philosophical Transactions of the Royal Society B-Biological Sciences (IF2019 = 6.139) with 12 articles (5.5%), Molecular Ecology Notes (IF2019 not available) with 9 articles (4.1%), Systematic Biology (IF2019 = 10.266) with 9 articles (4.1%), Molecular Ecology (IF2019 = 5.855) with 8 articles (3.7%), and Proceedings of the Royal Society B-Biological Sciences (IF2019 = 4.304) and New Phytologist (IF2019 = 7.299) with 7 articles (3.2%) each.
However, top-cited articles were also published in journals with low impact factors, for example Zootaxa (IF2019 = 0.99), Journal of Crustacean Biology (IF2019 = 1.069), and Journal of AOAC International (IF2019 = 1.201). Three journals with very high impact factors also published top-cited articles: Science (IF2019 = 41.037), Cell (IF2019 = 36.216), and Nature Biotechnology (IF2013 = 31.864), with one article each.
The top-cited articles were published in 30 Web of Science subject categories in the science edition. Among these, one-fifth of the categories each contained one article, half of the categories contained between 2 and 6 top-cited articles, and the top 9 (30%) categories contained between 7 and 84. The top five categories were Evolutionary Biology (84 articles; 38.7% of 217 articles), Ecology (64; 29.5%), Biochemistry & Molecular Biology (62; 28.6%), Multidisciplinary Sciences (42; 19.4%), and Biology (34; 15.7%).
These five categories accounted for a majority of the top-cited articles (159 articles; 73.2% of 217 articles). Note that some journals are assigned to two or more categories in WoS; for instance, Molecular Ecology is listed in all three of “Biochemistry & Molecular Biology”, “Ecology”, and “Evolutionary Biology”. This illustrates the multidisciplinary character of the research field.
VOSviewer uses clustering algorithms based on the strengths of the connections among items to analyze the network. In these maps, the total link strength value denotes the importance of an item in the field, since a higher value means that it has been linked with others many times. The journal co-citation map shows that the journals are divided into four clusters (Figure 2). The blue cluster contains Proceedings of the National Academy of Sciences of the United States of America, Plos One, American Journal of Botany, Bioinformatics, BMC Evolutionary Biology, Molecular Ecology Resources, and Taxon, representing journals in evolution and taxonomy. The green cluster contains Applied and Environmental Microbiology, Canadian Journal of Botany, Journal of Phycology, Mycological Research, Mycologia, Studies in Mycology, New Phytologist, and Nucleic Acids Research, representing journals on algae and fungi. The yellow cluster contains Nature and Science, representing high-impact-factor journals in Multidisciplinary Sciences. The red cluster contains half of the total journals, including Annual Review of Ecology Evolution and Systematics, Biological Journal of the Linnean Society, Canadian Journal of Zoology, Cladistics, Conservation Biology, Evolution, Journal of Molecular Evolution, Molecular Biology and Evolution, Molecular Ecology, Molecular Ecology Notes, Molecular Phylogenetics and Evolution, Proceedings of the Royal Society B-Biological Sciences, Philosophical Transactions of the Royal Society B-Biological Sciences, Philosophical Transactions of the Royal Society of London Series B-Biological Sciences, PLoS Biology, Systematic Biology, and Trends in Ecology & Evolution, representing journals in evolution and ecology.
Among these journals, Proceedings of the National Academy of Sciences of the United States of America had the highest total link strength (434.15) and the most citations (484), indicating that this journal was co-cited with most other journals, followed by Systematic Biology (total link strength of 340.16) and Molecular Ecology (total link strength of 330.48).
Among the 1105 authors responsible for the 217 top-cited articles, 176 were first authors, so 929 authors (84.1% of 1105 authors) had no first-author articles; 164 were corresponding authors, so 941 authors (85.2%) had no corresponding-author articles; and only 130 authors (11.8%) had both first-author and corresponding-author articles. Table 2 shows the total top-cited articles (TP), first-author top-cited articles (FP), corresponding-author top-cited articles (RP), and single-author top-cited articles (SP) of the leading authors with at least six top-cited articles.
TP: total top-cited articles; FP: first-author top-cited articles; RP: corresponding-author top-cited articles; SP: single-author top-cited articles; N/A: not available
The top three authors of top-cited articles are P.D.N. Hebert from the University of Guelph in Canada with 43 articles, M. Hajibabaei from the University of Guelph in Canada with 15 articles, and D.H. Janzen from the University of Pennsylvania in the USA with 11 articles. G.W. Saunders and R.D. Ward published two and one single-author articles, respectively.
Distinguishing the first author and the corresponding author provides a powerful way to take multiple authorship into account. The first author conducts the research work and writes the research paper, and is therefore the author who contributes most to the paper [43,44]. The corresponding author supervises the planning and execution of the research work and the writing of the research paper [45,46]. Corresponding authorship also adds to an author’s credit for the study. The country or institution of the corresponding author is usually regarded as the origin of the study.
The recently developed Y-index has been applied in many studies to assess the performance of highly cited authors, for example for the highly cited articles in the research field of volatile organic compounds. Here, the Y-index (j, h) was employed to assess the authors of the top-cited articles. This index is based on the numbers of first-author publications (FP) and corresponding-author publications (RP), and the formulae are as follows [50,51]:
$$ j = FP + RP $$
$$ h = \tan^{-1}\left(\frac{RP}{FP}\right) $$
A higher j means that an author has more first-author or corresponding-author articles, and therefore a larger leadership contribution. The constant h describes the publication property, that is, the relative proportions of FP and RP.
Values of h > 0.7854 mean more corresponding-author articles, and values of h < 0.7854 mean more first-author articles. When h = 0.7854 (that is, π/4), the author has the same number of first-author and corresponding-author articles. If h = 0, j is the number of first-author articles, and if h = π/2, j is the number of corresponding-author articles.
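The behavior of the Y-index can be checked with a short calculation. The sketch below assumes the usual form of the index, j = FP + RP and h = arctan(RP/FP), which matches the boundary cases stated above (h = 0.7854 when FP = RP, h = 0 when RP = 0, and h = π/2 when FP = 0). The function name is hypothetical, and the (FP, RP) pairs in the examples are inferred from the j and h values reported for individual authors, not taken from a table in this study.

```python
import math


def y_index(fp: int, rp: int):
    """Return (j, h) for an author with fp first-author and rp
    corresponding-author articles.

    j = FP + RP measures publication intensity.
    h = arctan(RP / FP) measures the balance between the two roles.
    """
    j = fp + rp
    if j == 0:
        raise ValueError("Y-index is undefined when FP = RP = 0")
    # atan2(rp, fp) equals atan(rp / fp) for fp > 0 and gives pi/2 for fp == 0.
    h = math.atan2(rp, fp)
    return j, h


# Inferred example: FP = 5 and RP = 8 give j = 13 and h close to 1.0122.
j, h = y_index(5, 8)
print(j, round(h, 4))

# Equal numbers of first-author and corresponding-author articles: h = pi/4.
print(round(y_index(5, 5)[1], 4))
```

Using `atan2` instead of dividing RP by FP avoids a division by zero for authors who have corresponding-author articles but no first-author articles.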
Figure 3 indicates the distribution of the top 20 authors (j ≥ 4) with a Y-index from the top-cited 217 articles.
These 20 authors were deemed to be the main contributors to the top-cited articles in DNA barcode research. Their contribution included conception and design, analysis and interpretation of data and the drafting or reviewing of the article [52,53].
In Figure 3, each dot stands for one value, which may represent a single author or several authors with the same publication intensity and properties. The three most productive authors in DNA barcode research were P.D.N. Hebert (j=13), M. Hajibabaei (j=11), and M.A. Smith (j=10). The publication property h indicates the proportion of first-author articles to corresponding-author articles. It is especially useful for distinguishing the contributions of authors whose j values are similar. Among these 20 authors, six had more corresponding-author articles than first-author articles: P.D.N. Hebert (h=1.0122), M. Hajibabaei (h=0.8761), R.D. Ward (h=0.9273), P.M. Hollingsworth (h=0.9828), D.H. Janzen (h=0.9828), and S. Chen (h=1.2490). No author had more first-author articles than corresponding-author articles. Fourteen authors lie on the boundary line (h=0.7854) because they had the same numbers of first-author and corresponding-author articles. Although these fourteen authors share the same h value, the higher values of j indicate that M.A. Smith (j=10) and S.G. Newmaster (j=8) had higher publication intensity.
In addition, Figure 4 shows that P.D.N. Hebert is the most co-cited author, with the highest total link strength (333.52). The second and third most co-cited authors were M. Hajibabaei and R.D. Ward, with total link strengths of 115.47 and 88.43, respectively. These authors can be considered the core authors of this network because their work was co-cited with that of nearly all of the other top 31 authors.
Of the 217 top-cited articles, only one article had no corresponding author affiliation information and three articles had no affiliation information in the Web of Science. Of the 214 top-cited articles with author affiliations, 31 (14.5% of 214 articles) were single-institution articles and 183 (85.5%) were inter-institutionally collaborative articles.
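The split into single-institution and collaborative articles amounts to counting the distinct affiliations of each article. The following is a rough sketch, assuming affiliations are available as already-cleaned strings (real WoS addresses need more careful harmonization); the helper names are hypothetical, and the same logic applies to the single-country versus internationally collaborative split.

```python
def classify_collaboration(affiliations):
    """Label an article by the number of distinct affiliations it lists."""
    unique = {a.strip().lower() for a in affiliations if a.strip()}
    if not unique:
        return "unknown"  # no affiliation information available
    return "single" if len(unique) == 1 else "collaborative"


def share(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts reported in the text: 31 single-institution and 183 collaborative
# articles among the 214 articles with affiliation data.
print(share(31, 214), share(183, 214))
```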
A small proportion of institutions published a high proportion of the top-cited articles, which is similar to the research field of volatile organic compounds. Table 3 shows the properties of the top 11 most productive institutions by six indicators: total number of top-cited articles (TP), single-institution top-cited articles (IP), inter-institutionally collaborative top-cited articles (CP), first-author top-cited articles (FP), corresponding-author top-cited articles (RP), and single-author top-cited articles (SP).
TP: total top-cited articles; IP: single-institution top-cited articles; CP: inter-institutionally collaborative top-cited articles; FP: first-author top-cited articles; RP: corresponding-author top-cited articles; SP: single-author top-cited articles; R: rank; N/A: not available
The University of Guelph in Canada (63 articles) ranked first on the list, and the Smithsonian Institution in the USA (14 articles) and the Natural History Museum in the UK (13 articles) ranked second and third. Of the top 11 most productive institutions, four were located in the USA, followed by Canada (3 institutions), the UK (3 institutions), and France (1 institution). The University of Guelph ranked first in all indicators except single-author top-cited articles. The University of New Brunswick in Canada ranked 1st in SP, 2nd in IP, FP, and RP, and 15th in CP, which implies that scientists at this institution tended to publish top-cited articles without collaboration. The Smithsonian Institution and the Natural History Museum ranked 2nd and 3rd in CP but had no single-institution articles (IP not available), which suggests that scientists from these institutions published their top-cited articles through inter-institutional collaboration.
The 214 top-cited articles containing author affiliations were published by 55 countries or regions; 102 (48% of 214 articles) were single-country articles and 112 (52%) were internationally collaborative articles. The countries were ranked by the number of top-cited articles published, as shown in Table 4. The ranking covers the total number of articles (TP), single-country articles (IP), internationally collaborative articles (CP), first-author articles (FP), corresponding-author articles (RP), and single-author articles (SP).
TP: total top-cited articles; IP: single-country top-cited articles; CP: internationally collaborative top-cited articles; FP: first-author top-cited articles; RP: corresponding-author top-cited articles; SP: single-author top-cited articles; R: rank; N/A: not available.
Canada published the largest number of top-cited articles, with 88 articles (41%), followed closely by the USA with 80 articles, the UK with 39 articles, Germany with 27 articles, and France with 25 articles. The dominance of a few core countries is not surprising, because the same phenomenon has been observed in many research fields such as membrane science, cancer research, and ammonia oxidation research.
For internationally collaborative articles, 26% were published by Canada, and for single-country articles, 15% were contributed by Canada. Canada ranked first in all indicators, which shows that Canada is the most productive country for both independent and collaborative top-cited articles. The USA and the UK ranked second and third in all six indicators. Germany and France ranked fourth and fifth, respectively, in most indicators. However, France had no single-author top-cited articles.
Total citations of articles have been used extensively in most studies. Three citation indicators, C2018, TC2019, and C0, were applied to evaluate the top-cited articles in DNA barcode research.
The articles with the highest TC2019 were deemed the most popular articles in this research field. TC2019 is the total number of times an article was cited from its time of publication to September 2019.
The citation lives of the top seven articles (TC2019 >1500) are illustrated in Figure 5. Six of the seven articles were published before 2007.
Six of the top seven articles by TC2019 also appeared in the top seven by C2018, including those by Hebert et al., Ratnasingham et al., Schoch et al., and Ward et al. Of the seven articles, the top article, “Biological identifications through DNA barcodes” by Hebert et al., published in 2003, showed a continual and sharp increase in citations in every year since its publication (Figure 5). This article is also ranked first in C2018, with C2018 = 674.
Since TC2019 is a cumulative number, it grows as time passes. We therefore also examined the citations received in a single recent year (2018) to reveal shifts in research focus in recent years. The citation life curves of the top eight articles (C2018 > 160) are shown in Figure 6. All these articles were published in 2003 or later. The No. 1 article in C2018 is again “Biological identifications through DNA barcodes” by Hebert et al., published in 2003, which demonstrated that the COI identification system could supply a reliable, cost-effective, and accessible solution to the current problem of species identification and molecular evolution.
Citations in the publication year (C0) were used to characterize the top-cited articles in recent years. Figure 7 shows the citation lives of the six articles most cited in their publication year (C0 ≥ 15). Among them, three articles were published in 2012, and one each in 2006, 2010, and 2015.
The No. 1 article in the C0 list was “Nuclear ribosomal internal transcribed spacer (ITS) region as a universal DNA barcode marker for Fungi” by Schoch et al., which also ranked fourth in TC2019. Schoch et al. proposed the internal transcribed spacer (ITS) region as the standard barcode for the identification of fungi. All six articles showed an increasing trend in citations after publication, followed by a decreasing trend in later years.
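The three citation indicators used in this section can all be derived from an article's yearly citation counts. The sketch below is illustrative only: the function names and the yearly counts are invented for demonstration, and the total is simply the sum of the counts supplied, standing in for TC2019 (citations up to the search date).

```python
def citation_indicators(pub_year, cites_by_year, recent_year=2018):
    """Compute C0, the single-year count, and the cumulative total.

    cites_by_year maps a calendar year to the citations received in that year.
    """
    return {
        "C0": cites_by_year.get(pub_year, 0),            # publication year
        "C_recent": cites_by_year.get(recent_year, 0),   # e.g. C2018
        "TC": sum(cites_by_year.values()),               # cumulative total
    }


def rank_by(articles, key):
    """Sort (name, indicators) pairs by one indicator, highest first."""
    return sorted(articles, key=lambda item: item[1][key], reverse=True)


# Invented yearly counts for two hypothetical articles.
old = citation_indicators(2003, {2003: 2, 2010: 300, 2018: 100, 2019: 60})
new = citation_indicators(2012, {2012: 15, 2013: 40, 2018: 160, 2019: 90})
by_total = rank_by([("old", old), ("new", new)], "TC")
by_recent = rank_by([("old", old), ("new", new)], "C_recent")
```

The example shows why the rankings can differ: the older article leads on the cumulative total, while the newer one leads on citations in the recent year and in the publication year.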
In the past decade, the distribution of words in article titles, author keywords, and KeyWords Plus has been employed to assess trends in research fields [40,61].
The words in titles and author keywords provide useful information about the subject of an article and carry the main message the authors wish to convey to their readers. Only 161 top-cited articles had author keyword information in the Web of Science. Author keywords appearing more than six times in the top-cited articles are ranked in Table 5. The most frequent author keywords were “DNA barcoding”, “COI”, and “species identification”. We also investigated all the single words in the titles of the top-cited articles on DNA barcode research. Prepositions such as “of” and “with” and articles such as “a” and “the” were removed, as they carry no meaning for the analysis. The most frequently used single words in titles were “DNA” and “barcoding”, which appeared in 148 articles, followed by “species” (63), “identification” (41), “plant” (26), “fungi” (17), “biodiversity” (17), “diversity” (16), and “mitochondrial” (16). Words and abbreviations such as ITS, taxonomy, evaluation, sequencing, phylogenetic, cytochrome, oxidase, and evolution were also frequently used in titles, which indicates that the research focuses on identification and assessment. Among the 217 top-cited articles, 10 had no KeyWords Plus information in the Web of Science. The analysis showed that the keywords “identification”, “diversity”, “taxonomy”, “sequences”, “evolution”, “mitochondrial”, and “barcodes” appeared very frequently. The distribution of words shows that some keywords ranked near the top in all three forms of word analysis: “barcodes”, “identification”, “taxonomy”, “evolution”, “diversity”, and “mitochondrial”.
TP: total number of articles
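The title-word analysis described above can be approximated as follows. This is a sketch under assumptions: the stop-word list is a small illustrative one rather than the list actually used, and each word is counted once per article, in line with the text's report of the number of articles in which a word appeared. The two example titles are the ones quoted in this paper.

```python
import re
from collections import Counter

# Illustrative stop words (prepositions, articles, conjunctions); an assumption.
STOPWORDS = {"a", "an", "the", "of", "with", "and", "in", "for", "on", "to", "as", "by"}


def title_word_counts(titles):
    """Count, for each word, the number of titles in which it appears."""
    counts = Counter()
    for title in titles:
        # Lower-case, split on non-alphanumerics, and count each word once.
        words = set(re.findall(r"[a-z0-9]+", title.lower()))
        counts.update(words - STOPWORDS)
    return counts


titles = [
    "Biological identifications through DNA barcodes",
    "Nuclear ribosomal internal transcribed spacer (ITS) region as a "
    "universal DNA barcode marker for Fungi",
]
counts = title_word_counts(titles)
```

Note that this simple version treats “barcode” and “barcodes” as different words; merging such variants would require stemming or a manual mapping.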
Of all the author keywords in the analysis, “DNA barcoding” forms the biggest node and appears in nearly half of the papers in our database. Other high-frequency keywords such as “COI”, “species identification”, and “biodiversity” co-occurred 49, 23, and 18 times, respectively.
Keyword co-occurrence analysis showed that the keyword pair “DNA barcoding” and “COI” occurred together most frequently (link strength = 16.53), followed by the pairs “DNA barcoding” and “biodiversity” (10.25) and “COI” and “species identification” (7.28). Since COI is applied in phylogenetic studies of animals, these results imply that DNA barcodes are most often used in research on animal species identification and/or biodiversity.
Altogether, 217 top-cited articles in the field of DNA barcodes, published between 2000 and 2016, were identified in the SCI-EXPANDED database. Most of the top-cited articles were published between 2005 and 2013, while none were published in 2001 and 2002. Short author lists, with one to seven authors, were found among the top-cited articles.
The six leading journals were Molecular Ecology Resources, Proceedings of the National Academy of Sciences of the United States of America, Plos One, Philosophical Transactions of the Royal Society B-Biological Sciences, Molecular Ecology Notes, and Systematic Biology. Most articles fell into five WoS categories: Evolutionary Biology, Ecology, Biochemistry & Molecular Biology, Multidisciplinary Sciences, and Biology.
According to the distribution of Y-index results, P.D.N. Hebert had the highest publication potential, followed by M. Hajibabaei and M.A. Smith; M.A. Smith published the same number of first-author and corresponding-author articles. The University of Guelph in Canada was the most productive institution for top-cited articles in the DNA barcode field. Canada published the most top-cited articles, followed closely by the USA.
The citation indicators (citations in an article’s publication year, citations in the most recent recorded year, and total citations from publication to September 2019) produced different rankings of the top-cited articles. The keyword analysis showed that this research field is related to taxonomy and evolution.
Highly cited articles are a good indicator of high-impact research and generally represent the most significant developments in a specific field. Our analysis and results therefore reveal the profile of scientific advancement and give a historic perspective on scientific progress in the DNA barcode field.
This work was supported by the Key project of Sichuan Provincial Human Resources and Social Security Department (00809501).
Conflict of Interest
The authors report no conflict of interest.
Fugang Ren and Yongqing Yang conceived the research, prepared the tables and figures, and analyzed the results. Jing Huang retrieved the data. All authors participated in writing the manuscript.