Guide to Metrics

Bibliometrics are quantitative measures of the citation impact of publications. The best bibliometric indicators are those that have a transparent methodology and a clearly defined dataset.

The use of metrics can provide valuable insights into aspects of research in some disciplines if they are used critically. However, when metrics are used in the wrong context or uncritically, they can be problematic for researchers and for research progress. Inaccurate assessment of research can become unethical when metrics take precedence over expert judgement, because the complexities and nuances of research, or of a researcher's profile, cannot be quantified.

Metrics are limited and should be used only as an addition to a thorough expert assessment. Carefully selected metrics can provide supporting evidence in decision making as long as they are used in the right context and not in isolation.

Guiding Principles

When we use metrics, we should:

  • Use metrics related to publications (article-based metrics, e.g. Field Weighted Citation Impact) rather than the venue of publication (journal-based metrics, e.g. Journal Impact Factor™, SJR or SNIP) or the author (e.g. h-index).
  • Be clear and transparent in the metric methodology we use. If a source does not give information about the origins of its dataset (e.g. Google Scholar), it should not be treated as reliable.
  • Be explicit about any criteria or metrics being used and make it clear that the content of the paper is more important than where it has been published.
  • Use metrics consistently: don't mix and match the same metric from different products in the same statement. For example, don't use article metrics from Scopus for one set of researchers and article metrics from Web of Science for another set of researchers.

  • Compare like with like: an early career researcher's output profile will not be the same as that of an established professor, so raw citation numbers are not comparable.
  • Consider the value and impact of all research outputs, such as datasets, rather than focussing solely on research publications, and consider a broad range of impact, such as influencing policy.

Guide to Bibliometrics

Bibliometrics can be journal-based, referring to the journal in which a research output is published, or article-level, referring to an individual output or group of outputs. It is widely recognised that journal-based metrics have a very specific purpose and should not be used generally for research assessment, as they are not a good indicator of the quality of individual research items or their impact. Additionally, journal metrics do not reflect new or emerging fields of research well.

The boxes below provide further detail, methodologies (where known), and guidance on how and when to use these metrics and more.

Article-based metrics

Scholarly output

The number of publications or research outputs produced.

Raw citation count

Can be sourced from SciVal/Scopus, Web of Science, Dimensions, PubMed, etc. Most publishers' websites will display a citation count, either sourced from a provider such as Scopus or from their own databases.

The number of citations a paper or set of papers has received. Citation-based metrics should not be interpreted as a direct measure of research quality.

A common assumption is that a high number of citations means that a paper is a good piece of research and/or has had a positive impact. However, this is not always the case, for the following reasons:

  • Citations can be negative, positive, neutral or even quite random, and unless we examine all the references we cannot know what their intent is.
  • Citation practices vary across fields, and in particular fields can be strongly influenced by self-citations.
  • Some authors deliberately cite their own or colleagues' articles even if the relevance is rather questionable. It has been shown that men tend to self-cite more than women (Maliniak, Powers, & Walter, 2013); therefore the metric can be artificially inflated and – when used for research assessment – furthers the disadvantage against non-male groups.
  • Certain types of output tend to get more citations than others – for example, reviews are more cited than traditional research journal articles. Thus an individual output of a certain type, receiving citations out of keeping with the pattern of the author's other outputs, can both boost the author's overall citation count significantly and lead to an impression that their other works have not done 'as well'.
  • Citations are not reviewed and removed when the citing articles have been retracted.

All of the above can skew the numbers.

Field Weighted Citation Impact

Sourced from SciVal using Scopus data.

FWCI is a (mean) citation average calculated as the ratio of the citations a paper or set of papers has received to the total citations that would be expected based on the average for the subject field (for documents of a similar age, discipline and type). An FWCI of 1.00 means that the output performs as expected for the global average; an FWCI of 1.44 means it is 44% more cited than expected. The citation window for inclusion in the calculation is 'received in the year in which an item was published and the following three years'.
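
As a rough numeric illustration of the ratio just described (a minimal sketch: the citation figures and field baseline below are invented for the example; the real calculation uses Scopus field baselines over the citation window above):

    # Sketch of the FWCI idea for a single output: citations received divided by
    # the citations expected for outputs of the same age, discipline and type.
    # All figures are invented for illustration only.
    citations_received = 18
    expected_citations = 12.5   # hypothetical field/age/type baseline

    fwci = citations_received / expected_citations
    print(f"FWCI = {fwci:.2f}")                                  # 1.44
    print(f"{(fwci - 1) * 100:.0f}% more cited than expected")   # 44%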

FWCI is a better metric than a raw citation count or a journal-based metric because it measures the citation impact of the output itself, not the journal in which it is published, and it compares like with like (outputs of the same age and type as classed by Scopus). As an average, however, the FWCI is susceptible to skew from outlying values and can fluctuate often, and as such should only be used with large and stable (>300 papers) datasets.

The Field Weighted Citation Impact is not a robust metric when applied to an individual researcher profile.  The nature of mean average citation-based metrics is that there are a few highly cited outputs and many that receive no citations; the dataset has a heavily skewed distribution.

See more about FWCI at the SciVal Support Centre.

Field Citation Ratio 

Sourced from Dimensions using Dimensions data.

FCR is a citation average similar to FWCI, but sources data from the Dimensions database.  It is calculated by dividing the number of citations a paper has received by the average number received by documents published in the same year and in the same Fields of Research (FoR) category.  It is calculated for all publications in the Dimensions database which are at least 2 years old and were published from 2000 onwards.

As with FWCI, the FCR is not a robust metric when applied to an individual researcher profile.  The nature of mean average citation-based metrics is that there are a few highly cited outputs and many that receive no citations; the dataset has a heavily skewed distribution.

Note that the Fields of Research is also a Dimensions categorisation.

Publications in top percentiles of cited publications - field-weighted

Sourced from SciVal using Scopus data.

The number of publications of a selected entity that are highly cited, having reached a particular threshold of citations received (top 1%, 5%, 10% or 25%).

This metric counts and ranks the number of citations for all outputs worldwide covered by the Scopus dataset for its publication year. Percentile boundaries are calculated for each year, meaning an output is compared to the percentile boundaries for its publication year, and can be normalised by field.
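
A minimal sketch of the counting idea (the citation counts below are invented; the real metric ranks every Scopus-indexed output for the publication year and applies field-weighted percentile boundaries):

    # Rank all outputs from one publication year by citations, take a crude
    # top-10% boundary, then count how many of an entity's outputs clear it.
    # All citation counts are invented for illustration only.
    world_citations = sorted([0, 0, 1, 2, 2, 3, 5, 8, 13, 40])   # all outputs, one year
    entity_citations = [2, 9, 41]                                # one entity's outputs

    top10_boundary = world_citations[int(len(world_citations) * 0.9)]  # crude cut-off
    in_top10 = sum(1 for c in entity_citations if c >= top10_boundary)

    print(f"Top-10% boundary: {top10_boundary} citations")
    print(f"Outputs in the top 10%: {in_top10} of {len(entity_citations)}")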

Data are more robust as the sample size increases (comparing a unit to one of a similar size is more meaningful than comparing one researcher to another) and are normalised by field. The metric can be used to distinguish between entities where other metrics, such as number of outputs or citations per output, are similar.

When using this metric, ensure that you are working with field-weighted data, and with 'percentage of papers' rather than 'total value of papers', especially when benchmarking entities of different sizes.

Relative Citation Ratio

Sourced from Dimensions using Dimensions data for PubMed publications.

RCR is a citation-based measure of scientific influence of a publication. It is calculated as the citations of a paper, normalized to the citations received by NIH-funded publications in the same area of research and year.  The RCR is not available for all outputs as it is calculated only for those which are listed in PubMed.  Caution should therefore be applied to ensure that it's an appropriate metric for your dataset in terms of coverage.

The RCR is calculated for all PubMed publications which are at least 2 years old. Values are centered around 1.0 so that a publication with an RCR of 1.0 has received the same number of citations as would be expected based on the NIH-norm, while a paper with an RCR of 2.0 has received twice as many citations as expected. 

As with FWCI, the RCR is not a robust metric when applied to an individual researcher profile.  The nature of mean average citation-based metrics is that there are a few highly cited outputs and many that receive no citations; the dataset has a heavily skewed distribution.

H-index

Use of the h-index is to be avoided at the University of Liverpool.  It may be found in external material, sourced from Scopus, SciVal, Web of Science or Google Scholar.

The h-index is the number of publications (n) by a researcher that have each received at least that same number (n) of citations. An h-index of 10 indicates a researcher with 10 papers that have each received at least 10 citations; the researcher will reach an h-index of 11 once 11 of their papers have each received at least 11 citations.
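
A minimal sketch of the calculation just described (the citation counts are invented for illustration):

    # Sort a researcher's citation counts in descending order and find the
    # largest n such that the n-th paper has at least n citations.
    def h_index(citations):
        ranked = sorted(citations, reverse=True)
        h = 0
        for position, cites in enumerate(ranked, start=1):
            if cites >= position:
                h = position
            else:
                break
        return h

    print(h_index([25, 19, 12, 11, 10, 10, 10, 4, 1, 0]))  # 7 papers with >= 7 citations, so h = 7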

There are a number of issues with the h-index.

  • It is based on productivity, and therefore it favours those who have been in their career a long time and haven't taken a break. It discriminates against early career researchers, women and those with caring responsibilities, part-time researchers or those who have taken a career break.
  • It can be manipulated by self-citations. It has been shown that men tend to self-cite more than women (Maliniak, Powers, & Walter, 2013); therefore the metric can be artificially inflated and – when used for research assessment – furthers the disadvantage against non-male groups.
  • It does not account for disciplinary differences.
  • It favours senior researchers who get their name by default on articles published by their junior colleagues.

For an informative infographic on the key issues with the h-index, please see: https://www.leidenmadtrics.nl/articles/halt-the-h-index.

Journal-based metrics

Journal-based metrics should not generally be used as they refer to the venue of publication and are not good indicators of the quality of the research or its impact. Additionally, journal metrics do not reflect new or emerging fields of research well. As DORA signatories, we have committed to avoiding the use of journal-based metrics.

JIF = Journal Impact Factor

Only available from Clarivate Analytics.

The Journal Impact Factor is calculated by taking the number of citations received in a given year to items the journal published in the previous two years, and dividing it by the number of citable items the journal published in those two years.
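
As a rough worked example of that calculation (all figures are invented for illustration):

    # Citations received this year to items the journal published in the
    # previous two years, divided by the citable items from those two years.
    citations_to_previous_two_years = 3150
    citable_items_previous_two_years = 900

    jif = citations_to_previous_two_years / citable_items_previous_two_years
    print(f"JIF = {jif:.1f}")   # 3.5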

It is a commercial product owned by Clarivate Analytics. The data are sourced from journals indexed by the Web of Science citation indices (Science Citation Index, Social Science Citation Index, Arts and Humanities Citation Index), and only article and review output types are included. JIF was invented by Eugene Garfield, a librarian and bibliometrician from the US, in order to advise libraries on which journals to subscribe to. In the absence of more suitable indicators, it also became a proxy for assessing the quality of research. However, this approach is very problematic and has been challenged in the last decade by the research community, which is calling for change.

The major flaws of using JIF include:

  • Citation distributions within journals are highly skewed. The average number of citations an article in a journal might get can be very different from the number of citations a typical article in that journal actually gets.
  • JIF is a simple average which is highly susceptible to outliers. A few papers that attract a large number of citations can push up the mean and skew the result.
  • JIF is field-specific. The number of citations received in different fields varies significantly. For example, Physics papers often refer to particular scientific methods or theories that have been used or considered and thus include a lot of citations, whereas in History researchers tend to describe the theory or method in more detail rather than cite others, and therefore in general receive fewer citations. This becomes very problematic when comparing institutions with varying research units across disciplines.
  • JIF can be manipulated by editorial policy, self-citations and citation 'cartels'. For example, editors can boost the citations by:
    • asking authors to cite articles from the same journal
    • deliberately publishing more citable paper types, such as review articles
    • reducing the number of "citable items" in the JIF calculation
  • Data used to calculate JIF are not transparent nor openly available.
  • Using the JIF as a measure to identify journals in which to publish, or to assess the quality of outputs published in a high-JIF journal, biases in favour of research in 'big name' journals, when a targeted publication in a smaller journal may have a more specific reach and greater impact on its field. Novel low-profile, high-impact research often becomes influential outside of the measurement window of the JIF calculation.

JCI = Journal Citation Indicator

Only available from Clarivate Analytics, sourced from Web of Science.

The Journal Citation Indicator (JCI) is a new metric, launched in 2021 - the first set of JCIs uses 2020 data. JCI normalises to take into account citation differences related to discipline, document type (articles, reviews, etc.) and age of publication.  This enables some degree of comparison across different disciplines, so can help in interdisciplinary areas.  A value of 1.0 represents world average, with values higher than 1.0 denoting higher-than-average citation impact (2.0 being twice the average) and lower than 1.0 indicating less than average.

Despite the normalising methodology improving on the JIF, the JCI is still a journal-based mean average metric, and suffers from many of the same flaws as JIF.

CiteScore

Only available from Elsevier.

CiteScore is a metric provided by Elsevier. The data are sourced from journals indexed by the Scopus citation indices, and cover all output types.  It thus covers a wider range of output types than the Journal Impact Factor. It is calculated the same way as JIF with the exception of taking into account the previous three years instead of two.

Despite these differences, CiteScore is still a journal-based mean metric, and suffers from the same flaws as JIF. As with JIF, the CiteScore is nothing more than the mean average number of citations to articles in a journal, and is thus highly susceptible to outliers.

SNIP = Source normalised impact per paper

Owned by CWTS and sourced from SciVal based on Scopus data.

A journal's SNIP is its CiteScore divided by the citation potential. Citation potential is a reflection of how likely a paper is to be cited in the given field. The data are sourced from journals indexed by the Scopus citation indices, and cover conference papers and reviews.
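
A minimal numeric sketch of that ratio (both figures are invented for illustration):

    # A journal's per-paper citation average divided by the citation potential
    # of its field, i.e. how readily papers in that field attract citations.
    citescore = 4.2
    citation_potential = 2.8

    snip = citescore / citation_potential
    print(f"SNIP = {snip:.1f}")   # 1.5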

SNIP comes with a 'stability interval' which reflects the reliability of the data indicated - the wider the stability interval, the less reliable the indicator.

SNIP citations are normalised to correct for differences between scientific fields, thereby allowing for more accurate between-field comparisons of citation impact. Thus it removes one of the main flaws of JIF and CiteScore; however, it is still a journal-based metric with all the remaining limitations.

SJR = Scimago Journal Rank

Owned by SCImago Institutions Rankings

SJR is a rating of the total importance of a scientific journal within a selected network of journals, similar to Google page ranking (and other network-based citation metrics such as the Eigenfactor). Citations from highly ranked journals are weighted to make a larger contribution than those from poorly ranked journals – the subject field, quality and reputation of the journal directly affect the value of a citation. The SJR dataset covers articles, conference papers and reviews and is based on citations made in a given year to papers published in the prior three years, including self-citation. The calculation includes a complex algorithm for the determination of eigenvector centrality, which defines the journal's percentage of the total influence in the network. For a standard user, the computational calculation is difficult to verify.
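
A highly simplified sketch of the network-weighting idea behind SJR (and the Eigenfactor below), using an invented three-journal citation matrix; the real algorithms add normalisation, damping and self-citation rules not shown here:

    # A journal's score depends on how often it is cited and on the scores of
    # the journals citing it; iterating until the scores settle approximates
    # the eigenvector centrality described above.
    citations = [   # citations[i][j] = citations from journal i to journal j
        [0, 4, 1],
        [2, 0, 3],
        [1, 1, 0],
    ]

    n = len(citations)
    scores = [1.0 / n] * n

    for _ in range(50):   # power iteration
        new = [0.0] * n
        for i in range(n):
            out_total = sum(citations[i])
            for j in range(n):
                new[j] += scores[i] * citations[i][j] / out_total
        scores = new

    print([round(s, 3) for s in scores])   # higher score = more 'prestige' in the toy network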

SJR is a prestige metric similar to the Eigenfactor, calculated by a similar computational algorithm. The main differences are that SJR does not eliminate self-citations and uses a 3-year citation window.

For further reading on network-based citation metrics, please see: https://scholarlykitchen.sspnet.org/2015/07/28/network-based-citation-metrics-eigenfactor-vs-sjr/.

Eigenfactor and Article Influence Score

Available from Clarivate Analytics, sourced from Web of Science.

Eigenfactor is a rating of the total importance of a scientific journal within a selected network of journals, similar to Google page ranking (and other network-based citation metrics such as the SJR). Citations from highly ranked journals are weighted to make a larger contribution than those from poorly ranked journals - the subject field, quality and reputation of the journal directly affect the value of a citation. The Eigenfactor is based on citations made in a given year to papers published in the prior five years and eliminates self-citation. The calculation includes a complex algorithm for the determination of eigenvector centrality, which defines the journal's percentage of the total influence in the network. For a standard user, the computational calculation is difficult to verify.

When the Eigenfactor is adjusted for the number of papers published in each journal, it is called the Article Influence Score.

For further reading on network-based citation metrics, please see: https://scholarlykitchen.sspnet.org/2015/07/28/network-based-citation-metrics-eigenfactor-vs-sjr/.

Altmetrics

Alternative metrics ("altmetrics") capture online attention surrounding academic content, e.g. Twitter, Facebook and other social media activity; mentions in policy documents and registered patents; media coverage, etc.

Complementary to traditional citation-based metrics, they offer useful information about impact outside of scholarly publishing and can serve as early indicators of possible future citations. They are a useful tool that can help create a success narrative around research impact.

As with most metrics, altmetrics are open to being artificially influenced. Altmetric Explorer will, for example, discard repeated tweets about the same research from a single account, but may not be sophisticated enough to detect multiple accounts tweeting a DOI just to increase an Altmetric score. To mitigate this, social media mentions, which can be more easily influenced, have a lower weighting in the overall Altmetric score than formally recognised news and media mentions.

The University has a subscription to altmetric.com and the Open Research Team provide training on the use of the Altmetric Portal as part of the Researcher KnowHow.

Can be sourced from altmetric.com's Explorer for Institutions. These metrics are also displayed in Liverpool Elements and the Institutional Repository for publications with a DOI or ISBN, and can be toggled to appear on your web profile.

Metrics providers

The most commonly used metric providers are:

  • Web of Science owned by Clarivate Analytics (InCites is the tool used to analyse these metrics)
  • Scopus owned by Elsevier (SciVal is the tool used to analyse these metrics)
  • Dimensions owned by Digital Science
  • Google Scholar

The source of metrics should never be mixed and matched for comparison purposes, e.g. do not compare a citation count taken from Web of Science with a citation count taken from Scopus. The data source differs, and therefore the final figure may also differ!