19 September 2012

Outliers and Outbreaks

Outliers: the Stats

 - An outlier[1], in statistical terminology, is an observation that is numerically distant from the rest of the data. In effect, an observation that lies toward either extreme of the range is a candidate outlier. Extreme values are those that are unusually large or small relative to the other data values. Outliers can occur by chance in any distribution, but they are often indicative either of measurement error or that the population has a heavy-tailed distribution. A physical apparatus for taking measurements may have suffered a transient malfunction, or there may have been an error in data transmission or transcription. More generally, outliers arise from changes in system behaviour, fraudulent behaviour, human error, instrument error or simply natural deviations in populations. In addition, outliers of a particular form recur across a variety of datasets, which suggests that the mechanism generating the data may differ at the extreme end.

 - When you look at frequency polygons or histograms[2], the questions to ask are whether the curve is bell-shaped in the middle, peaked at either end, or flat, and whether the data values are spread out or clustered in one segment. The extreme values, where there are few entries, could be outliers. For bivariate data, a better visualization is the bagplot, which is designed for detecting outliers. This type of plot visualizes location, spread, correlation, skewness and the tails of the data without assuming that the data are symmetrically distributed.[3] The Galbraith plot is a graphical method for identifying outliers in a meta-analysis: the standardized effect size is plotted against precision (the reciprocal of the standard error).[5] The arithmetic mean may be distorted by outliers and so give a misleading value. The alpha-trimmed mean, which is less affected by outliers than the arithmetic mean, involves dropping a proportion (alpha) of the observations from both ends of the sample before calculating the mean of the remainder.[4]

 - Outliers, being the most extreme observations, may include the sample maximum or sample minimum, or both, depending on whether they are extremely high or low. However, the sample maximum and minimum are not always outliers, because they may not be unusually far from the other observations. Simplistic interpretation of statistics derived from data sets that include outliers may be misleading. The median is a robust statistic, while the arithmetic mean is not, as noted above. Many existing methods for finding outliers in large data sets can deal with only two dimensions, or attributes, at a time. Knowledge discovery in databases, commonly referred to as data mining, is generating enormous interest in both the research and software arenas. With the development of better analytic algorithms for statistical exploration (including the study of outliers), high-performance computing (HPC), and advances in graphics capability, we are now better equipped not only to calculate significant values, but also to visualize data in ways that facilitate our decision-making processes.

 - Nevertheless, one should remember that “the whole is greater than the sum of its parts.” It is important to realize that, unless it can be ascertained that the deviation is not significant, it is ill-advised to ignore the presence of outliers. Outliers that cannot be readily explained demand special attention. In this area, we deal with probability and with how we can conclude, with reasonable assurance, that we are dealing with either a numerical anomaly or a more serious situation. This brings us to the other question at hand: what constitutes an epidemic? Since we are dealing with human lives, the right analysis and decision will spell the difference between a case-by-case response and a nationwide pathogen alert.
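
 - The contrast drawn above between the arithmetic mean, the median and the alpha-trimmed mean can be illustrated with a minimal Python sketch. The readings are made up for illustration, the function name trimmed_mean is hypothetical, and the sketch assumes that the proportion alpha is dropped from each end of the sorted sample.

```python
from statistics import mean, median


def trimmed_mean(values, alpha):
    """Mean of the sample after dropping a proportion alpha from EACH end.

    Assumption: alpha applies to each tail separately, so alpha = 0.1
    removes the lowest 10% and the highest 10% of the observations.
    """
    if not 0 <= alpha < 0.5:
        raise ValueError("alpha must be in the range [0, 0.5)")
    ordered = sorted(values)
    k = int(alpha * len(ordered))  # number of observations dropped per end
    kept = ordered[k:len(ordered) - k]
    return mean(kept)


# Hypothetical readings; 98 is an illustrative outlier, not real data.
readings = [12, 13, 13, 14, 15, 15, 16, 17, 18, 98]

print("mean:        ", mean(readings))               # pulled upward by 98
print("median:      ", median(readings))             # barely affected
print("trimmed mean:", trimmed_mean(readings, 0.1))  # 12 and 98 dropped
```

 - With these made-up numbers, the single extreme reading moves the arithmetic mean well away from the bulk of the data, while the median and the trimmed mean stay close to it.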


 - Outbreak is a term used in epidemiology to describe an occurrence of disease greater than would otherwise be expected at a particular time and place. It may affect a small group in a specific location, or it may affect hundreds of people, either in one location or across several states. According to the CDC, an outbreak is the occurrence of more cases of disease, injury, or other health condition than expected in a given area or among a specific group of persons during a specific period. Usually, the cases are presumed to have a common cause or to be related to one another in some way. The term is used for a more localized phenomenon, and it is less likely than the word epidemic to provoke panic in the population. (Neither should be confused with the word endemic, which describes an ailment that is commonly found in a certain location.)

 - According to the CDC, an epidemic is the occurrence of more cases of disease, injury, or other health condition than expected in a given area or among a specific group of persons during a particular period. Usually, the cases are presumed to have a common cause or to be related to one another in some way. On these definitions, there appears to be no difference between an outbreak and an epidemic. In epidemiology, an epidemic occurs when new cases of a certain disease, in a given human population and during a given period, substantially exceed what is expected based on recent experience. For this reason, it is very important to spell out the criteria we use to differentiate an outbreak from an epidemic. One authority[6] describes an epidemic as an unusual, widespread increase in infection in the community that causes waves of infection. These spread through communities and affect all people who have no active immunity to that infection.

 - While epidemics due to exogenous pathogens have diminished in developed countries with a good health system, they may still be found in third-world areas where nutrition and other aspects of healthcare are substandard. There have been exceptions, however. An example from the last two decades is the large diphtheria epidemic in Russia in the 1990s, which resulted from the collapse of the public health infrastructure. It demonstrated that pathogenic microbes are still in the environment and can become epidemic even in technologically advanced countries if we relax our efforts to contain them.

 - One mechanism that may give rise to epidemics is antigenic shift, which refers to the emergence of a novel influenza virus in humans, due either to direct introduction of an avian strain or to a new strain produced by recombination and reassortment of two different influenza viruses. Recent influenza A pandemics occurred in 1957 (the H2N2 ‘Asian Flu’) and 1968 (the H3N2 ‘Hong Kong Flu’). An outbreak of avian influenza from exposure to infected poultry in Hong Kong in 1997 caused 18 human cases, 6 of them fatal. A genetically different strain of A/H5N1 circulated in domestic birds throughout Asia, causing 387 cases and 245 deaths between 2003 and 2008, raising concerns that a new pandemic might arise.[7] Another infection that is being tracked is TB. Since 1998, the percentage of US-born patients with MDR-TB has remained at less than 0.7%. However, the proportion of MDR-TB cases occurring in foreign-born persons increased from 25% (103 of 407) in 1993 to 80% (73 of 91) in 2006.
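
 - The working definition used above, new cases that substantially exceed what is expected based on recent experience, can be sketched in Python. This is only an illustration under stated assumptions: the rule of flagging a count more than two standard deviations above the baseline mean is a common convention chosen here for simplicity, not the CDC's method, and the weekly counts and the function name exceeds_expected are hypothetical.

```python
from statistics import mean, stdev


def exceeds_expected(baseline_counts, current_count, z=2.0):
    """Return (flag, threshold) for a count compared with recent experience.

    Assumption: "substantially exceeds" is taken to mean more than z sample
    standard deviations above the mean of the baseline counts.
    """
    expected = mean(baseline_counts)
    threshold = expected + z * stdev(baseline_counts)
    return current_count > threshold, threshold


# Hypothetical weekly case counts representing "recent experience".
baseline = [4, 6, 5, 7, 5, 6, 4, 5]

flagged, threshold = exceeds_expected(baseline, 12)
print(f"threshold = {threshold:.2f}, flagged = {flagged}")
```

 - A real surveillance system would also have to account for seasonality, reporting delays and population size, so a rule this simple is best read as a first screen that prompts closer investigation.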

Current Hantavirus Outbreak

 - As of September 13, the National Park Service (NPS) has announced a total of 9 confirmed cases of hantavirus infection in people who recently visited Yosemite National Park. Officials believe that 8 out of the 9 patients acquired the virus while staying at the Signature Tent Cabins in Curry Village in Yosemite National Park. The 9th person may have acquired it at a location in the park 15 miles from Curry Village. The visitors to Yosemite are residents of: California (7), Pennsylvania (1), and West Virginia (1). Three of the confirmed cases were fatal. The National Park Service issued a notification to all Park visitors. You can view this at the NPS page here.

Decision Making and Response

 - The park is contacting visitors who stayed in the Signature Tent Cabins from mid-June through the end of August, advising them to seek immediate medical attention if they exhibit symptoms of Hantavirus Pulmonary Syndrome (HPS), a rare but serious illness caused by hantavirus. The park is also providing information about HPS risks and symptoms to visitors who stayed at the High Sierra Camps this summer. The Signature Tent Cabins have been closed, and the CDC is supporting the NPS response by testing patient samples for evidence of hantavirus infection, providing guidance on clinical management of HPS and epidemiologic support for the response, and maintaining a Hantavirus Hotline for public inquiries. The park is providing educational materials about hantavirus and HPS to all visitors.

 - While the number of hantavirus cases is very small, the case fatality rate is high, at just under 40%. This makes it imperative to track the illness, carry out decontamination where possible, test persons who were exposed, and ensure rapid hospitalization of cases, even when infection is only suspected.

 - The question to be asked at this point is: are the victims of this current outbreak actually outliers in the immunocompetent population? Are they any more susceptible to the viral illness than the rest of the visitors to Yosemite National Park? As we dig deeper into this scenario, we may discover a novel mechanism for the outbreak. In the meantime, we have to respond and take care of those who have fallen sick.
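
 - To put the reported figures in perspective, the following Python sketch computes the case fatality rate for this cluster from the numbers quoted above (3 deaths among 9 confirmed cases), together with a Wilson score interval to show how wide the uncertainty is with so few cases. The choice of the Wilson interval and the function names are assumptions made for illustration; they are not taken from the NPS or CDC reports.

```python
from math import sqrt


def case_fatality_rate(deaths, confirmed_cases):
    """Proportion of confirmed cases that were fatal."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return deaths / confirmed_cases


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Figures quoted in this post: 3 deaths among 9 confirmed cases.
cfr = case_fatality_rate(3, 9)
low, high = wilson_interval(3, 9)
print(f"case fatality rate: {cfr:.1%}")  # 33.3%
print(f"approximate 95% interval: {low:.1%} to {high:.1%}")
```

 - The interval is very wide, which is the statistical reason for caution: nine cases are too few to say whether this cluster differs from the fatality rate usually reported for HPS.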

Yosemite National Park Hantavirus Infection Epi Curves:

"Statistics: The only science that enables different experts using the same figures to draw different conclusions."
Evan Esar, Esar's Comic Dictionary 
American Humorist (1899 - 1995)  

 - Time to wrap this up for now. Stay safe…

 - Fernando Yaakov Lalana, M.D.

1. Barnett, V. and Lewis, T.: Outliers in Statistical Data, 3rd Ed., John Wiley & Sons, 1994.
2. Bluman, A. G.: Elementary Statistics: A Step by Step Approach, 8th Ed., McGraw-Hill, 2012.
3. Everitt, B. S. and Skrondal, A.: The Cambridge Dictionary of Statistics, 4th Ed., Cambridge University Press, 2010.
4. Fisher, L. D. and Van Belle, G.: Biostatistics, J. Wiley & Sons, New York, 1993.
5. Everitt, B. S.: Medical Statistics from A to Z, 2nd Ed., Cambridge University Press, 2006.
6. Kumar, P. and Clark, M.: Kumar and Clark’s Clinical Medicine, Elsevier Limited, 2009.
7. Cohen, J., Opal, S. M. and Powderly, W. G. (Editors): Infectious Diseases, 3rd Ed., Elsevier Limited, 2010.

Helpful Links:
Calculation & Visualization of Outlier

CDC Glossary of Terms

CDC TB Report

Journal of Statistical Software

Yosemite National Park Hantavirus Infection Epi Curves:

Electron Cryo-Tomography of Tula Hantavirus
