19 April 2013


Scientific Computing: Algorithm and Analysis - Part One

Over the last decade and some, the life sciences have seen an exponential growth due to, in a large part, to the methods employed in modelling, analysis, and simulation. What was once the domain of the engineering sciences is now likewise the province of bioinformatics and biophysics. And the software used by aeronautics engineers is now applied by bioengineers to study molecular dynamics simulations and their role in exploring the synthesis of large molecules. Physical and chemical processes are understood more clearly today owing to the application of algorithms and computer languages open to practically every biological systems analyst. We would not have an implantable cardiac defibrillator were it not for the biomedical engineers, clinicians and researchers working together as a team. In the past, research was reserved for the university and large institutions which could afford the resources, and were either often gifted with large grants e.g. by the government, a drug company, or by a foundation.

In addition, during the last decade or so, Health IT is making the electronic medical record system available to clinicians and hospitals. This has improved patient management in many cases. The OR (operating room) has seen PCs and mobile devices used by physicians to view x-rays, and even robotics guided surgical equipment is being used, for example, by the OB-GYN to extirpate a uterine mass. We are also witnessing the increasing use of nanotechnology to treat cancer and other difficult cases.

All of the preceding scenarios involve the use of algorithms and applications. Some use low-level languages, while some use high-level ones. All require writing a program whether for an embedded system, such as in robotics, or a program for use in chemometrics which is an interdisciplinary field that combines statistics and chemistry.



ALGORITHMS.


Several books have been written on the description, nature, design, analysis, language specifics, and optimization of algorithms. What is an algorithm? An algorithm is a sequence of well-defined instructions that takes a value or set of values entered as input into a computer (or other programmable device) to obtain a specific result or group of results, as output.[1]  An algorithm then is a “computable” solution to a problem.

An example algorithm is Euclid’s algorithm which calculates the greatest common divisor (GCD) of two natural numbers a and b. The greatest common divisor g is the largest natural number that divides both a and b without leaving a remainder.  The identity bases on gcd(a, b) = gcd(a-b, b). Synonyms for the GCD include the greatest common factor (GCF), the highest common factor (HCF), and the greatest common measure (GCM). Euclid’s algorithm, which computes the GCD of two integers, suffices to calculate the GCD of arbitrarily many integers. To find the GCD of two numbers by this algorithm, repeatedly replace the larger by subtracting the smaller from it until the two numbers are equal. E.g. 132, 168 -> 132, 36 -> 96, 36 -> 60, 36 -> 24, 36 -> 24, 12 -> 12, 12 so the GCD of 132 and 168 is 12. The following block of code is design to express Euclid's algorithm.


For the following C++ code, you could use Code:Blocks available at this site: http://www.codeblocks.org/


C++ code for Euclid’s algorithm would be as follows:



START of CODE


#include <iostream.h>
   // Fundamental idea of Euclid's algorithm (one of the oldest known algorithms)
   // gcd(a,0) = a
   // gcd(a,b) = gcd(b,a%b)
   int gcd (int a, int b)
   {
     int temp;
     while (b != 0)
     {
       temp = a % b;
       a = b;
       b = temp;
     }
     return(a);
   }
   int main ()
   {
     int x, y;
     cout << "Enter two natural numbers: ";
     cin >> x >> y;
     cout << "gcd(" << x << ", " << y << ") = " << gcd(x,y) << endl;
     return(0);
   }

END of CODE



The binary GCD algorithm, also known as Stein's algorithm, is an algorithm that computes the greatest common divisor of two non-negative integers. By replacing divisions and multiplications with shifts, it gains more efficiency by using the binary nature of computers, especially in embedded platforms that have no direct processor support for division.  For large integers, a fast GCD algorithm is Lehmer’s, an improvement on the simpler but slower Euclidean algorithm.


Example Applications.


The problem may be to compute for the interest on a time-deposit. Or, the problem may be to sort all the city’s residents’ names (e.g. in alphabetical order, last name first), addresses, phone numbers, and zip codes, for the purpose of publishing a directory, for print or online access. Either way, the algorithm input values may be numerical, or alphabetical, or both (alphanumeric). All this information may then be stored in databases, for later retrieval.

Another example of this process is the Human Genome Project (HGP) which has been able to determine the sequence of chemical base pairs which make up DNA as well as identifying and mapping the approximately 20,000 of the human genome. The genome was broken into smaller pieces, approximately 150,000 base pairs in length. From the start to completion of this project, it took approximately two decades to study. Scientists have made strides in identifying particular genes that underlie a large number of diseases, both genetic and acquired. This project was officially completed in 2003, but the data continues to provide leads into the origin and nature of certain illnesses. Now, it is evolving into public and private ventures to provide individuals, with tissue samples, with a profile of their DNA and their propensities to a certain a disease or diseases. This then gives clinicians an insight into the possible means of either prevention or treatment of the patient’s condition. 

The same algorithm may be coded differently, depending on the programming language chosen. And conversely, specific languages may be “better” suited for a specific algorithm or purpose.





Languages of Choice.

We’ll just take up a few of these, since already known ones are now applied to biology, and new ones are steadily being developed. The choice of language depends on at least 4 items: 



  1.     Familiarity. Previous exposure or work with a particular programming language, and its derivatives. Consider FORTRAN and its child, BASIC. Then we would have Visual Basic, Visual Basic for Applications (VBA), True Basic, etc.
  2.      Ease of learning and use. Since Theory and Experiment are melding, the ability to express one’s ideas (or hypothesis) depends on how rapidly the scientists can develop a program or module to test it. At this point there has to be a computer programmer, or a team member who is adept at this craft. This is where the algorithms are tested, possibly with computer generated “normal” test data first to make sure the algorithms make sense. Once you have actual subject data (animal model or human patient), you can run the actual findings in the same computer program in place of the test data.
  3.     Time factor.  From protocol design to analysis of results, the ability of the research team to come up with the study publication depends, in no small way, on the effort to learn and apply technology in less time than previously known. The pressure is on to produce a significant outcome. By this I mean, whether the results are positive or negative, it must sensibly add and improve on previously held concepts.
  4.      Cost. The team head and his colleagues must have a firm grasp on the financial needs of the experiment itself. If it’s tight on a budget, then it has to consider using free open source software, or usually, build a program. If there is a free software package that meets its requirements, then you’re good to go. Otherwise, you’ll have to shop around for cost-effective software.


This will be all for now. On the next post, we’ll be looking into different languages which make good candidates for scientific programming.

1.) Adapted  from: Introduction to Algorithms, 3rd Edition; T.H.Cormen, C.E. Leiserson, R.L.Rivest, C. Stein. ©2009 Massachussets Institute of Technology



Happy coding.
Fernando Yaakov Lalana, M.D.


------------------------------------------------------------


Sites you may want to visit:

Applications Websites



QResearch:  


Scientific Computing:


Empower 3  Chromatography:



Databases

Science Direct:

Bioinformatics Factsheet:

Entrez, The Life Sciences Search Engine:
http://www.ncbi.nlm.nih.gov/sites/gquery



Application to try out…

Find the Greatest Common Divisor [Web App]:


VSEncryptor 64-bit [Encryption Software, Windows Free,]:
http://download.cnet.com/VSEncryptor-64-Bit/3000-2092_4-75629878.html?tag=mncol;2




No comments:

Post a Comment