05 July 2013

Scientific Programming: Of Processors and Software



Hello everyone. I was hoping to post this some weeks ago, but, pardonnez-moi, I had a lot on my plate: visual problems, plus moving residence (for the nth time), to name just a couple. Sorting, packing: I think you know that last one. At any rate, here in my new abode, I had to do some unpacking and get things in order. Whew! In addition, I had to undergo laser treatment for my right eye. My vision is quite a lot better. I still have to regain my scientific bearings, though. Not to mention my app development project, which I hope to monetize. So here we go with our topic for this week or so: Scientific Programming.


Processor Challenges

The evolution of microprocessor technology is upon us. Financial straits, partly due to declining PC sales, together with power constraints, are driving companies to squeeze more energy efficiency out of the available IC space. Consumer ICs for PCs may be following a downward trend; however, big machines plus big data are enabling big-time computing setups and startups. The mobile environment is changing the computing landscape, but I have yet to see a tablet that can process petabytes of data with 16-core computing power, let alone link together all the servers that the big search giants can muster. The mobile device, to me, is at this point a window to the sky. Big machines are here to stay. Meanwhile, we still have to build bigger and better programs in the name of discovering better treatments and improving health care. Even now, not all medical centers or clinics can afford electronic health records, let alone qualify for big brother's reimbursements. We'll leave politics to the experts. Our concern is with the programs and development systems for discovery, diagnosis, and streamlined workflows. So let's move on in that direction.


Corpus

A scientific understanding of the physical world has allowed us to interact with a multitude of aspects of life, whether our own, or of our environment. In so doing, we have been able to improve the lives of countless individuals, by adding unique properties, or by correcting deficiencies. This has not come about in one big chunk but rather in incremental steps in the informatics evolution. We are getting better at understanding homeostasis and the elements that disturb it. While we have not yet modelled “feeling good” in the computer laboratory, it has become clear that the various organ systems are involved in a tightly knit web of actions and interactions. As a matter of fact, we can now “imitate” nature in the process known as cloning. At least we had an idea of cloning via the Star Wars saga.


For now though, we will just concern ourselves with the representation of biologic processes and interventions through scientific coding or programming. Curiously, one can ask if there is a difference between computing for science and computing for banking, or even computing for internet media (music, anyone?). Well, it is at this juncture that we can see the differences between the structures and functions in each cell-tissue-organ-body system and an online bookstore giant. There is a striking similarity, though (analogous, if you will), between an air traffic controller's job and that of our brain. There is also a similarity between the cardiovascular system and a city, with all its inhabitants, buildings, and thoroughfares.

Let’s remember that regardless of the computational platform, the essence is to enhance the Scientific Process:

1)      Observe
2)      Question
3)      Hypothesize
4)      Experiment
5)      Conclude
6)      Communicate



Programming software

How well one represents different systems in the body depends on both the skill of the programmer and the programming language used. The evolution of computer languages started with mathematically stringent code, and, as in mathematics, much depends on the terse symbolism used. As anyone familiar with FORTRAN (derived from "Formula Translation") will know, parenthetical operations are as important as the keywords and syntax. In the 90s, FORTRAN was the predominant programming language for numeric and most scientific applications, the workhorse version then being FORTRAN 77, and supercomputers were primarily used to run large-scale numeric and scientific applications. While FORTRAN is still used for "heavy duty" programming, there is now more competition from the C family of languages, as well as Python (more on that later in this series). BASIC was derived from FORTRAN but aimed to be easier to code, and, as a result, many people start learning programming with BASIC. As a matter of fact, many have settled on and even still monetize their BASIC programs, such as the multitude of add-ons for the Microsoft Office suite, definitely no small feat in the commerce of computing. Then came Pascal, FORTH, the C family, Java, and Python (examples of high-level languages): the list goes on. Of course, there is also assembly language, a low-level language, which "speaks directly" to the CPU. Many features are shared by the different languages; however, one may offer more benefits than another for a particular purpose, depending largely on the computing job to be built. There are at least 50 programming languages in active use, and their relative popularity can be surveyed at the TIOBE website: http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html

Choosing which one to use depends on one's familiarity with the language's structure and syntax, and on how difficult it is to learn and maintain. There are proponents for every language, and their takes on each one's utility vary, of course. If one were to work with signal analysis, a language's ability to interface with the patient/subject and the measuring equipment is paramount. In this respect, there are software packages that take full advantage of complex waveforms and then perform the tasks available in their software modules. These modules are also sometimes called toolboxes.

Among the applications these software packages are capable of performing and/or facilitating are the following (a small signal-processing sketch follows the list):

1)      Signal processing;
2)      Numerical data analysis;
3)      Modelling;
4)      Simulation;
5)      Computer graphic visualization;
6)      Robotics; and
7)      Computer/Machine vision.
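
To make the first item concrete, here is a minimal sketch of a common signal-processing step: a moving-average (boxcar) filter that smooths a sampled waveform. It is purely illustrative and not taken from any particular package or toolbox; the sample values in main() are invented.

   #include <iostream>
   #include <vector>

   // Smooth a sampled signal with a simple moving-average window.
   std::vector<double> movingAverage(const std::vector<double>& signal, std::size_t window)
   {
       std::vector<double> smoothed;
       if (window == 0 || signal.size() < window)
           return smoothed;
       double sum = 0.0;
       for (std::size_t i = 0; i < signal.size(); ++i)
       {
           sum += signal[i];
           if (i >= window)
               sum -= signal[i - window];         // drop the sample leaving the window
           if (i + 1 >= window)
               smoothed.push_back(sum / window);  // average of the last `window` samples
       }
       return smoothed;
   }

   int main()
   {
       std::vector<double> noisy = {1.0, 2.0, 1.5, 3.0, 2.5, 4.0, 3.5};   // invented samples
       for (double v : movingAverage(noisy, 3))
           std::cout << v << " ";
       std::cout << std::endl;
       return 0;
   }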


One such software package is MATLAB (matrix laboratory) with Simulink, in which a programmer can also write routines for a specialized purpose. LabVIEW is another such platform. Maple and Mathematica can also handle operations in scientific computing. If an individual or group needs its own home-grown procedures, then a good knowledge of one or more programming languages is essential.

Another factor that is important in scientific programming is the mode of data transfer from user to software (as well as the computing machine) and back. Will it be an online app or an offline one? Will it be embedded in the measuring unit itself (such as an ICU monitor), or will it be an interface between machine and patient/subject? The bottom line, as it turns out, is shortening the path between the source of data and the user: the quicker, the better. With the continuous evolution of microprocessors, the need for speed in number crunching and analysis is always at our doorstep. The accumulation of data and the "Big Data" environment will put strain on man and machine. It's not just a cold machine doing the work when it's turned on, but also the efficiency of the program and the professional performing the crunching and analysis.

There are four (4) basic characteristics to be understood regarding the better programming language/software environment:

1)      Learnability - Is the learning curve acceptable?
2)      Usability - Is the language/software environment best suited to the purpose?
3)      Maintainability - Is the language/software project easy to maintain?
4)      Portability - Is the language/software multiplatform (can it be adapted to an operating system other than the native one)?



The aforementioned factors will be of prime importance to at least four parties:

  • A)     Developer (individual, or team);
  • B)     Testing/Quality Assurance team;
  • C)     Distribution team;
  • D)     Marketing team (if a commercial project is considered).


Optimization of available technology to digest all the information is therefore the first goal of scientific computing. There is also a need for the ability to identify and quantify technical defects in the whole man-machine interface. Who will judge a technology's quality? The most common deficiencies in the technology are the ones we easily detect. If a heart rate monitor says you have a pulse of 3000, we know there's something wrong. That is why software programmers have to put their programs through meticulous testing stages. And if an error escapes our detection, we have to stay continuously on the test, verify, correct, and improve race track. In a way, program error detection starts with tracing our steps back to the origin of the problem. Nowadays, there is a rise in malicious code which can seriously impede an application's functions. Web applications are a burgeoning way of delivering functionality for a wide variety of programs. However, they are also turning out to be a major point of vulnerability in organizations today.
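
To make the heart-rate example concrete, here is a minimal sketch of the kind of sanity check a program can apply to incoming measurements before analysis. It is my own illustration, not code from any monitor vendor, and the accepted range is an assumption chosen only for demonstration.

   #include <iostream>

   // Illustrative plausibility check; the 20-300 bpm range is an assumed
   // demonstration value, not a clinical standard.
   bool isPlausibleHeartRate(double beatsPerMinute)
   {
       return beatsPerMinute >= 20.0 && beatsPerMinute <= 300.0;
   }

   int main()
   {
       double readings[] = {72.0, 110.0, 3000.0};   // 3000 is clearly an artifact
       for (double bpm : readings)
       {
           if (isPlausibleHeartRate(bpm))
               std::cout << bpm << " bpm looks plausible" << std::endl;
           else
               std::cout << bpm << " bpm is out of range - flag for review" << std::endl;
       }
       return 0;
   }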

Definitely, the line between embedded software and systems for offline analyses is blurring. The ubiquity of mobile devices is also widening the user base. From a telemedicine thermometer patch on your forehead, to the analysis of data to determine the incidence of isolated systolic hypertension in each town of the European Union, we will have no choice but to sharpen our wits, and likewise improve health outcomes. No question about that.



Great links to visit on this subject:

Programming4scientists.com: http://www.programming4scientists.com/

Scientific Sentence.net: http://scientificsentence.net/

What is Scientific Programming?: http://www.linuxforu.com/2011/05/what-is-scientific-programming/





Have a great day- and listen to the music...

Fernando Yaakov Lalana, M.D.

19 April 2013


Scientific Computing: Algorithm and Analysis - Part One

Over the last decade and some, the life sciences have seen exponential growth due, in large part, to the methods employed in modelling, analysis, and simulation. What was once the domain of the engineering sciences is now likewise the province of bioinformatics and biophysics. The software used by aeronautics engineers is now applied by bioengineers to run molecular dynamics simulations and explore the synthesis of large molecules. Physical and chemical processes are understood more clearly today owing to the application of algorithms and computer languages open to practically every biological systems analyst. We would not have an implantable cardiac defibrillator were it not for the biomedical engineers, clinicians, and researchers working together as a team. In the past, research was reserved for universities and large institutions that could afford the resources or were gifted large grants, e.g. by the government, a drug company, or a foundation.

In addition, during the last decade or so, Health IT has been making electronic medical record systems available to clinicians and hospitals. This has improved patient management in many cases. The OR (operating room) has seen PCs and mobile devices used by physicians to view x-rays, and even robot-guided surgical equipment is being used, for example, by the OB-GYN to extirpate a uterine mass. We are also witnessing the increasing use of nanotechnology to treat cancer and other difficult cases.

All of the preceding scenarios involve the use of algorithms and applications. Some use low-level languages, while some use high-level ones. All require writing a program, whether for an embedded system, such as in robotics, or for use in chemometrics, an interdisciplinary field that combines statistics and chemistry.



ALGORITHMS.


Several books have been written on the description, nature, design, analysis, language specifics, and optimization of algorithms. What is an algorithm? An algorithm is a sequence of well-defined instructions that takes a value or set of values entered as input into a computer (or other programmable device) to obtain a specific result or group of results, as output.[1]  An algorithm then is a “computable” solution to a problem.

An example is Euclid's algorithm, which calculates the greatest common divisor (GCD) of two natural numbers a and b. The greatest common divisor g is the largest natural number that divides both a and b without leaving a remainder. The algorithm is based on the identity gcd(a, b) = gcd(a - b, b). Synonyms for the GCD include the greatest common factor (GCF), the highest common factor (HCF), and the greatest common measure (GCM). Euclid's algorithm, which computes the GCD of two integers, suffices to calculate the GCD of arbitrarily many integers (by applying it pairwise). To find the GCD of two numbers by this algorithm, repeatedly replace the larger number with the difference of the two until the two numbers are equal. E.g. 132, 168 -> 132, 36 -> 96, 36 -> 60, 36 -> 24, 36 -> 24, 12 -> 12, 12, so the GCD of 132 and 168 is 12. The following block of code is designed to express Euclid's algorithm, using the remainder operation (a % b), which reaches the same result in fewer steps than repeated subtraction.


For the following C++ code, you could use Code::Blocks, available at this site: http://www.codeblocks.org/


C++ code for Euclid’s algorithm would be as follows:



START of CODE


#include <iostream>
using namespace std;
   // Fundamental idea of Euclid's algorithm (one of the oldest known algorithms)
   // gcd(a,0) = a
   // gcd(a,b) = gcd(b,a%b)
   int gcd (int a, int b)
   {
     int temp;
     while (b != 0)
     {
       temp = a % b;
       a = b;
       b = temp;
     }
     return(a);
   }
   int main ()
   {
     int x, y;
     cout << "Enter two natural numbers: ";
     cin >> x >> y;
     cout << "gcd(" << x << ", " << y << ") = " << gcd(x,y) << endl;
     return(0);
   }

END of CODE



The binary GCD algorithm, also known as Stein's algorithm, computes the greatest common divisor of two non-negative integers. By replacing divisions and multiplications with shifts and subtractions, it exploits the binary nature of computers and gains efficiency, especially on embedded platforms that have no direct processor support for division. For large integers, a fast GCD algorithm is Lehmer's, an improvement on the simpler but slower Euclidean algorithm.
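
For completeness, here is a minimal C++ sketch of the binary (Stein's) GCD for non-negative integers. It follows the standard description of the algorithm and is offered as an illustration rather than optimized production code.

   #include <iostream>
   #include <utility>   // std::swap

   // Binary (Stein's) GCD: shifts and subtraction instead of division.
   unsigned int binaryGcd(unsigned int a, unsigned int b)
   {
       if (a == 0) return b;
       if (b == 0) return a;

       // Factor out the powers of two common to a and b.
       unsigned int shift = 0;
       while (((a | b) & 1u) == 0) { a >>= 1; b >>= 1; ++shift; }

       // Make a odd, then keep b odd inside the loop.
       while ((a & 1u) == 0) a >>= 1;
       do {
           while ((b & 1u) == 0) b >>= 1;
           if (a > b) std::swap(a, b);   // keep a <= b
           b -= a;                       // gcd(a, b) = gcd(a, b - a)
       } while (b != 0);

       return a << shift;                // restore the common factor of two
   }

   int main()
   {
       std::cout << "binaryGcd(132, 168) = " << binaryGcd(132, 168) << std::endl;   // expect 12
       return 0;
   }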


Example Applications.


The problem may be to compute the interest on a time deposit. Or, the problem may be to sort all of a city's residents' names (e.g. in alphabetical order, last name first), addresses, phone numbers, and zip codes for the purpose of publishing a directory, in print or online. Either way, the algorithm's input values may be numerical, alphabetical, or both (alphanumeric). All this information may then be stored in databases for later retrieval.
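
As a small illustration of the sorting task, here is a C++ sketch that orders directory entries last name first. The record structure, names, and phone numbers are all invented for the example.

   #include <algorithm>
   #include <iostream>
   #include <string>
   #include <vector>

   // Hypothetical directory entry used only for this illustration.
   struct Resident {
       std::string lastName;
       std::string firstName;
       std::string phone;
   };

   int main()
   {
       std::vector<Resident> directory = {
           {"Rivera", "Ana",   "555-0132"},
           {"Lim",    "Jose",  "555-0178"},
           {"Rivera", "Carlo", "555-0115"},
       };

       // Sort alphabetically by last name, then by first name.
       std::sort(directory.begin(), directory.end(),
                 [](const Resident& a, const Resident& b) {
                     if (a.lastName != b.lastName)
                         return a.lastName < b.lastName;
                     return a.firstName < b.firstName;
                 });

       for (const Resident& r : directory)
           std::cout << r.lastName << ", " << r.firstName << "   " << r.phone << std::endl;
       return 0;
   }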

Another example of this process is the Human Genome Project (HGP), which has been able to determine the sequence of the chemical base pairs that make up DNA, as well as identifying and mapping the approximately 20,000 genes of the human genome. The genome was broken into smaller pieces, approximately 150,000 base pairs in length. From its official launch in 1990 to its completion, the project took about 13 years. Scientists have since made strides in identifying particular genes that underlie a large number of diseases, both genetic and acquired. The project was officially completed in 2003, but the data continues to provide leads into the origin and nature of certain illnesses. Now, it is evolving into public and private ventures that provide individuals who submit tissue samples with a profile of their DNA and their propensities to a certain disease or diseases. This gives clinicians an insight into the possible means of either prevention or treatment of the patient's condition.

The same algorithm may be coded differently, depending on the programming language chosen. And conversely, specific languages may be “better” suited for a specific algorithm or purpose.





Languages of Choice.

We'll just take up a few of these, since well-established languages are now being applied to biology, and new ones are steadily being developed. The choice of language depends on at least 4 items:



  1.     Familiarity. Previous exposure or work with a particular programming language, and its derivatives. Consider FORTRAN and its child, BASIC. Then we would have Visual Basic, Visual Basic for Applications (VBA), True Basic, etc.
  2.      Ease of learning and use. Since theory and experiment are melding, the ability to express one's ideas (or hypotheses) depends on how rapidly the scientists can develop a program or module to test them. At this point there has to be a computer programmer, or a team member who is adept at this craft. This is where the algorithms are tested, possibly with computer-generated "normal" test data first, to make sure the algorithms make sense (a small sketch of this test-data-first approach follows the list). Once you have actual subject data (animal model or human patient), you can run the actual findings through the same computer program in place of the test data.
  3.     Time factor. From protocol design to analysis of results, the ability of the research team to come up with the study publication depends, in no small way, on the effort to learn and apply technology in less time than previously possible. The pressure is on to produce a significant outcome. By this I mean that, whether the results are positive or negative, they must sensibly add to and improve on previously held concepts.
  4.      Cost. The team head and his colleagues must have a firm grasp of the financial needs of the experiment itself. If the budget is tight, then the team has to consider using free, open-source software, or build a program itself. If there is a free software package that meets the requirements, then you're good to go. Otherwise, you'll have to shop around for cost-effective software.
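
To illustrate the test-data-first approach from item 2, here is a minimal C++ sketch that generates synthetic "normal" readings, runs them through simple mean and standard-deviation routines, and prints the summary. The distribution parameters are assumptions made only for this illustration; actual subject data would later take the place of the synthetic set.

   #include <cmath>
   #include <iostream>
   #include <random>
   #include <vector>

   // Mean of a set of readings.
   double mean(const std::vector<double>& x)
   {
       double sum = 0.0;
       for (double v : x) sum += v;
       return sum / x.size();
   }

   // Population standard deviation of a set of readings.
   double standardDeviation(const std::vector<double>& x)
   {
       double m = mean(x);
       double sq = 0.0;
       for (double v : x) sq += (v - m) * (v - m);
       return std::sqrt(sq / x.size());
   }

   int main()
   {
       // Synthetic "normal" test data: 100 readings drawn around 75 bpm (assumed values).
       std::mt19937 rng(42);
       std::normal_distribution<double> normalHeartRate(75.0, 5.0);

       std::vector<double> testData;
       for (int i = 0; i < 100; ++i)
           testData.push_back(normalHeartRate(rng));

       // If these summary statistics look sensible, the same routines can
       // later be run on the actual subject data in place of the test set.
       std::cout << "mean = " << mean(testData)
                 << ", sd = " << standardDeviation(testData) << std::endl;
       return 0;
   }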


This will be all for now. On the next post, we’ll be looking into different languages which make good candidates for scientific programming.

1.) Adapted from: Introduction to Algorithms, 3rd Edition; T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein. © 2009 Massachusetts Institute of Technology



Happy coding.
Fernando Yaakov Lalana, M.D.


------------------------------------------------------------


Sites you may want to visit:

Applications Websites



QResearch:  


Scientific Computing:


Empower 3  Chromatography:



Databases

Science Direct:

Bioinformatics Factsheet:

Entrez, The Life Sciences Search Engine:
http://www.ncbi.nlm.nih.gov/sites/gquery



Application to try out…

Find the Greatest Common Divisor [Web App]:


VSEncryptor 64-bit [Encryption Software, Windows, Free]:
http://download.cnet.com/VSEncryptor-64-Bit/3000-2092_4-75629878.html?tag=mncol;2




08 March 2013


Problem Solving as Programming Principle: Begin with the end in mind.

Problems can be solved by different algorithms:

First, a solution is reached the obvious way. Second, it is reached the methodical or structural way. Then you have the "smart" way, meaning using clever insight into the different aspects of the algorithm, thereby devising the best approach to the problem. I would say that this is the best problem-solving method. Last, but not least, is the serendipitous way. This last method relies almost entirely on a hunch. Hunches are not bad at all. But the problem with a hunch is the inability to write the solution down so that it gives the right results for different values.



Once, a young lady came up with a question which had to do with an item that she bought at a discounted price. She knew how much she finally paid, but did not know the actual discount percentage. Then her older sister said that the percentage was such and such. She just "knew" the answer but could not explain how she got it. I, in the meantime, wrote out the equation on paper and did the calculations by testing the algorithms I came up with. Well, I did get the same answer as the older sister. We were both right, the difference being that I could repeat the calculations with different variables. She, on the other hand, could not verify her answer.
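
As a small illustration of writing that calculation down in a repeatable form, here is a minimal C++ sketch that computes the discount percentage from the original price and the price actually paid. The figures in main() are invented for the example.

   #include <iostream>

   // Discount percentage, given the original price and the price actually paid.
   double discountPercent(double originalPrice, double pricePaid)
   {
       return (originalPrice - pricePaid) / originalPrice * 100.0;
   }

   int main()
   {
       // Invented figures: an item marked at 250 was bought for 200.
       double original = 250.0;
       double paid = 200.0;
       std::cout << "Discount = " << discountPercent(original, paid) << "%" << std::endl;   // 20%
       // The same routine can be rerun with any other pair of prices.
       return 0;
   }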



This is what we are after: devising the best algorithm so as to be able to come up with the ideal solution regardless of the variables, meaning reproducible results. We cannot rely on hunches or even educated guesses. As in evidence-based medicine, we rely on logical, verifiable methods, taking into consideration all the possible variations as well as mitigating circumstances. Even if one had a eureka moment, we still have to verify the methods involved. That is the hallmark of science: reproducible results. It is a discipline.






To code or not to code…



Given the conditions above, there is then the question: do I need to develop programs? The path to this activity has six points to follow along the way. We will go over those six points briefly.



   1.)  Do I need a program for myself, for my own needs, or for others' needs? The underlying question: what are my goals? If you were to write a program for yourself, or for the department you work in, then you have a stronger motivation, whether or not you were compensated materially for it.




    2.)  What do I hope to achieve, or to make progress in? Think more, and think better. Considering the path that lies ahead on the programming road, you will do well to concentrate on the different activities that are necessary for this task (e.g. seminars, classes, online training). Become a project manager at heart.




    3.)  Can I do it on my own? Or do I invite others to collaborate on the project? Feel competent! If you must work alone, do so, remembering that your battles will be yours alone. There are circumstances when working alone is of utmost importance, for example, writing a program as part of a school examination. Otherwise, there is strength in numbers, and it would be best for the project to form a team for collaboration. Which brings us to the next point…





    4.)  Equip yourself, equip others. Study well what you’ll need. Prepare, prepare, prepare. As mentioned before, get a good look at what you need to do to continue working strong. And if you’re part of a team, your good insights will strengthen everybody as well. No man is an island in this scenario.



  
    5.)  Follow set guidelines, and follow through with development milestones. Be disciplined, not stressed out. Stick to the roadmap, and in so doing you will avoid delays. In agile development especially, cooperation throughout the course of the project gives a more holistic view, as well as strength.



    6.)  Give credit to whom it is due, to others and yourself as well. This may be the most important factor for the success of the project and the group itself. Let everybody own the project. That way the outcome is far better than a lackadaisical product.




Reward hard work along the way.






Now that we have laid out the rules, let's take a look at some code which will be useful, not just for today's purpose, but which will also serve as a template, if you will.




The code is in C++, and is a snippet for converting temperature from Celsius (Centigrade) to Fahrenheit, and vice versa. Use the tools we talked about in the previous post, which are repeated for you below.









Convert Temperatures from Celsius to Fahrenheit and vice versa.



   #include <iostream>
   using namespace std;

   int main()
   {
        int choice;
        float ctemp, ftemp;
        cout << "1.Celsius to Fahrenheit" << endl;
        cout << "2.Fahrenheit to Celsius" << endl;
        cout << "Choose between 1 & 2 : " << endl;
        cin >> choice;
        if (choice == 1)
        {
             cout << "Enter the temperature in Celsius : " << endl;
             cin >> ctemp;
             ftemp = (1.8 * ctemp) + 32;          // F = 1.8*C + 32
             cout << "Temperature in Fahrenheit = " << ftemp << endl;
        }
        else
        {
             cout << "Enter the temperature in Fahrenheit : " << endl;
             cin >> ftemp;
             ctemp = (ftemp - 32) / 1.8;          // C = (F - 32)/1.8
             cout << "Temperature in Celsius = " << ctemp << endl;
        }
        return 0;
   }


---------------------------------------------------------------------------------------



Tools for C, C++


Standard C++ Foundation website http://isocpp.org/


C++ FAQ from its creator's website http://www.stroustrup.com/




C and C++ Programming tutorials http://www.cprogramming.com/tutorial.html


MinGW-Minimalist GNU for Windows http://www.mingw.org/


MinGW-w64 for 32- and 64-bit Windows http://mingw-w64.sourceforge.net/



ARM Linux GCC – OS X Mountain Lion (Mac, of course) http://www.benmont.com/tech/crosscompiler.html





For Commercial Options:





Embarcadero C++ Builder XE3 for Windows 8 and Mac http://www.embarcadero.com/products/cbuilder




-----------------------------------------------------------




Other application tools  you can use:






For structural biology, check out:
COMPLEAT – Protein Complex Enrichment Analysis Tool






For statistical operations,
the R language is a really good tool.



The Comprehensive R Archive Network





R Bioinformatics Packages (for genetics) - Bioconductor





R Graph Gallery

A collection of graphics entirely generated with R.





R Projects of interest








Good Read:



Science Magazine online http://www.sciencemag.org/








Stay Curious. Stay Busy.

Fernando Yaakov Lalana, M.D.