
company that does DNA sequence analysis, and needs a few simple command-line tools

You work for a company that does DNA sequence analysis and needs a few simple command-line tools. One of them takes as input a string that consists of A's, C's, T's, and G's (the four nitrogenous bases in DNA). Your boss asks you to write an assembly program that simply counts how many A's, C's, T's, and G's the input string contains.

Write an assembly program called hw5_ex1 (source file hw5_ex1.asm) that reads characters from standard input until the "end of file" is reached. The input provided to the program contains A, C, T, and G characters. It may also contain newline characters (ASCII code 10 decimal), which should be skipped/ignored. The program must then print the count for each character. You can assume (in this whole assignment) that the input does not contain any other kinds of characters.

The program can take input from the keyboard, in which case the user hits ^D (CTRL D), to signify the "end of file" (at the beginning of a line, i.e., after a new line). Or it can take input from a file, for instance by using cat. Here are example invocations, which you should match ('%' is the command-prompt):

 

% ./hw5_ex1
ACCCTGG
^D
A: 1
C: 3
T: 1
G: 2
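
Before writing the assembly, it can help to model the required behavior in a higher-level language. The following is a minimal C sketch of the specification above, not the assembly solution itself. The function names count_bases and print_counts are hypothetical, chosen only for this illustration.

```c
#include <stdio.h>

/* Hypothetical helper: tallies A, C, T, and G read from `in` until
   end of file. counts[0..3] receive the totals for A, C, T, G,
   in that order. */
void count_bases(FILE *in, unsigned counts[4])
{
    int ch;  /* int, not char: it must hold EOF (-1) as well as 0..255 */

    counts[0] = counts[1] = counts[2] = counts[3] = 0;
    while ((ch = fgetc(in)) != EOF) {
        switch (ch) {
        case 'A': counts[0]++; break;
        case 'C': counts[1]++; break;
        case 'T': counts[2]++; break;
        case 'G': counts[3]++; break;
        default:  break;  /* newline (ASCII 10) is skipped */
        }
    }
}

/* Prints one "X: n" line per base, in the order of the sample run. */
void print_counts(FILE *out, const unsigned counts[4])
{
    static const char names[4] = { 'A', 'C', 'T', 'G' };

    for (int i = 0; i < 4; i++)
        fprintf(out, "%c: %u\n", names[i], counts[i]);
}
```

Calling count_bases(stdin, counts) followed by print_counts(stdout, counts) from main would produce output in the same format as the sample run above.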

 

 

Hint:

  • Remember the "The problem with fgetc()" slides? It turns out that read_char behaves the same way. It puts a 4-byte return value into EAX. If that value is -1 (i.e., 0xFFFFFFFF), this means "end of file". Otherwise, the 8 low bits, i.e., the content of AL, hold the ASCII code of the character that was read.
  • It is tempting to use four individual counters in memory, implemented as four distinct labels. You can do that, but it would be better to use an "array" of 4 counters instead, that is, a single label in the .bss segment such as: CounterArray resd 4. (Note that data in the .bss segment is initialized to zero.) This will make your program easier to adapt for the next question.
  • One annoyance is that the A, C, T, and G characters don't have consecutive ASCII codes. Therefore, you can't (easily) use their ASCII codes as an index into an array of counters. You have to come up with some solution to deal with this, and some of the possible solutions are easier than others.
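
One possible way to handle the non-consecutive ASCII codes, sketched here in C rather than assembly, is to notice that bits 1 and 2 of the four codes are all different. This is only one of several workable approaches (a chain of comparisons is another), and the names base_index and tally are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: maps a base to an index 0..3 using bits 1-2
   of its ASCII code.
     'A' = 0x41 -> 0    'C' = 0x43 -> 1
     'T' = 0x54 -> 2    'G' = 0x47 -> 3
   Assumes ch is one of A, C, T, G, as the assignment guarantees. */
int base_index(int ch)
{
    return (ch >> 1) & 3;
}

/* Tallies the bases in a NUL-terminated string into a 4-entry array.
   The caller must zero the array first, just as .bss data starts
   out zeroed in the assembly version. */
void tally(const char *s, unsigned counter_array[4])
{
    for (; *s != '\0'; s++) {
        if (*s != '\n')  /* newlines are skipped */
            counter_array[base_index((unsigned char)*s)]++;
    }
}
```

With this mapping, indices 0 through 3 correspond to A, C, T, G, which happens to match the order of the sample output. In assembly the same idea amounts to a shift, a mask, and an indexed increment into the 4-byte entries of CounterArray.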
