Assembler with LCC-Win32

Primer 2:

Flags Register

Flag Register Instructions

Conditional Jump Instructions

String Instructions

Floating Point

Floating Point Example

The FPU Control Register

Rounding Control

The FPU Status Register

Calling a function with asm

The If/Else/Endif Construct

 

Flags Register

The flags register, also known as the condition codes register, comprises the 16 bits shown below. Strictly speaking, the 16-bit register is called FLAGS; the 32-bit version is called EFLAGS.

 

The carry, parity, zero, sign, and overflow flags are used primarily with conditional jump instructions but you can also use the SETcc instructions and pushf/popf instructions to manage these flags. 
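
To make the meaning of these five flags concrete, here is a small portable C sketch that works out what an 8-bit addition would leave in them. It is only a model, not LCC-Win32 inline assembler; the Flags structure and the function name flags_after_add8 are made up for this illustration.

```c
#include <stdint.h>

/* Hypothetical record of the five status flags discussed above. */
typedef struct {
    int carry, parity, zero, sign, overflow;
} Flags;

/* Model the flags an 8-bit ADD of a and b would produce. */
static Flags flags_after_add8(uint8_t a, uint8_t b)
{
    Flags f;
    unsigned wide = (unsigned)a + (unsigned)b; /* keeps the ninth bit */
    uint8_t r = (uint8_t)wide;                 /* the 8-bit result */
    int ones = 0;

    for (int i = 0; i < 8; i++)                /* count the 1 bits in the result */
        ones += (r >> i) & 1;

    f.carry = wide > 0xFF;                     /* unsigned result did not fit */
    f.zero = (r == 0);
    f.sign = (r >> 7) & 1;                     /* copy of the top bit */
    f.parity = (ones % 2 == 0);                /* set when the count is even */
    /* signed overflow: both operands share a sign that the result lacks */
    f.overflow = ((~(a ^ b) & (a ^ r)) >> 7) & 1;
    return f;
}
```

For example, adding 0xFF and 0x01 gives a zero result with the carry set but no signed overflow, while adding 0x7F and 0x01 sets the sign and overflow flags and leaves the carry clear.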

 

Flag Register Instructions

clc Clears the Carry Flag
cld Clears the Direction Flag causing string instructions to increment the ESI and EDI index registers.
std Sets the Direction Flag to 1 causing string instructions to decrement ESI and EDI instead of increment.
clts Clears the Task Switched Flag in the Machine Status Register. This is a privileged operation and is generally used only by operating system code.
cmc Toggles (inverts) the Carry Flag

lahf Copies bits 0-7 of the flags register into AH. This includes flags AF, CF, PF, SF and ZF; the other bits are undefined.
sahf Transfers bits 0-7 of AH into the Flags Register. This includes AF, CF, PF, SF and ZF.
pushf Transfers the Flags Register onto the stack.
popf Pops word from stack into the Flags Register and then increments ESP by 2
stc Sets the Carry Flag to 1.
cli Disables the maskable hardware interrupts by clearing the Interrupt Flag. NMIs and software interrupts are not inhibited.
sti Sets the Interrupt Flag to 1, which enables recognition of the maskable hardware interrupts.
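
As a rough illustration of what some of these instructions do to the register, the following portable C sketch treats the flags as a 16-bit value and mimics stc, clc, cmc, std, cld and lahf with bit operations. The bit positions are the standard x86 ones; the macro and function names (FLAG_CF, do_stc and so on) are invented for this example.

```c
#include <stdint.h>

/* Standard bit positions in the low word of the flags register. */
#define FLAG_CF 0x0001u  /* carry */
#define FLAG_PF 0x0004u  /* parity */
#define FLAG_AF 0x0010u  /* auxiliary carry */
#define FLAG_ZF 0x0040u  /* zero */
#define FLAG_SF 0x0080u  /* sign */
#define FLAG_DF 0x0400u  /* direction */

static uint16_t do_stc(uint16_t f) { return (uint16_t)(f | FLAG_CF); }  /* set carry */
static uint16_t do_clc(uint16_t f) { return (uint16_t)(f & ~FLAG_CF); } /* clear carry */
static uint16_t do_cmc(uint16_t f) { return (uint16_t)(f ^ FLAG_CF); }  /* toggle carry */
static uint16_t do_std(uint16_t f) { return (uint16_t)(f | FLAG_DF); }  /* set direction */
static uint16_t do_cld(uint16_t f) { return (uint16_t)(f & ~FLAG_DF); } /* clear direction */

/* lahf: AH receives the low byte of the flags (SF, ZF, AF, PF, CF). */
static uint8_t do_lahf(uint16_t f) { return (uint8_t)(f & 0xFF); }
```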

 

Conditional Jump Instructions

The following tables show the conditional jump instructions and which flags are relevant to each instruction. At first glance there is a bewildering number of instructions for branching to another section of code; these tables help a little.

Jcc Instructions That Test Flags
Instruction  Description          Condition     Opposite
JC           Jump if carry        Carry = 1     JNC
JNC          Jump if no carry     Carry = 0     JC
JZ           Jump if zero         Zero = 1      JNZ
JNZ          Jump if not zero     Zero = 0      JZ
JS           Jump if sign         Sign = 1      JNS
JNS          Jump if no sign      Sign = 0      JS
JO           Jump if overflow     Overflow = 1  JNO
JNO          Jump if no overflow  Overflow = 0  JO
JP           Jump if parity       Parity = 1    JNP
JPE          Jump if parity even  Parity = 1    JPO
JNP          Jump if no parity    Parity = 0    JP
JPO          Jump if parity odd   Parity = 0    JPE

 

Jcc Instructions for Unsigned Comparisons
Instruction  Description                          Condition               Opposite
JA           Jump if above (>)                    Carry = 0 and Zero = 0  JNA
JNBE         Jump if not below or equal (not <=)  Carry = 0 and Zero = 0  JBE
JAE          Jump if above or equal (>=)          Carry = 0               JNAE
JNB          Jump if not below (not <)            Carry = 0               JB
JB           Jump if below (<)                    Carry = 1               JNB
JNAE         Jump if not above or equal (not >=)  Carry = 1               JAE
JBE          Jump if below or equal (<=)          Carry = 1 or Zero = 1   JNBE
JNA          Jump if not above (not >)            Carry = 1 or Zero = 1   JA
JE           Jump if equal (=)                    Zero = 1                JNE
JNE          Jump if not equal (<>)               Zero = 0                JE

 

Jcc Instructions for Signed Comparisons
Instruction  Description                              Condition                     Opposite
JG           Jump if greater (>)                      Sign = Overflow and Zero = 0  JNG
JNLE         Jump if not less than or equal (not <=)  Sign = Overflow and Zero = 0  JLE
JGE          Jump if greater than or equal (>=)       Sign = Overflow               JNGE
JNL          Jump if not less than (not <)            Sign = Overflow               JL
JL           Jump if less than (<)                    Sign <> Overflow              JNL
JNGE         Jump if not greater or equal (not >=)    Sign <> Overflow              JGE
JLE          Jump if less than or equal (<=)          Sign <> Overflow or Zero = 1  JNLE
JNG          Jump if not greater than (not >)         Sign <> Overflow or Zero = 1  JG
JE           Jump if equal (=)                        Zero = 1                      JNE
JNE          Jump if not equal (<>)                   Zero = 0                      JE
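
The difference between the unsigned and signed families is easiest to see with a value such as 0xFFFFFFFF, which is "above" 1 when treated as unsigned but "less than" 1 when treated as signed (-1). The portable C sketch below models the flags left by a 32-bit compare and evaluates four of the conditions from them: JA needs Carry = 0 and Zero = 0, JB needs Carry = 1, JG needs Sign = Overflow and Zero = 0, and JL needs Sign <> Overflow. The names CmpFlags, cmp32 and cond_* are assumptions made for this illustration only.

```c
#include <stdint.h>

/* Hypothetical container for the flags a compare leaves behind. */
typedef struct { int cf, zf, sf, of; } CmpFlags;

/* Flags after comparing a with b, that is, after computing a - b. */
static CmpFlags cmp32(uint32_t a, uint32_t b)
{
    CmpFlags f;
    uint32_t r = a - b;
    f.cf = a < b;                           /* a borrow was needed */
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1; /* signed overflow of a - b */
    return f;
}

static int cond_ja(CmpFlags f) { return !f.cf && !f.zf; }       /* unsigned >  */
static int cond_jb(CmpFlags f) { return f.cf; }                 /* unsigned <  */
static int cond_jg(CmpFlags f) { return f.sf == f.of && !f.zf; } /* signed >   */
static int cond_jl(CmpFlags f) { return f.sf != f.of; }         /* signed <    */
```

With a = 0xFFFFFFFF and b = 1, cond_ja and cond_jl are both true, which is why the choice of jump instruction has to match the signedness of the data.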

 

String Instructions

The 80x86 string instructions generally are used to copy or scan an array of bytes, words or longs - they are efficient and make these operations quite fast. 

The direction flag decides which direction the scanning takes. When the direction flag is cleared, the string elements are accessed from low address to high address; when the flag is set, the string instructions work in the reverse direction. The instructions cld and std are used to clear and set the flag.

Here is a routine to copy the bytes of a substring into a string. pos is the position to start, len is the length of the substring. The string instructions rep and movsb are used. There is no error checking.

void str_insert(char *szStr, char *szSub, int pos, int len)
{
    _asm ("pushl %esi");
    _asm ("pushl %edi");
    _asm ("cld");                  // cause addresses of esi & edi to increment
    _asm ("movl 0x8(%ebp), %edi"); // address of szStr
    _asm ("movl 0xc(%ebp), %esi"); // address of szSub
    _asm ("add 0x10(%ebp), %edi"); // add start pos to address of szStr
    _asm ("movl 0x14(%ebp), %ecx");// size of sub string in ECX
    _asm ("rep");                  // repeat while ecx != 0
    _asm ("movsb");                // copy the byte
    _asm ("popl %edi");
    _asm ("popl %esi");
}
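
For readers who want to check what rep and movsb do without running the assembler, here is a portable C model. It is a sketch under stated assumptions: rep_movsb and str_insert_c are hypothetical names, and the model uses an offset rather than updating ESI, EDI and ECX as the real instructions do.

```c
#include <stddef.h>

/* Model of "rep movsb": copy ecx bytes from esi to edi, walking upwards
   when the direction flag is clear and downwards when it is set. */
static void rep_movsb(unsigned char *edi, const unsigned char *esi,
                      size_t ecx, int direction_flag)
{
    ptrdiff_t step = direction_flag ? -1 : 1;
    for (size_t i = 0; i < ecx; i++) {
        ptrdiff_t off = (ptrdiff_t)i * step;
        edi[off] = esi[off];   /* movsb: one byte from [esi] to [edi] */
    }
}

/* Portable counterpart of str_insert: overwrite len bytes of str at pos.
   Like the assembler version, it does no error checking. */
static void str_insert_c(char *str, const char *sub, int pos, int len)
{
    rep_movsb((unsigned char *)str + pos, (const unsigned char *)sub,
              (size_t)len, 0);
}
```

Note that, like the assembler routine, this overwrites the bytes at pos rather than shifting the rest of the string along.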

Floating Point

There are eight floating point registers arranged as a stack and given the names st(0) to st(7). These registers handle floats, doubles and long doubles, although internally all three floating point types are converted to long double (80 bits) for every computation.

One starts any computation by loading a value onto the stack. The value goes into the first or highest position on the stack; each subsequent load pushes the previous values down and places the new one at TOS (top of stack). The instruction suffix selects the type: 't' for long double, 'l' for double and 's' for float.

As an example we will load two values passed to function foo.

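float __declspec(naked) foo(float a, float b)
{
    _asm("flds 8(%esp)");    // load second arg first
    _asm("flds 4(%esp)");    // load first arg second
        ""    ""
}

Using the __declspec(naked) qualifier tells the compiler not to generate any entry or exit code for the function, so you don't need to be concerned whether it will be compiled optimised or not. It also means that no stack frame is set up, so the arguments are addressed relative to ESP, with the return address at (%esp) and the first argument at 4(%esp).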

Many of the floating point instructions have two forms: one that just does whatever function or computation is required, and another that does the same computation and then pops the stack. This is very useful and saves separate instructions to clean up the stack. The stack should be clean when you have finished the routine; if the stack becomes full and you try to load another value, errors will occur.

When your function has finished whatever it was doing, exactly one value should be left on TOS. That value is the function's return value, and the caller pops it after the function returns.
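
The stack behaviour described above can be pictured with a small portable C model. This is only a sketch: FpuStack, fpu_push, fpu_st and fpu_pop are invented names, and the real FPU tracks its top of stack in the status register rather than with a depth counter.

```c
#define FPU_REGS 8

/* Hypothetical model of the eight-register FPU stack. */
typedef struct {
    long double reg[FPU_REGS];
    int depth;                      /* number of occupied registers */
} FpuStack;

/* Like fld: push a value so that it becomes st(0).
   Returns 0 if all eight registers are already in use. */
static int fpu_push(FpuStack *s, long double v)
{
    if (s->depth == FPU_REGS)
        return 0;                   /* stack overflow */
    s->reg[s->depth++] = v;
    return 1;
}

/* Read st(i), where i = 0 is TOS. The caller must keep i < depth. */
static long double fpu_st(const FpuStack *s, int i)
{
    return s->reg[s->depth - 1 - i];
}

/* Like fstp %st(0): discard TOS. Returns 0 if the stack is empty. */
static int fpu_pop(FpuStack *s)
{
    if (s->depth == 0)
        return 0;                   /* stack underflow */
    s->depth--;
    return 1;
}
```

Pushing 1.0 and then 2.0 leaves 2.0 in st(0) and 1.0 in st(1); a ninth push fails, which corresponds to the stack overflow mentioned above.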

Floating Point Example

We'll jump straight in with a small example that returns the larger of two doubles.

#include <stdio.h>

double __declspec(naked) asm_fmax(double a, double b)
{
    _asm("fldl 12(%esp)");   // b        // b TOS
    _asm("fldl 4(%esp)");    // a : b    // a TOS and b moved down one
    _asm("fucomip %st(1)");  // b        // compare a with b, then pop a
    _asm("jna fpmaxld");                 // a <= b, so b will be returned
    _asm("fstp %st(0)");     // | |      // nothing on stack
    _asm("fldl 4(%esp)");    // a        // now a is back and will be returned
_asm("fpmaxld:");                        // TOS holds the return double
    _asm("ret");
}

int main(void)
{
    double d1, d2;
    d1 = -100.0L;
    d2 = -90.0L;
    printf("%f\n", asm_fmax(d1, d2));
    printf("%f\n", asm_fmax(d2, d1));
    return 0;
}

The comments should help in understanding what's happening. Because the function is naked, no stack frame is set up and the arguments are addressed relative to ESP: the return address is at (%esp), the first argument is at 4(%esp) and, because a double is 8 bytes, the second argument is at 12(%esp). The code first loads arg 2 to TOS, then loads arg 1 to TOS.

FUCOMIP then compares the two values, sets the zero, parity and carry flags according to the result, and pops TOS. If we take the _asm("jna fpmaxld"); jump, 'a' is not greater than 'b', so 'b' is the only value on the stack; when the routine exits, b will be returned and the stack will be clean. If however 'a' is the greater of the two values, the program falls through the _asm("jna fpmaxld"); instruction, TOS is popped, and we then load 'a' on the stack so that it will be returned on exit.

 

The FPU Control Register

 

Bits 0-5 are the exception masks. If any of these bits are set, the corresponding condition is ignored by the FPU. If any bit is zero, and the corresponding condition occurs, then the FPU immediately generates an interrupt so the program can handle the exception condition.

Bit 0 is the invalid operation mask. Conditions that raise an invalid operation exception include pushing more than eight values onto the stack, attempting to pop an item off an empty stack, trying to take the square root of a negative number, and loading into a non-empty register.

Bit 1 masks the denormalised interrupt, which occurs if an attempt is made to use a denormalised value. A denormalised value is generally a very small number just beyond the range of the FPU's precision capabilities.

Bit 2 masks the zero divide exception. Depending on this mask, the FPU will either generate an interrupt for a divide by zero or produce an infinity as the result.

Bit 3 masks the overflow exception. This can occur, for example, if one tries to store a long double value that is too large for a float into a float variable.

Bit 4 masks the underflow exception. This can occur, for example, if one tries to store a long double value that is too small (too close to zero) for a float into a float variable.

Bit 5 controls whether the precision exception can occur. Floating point values are often an approximation and not an exact representation of the precise figure. The precision exception occurs if the FPU produces an imprecise result, generally the result of an internal rounding operation. Dividing one by ten (1/10), for instance, produces an imprecise result. Because imprecise results are very common, this exception occurs often.

Bits 6-7, 12-15 are reserved.

There are two instructions for loading and storing the control word: FLDCW (load control word) and FSTCW (store control word). Both instructions work with memory, not CPU registers.

One way to store this control word is thus

_asm("fstcw %myshortvar"); // store fpu control register

Another way is to make room for the stored word at an offset from EBP

_asm("subl $8, %ebp"); // make room
_asm("fstcw 4(%ebp)"); // store fpu control register

Be sure to return EBP to the condition you found it after you've finished with it.

_asm("addl $8, %ebp"); // restore original ebp

Here is a routine to truncate a floating point value, that is, to round it toward zero.

long double trunc_l(long double a)
{
    _asm("fldt 8(%ebp)");       // load a onto fpu stack
    _asm("subl $8, %ebp");      // make room
    _asm("fstcw 4(%ebp)");      // store original fpu control word
    _asm("movl $0x0C00, %eax"); // bits 10 & 11 of the control word select the
                                // rounding mode (both bits set = truncate)
    _asm("orl 4(%ebp), %eax");  // original control word with bits 10 & 11 set
    _asm("movl %eax, (%ebp)");  // store the modified word in memory
    _asm("fldcw (%ebp)");       // load modified control word from memory
    _asm("frndint");            // round st(0) to an integer (truncating)
    _asm("fldcw 4(%ebp)");      // restore original control word
    _asm("addl $8, %ebp");      // restore original ebp
}

Rounding Control

Bits 11 & 10  Function
00            To nearest or even
01            Round down
10            Round up
11            Truncate
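
The bit layout of the control word can be exercised in plain C without touching the FPU. The sketch below decodes the rounding field and the exception masks from a 16-bit value; the function names are assumptions for this illustration, and 0x037F, used in the note that follows, is the control word value normally documented as the FPU's initial state.

```c
#include <stdint.h>

/* Name of the rounding mode selected by bits 11-10 of a control word. */
static const char *rounding_mode(uint16_t cw)
{
    static const char *const names[4] = {
        "nearest", "down", "up", "truncate"
    };
    return names[(cw >> 10) & 3];
}

/* Return cw with bits 11-10 replaced by mode (0 to 3, as in the table). */
static uint16_t set_rounding(uint16_t cw, unsigned mode)
{
    return (uint16_t)((cw & ~0x0C00u) | ((mode & 3u) << 10));
}

/* Nonzero if the exception whose mask is bit n (0 to 5) is masked. */
static int exception_masked(uint16_t cw, int n)
{
    return (cw >> n) & 1;
}
```

Starting from 0x037F, rounding_mode reports "nearest" and all six exceptions are masked; set_rounding(0x037F, 3) gives 0x0F7F, the same result as OR-ing in 0x0C00 to select truncation.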

 

The FPU Status Register

Bits 0-5 are the exception flags. These bits are in the same order as the exception masks in the control register. If the corresponding condition exists, then the bit is set. These bits are independent of the exception masks: the FPU sets or clears them regardless of the corresponding mask setting in the control register.

Bit 6 indicates a stack fault. A stack fault occurs whenever there is a stack overflow or underflow. When this bit is set, the C1 condition code bit determines whether there was a stack overflow (C1=1) or stack underflow (C1=0) condition.

Bit 7 is set if any error condition bit is set. It is the logical OR of bits zero through five. A program can test this bit to quickly determine if an error condition exists.

Bits eight, nine, ten, and fourteen are the coprocessor condition code bits.

 

Status Register Condition Codes
Class                 C3  C2  C0
Unsupported           0   0   0
NaN                   0   0   1
Normal Finite Number  0   1   0
Infinity              0   1   1
Zero                  1   0   0
Empty                 1   0   1
Denormal Number       1   1   0

 

The condition code C1 (bit 9) holds the sign bit of the examined value and can be tested using the BT (bit test) instruction.

The FPU instruction FXAM tests the value in st(0) and sets these condition codes accordingly. Here is a routine to test if the value in st(0) is a normal finite number.

int is_normal(double a)
{
    _asm("fldl 8(%ebp)");
    _asm("fxam");
    _asm("fstsw %ax");
    _asm("andw $0x4500, %ax"); // mask out all but relevant bits
    _asm("cmpw $0x400, %ax");  // check for bit 10 only
    _asm("jnz notnorm5");      // not normal
    _asm("movl $1, %eax");     // yes normal
    _asm("jmp exit5");
_asm("notnorm5:");
    _asm("xorl %eax, %eax");   // clear eax
_asm("exit5:");
    _asm("fstp %st(0)");
}
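
The masking done in is_normal can be checked against the condition code table with a portable C sketch that decodes a status word value, such as the one fstsw leaves in AX. The function names here are hypothetical; the bit positions (C0 = bit 8, C1 = bit 9, C2 = bit 10, C3 = bit 14) are the ones given above.

```c
#include <stdint.h>

/* Map the C3, C2 and C0 bits of a status word to the class FXAM reported. */
static const char *fxam_class(uint16_t sw)
{
    int c0 = (sw >> 8) & 1;
    int c2 = (sw >> 10) & 1;
    int c3 = (sw >> 14) & 1;

    switch ((c3 << 2) | (c2 << 1) | c0) {
    case 0:  return "unsupported";
    case 1:  return "NaN";
    case 2:  return "normal";
    case 3:  return "infinity";
    case 4:  return "zero";
    case 5:  return "empty";
    case 6:  return "denormal";
    default: return "undefined";   /* 1 1 1 is not listed in the table */
    }
}

/* C1 (bit 9) holds the sign of the examined value. */
static int fxam_is_negative(uint16_t sw) { return (sw >> 9) & 1; }

/* Same test as the assembler routine: keep C3, C2, C0 and require C2 alone. */
static int is_normal_sw(uint16_t sw) { return (sw & 0x4500) == 0x0400; }
```

The mask 0x4500 keeps exactly bits 14, 10 and 8, and 0x0400 is the pattern 0 1 0 from the "Normal Finite Number" row.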

 

Calling a Function With Asm

I can see no good reason for doing this when one is writing in C but here it is anyway.

#include <windows.h>
#include <stdio.h>

void MySetRect(RECT * rc, int left, int top, int right, int bottom)
{
    _asm("movl 8(%ebp),%eax"); // load first var
    _asm("movl 12(%ebp),%ecx"); // load left into ecx
    _asm("movl %ecx,(,%eax)"); // mov left into address pointed to by eax
    _asm("movl 16(%ebp),%ecx"); // load top
    _asm("movl %ecx,4(%eax)"); // mov top
    _asm("movl 20(%ebp),%ecx"); // load right
    _asm("movl %ecx,8(%eax)"); // mov right
    _asm("movl 24(%ebp),%ecx"); // load bottom
    _asm("movl %ecx,12(%eax)"); // mov bottom
}

int main(void)
{
    RECT rc;
    RECT * prc = &rc;
    int left = 10, top = 20, right = 400, bottom = 300;
    _asm("pushl %bottom");
    _asm("pushl %right");
    _asm("pushl %top");
    _asm("pushl %left");
    _asm("pushl %prc");
    _asm("call %MySetRect");
    _asm("addl $20, %esp");      // caller removes the five arguments
    printf("left %ld\n", rc.left);
    printf("top %ld\n", rc.top);
    printf("right %ld\n", rc.right);
    printf("bottom %ld\n", rc.bottom);
    return 0;
}

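The arguments should be pushed in reverse order, after which the routine is called with call %MySetRect. Because MySetRect uses the C calling convention, the caller must also remove the arguments from the stack after the call. The same method can be used to call Windows API functions, which use the stdcall convention and clean up the stack themselves.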

The If/Else/Endif Construct

First the if/endif construct without an else

{
    int a=8, b=8;
    _asm("movl %b, %ecx");   // load the two vars into regs
    _asm("movl %a, %eax");
_asm("If:");                 // not required - for demonstration only
    _asm("cmpl %ecx, %eax"); // compare the two
    _asm("jne Endif");       // jump to endif if (a!=b)
        " " "                // do something if-ish
_asm("Endif:");
}

Now the if/else/endif

As said previously, all control is achieved by jumps of one sort or another.

{
    int a=8, b=8;
    _asm("movl %b, %ecx");   // load the two vars into regs
    _asm("movl %a, %eax");
_asm("If:");                 // not required - for demonstration only
    _asm("cmpl %ecx, %eax"); // compare the two
    _asm("jne Else");        // if (a!=b) jump to else
       " " "                 // do something if-ish
    _asm("jmp Endif");       // jump to endif
_asm("Else:");
       " " "                 // do something else-ish
_asm("Endif:");
}
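
The same control flow can be written in plain C with goto, which maps one-to-one onto the compare and jump instructions above. This is just an illustrative sketch; the function name if_else_with_jumps and its return values are made up so that the branch taken can be observed.

```c
/* Returns 1 when the "if" branch ran and 2 when the "else" branch ran. */
static int if_else_with_jumps(int a, int b)
{
    int taken;

    if (a != b)          /* cmpl followed by jne Else */
        goto Else;
    taken = 1;           /* the if-ish work */
    goto Endif;          /* jmp Endif, skipping the else part */
Else:
    taken = 2;           /* the else-ish work */
Endif:
    return taken;
}
```

With a = 8 and b = 8 as in the listings above, the function returns 1; with different values it returns 2.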

 
