Assembler with LCC-Win32: Primer 2

Flags Register

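The flags register, also known as the condition codes register, holds the processor's status and control flags. Its low 16 bits are called FLAGS; the full 32-bit register is called EFLAGS. The status flags are CF (carry, bit 0), PF (parity, bit 2), AF (auxiliary carry, bit 4), ZF (zero, bit 6), SF (sign, bit 7) and OF (overflow, bit 11). The control and system flags TF (trap, bit 8), IF (interrupt enable, bit 9), DF (direction, bit 10), IOPL (I/O privilege level, bits 12-13) and NT (nested task, bit 14) occupy most of the remaining bits; bits 1, 3, 5 and 15 are reserved.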

The carry, parity, zero, sign and overflow flags are used mainly by the conditional jump instructions, but the SETcc instructions can also test them, and the pushf/popf instructions save and restore the whole flags register.
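To see where these flags come from, here is a sketch in portable C that computes the flags an 8-bit ADD would leave behind, using the bit positions listed above. It is an emulation for illustration only, not a way to read the real register, and the name add8_flags is hypothetical.

```c
#include <stdint.h>

/* FLAGS bit positions on 80x86 processors. */
enum {
    FLAG_CF = 1u << 0,   /* carry out of bit 7 */
    FLAG_PF = 1u << 2,   /* even number of 1 bits in the low byte of the result */
    FLAG_AF = 1u << 4,   /* carry out of bit 3 (used by the BCD instructions) */
    FLAG_ZF = 1u << 6,   /* result is zero */
    FLAG_SF = 1u << 7,   /* copy of the result's top bit */
    FLAG_OF = 1u << 11   /* signed overflow */
};

/* Hypothetical helper: the flags an 8-bit ADD of a and b would produce. */
uint16_t add8_flags(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;   /* up to 9 bits */
    uint8_t r = (uint8_t)sum;                   /* what lands in the register */
    uint16_t flags = 0;
    unsigned ones = 0;

    if (sum > 0xFF)                flags |= FLAG_CF;
    for (unsigned v = r; v != 0; v >>= 1)
        ones += v & 1u;
    if (ones % 2 == 0)             flags |= FLAG_PF;
    if ((a ^ b ^ r) & 0x10)        flags |= FLAG_AF;
    if (r == 0)                    flags |= FLAG_ZF;
    if (r & 0x80)                  flags |= FLAG_SF;
    /* overflow: both operands have the same sign and the result's sign differs */
    if (~(a ^ b) & (a ^ r) & 0x80) flags |= FLAG_OF;
    return flags;
}
```

For example, 0xFF + 0x01 sets CF, ZF and PF (the result is zero), while 0x7F + 0x01 sets SF and OF: two positive numbers produced a negative result.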

Flag Register Instructions

clc	Clears the Carry Flag
cld	Clears the Direction Flag causing string instructions to increment the ESI and EDI index registers.
std	Sets the Direction Flag to 1 causing string instructions to decrement ESI and EDI instead of increment.
clts	Clears the Task Switched (TS) flag in control register CR0, whose low 16 bits form the Machine Status Word. This is a privileged instruction and is generally used only by operating system code.
cmc	Toggles (inverts) the Carry Flag
lahf	Copies bits 0-7 of the flags register into AH: SF, ZF, AF, PF and CF, plus the reserved bits 1 (always 1), 3 and 5 (always 0).
sahf	Transfers AH into bits 0-7 of the flags register, loading SF, ZF, AF, PF and CF.
pushf	Pushes the flags register onto the stack (the 16-bit FLAGS, or the 32-bit EFLAGS with PUSHFD, pushfl in AT&T syntax).
popf	Pops the top of the stack into the flags register and increments ESP by the operand size: 2 for a 16-bit POPF, 4 for the 32-bit POPFD (popfl).
stc	Sets the Carry Flag to 1.
cli	Disables maskable hardware interrupts by clearing the Interrupt Flag. NMIs and software interrupts are not inhibited.
sti	Sets the Interrupt Flag to 1, which enables recognition of maskable hardware interrupts.
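As a sketch of what lahf, sahf and the carry flag instructions do to the bits, the functions below model them on a plain integer holding an EFLAGS value. The *_model names are hypothetical; real code would simply use the instructions.

```c
#include <stdint.h>

/* Low-byte flags that lahf/sahf transfer: SF(7) ZF(6) AF(4) PF(2) CF(0). */
#define LAHF_MASK 0xD5u

/* Value lahf would place in AH: the five flags, plus bit 1, which always
   reads as 1; reserved bits 3 and 5 read as 0. */
uint8_t lahf_model(uint32_t eflags)
{
    return (uint8_t)((eflags & LAHF_MASK) | 0x02u);
}

/* EFLAGS after sahf: only SF, ZF, AF, PF and CF are taken from AH. */
uint32_t sahf_model(uint32_t eflags, uint8_t ah)
{
    return (eflags & ~LAHF_MASK) | (ah & LAHF_MASK);
}

/* clc, stc and cmc only touch bit 0 (CF). */
uint32_t clc_model(uint32_t eflags) { return eflags & ~1u; }
uint32_t stc_model(uint32_t eflags) { return eflags | 1u; }
uint32_t cmc_model(uint32_t eflags) { return eflags ^ 1u; }
```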

Conditional Jump Instructions

The following tables show the conditional jump instructions and the flag or flags each one tests. At first glance there is a bewildering number of instructions for branching to another section of code; grouping them like this helps a little.

Jcc Instructions That Test Flags
Instruction	Description	Condition	Opposite
JC	Jump if carry	Carry = 1	JNC
JNC	Jump if no carry	Carry = 0	JC
JZ	Jump if zero	Zero = 1	JNZ
JNZ	Jump if not zero	Zero = 0	JZ
JS	Jump if sign	Sign = 1	JNS
JNS	Jump if no sign	Sign = 0	JS
JO	Jump if overflow	Overflow = 1	JNO
JNO	Jump if no overflow	Overflow = 0	JO
JP	Jump if parity	Parity = 1	JNP
JPE	Jump if parity even	Parity = 1	JPO
JNP	Jump if no parity	Parity = 0	JP
JPO	Jump if parity odd	Parity = 0	JPE
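Parity is the least obvious of these flags. A sketch in C of how the processor decides it (the name parity_flag is hypothetical):

```c
/* PF is set when the least significant byte of a result contains an even
   number of 1 bits; the higher bits are ignored. */
int parity_flag(unsigned result)
{
    unsigned low = result & 0xFFu;
    int ones = 0;
    while (low != 0) {
        ones += (int)(low & 1u);
        low >>= 1;
    }
    return ones % 2 == 0;   /* 1: parity even (JP/JPE taken), 0: odd (JNP/JPO taken) */
}
```

Note that 0x100 gives parity even: its low byte is zero, so it has no 1 bits at all.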

Jcc Instructions for Unsigned Comparisons
Instruction	Description	Condition	Opposite
JA	Jump if above (>)	Carry = 0 and Zero = 0	JNA
JNBE	Jump if not below or equal (not <=)	Carry = 0 and Zero = 0	JBE
JAE	Jump if above or equal (>=)	Carry = 0	JNAE
JNB	Jump if not below (not <)	Carry = 0	JB
JB	Jump if below (<)	Carry = 1	JNB
JNAE	Jump if not above or equal (not >=)	Carry = 1	JAE
JBE	Jump if below or equal (<=)	Carry = 1 or Zero = 1	JNBE
JNA	Jump if not above (not >)	Carry = 1 or Zero = 1	JA
JE	Jump if equal (=)	Zero = 1	JNE
JNE	Jump if not equal (!=)	Zero = 0	JE
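The unsigned conditions follow from how a compare sets the flags: comparing a with b computes a - b, so CF records a borrow (a is below b) and ZF records equality. The sketch below emulates that on 8-bit values and checks the table's conditions against C's own unsigned comparisons for every pair; cmp_cf_zf and check_unsigned_jcc are hypothetical names.

```c
#include <stdint.h>

/* CF and ZF as left by comparing a with b (computing a - b and discarding it). */
static void cmp_cf_zf(uint8_t a, uint8_t b, int *cf, int *zf)
{
    unsigned diff = (unsigned)a - (unsigned)b;   /* wraps around on a borrow */
    *cf = (int)((diff >> 8) & 1u);               /* borrow out of bit 7 */
    *zf = (diff & 0xFFu) == 0;
}

/* Returns 1 if the flag conditions agree with unsigned comparisons
   for every pair of 8-bit values. */
int check_unsigned_jcc(void)
{
    for (unsigned a = 0; a < 256; a++) {
        for (unsigned b = 0; b < 256; b++) {
            int cf, zf;
            cmp_cf_zf((uint8_t)a, (uint8_t)b, &cf, &zf);
            if ((!cf && !zf) != (a > b))  return 0;   /* JA  */
            if ((!cf)        != (a >= b)) return 0;   /* JAE */
            if (cf           != (a < b))  return 0;   /* JB  */
            if ((cf || zf)   != (a <= b)) return 0;   /* JBE */
            if (zf           != (a == b)) return 0;   /* JE  */
        }
    }
    return 1;
}
```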

Jcc Instructions for Signed Comparisons
Instruction	Description	Condition	Opposite
JG	Jump if greater (>)	Sign = Overflow and Zero = 0	JNG
JNLE	Jump if not less than or equal (not <=)	Sign = Overflow and Zero = 0	JLE
JGE	Jump if greater than or equal (>=)	Sign = Overflow	JNGE
JNL	Jump if not less than (not <)	Sign = Overflow	JL
JL	Jump if less than (<)	Sign != Overflow	JNL
JNGE	Jump if not greater or equal (not >=)	Sign != Overflow	JGE
JLE	Jump if less than or equal (<=)	Sign != Overflow or Zero = 1	JNLE
JNG	Jump if not greater than (not >)	Sign != Overflow or Zero = 1	JG
JE	Jump if equal (=)	Zero = 1	JNE
JNE	Jump if not equal (!=)	Zero = 0	JE
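For signed values the sign of a - b is not enough on its own, because the subtraction can overflow: in 8 bits, -128 - 1 wraps to +127. That is why the signed conditions compare SF with OF. The sketch below emulates the compare on 8-bit values and checks the table against C's signed comparisons for every pair; the function names are hypothetical.

```c
#include <stdint.h>

/* SF, OF and ZF as left by comparing a with b (computing a - b) in 8 bits. */
static void cmp_sf_of_zf(int8_t a, int8_t b, int *sf, int *of, int *zf)
{
    uint8_t ua = (uint8_t)a, ub = (uint8_t)b;
    uint8_t r = (uint8_t)(ua - ub);              /* 8-bit wrapped result */
    *sf = (r >> 7) & 1;
    *zf = r == 0;
    /* overflow: operands have different signs and the result's sign differs from a */
    *of = (((ua ^ ub) & (ua ^ r)) >> 7) & 1;
}

/* Returns 1 if the flag conditions agree with signed comparisons
   for every pair of 8-bit values. */
int check_signed_jcc(void)
{
    for (int a = -128; a < 128; a++) {
        for (int b = -128; b < 128; b++) {
            int sf, of, zf;
            cmp_sf_of_zf((int8_t)a, (int8_t)b, &sf, &of, &zf);
            if ((!zf && sf == of) != (a > b))  return 0;   /* JG  */
            if ((sf == of)        != (a >= b)) return 0;   /* JGE */
            if ((sf != of)        != (a < b))  return 0;   /* JL  */
            if ((zf || sf != of)  != (a <= b)) return 0;   /* JLE */
        }
    }
    return 1;
}
```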

String Instructions

The 80x86 string instructions are generally used to copy or scan arrays of bytes, words or doublewords (longs). With a repeat prefix a whole array is processed by a single instruction, which makes these operations compact and fast.

The direction flag decides which way the string instructions step through memory. When it is clear, ESI and EDI are incremented after each element, so the string is processed from low addresses to high; when it is set, they are decremented and the string is processed in reverse. The cld and std instructions clear and set the flag.
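Why the direction matters is easiest to see when source and destination overlap. The sketch below models rep movsb with byte offsets into a single buffer (rep_movsb_model is a hypothetical name). Copying 4 bytes of "abcdef" two places higher with the flag clear re-reads bytes it has just written and gives "ababab"; starting at the last byte with the flag set gives the intended "ababcd".

```c
#include <stddef.h>

/* Models rep movsb inside one buffer: edi and esi are byte offsets into mem,
   ecx is the count and df the direction flag (0 after cld, 1 after std). */
void rep_movsb_model(unsigned char *mem, size_t edi, size_t esi, size_t ecx, int df)
{
    while (ecx != 0) {
        mem[edi] = mem[esi];          /* copy one byte */
        if (df) { edi--; esi--; }     /* std: walk from high to low addresses */
        else    { edi++; esi++; }     /* cld: walk from low to high addresses */
        ecx--;
    }
}
```

For the reverse copy, ESI and EDI start at the last byte of each area (offsets 3 and 5 in the example), just as they must when std is used with the real instruction.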

Here is a routine that copies the bytes of a substring into a string, overwriting the characters already there: pos is the offset in szStr at which to start and len is the number of bytes to copy. It uses the rep prefix with the movsb string instruction. There is no error checking.

void str_insert(char *szStr, char *szSub, int pos, int len)
{
    _asm ("pushl %esi");
    _asm ("pushl %edi");
    _asm ("cld");                  // cause addresses of esi & edi to increment
    _asm ("movl 0x8(%ebp), %edi"); // address of szStr
    _asm ("movl 0xc(%ebp), %esi"); // address of szSub
    _asm ("addl 0x10(%ebp), %edi"); // add start pos to address of szStr
    _asm ("movl 0x14(%ebp), %ecx"); // number of bytes to copy (len) in ECX
    _asm ("rep");                  // repeat while ecx != 0
    _asm ("movsb");                // copy the byte
    _asm ("popl %edi");
    _asm ("popl %esi");
}
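For comparison, here is a sketch of the same operation in plain C with the error checking the assembler version leaves out. The name str_insert_checked and the choice of size_t for pos and len are assumptions for illustration; it returns -1 instead of writing past the end of szStr or reading past the end of szSub.

```c
#include <string.h>

/* Hypothetical plain-C counterpart of str_insert: copies len bytes of szSub
   over szStr starting at offset pos. Returns 0 on success, -1 if the copy
   would run past the end of either string. */
int str_insert_checked(char *szStr, const char *szSub, size_t pos, size_t len)
{
    size_t str_len = strlen(szStr);
    if (pos > str_len || len > str_len - pos || len > strlen(szSub))
        return -1;
    memcpy(szStr + pos, szSub, len);   /* same effect as cld; rep movsb */
    return 0;
}
```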

Floating Point

There are eight floating point registers, arranged as a stack and named st(0) to st(7). They handle floats, doubles and long doubles, although internally all three types are converted to the 80-bit extended format (long double) for computation.

A computation starts by loading a value onto the stack. The value goes into the top position, st(0), also called TOS (top of stack); each subsequent load pushes the values already there down one place (st(0) becomes st(1), and so on) and puts the new value at TOS. The instruction suffix gives the operand type: 's' for float (flds), 'l' for double (fldl) and 't' for long double (fldt).
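A simplified model of the register stack in C may help picture what a load does. Everything in it (FpuStack, fpu_push, fpu_pop, fpu_st) is hypothetical, and it ignores every tag except "empty" as well as all exception masking; it only shows how the TOP index moves and how st(i) is counted from it.

```c
#include <stdbool.h>

/* Simplified x87 register stack: eight slots, a TOP index and an
   "empty" tag per slot. st(i) is the slot i places below TOP. */
typedef struct {
    long double reg[8];
    bool empty[8];
    int top;
} FpuStack;

void fpu_init(FpuStack *s)
{
    for (int i = 0; i < 8; i++)
        s->empty[i] = true;
    s->top = 0;
}

/* Like a load: TOP moves down one slot and the new value becomes st(0).
   Returns false (stack overflow, an invalid operation) if that slot is in use. */
bool fpu_push(FpuStack *s, long double v)
{
    int t = (s->top + 7) % 8;      /* TOP - 1, wrapping around */
    if (!s->empty[t])
        return false;
    s->top = t;
    s->reg[t] = v;
    s->empty[t] = false;
    return true;
}

/* Like a store-and-pop: returns st(0) and frees its slot.
   Returns false (stack underflow) if the stack is empty. */
bool fpu_pop(FpuStack *s, long double *out)
{
    if (s->empty[s->top])
        return false;
    *out = s->reg[s->top];
    s->empty[s->top] = true;
    s->top = (s->top + 1) % 8;
    return true;
}

/* st(i): the i-th register counting down from the top of the stack. */
long double fpu_st(const FpuStack *s, int i)
{
    return s->reg[(s->top + i) % 8];
}
```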

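Using the __declspec(naked) qualifier means that you don't need to be concerned about whether the function is compiled with optimisation or not: the compiler generates no prologue or epilogue code for a naked function, so the code is exactly what you write, but you must then set up the stack frame and return yourself.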

Many of the floating point instructions have two forms: one that just performs the operation and one that performs it and then pops the stack (for example fst and fstp, fadd and faddp, fcomi and fcomip). The popping forms are very useful because they save separate instructions to clean up the stack, and the stack should be clean when you have finished the routine. If all eight registers are in use and you try to load another value, a stack overflow occurs, which is reported as an invalid operation exception.

When a function that returns a floating point value has finished, exactly one value, the result, should be left at TOS. The function returns that value in st(0), and the calling code takes it from there and pops it.

Floating Point Example

We'll jump straight in with a small example that returns the larger of two doubles.

The comments should help in understanding what's happening. As before, the first argument is at 8(%ebp) and, because a double is 8 bytes, the second argument is at 16(%ebp). The code first loads argument 2 (b) onto the stack and then argument 1 (a), so a ends up in st(0) and b in st(1).

FCOMIP compares st(0) (a) with st(1) (b), sets the zero, parity and carry flags the way an unsigned comparison would, and pops the stack. If the _asm("jna fpmaxld"); branch is taken, a is not above b, so b is the greater (or equal) value and is the only value left on the stack: when the routine exits, b is returned and the stack is clean. If a is the greater of the two values, the program falls through the _asm("jna fpmaxld"); instruction, the remaining value b is popped, and a is loaded onto the stack so that it is returned on exit.
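The decision the example makes can be modelled in C. fcomi_model reproduces the flag settings FCOMI/FCOMIP produce and fpmax_model applies the same JNA test; both names are hypothetical, and this only mirrors the logic rather than executing the FPU instructions.

```c
/* Flags FCOMI/FCOMIP leave after comparing st(0) with st(1): ZF, PF and CF
   are set as an unsigned compare would set them; all three are set if either
   value is a NaN (an unordered comparison). */
typedef struct { int zf, pf, cf; } FcomiFlags;

FcomiFlags fcomi_model(double st0, double st1)
{
    FcomiFlags f = {0, 0, 0};
    if (st0 != st0 || st1 != st1) {   /* unordered: at least one NaN */
        f.zf = f.pf = f.cf = 1;
    } else if (st0 < st1) {
        f.cf = 1;
    } else if (st0 == st1) {
        f.zf = 1;
    }
    return f;
}

/* Same decision as the example: a is in st(0), b in st(1).
   JNA jumps when CF = 1 or ZF = 1, keeping b; falling through keeps a. */
double fpmax_model(double a, double b)
{
    FcomiFlags f = fcomi_model(a, b);
    if (f.cf || f.zf)   /* jna taken: a is not above b */
        return b;
    return a;           /* fell through: a is greater */
}
```

One consequence of the flag settings: if either argument is a NaN, all three flags are set, the JNA branch is taken and b is returned.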

The FPU Control Register

Bits 0-5 are the exception masks. When a mask bit is set, the FPU does not trap on the corresponding condition; it handles the condition itself and supplies a default result. When a mask bit is clear and the condition occurs, the FPU signals an exception (an interrupt, reported at the next floating point instruction) so the program can handle it.

Bit 0 is the invalid operation mask. Conditions that raise an invalid operation exception include pushing more than eight values onto the stack (loading into a register that is not empty), popping a value off an empty stack, and taking the square root of a negative number.

Bit 1 masks the denormalised operand exception, which occurs when an attempt is made to operate on a denormalised value. A denormalised value is a number so close to zero that it cannot be stored in normalised form, so it is kept with reduced precision.

Bit 2 masks the zero divide exception. Depending on this mask, a division by zero either generates an interrupt or produces a signed infinity.

Bit 3 masks the overflow exception. This can occur, for example, when storing a long double value that is too large to be represented as a float into a float variable.

Bit 4 masks the underflow exception. This can occur, for example, when storing a long double value that is too close to zero to be represented as a normal float into a float variable.

Bit 5 masks the precision exception. Floating point values are often an approximation rather than an exact representation of a figure, and the precision exception occurs whenever the FPU produces an imprecise result, generally because of internal rounding. Dividing one by ten (1/10), for example, gives an imprecise result because 0.1 has no exact binary representation; imprecise results are so common that this exception occurs very often.
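Most programs leave these exceptions masked, so it is worth knowing what the masked responses look like. The sketch below uses standard C instead of inline assembler and assumes IEEE 754 arithmetic; the helper names are hypothetical, and on current compilers the arithmetic may run on SSE rather than the x87 FPU, but the IEEE default results are the same: an infinity for a zero divide or an overflow, a NaN for an invalid operation, and a denormal (subnormal) number for gradual underflow.

```c
#include <math.h>

/* Each helper provokes one condition with the exceptions masked, so the
   default result is returned instead of a trap. volatile stops the compiler
   from folding the arithmetic at compile time. */
double masked_zero_divide(void) { volatile double one = 1.0, zero = 0.0; return one / zero; }
double masked_invalid(void)     { volatile double zero = 0.0; return zero / zero; }
double masked_overflow(void)    { volatile double big = 1e308; return big * 10.0; }
double masked_underflow(void)   { volatile double tiny = 1e-308; return tiny / 1e10; }
```

The results can be checked with the standard classification macros: isinf for the first and third, isnan for the second, and fpclassify(x) == FP_SUBNORMAL for the last.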

There are two instructions that load and store the control word: FLDCW (load control word) and FSTCW (store control word). Both work with memory, not CPU registers, so the word needs some room on the stack:

_asm("subl $8, %esp"); // make room on the stack
_asm("fstcw 4(%esp)"); // store fpu control word at 4(%esp)

Be sure to give the space back by adding the same amount to ESP when you have finished with it, so that ESP is left as you found it.

long double trunc_l(long double a)
{
    _asm("fldt 8(%ebp)");       // load a onto fpu stack
    _asm("subl $8, %esp");      // make room for two words on the stack
    _asm("fstcw 4(%esp)");      // save original fpu control word at 4(%esp)
    _asm("movl $0x0C00, %eax"); // bits 10 & 11 of the control word select the
                                // rounding mode (both set = truncate)
    _asm("orl 4(%esp), %eax");  // combine with the original control word
    _asm("movl %eax, (%esp)");  // store the new word at (%esp)
    _asm("fldcw (%esp)");       // load the modified control word
    _asm("frndint");            // round st(0) to an integer (truncating)
    _asm("fldcw 4(%esp)");      // restore the original control word
    _asm("addl $8, %esp");      // release the space, restoring esp
}                               // the result is returned in st(0)
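ORing in 0x0C00 works for truncation because that mode sets both rounding bits; selecting any other mode also requires clearing the field first. A sketch of that bit manipulation in C (with_rounding_control and the RC_* names are hypothetical; 0x037F is the control word FINIT loads):

```c
#include <stdint.h>

/* Rounding control occupies bits 10-11 of the x87 control word. */
#define FPU_RC_MASK 0x0C00u

enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNCATE = 3 };

/* Hypothetical helper: returns cw with its rounding control field set to mode. */
uint16_t with_rounding_control(uint16_t cw, unsigned mode)
{
    return (uint16_t)((cw & ~FPU_RC_MASK) | ((mode & 3u) << 10));
}
```

Starting from 0x037F, truncation gives 0x0F7F (what trunc_l loads), while round down gives 0x077F and round up 0x0B7F.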

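Rounding Control

Bits 10 and 11 of the control register select the rounding mode the FPU applies when a result cannot be represented exactly, and when FRNDINT rounds st(0) to an integer. Setting both bits, as trunc_l does, selects truncation toward zero.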

The FPU Status Register

Bits 0-5 are the exception flags. They are in the same order as the exception masks in the control register, and a bit is set when the corresponding condition occurs. The FPU sets these bits regardless of the corresponding mask settings in the control register, and they stay set until cleared (for example with FCLEX or FINIT).

Bit 6 indicates a stack fault. A stack fault occurs whenever there is a stack overflow or underflow. When this bit is set, the C1 condition code bit determines whether there was a stack overflow (C1=1) or stack underflow (C1=0) condition.

Bit 7, the error summary bit, is set if any unmasked exception flag in bits 0-5 is set. A program can test this one bit to find out quickly whether an error condition exists.
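Once the status word has been stored (for example with fstsw), decoding it is ordinary bit testing. A sketch in C, with hypothetical helper names, that distinguishes stack overflow from underflow using C1 as described above:

```c
#include <stdint.h>

/* Status word bits described above. */
#define FSW_EXCEPTIONS 0x003Fu   /* bits 0-5: the exception flags */
#define FSW_SF         0x0040u   /* bit 6: stack fault */
#define FSW_C1         0x0200u   /* bit 9: condition code C1 */

/* Hypothetical helper: 0 = no stack fault, 1 = stack overflow, -1 = stack underflow. */
int stack_fault_kind(uint16_t sw)
{
    if (!(sw & FSW_SF))
        return 0;
    return (sw & FSW_C1) ? 1 : -1;
}

/* Nonzero if any exception flag in bits 0-5 is set. */
int any_exception_flag(uint16_t sw)
{
    return (sw & FSW_EXCEPTIONS) != 0;
}
```

A stack fault is always accompanied by the invalid operation flag (bit 0), which is why the examples in the test use 0x0041 and 0x0241.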

Status Register Condition Codes
Class	C3	C2	C0
Unsupported	0	0	0
NaN	0	0	1
Normal Finite Number	0	1	0
Infinity	0	1	1
Zero	1	0	0
Empty	1	0	1
Denormal Number	1	1	0

After FXAM, condition code C1 (bit 9) holds the sign of the value in st(0); once the status word has been copied to AX it can be tested with the BT (bit test) instruction.

The FPU instruction FXAM examines the value in st(0) and sets these condition codes accordingly. C0, C2 and C3 are bits 8, 10 and 14 of the status word, so masking it with 0x4500 isolates them. Here is a routine that tests whether a double is a normal finite number.

int is_normal(double a)
{
    _asm("fldl 8(%ebp)");      // load a onto the fpu stack
    _asm("fxam");              // classify st(0)
    _asm("fstsw %ax");         // copy the status word to ax
    _asm("andw $0x4500, %ax"); // keep only C3, C2 and C0
    _asm("cmpw $0x400, %ax");  // normal finite: C2 set, C3 and C0 clear
    _asm("jnz notnorm5");      // not normal
    _asm("movl $1, %eax");     // yes normal
    _asm("jmp exit5");
_asm("notnorm5:");
    _asm("xorl %eax, %eax");   // clear eax: not normal
_asm("exit5:");
    _asm("fstp %st(0)");       // pop a, leaving the fpu stack clean
}
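The same decoding, written as a C sketch that takes the status word fstsw would store (fxam_class and the FX_* names are hypothetical). The mask 0x4500 keeps C3, C2 and C0 exactly as is_normal does, and C1 gives the sign:

```c
#include <stdint.h>

enum FxamClass {
    FX_UNSUPPORTED, FX_NAN, FX_NORMAL, FX_INFINITY,
    FX_ZERO, FX_EMPTY, FX_DENORMAL, FX_UNKNOWN
};

/* Hypothetical decoder for the FXAM condition codes:
   C0 = bit 8, C1 = bit 9 (sign), C2 = bit 10, C3 = bit 14. */
enum FxamClass fxam_class(uint16_t sw)
{
    switch (sw & 0x4500u) {            /* keep only C3, C2 and C0 */
    case 0x0000: return FX_UNSUPPORTED;
    case 0x0100: return FX_NAN;
    case 0x0400: return FX_NORMAL;
    case 0x0500: return FX_INFINITY;
    case 0x4000: return FX_ZERO;
    case 0x4100: return FX_EMPTY;
    case 0x4400: return FX_DENORMAL;
    default:     return FX_UNKNOWN;    /* C3, C2 and C0 all set: not in the table */
    }
}

/* C1 holds the sign of st(0) after FXAM. */
int fxam_is_negative(uint16_t sw)
{
    return (sw & 0x0200u) != 0;
}
```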

Calling a Function With Asm

I can see no good reason for doing this when one is writing in C, but here it is anyway.

#include <windows.h>   // RECT
#include <stdio.h>     // printf

void MySetRect(RECT * rc, int left, int top, int right, int bottom)
{
    _asm("movl 8(%ebp),%eax"); // load rc, the address of the RECT
    _asm("movl 12(%ebp),%ecx"); // load left into ecx
    _asm("movl %ecx,(%eax)"); // mov left into address pointed to by eax
    _asm("movl 16(%ebp),%ecx"); // load top
    _asm("movl %ecx,4(%eax)"); // mov top
    _asm("movl 20(%ebp),%ecx"); // load right
    _asm("movl %ecx,8(%eax)"); // mov right
    _asm("movl 24(%ebp),%ecx"); // load bottom
    _asm("movl %ecx,12(%eax)"); // mov bottom
}

int main(void)
{
    RECT rc;
    RECT * prc = &rc;
    int left = 10, top = 20, right = 400, bottom = 300;
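    _asm("pushl %bottom");      // push the arguments right to left
    _asm("pushl %right");
    _asm("pushl %top");
    _asm("pushl %left");
    _asm("pushl %prc");         // the first argument is pushed last
    _asm("call %MySetRect");
    _asm("addl $20, %esp");     // remove the 5 four-byte arguments (cdecl)
    printf("left %ld\n", rc.left);
    printf("top %ld\n", rc.top);
    printf("right %ld\n", rc.right);
    printf("bottom %ld\n", rc.bottom);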
    return 0;
}

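Push the arguments in reverse order, last argument first, and then call the routine with %MySetRect. Because MySetRect uses the C calling convention, the caller removes the arguments afterwards by adding 4 bytes per argument to ESP. The same method can be used to call Windows APIs; most of them use the stdcall convention, in which the called function removes its own arguments, so no ESP adjustment is needed after the call.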
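The 8(%ebp), 12(%ebp), ... offsets used throughout this page follow from that push order. A sketch of the arithmetic in C, assuming 32-bit cdecl with the usual push ebp / mov ebp, esp frame (arg_offsets is a hypothetical name). It reproduces 8 to 24 for MySetRect's five 4-byte arguments, and 8 and 16 for two doubles as in the floating point example.

```c
#include <stddef.h>

/* Hypothetical helper: after "push ebp / mov ebp, esp" the return address is
   at 4(%ebp) and the first argument at 8(%ebp). Arguments are pushed last to
   first, so each one sits just above the previous one, with every size
   rounded up to a 4-byte stack slot. */
void arg_offsets(const size_t *sizes, size_t count, size_t *offsets)
{
    size_t off = 8;
    for (size_t i = 0; i < count; i++) {
        offsets[i] = off;
        off += (sizes[i] + 3) / 4 * 4;   /* round up to a multiple of 4 */
    }
}
```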

The If/Else/Endif Construct

{
    int a=8, b=8;
    _asm("movl %b, %ecx");   // load the two vars into regs
    _asm("movl %a, %eax");
_asm("If:");                 // not required - for demonstration only
    _asm("cmpl %ecx, %eax"); // compare the two
    _asm("jne Endif");       // jump to endif if (a!=b)
        // ... do something if-ish ...
_asm("Endif:");
}

{
    int a=8, b=8;
    _asm("movl %b, %ecx");   // load the two vars into regs
    _asm("movl %a, %eax");
_asm("If:");                 // not required - for demonstration only
    _asm("cmpl %ecx, %eax"); // compare the two
    _asm("jne Else");        // if (a!=b) jump to else
       // ... do something if-ish ...
    _asm("jmp Endif");       // jump to endif
_asm("Else:");
       // ... do something else-ish ...
_asm("Endif:");
}
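The same control flow written in C with goto, to show how the jumps map onto an ordinary if/else (the function name is hypothetical; a compiler generates much the same jumps from a plain if/else):

```c
/* Hypothetical C rendering of the If/Else/Endif jumps: the condition is
   tested once and "jne Else" becomes a goto past the if-branch. */
int if_else_with_jumps(int a, int b)
{
    int branch;
    if (a != b)          /* cmpl %ecx, %eax ; jne Else */
        goto Else;
    branch = 1;          /* do something if-ish */
    goto Endif;          /* jmp Endif */
Else:
    branch = 2;          /* do something else-ish */
Endif:
    return branch;
}
```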

Bits 10 & 11	Rounding mode
00	Round to nearest (ties to even)
01	Round down (toward minus infinity)
10	Round up (toward plus infinity)
11	Truncate (toward zero)
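To compare the four settings, here is a sketch that applies each of them when rounding a value to an integer, as FRNDINT does. It is written in portable C rather than by changing the control word. round_with_mode is a hypothetical name, and the sketch assumes the value's magnitude is below 2^62 so that it fits in a long long.

```c
/* Hypothetical sketch of the four rounding modes applied to an integer
   result. mode is the value of bits 11-10. Assumes |x| < 2^62. */
double round_with_mode(double x, int mode)
{
    double t  = (double)(long long)x;        /* truncate toward zero */
    double fl = (t > x) ? t - 1.0 : t;       /* round down (floor) */
    double ce = (t < x) ? t + 1.0 : t;       /* round up (ceiling) */
    switch (mode) {
    case 0: {                                /* 00: nearest, ties to even */
        double diff = x - fl;
        if (diff < 0.5) return fl;
        if (diff > 0.5) return ce;
        return ((long long)fl % 2 == 0) ? fl : ce;
    }
    case 1:  return fl;                      /* 01: round down */
    case 2:  return ce;                      /* 10: round up */
    default: return t;                       /* 11: truncate */
    }
}
```

The modes differ most visibly on negative numbers and on ties: -2.7 becomes -3 when rounding down but -2 when truncating, and 2.5 rounds to 2 (the even neighbour) under the default mode.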

Assembler with LCC-Win32

Primer 2:

Flags Register