The flags register, also known as the condition codes register, comprises the 16 bits shown below. Strictly speaking, the 16-bit register is called FLAGS; the 32-bit version is called EFLAGS.
The carry, parity, zero, sign and overflow flags are used primarily with the conditional jump instructions, but you can also test them with the SETcc instructions and save or restore them with the pushf/popf instructions.
Instruction | Description |
---|---|
clc | Clears the Carry Flag. |
cld | Clears the Direction Flag, causing string instructions to increment the ESI and EDI index registers. |
std | Sets the Direction Flag to 1, causing string instructions to decrement ESI and EDI instead of incrementing them. |
clts | Clears the Task Switched flag in control register CR0 (the machine status word). This is a privileged operation and is generally used only by operating system code. |
cmc | Toggles (inverts) the Carry Flag. |
lahf | Copies bits 0-7 of the flags register into AH. This includes the flags AF, CF, PF, SF and ZF; the remaining bits are reserved. |
sahf | Transfers bits 0-7 of AH into the Flags Register. This includes AF, CF, PF, SF and ZF. |
pushf | Pushes the Flags Register onto the stack. |
popf | Pops a word from the stack into the Flags Register and then increments the stack pointer by 2 (the 32-bit form, popfd, pops a doubleword and adds 4). |
stc | Sets the Carry Flag to 1. |
cli | Disables the maskable hardware interrupts by clearing the Interrupt Flag. NMIs and software interrupts are not inhibited. |
sti | Sets the Interrupt Flag to 1, which enables recognition of maskable hardware interrupts. |
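The byte that lahf and sahf move can be pictured with a small C model. This is only a sketch: `pack_ah` and `ah_flag` are made-up helper names, and the code imitates the bit layout of the low FLAGS byte (CF = bit 0, PF = bit 2, AF = bit 4, ZF = bit 6, SF = bit 7, with bit 1 always reading as 1) rather than executing the real instructions.

```c
#include <stdint.h>

/* Hypothetical helper: build the byte lahf would place in AH.
   Bit layout of the low FLAGS byte: SF ZF 0 AF 0 PF 1 CF. */
uint8_t pack_ah(int cf, int pf, int af, int zf, int sf)
{
    uint8_t ah = 0x02;          /* bit 1 always reads as 1 */
    if (cf) ah |= 0x01;         /* bit 0: carry */
    if (pf) ah |= 0x04;         /* bit 2: parity */
    if (af) ah |= 0x10;         /* bit 4: auxiliary carry */
    if (zf) ah |= 0x40;         /* bit 6: zero */
    if (sf) ah |= 0x80;         /* bit 7: sign */
    return ah;
}

/* Hypothetical helper: read one flag back out of an AH image. */
int ah_flag(uint8_t ah, int bit)
{
    return (ah >> bit) & 1;
}
```

For instance, carry and zero set together would give an AH image of 0x43 in this model.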
The following tables show the conditional jump instructions and which flags are relevant to each instruction. At first glance there is a bewildering number of instructions for branching to another section of code; these tables help a little.
Instruction | Description | Condition | Opposite |
---|---|---|---|
JC | Jump if carry | Carry = 1 | JNC |
JNC | Jump if no carry | Carry = 0 | JC |
JZ | Jump if zero | Zero = 1 | JNZ |
JNZ | Jump if not zero | Zero = 0 | JZ |
JS | Jump if sign | Sign = 1 | JNS |
JNS | Jump if no sign | Sign = 0 | JS |
JO | Jump if overflow | Overflow = 1 | JNO |
JNO | Jump if no overflow | Overflow = 0 | JO |
JP | Jump if parity | Parity = 1 | JNP |
JPE | Jump if parity even | Parity = 1 | JPO |
JNP | Jump if no parity | Parity = 0 | JP |
JPO | Jump if parity odd | Parity = 0 | JPE |
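To make the conditions in this table concrete, here is a minimal C sketch that models how an 8-bit addition would set the five flags. The names `flags8`, `parity_even8` and `add8_flags` are illustrative only; the code mirrors the usual flag rules and does not run any real instruction.

```c
#include <stdint.h>

typedef struct { int cf, zf, sf, of, pf; } flags8;

/* Parity rule: 1 when the low byte holds an even number of 1 bits. */
int parity_even8(uint8_t v)
{
    int ones = 0;
    while (v) {
        ones += v & 1;
        v >>= 1;
    }
    return (ones % 2) == 0;
}

/* Model of the flags left behind by an 8-bit add of a and b. */
flags8 add8_flags(uint8_t a, uint8_t b)
{
    unsigned wide = (unsigned)a + (unsigned)b;
    uint8_t r = (uint8_t)wide;
    flags8 f;
    f.cf = wide > 0xFF;                      /* carry out of bit 7 */
    f.zf = (r == 0);                         /* result is zero */
    f.sf = (r >> 7) & 1;                     /* copy of the top bit */
    /* signed overflow: both operands share a sign the result lacks */
    f.of = ((~(a ^ b) & (a ^ r)) >> 7) & 1;
    f.pf = parity_even8(r);
    return f;
}
```

In this model 0xFF + 0x01 sets Carry and Zero (so JC and JZ would be taken), while 0x7F + 0x01 sets Sign and Overflow.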
Instruction | Description | Condition | Opposite |
---|---|---|---|
JA | Jump if above (>) | Carry=0, Zero=0 | JNA |
JNBE | Jump if not below or equal (not <=) | Carry=0, Zero=0 | JBE |
JAE | Jump if above or equal (>=) | Carry = 0 | JNAE |
JNB | Jump if not below (not <) | Carry = 0 | JB |
JB | Jump if below (<) | Carry = 1 | JNB |
JNAE | Jump if not above or equal (not >=) | Carry = 1 | JAE |
JBE | Jump if below or equal (<=) | Carry = 1 or Zero = 1 | JNBE |
JNA | Jump if not above (not >) | Carry = 1 or Zero = 1 | JA |
JE | Jump if equal (=) | Zero = 1 | JNE |
JNE | Jump if not equal (!=) | Zero = 0 | JE |
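The unsigned conditions above can be checked with a small C model of a 32-bit compare. This is a sketch under assumptions: `uflags`, `cmp_unsigned` and the `ja`/`jae`/`jb`/`jbe` predicates are invented names, and the model simply computes a - b and records the borrow as Carry.

```c
#include <stdint.h>

typedef struct { int cf, zf; } uflags;

/* Model of comparing a with b: the flags of the subtraction a - b. */
uflags cmp_unsigned(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a - b;         /* wraps when a < b */
    uflags f;
    f.cf = (int)((wide >> 32) & 1);          /* borrow becomes Carry */
    f.zf = ((uint32_t)wide == 0);
    return f;
}

/* Each predicate returns 1 when the jump of that name would be taken. */
int ja (uflags f) { return f.cf == 0 && f.zf == 0; }
int jae(uflags f) { return f.cf == 0; }
int jb (uflags f) { return f.cf == 1; }
int jbe(uflags f) { return f.cf == 1 || f.zf == 1; }
```

Note that 0xFFFFFFFF counts as "above" 1 here, because the above/below family treats the operands as unsigned.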
Instruction | Description | Condition | Opposite |
---|---|---|---|
JG | Jump if greater (>) | Sign = Overflow and Zero = 0 | JNG |
JNLE | Jump if not less than or equal (not <=) | Sign = Overflow and Zero = 0 | JLE |
JGE | Jump if greater than or equal (>=) | Sign = Overflow | JNGE |
JNL | Jump if not less than (not <) | Sign = Overflow | JL |
JL | Jump if less than (<) | Sign != Overflow | JNL |
JNGE | Jump if not greater or equal (not >=) | Sign != Overflow | JGE |
JLE | Jump if less than or equal (<=) | Sign != Overflow or Zero = 1 | JNLE |
JNG | Jump if not greater than (not >) | Sign != Overflow or Zero = 1 | JG |
JE | Jump if equal (=) | Zero = 1 | JNE |
JNE | Jump if not equal (!=) | Zero = 0 | JE |
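The signed conditions depend on the Sign and Overflow flags together. The following C sketch models them for a 32-bit compare; `sflags`, `cmp_signed` and the `jg`/`jge`/`jl`/`jle` predicates are hypothetical names chosen for illustration.

```c
#include <stdint.h>

typedef struct { int zf, sf, of; } sflags;

/* Model of comparing a with b as signed values: flags of a - b. */
sflags cmp_signed(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t r = ua - ub;                    /* wrap-around subtraction */
    sflags f;
    f.zf = (r == 0);
    f.sf = (int)((r >> 31) & 1);
    /* overflow: operands differ in sign and the result's sign differs from a */
    f.of = (int)((((ua ^ ub) & (ua ^ r)) >> 31) & 1);
    return f;
}

/* Each predicate returns 1 when the jump of that name would be taken. */
int jg (sflags f) { return f.zf == 0 && f.sf == f.of; }
int jge(sflags f) { return f.sf == f.of; }
int jl (sflags f) { return f.sf != f.of; }
int jle(sflags f) { return f.zf == 1 || f.sf != f.of; }
```

Comparing the most negative 32-bit value with 1 shows why Overflow matters: the subtraction wraps to a positive result (Sign = 0), yet Overflow = 1, so Sign != Overflow and the model still reports "less than".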
The 80x86 string instructions are generally used to copy or scan an array of bytes, words or doublewords. They are efficient and make these operations quite fast.
The direction flag decides which direction the scanning takes. When the direction flag is cleared, the string elements are accessed from low address to high address; when it is set, the string instructions work in the reverse direction. The instructions cld and std are used for this purpose.
Here is a routine to copy the bytes of a substring into a string. pos is the position at which to start and len is the length of the substring. The string instructions rep and movsb are used. There is no error checking.
void str_insert(char *szStr, char *szSub, int pos, int len)
{
_asm ("pushl %esi");
_asm ("pushl %edi");
_asm ("cld"); // cause addresses of esi & edi to increment
_asm ("movl 0x8(%ebp), %edi"); // address of szStr
_asm ("movl 0xc(%ebp), %esi"); // address of szSub
_asm ("add 0x10(%ebp), %edi"); // add start pos to address of szStr
_asm ("movl 0x14(%ebp), %ecx"); // size of sub string in ECX
_asm ("rep"); // repeat while ecx != 0
_asm ("movsb"); // copy the byte
_asm ("popl %edi");
_asm ("popl %esi");
}
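What rep/movsb and the direction flag do can be mirrored in plain C. The sketch below is a model, not a replacement for the routine above: `rep_movsb_model` and `str_insert_model` are invented names, and the `df` parameter stands in for the direction flag (0 after cld, 1 after std).

```c
#include <stddef.h>

/* Model of "rep movsb": copy ecx bytes from esi to edi, stepping both
   pointers by +1 when df is 0 (cld) and by -1 when df is 1 (std). */
void rep_movsb_model(unsigned char *edi, const unsigned char *esi,
                     size_t ecx, int df)
{
    ptrdiff_t step = df ? -1 : 1;
    while (ecx != 0) {                       /* rep: repeat while ecx != 0 */
        *edi = *esi;                         /* movsb: copy one byte */
        edi += step;
        esi += step;
        ecx--;
    }
}

/* Same job as str_insert, expressed through the model (no error checking). */
void str_insert_model(char *szStr, const char *szSub, int pos, int len)
{
    rep_movsb_model((unsigned char *)szStr + pos,
                    (const unsigned char *)szSub, (size_t)len, 0);
}
```

With `df` set, the pointers must start at the last byte of each range, which is the usual way to copy overlapping buffers toward higher addresses.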
There are eight floating point registers arranged as a stack and given the names st(0) to st(7). These registers handle floats, doubles and long doubles, although internally all three floating point types are converted to long double (80 bits) for every computation.
One starts any computation by loading a value onto the stack. The value goes into the first, or top, position on the stack; each subsequent load pushes the previous values down and places the new one at TOS (top of stack). The syntax convention for the three types uses 't' for long double, 'l' for double and 's' for float.
As an example we will load two values passed to function foo.
Using the __declspec(naked) qualifier means that you don't need to be concerned if it will be compiled optimised or not.
Many of the floating point instructions have two forms: one that just performs the required function or computation, and another that performs the same computation and then pops the stack. This is very useful and saves separate instructions to clean up the stack. Yes, the stack should be clean when you have finished the routine. If the stack becomes full and you try to load another value, errors will occur.
When your function has finished whatever it was doing, one value should be left on TOS; this will be popped automatically when the function returns with that value from TOS.
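The load and pop behaviour described above can be pictured with a toy model in C. This is a simplification under stated assumptions: the real FPU tracks its top-of-stack with a field in the status register, whereas the made-up `fpu_stack` type here just counts occupied slots, and `fpu_load`, `fpu_pop` and `fpu_st` are illustrative names.

```c
#include <stdbool.h>

#define FPU_SLOTS 8

typedef struct {
    long double reg[FPU_SLOTS];
    int depth;                               /* occupied slots, 0..8 */
} fpu_stack;

/* Model of a load: the new value becomes st(0). Fails when all 8 are in use. */
bool fpu_load(fpu_stack *s, long double v)
{
    if (s->depth == FPU_SLOTS)
        return false;                        /* stack overflow */
    s->reg[s->depth++] = v;
    return true;
}

/* Model of a popping store: remove st(0). Fails on an empty stack. */
bool fpu_pop(fpu_stack *s, long double *out)
{
    if (s->depth == 0)
        return false;                        /* stack underflow */
    *out = s->reg[--s->depth];
    return true;
}

/* st(i): i = 0 is the most recent load. The caller must keep i < depth. */
long double fpu_st(const fpu_stack *s, int i)
{
    return s->reg[s->depth - 1 - i];
}
```

After loading 1.5 and then 2.5, `fpu_st(&s, 0)` is 2.5 and `fpu_st(&s, 1)` is 1.5, and a ninth load is refused.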
We'll jump straight in with a small example that returns the highest value of two doubles.
The comments should help in understanding what's happening. As before, the first passed arg is at 8(%ebp) and, because a double is 8 bytes, the second arg is at 16(%ebp). The code first loads arg 2 to TOS and then loads arg 1 to TOS.
FCOMIP then compares the two values, sets or clears the carry flag depending on which value is greater, and pops the stack. If we leave via the _asm("jna fpmaxld"); instruction, 'b' is the greater (or the two are equal) and is the only value on the stack, so when the routine exits b will be returned and the stack will be clean. If however 'a' is the greater of the two values, the program falls through the _asm("jna fpmaxld"); instruction, TOS is popped, and we then load 'a' onto the stack so that it will be returned on exit.
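The decision described above can be sketched in portable C. This is not the original routine: `fcomi_flags`, `fcomi_model` and `fpmax_model` are hypothetical names, and the model only reproduces the flag results of comparing st(0) with st(1) (Carry when st(0) is smaller, Zero when equal, all three flags when either value is a NaN).

```c
typedef struct { int cf, zf, pf; } fcomi_flags;

/* Model of the flags produced by comparing st0 with sti. */
fcomi_flags fcomi_model(double st0, double sti)
{
    fcomi_flags f = { 0, 0, 0 };
    if (st0 != st0 || sti != sti) {          /* unordered: a NaN is involved */
        f.cf = 1; f.zf = 1; f.pf = 1;
    } else if (st0 < sti) {
        f.cf = 1;
    } else if (st0 == sti) {
        f.zf = 1;
    }
    return f;
}

/* Model of the max routine: st(0) = a, st(1) = b, then "jna". */
double fpmax_model(double a, double b)
{
    fcomi_flags f = fcomi_model(a, b);
    if (f.cf == 1 || f.zf == 1)              /* jna taken: a is not above b */
        return b;
    return a;                                /* fall through: a is greater */
}
```

One consequence visible in the model is that an unordered comparison takes the jna path, so b is returned whenever a NaN is involved.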
Bits 0-5 are the exception masks. If any of these bits are set, the corresponding condition is ignored by the FPU. If any bit is zero, and the corresponding condition occurs, then the FPU immediately generates an interrupt so the program can handle the exception condition.
Bit 0 is the invalid operation mask. Problems that raise an invalid operation exception include pushing more than eight values onto the stack, attempting to pop an item off an empty stack, trying to take the square root of a negative number, or loading a non-empty register.
Bit 1 masks the denormalised interrupt, which occurs if an attempt is made to use a denormalised value. A denormalised value is generally a very small number just beyond the range of the FPU's precision capabilities.
Bit 2 masks the zero divide exception. The FPU will generate an interrupt for a divide by zero or will return a signed infinity, depending on this mask.
Bit 3 masks the overflow exception. This can occur, for example, if one tries to store a long double in a float variable and the value is too large for the float.
Bit 4 masks the underflow exception. This can occur for example if one tries to store a long double in a float variable that is too small for the float.
Bit 5 controls whether the precision exception can occur. Floating point values are often an approximation and not an exact representation of the precise figure. This precision exception occurs if the FPU produces an imprecise result, generally the result of an internal rounding operation. Dividing one by ten (1/10) will produce an imprecise result, therefore this exception often occurs since imprecise results are very common.
Bits 6-7, 12-15 are reserved.
There are two instructions that allow the loading and storing of the control register word: FLDCW (load control word) and FSTCW (store control word). Both instructions work with memory, not CPU registers.
One way to store this control word is as follows:
_asm("fstcw %myshortvar"); // store fpu control register
Another way is to make room for the store offset from EBP
_asm("subl $8, %ebp"); // make room
_asm("fstcw 4(%ebp)"); // store fpu control register
Be sure to return EBP to the condition you found it after you've finished with it.
_asm("addl $8, %ebp"); // restore original ebp
Here is a routine to truncate a floating point value, that is, to round it toward zero.
long double trunc_l(long double a)
{
_asm("fldt 8(%ebp)"); // load a onto fpu stack
_asm("subl $8, %ebp"); // make room
_asm("fstcw 4(%ebp)"); // store fpu control register
_asm("movl $0x0C00, %eax"); // bits 10 & 11 of control register control the rounding (set bits 11 & 10 [trunc])
_asm("orl 4(%ebp), %eax"); // merge with the current control word
_asm("movl %eax, (%ebp)"); // store new word at memory
_asm("fldcw (%ebp)"); // load modified control register from memory
_asm("frndint"); // round st(0) to an integer using the new mode
_asm("fldcw 4(%ebp)"); // restore original control word
_asm("addl $8, %ebp"); // restore original ebp
}
Bits 11 & 10 | Function |
---|---|
00 | To nearest or even |
01 | Round down |
10 | Round up |
11 | Truncate |
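The bit manipulation on the control word can be tried out in plain C on an ordinary 16-bit value. The helpers below are a sketch with invented names (`cw_set_rounding`, `cw_get_rounding`, `cw_is_masked`); they only edit a copy of the word and never touch the real FPU. The value 0x037F used in the note afterwards is the control word the FPU holds after initialisation.

```c
#include <stdint.h>

enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TRUNC = 3 };

#define RC_MASK 0x0C00u                      /* bits 10 and 11 */

/* Return cw with its rounding field replaced by mode (0..3). */
uint16_t cw_set_rounding(uint16_t cw, unsigned mode)
{
    return (uint16_t)((cw & ~RC_MASK) | ((mode & 3u) << 10));
}

/* Extract the rounding field from cw. */
unsigned cw_get_rounding(uint16_t cw)
{
    return (cw & RC_MASK) >> 10;
}

/* Exception masks live in bits 0-5: 1 means the exception is masked. */
int cw_is_masked(uint16_t cw, unsigned bit)
{
    return (cw >> bit) & 1;
}
```

Starting from 0x037F, selecting `RC_TRUNC` yields 0x0F7F, which matches OR-ing in 0x0C00 as the assembly routine does. Unlike a plain OR, `cw_set_rounding` clears the field first, so it can also switch back to a mode with zero bits.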
Bits 0-5 of the status register are the exception flags. These bits are in the same order as the exception masks in the control register. If the corresponding condition exists, then the bit is set. These bits are independent of the exception masks in the control register: the FPU sets or clears them regardless of the corresponding mask setting.
Bit 6 indicates a stack fault. A stack fault occurs whenever there is a stack overflow or underflow. When this bit is set, the C1 condition code bit determines whether there was a stack overflow (C1=1) or stack underflow (C1=0) condition.
Bit 7 is set if any error condition bit is set. It is the logical OR of bits zero through five. A program can test this bit to quickly determine if an error condition exists.
Bits eight, nine, ten and fourteen are the coprocessor condition code bits C0, C1, C2 and C3 respectively.
Class | C3 | C2 | C0 |
---|---|---|---|
Unsupported | 0 | 0 | 0 |
NaN | 0 | 0 | 1 |
Normal Finite Number | 0 | 1 | 0 |
Infinity | 0 | 1 | 1 |
Zero | 1 | 0 | 0 |
Empty | 1 | 0 | 1 |
Denormal Number | 1 | 1 | 0 |
The condition code C1 (bit 9) is the sign bit and can be tested using BT (bit test) instruction.
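The bit positions in the status word can be decoded with a few shifts. The following C sketch works on a 16-bit copy of the status word such as the one fstsw leaves in AX; `cond_codes`, `sw_cond_codes` and `sw_stack_fault` are hypothetical names introduced for illustration.

```c
#include <stdint.h>

typedef struct { int c0, c1, c2, c3; } cond_codes;

/* Pull the four condition code bits out of a status word. */
cond_codes sw_cond_codes(uint16_t sw)
{
    cond_codes c;
    c.c0 = (sw >> 8) & 1;
    c.c1 = (sw >> 9) & 1;
    c.c2 = (sw >> 10) & 1;
    c.c3 = (sw >> 14) & 1;
    return c;
}

/* 0 = no stack fault, 1 = stack overflow, -1 = stack underflow. */
int sw_stack_fault(uint16_t sw)
{
    if (((sw >> 6) & 1) == 0)
        return 0;                            /* bit 6 clear: no fault */
    return ((sw >> 9) & 1) ? 1 : -1;         /* C1 tells the two apart */
}
```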
The FPU instruction FXAM tests the value in st(0) and sets these condition codes accordingly. Here is a routine to test if the value in st(0) is a normal finite number.
int is_normal(double a)
{
_asm("fldl 8(%ebp)");
_asm("fxam");
_asm("fstsw %ax");
_asm("andw $0x4500, %ax"); // mask out all but relevant bits
_asm("cmpw $0x400, %ax"); // check for bit 10 only
_asm("jnz notnorm5"); // not normal
_asm("movl $1, %eax"); // yes normal
_asm("jmp exit5");
_asm("notnorm5:");
_asm("xorl %eax, %eax"); // clear eax
_asm("exit5:");
_asm("fstp %st(0)");
}
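The mask-and-compare step in the routine can be checked against the class table with a small C model. It operates on a saved status word rather than on the FPU itself, and `fxam_class` and `sw_is_normal` are made-up names; the eighth bit pattern (C3 = C2 = C0 = 1) is not in the table above, so the sketch labels it "(not listed)".

```c
#include <stdint.h>

/* Map the C3, C2, C0 bits of a status word to the class names in the table. */
const char *fxam_class(uint16_t sw)
{
    static const char *names[8] = {
        "Unsupported", "NaN", "Normal Finite Number", "Infinity",
        "Zero", "Empty", "Denormal Number", "(not listed)"
    };
    unsigned c3 = (sw >> 14) & 1;
    unsigned c2 = (sw >> 10) & 1;
    unsigned c0 = (sw >> 8) & 1;
    return names[(c3 << 2) | (c2 << 1) | c0];
}

/* Same test as the assembly: keep C3, C2, C0 and require only C2 set. */
int sw_is_normal(uint16_t sw)
{
    return (sw & 0x4500) == 0x0400;
}
```

Because the 0x4500 mask drops C1, the sign of the value does not affect the result.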
I can see no good reason for doing this when one is writing in C but here it is anyway.
#include <windows.h>
#include <stdio.h>
void MySetRect(RECT *rc, int left, int top, int right, int bottom)
{
_asm("movl 8(%ebp),%eax"); // load first arg (pointer to the RECT) into eax
_asm("movl 12(%ebp),%ecx"); // load left into ecx
_asm("movl %ecx,(%eax)"); // mov left into address pointed to by eax
_asm("movl 16(%ebp),%ecx"); // load top
_asm("movl %ecx,4(%eax)"); // mov top
_asm("movl 20(%ebp),%ecx"); // load right
_asm("movl %ecx,8(%eax)"); // mov right
_asm("movl 24(%ebp),%ecx"); // load bottom
_asm("movl %ecx,12(%eax)"); // mov bottom
}
int main(void)
{
RECT rc;
RECT * prc = &rc;
int left = 10, top = 20, right = 400, bottom = 300;
_asm("pushl %bottom");
_asm("pushl %right");
_asm("pushl %top");
_asm("pushl %left");
_asm("pushl %prc");
_asm("call %MySetRect");
_asm("addl $20, %esp"); // remove the five arguments (assumes the default cdecl convention)
printf("left %ld\n", rc.left);
printf("top %ld\n", rc.top);
printf("right %ld\n", rc.right);
printf("bottom %ld\n", rc.bottom);
return 0;
}
The arguments should be pushed in reverse order, and finally the routine is called with %MySetRect. The same method can be used to call Windows API functions.
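Why the arguments are pushed in reverse, and why the first one then appears at 8(%ebp), can be seen in a toy model of the stack. This is a sketch under assumptions: it presumes the usual prologue that pushes the caller's EBP and copies ESP into EBP, it stores 4-byte words in an array, and `toy_stack`, `toy_push`, `toy_call` and `toy_at_ebp` are invented names.

```c
#include <stdint.h>

#define STACK_WORDS 16

typedef struct {
    int32_t mem[STACK_WORDS];
    int esp;                                 /* index of the top word; grows toward 0 */
    int ebp;
} toy_stack;

/* pushl: move the stack pointer down one word and store there. */
void toy_push(toy_stack *s, int32_t v)
{
    s->mem[--s->esp] = v;
}

/* Push argc arguments right to left, then what call and the prologue push. */
void toy_call(toy_stack *s, const int32_t *args, int argc)
{
    for (int i = argc - 1; i >= 0; i--)
        toy_push(s, args[i]);
    toy_push(s, 0);                          /* stand-in for the return address */
    toy_push(s, 0);                          /* stand-in for the saved EBP */
    s->ebp = s->esp;                         /* movl %esp, %ebp */
}

/* Read the word at byte offset off from EBP; off must be a multiple of 4. */
int32_t toy_at_ebp(const toy_stack *s, int off)
{
    return s->mem[s->ebp + off / 4];
}
```

Because the last argument is pushed first, it ends up furthest from EBP, and the first argument sits just above the return address at offset 8.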
First the if/endif construct without an else
{
int a=8, b=8;
_asm("movl %b, %ecx"); // load the two vars into regs
_asm("movl %a, %eax");
_asm("If:"); // not required - for demonstration only
_asm("cmpl %ecx, %eax"); // compare the two
_asm("jne Endif"); // jump to endif if (a!=b)
// do something if-ish
_asm("Endif:");
}
Now the if/else/endif
As said previously, all control is achieved by jumps of one sort or another.
{
int a=8, b=8;
_asm("movl %b, %ecx"); // load the two vars into regs
_asm("movl %a, %eax");
_asm("If:"); // not required - for demonstration only
_asm("cmpl %ecx, %eax"); // compare the two
_asm("jne Else"); // if (a!=b) jump to else
// do something if-ish
_asm("jmp Endif"); // jump to endif
_asm("Else:");
// do something else-ish
_asm("Endif:");
_asm("Endif:");
}
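The same jump structure can be written in plain C with goto, which may make the flow easier to follow before reading the assembly. This is only an illustrative sketch: `if_else_with_jumps` is a made-up function, and its return value exists just to show which branch ran.

```c
/* Returns 1 when the "if" branch ran and 2 when the "else" branch ran. */
int if_else_with_jumps(int a, int b)
{
    int taken = 0;
    if (a != b) goto Else;                   /* cmpl, then jne Else */
    taken = 1;                               /* do something if-ish */
    goto Endif;                              /* jmp Endif */
Else:
    taken = 2;                               /* do something else-ish */
Endif:
    return taken;
}
```

Leaving out the `goto Endif` line would let the if branch fall straight through into the else branch, which is why the unconditional jmp is needed in the assembly version.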