Please let me know of any mistakes in these notes and I will correct them. Contributions, whether writing or asm code, are also welcome by e-mail.
You can download information about the instruction set here.
ftp://download.intel.com/design/pentiumii/manuals/24319102.pdf
Gaining Access To Parameters Passed To Functions.
First let me say that if you don't have an op-codes help file or document - get one.
Writing assembler with lcc-win32 requires an extended syntax that is a little clumsy compared with writing code for one of the other assemblers. The advantage is that all of one's code can be debugged in Wedit's comfortable debugger: provided each assembler instruction is written on a separate line, one can step through the code just as one would with C code.
For example, one could write the instructions in a single string, ending each with a newline character, so that all the asm instructions become one long line.
_asm("fldt 8(%ebp)\n\
fld %st(0) \n\
fmul %st \n\
fld1 \n\
fsubp \n\
fsqrt \n\
fxch %st(1) \n\
fpatan \n");
This is quite neat, but unfortunately the debugger sees it as one long line, so using 'step' (F8) executes all of those instructions as if they were a single line, at least as far as the debugger is concerned.
The same code written as separate lines would be as follows.
_asm("fldt 8(%ebp)");
_asm("fld %st");
_asm("fmul %st(0)");
_asm("fld1");
_asm("fsubp");
_asm("fsqrt");
_asm("fxch %st(1)");
_asm("fpatan");
Now, each separate instruction can be 'stepped' through and if necessary one can note how the MPU's registers are changing to reflect what each instruction has done.
The extra characters _asm(" and "); can be inserted quickly with a Wedit keyboard macro, so that one keystroke produces _asm(""); and the instruction can then be typed in without too much bother.
Assuming one is compiling without optimisation and that integer variables are being passed, the first parameter is at 8(%ebp), the second at 12(%ebp) and so on. Storing the first parameter in, say, eax is accomplished like so:
_asm("movl 8(%ebp), %eax");
The second in ecx
_asm("movl 12(%ebp), %ecx");
These offsets from ebp are valid when compiling without optimisation.
When optimised one needs a different approach that I will show later.
When the registers are used to hold one's own values, this can corrupt values that the compiler is relying on to continue executing other code within the program.
Here is a representation of the normal registers showing the relationship of their different sizes and byte orientation. Looking at EAX
AX is the lower word of the EAX long word,
AH is the upper byte of the AX word and
AL is the lower byte of the AX word.
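The relationship between EAX, AX, AH and AL can be modelled in portable C with shifts and masks. This is only a sketch: the function names view_ax, view_ah, view_al and check_register_views are made up for illustration and are not part of lcc-win32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers that model the register views in portable C. */
uint16_t view_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* lower word of EAX */
uint8_t  view_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* upper byte of AX  */
uint8_t  view_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* lower byte of AX  */

/* Self-check: EAX = 0x12345678 gives AX = 0x5678, AH = 0x56, AL = 0x78. */
void check_register_views(void)
{
    uint32_t eax = 0x12345678u;
    assert(view_ax(eax) == 0x5678u);
    assert(view_ah(eax) == 0x56u);
    assert(view_al(eax) == 0x78u);
}
```

Calling check_register_views() from a test program exercises all three views. The same layout applies to EBX, ECX and EDX.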
EAX | Accumulator | most arithmetic and logical computations |
EBX | Base | commonly used to hold indirect addresses |
ECX | Count | use it to count off the number of iterations in a loop |
EDX | Data | holds the overflow from certain arithmetic operations |
ESI | Source Index | use these two registers esi & edi as pointers |
EDI | Destination Index | also use with string instructions |
EBP | Base Pointer | access parameters and local variables |
ESP | Stack Pointer | generally not altered |
CS | Code Segment | points at the segment containing the currently executing code |
DS | Data Segment | points to the data segment of running process |
ES | Extra Segment | |
FS | ||
GS | ||
SS | Stack Segment | |
EFLAGS | Condition Codes | FLAGS -- carry, parity, zero, sign, and overflow flags |
EBX, ESI, EDI, EBP and ESP should always be saved and restored if they are used or changed in your routine. Although you are at liberty to use registers for anything, certain registers are best used for specific purposes; for example, the jump instruction jecxz tests for zero in ECX.
Always push ebx, esi, edi, ebp and esp if at any time they are changed by your code, and pop them when you have finished using them, and most certainly before leaving the routine in which they were changed.
At the start of a routine you might use one or more of these registers
int foo(int a, int b)
{
    _asm("pushl %ebx");          // save ebx
    _asm("pushl %edi");          // save edi
    _asm("movl 8(%ebp), %ebx");
    // more stuff
    _asm("popl %edi");           // restore edi
    _asm("popl %ebx");           // restore ebx
}
Always pop in the reverse order of pushing.
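Why the order matters can be seen with a small last-in, first-out model in portable C. This is a sketch under stated assumptions: model_stack, model_push, model_pop and restore_matches are illustrative names, and the array only imitates the behaviour of pushl and popl.

```c
#include <assert.h>
#include <stdint.h>

/* Tiny last-in, first-out model of the stack; all names are illustrative. */
typedef struct {
    uint32_t slot[16];
    int count;                /* number of values currently stored */
} model_stack;

void model_push(model_stack *s, uint32_t value)
{
    assert(s->count < 16);
    s->slot[s->count++] = value;
}

uint32_t model_pop(model_stack *s)
{
    assert(s->count > 0);
    return s->slot[--s->count];
}

/* Pushes ebx then edi, clobbers both, then pops them back.
   Returns 1 when both registers hold their original values again. */
int restore_matches(uint32_t ebx, uint32_t edi, int pop_in_reverse)
{
    model_stack s = { {0}, 0 };
    uint32_t saved_ebx = ebx, saved_edi = edi;

    model_push(&s, ebx);          /* pushl %ebx */
    model_push(&s, edi);          /* pushl %edi */
    ebx = 0;                      /* the routine uses the registers */
    edi = 0;
    if (pop_in_reverse) {
        edi = model_pop(&s);      /* popl %edi : last pushed, first popped */
        ebx = model_pop(&s);      /* popl %ebx */
    } else {
        ebx = model_pop(&s);      /* wrong order: ebx receives the old edi */
        edi = model_pop(&s);
    }
    return ebx == saved_ebx && edi == saved_edi;
}
```

With two different values, restore_matches(0x11, 0x22, 1) reports success, while passing 0 as the last argument leaves the two registers swapped.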
Take a look at an asm listing generated with lcc-win32; it can be quite illuminating.
At the start of a routine ebp is pushed and esp (the stack pointer) is copied to ebp. This is why we can use ebp to gain access to the passed parameters.
pushl %ebp
movl %esp,%ebp
= = = = = = = =
popl %ebp
ret
At the end you will notice that ebp is popped.
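The reason the first parameter ends up at 8(%ebp) can be sketched with a small stack model in portable C. It assumes the usual 32-bit C calling convention, in which the caller pushes the arguments from right to left and the call instruction pushes a 4-byte return address. The names model_cpu, push32, load32 and read_from_frame are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

enum { STACK_WORDS = 32 };

typedef struct {
    uint32_t mem[STACK_WORDS];  /* one element per 4-byte stack slot */
    uint32_t esp;               /* byte offset of the top of the stack */
    uint32_t ebp;
} model_cpu;

void push32(model_cpu *c, uint32_t value)
{
    assert(c->esp >= 4);
    c->esp -= 4;                /* the stack grows towards lower addresses */
    c->mem[c->esp / 4] = value;
}

uint32_t load32(const model_cpu *c, uint32_t addr)
{
    assert(addr % 4 == 0 && addr / 4 < STACK_WORDS);
    return c->mem[addr / 4];
}

/* Models a call foo(a, b) followed by foo's prologue, then returns the
   32-bit value found at offset_from_ebp(%ebp). */
uint32_t read_from_frame(uint32_t a, uint32_t b, uint32_t offset_from_ebp)
{
    model_cpu c = { {0}, STACK_WORDS * 4, 0 };

    push32(&c, b);              /* assumed: arguments pushed right to left */
    push32(&c, a);
    push32(&c, 0xDEADBEEFu);    /* call pushes the return address (dummy value) */
    push32(&c, c.ebp);          /* pushl %ebp */
    c.ebp = c.esp;              /* movl %esp,%ebp */
    return load32(&c, c.ebp + offset_from_ebp);
}
```

In this model offset 0 holds the saved ebp, offset 4 the return address, offset 8 the first parameter and offset 12 the second.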
For all integer returns you need to put the value in eax before returning. This is the most efficient way of returning a value from an asm routine. Floating point values are taken from the top of the floating point stack.
Of course one can also return a value explicitly using the return statement. Here are two examples, one showing an automatic return and one showing how to return the value explicitly.
First the automatic return. This uses the register eax, or part of it, depending on the size of the return value: byte, word or long word.
char foo(char a, char b)
{
    _asm("movb 8(%ebp), %al");   // load a into al
    _asm("movb 12(%ebp), %cl");  // load b into cl
    _asm("addb %cl, %al");       // add them putting the result in al
    // the automatic return will be al (low byte of ax (low word of eax))
}
Now the explicit return:
char foo(char a, char b)
{
    char c;
    _asm("movb 8(%ebp), %al");   // load a into al
    _asm("movb 12(%ebp), %cl");  // load b into cl
    _asm("addb %cl, %al");       // add them putting the result in al
    _asm("movb %al, %c");        // move (copy) al into c
    return c;
}
LCC-Win32 uses the AT&T syntax for assembler instructions. One consequence is that the size of the operation is determined by a suffix on the instruction itself (l, w or b), unlike the Intel syntax, where the size is taken from the operands.
_asm("movl $6, %eax"); will mov (copy) a long word 0x00000006 into eax
_asm("movw $6, %eax"); will mov a word 0x0006 into eax (in fact into ax)
_asm("movb $6, %eax"); will mov a byte 0x06 into eax (in fact into al)
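The effect of the three sizes on a 32-bit register can be modelled in portable C. On x86 a word or byte move replaces only the low 16 or 8 bits and leaves the rest of the register unchanged. The names model_movl, model_movw, model_movb and check_mov_sizes below are illustrative, not lcc-win32 functions.

```c
#include <assert.h>
#include <stdint.h>

/* Each function returns the new register value after the modelled move. */
uint32_t model_movl(uint32_t reg, uint32_t imm) { (void)reg; return imm; }              /* all 32 bits replaced */
uint32_t model_movw(uint32_t reg, uint16_t imm) { return (reg & 0xFFFF0000u) | imm; }   /* only ax replaced     */
uint32_t model_movb(uint32_t reg, uint8_t imm)  { return (reg & 0xFFFFFF00u) | imm; }   /* only al replaced     */

/* Self-check starting from a register full of ones. */
void check_mov_sizes(void)
{
    uint32_t eax = 0xFFFFFFFFu;
    assert(model_movl(eax, 6) == 0x00000006u);
    assert(model_movw(eax, 6) == 0xFFFF0006u);
    assert(model_movb(eax, 6) == 0xFFFFFF06u);
}
```

The upper bits surviving a movw or movb is a common source of surprises when the full register is read afterwards.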
Intel's syntax is different. First an AT&T instruction and then the Intel equivalent.
_asm("movl 8(%ebp), %eax"); // AT&T - copy what is at address (ebp+8) into eax
mov eax, dword ptr [ebp+8] // Intel - copy into eax what is at address (ebp+8)
Another difference is that the operands are reversed: AT&T syntax puts the source first and the destination second, while Intel syntax puts the destination first.
#include <stdio.h>
#include <stddef.h>
size_t str_len(char *s)
{
    _asm("movl 8(%ebp),%ecx");    // first arg is at 8(%ebp)
    _asm("xorl %eax,%eax");       // clear eax
    _asm("jecxz exit");           // leave with 0 if ecx holds a NULL pointer
    _asm("decl %eax");            // -1
    _asm("lb1:");
    _asm("incl %eax");
    _asm("cmpb $0,(%ecx,%eax)");  // compares 0 (string end) with the byte
                                  // pointed to by ecx with an offset of eax
    _asm("jne lb1");              // not finished yet?
    _asm("exit:");
}
int main(void)
{
    char str[] = "which will be called the translation environment";
    printf("%u\n", (unsigned)str_len(str));
    return 0;
}
The interesting line in the above code is _asm("cmpb $0,(%ecx,%eax)");
The address of the string is held in ecx while eax is incremented on each pass through the loop. cmpb compares zero ('$0') with the char at address ecx + the value in eax, and jne jumps back to label lb1 while that char is not zero, that is, until the end of the NUL-terminated string is reached. Finally the length of the string is in eax and is returned automatically.
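For comparison, here is a portable C sketch that follows the asm loop step by step. The name str_len_c is chosen only to avoid clashing with the routine above, and the comments map each C statement to the instruction it imitates.

```c
#include <assert.h>
#include <stddef.h>

size_t str_len_c(const char *s)
{
    size_t n = 0;               /* xorl %eax,%eax */
    if (s == NULL)              /* jecxz exit     */
        return n;
    n--;                        /* decl %eax : wraps around, undone by the first n++ */
    do {
        n++;                    /* incl %eax      */
    } while (s[n] != '\0');     /* cmpb $0,(%ecx,%eax) followed by jne lb1 */
    return n;                   /* the count is left in eax */
}

/* Self-check with an ordinary, an empty and a missing string. */
void check_str_len_c(void)
{
    assert(str_len_c("abc") == 3);
    assert(str_len_c("") == 0);
    assert(str_len_c(NULL) == 0);
}
```

The decrement before the loop lets the loop body increment first and test afterwards, which is why the asm version needs only one conditional jump per character.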
The x86 processor has various addressing modes that we will look at in more detail.
_asm("movw $0x08, %ax"); | The immediate addressing mode copies a constant value directly into a register. |
_asm("movl %ecx, %eax"); | The register addressing mode copies the value in ecx to eax. |
_asm("movl (%eax), %ecx"); | The indirect addressing mode copies from the address pointed to by eax to the register ecx. |
_asm("movl 16(%eax), %ecx"); | The indexed addressing mode copies from the address pointed to by eax plus 16 to the register ecx. |
_asm("movl (1000), %eax"); | The direct addressing mode. Copies the contents of address 1000 to eax. |
_asm("movl (%edx,%ecx,4), %eax"); | The scaled indexed addressing mode copies from the address EDX + (ECX * 4) to eax. |
You might have noticed that the use of parentheses with a register, (%ecx), means "at the address pointed to by the register", while without parentheses %ecx means operating on the register directly.
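The str_len routine above used indirect addressing with a base register and an index register.
_asm("cmpb $0,(%ecx,%eax)");
Here $0 is the immediate value being compared and is not part of the address. The byte examined is the one at the address pointed to by ecx plus the index held in eax. In general a memory operand can be written as offset(base,index,scale), which addresses offset + base + (index * scale). This is quite a powerful combination and can be used effectively when writing asm code.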
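The address arithmetic behind an AT&T memory operand of the form offset(base,index,scale) can be written out as a small C function. This is a sketch for checking one's own operands by hand; effective_address is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Computes the address used by offset(base,index,scale).
   The processor only accepts a scale of 1, 2, 4 or 8. */
uint32_t effective_address(uint32_t offset, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return offset + base + index * scale;
}
```

For example, (%ecx,%eax) with ecx = 0x1000 and eax = 5 corresponds to effective_address(0, 0x1000, 5, 1), and 16(%eax) with eax = 0x1000 corresponds to effective_address(16, 0x1000, 0, 1).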
A label works as it does in other languages: it marks an address in the code that program execution can jump to.
_asm("exit:");
In AT&T syntax, and generally I believe, one uses a colon with each label. The instruction to jump however does not use the colon.
_asm("jecxz exit");
Labels are used extensively in asm code because jumping is the only way to control program flow. Sometimes one can code without a jump, but wherever one would use a control statement in C, such as if or while, one jumps to another section of the code. In the str_len example above, jecxz, cmpb and jne are used to direct program flow.
When creating labels, one must use a unique name for every label in the same translation unit. You may, for example, have many functions in which EXIT is the place to leave the function; each of those exits needs a different name. So fileone.c may contain exit1 only once, but filetwo.c can contain its own exit1.
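The way labels and jumps replace structured control flow can be imitated in C with goto. The sketch below writes a simple counting loop the way it would be laid out in asm; sum_to and the label names are illustrative.

```c
#include <assert.h>

/* Adds the integers 1..n using only labels and jumps, as asm code would. */
int sum_to(int n)
{
    int total = 0;
    int i = 1;

    assert(n <= 65535);           /* keeps the total within a 32-bit int */
loop_top:
    if (i > n)                    /* compare, then conditional jump out  */
        goto loop_done;
    total += i;
    i++;
    goto loop_top;                /* unconditional jump back             */
loop_done:
    return total;
}
```

sum_to(4) adds 1 + 2 + 3 + 4. In C one would normally write this with a for loop; the goto form is shown only because it mirrors the compare-and-jump structure of the asm examples.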
_asm("exit1:");
_asm("exit2:");
_asm("exit3:");
Earlier I showed that parameters passed to a function are accessed by using an offset from the ebp register and copying the value to a register. Unfortunately the situation changes slightly when compiling optimised. I say unfortunately because one's asm code cannot easily be switched between non-optimised and optimised without resorting to either macros or pre-processor directives.
To recap, the first parameter was at offset 8 from ebp
_asm("movl 8(%ebp), %eax");
When compiling optimised, this same parameter is at offset 4 from esp
_asm("movl 4(%esp), %eax");
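The two sets of offsets follow one simple rule for 4-byte integer parameters, which can be written as a small C helper. This is only a sketch of the arithmetic described in the text; int_param_offset is an illustrative name, and it does not apply to doubles or long doubles, which occupy more than one slot.

```c
#include <assert.h>

/* index:     0 for the first int parameter, 1 for the second, and so on.
   optimised: non-zero when the offset is taken from esp instead of ebp.
   Returns the byte offset to use in the memory operand. */
int int_param_offset(int index, int optimised)
{
    assert(index >= 0);
    return (optimised ? 4 : 8) + 4 * index;
}
```

So the second int is at int_param_offset(1, 0) from ebp when not optimised and at int_param_offset(1, 1) from esp when optimised. The esp-based figure only holds on entry to the routine, before anything else has been pushed.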
Here is my solution, maybe someone can devise a better one.
#if __LCCOPTIMLEVEL > 0
#define _movl8_eax _asm("movl 4(%esp), %eax") // 1st int to eax
#define _movl12_eax _asm("movl 8(%esp), %eax") // 2nd int to eax
#define _movl16_eax _asm("movl 12(%esp), %eax") // 3rd int to eax
#define _movl20_eax _asm("movl 16(%esp), %eax") // 4th int to eax
#define _loadfp8t _asm("fldt 4(%esp)") // 1st long double onto stack
#define _loadfp20t _asm("fldt 16(%esp)") // 2nd long double onto stack
#define _loadfp8l _asm("fldl 4(%esp)") // 1st double onto stack
#define _loadfp16l _asm("fldl 12(%esp)") // 2nd double onto stack
#define _loadfp8s _asm("flds 4(%esp)") // 1st float onto stack
#define _loadfp12s _asm("flds 8(%esp)") // 2nd float onto stack
#else
#define _movl8_eax _asm("movl 8(%ebp), %eax") // 1st int to eax
#define _movl12_eax _asm("movl 12(%ebp), %eax") // 2nd int to eax
#define _movl16_eax _asm("movl 16(%ebp), %eax") // 3rd int to eax
#define _movl20_eax _asm("movl 20(%ebp), %eax") // 4th int to eax
#define _loadfp8t _asm("fldt 8(%ebp)") // 1st long double onto stack
#define _loadfp20t _asm("fldt 20(%ebp)") // 2nd long double onto stack
#define _loadfp8l _asm("fldl 8(%ebp)") // 1st double onto stack
#define _loadfp16l _asm("fldl 16(%ebp)") // 2nd double onto stack
#define _loadfp8s _asm("flds 8(%ebp)") // 1st float onto stack
#define _loadfp12s _asm("flds 12(%ebp)") // 2nd float onto stack
#endif
Here is the str_len example again, using the macro _movl8_eax so that it works both optimised and non-optimised. Because the macro loads the first argument into eax, the pointer is copied to ecx before eax is cleared.
_movl8_eax is replaced by _asm("movl 8(%ebp), %eax") by the preprocessor when not optimised.
_movl8_eax is replaced by _asm("movl 4(%esp), %eax") when optimised.
size_t str_len(char *s)
{
    _movl8_eax;                   // first arg in eax
    _asm("movl %eax,%ecx");       // copy the pointer to ecx
    _asm("xorl %eax,%eax");       // clear eax
    _asm("jecxz exit");           // leave with 0 if ecx holds a NULL pointer
    _asm("decl %eax");            // -1
    _asm("lb1:");
    _asm("incl %eax");
    _asm("cmpb $0,(%ecx,%eax)");  // compares 0 (string end) with the byte
                                  // pointed to by ecx with an offset of eax
    _asm("jne lb1");              // not finished yet?
    _asm("exit:");
}
To check whether your routine is fast you need to measure how long it takes to execute. A single timing is not accurate, so run the routine in a loop and divide the total time by the number of iterations.
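The averaging itself can be separated from the routine being measured. The sketch below is an assumption-laden illustration in portable C: tick_source stands in for whatever counter is available (for instance a wrapper around _rdtsc()), and fake_ticks and do_nothing are stand-ins that make the arithmetic easy to follow without a real cycle counter.

```c
#include <assert.h>

typedef unsigned long long (*tick_source)(void);  /* hypothetical counter wrapper */
typedef void (*routine)(void);                    /* the code being measured      */

/* Runs work() the given number of times and returns the mean tick count. */
unsigned long long average_ticks(routine work, tick_source now, int iterations)
{
    unsigned long long total = 0;
    int i;

    assert(iterations > 0);
    for (i = 0; i < iterations; i++) {
        unsigned long long t0 = now();
        work();
        total += now() - t0;      /* accumulate the ticks for this pass */
    }
    return total / (unsigned long long)iterations;
}

/* Deterministic stand-in: every reading is 5 ticks later than the last. */
unsigned long long fake_ticks(void)
{
    static unsigned long long t = 0;
    t += 5;
    return t;
}

void do_nothing(void) { }
```

With the stand-ins, average_ticks(do_nothing, fake_ticks, 100) yields 5, because each pass sees exactly one 5-tick step between its two readings. With a real counter the result includes the cost of reading the counter itself.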
Here's an example - converting an unsigned long to hex.
#include <stdio.h>
#include <stdlib.h>
#include <intrinsics.h>
void dw2hex(char *pbuf, unsigned long dw)
{
    _asm("pushl %esi");             // save esi
    _asm("pushl %edi");             // save edi
    _asm("movl 0x8(%ebp), %edi");   // pbuf into edi
    _asm("movl 0xc(%ebp), %esi");   // unsigned long into esi
    _asm("movb $0, 8(%edi)");       // terminate the string after 8 bytes
    _asm("xor %ecx, %ecx");         // clear ecx
    _asm("movb $7, %cl");           // length of string - 1
    _asm("loop:");
    _asm("movw %si, %ax");          // al is the target
    _asm("andb $0xF, %al");         // 00001111 mask out high nibble
    _asm("add $0x90, %al");         // 0x90 char range
    _asm("daa");                    // decimal adjust for addition
    _asm("cmpb $0x89, %al");        // is it a digit 0-9
    _asm("ja digit");
    _asm("adc $0x60, %al");         // add with carry 0x60 (0x40 for upper case)
    _asm("jmp store");
    _asm("digit:");
    _asm("adc $0x40, %al");         // add with carry 0x40 (digit range)
    _asm("store:");
    _asm("daa");                    // decimal adjust for addition
    _asm("movb %al, (%ecx,%edi)");  // store the hex digit (al) in the string
    _asm("shr $4, %esi");           // next nibble
    _asm("dec %ecx");               // dec counter
    _asm("jns loop");               // jump if no sign (ecx would go negative below zero)
    _asm("popl %edi");              // restore
    _asm("popl %esi");              // restore
}
int main(void)
{
    long long t2, t1, t0;
    int i;
    char str[12];
    unsigned long dw = 0x2f3a5e0f;
    t2 = 0;
    for (i = 0; i < 10000; i++) {               // 10000 loops
        t0 = _rdtsc();
        dw2hex(str, dw);
        t1 = _rdtsc();
        t2 += t1 - t0;                          // accumulate clock cycles
    }
    xprintf("%lld\t%s dw2hex\n", t2/i, str);    // divide clock cycles by i
    t2 = 0;
    for (i = 0; i < 10000; i++) {
        t0 = _rdtsc();
        ultoa(dw, str, 16);
        t1 = _rdtsc();
        t2 += t1 - t0;
    }
    xprintf("%lld\t%s ultoa\n", t2/i, str);
    t2 = 0;
    for (i = 0; i < 10000; i++) {
        t0 = _rdtsc();
        sprintf(str, "%lx", dw);                // dw is an unsigned long
        t1 = _rdtsc();
        t2 += t1 - t0;
    }
    xprintf("%lld\t%s sprintf\n", t2/i, str);
    return 0;
}
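When timing an asm routine it helps to have a plain C version that produces the same output, both to check the result and to give one more figure to compare against. The following is a sketch of such a reference for dw2hex. The name dw2hex_c is illustrative, and it uses a lookup table rather than the daa trick of the asm version.

```c
#include <assert.h>
#include <stddef.h>

/* Writes the low 32 bits of dw as 8 lower-case hex digits plus a
   terminating zero; pbuf must have room for at least 9 chars. */
void dw2hex_c(char *pbuf, unsigned long dw)
{
    static const char digits[] = "0123456789abcdef";
    int i;

    assert(pbuf != NULL);
    pbuf[8] = '\0';                     /* terminate the string after 8 bytes */
    for (i = 7; i >= 0; i--) {          /* fill from the last digit backwards */
        pbuf[i] = digits[dw & 0xFu];    /* low nibble selects the character   */
        dw >>= 4;                       /* next nibble                        */
    }
}
```

Like the asm routine, it always writes eight digits, so small values are padded with leading zeros, unlike ultoa and sprintf with a plain hex conversion.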