Assembler with LCC-Win32

Please let me know of any mistakes in these pages and I will correct them. Contributions, whether writing or asm code, are also welcome. e-mail me

You can download information about the instruction set here.

ftp://download.intel.com/design/pentiumii/manuals/24319102.pdf

 

Primer 1:

Introduction

Gaining Access To Parameters Passed To Functions.

You Must Push And Pop

The Return Value

Sizes And Instructions

Time For An Example Routine

Addressing Modes

Labels

When Compiling Optimised

Timings

 

Introduction

First let me say that if you don't have an op-codes help file or document - get one.

Writing assembler with lcc-win32 requires an extended syntax which is a little clumsy compared to writing code for one of the other assemblers. An advantage is that all one's code can be debugged in Wedit's comfortable debugger. Provided one writes the assembler instructions as separate lines, one can step through the code as one would with C code.

For example, one could write all the instructions in a single _asm statement, ending each with a new line character, so that all the asm instructions become one long line.

_asm("fldt 8(%ebp)\n\
        fld %st(0) \n\
        fmul %st \n\
        fld1 \n\
        fsubp \n\
        fsqrt \n\
        fxch %st(1) \n\
        fpatan \n");

Which is quite neat, but unfortunately the debugger will see that as one long line so that using 'step' (F8) will cause all those instructions to be executed as if they were only one line - at least as far as the debugger is concerned.

The same code written as separate lines would be as follows.

    _asm("fldt 8(%ebp)");
    _asm("fld %st");
    _asm("fmul %st(0)");
    _asm("fld1");
    _asm("fsubp");
    _asm("fsqrt");
    _asm("fxch %st(1)");
    _asm("fpatan");

Now each separate instruction can be 'stepped' through, and if necessary one can note how the processor's registers change to reflect what each instruction has done.

The extra characters  _asm(" and  ");  can be quickly inserted using a Wedit keyboard macro, so that one keystroke gives _asm(""); and the instruction can then be typed in without too much bother.

 

Gaining Access To Parameters Passed To Functions.

Assuming one is compiling without optimisation and that integer variables are being passed, the first parameter is at 8(%ebp), the second at 12(%ebp) and so on. Storing the first parameter in say eax would be accomplished like so

_asm("movl 8(%ebp), %eax");

The second in ecx

_asm("movl 12(%ebp), %ecx");

These offsets from ebp are valid when compiling without optimisation.

When optimised one needs a different approach that I will show later.
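
The pattern behind those offsets can be sketched in portable C. This is only an illustration, assuming 4-byte arguments and the usual non-optimised frame in which 0(%ebp) holds the saved ebp and 4(%ebp) the return address. The helper name ebp_offset_of_arg is hypothetical and not part of lcc-win32.

```c
/* Hypothetical helper: byte offset from ebp of the n-th (0-based)
   4-byte argument in a non-optimised frame.
   0(%ebp) = saved ebp, 4(%ebp) = return address, so arguments start at 8. */
int ebp_offset_of_arg(int n)
{
    return 8 + 4 * n;
}
```

Calling ebp_offset_of_arg(0) gives the 8 used for the first parameter and ebp_offset_of_arg(1) the 12 used for the second.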

 

You Must Push And Pop.

When using the registers to hold one's own values it can happen that this will corrupt some of the values that the compiler is relying on to continue executing other code within the program.

Here is a representation of the normal registers showing the relationship of their different sizes and byte orientation. Looking at EAX:

AX is the lower word of the EAX long word, 
AH is the upper byte of the AX word and 
AL is the lower byte of the AX word.
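
The same relationship can be written out in portable C with masks and shifts. This is a sketch for illustration only; the function names reg_ax, reg_ah and reg_al are made up here and simply model the parts of a 32-bit value.

```c
#include <stdint.h>

/* Portable models of the sub-registers of a 32-bit value held in "eax". */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* lower word of eax */
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* upper byte of ax  */
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* lower byte of ax  */
```

With eax holding 0x12345678, these return 0x5678, 0x56 and 0x78 respectively.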

Register  Name               Typical use
EAX       Accumulator        most arithmetic and logical computations
EBX       Base               commonly used to hold indirect addresses
ECX       Count              use it to count off the number of iterations in a loop
EDX       Data               holds the overflow from certain arithmetic operations
ESI       Source Index       use these two registers, esi and edi, as pointers
EDI       Destination Index  also used with string instructions
EBP       Base Pointer       access parameters and local variables
ESP       Stack Pointer      generally not altered
CS        Code Segment       points at the segment containing the currently executing code
DS        Data Segment       points to the data segment of the running process
ES        Extra Segment
FS
GS
SS        Stack Segment
EFLAGS    Condition Codes    carry, parity, zero, sign and overflow flags


EBX, ESI, EDI, EBP and ESP must have the same values when your routine ends as when it started. Although you have liberty to use registers for anything, certain registers are best used for specific purposes. For example the jump instruction jecxz looks for zero in ECX.

Always push ebx, esi, edi and ebp if at any time they are changed by your code, and pop them when you are finished using them, most certainly before leaving the routine in which they have been changed. esp cannot usefully be saved by pushing it; instead match every push with a pop so that esp is back at its original value when the routine ends.

At the start of a routine you might use one or more of these registers

int foo(int a, int b)
{
    _asm("pushl %ebx");             // save ebx
    _asm("pushl %edi");             // save edi
    _asm("movl 8(%ebp), %ebx");

    // more stuff

    _asm("popl %edi");              // restore edi
    _asm("popl %ebx");              // restore ebx
}

Always pop in the reverse order of pushing.

Take a look at an asm listing generated with lcc-win32; this can be quite illuminating.

At the start of a routine ebp is pushed and esp (the stack pointer) is copied to ebp. This is why we can use ebp to gain access to the passed parameters.
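
Why the order matters can be seen with a small model of the stack in portable C. This is a sketch only: the type model_stack and the functions model_push, model_pop and save_and_restore_demo are invented for the illustration, and the model does no bounds checking.

```c
#include <stdint.h>

/* A tiny model of the machine stack: push stores a value, pop returns
   the most recently pushed value (last in, first out). */
typedef struct {
    uint32_t slot[16];
    int top;                    /* number of values currently stored */
} model_stack;

void model_push(model_stack *s, uint32_t value) { s->slot[s->top++] = value; }
uint32_t model_pop(model_stack *s) { return s->slot[--s->top]; }

/* Saves two "registers", overwrites them, then restores them in reverse
   order. Returns 1 if both came back unchanged. */
int save_and_restore_demo(void)
{
    model_stack s = { {0}, 0 };
    uint32_t ebx = 0x11111111u, edi = 0x22222222u;

    model_push(&s, ebx);        /* pushl %ebx */
    model_push(&s, edi);        /* pushl %edi */

    ebx = 0;                    /* the routine uses the registers */
    edi = 0;

    edi = model_pop(&s);        /* popl %edi (last pushed, first popped) */
    ebx = model_pop(&s);        /* popl %ebx */

    return ebx == 0x11111111u && edi == 0x22222222u;
}
```

Swapping the two pops would leave the value of edi in ebx and the value of ebx in edi.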

    pushl %ebp
    movl %esp,%ebp
    = = = = = = = = 
    popl %ebp
    ret

At the end you will notice that ebp is popped.

 

The Return Value

For all integer returns you need to put the value in eax before returning. This is the most efficient way of returning a value from an asm routine. Floating point values are taken from the top of the floating point stack.

Of course one can explicitly return a value using the return statement. Here are two examples, one showing an automatic return, and one showing how to explicitly return the value.

First the automatic return. This uses the register eax, or part of it, depending on the size of the return value: byte, word or long word.

char foo(char a, char b)
{
    _asm("movb 8(%ebp), %al");  // load a into al
    _asm("movb 12(%ebp), %cl"); // load b into cl
    _asm("addb %cl, %al");      // add them putting the result in al
    // the automatic return will be al (low byte of ax (low word of eax))
}

Now explicit returning

char foo(char a, char b)
{
    char c;
    _asm("movb 8(%ebp), %al");  // load a into al
    _asm("movb 12(%ebp), %cl"); // load b into cl
    _asm("addb %cl, %al");      // add them putting the result in al
    _asm("movb %al, %c");       // move (copy) al into c
    return c;
}

Sizes And Instructions

LCC-Win32 uses the AT&T syntax for assembler instructions. One of the consequences is that the size of the operation is given by a suffix on the instruction itself (b for byte, w for word, l for long word), unlike the Intel syntax where the size is taken from the operands.

_asm("movl  $6, %eax"); will mov (copy) a long word 0x00000006 into eax

_asm("movw  $6, %ax"); will mov a word 0x0006 into ax (the low word of eax)

_asm("movb  $6, %al"); will mov a byte 0x06 into al (the low byte of eax)
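
What each size does to the register can be modelled in portable C. The following is a sketch for illustration, assuming the usual x86 behaviour that a word or byte move changes only ax or al and leaves the rest of eax alone. The names model_movl, model_movw and model_movb are hypothetical.

```c
#include <stdint.h>

/* Models of moving an immediate into eax with each size suffix.
   Each function returns the new value of eax. */
uint32_t model_movl(uint32_t eax, uint32_t imm) { (void)eax; return imm; }                  /* all 32 bits replaced */
uint32_t model_movw(uint32_t eax, uint16_t imm) { return (eax & 0xFFFF0000u) | imm; }      /* only ax replaced     */
uint32_t model_movb(uint32_t eax, uint8_t imm)  { return (eax & 0xFFFFFF00u) | imm; }      /* only al replaced     */
```

Starting from eax = 0xFFFFFFFF and moving 6, the long word move gives 0x00000006, the word move 0xFFFF0006 and the byte move 0xFFFFFF06.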

Intel's syntax is different. First an AT&T instruction and then the Intel equivalent. 

_asm("movl 8(%ebp), %eax"); // AT&T - copy what is at address (ebp+8) into eax

mov eax, dword ptr [ebp+8]  // Intel - copy into eax what is at address (ebp+8)

Another difference is that the operands are reversed: AT&T puts the source first and the destination last, Intel puts the destination first.

 

Time For An Example Routine. (str_len)


#include <stdio.h>

size_t str_len(char *s)
{
    _asm("movl 8(%ebp),%ecx");   // first arg is at 8(%ebp)
    _asm("xorl %eax,%eax");      // clear eax
    _asm("jecxz exit");          // s is a NULL pointer? then return 0
    _asm("decl %eax");           // -1
_asm("lb1:");
    _asm("incl %eax");
    _asm("cmpb $0,(%ecx,%eax)"); // compares 0 (string end) with the byte
                                 // pointed to by ecx with an offset of eax.
    _asm("jne lb1");             // not finished yet?
_asm("exit:");
}

int main(void)
{
    char str[] = "which will be called the translation environment";
    printf("%d\n", (int)str_len(str));
    return 0;
}

The interesting line in the above code is _asm("cmpb $0,(%ecx,%eax)");

The address of the string is held in ecx while eax is incremented on each pass through the loop. cmpb compares zero ($0) with the char at address ecx plus the value in eax. jne then jumps back to label lb1 while that char is not zero, that is until the end of the zero terminated string is reached. At that point the length of the string is in eax and is returned automatically.
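
For comparison, here is a portable C version that follows the same steps, with the matching instruction noted beside each line. It is a sketch to help read the assembler, not a replacement for it; the name str_len_c is chosen here to avoid clashing with the routine above.

```c
#include <stddef.h>

/* Portable equivalent of the str_len routine, written to follow the
   same steps as the assembler version. */
size_t str_len_c(const char *s)
{
    size_t n = 0;               /* xorl %eax,%eax */

    if (s == NULL)              /* jecxz exit */
        return 0;

    while (s[n] != '\0')        /* cmpb $0,(%ecx,%eax) and jne lb1 */
        n++;                    /* incl %eax */

    return n;                   /* the count is left in eax */
}
```

The assembler version starts eax at -1 and increments before the compare, while the C loop starts at 0 and increments after it; both test the bytes at offsets 0, 1, 2 and so on.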

Addressing Modes

The x86 processor has various addressing modes that we will look at in more detail.

_asm("movw $0x08, %ax");
The immediate addressing mode copies a constant value into a register.

_asm("movl %ecx, %eax");
The register addressing mode copies one register to another, ecx -> eax.

_asm("movl (%eax), %ecx");
The indirect addressing mode copies from the address pointed to by eax to the register ecx.

_asm("movl 16(%eax), %ecx");
The indexed addressing mode copies from the address pointed to by eax plus 16 to the register ecx.

_asm("movl (1000), %eax");
The direct addressing mode copies the contents of address 1000 to eax.

_asm("movl (%edx,%ecx,4), %eax");
The scaled indexed addressing mode copies from the address EDX + (ECX * 4) to eax.

You might have noticed that the use of parentheses with a register, (%ecx), means at the address pointed to by the register, while without parentheses %ecx means operating on the register directly.

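The example above in the str_len routine used indirect addressing with a base register and an index register.

_asm("cmpb $0,(%ecx,%eax)");

Here $0 is the immediate value being compared and is not part of the address. The address is (address pointed to by ecx + index (value in eax)). In general an operand can be written offset(base, index, scale), which gives the address offset + base + (index * scale). This is quite a powerful combination and can be used effectively when writing asm code.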
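
The address arithmetic can be written as a one-line C function. This is a sketch for illustration: the name effective_address is hypothetical, and the operand form assumed is the AT&T disp(base, index, scale), where scale is 1, 2, 4 or 8 and any part left out counts as 0 (or 1 for the scale).

```c
#include <stdint.h>

/* Address used by an operand written  disp(base, index, scale). */
uint32_t effective_address(int32_t disp, uint32_t base,
                           uint32_t index, uint32_t scale)
{
    return base + index * scale + (uint32_t)disp;
}
```

For (%ecx,%eax) in str_len this is effective_address(0, ecx, eax, 1); for 16(%eax) it is effective_address(16, eax, 0, 1); and for (%edx,%ecx,4) it is effective_address(0, edx, ecx, 4).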

 

Labels

A label works as it does in other languages: it marks an address in the code to which program execution can jump.

_asm("exit:");

In AT&T syntax, and generally I believe, one uses a colon with each label. The instruction to jump however does not use the colon.

_asm("jecxz exit");

Labels are used extensively in asm code as jumping is the only way to control program flow. Sometimes one can code without a 'jump', but wherever one would use a control statement in C such as if or while, one jumps to another section of the code. You can see in the str_len example above that program flow is directed with jecxz, cmpb and jne.

When creating labels one must use a unique name for every label in the same translation unit. You may for example have many functions where EXIT is the place to leave the function; you would need to name all of these exits differently. So fileone.c may have only one exit1, but filetwo.c can contain its own exit1.

_asm("exit1:");
_asm("exit2:");
_asm("exit3:");

_asm("f1exit:"); 
_asm("f2exit:"); 
_asm("f3exit:");
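
To show how if and while turn into labels and jumps, here is a small portable C function written with goto only. It is a sketch for illustration; the function count_nonzero and its labels are made up, and the instructions in the comments are the kind of compare and jump one would typically write, not compiler output.

```c
/* Counts the non-zero entries of v[0..n-1] using only labels and gotos,
   the way the flow would be written with compare and jump instructions. */
int count_nonzero(const int *v, int n)
{
    int i = 0;
    int count = 0;

next:
    if (i >= n) goto done;      /* cmp, then jge done */
    if (v[i] == 0) goto skip;   /* cmp, then je skip  */
    count++;
skip:
    i++;
    goto next;                  /* jmp next           */
done:
    return count;
}
```

Note one difference: a C label only has to be unique within its function, whereas the _asm labels described above must be unique within the whole translation unit.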

When Compiling Optimised

Earlier I showed that accessing parameters passed to a function was done by using an offset from the ebp register and copying the value to a register. Unfortunately the situation changes slightly when compiling optimised. I say unfortunately because one's asm code cannot easily be changed from non-optimised to optimised without resorting to either macros or pre-processor directives.

To recap, the first parameter was at offset 8 from ebp

_asm("movl 8(%ebp), %eax");

When compiling optimised this same parameter is at offset 4 from esp, provided nothing has been pushed since the function was entered

_asm("movl 4(%esp), %eax");

Here is my solution, maybe someone can devise a better one.

#if __LCCOPTIMLEVEL > 0
 #define _movl8_eax _asm("movl 4(%esp), %eax") // 1st int to eax
 #define _movl12_eax _asm("movl 8(%esp), %eax") // 2nd int to eax
 #define _movl16_eax _asm("movl 12(%esp), %eax") // 3rd int to eax
 #define _movl20_eax _asm("movl 16(%esp), %eax") // 4th int to eax
 #define _loadfp8t _asm("fldt 4(%esp)") // 1st long double onto stack
 #define _loadfp20t _asm("fldt 16(%esp)") // 2nd long double onto stack
 #define _loadfp8l _asm("fldl 4(%esp)") // 1st double onto stack
 #define _loadfp16l _asm("fldl 12(%esp)") // 2nd double onto stack
 #define _loadfp8s _asm("flds 4(%esp)") // 1st float onto stack
 #define _loadfp12s _asm("flds 8(%esp)") // 2nd float onto stack
#else
 #define _movl8_eax _asm("movl 8(%ebp), %eax") // 1st int to eax
 #define _movl12_eax _asm("movl 12(%ebp), %eax") // 2nd int to eax
 #define _movl16_eax _asm("movl 16(%ebp), %eax") // 3rd int to eax
 #define _movl20_eax _asm("movl 20(%ebp), %eax") // 4th int to eax
 #define _loadfp8t _asm("fldt 8(%ebp)") // 1st long double onto stack
 #define _loadfp20t _asm("fldt 20(%ebp)") // 2nd long double onto stack
 #define _loadfp8l _asm("fldl 8(%ebp)") // 1st double onto stack
 #define _loadfp16l _asm("fldl 16(%ebp)") // 2nd double onto stack
 #define _loadfp8s _asm("flds 8(%ebp)") // 1st float onto stack
 #define _loadfp12s _asm("flds 12(%ebp)") // 2nd float onto stack
#endif
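
The numbers in the macro names and offsets follow one rule: start at 8 from ebp (or 4 from esp when optimised) and add the stack size of every argument that comes before. The portable C sketch below checks that rule. The helper arg_offset is hypothetical, and the sizes of 12, 8 and 4 bytes for long double, double and float are inferred from the offsets in the macros above rather than taken from lcc-win32 documentation.

```c
/* Hypothetical helper: offset of argument `index` (0-based), given the
   stack size in bytes of each argument.
   The first argument is at 8 from ebp when not optimised and at 4 from
   esp (on entry to the function) when optimised. */
int arg_offset(const int *sizes, int index, int optimised)
{
    int offset = optimised ? 4 : 8;
    int i;

    for (i = 0; i < index; i++)
        offset += sizes[i];     /* skip over the earlier arguments */
    return offset;
}
```

With sizes {12, 12} the second argument comes out at 20 from ebp or 16 from esp, matching _loadfp20t; with {8, 8} it is 16 or 12, matching _loadfp16l.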

Here is the str_len example again using the macro _movl8_eax so that it works both optimised and not optimised. The macro loads the first argument into eax, so the value is copied to ecx before eax is cleared.

_movl8_eax will be replaced by _asm("movl 8(%ebp), %eax") by the pre-processor when not optimised.

_movl8_eax will be replaced by _asm("movl 4(%esp), %eax") when optimised.

size_t str_len(char *s)
{
    _movl8_eax;                  // first arg in eax
    _asm("movl %eax,%ecx");      // copy it to ecx
    _asm("xorl %eax,%eax");      // clear eax
    _asm("jecxz exit");          // s is a NULL pointer? then return 0
    _asm("decl %eax");           // -1
_asm("lb1:");
    _asm("incl %eax");
    _asm("cmpb $0,(%ecx,%eax)"); // compares 0 (string end) with the byte
                                 // pointed to by ecx with an offset of eax.
    _asm("jne lb1");             // not finished yet?
_asm("exit:");
}

 

Timings

To check if your routine is fast you need to measure how long it takes to execute. A single timing is not accurate, so run the routine in a loop and divide the total time by the number of iterations.

Here's an example - converting an unsigned long to hex.

#include <stdio.h>
#include <stdlib.h>
#include <intrinsics.h>

void dw2hex(char *pbuf, unsigned long dw)
{
    _asm("pushl %esi");             // save esi
    _asm("pushl %edi");             // save edi
    _asm("movl 0x8(%ebp), %edi");   // pbuf into edi
    _asm("movl 0xc(%ebp), %esi");   // unsigned long into esi
    _asm("movb $0, 8(%edi)");       // terminate the string after 8 bytes
    _asm("xor %ecx, %ecx");         // clear ecx
    _asm("movb $7, %cl");           // length of string - 1
_asm ("loop:");
    _asm("movw %si, %ax");          // al is the target
    _asm("andb $0xF, %al");         // 00001111 mask out high nibble
    _asm("add $0x90, %al");         // 0x90 char range
    _asm("daa");                    // decimal adjust for addition
    _asm("cmpb $0x89, %al");        // is it a digit 0-9
    _asm("ja digit");
    _asm("adc $0x60, %al");         // add with carry 0x60 (0x40 for upper case)
    _asm("jmp store");
_asm("digit:");
    _asm("adc $0x40, %al");         // add with carry 0x40 (digit range)
_asm("store:");
    _asm("daa");                    // decimal adjust for addition
    _asm("movb %al, (%ecx,%edi)");  // store the hex char (al) in the string
    _asm("shr $4, %esi");           // next nibble
    _asm("dec %ecx");               // dec counter
    _asm("jns loop");               // jump if no sign (ecx would go negative below zero)
    _asm("popl %edi");              // restore
    _asm("popl %esi");              // restore
}

int main(void)
{
    long long t2,t1,t0;
    int i;
    char str[12];
    unsigned long dw = 0x2f3a5e0f;
    t2 = 0;
    for (i =0; i<10000; i++){     // 10000 loop
        t0 = _rdtsc();
        dw2hex(str, dw);
        t1 = _rdtsc();
        t2 += t1 - t0;            // accumulate clock cycles
    }
    xprintf("%lld\t%s dw2hex\n", t2/i, str); // divide clock cycles by i
    t2 = 0;
    for (i =0; i<10000; i++){
        t0 = _rdtsc();
        ultoa(dw, str, 16);
        t1 = _rdtsc();
        t2 += t1 - t0;
    }
    xprintf("%lld\t%s ultoa\n", t2/i, str);

    t2 = 0;
    for (i =0; i<10000; i++){
        t0 = _rdtsc();
        sprintf(str, "%lx", dw);
        t1 = _rdtsc();
        t2 += t1 - t0;
    }
    xprintf("%lld\t%s sprintf\n", t2/i, str);

    return 0;
}
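
When writing a routine like dw2hex it helps to have a plain C version to check the result against and to time alongside it. The sketch below is one possible reference, not part of the original program: the names dw2hex_c and average_ticks are hypothetical, and the standard clock() function is used in place of _rdtsc() so that it compiles anywhere, at the cost of much coarser resolution.

```c
#include <time.h>

/* Portable reference for dw2hex: writes 8 lower-case hex digits and a
   terminating zero into pbuf, which must hold at least 9 bytes. */
void dw2hex_c(char *pbuf, unsigned long dw)
{
    static const char digits[] = "0123456789abcdef";
    int i;

    pbuf[8] = '\0';                 /* terminate the string after 8 bytes */
    for (i = 7; i >= 0; i--) {      /* fill from the last digit backwards */
        pbuf[i] = digits[dw & 0xF]; /* low nibble to hex character */
        dw >>= 4;                   /* next nibble */
    }
}

/* Average cost of one call in clock() ticks, taken over `iterations`
   calls. clock() is coarse, so a large iteration count is needed. */
double average_ticks(unsigned long dw, long iterations)
{
    char buf[12];
    long i;
    clock_t start = clock();

    for (i = 0; i < iterations; i++)
        dw2hex_c(buf, dw);
    return (double)(clock() - start) / (double)iterations;
}
```

Calling dw2hex_c(str, 0x2f3a5e0f) fills str with "2f3a5e0f", which is what the assembler routine should also produce for the same input.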
