Prima1

Please let me know of any mistakes in these notes and I will correct them. Contributions are also welcome, whether writing or asm code: e-mail me

Primer 1:

Introduction

First, if you don't have an opcode reference (a help file or document), get one.

Writing assembler with lcc-win32 requires an extended syntax that is a little clumsy compared with writing code for a stand-alone assembler, but the advantage is that all of one's code can be debugged in Wedit's comfortable debugger. Provided each assembler instruction is written as a separate line, one can step through the code just as one would with C code.

For example, one could put all the instructions into a single _asm string, ending each with a newline character (\n) and a backslash to continue the string, so that they become one long line:

_asm("fldt 8(%ebp)\n\
        fld %st(0) \n\
        fmul %st \n\
        fld1 \n\
        fsubp \n\
        fsqrt \n\
        fxch %st(1) \n\
        fpatan \n");

This is quite neat, but unfortunately the debugger sees it as a single line, so using 'step' (F8) executes all of those instructions at once. Writing each instruction in its own _asm statement avoids this:

    _asm("fldt 8(%ebp)");
    _asm("fld %st");
    _asm("fmul %st(0)");
    _asm("fld1");
    _asm("fsubp");
    _asm("fsqrt");
    _asm("fxch %st(1)");
    _asm("fpatan");

Now each instruction can be stepped through separately, and if necessary one can watch how the processor's registers change to reflect what each instruction has done.

The extra characters _asm(" and "); can be inserted quickly with a Wedit keyboard macro, so that one keystroke gives _asm(""); and the instruction can then be typed in without much bother.

Gaining Access To Parameters Passed To Functions.

Assuming one is compiling without optimisation and that integer variables are being passed, the first parameter is at 8(%ebp), the second at 12(%ebp) and so on. Storing the first parameter in, say, eax would be accomplished like so:

    _asm("movl 8(%ebp), %eax");

You Must Push And Pop.

Using the registers to hold one's own values can corrupt values that the compiler relies on to continue executing other code in the program.

The following shows how the parts of the general registers relate in size and byte position. Looking at EAX:

AX is the lower word of the EAX long word,
AH is the upper byte of the AX word and
AL is the lower byte of the AX word.
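The overlap can be modelled in plain C with masks and shifts. In this sketch (ax_of, ah_of, al_of and write_al are hypothetical helpers, not compiler features), EAX = 0x12345678 gives AX = 0x5678, AH = 0x56 and AL = 0x78, and writing AL changes only the lowest byte of EAX.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C model of the EAX/AX/AH/AL overlap (illustrative helpers only). */
static uint16_t ax_of(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }      /* AX: low word of EAX  */
static uint8_t  ah_of(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }  /* AH: upper byte of AX */
static uint8_t  al_of(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }         /* AL: lower byte of AX */

/* Writing AL (e.g. movb $0x99, %al) replaces only the lowest 8 bits of EAX. */
static uint32_t write_al(uint32_t eax, uint8_t al)
{
    return (eax & 0xFFFFFF00u) | al;
}

/* Worked example with EAX = 0x12345678. */
static void register_overlap_example(void)
{
    uint32_t eax = 0x12345678u;
    assert(ax_of(eax) == 0x5678u);
    assert(ah_of(eax) == 0x56u);
    assert(al_of(eax) == 0x78u);
    assert(write_al(eax, 0x99u) == 0x12345699u);
}
```

The same pattern applies to the other registers that have byte halves, such as EBX with BX, BH and BL.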

EAX	Accumulator	most arithmetic and logical computations
EBX	Base	commonly used to hold indirect addresses
ECX	Count	use it to count off the number of iterations in a loop
EDX	Data	holds the overflow from certain arithmetic operations
ESI	Source Index	use these two registers esi & edi as pointers
EDI	Destination Index	also use with string instructions
EBP	Base Pointer	access parameters and local variables
ESP	Stack Pointer	generally not altered
CS	Code Segment	points at the segment containing the currently executing code
DS	Data Segment	points to the data segment of running process
ES	Extra Segment
FS
GS
SS	Stack Segment
EFLAGS	Condition Codes	FLAGS -- carry, parity, zero, sign, and overflow flags

EBX, ESI, EDI, EBP and ESP must always be saved and restored if your routine changes them, because the compiler expects their values to be preserved. Although you are free to use the registers for anything, certain registers are best used for specific purposes; for example, the jump instruction jecxz tests for zero in ECX.

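Always push ebx, esi, edi and ebp if your code changes them at any point, and pop them in the reverse order when you have finished with them, and certainly before leaving the routine in which they were changed. The stack pointer esp is restored by making sure that every push is matched by a pop before the routine returns.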

int foo(int a, int b)
{
    _asm("pushl %ebx");             // save ebx
    _asm("pushl %edi");             // save edi
    _asm("movl 8(%ebp), %ebx");

    // more stuff

    _asm("popl %edi");              // restore edi
    _asm("popl %ebx");              // restore ebx
}
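The order of the pops in foo() matters: the stack is last-in, first-out, so registers are popped in the reverse of the order in which they were pushed. The toy model below is plain C, not asm; ToyStack, toy_push, toy_pop and push_pop_example are hypothetical names. It replays foo()'s pushes and pops on a downward-growing array.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_DEPTH 16

/* Toy downward-growing stack: a model of pushl/popl, not real memory. */
typedef struct {
    uint32_t slot[TOY_DEPTH];
    int top;                         /* index of the last pushed value; TOY_DEPTH when empty */
} ToyStack;

static void toy_init(ToyStack *s) { s->top = TOY_DEPTH; }

static void toy_push(ToyStack *s, uint32_t v)   /* pushl: move down, then store */
{
    assert(s->top > 0);
    s->slot[--s->top] = v;
}

static uint32_t toy_pop(ToyStack *s)            /* popl: load, then move up */
{
    assert(s->top < TOY_DEPTH);
    return s->slot[s->top++];
}

/* Replays foo(): save ebx and edi, use them, restore them in reverse order. */
static void push_pop_example(uint32_t *ebx, uint32_t *edi)
{
    ToyStack st;
    toy_init(&st);
    toy_push(&st, *ebx);             /* pushl %ebx */
    toy_push(&st, *edi);             /* pushl %edi */
    *ebx = 0xDEADu;                  /* the routine overwrites both registers */
    *edi = 0xBEEFu;
    *edi = toy_pop(&st);             /* popl %edi: last pushed, first popped */
    *ebx = toy_pop(&st);             /* popl %ebx */
    assert(st.top == TOY_DEPTH);     /* every push matched by a pop: stack balanced */
}
```

Swapping the two pops would leave each register holding the other's saved value.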

Take a look at an asm listing generated by lcc-win32; it can be quite illuminating.

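At the start of a routine, ebp is pushed and then esp (the stack pointer) is copied into ebp (pushl %ebp; movl %esp, %ebp). This is why ebp can be used to reach the passed parameters: 0(%ebp) holds the saved ebp, 4(%ebp) the return address, and the parameters start at 8(%ebp).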
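The frame layout can be modelled in plain C. This sketch simulates a byte-addressed stack, performs the pushes of a call foo(a, b) followed by the prologue, and then reads the parameters at ebp + 8 and ebp + 12. It assumes a 32-bit cdecl call (arguments pushed right to left); the return address and saved ebp are made-up values, and push32, load32 and call_foo are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy byte-addressed stack memory and registers (a model, not real memory). */
static uint8_t  mem[256];
static uint32_t esp, ebp;

static void push32(uint32_t v)                 /* pushl: esp -= 4, then store */
{
    assert(esp >= 4);
    esp -= 4;
    memcpy(&mem[esp], &v, sizeof v);
}

static uint32_t load32(uint32_t addr)          /* read the 4 bytes at addr */
{
    uint32_t v;
    assert(addr + 4 <= sizeof mem);
    memcpy(&v, &mem[addr], sizeof v);
    return v;
}

/* Simulates the call foo(a, b) and foo's prologue. */
static void call_foo(uint32_t a, uint32_t b)
{
    esp = sizeof mem;          /* empty stack: esp at the top */
    ebp = 0x1111u;             /* caller's frame pointer (made-up value) */
    push32(b);                 /* cdecl: arguments pushed right to left */
    push32(a);
    push32(0xCAFEu);           /* call pushes the return address (made-up value) */
    push32(ebp);               /* pushl %ebp */
    ebp = esp;                 /* movl %esp, %ebp */
}
```

After call_foo(10, 20), load32(ebp + 8) yields 10 and load32(ebp + 12) yields 20, matching 8(%ebp) and 12(%ebp).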

The Return Value

For integer return values, put the value in eax before the function returns; this is the most efficient way of returning a value from an asm routine. Floating-point values are returned on the top of the floating-point stack, st(0).

Of course, one can also return a value explicitly with the return statement. Here are two examples: one showing the automatic return and one returning the value explicitly.

First the automatic return. This uses the register eax, or the part of it that matches the size of the return value: al for a byte, ax for a word or eax for a long word.

char foo(char a, char b)
{
    _asm("movb 8(%ebp), %al"); // load a into al
    _asm("movb 12(%ebp), %cl"); // load b into cl
    _asm("addb %cl, %al");      // add them putting the result in al
    // the automatic return will be al (low byte of ax (low word of eax))
}

char foo(char a, char b)
{
    char c;
    _asm("movb 8(%ebp), %al"); // load a into al
    _asm("movb 12(%ebp), %cl"); // load b into cl
    _asm("addb %cl, %al");      // add them putting the result in al
    _asm("movb %al, %c");       // move (copy) al into c
    return c;
}

Sizes And Instructions

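LCC-Win32 uses the AT&T syntax for assembler instructions. One consequence is that the size of the operation is given by a suffix on the instruction itself: b for a byte, w for a word (16 bits) and l for a long word (32 bits), as in movb, movw and movl. In Intel syntax the size is instead usually inferred from the operands, or stated with BYTE PTR, WORD PTR or DWORD PTR. AT&T syntax also puts the source operand first and the destination second, and prefixes registers with % and immediate values with $.

Intel's syntax is different in each of these respects. First an AT&T instruction and then the Intel equivalent:

movl 8(%ebp), %eax	mov eax, DWORD PTR [ebp+8]
addb %cl, %al	add al, cl
movb $0, 8(%edi)	mov BYTE PTR [edi+8], 0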
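The effect of the suffix can be mimicked in plain C with fixed-width unsigned types: the same addition wraps at 8, 16 or 32 bits depending on whether it is done as addb, addw or addl. The functions below are hypothetical models that ignore the flags, with parameters in AT&T order (source, then destination).

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C models of addb/addw/addl: dst + src computed at the width the
   suffix selects, wrapping around as the register would. */
static uint8_t  addb(uint8_t  src, uint8_t  dst) { return (uint8_t)(dst + src); }
static uint16_t addw(uint16_t src, uint16_t dst) { return (uint16_t)(dst + src); }
static uint32_t addl(uint32_t src, uint32_t dst) { return dst + src; }

/* Adding 1 to 0xFF: the byte form wraps to 0, the wider forms do not. */
static void suffix_example(void)
{
    assert(addb(1, 0xFF) == 0x00);
    assert(addw(1, 0xFF) == 0x0100);
    assert(addl(1, 0xFF) == 0x0100);
}
```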

Time For An Example Routine. (str_len)

#include <stdio.h>   // printf()
#include <stddef.h>  // size_t

size_t str_len(char *s)
{
    _asm("movl 8(%ebp),%ecx");   // first arg is at 8(%ebp)
    _asm("xorl %eax,%eax");      // clear eax
    _asm("jecxz exit");          // return 0 if s is a NULL pointer
    _asm("decl %eax");           // -1
_asm("lb1:");
    _asm("incl %eax");
    _asm("cmpb $0,(%ecx,%eax)"); // compares '0' (string end) with the byte
                                 // pointed to by ecx with an offset of eax.
    _asm("jne lb1");             // not finished yet?
_asm("exit:");
}

int main(void)
{
    char str[] = "which will be called the translation environment";
    printf("%u\n", (unsigned)str_len(str));
    return 0;
}

The address of the string is held in ecx while eax, the index, is incremented on each pass through the loop. cmpb compares zero ($0) with the byte at address ecx + eax, and jne jumps back to lb1 as long as that byte is not zero, that is, until the terminating NUL of the string is reached. The length of the string is then in eax and is returned automatically. If s is a NULL pointer, jecxz skips the loop and 0 is returned.
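For checking the asm version, here is a plain-C reference model of the same algorithm, with each variable named after the register it stands in for. The loop is a do-while because the asm increments eax before the first comparison. str_len_ref and str_len_example are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C reference for str_len; each variable is named after its register. */
static size_t str_len_ref(const char *s)
{
    const char *ecx = s;            /* movl 8(%ebp),%ecx */
    size_t eax = 0;                 /* xorl %eax,%eax */
    if (ecx == NULL)                /* jecxz exit: a NULL pointer gives 0 */
        return eax;
    eax = (size_t)-1;               /* decl %eax (wraps to the maximum value) */
    do {
        eax++;                      /* lb1: incl %eax (the first pass gives 0) */
    } while (ecx[eax] != '\0');     /* cmpb $0,(%ecx,%eax); jne lb1 */
    return eax;                     /* exit: length left in eax */
}

static void str_len_example(void)
{
    /* the string from main() above has 48 characters */
    assert(str_len_ref("which will be called the translation environment") == 48);
    assert(str_len_ref(NULL) == 0);
}
```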

Addressing Modes

The x86 processor has various addressing modes that we will look at in more detail.

_asm("movw $0x08, %ax");	The immediate addressing mode copies a value immediately to a register.
_asm("movl %ecx, %eax");	The register addressing mode copies one register to another ecx -> eax.
_asm("movl (%eax), %ecx");	The indirect addressing mode copies from the address pointed to by eax to the register ecx.
_asm("movl 16(%eax), %ecx");	The indexed addressing mode copies from the address pointed to by eax plus 16 to the register ecx.
_asm("movl (1000), %eax");	The direct addressing mode copies the contents of address 1000 to eax.
_asm("movl (%edx,%ecx,4), %eax");	The scaled-index addressing mode copies from the address EDX + (ECX * 4) to eax.

You may have noticed that a register in parentheses, as in (%ecx), means the memory at the address held in that register, while %ecx without parentheses means the register itself.

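The str_len routine above used base + index addressing: (%ecx,%eax) refers to the byte at address ecx + eax, where ecx (the base) holds the start of the string and eax (the index) the position within it. The general AT&T form is disp(base,index,scale), which means the address disp + base + index * scale; in (%ecx,%eax) the displacement is 0 and the scale 1. The $0 in cmpb $0,(%ecx,%eax) is the immediate value being compared, not part of the address.

This is quite a powerful combination and can be used effectively when writing asm code.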
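The address arithmetic can be written out in plain C. The sketch below computes disp + base + index * scale and uses it to show that (%edx,%ecx,4), with edx pointing at an array of 4-byte integers and ecx = i, reaches element i. effective_address and load_scaled are hypothetical names, and int32_t stands in for a 4-byte int.

```c
#include <assert.h>
#include <stdint.h>

/* Address of the AT&T operand disp(base,index,scale): disp + base + index*scale. */
static uintptr_t effective_address(intptr_t disp, uintptr_t base,
                                   uintptr_t index, unsigned scale)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);  /* the scales x86 allows */
    return base + index * scale + (uintptr_t)disp;
}

/* movl (%edx,%ecx,4),%eax with edx = array and ecx = i loads array[i]. */
static int32_t load_scaled(const int32_t *array, uintptr_t i)
{
    const int32_t *p = (const int32_t *)effective_address(0, (uintptr_t)array, i, 4);
    return *p;
}
```

For instance, 16(%eax) with eax = 0x2000 addresses 0x2010, and (%edx,%ecx,4) with edx = 0x1000 and ecx = 3 addresses 0x100C.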

Labels

A label works much as it does in other languages: it marks an address in the code to which program execution can jump.

In AT&T syntax (and, I believe, in most assemblers) a label is defined with a trailing colon; the jump instruction that targets it, however, names the label without the colon.

Labels are used extensively in asm code because jumping to them is the main way of controlling program flow. Sometimes one can code without a jump, but wherever C would use if, while and so on, asm jumps to another section of the code. In the str_len example above, jecxz and jne direct the program flow, with jne acting on the flags set by cmpb.
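To see how C control flow turns into compare-and-jump, here are a while loop and an if/else rewritten with goto and labels, which is the shape the asm takes. This is a plain-C sketch; sum_below, max_of and control_flow_example are hypothetical examples, and the comments only indicate roughly which compare and jump the asm would use.

```c
#include <assert.h>

/* while (i < n) { sum += i; i++; }  written as labels and jumps */
static int sum_below(int n)
{
    int sum = 0, i = 0;
top:
    if (i >= n) goto done;     /* compare i with n, jump out when the test fails */
    sum += i;
    i++;
    goto top;                  /* unconditional jump back (jmp) */
done:
    return sum;
}

/* if (a > b) r = a; else r = b;  written as labels and jumps */
static int max_of(int a, int b)
{
    int r;
    if (a > b) goto take_a;    /* compare a with b, conditional jump */
    r = b;                     /* fall through: the else branch */
    goto end;                  /* jmp over the other branch */
take_a:
    r = a;
end:
    return r;
}

static void control_flow_example(void)
{
    assert(sum_below(5) == 10);    /* 0 + 1 + 2 + 3 + 4 */
    assert(max_of(3, 7) == 7);
}
```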

Label names must be unique within a translation unit (a single source file after preprocessing). You may, for example, have many functions with an exit label as the place to leave the function; each of those labels would need a different name. So fileone.c may contain only one exit1, but filetwo.c can have its own exit1.

When Compiling Optimised

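Earlier I showed that parameters passed to a function are accessed at an offset from the ebp register. Unfortunately the situation changes when compiling optimised: as the macros below show, the parameters are then addressed relative to esp, with the first one at 4(%esp), because on entry 0(%esp) holds the return address. I say unfortunately because one's asm code cannot easily be switched between non-optimised and optimised builds without resorting to macros selected by preprocessor directives. Note also that the esp offsets are only valid before anything has been pushed: each pushl moves esp down by 4 bytes, so after one push the first parameter is at 8(%esp).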

#if __LCCOPTIMLEVEL > 0
#define _movl8_eax _asm("movl 4(%esp), %eax") // 1st int to eax
#define _movl12_eax _asm("movl 8(%esp), %eax") // 2nd int to eax
#define _movl16_eax _asm("movl 12(%esp), %eax") // 3rd int to eax
#define _movl20_eax _asm("movl 16(%esp), %eax") // 4th int to eax
#define _loadfp8t _asm("fldt 4(%esp)") // 1st long double onto stack
#define _loadfp20t _asm("fldt 16(%esp)") // 2nd long double onto stack
#define _loadfp8l _asm("fldl 4(%esp)") // 1st double onto stack
#define _loadfp16l _asm("fldl 12(%esp)") // 2nd double onto stack
#define _loadfp8s _asm("flds 4(%esp)") // 1st float onto stack
#define _loadfp12s _asm("flds 8(%esp)") // 2nd float onto stack
#else
#define _movl8_eax _asm("movl 8(%ebp), %eax") // 1st int to eax
#define _movl12_eax _asm("movl 12(%ebp), %eax") // 2nd int to eax
#define _movl16_eax _asm("movl 16(%ebp), %eax") // 3rd int to eax
#define _movl20_eax _asm("movl 20(%ebp), %eax") // 4th int to eax
#define _loadfp8t _asm("fldt 8(%ebp)") // 1st long double onto stack
#define _loadfp20t _asm("fldt 20(%ebp)") // 2nd long double onto stack
#define _loadfp8l _asm("fldl 8(%ebp)") // 1st double onto stack
#define _loadfp16l _asm("fldl 16(%ebp)") // 2nd double onto stack
#define _loadfp8s _asm("flds 8(%ebp)") // 1st float onto stack
#define _loadfp12s _asm("flds 12(%ebp)") // 2nd float onto stack
#endif
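The offsets in these macros follow a simple rule, sketched in plain C below under stated assumptions: a 32-bit call in which each argument occupies its size rounded up to a multiple of 4 bytes, with a long double taking 10 bytes padded to 12 (inferred from the step from 8(%ebp) to 20(%ebp) in the macros). The first parameter sits at base 8 relative to ebp (saved ebp plus return address) or base 4 relative to esp on entry (return address only). param_offset and param_offset_example are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of parameter n (0-based) given the byte sizes of the parameters,
   assuming each occupies a stack slot rounded up to a multiple of 4 bytes.
   base: 8 for ebp-relative access, 4 for esp-relative access before any push. */
static size_t param_offset(const size_t *sizes, size_t n, size_t base)
{
    size_t off = base;
    for (size_t k = 0; k < n; k++)
        off += (sizes[k] + 3) & ~(size_t)3;   /* round the slot up to 4 bytes */
    return off;
}

/* Reproduces some of the macro offsets above. */
static void param_offset_example(void)
{
    const size_t ld[] = {10, 10};             /* two long doubles (10 bytes each, assumed) */
    const size_t dbl[] = {8, 8};              /* two doubles */
    assert(param_offset(ld, 1, 8) == 20);     /* fldt 20(%ebp) */
    assert(param_offset(ld, 1, 4) == 16);     /* fldt 16(%esp) */
    assert(param_offset(dbl, 1, 8) == 16);    /* fldl 16(%ebp) */
    assert(param_offset(dbl, 1, 4) == 12);    /* fldl 12(%esp) */
}
```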

Here is the str_len example again, using the macro _movl8_eax so that it works both optimised and non-optimised.

When not optimised, the preprocessor replaces _movl8_eax with _asm("movl 8(%ebp), %eax"); when optimised, with _asm("movl 4(%esp), %eax").

size_t str_len(char *s)
{
    _movl8_eax;                  // first arg (the string pointer) in eax
    _asm("movl %eax,%ecx");      // copy it to ecx, which the loop uses
    _asm("xorl %eax,%eax");      // clear eax
    _asm("jecxz exit");          // return 0 if s is a NULL pointer
    _asm("decl %eax");           // -1
_asm("lb1:");
    _asm("incl %eax");
    _asm("cmpb $0,(%ecx,%eax)"); // compares '0' (string end) with the byte
                                 // pointed to by ecx with an offset of eax.
    _asm("jne lb1");             // not finished yet?
_asm("exit:");
}

Timings

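To find out whether your routine is fast, measure how long it takes to execute. A single timing is not reliable, so run the routine many times in a loop and divide the total time by the number of iterations. Timings taken with _rdtsc(), which reads the processor's time-stamp counter, are in clock cycles.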
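The same loop-and-average idea can be sketched portably, for compilers without _rdtsc(), with the standard clock() from <time.h>. This is a swapped-in technique rather than what the lcc-win32 program does: clock() is far coarser than a cycle counter, so the sketch times the whole loop once instead of each call. average_seconds and sample_work are hypothetical names.

```c
#include <assert.h>
#include <time.h>

/* Portable loop-and-average timing with the standard clock().  clock()
   counts processor time in CLOCKS_PER_SEC ticks, which is too coarse
   for one call, so the whole loop is timed once and the total divided. */
typedef void (*work_fn)(void);

static double average_seconds(work_fn fn, long iterations)
{
    assert(iterations > 0);
    clock_t t0 = clock();
    for (long i = 0; i < iterations; i++)
        fn();
    clock_t t1 = clock();
    return (double)(t1 - t0) / CLOCKS_PER_SEC / (double)iterations;
}

/* Hypothetical workload; volatile stops the compiler removing the loop. */
static volatile unsigned sink;
static void sample_work(void)
{
    for (unsigned k = 0; k < 100; k++)
        sink += k;
}
```

For example, average_seconds(sample_work, 100000) gives the mean time per call in seconds. Like the _rdtsc measurement, the figure includes some overhead (here the call and the loop).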

#include <stdio.h>    // sprintf()
#include <stdlib.h>   // ultoa() (non-standard)

void dw2hex(char *pbuf, unsigned long dw)
{
    _asm("pushl %esi");             // save esi
    _asm("pushl %edi");             // save edi
    _asm("movl 0x8(%ebp), %edi");   // pbuf into edi
    _asm("movl 0xc(%ebp), %esi");   // unsigned long into esi
    _asm("movb $0, 8(%edi)");       // terminate the string after 8 bytes
    _asm("xor %ecx, %ecx");         // clear ecx
    _asm("movb $7, %cl");           // length of string - 1
_asm ("loop:");
    _asm("movw %si, %ax");          // al is the target
    _asm("andb $0xF, %al");         // 00001111 mask out high nibble
    _asm("add $0x90, %al");         // al = 0x90 + nibble, ready for daa
    _asm("daa");                    // decimal adjust for addition
    _asm("cmpb $0x89, %al");        // is it a digit 0-9
    _asm("ja digit");
    _asm("adc $0x60, %al");         // add with carry 0x60 (0x40 for upper case)
    _asm("jmp store");
_asm("digit:");
    _asm("adc $0x40, %al");         // add with carry 0x40 (digit range)
_asm("store:");
    _asm("daa");                    // decimal adjust for addition
    _asm("movb %al, (%ecx,%edi)");  // store the hex digit (al) in the string
    _asm("shr $4, %esi");           // next nibble
    _asm("dec %ecx");               // dec counter
    _asm("jns loop");               // repeat until ecx goes below zero
    _asm("popl %edi");              // restore
    _asm("popl %esi");              // restore
}

int main(void)
{
    long long t2,t1,t0;
    int i;
    char str[12];
    unsigned long dw = 0x2f3a5e0f;
    t2 = 0;
    for (i =0; i<10000; i++){     // 10000 loop
        t0 = _rdtsc();
        dw2hex(str, dw);
        t1 = _rdtsc();
        t2 += t1 - t0;            // accumulate clock cycles
    }
    xprintf("%lld\t%s dw2hex\n", t2/i, str); // divide clock cycles by i
    t2 = 0;
    for (i =0; i<10000; i++){
        t0 = _rdtsc();
        ultoa(dw, str, 16);
        t1 = _rdtsc();
        t2 += t1 - t0;
    }
    xprintf("%lld\t%s ultoa\n", t2/i, str);

    t2 = 0;
    for (i =0; i<10000; i++){
        t0 = _rdtsc();
        sprintf(str, "%lx", dw);
        t1 = _rdtsc();
        t2 += t1 - t0;
    }
    xprintf("%lld\t%s sprintf\n", t2/i, str);

    return 0;
}
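To check dw2hex's output independently of the add/daa/adc/daa trick, here is a plain-C reference with the same loop order (index 7 down to 0, one nibble per pass) and the same lowercase output. It is a sketch for comparison, not a replacement; dw2hex_ref and dw2hex_example are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Plain-C reference for dw2hex: 8 lowercase hex digits plus a NUL,
   filled from index 7 down to 0 one nibble at a time, as the asm does.
   An explicit digit/letter choice replaces the add/daa/adc/daa trick. */
static void dw2hex_ref(char *pbuf, uint32_t dw)
{
    pbuf[8] = '\0';                            /* movb $0, 8(%edi) */
    for (int ecx = 7; ecx >= 0; ecx--) {       /* movb $7,%cl ... dec %ecx; jns loop */
        unsigned nibble = dw & 0xFu;           /* andb $0xF, %al */
        pbuf[ecx] = (char)(nibble < 10 ? '0' + nibble : 'a' + (nibble - 10));
        dw >>= 4;                              /* shr $4, %esi */
    }
}

static void dw2hex_example(void)
{
    char buf[12];
    dw2hex_ref(buf, 0x2f3a5e0fu);              /* the value timed in main() */
    assert(strcmp(buf, "2f3a5e0f") == 0);
}
```

Unlike ultoa() and sprintf("%lx"), both dw2hex and this reference always write eight digits, so 0x0f00 comes out as 00000f00 rather than f00.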

Assembler with LCC-Win32