Morphex's blogologue (Life, technology, music, politics, business, mental health and more)

This is the blog of Morten W. Petersen, a.k.a. morphex in various places. I blog about my life, and what I find interesting and/or important. This is a personal blog without any editor or a lot of oversight, so treat it as such. :)

My email is morphex@gmail.com.


C & XML

These last couple of months I've been learning a bit of Assembler and C programming, as these days I have the time available. I've always found Python and other high-level languages fast enough for what I needed to do, but I've always wondered a bit about C and Assembler.

What I've learned so far is that the computer is in fact a very large calculator: pretty much everything that happens is that instructions are executed (for example, adding two numbers) and numbers are moved around between memory, disk, peripherals etc. I've found it useful to learn about Assembler and C because it gives me a more detailed and correct view of how things work in computing.

With my programming and system administration background, I found it easy to dive into C and Assembler, and I also appreciate a lot more what, for example, Python does as a high-level programming language.

I've been looking for some gig or project to use C and Assembler for, and what I've landed on so far is that I want to create an XML parser: one that validates the Unicode used and ensures that the document is "well-formed". I haven't gotten that far yet, but I've pretty much decided that the parser should (for now at least) be restricted to the UTF-32-LE encoding, and that whenever I work with pointers the rule is to set them to NULL when they are created as well as after free() has been called.
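The pointer rule can be captured in a small helper. This is only a minimal sketch: the macro FREE_AND_NULL and the function buffer_create are hypothetical names made up for illustration, not something the parser has today.

```c
#include <stdlib.h>

/* Hypothetical helper (an assumption, not existing parser code): free a
   pointer and reset it to NULL in one step, so a stale address cannot be
   reused by accident. free(NULL) is a no-op, so repeating it is harmless. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Hypothetical helper: allocate a zero-filled buffer of `size` bytes.
   Returns NULL if the allocation fails. */
unsigned char *buffer_create(size_t size) {
  unsigned char *buffer = NULL;  /* set to NULL when created, per the rule */
  buffer = calloc(size, 1);
  return buffer;
}
```

The do { ... } while (0) wrapper makes the macro behave like a single statement, so it can be used safely inside an if without braces.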

I think this is good fun and I do it whenever I have the time and energy. Here's the code so far:

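#include <stdlib.h>
#include <stdio.h>

int main(void) {
  // unsigned char so that bytes like 0xFF compare and print as expected
  unsigned char *buffer = NULL;
  size_t read = 0;
  FILE *file = NULL;

  buffer = malloc(1024);
  if (buffer == NULL) {
    fprintf(stderr, "Out of memory\n");
    return 1;
  }
  // Read-only binary mode is enough, the input file is never written to
  file = fopen("test.xml.2", "rb");
  if (file == NULL) {
    perror("test.xml.2");
    free(buffer); buffer = NULL;
    return 1;
  }
  read = fread(buffer, 1, 1024, file);
  fclose(file); file = NULL;
  // Only look at the first four bytes if that many were actually read
  if (read >= 4 && buffer[0] == 0xFF && buffer[1] == 0xFE &&
      buffer[2] == 0x00 && buffer[3] == 0x00) {
    // We have a UTF-32-LE Byte Order Mark
    printf("BOM found\n");
  } else {
    printf("BOM not found, %x\n", read > 0 ? buffer[0] : 0u);
    free(buffer); buffer = NULL;
    return 1;
  }
  printf("%zu\n", read);
  fwrite(buffer, 1, read, stdout);
  printf("\n");
  free(buffer); buffer = NULL;
  return 0;
}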
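
A possible next step after the BOM check is validating the code points themselves. The sketch below is an assumption about how that could look, not code from the parser: the function names utf32le_decode, is_unicode_scalar and utf32le_validate are made up for illustration. It only checks that every four-byte unit is a Unicode scalar value (at most 0x10FFFF and not a surrogate); XML restricts the allowed characters further, which this does not cover.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble one code point from four little-endian bytes. */
uint32_t utf32le_decode(const unsigned char *bytes) {
  return (uint32_t)bytes[0] |
         ((uint32_t)bytes[1] << 8) |
         ((uint32_t)bytes[2] << 16) |
         ((uint32_t)bytes[3] << 24);
}

/* A Unicode scalar value is at most 0x10FFFF and is not a surrogate
   (0xD800 to 0xDFFF). */
int is_unicode_scalar(uint32_t cp) {
  return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Returns 1 if `length` bytes hold only complete, valid code points,
   otherwise 0. A length that is not a multiple of four is rejected. */
int utf32le_validate(const unsigned char *buffer, size_t length) {
  size_t i;
  if (length % 4 != 0) {
    return 0;
  }
  for (i = 0; i < length; i += 4) {
    if (!is_unicode_scalar(utf32le_decode(buffer + i))) {
      return 0;
    }
  }
  return 1;
}
```

In the program above this could be called as utf32le_validate(buffer, read) once the BOM has been found, with a cast to (const unsigned char *) if the buffer is declared as plain char.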


[Permalink] [By morphex] [Technology (Atom feed)] [09 May 11:44 Europe/Oslo]