Morphex's blogologue (Life, technology, music, politics, business, mental health and more)

This is the blog of Morten W. Petersen, aka. morphex in various places. I blog about my life, and what I find interesting and/or important. This is a personal blog without any editor or a lot of oversight so treat it as such. :)

My email is morphex@gmail.com.

Developing my XML project in C

So, these last couple of months I've been dabbling with C and XML, to get to know C better. Out of all of this there might also come a nice XML parser and writer that will be freely available for anyone to use.

I wish I had learned more C earlier, as a lot of things related to IT have fallen into place now that I've been forced to work on low-level stuff.

There have been quite a number of posts to comp.lang.c, and lots of useful information has been exchanged back and forth.

Anyway, the project is here:

https://github.com/morphex/smash_xml

I guess the bulk of the main code and tests is now around 25KB, which is quite a bit of code. One of the next steps is to create the main parsing loop, which will break down the XML file into its internal C representation. Some data types will also have to be created so that things work well.

One of the more interesting points that have come up is whether to use iteration or recursion when working with the internal representation.

In C, each function call typically pushes data (such as the return address and local variables) onto a region of memory called the stack. With recursion, every nested call pushes more onto the stack, and if the recursion is deep enough, the stack is exhausted with unpredictable results.

So I think I'm opting for an iterative design in the C code, and using little if any recursion. I'm sure that's going to tick some people off, but having predictable and intelligible results when running the code is important for this project, because the parser may be fed malicious input data.
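
As a rough illustration of the iterative approach, here is a minimal sketch that walks a tree with an explicit, heap-allocated stack instead of recursive calls. The node type and the function name count_nodes are hypothetical and not taken from smash_xml; the real internal representation may look quite different.

```c
#include <stdlib.h>

/* Hypothetical node type for illustration; not the smash_xml data type. */
struct node {
  struct node *first_child;
  struct node *next_sibling;
};

/* Counts the nodes in the tree rooted at root without recursion.
   Returns 0 on success, -1 if memory for the explicit stack runs out. */
int count_nodes(const struct node *root, size_t *count) {
  size_t capacity = 16, top = 0;
  const struct node **stack = NULL;

  *count = 0;
  if (root == NULL) return 0;
  stack = malloc(capacity * sizeof *stack);
  if (stack == NULL) return -1;
  stack[top++] = root;

  while (top > 0) {
    const struct node *current = stack[--top];
    const struct node *child = NULL;
    (*count)++;
    /* Push every child; they are visited in later loop iterations. */
    for (child = current->first_child; child != NULL;
         child = child->next_sibling) {
      if (top == capacity) {
        /* Grow the stack; a failure here is reported, not a crash. */
        const struct node **bigger =
            realloc(stack, 2 * capacity * sizeof *stack);
        if (bigger == NULL) { free(stack); return -1; }
        stack = bigger;
        capacity *= 2;
      }
      stack[top++] = child;
    }
  }
  free(stack); stack = NULL;
  return 0;
}
```

The point of the design is that the pending work lives on the heap, so a very deep or very wide document shows up as a failed malloc() or realloc() that the code can report, rather than as an exhausted call stack.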

Right now it feels like I'm past the most "painful" parts of learning C, and I look forward to learning and writing more in the time to come.


[Permalink] [By morphex] [C & XML (Atom feed)] [13 Jul 00:51 Europe/Oslo]

C & XML

These last couple of months I've been learning a bit of Assembler and C programming, as these days I have the time available. I've always found Python and other high-level languages fast enough for what I needed to do, but I've always wondered a bit about C and Assembler.

What I've learned so far is that the computer is in fact a very large calculator, and pretty much everything that happens is either an instruction being executed (for example adding two numbers) or numbers being moved around between memory, disk, peripherals etc. I've found it useful to learn about Assembler and C because it gives me a more detailed and correct view of how things work in computing.

With my programming and system administration background, I found it easy to dive into C and Assembler, and I also appreciate a lot more what for example Python does as a high-level programming language.

I've been looking for some gig or idea to build a C and Assembler project around, and what I've landed on so far is that I want to create an XML parser. An XML parser that validates the Unicode used, as well as ensures that the document is "well formed". I haven't gotten that far yet, but I've pretty much decided that the parser should (for now at least) be restricted to a UTF-32-LE encoding, and that whenever I work with pointers the rule is to initialize them to null when they are created as well as after free() has been called.
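
To make the Unicode validation idea a bit more concrete, here is a minimal sketch of how one UTF-32-LE code unit could be decoded and checked. The function names utf32le_decode and is_unicode_scalar are made up for this illustration and are not part of the project. The check only covers what Unicode itself requires (at most 0x10FFFF and not a surrogate); XML allows a narrower set of characters, so a real parser would need a further test on top of this.

```c
#include <stdint.h>

/* Assembles one code point from four little-endian bytes.
   Hypothetical helper; the caller must supply at least four bytes. */
uint32_t utf32le_decode(const unsigned char *bytes) {
  return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
         ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

/* Returns 1 if code_point is a Unicode scalar value, 0 otherwise. */
int is_unicode_scalar(uint32_t code_point) {
  if (code_point > 0x10FFFF) return 0;                        /* beyond Unicode */
  if (code_point >= 0xD800 && code_point <= 0xDFFF) return 0; /* surrogate */
  return 1;
}
```

Because every character takes exactly four bytes in UTF-32, a parser restricted to this encoding can step through the buffer four bytes at a time and run each decoded value through a check like this.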

I think this is good fun and I do it whenever I have the time and energy. Here's the code so far:

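#include <stdlib.h>
#include <stdio.h>

int main(void) {
  unsigned char *buffer = NULL;
  size_t read = 0;
  FILE *file = NULL;

  buffer = malloc(1024);
  if (buffer == NULL) {
    fprintf(stderr, "Out of memory\n");
    return 1;
  }
  // The file is only read, so open it read-only in binary mode
  file = fopen("test.xml.2", "rb");
  if (file == NULL) {
    perror("test.xml.2");
    free(buffer); buffer = NULL;
    return 1;
  }
  read = fread(buffer, 1, 1024, file);
  fclose(file); file = NULL;
  // The UTF-32-LE Byte Order Mark is the four bytes FF FE 00 00
  if (read >= 4 && buffer[0] == 0xFF && buffer[1] == 0xFE &&
      buffer[2] == 0x00 && buffer[3] == 0x00) {
    printf("BOM found\n");
  } else {
    printf("BOM not found, %x\n", read > 0 ? buffer[0] : 0u);
    free(buffer); buffer = NULL;
    return 1;
  }
  printf("%zu\n", read);
  // Write the bytes that were read, one byte per element
  fwrite(buffer, 1, read, stdout);
  printf("\n");
  free(buffer); buffer = NULL;
  return 0;
}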


[Permalink] [By morphex] [Technology (Atom feed)] [09 May 11:44 Europe/Oslo]