The 107th Euler Problem aims to calculate the most efficient way to connect a given network. This is definitely a problem that can be solved using graphs, i.e. it’s a typical minimum spanning tree (MST) problem (Prim’s algorithm).

Please note that the code below is just a snapshot of the original code. If one wants to check the entire code, please go to my github repository.

I started by creating a class to represent the graph. My representation uses an adjacency matrix, however for space-saving purpose I could have used adjacency lists instead.

public class Graph { public int nvertices; public int[][] edges; ...

Then I created a *Prim’s* class which has a method to calculate the minimum spanning tree of a given graph. The method, named *prim*, is explained below together with the class:

public class Prim { public Graph g; public int start; private final int MAXINT = Integer.MAX_VALUE; private int[] distance; private int[] parent;

As one can see, there are a number of global variables: the graph, the starting node, the maximum integer value (explained below) and two arrays – one containing the distances and another containing the parents. These are very important variables for the prim’s method below:

public int prim() { //Variables for loops and saving information int i, v, weight, dist; //Total amount saved by creating a MST int save = 0; //Number of vertices int nVertices = this.g.getNvertices(); //Current elements in the final tree boolean[] intree = new boolean[nVertices]; //Initialization of the arrays for (i=0; i < nVertices; i++) { intree[i] = false; this.distance[i] = this.MAXINT; this.parent[i] = -1; }

These first lines initialize the helper variables and the arrays which will hold the information regarding the MST. Furthermore, the following code is the responsible for creating the MST and calculate the total saved:

//Starting point this.distance[this.start] = 0; v = this.start; //Loop through the vertices while(intree[v] == false) { intree[v] = true; // Search all adjacent nodes and update distances for(int j=0; j<nVertices; j++) { weight = this.g.edges[v][j]; if (weight != 0) { if((this.distance[j] > weight) && intree[j] == false) { if (this.distance[j] != this.MAXINT) save += this.distance[j]; this.distance[j] = weight; this.parent[j] = v; } else if (intree[j] == false) save += weight; } } //Find out next vertice to explore v = 0; dist = this.MAXINT; for(i=0; i< nVertices; i++) { if ((intree[i] == false) && (dist > this.distance[i])) { dist = this.distance[i]; v = i; } } } //Return the total saved return save;

For the scope of this problem, the *save* variable is important since it is accumulating the total savings during the process of building the MST. Then it was a matter of parsing the input file into a matrix and pass it on to the Prim’s class in order to perform the calculations. The Main class is also available in my github repository.

This problem is about breaking the XOR cipher using only the cipher text and having the following information:

- The key used to encrypt the original text has length 3;
- The key contains lower case characters.

In such a simple encryption algorithm, it is actually possible to find the original text and the key by brute-force all combinations, however I decided to try something else.

It is possible to find the key by doing the following:

- First divide the cypher characters into three groups depending on which key character they were encoded from;
- Create a distribution of the elements of each group;
- Take the first element of each distribution;
- XOR those elements with the most used English character (the space, according to wikipedia).

Imagining that the key is “KEY”, then for a given plain text, the key repeats itself until the end. So, for a given plain text INPUTTEXT, the character “K” of the key encodes the characters “I”, “U” and “E” of the plain text; the character “E” of the key encodes the characters “N”, “T” and “X” and so forth. That is the reason behind the 1st step.

The 2nd step aims to verify which characters are used the most. The distribution will show that and then we just need to select the most used character of each of the three groups, as described in the step 3.

Finally, in the 4th step we XOR each character from the step 3 with the space (ascii code 32) and we obtain the key. Then it is just a matter of decoding the text using that key.

Translating this process into Haskell, I wrote the following function to perform the first 2 steps mentioned above:

isolate l = let size = length l tmp = f (size-1) l1 = [ (!!) l (i-1) | i <- [ 3*j+1 | j <- [0..tmp] ]] l2 = [ (!!) l (i-1) | i <- [ 3*j-1 | j <- [1..tmp] ]] l3 = [ (!!) l (i-1) | i <- [ 3*j | j <- [1..tmp] ]] in (select1 l1, select1 l2,select1 l3) where f :: Int -> Int f n = fromInteger $ round $ fromIntegral n/3

This function returns the three sorted distributions (select1 comes from Numeric.Probability.Example.Collection). Then, I extracted the most used character of the three groups and I obtained the following:

(71,0.174563591022444) (79,0.21250000000000013) (68,0.19250000000000012)

So, these three characters (71, 79 and 68) are the most used characters, in order. Since the most used character in the English language is the space (ascii 32), it is a matter of XORing the three characters with space to obtain the key:

xor_int :: Int -> Int -> Int xor_int a b = a `xor` b

get_key l = map (chr . xor_int 32) l

*Main> get_key [71,79,68] "god"

Since I have the key now, it is a matter of decoding the whole cipher text and obtain the plain text. For that, I used the following function:

main = do y <- readFile "cipher1.txt" l <- parseList y let k = decode [103,111,100] l print k where parseList :: String -> IO [Int] parseList = readIO decode key text = let l = take (length text) (cycle key) k = xor_list text l in map chr k where xor_list [] [] = [] xor_list (x:xs) (y:ys) = (x `xor_int` y) : xor_list xs ys

And the result is:

“(The Gospel of John, chapter 1) 1 In the beginning the Word already existed. He was with God, and he was God. 2 He was in the beginning with God. 3 He created everything there is. Nothing exists that he didn’t make. 4 Life itself was in him, and this life gives light to everyone. 5 The light shines through the darkness, and the darkness can never extinguish it. 6 God sent John the Baptist 7 to tell everyone about the light so that everyone might believe because of his testimony. 8 John himself was not the light; he was only a witness to the light. 9 The one who is the true light, who gives light to everyone, was going to come into the world. 10 But although the world was made through him, the world didn’t recognize him when he came. 11 Even in his own land and among his own people, he was not accepted. 12 But to all who believed him and accepted him, he gave the right to become children of God. 13 They are reborn! This is not a physical birth resulting from human passion or plan, this rebirth comes from God.14 So the Word became human and lived here on earth among us. He was full of unfailing love and faithfulness. And we have seen his glory, the glory of the only Son of the Father.”

In order to solve this problem, I just created a brute force algorithm in Haskell:

import List import Char result = maximum [g (a^b) | a <- [1..99], b <- [1..99]] where g = sum . map digitToInt . show main = print $ result

The execution is pretty fast:

I solved the 79th Euler problem with pen and paper, since the input data was rather small (50 attempts, with repetitions).

The ideia is to find the shortest possible secret passcode based on a log fie containing 50 successful login attempts. Furthermore, the characters are asked in order, which eases the process of finding the solution.

So, I started with the first one: 319. This means that the number 3 will appear before 1 and 9; the number 1 will appear after 3 and before 9; finally, the number 9 will appear after 3 and 1. Then I continued with all the numbers in the log file. At the end, I just run a quick walkthrough the log file again to be sure. I ended up with the right solution: the password is 73162890.

I was playing around with some sorting algorithms and I decided to try to compare the implementation of the QuickSort algorithm in Haskell and in C.

**C in-place partition implementation**

Starting by C, I implemented an in-place partition algorithm. This means that no auxiliary memory is used by the program besides the memory needed to store the array. Here is the definition of QuickSort:

void quicksort(long unsigned int *array, long int min, long int max) { long int pivotIndex; if (max > min) { pivotIndex = partition(array,min,max); quicksort(array,min,pivotIndex-1); quicksort(array,pivotIndex+1,max); } }

The basic idea is to select a pivot and move all the numbers smaller than the pivot to the left side of the array and the bigger numbers to the right side of the array. Then, proceed with the same algorithm for both sub-lists.

The partition procedure swaps the elements of the array around according to the pivot:

long int partition(long unsigned int *array, long int min, long int max) { long int left = min, right = max, pivotValue = array[min], pivot = min; while ( left < right ) { while( array[left] <= pivotValue ) left++; while( array[right] > pivotValue ) right--; if ( left < right ) swap(array,left,right); } array[min] = array[right]; array[right] = pivotValue; return right; }

Finally, a swap procedure:

void swap(long unsigned int *array,long int from, long int to) { long unsigned int temp = array[from]; array[from] = array[to]; array[to] = temp; }

As one can see, it is a lot of code just to implement the QuickSort algorithm. The execution time is of course related to the choice of a good pivot. In order to test it and check its performance, I have generated a file with 10.000 random integers using this script. I also include the number of integers in the top of the script to make it easier to allocate memory :-) Then I read this file into an array, sort it and write it back to a file. Here is the main function (please note that no error handling was made in order to keep the code easier to read for its purpose):

int main() { long int size = 0, min = 0; long unsigned int *array; long int i, j = 0; FILE *fpin = fopen("numbers.txt","r"); FILE *fpout = fopen("sorted_numbers.txt","w"); long unsigned int line = 0; char buffer[256]; //Step 0 - Read first line and get the size of the file if(feof(fpin) == 0) { fscanf(fpin,"%d\n",&size); } array = (long unsigned int*) malloc(size*sizeof(long unsigned int)); //Step 1 - read input file into array while( fgets(buffer,256,fpin) != NULL ) { array[j++] = strtoul(buffer,NULL,0); } fclose(fpin); //Step 2 - sort array quicksort(array,min,size-1); //Step 3 - write array into output file for(i=0; i < size; i++) { fprintf(fpout,"%lu\n",array[i]); } fclose(fpout); return 0; }

Here are the results:

This is really fast! However, it didn’t work in the first try. I had to change the stack size in order to run it. For that purpose, I used the command *ulimit -S -s unlimited*.

**Haskell implementation**

It is shocking how compact this definition can be in Haskell. Here is my implementation:

qsort :: Ord a => [a] -> [a] qsort [] = [] qsort (x:xs) = let smaller = filter (<x) xs bigger = filter (>= x) xs in qsort smaller ++ [x] ++ qsort bigger

This Haskell implementation is around 85% smaller than the C version!! It was much easier to test and I found out almost no errors compared to the C version.

Here is a main function in order to run it:

main = do y <- readFile "numbers.txt" l <- parseList y let s = qsort l writeFile "sorted_numbers.txt" (show s) print "done." where parseList :: String -> IO [Integer] parseList = readIO

I compiled this with the command: *ghc -O3 -o Sort Sort.hs* and I executed it with: time ./Sort +RTS -H64k -k64k

Even though the definition is much smaller than the C version, the execution time was not faster:

real 0m0.099s

user 0m0.092s

sys 0m0.006s

I found it quite inefficient when compared to the C version, so I compiled it with profiling information: *ghc -prof -auto-all -O3 -o Sort Sort.hs.* Here is the result:

As one can see, *qsort* uses few resources compared to the main function, which is much better than I expected.

- In conclusion, although the Haskell version is much cleaner and faster to develop/test, it is though slower than the C version. I also experimented with 90.000 integers and the time difference is much more evident, with the C version achieving 0m0.035s while the Haskell version achieved 0m0.635s of real execution time.