Sunday, 3 August 2014

OpenCL on the Parallella using Eclipse

Writing OpenCL code on the Parallella board using Eclipse for people who just want to write code and not become Linux experts.

Finally, what I would call my first "execution" blog post. I got my board a couple of months ago and while it's been fun playing with it, achieving the ultimate goal - writing code - has been frustrating to say the least.

I've tried a number of different approaches only to realise (yet again) that keeping it simple while you don't know what you are doing is by far the best approach. 

Where I ended up last week was - "how about I try and get some existing code running in eclipse on the parallella". Hopefully, the code will be compilable and correct so the only think I'll have to get right are the settings. To this end I found a short program that generated a colourised mandelbrot set posted by dar on the Parallella site. Don't try and find this code because it has some pretty glaring errors. If you want to try this procedure use the code below.

This really is a beginners guide. If you are already familiar with Eclipse you won't need most of this. I also don't make any claim about how efficient the code is.


0. Before you begin


The prerequisites are:

  • the Parallella SDK
  • the Brown Deer Technology SDK
  • the environment variables PATH and LD_LIBRARY_PATH set correctly
  • the Eclipse development environment


The first three come with the Ubuntu 14.04 release and I'm assuming future releases will also include them. For the record, the environment variables on my system are:

LD_LIBRARY_PATH=/usr/local/browndeer/lib:/usr/local/lib:/opt/adapteva/esdk/tools/host/lib:


PATH=/usr/local/browndeer/bin:/opt/adapteva/esdk/tools/host/bin:/opt/adapteva/esdk/tools/e-gnu/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games

Installing eclipse is a matter of:

sudo apt-get install eclipse

Once you down load eclipse you need to C/C++ development addin CDT. To get this you use Help | Get New Software and pack your patience - it is SLOOOOW.

On tricky thing: for a program to gain access to the Epiphany co-processor it must be run by root. That means that you must log in as root (using sudo won't work) and therefore you need a root password. To reset the root password you run:

sudo passwd

which will give you the usual new password prompt followed by the are you really sure prompt. Then to run you program you use the command:

su

from your non-root login (linaro by default). This will then prompt you for the root password. Once you have logged on as root, check the environment variables PATH and LD_LIBRARY_PATH. They are needed at run time so root must have them set as described above.

1. Your first Eclipse Project


1.1 Get Yourself a New Project

To start with you need a new C/C++ project. The File | New | Project pops up the box to choose what type of project you want. I always choose C++ but for this particular project a standard C Project would be fine. I called my project mandelbrot (not very original I know).

1.2 Set Up your Compile Settings

Not a lot to do here but absolutely critical.

Your compile settings are accessible from the Project | Properties menu. For Properties to be active you need to have the project tab and your project within it selected.

Your tool chain should look like this:



Note I've got the C++ compiler and linker in there.

Your includes should look like this:



This will tell the compiler to find all of the Brown Deer Technology stuff.

Finally, your linker settings should look like this:


I'm pretty sure that this is all you need. I did fumble around a lot with other settings so if this does not work please let me know.

1.3 Some Code

For CL projects you need host code and Epiphany code. The host code is compiled using the gcc/g++ compilers/linker etc in the tool chain. The Epiphany code is compiled at run time by the Brown Deer JIT compiler.

I created a source file folder (src) under the mandelbrot project folder. For the embedded paths in the code to work you should do this as well. The host code I ended up with was:

// The modifications porting this code to OpenCL are
// Copyright (c) 2012 Brown Deer Technology, LLC.
//
// Mandelbrot.c
// Written by User:Evercat
//
// This draws the Mandelbrot set and spits out a .bmp file.
// Should be quite portable (endian issues have been taken
// care of, for example)
//
// Released under the GNU Free Documentation License
// or the GNU Public License, whichever you prefer:
// 9 February, 2004.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdcl.h>
#include <errno.h>

#define OUTFILE "./mandelbrot.bmp"

#define WIDTH 1024
#define HEIGHT 768

#define CENTRE_X -0.5
#define CENTRE_Y 0
#define ZOOM 300

#define ITERATIONS 1024  // Higher is more detailed, but slower...

// Plotting functions and parameters...

#define bailoutr(n) (5*n  )
#define bailoutg(n) (20*n  )
#define bailoutb(n) 0

#define min(a,b) ((a<b)?a:b)

// Colours for the set itself...

#define IN_SET_R 0
#define IN_SET_G 0
#define IN_SET_B 0

void drawbmp(int width, int height, unsigned char* pixels, char * filename);

/////////////////////////////////// MAIN PROGRAM ///////////////////////////////////

int main (void)
{

   float startx; float endx;
   float starty; float endy;
   float dx; float dy;
   float dx_over_width,dy_over_height;

   char kern[] = "../src/mandel_kern.cl";
   void * openHandle;

   int iterations = ITERATIONS;
   int width = WIDTH;
   int height = HEIGHT;

   char strInfo[20];
   FILE * pFile;

   pFile = fopen(kern, "r");
   if (pFile == NULL)
   {
  printf("Opening the Kernel file: %s produced an error(%d). Make sure that the source code variable kern has a valid path to the cl code and that the code is readable.\n", kern, errno);
  exit(0);
   }
   else
  fclose(pFile); // only open the file to check that it is there and readable

   pFile = fopen("./debug", "w");

   fprintf(pFile, "About to malloc pixels\n");
   cl_uchar* pixels = (cl_uchar*) clmalloc(stdacc, width * height * 3, 0);

   startx = CENTRE_X - ((float) WIDTH / (ZOOM * 2));
   endx = CENTRE_X + ((float) WIDTH / (ZOOM * 2));

   starty = CENTRE_Y - ((float) HEIGHT / (ZOOM * 2));
   endy = CENTRE_Y + ((float) HEIGHT / (ZOOM * 2));

   fprintf(pFile, "Plotting from (%f, %f) to (%f, %f)\n", startx, starty, endx, endy);

   dx = endx - startx;
   dy = endy - starty;
   dx_over_width = dx / width;
   dy_over_height = dy / height;


   fprintf(pFile, "Opening kernel file:%s\n", kern);
   openHandle = clopen(stdacc, kern, CLLD_NOW);

   fprintf(pFile, "Getting the kernel with clsym\n");
   cl_kernel krn = clsym(stdacc, openHandle, "mandel_kern", CLLD_NOW);

   clGetKernelInfo(krn, CL_KERNEL_FUNCTION_NAME, sizeof(strInfo), strInfo, NULL);
   fprintf(pFile, "The kernel is called: %s\n", strInfo);

   fprintf(pFile, "Calling clndrange\n");
   clndrange_t ndr = clndrange_init1d(0, height, 16);

   fprintf(pFile, "Calling clforka\n");
   clforka(stdacc, 0, krn, &ndr, CL_EVENT_WAIT,
      iterations, width, startx, starty, dx_over_width, dy_over_height, pixels);

   fprintf(pFile, "Transferring memory contents from the Epiphany using clmsync\n");
   clmsync(stdacc, 0, pixels, CL_MEM_HOST|CL_EVENT_WAIT);

   fprintf(pFile, "Calling drawbmp\n");
   drawbmp(width, height, pixels, OUTFILE);

   fprintf(pFile, "Saved bitmap to %s. Done.\n", OUTFILE);
   clfree(pixels);
   fclose(pFile);

   return 0;
}


void drawbmp (int width, int height, unsigned char* pixels, char * filename) {

   unsigned int headers[13];
   FILE * outfile;
   int extrabytes;
   int paddedsize;
   int x; int y; int n;

   extrabytes = 4 - ((width * 3) % 4); // How many bytes of padding to add to
                                       // each horizontal line - the size of
                                       // which must be a multiple of 4 bytes.
   if (extrabytes == 4)
      extrabytes = 0;

   paddedsize = ((width * 3) + extrabytes) * height;

   // Headers...

   headers[0]  = paddedsize + 54;      // bfSize (whole file size)
   headers[1]  = 0;                    // bfReserved (both)
   headers[2]  = 54;                   // bfOffbits
   headers[3]  = 40;                   // biSize
   headers[4]  = width;  // biWidth
   headers[5]  = height; // biHeight
                                       // 6 will be written directly...
   headers[7]  = 0;                    // biCompression
   headers[8]  = paddedsize;           // biSizeImage
   headers[9]  = 0;                    // biXPelsPerMeter
   headers[10] = 0;                    // biYPelsPerMeter
   headers[11] = 0;                    // biClrUsed
   headers[12] = 0;                    // biClrImportant

   outfile = fopen (filename, "wb");

   // Headers begin...
   // When printing ints and shorts, write out 1 character at time to
   // avoid endian issues.

   fprintf (outfile, "BM");

   for (n = 0; n <= 5; n++)
   {
      fprintf(outfile, "%c", headers[n] & 0x000000FF);
      fprintf(outfile, "%c", (headers[n] & 0x0000FF00) >> 8);
      fprintf(outfile, "%c", (headers[n] & 0x00FF0000) >> 16);
      fprintf(outfile, "%c", (headers[n] & (unsigned int) 0xFF000000) >> 24);
   }

   // These next 4 characters are for the biPlanes and biBitCount fields.

   fprintf(outfile, "%c", 1);
   fprintf(outfile, "%c", 0);
   fprintf(outfile, "%c", 24);
   fprintf(outfile, "%c", 0);

   for (n = 7; n <= 12; n++)
   {
      fprintf(outfile, "%c", headers[n] & 0x000000FF);
      fprintf(outfile, "%c", (headers[n] & 0x0000FF00) >> 8);
      fprintf(outfile, "%c", (headers[n] & 0x00FF0000) >> 16);
      fprintf(outfile, "%c", (headers[n] & (unsigned int) 0xFF000000) >> 24);
   }

   // Headers done, now write the data...

   for (y = height - 1; y >= 0; y--)  // BMPs are written bottom to top.
   {
      for (x = 0; x <= width - 1; x++)
      {
         // Also, it's written in (b,g,r) format...

         fprintf(outfile, "%c", pixels[(x * 3) + 2 + (y * width * 3)]);
         fprintf(outfile, "%c", pixels[(x * 3) + 1 + (y * width * 3)]);
         fprintf(outfile, "%c", pixels[(x * 3) + 0 + (y * width * 3)]);
      }
      if (extrabytes) // See above - BMP lines must be of lengths divisible by 4
      {
         for (n = 1; n <= extrabytes; n++)
         {
            fprintf(outfile, "%c", 0);
         }
      }
   }

   fclose (outfile);
   return;
}

Create a new file in your project and paste this into it.

The Epiphany code is:

#define set_red(n) (5*n  )
#define set_green(n) (20*n  )
#define set_blue(n) 0

__kernel void mandel_kern(
   int iterations,
   int width,
   float startx, 
   float starty, 
   float dx, 
   float dy, 
   __global uchar* pixels
)
{
int threeXwidth = 3 * width;
   unsigned char line[threeXwidth];
   int i, j, n, m, pixelBase;
   float x, y, r, s, nextr, nexts;

   j = get_global_id(0);

   for (i = 0; i < width; i++) 
   {

      x = startx + i*dx;
      y = starty + j*dy;

      r = x; 
      s = y;

      for (n = 0; n < iterations; n++) 
      {

         nextr = ((r * r) - (s * s)) + x;
         nexts = (2 * r * s) + y;
         r = nextr;
         s = nexts;
         
         if ((r * r) + (s * s) > 4) break;

      }

      if (n == iterations) n=0;

      line[(i * 3) + 0 ] = min(255,set_red(n));
      line[(i * 3) + 1 ] = min(255,set_green(n));
      line[(i * 3) + 2 ] = min(255,set_blue(n));   

   }

pixelBase = j * threeXwidth;
  for (m =0; m < threeXwidth; m++)
  pixels[pixelBase + m] = line[m];

}

Paste it into a new file called mandelbrot.cl. The compiler used by eclipse won't do anything with this file, it is here just for convenience.

1.4 Save and Build

If you've gotten this far you should now be able to save and build the program without any problems.

2 The Fun Starts Here


2.1 Debugging Host Code

Nothing works first time right? Right! I know that we are all geniuses but even we slip up occasionally.

Compile time on the host side is pretty much as usual. Get the includes and libraries right and it will all compile and link.

Run time debugging is when things start to get tricky. Prior to the execution of any code on the Epiphany accelerator using the Eclipse front end to the gdb debugger works as you would expect. However, the Epiphany expects the calling application to have root privileges. I started my project as an ordinary user (linaro) and when it came to starting Eclipse as root it got a bit sticky. First, logging in as root in a terminal window and then calling Eclipse didn't work at all and typing sudo eclipse from Run on the start menu got Eclipse running but then struck all sorts of permission problems. I reverted to old school debugging.

After the release of the 2015.1 release of Parallella Ubuntu, applications no long need root privileges to run. See my post on setting up the run time environment to use Eclipse for host debugging. However, you can still use old school debugging if you wish.


Notice I have opened a file called ./debug and I write a line to it pretty much before every call in the program. This is the cleaned up version of the debug writes that I used to figure out what was going on. I write them to a file to separate my output from the text that the stdcl libraries produce on the console (especially when the cl code actually runs). This is generally enough and if the Brown Deer documentation was a little more comprehensive would see you through.

Another lack in the documentation is how you go about checking if your call actually worked. (Notice that the only check I do is to open the cl file to see if it is there and readable.) If something does not work you get a Segmentation Fault a bit later. The Brown Deer's stdcl library takes away a lot of the verbosity of "generic" OpenCL code. For example, you don't have to create a context of your own, the library provides you with the stdacc global variable and that is the context for the Epiphany accelerator. However, what is lacking in the documentation is how to check if the calls worked e.g. if the clopen command found and successfully compiled the code. It might be there. If I find it I'll edit this post or write a new one.


2.2 Debugging Epiphany Code

I'm glad that I didn't really have to do this much.

Compile time with the JIT compiler is at run time for the host code. The output for the compiler is displayed onto the calling console. It reminded me of learning Pascal at university in the early 80's. I was fortunate that the cl code was correct enough for the compile time error to be obvious and with a little nutting out the changes I had to make were not too onerous. I think the lesson here is "Keep your kernels as simple as you can".

For run time debugging on the Epiphany there is a version of gdb called e-gdb. I did have not to go that far for this project but I think it will be key when things get more complicated. Andreas Olofsson got back to me about the developments in this area. There is significant work being done and things will get a lot easier in the near future. I think that for the moment, getting your code running as best you can in a friendly environment before you package it up into a kernel. Just make sure that you are only using the somewhat limited functions that are available if cl (the little copy loop at the end of the cl code above replaces a memcpy command in the original code).

I think I'll write another blog post once I get some idea about debugging on the Epiphany.


3. Final Thoughts


This guide is really only a toe in the water. So far I have only used the stdcl library and not gone anywhere near the COPRTHR library. There's lots of good stuff there that I have not gotten into yet.

The ARM cores on the Parallella are fine for a little snippet like the example but a bit too low powered for significant development. Cross compilation from a more powerful machine is the next step and I hope to write another blog entry on that in the near future.

Oh and one final thing... the output: