Parallella Ideas, Designs and Executions: code blocks

Showing posts with label code blocks. Show all posts

Tuesday, 19 April 2016

Getting COPRTHR and MPI running

In this post I get to grips with COPRTHR with the new MPI library installed and solve the puzzle as to why direct calls to the epiphany SDK did not work. There were some tricks to getting COPRTHR working but some unexpected benefits as well.

Introduction

Working towards my goal of a general purpose infrastructure to wrap around my neural network simulator, I did some investigation into the Message Passing Interface (MPI) and direct memory access (DMA) memory transfers. The plan is to use MPI to provide the signalling between the host and epiphany and between epiphany cores. The DMA part comes in as a way to parallelise the processing so that you are transferring the next data set into local memory while you are processing the last set. DMA is also a lot quicker for larger data sets than individual reads.

To get MPI working I decided to bite the bullet and get into COPRTHR. While this was a little confusing at first once I'd gotten everything wired properly it all made sense. If you understand stdcl the overall structure is the same and so far, I have not found anything that you can do in stdcl that you cannot do in COPRTHR.

I also want to use MPI because it allows asynchronous data transfer between cores. While direct memory writes are fast, the core being written to plays no part in the process and thus other coordination methods are required to ensure that the data has arrived before it is used. It also requires that the core doing the writing writes to the correct location. This is fine if all cores have the same code base but if they do not there needs to be signaling between sender and receiver to ensure that the sender is not writing over the wrong data (or program code!). MPI provides the abstraction needed to ensure an orderly, arms-length transfer between cores.

There was an unexpected benefit in that COPRTHR has a call that allows you to allocate a chunk of memory from within you kernel. While you still have the option to dynamically change your program and recompile on the fly as I described in a previous post, there is a call that allows you access to unused memory on the epiphany which, if used carefully, can store data for local processing.

Preliminaries

The programs developed for this blog post were done on the 2015.01.30 image using COPRTHR version 1.6.2. The 2016.03 image was release recently so I'll install that soon and update anything that fails (which I don't expect it to do). Please also be aware that version 2.0 of COPRTHR is in the works so check your version before proceeding.

You can have a look at the example code or to clone the source files execute:

git clone https://github.com/nickoppen/coprthr_basic.git

Installing MPI

MPI has been around for a number of years and is a set of calls, implemented on different systems that allows message passing between those systems. Brown Deer brought out a COPRTHR extension recently that implemented those calls for message passing between epiphany cores. While it is an extension to version 1.6.2 I expect that COPRTHR version 2 will have MPI built in with greater functionality (although no details have been release as yet).

Get the COPRTHR mpi extension from here:

http://www.browndeertechnology.com/code/bdt-libcoprthr_mpi-preview.tgz

Just download it, un-compress it somewhere out of the way and run the install script as root:

sudo ./install.sh

Using eSDK calls in an OpenCL program

You can actually use eSDK calls directly from your OpenCL kernel code. However, I found that there is an inconsistency between the layout of the eSDK and the hard wired compile scripts used by the clcc compiler.

To test if this inconsistency exists on your system, fetch the latest version of the parallella examples from github which includes a couple MPI examples:

cd
git fetch parallella-examples
cd parallella-examples/mpi-nbody
make

If this compiles without errors then you are fine.

If you get errors such as:

undefined reference to `e_dma_copy(void*, void*, unsigned long)

undefined reference to `e_mutex_unlock(unsigned int, unsigned int, int*) and

undefined reference to `e_mutex_unlock(unsigned int, unsigned int, int*)

try the following:

ls /opt/adapteva/esdk/tools/e-gnu/epiphany-elf/sys-include

if this gives you "No such file or directory" run the following (note: this is a single command without any breaks):

sudo ln -s /opt/adapteva/esdk/tools/e-gnu/epiphany-elf/include /opt/adapteva/esdk/tools/e-gnu/epiphany-elf/sys-include

Then run:

make clean
make

again from the mpi-nbody directory to check if that fixed it.

If that does not work?

This was the problem with my installation. The way I discovered it (thanks to jar) was by using the cldebug tool:

cldebug -t ./temp -- clcc --coprthr-cc -mtarget=e32 -D__link_mpi__ --dump-bin -I/opt/adapteva/esdk/tools/e-gnu/epiphany-elf/include -I ./ -DECORE=16 -DNPARTICLE=2048 -DSTEPS=32 -DSIZE=1 -DMPI_BUF_SIZE=1024 -DCOPRTHR_MPI_COMPAT mpi_tfunc.c

This looks very complicated but it really just calls the clcc command on the kernel file. The output looks even more horrendous but it contains the output of all the steps that the Brown Deer pre-processor goes through to produce a working executable.

If there is still something missing in your compile, sift through all of this text and look for something that is wrong. Given that I was missing a reference, I focused on include files and directories which occur after -I and -i switches. I found that it was including sys-include which didn't exist on my system. Instead, the .h files (e_dma.h and e_mutex.h) were in a directory called include. To fix it with minimum disruption, I created a soft link (ln -s) of the existing include and called it sys-include. Thus sys-include was an alias for include and the compiler was happy.

Compiling

Below are the steps changes you need to make in Code::Block to get a program to compile using COPRTHR. If you have not used Code::Blocks I suggest that you have a read through my post about setting it up from scratch. In this post I'm only going to cover the changes that you need to make to the stdcl set up described there. If you are an emacs/make fan, I've included a Makefile along with the example code.

Compiler Options

Kernel Compilation

The first two changes are critical for correct compilation of your kernel code. Go to the Compiler Settings menu (Settings | Compiler) and selection the compiler you have set up for compiling with clcc (mine is called BD OpenCL).

Under the Compiler Settings tab choose the Other options tab and type the switches shown below:

Then under the #defines tab define the following two definitions:

Host Application Compilation

Go back to the gcc compiler settings and change the Linker settings tab to include the COPRTHR link libraries (I'm not sure what the m library is for. It is in all the examples so I included it for good measure.):

Then under Search directories maker sure that the compiler can find your included header files:

And that the linker can find the COPRTHR libraries:

Project Properties

As always in a new project, you need to define a build target but in this case the output attributes are not important:

This is the tricky bit. The compiler, with the switches defined above, generates a binary file in the Execution working directory (your project directory by default). This file has the rather verbose extension of bin.3.e32 tacked onto the end of your kernel file name. Forget the .o file generated by the compiler, it is the bin.3.e32 file that you need at execution time. If your host application is being generated and run from your bin/Debug or bin/Release directory, you must copy the bin.3.e32 file to the same place.

To do this use the Post-build steps with a copy command and while you are at it, make the file readable by everyone. (Note: my kernel file is called pfunc.c. Replace this with your kernel file name.)

Also, while you are in this box and after saving your new settings, pop up to the root project settings (coprthr_basic in this example) and switch off All Warnings (-Wall). It will just be annoying.

Then, because we are not interested in the .o or the linker output, switch off linking for the kernel file or files.

Using COPRTHR Host-side

The overall structure of a COPRTHR program is the same as an stdcl program as described in my previous post. The steps you need to go through are also the same but with different calls.

Loading and Compiling

Stdcl has the commands clopen for opening the epiphany and to load a pre-compiled and JIT compiled kernels and clsopen for a kernel stored as an internal string.

COPRTHR has two commands:

int coprthr_dopen( const char* path, int flags); to open the epiphany (which should be matched with a coprthr_dclose()) where path is the #defined value COPRTHR_DEVICE_E32 and flags is COPRTHR_O_THREAD. The integer returned is the device descriptor that is needed in a number of subsequent calls.

and at least one of the following:

coprthr_program_t coprthr_cc_read_bin( const char* path, int flags ); to open a pre-compiled object file (our bin.3.e32 file), where path is the file location of the binary and flags is zero. The return value is a handle to the compiled file.

or

coprthr_program_t coprthr_cc( const char* src, size_t len, const char* opt, char** log ); compiles a string (src) using the compile options (opt) returning handle to the program. The log parameter is a pointer the compiler output. Pass in a NULL pointer and a suitable amount of space will be allocated (which must be then freed).

e.g.

int dd = coprthr_dopen(COPRTHR_DEVICE_E32, COPRTHR_O_THREAD);
printf("dd=%d\n",dd);
if (dd<0)
{
printf("device open failed\n");
exit(0);
}

coprthr_program_t prg;
prg = coprthr_cc_read_bin("./pfunc.cbin.3.e32", 0);
printf("prg=%p \n",prg);
if (!(prg))
{
printf("file mpi_pfunc.cbin.3.e32 not found\n");
coprthr_dclose(dd);
exit(0);

}

There is no call to compile source code in a file so read it into a string and compile it from there.

Allocating Shared Memory

Similar to the stdcl call clmalloc, COPRTHR has the call the call:

coprthr_mem_t coprthr_dmalloc(int dd, size_t size, int flags); where a memory space of size is allocated on the device (dd) returning a handle of type coprthr_mem_t. The argument flags is not used.

I call it a handle rather than a pointer because it is really not working memory. If you try and use it as working memory you will be bitterly disappointed. The only way to write to and read from it is with the calls described in the next section.

Shared memory can be resized using the coprthr_drealloc(dd, size, flags) call and should be freed using coprthr_dfree(int dd, coprthr_mem_t mem) when you are done.

e.g.

coprthr_mem_t p_data_mem = coprthr_dmalloc(dd, WIDTH*sizeof(float), 0);

Writing to and Reading From Shared Memory

In stdcl you would declare a variable of type cl_T * where T is a compatible simple type (e.g.int or float), call clmalloc(), initialise it in your host application and then call clmsync to transfer the data to shared memory.

COPRTHR does things a little differently. The handle returned by coprthr_dmalloc refers to memory in shared space only. To initialise it you need to declare, allocate and write to memory in your host program and then call:

coprthr_event_t coprthr_dwrite(int dd, coprthr_mem_t mem, size_t offset, void* ptr, size_t len, int flags); where dd is your device descriptor, mem is your memory handle, offset is how far into the shared memory you want to start writing, ptr is a pointer to your host storage the contents of which you want written into shared memory, len is the length (in bytes) of the data you want written and flags is one of COPRTHR_E_NOWAIT or COPRTHR_E_WAIT. The return value is an event handle which can be used in the call coprthr_dwaitevent(dd, event).

The contents of your host memory will be written to shared memory. In a similar way, once your kernel has finished its computation and written its results back, you call:

coprthr_event_t coprthr_dread(int dd, coprthr_mem_t mem, size_t offset, void* ptr,

size_t len, int flags); where the arguments and return values are the same except of course the data gets read from shared memory into your host memory.

There is also coprthr_dcopy(...) which copies data from one device's shared memory into another but that is not relevent on the Parallella given that there is only one device.

e.g.

float host_buffer[WIDTH];
for (i=0; i < WIDTH; i++) host_buffer[i] = -1 * i; /// write data to shared DRAM coprthr_mem_t p_data_mem = coprthr_dmalloc(dd, WIDTH*sizeof(float), 0); coprthr_dwrite(dd, p_data_mem, 0, host_buffer, WIDTH*sizeof(float), COPRTHR_E_WAIT);

followed later by:

coprthr_dread(dd, p_data_mem, 0, host_buffer, WIDTH*sizeof(float),COPRTHR_E_WAIT);

Retrieving a Kernel

To call a kernel you need to retrieve a link to it from the program file you just read in or compiled. Similar to stdcl clsym call COPRTHR has:

coprthr_sym_t coprthr_getsym( coprthr_program_t program, const char* symbol); where program is the returned program handle and symbol is the name of the kernel. The return value is the handle to the kernel.

e.g.

coprthr_sym_t thr = coprthr_getsym(prg,"p_func_thread_mpi");
if (thr == (coprthr_sym_t)0xffffffff)
{
printf("kernel p_func_thread not found\n");
coprthr_dclose(dd);
exit(0);

}

Calling a Kernel

There are three ways to call a kernel or kernels. The MPI library uses the call:

coprthr_mpiexec(int dd, int thrCount, comprthr_sym_t thr, void * args, int size, int flags); where dd is the device descriptor, thrCount is the number of threads you want to launch, thr is the handle to the kernel you are calling, args is a type-less pointer to a structure containing the kernels arguments (see next paragraph), size is the size (in bytes) of the structure you are passing and flags (presumably because I have not seen them used) is one of COPRTHR_E_NOWAIT or COPRTHR_E_WAIT. I also presume that the call returns a variable of type coprthr_event_t.

To use the args variable in this call you need to typedef a structure that contains the variables you want to pass to your kernel. Do this in a separate include file and include it in both the host file and kernel file. Declare a variable of this type in your host program in host memory. The simple data types (int, float etc) can be written to the structure but the arrays need to be allocated in shared memory and initialised as described above. The structure is then passed to the kernel as a type-less pointer which can then be cast back to the structure type inside your kernel code.

e.g.

my_args_t args = {
.n = 2.0,
.width = WIDTH,
.p_mem = coprthr_memptr(p_data_mem, 0),
};

coprthr_mpiexec(dd, ECORES, thr, &args, sizeof(args), 0);

The alternative ways are the COPRTHR calls that were available before the MPI library appeared. These are:

coprthr_event_t coprthr_dexec( int dd, coprthr_kernel_t krn, unsigned int nargs, void** args, unsigned int nthr, void* reserved, int flags ); where dd is the device descriptor, krn is the handle to the kernel, nargs is the number of elements in the args variable (see next paragraph), args is a pointer to an array of pointers of values to be passed to the kernel, nthr is the number of times the kernel should be executed, reserved is not used and flags is one of COPRTHR_E_NOWAIT or COPRTHR_E_WAIT. The return value is an event that can be used used in the call coprthr_dwaitevent(dd, event).

and:

coprthr_event_t coprthr_dnexec( int dd, unsigned int nkrn, coprthr_kernel_t v_krn[], unsigned int v_nargs[], void** v_args[], unsigned int v_nthr[], void* v_reserved[], int flags); which allows a number of different kernels to be executed at once. Each array is a is a collection of the same arguments in the dexec call.

The big difference between mpiexec and dexec is how the arguments are passed. The dexec call only accepts arrays and those arrays have to be in shared memory. This means that passing singleton data via a one element array which is a bit clumsy. But remember, if you use mpiexec to call the kernel, you don't have to use any MPI.

e.g.

float n[] = { 2.0 };
int width[] = { WIDTH };
coprthr_mem_t p_arg_n = coprthr_dmalloc(dd, sizeof(float), 0);
coprthr_mem_t p_arg_width = coprthr_dmalloc(dd, sizeof(int), 0);
coprthr_dwrite(dd, p_arg_n,0,&n, sizeof(float), COPRTHR_E_WAIT);
coprthr_dwrite(dd, p_arg_width, 0, &width, sizeof(int), COPRTHR_E_WAIT);
void * p_args[] = { &p_arg_n, &p_arg_width, &p_data_mem };

coprthr_dexec(dd, thr, 3, p_args, ECORES, 0, COPRTHR_E_WAIT);

Changes to Kernel Code

Finally, now that you have called your kernel, you need to retrieve your arguments. Remember that shared memory is slow so if you want to use them more than once it is better to make a local copy.

Kernels called with coprthr_mpiexec you need to cast the single kernel argument as your defined structure:

__kernel void p_func_thread_mpi( void * p_args )

{

my_args_t* pargs = (my_args_t*)p_args;

...

}

With coprthr_dexec you need to de-reference the singleton arguments:

__kernel void p_func_thread_std( float * p_arg_n,  int* p_arg_width, void * p_mem )

{

 float n = p_arg_n[0];

 int cnt = p_arg_width[0] / ECORES;

 float * g_pmem = (float *)p_mem;  /// to be used in dma copy

...

}

Reserving Local Memory

In stdcl there is no way to dynamically allocate space other than with source code modification and dynamic compilation. COPRTHR has two calls that allows you to use free space in local memory:

void * coprthr_tls_sbrk(int size); which returns the contents of the system variable containing the first byte of free memory and sets the free memory address system variable to be size bytes further along

and:

coprthr_tls_brk(void * addr); sets the system variable containing address of the first byte of free memory to be addr.

These calls should be used as following (if you are into neatness):

/// store the starting position of free space

void * memfree = coprthr_tls_sbrk(0);

/// grab a chunk of free space for local storage

float * localSpace = (float*)coprthr_tls_sbrk(cnt * sizeof(float));

/// do something with the space

/// reset the free space pointer to where it was initially

coprthr_tls_brk(memfree);

WARNING: The value returned by coprthr_tls_sbrk is an actual memory address. Remember this is bare metal programming with no comfy operating system preventing you writing data where you shouldn't.

Copying using DMA

Once you have a reference to shared memory (called global memory in OpenCL terms) and allocated some space, you need to copy it into memory local to the core. The eSDK documentation shows the basic synchronous copy command:

int e_dma_copy(void *dst, void *src, size_t bytes); which copies bytes of data from src to dst returning 0 if successful. This is a synchronous copy using the E_DMA_1 channel.

In our example the src is the pointer to global memory and the dst is the memory address returned by coprthr_tls_sbrk. You'd better make sure that bytes in e_dma_copy is the same as size in coprthr_tls_sbrk.

e.g.

e_dma_copy(localSpace, g_pmem + (cnt * rank), cnt*sizeof(float));

Asynchronous use of the DMA channels requires the use of:

int e_dma_start(e_dma_desc_t *descriptor, e_dma_id_t chan);

int e_dma_busy(e_dma_id_t chan);

void e_dma_wait(e_dma_id_t chan);

void e_dma_set_desc(e_dma_id_t chan, unsigned config, e_dma_desc_t *next_desc, unsigned stride_i_src, unsigned stride_i_dst, unsigned count_i, unsigned count_o, unsigned stride_o_src, unsigned stride_o_dst, void *addr_src, void *addr_dst, e_dma_desc_t *descriptor);

Which I will cover in a future post.

Final Thoughts

This post is more a how-to post on using COPRTHR and a toe-in-the-water look at MPI and DMA copying. While I'm convinced that the combination will lead to a better structured and faster kernels, I'm not sure by how much and at what cost in terms of the complexity of the program.

My next step with be to compare MPI with direct memory writes along the lines of my data passing experiment. Until then, please let me know if I've missed anything.

Monday, 11 May 2015

Tweaks for the new Parallella image (20150130)

Upgrading to the new Parallella image: Recommended - but I have some tips and warnings.

Why Upgrade? - Performance

The first thing that you will notice when you have everything running is that it all goes a little faster. For me, the lag associated with using the Parallella has gone from just noticeable to mostly unnoticeable. The big applications still take a while to load but the day to day stuff does not annoy me any more.

Also, the reboot is a lot more reliable.

Where to get it

The Getting Started pages have been radically upgraded as has the download page. Getting the right image is easy and loading this image is a single write operation that is well explained. To expand the file system to fill the whole SD card I used gparted on my laptop which also serves as my file backup machine.

Tip and Warnings In Summary:

Get your heating under control
Install LXDE and tightvncserver (and code::blocks if you are using it)
Upgrade to the Brown Deer OpenCL version 1.6.2
Tweak your paths

Heat

First let me say that blurting out a comment that the new image is hopelessly unreliable and that you are reverting to the old one is a dumb thing to do. Just get your fan running so that the Zync stays under 70C (156F) and the new (and extremely rigorous) temperature daemon does not shut your Epiphany down. Nuff said.

PS. If you don't want to run xtemp continuously, I suggest you try olajep temperature script:

#!/bin/bash
dir=$(dirname $(grep -rl xadc /sys/bus/iio/devices/*/name))
raw_file=${dir}/in_temp0_raw
offset_file=${dir}/in_temp0_offset
scale_file=${dir}/in_temp0_scale

raw=`cat ${raw_file}`
offset=`cat ${offset_file}`
scale=`cat ${scale_file}`

c_temp=`echo "scale=1;(($raw + $offset) * $scale) / 1000" | bc`
f_temp=`echo "scale=1;(($c_temp * 9) / 5) + 32" | bc`

echo
echo "Zynq Temp: $c_temp C / $f_temp F"
echo

which will print the current temperature out on the command line.

Packages

I'm using a micro-server Parallella and I am NOT a command line jock. I need a mouse, windows, buttons, menus and all that good stuff. The image I downloaded did not come with LXDE (the Lightweight X-windows Desktop Environment) which was confusing at first. So, once you have booted up I suggest you run:

sudo apt-get update

to update the existing packages, followed by:

sudo apt-get upgrade

to pick up any upgrades. Then:

sudo apt-get install lxde tightvncserver

to install all that Xerox-Parc windowing goodness.

OpenCL

The image also came with version 1.6.0 of the Brown Deer OpenCL compiler. Attempting to run executables compiled with v1.6.0 will not work. To get the latest version:

su

to log in as root. To put the source files somewhere out of the way I used to root home directory, so:

cd

The clone the sources files:

git clone https://github.com/browndeer/coprthr.git

and from there it was plain sailing:

cd coprthr
./configure --enable-epiphany
make
make install (or sudo make install if you have not logged in as root)

I didn't have to install any required packages or point to any oddly placed libraries. (NB: this was my experience - as always if something goes wrong you'll have to sniff around to figure out the solution)

Paths

When I upgraded, I took a copy of my various projects and copied them, byte-for-byte to the new file system. After installing code::blocks and sorting out the compiler, I really expected them to work. What I found was that there were numerous "file not found" errors when I didn't get them before. I'm not sure why it could be either the new environment or the new compiler but you need to add a few more paths.

To find local include files the compiler search paths needs to include . (dot, the current directory).

Click Add and type .

To use the JIT compiler while Debugging your host code you need to include a PATH variable in the invocation command.

The full string in the Executable path: text box is:

sudo -E LD_LIBRARY_PATH=/usr/local/browndeer/lib:/opt/adapteva/esdk/tools/host/lib PATH=/usr/local/browndeer/bin:/opt/adapteva/esdk/tools/host/bin:/opt/adapteva/esdk/tools/e-gnu/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin EPIPHANY_HDF=/opt/adapteva/esdk/bsps/current/platform.hdf /usr/bin/gdb

This is the path from before with a cut down PATH variable inserted.

I have not checked this on Eclipse but I would presume that it would be the same.

Final Thoughts...

The only other thing I can think of is that your default login is now parallella (pwd: parallella) and so your home path is /home/parallella and not /home/linaro. If you use any explicit paths in your code or anywhere else, you'll need to change that.

If I have forgotten anything else I've tweaked I'll edit this post.

On balance, I think the upgrade was worthwhile. I use the parallella as a development machine with no massive libraries (e.g. OpenCV) and so code::blocks is the biggest thing I need. I run BOINC but getting that going is easy. If the UI response does not bother you and you have a lot of stuff installed (and are in the middle of something monumental) than maybe you might want to keep going with the old image for now.

Up Next...

More neural networks.

Thursday, 6 November 2014

Developing OpenCL with Code::Blocks on the Parallella

Part 1: Compiling and Linking OpenCL with Code::Blocks

In this post I'll go through the steps you need to set up Code::Blocks IDE to compile and link both host code and accelerator (OpenCL) code using GNU GCC and the Brown Deer offline compiler, clcc and linker clld.

While Brown Deer provide an OpenCL JIT compiler, using it during development can be tedious given that you can only tell if your OpenCL code is compiling correctly by running your application. Using the offline compiler from within Code::Blocks allows you to stay in one environment for both host and accelerator code.

I've been learning Code::Blocks because I was finding Eclipse tediously slow. This might improve in future but for now Code::Blocks performs better on the Parallella and critically, the compile time is as good as you would get if you were using the command line. I found that Eclipse was a little better organised but once I'd figured out my way around Code::Blocks it was fine. Most of the oddities are in the compiler and project setup and you don't tend to spend much time there once these are set.

I have to acknowledge the contribution of AndyC firstly for bringing my attention to Code::Blocks and then for helping me along the way when I got stuck. His original post is here: http://forums.parallella.org/viewtopic.php?f=13&t=1658.

I'm using the Ubuntu 14.04 distro with Brown Deer OpenCL ver. 1.6.0 and Code::Blocks 13.12.

Prerequisites

1. Download Code::Blocks using: sudo apt-get install codeblocks

2. Rebuild the Brown Deer compiler! Yes I was surprised too. The installation in the standard image has a problem with the linker. The getting started guide is here: http://www.browndeertechnology.com/docs/Parallella_quick_start_guide.pdf. It was written for version 1.5 but section 2.3 is relevant to version 1.6 as well.

Get the sources: git clone https://github.com/browndeer/coprthr.git and pay careful attention to the prerequisites, libconfig, libelf and libevent (GCC will be installed already). When I ran configure it could not find event.h. If this happens to you, first check if you can find the library with whereis:

linaro-nano:~> whereis libevent
libevent: /usr/lib/libevent.so /usr/lib/libevent.la /usr/lib/libevent.a

This shows that libevent was installed but I had to look into configure to discover that it was expecting event.h to be in .../include/event2. The ... turned out to be /usr so I had to use the --with-libevent=/usr. So my final configure command was:

./configure --enable-epiphany --with-libevent=/usr

As with anything on Linux, if it does not work first time you have to sniff around a bit.

The make and sudo make install worked fine for me.

Code::Blocks

Code::Blocks has various ways of organising your work. In this post I'm going to concentrate on the setup you would use if you are primarily writing in C++ and OpenCL. Therefore I'm going to set the global settings to be those use for OpenCL development. These will be inherited by all subsequent projects.

I'm also developing a command line project. Other types of project may need variations on these settings.

Setting up the Compilers

The first thing to do is to set up the compilers that you are going to use. You can do this before creating a project.

The object of this section is to set up a C++ compiler and an OpenCL compiler. If you only have one file with your accelerator code and want to keep your life a little simpler, read the sections on setting up GNU GCC and the basic build targets (Debug and Release) and use the method described in the last Project Setup section entitled Aside: Single File Compilation.

GNU GCC

The "standard" compiler on the Parallella is the GNU GCC compiler and we'll use this for the host code.

To get to the compiler settings go to the Settings menu and choose Compiler... which will bring up this box:

GNU GCC is the default selected compiler and there are some standard flags set. These can stay where they are.

Go to the Linker settings tab and add the Brown Deer libraries that you will be using. Here I've only added stdcl and ocl, but for example, if you are using COPRTHR then add coprthr.

Next go to the Search directories tab and add:

and similarly add /usr/local/browndeer/lib to the Linker sub-tab.

The Toolchain executables and Custom variables tabs can be alone. In the next tab along, Build options I set as following but these are not terribly important:

[Note: if you only want to compile one file with accelerator code in it, now is the time to skip forward to the section on Creating a Project and then the section titled: Aside: Single File Compilation.]

Brown Deer OpenCL

Next we have to set up the tool chain to compile the cl code. This is a little more complicated because the offline compiler has to be invoked with root privileges. We'll get to that at the very end.

First, create a copy of the GNU GCC compiler settings using the Copy button in the top section of the Global compiler settings box. This will first ask you for the name of the new compiler which I have called BD OpenCL:

This will create an exact copy of everything that we had for the GNU GCC compiler. From here on, make sure you select the BD OpenCL compiler otherwise you can get into a real mess.

First, switch off the standard flags:

Next, remove the settings from the Linker settings and Search directory settings that we set for GNU GCC. These are not needed.

The next tab to look at is the Toolchain executables. This is critical. Ignoring the Debugger entry for now it must look like this:

The next tab, Custom variables is a way of setting "environment like" variables from within Code::Blocks. I found that I had to be explicit with all environment variables because my Linux user environment got a little mangled on the way in. I defined the following three:

These are merely local copies of your bash environment variables $PATH, $LD_LIBRARY_PATH and $EPIPHANY_HDF.

This next bit brings it all together.

Code::Blocks puts all of the settings together using a template and then passes the resulting string to a shell process. The template is found at the very end of the compiler settings - on the Other settings tab. Click the Advanced options button and don't be put off by the scary warning box:

Once you have pressed the Yes button with confidence, you get:

See, not that scary after all.

The template is the text in the Command line macro box. Because it is chopped off the whole compiler string is:

sudo EPIPHANY_HDF=$EPIPHANY_HDF LD_LIBRARY_PATH=$LD_LIBRARY_PATH PATH=$PATH bash -c "$compiler $options $includes $file -o $object"

Take note of the Command macros: list on the right hand side of the box. If the Compiler complains of something being missing (e.g. if you are want to use a resource compiler) you will have to find the macro from the list and include it in the appropriate place in the Command line macro.

Also, a quick word about sudo. Sudo is a funny little thing that gives you super user powers and little else. If you type sudo env at the command prompt you will see that sudo's environment is very sparse. The -E switch is supposed to preserve the callers environment but I found that the environment that Code::Blocks stripped out LD_LIBRARY_PATH and even by explicitly setting it I found that -E did not seem to pass it on. Therefore, the most reliable way of creating the correct environment for clcc was to set it explicitly at invocation.

Similarly, the linker template screen is:

The full string is:

sudo EPIPHANY_HDF=$EPIPHANY_HDF LD_LIBRARY_PATH=$LD_LIBRARY_PATH PATH=$PATH:/bin bash -c "$linker $link_options $libdirs $link_objects $libs -o $exe_output $object"

Note that the Command: I've selected is Link object files to console executable. There are a number of different options here. You need to figure out which one is invoked based on your build target (see below).

Project Setup

We have not quite gotten to the stage of being able to compile an application yet. There are some project specific steps that are needed first. Unfortunately they are a little circular and difficult to lay out in a nice linear fashion so you might have to go through this process a couple of times before everything is in place.

Creating a Project

The File | New > Project... will do the job without any tricks. I've only used Console Applications so far and I generally assume that I'm writing in C++. You can leave the default compiler as GNU GCC.

Adding Files

At this point is is a good idea to add some files.

Adding a C++ file for the host code is also straight forward. Use the File | New > File... menu. When you get to the screen below, just add the file name with path and select the C++ build targets (more on this below).

Add a CL file is essentially the same. I just choose an Empty C/C++ file and give it a name ending in .cl. Leave the build targets unchecked.

Once added it will appear in the Others folder in your project.

Build Targets

I think of a build target as an output that you want to produce in your project. For the purposes of this post, I want to produce a Linux console application containing the host code and an elf (executable loadable file) containing the accelerator code.

You have already created at least one build target when you created your project. The default targets are called Debug and Release and are created as a result of this step in the new project sequence:

To add, change or remove a build target you can either choose the Project | Properties menu or right click on the project and choose Properties. Either way you will get this screen:

The only changes I've made here are the Platforms (remove Windows and Mac) and Object names generation because I like to call my main cpp file and cl file the same name.

The default build targets will be pretty much created correctly. The only little wrinkle is that the project creation dialog creates a "Hello World!" called main.cpp and this is set as the build target file. As I said above, I like to call my main file after my project so I have to create a new cpp file in the project and assign to the Debug and Release build target and then delete main.cpp.

[This next section is only relevant if you have set up the Brown Deer OpenCL compiler as a separate build step. If you are going to use the single .cl file option then skip forward to the section titled: Aside: Single File Compilation.]

To set up the build target for your cl files click on the Build targets tab and then click the Add button. In my example below, I've deleted the default Release build target and called my new target CL. Fill out the fields as shown here and select you cl file(s) in the Build target files box at the bottom:

Now that you have a build target for your accelerator code, you can select that when you create a new cl source file.

Project Build Options

Notice the Build options... button. This is where you choose which compiler is used for the selected build target and the default compiler switches for the project. Click Build options... or select Project | Build options from the main screen.

Two things to change here. With the CL build target selected, first set the Selected compiler to the OpenCL compiler we set up above, in this case we called it BD OpenCL. Second, set the Policy to be "Use target options only". The reason for this is that common compiler options (e.g. -Wall) are set globally and this setting overrides these global settings. To change the global settings click on the project name (in this case "ndfork") in the left hand panel.

This dialog is where you set all of your project specific settings. If there are include files, libraries, compiler variables that are only relevant to this project and not to all projects then dig around here and find where to set them. The Pre/post build steps tab can be useful if you want to do some behind-the-scenes shuffling before or after compilation. The output will appear on the compile log.

Final Step (the bit I always forget...)

The last thing you need to do is to set the build target and build steps for your cl file(s). In the project panel right-click on the cl file and choose Properties and then click on the Build tab:

Select both Compile file and Link file and CL in the Belongs in targets box. For the cl file that you added before setting up the CL build target none of these boxes will be checked. Once you have added the CL build target you can select it as the build target when you add the file. The Compile file and Link file check boxes are not selected by default and so, if you wish to compile and link them, you must check these for every file you add.

Aside: Single File Compilation

By now you are probably quite astounded about the amount of complexity there is getting the compilers and project set up. I was. If you are only going to have one cl file for your accelerator code and want to avoid some of that complexity then there is a way to compile your accelerator code without setting up a separate compiler and build targets. It is "back door" way of compiling that by-passes the selected compiler.

If you right-click on the cl file, click Properties and select the Advanced tab you can define a Custom build. With the GNU GCC Compiler selected check the Use custom command to build this file: box. Then in the text box below it you need to put the whole compile string:

Note that the compiler has a full path and the Options (-c and -v) are now explicit. This is because we are now using the settings for the GNU GCC compiler and are in its environment. Therefore the $options variable refer the the GNU GCC options. The $file and $output will still work because they refer to this file.

I've left in the $EPIPHANY_HDF, $LD_LIBRARY_PATH and $PATH variables to be passed to sudo. If you want to do this you still have to set them up as custom variables (it would be better to set them using Project | Build options... rather than as a global setting associated with the compiler). If you include /usr/local/browndeer/bin in the $PATH then you can leave it out of the bash call.

For those who skipped the OpenCL compiler setup, the values of the variables are those from your bash environment.

Compiling

Now that everything is in place we can compile a target.

From the main screen I use the tool bar but the Build menu provides the same functions or you can use Ctrl-F9 which must be the dumbest short-cut ever devised.

There are a couple of quirks to the OpenCL compilation.

The offline compiler, clcc creates an intermediate file which is then passed to the epiphany compiler in the standard SDK. This means that the file name associated with the error will look something like UfLa0a.cpp. This means that double-clicking on the error will not highlight it in the editor. Oddly enough, if you #include another cl file which has an error, the file name associated with that error is correct and double-clicking works.

The intermediate stages also means that any error causes a cascade of errors after it. Just start from the top of the list recompiling as you go... the actual errors should be pretty obvious.

Adding a Virtual Target

If we leave things as they are now, we will have to build twice, once for each build target. If this is too tedious, Code::Blocks allows the creation of virtual targets which are combinations of "real" targets.

To create a virtual target go to the build targets tab in the project properties box (Project | Properties > Build targets). From here click the Virtual targets... button:

All pretty simple but there is one trick here. Click on your cl target first (CL in my example) and close. Click Virtual targets... again and click on your host code target (Debug in the example). This is to get the cl target to compile first which (if it compiles correctly) will produce the object file ready for linking (see linking below).

One word of warning here... check your target files after you do this. My cl file in the CL target was switched off and switched on in the Debug target.

Now you can choose your virtual target on the button bar and have both targets compile sequentially.

Checking your ELF file

Brown Deer have included a version of the nm utility called clnm. This will show you what the content of the elf file is:

If everything worked properly you should see your kernels in the list and some entries showing your source code if you have not used the -b (only binaries) compile option. (I'm not exactly sure why the kernels appear twice.)

Using Compiled Accelerator Object Code

At this point we have generated a .elf file and confirmed its contents. ~~I believe that there is a way of using the .elf file directly but I can't find a description of how to do it. If anyone can give me a hand on this I'd appreciate it.~~ While working on the debugging I had a thought, "Maybe I can link the elf file just like the object file?". After all, an elf seems to be analogous to a DLL on Windows. Turns out that this is the case. So the method described below works for either the object file or the elf file.

Linking

The method described in the Brown Deer documentation uses the object file containing your accelerator code and links it into your executable. To do this you need to add the accelerator code object file onto the end of the link stage of the main executable. Go to the Project | Build options... click on the options for your main executable and then the Linker settings tab:

The Other linker options string must be the path to the cl object code. This will be added onto the end of the list of object files that are passed to the linker.

Calling

Calling your kernel from your host code is a tiny change. Your call to clopen using the JIT compiler would look like this:

void* clhandle = clopen(context, (char*)string_to_cl_file, CLLD_NOW);

With the code linked in:

void* clhandle = clopen(context, 0, CLLD_NOW);

The path to your accelerator code has been replace with a ZERO! That's it... nothing more to do. The short pause that happens as the JIT compiler goes about its business will disappear and everything else will look the same.

Final Thoughts

I hope that this guide has saved you some time and I hope Code::Blocks suits your way of working. As always, please let me know if I've missed anything or if any bits are confusing in any way.

Up next in Part 2 I'm going to get into some Debugging action. I split the two up because this guide is big enough as it is. I have not gotten into it yet so I hope that there will not be any back tracking to do on the compiler setup.

Cheers.