GPU Project 06 – HLS IPs

Hi!

Well, due to being very busy at work I hadn't had a chance to actually post progress on the project, but we most definitely have progress! If you have been following these posts, you will remember that last time we sketched out the overall architecture of the video card.

In order to render point clouds we only need the three blocks that are highlighted. Basically, a mechanism for pulling in raw vertexes from memory, a block that can transform the 3d points to a 2d screen space, and a block that can take those points and draw them on a frame buffer. So, here they are!

Vertex Pump

I debated a lot on how I should create this block. Basically, I did not know if I wanted to read data from DDR or from a small block RAM. The block RAM approach uses far fewer LUTs, but places a rather low upper limit on the number of vertexes that the design can render. The DDR approach can render an insane amount of vertexes, but at the expense of a lot of extra logic to interface to the bus (and all the routing involved). In the end, since I wanted to iterate on designs quickly, I settled on the BRAM approach. I figured that once everything else is done and debugged I could go back and upgrade this block to interface to DDR if needed. This is the code for the vertex pump:

void vertex_pump_master( uint32_t number_of_vertices, int32_t objects[1024],
 POINT_STREAM &points_out,
 uint32_t fb_address_a, uint32_t *fb_address
){


#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE s_axilite port=number_of_vertices
#pragma HLS INTERFACE s_axilite port=fb_address_a

#pragma HLS RESOURCE variable=objects core=RAM_1P_BRAM
#pragma HLS INTERFACE bram port=objects

#pragma HLS INTERFACE axis port=points_out
#pragma HLS DATA_PACK variable=points_out

#pragma HLS INTERFACE ap_none port=fb_address

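    ap_uint<16> index = 0;
    ap_uint<2> state = 0;   // 2-bit counter, wraps from 3 back to 0 on its own
    coordinate_t new_point;

    // Expose the CPU-configured frame buffer address on the fb_address output port.
    *fb_address = fb_address_a;

    // Each vertex takes four iterations: read x, read y, read z, then emit the point.
    for(index = 0; index < (number_of_vertices<<2); index ++){
        switch( state ){
        case 0:
            // The cast treats the raw word as an integer; the shift moves
            // its low 10 bits below the binary point.
            new_point.x = ((ap_fixed<32, 22, AP_TRN, AP_SAT >)objects[index])>>10;
            break;
        case 1:
            new_point.y = ((ap_fixed<32, 22, AP_TRN, AP_SAT >)objects[index])>>10;
            break;
        case 2:
            new_point.z = ((ap_fixed<32, 22, AP_TRN, AP_SAT >)objects[index])>>10;
            break;
        case 3:
            // No BRAM read in this state; push the assembled point downstream.
            points_out << new_point;
            break;
        }
        state++;
    }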
}

It has an AXI-Lite bus that the CPU can use to configure the number of vertexes and fire up the process. It pulls data in from an external BRAM and pumps it out as an AXI-Stream. Each vertex takes four loop iterations: three to read the x, y and z coordinates from the external BRAM and one to push the finished point out. Very simple, but it is best to keep designs modular and simple to maintain.
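To make the BRAM layout concrete, here is a minimal sketch of how the CPU side might pack vertexes before starting the pump. It is not code from the project: the `Vertex` type and the `to_fixed` and `pack_vertices` helpers are hypothetical names. The layout itself (four 32-bit words per vertex, with 10 fractional bits per coordinate) is my reading of the loop and the `>>10` shift above. With a 1024-word BRAM that works out to at most 256 vertexes. If the saturating cast to `ap_fixed<32, 22>` works the way I think it does, coordinates should also stay within roughly ±2048.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed layout, inferred from the loop in vertex_pump_master:
// four 32-bit words per vertex (x, y, z and one unused word).
constexpr std::size_t BRAM_WORDS = 1024;
constexpr std::size_t WORDS_PER_VERTEX = 4;
constexpr std::size_t MAX_VERTICES = BRAM_WORDS / WORDS_PER_VERTEX;  // 256
constexpr int FRACTIONAL_BITS = 10;  // matches the >>10 in the pump

// Hypothetical host-side vertex type.
struct Vertex {
    float x, y, z;
};

// Convert a float to a raw integer word with 10 fractional bits.
int32_t to_fixed(float value) {
    return static_cast<int32_t>(std::lround(value * (1 << FRACTIONAL_BITS)));
}

// Pack vertexes into the word layout the pump walks through.
std::vector<int32_t> pack_vertices(const std::vector<Vertex>& vertices) {
    assert(vertices.size() <= MAX_VERTICES);
    std::vector<int32_t> words;
    words.reserve(vertices.size() * WORDS_PER_VERTEX);
    for (const Vertex& v : vertices) {
        words.push_back(to_fixed(v.x));
        words.push_back(to_fixed(v.y));
        words.push_back(to_fixed(v.z));
        words.push_back(0);  // unused slot; the pump emits the point on this iteration
    }
    return words;
}
```

The resulting words would then be copied into the BRAM, and `number_of_vertices` written over AXI-Lite before kicking off the block.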

Screen transform

This block converts the 3D points into a 2D representation on the screen. It is probably the most important part of the design so far. Basically, I took the C# code that I had already debugged and plugged it into an AXI-Stream pipe. I used the hls::stream construct to get it done without complications. This guarantees that the inputs are only read once, the outputs are only written once and things just work correctly in a pipeline. This is the code that is used to describe it:

points >> camera_space;

bx = camera_space.x * (width_fx / camera_space.z);
by = camera_space.y * (height_fx / camera_space.z);

pixel_out.x = (h_width + bx);
pixel_out.y = (h_height - by);
pixel_out.color = 0xFFFFFFFF;

pixels << pixel_out;

Note the awesome >> << notation.
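If you want to sanity check the math on a PC first, the same projection can be sketched in plain floating point C++. This is only a software model, not the HLS code: `CameraPoint`, `ScreenPoint` and `project` are hypothetical names, and I am assuming that `width_fx` and `height_fx` hold the screen dimensions and that `h_width` and `h_height` are half of those values.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the stream element types.
struct CameraPoint {
    float x, y, z;
};

struct ScreenPoint {
    int32_t x, y;
    uint32_t color;
};

// Perspective projection: scale x and y by (screen size / depth), then
// move the origin to the screen center. The y axis is flipped because
// screen rows grow downwards.
ScreenPoint project(const CameraPoint& p, float width, float height) {
    assert(p.z != 0.0f);  // a point on the camera plane cannot be projected
    const float bx = p.x * (width / p.z);
    const float by = p.y * (height / p.z);

    ScreenPoint out;
    out.x = static_cast<int32_t>(width / 2.0f + bx);
    out.y = static_cast<int32_t>(height / 2.0f - by);
    out.color = 0xFFFFFFFFu;  // same constant color as the HLS block
    return out;
}
```

A point on the optical axis, such as (0, 0, 1), lands in the center of the screen, and moving a point further away (larger z) pulls it towards the center.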

Point painter

This block takes in pixels and writes them out to the frame buffer. This block is necessary because we are drawing pixels in random locations on the screen, so we need non-raster access to the frame buffer.

void point_painter( hls::stream<screen_point_t> &pixels, uint32_t * frame_buffer
){

   #pragma HLS INTERFACE ap_ctrl_none port=return
   #pragma HLS INTERFACE m_axi port=frame_buffer offset=direct

   #pragma HLS INTERFACE axis port=pixels
   #pragma HLS DATA_PACK variable=pixels
   screen_point_t pixel_to_draw;
   pixels >> pixel_to_draw;
   uint32_t address = pixel_to_draw.y * WIDTH + pixel_to_draw.x;
   frame_buffer[address] = pixel_to_draw.color;
}
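The address calculation is just row-major indexing: each row of the frame buffer holds WIDTH pixels, so pixel (x, y) lives at word y * WIDTH + x. Below is a small software model of that step. It is a sketch and not the HLS block: `paint_point` is a hypothetical helper, the 640x480 resolution is an assumption, and the bounds check is something I added for the model. The block above writes whatever address it computes, so off-screen points would need to be filtered somewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int32_t WIDTH = 640;   // assumed resolution, not taken from the project
constexpr int32_t HEIGHT = 480;  // assumed resolution, not taken from the project

// Hypothetical stand-in for screen_point_t.
struct ScreenPoint {
    int32_t x, y;
    uint32_t color;
};

// Write one pixel into a row-major frame buffer.
// Returns false (and writes nothing) when the point is off screen.
bool paint_point(std::vector<uint32_t>& frame_buffer, const ScreenPoint& p) {
    assert(frame_buffer.size() == static_cast<std::size_t>(WIDTH) * HEIGHT);
    if (p.x < 0 || p.x >= WIDTH || p.y < 0 || p.y >= HEIGHT) {
        return false;
    }
    const std::size_t address = static_cast<std::size_t>(p.y) * WIDTH + p.x;
    frame_buffer[address] = p.color;
    return true;
}
```

Because consecutive points can land anywhere on the screen, every write is an independent single-word access, which is why this block needs a memory-mapped master port rather than a raster stream.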

After synthesizing the modules and importing them into Vivado, I created a block design with the necessary support blocks. This is what it looks like:

The point painter is connected to a high performance slave port of the PS section.

Next time we will write some software to bring it alive! The GPU depends entirely on having the right information to render anything on the screen, so we will need to alternate between HW and SW development. See you next week!
