A tutorial on non-separable 2D convolutions in Vivado HLS

2020-04-27T16:39:40-07:00

Hello, I am working on a project to apply different filters (bilateral filter, etc.) to an input image in hardware.
Could you enlighten me on how to use the last IP core with an image (OpenCV) in the test bench? I don’t really understand the image-to-AXI-stream conversion in this example.

Regards
Gilles


2020-04-27T19:00:47-07:00

Hi Gilles,
Great question. The key is to use OpenCV’s matrix datatype (cv::Mat). HLS provides an OpenCV library in the form of “hls_opencv.h”, which you can read more about here:
https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_1/ug1233-xilinx-opencv-user-guide.pdf

I also created a small code example to illustrate how it can be used:

	// Basile Van Hoorick, April 2020

	#include <iostream>
	#include "hls_opencv.h"
	#include "hls_stream.h"

	// (NOTE: the defined constants and included imports in this gist might be incomplete, but shouldn't be too hard to infer)
	// Assumed to be defined elsewhere in the project: INPUT_PATH, OUTPUT_PATH,
	// DHEIGHT, DWIDTH, depth_t and bilateral_filter_3x3().

	int test_bf()
	{
		cv::Mat src = cv::imread(INPUT_PATH, CV_LOAD_IMAGE_GRAYSCALE);
		if (src.empty() || src.rows != DHEIGHT || src.cols != DWIDTH)
		{
			std::cerr << "Could not load an input image of the expected size" << std::endl;
			return 1;
		}
		cv::Mat dst = cv::Mat(DHEIGHT, DWIDTH, CV_8U);
		hls::stream<depth_t> stream_in("stream_in");
		hls::stream<depth_t> stream_out("stream_out");

		// Tested data values should be in a range of 0.0 to 1.0
		// Input image pixels are in a range of 0 to 255
		// Read & convert image (dividing by 255 maps 0..255 onto 0.0..1.0)
		for (int y = 0; y < DHEIGHT; y++)
		{
			for (int x = 0; x < DWIDTH; x++)
			{
				depth_t value = depth_t(double(src.at<uchar>(y, x)) / 255.0);
				stream_in.write(value);
			}
		}

		// Apply bilateral filter
		std::cout << "Starting bilateral_filter..." << std::endl;
		bilateral_filter_3x3(stream_out, stream_in);
		std::cout << "bilateral_filter finished!" << std::endl;

		// Convert & write image: clamp to [0, 1], then round to the nearest 8-bit value
		for (int y = 0; y < DHEIGHT; y++)
		{
			for (int x = 0; x < DWIDTH; x++)
			{
				double value = double(stream_out.read());
				value = value < 0.0 ? 0.0 : (value > 1.0 ? 1.0 : value);
				dst.at<uchar>(y, x) = uchar(value * 255.0 + 0.5);
			}
		}
		cv::imwrite(OUTPUT_PATH, dst);

		return 0;
	}

bilateral_filter_test.cpp
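The two loops above convert between 8-bit pixels and normalized values. Below is a minimal standalone sketch of that pair of conversions, assuming a plain double in place of the project’s depth_t (which is not shown here); the helper names are hypothetical. Looping over all 256 pixel values is a quick way to check that the mapping round-trips exactly before any filter is involved.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helpers; double stands in for the project's depth_t.
// Map an 8-bit pixel (0..255) onto the normalized range [0.0, 1.0].
double pixel_to_unit(uint8_t p) {
    return p / 255.0;
}

// Map a normalized value back to 8 bits: clamp to [0.0, 1.0] so that
// filter overshoot cannot wrap around, then round to the nearest integer.
uint8_t unit_to_pixel(double v) {
    v = std::min(std::max(v, 0.0), 1.0);
    return static_cast<uint8_t>(v * 255.0 + 0.5);
}

// Counts the pixel values that do not survive pixel -> unit -> pixel.
int count_roundtrip_errors() {
    int errors = 0;
    for (int p = 0; p <= 255; ++p) {
        uint8_t back = unit_to_pixel(pixel_to_unit(static_cast<uint8_t>(p)));
        if (back != p) {
            ++errors;
        }
    }
    return errors;
}
```

Dividing by 255 on the way in and adding 0.5 only on the way out keeps the mapping symmetric; adding 0.5 on both sides would tend to shift pixels up by one.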

By the way, if this is relevant to you, I also covered the optimization of a bilateral filter in my thesis; if you are interested, feel free to check it out:
https://lib.ugent.be/fulltxt/RUG01/002/786/298/RUG01-002786298_2019_0001_AC.pdf

Hope this helps!


2020-04-28T16:08:55-07:00

Thank you very much for the example! And yes, your thesis contains a lot of information that helps me better understand the HLS workflow and the design of these filters.

Cheers!


2020-04-28T16:15:16-07:00

Would you by any chance also have a good reference (or example) for understanding DMA/BRAM/DRAM access under PetaLinux, so that the hardware can be accessed while the OS is running?

Big thanks for the help so far!

Regards
Gilles L.


2020-04-28T17:49:41-07:00

Writing software around your hardware platform is a very configuration- and device-specific thing that varies from case to case. Unfortunately, a lot of documentation out there is neither easy to find nor of the best quality. Nevertheless, try looking for guides or articles tailored to your OS or development board; there should be at least some basic examples that demonstrate communication between the CPU and the FPGA. In my opinion, the block design in Vivado is a challenging aspect, but I found that YouTube tutorials helped me understand AXI DMA. Good luck!
Best, Basile


2020-05-12T21:34:40-07:00

Hello again,
Last time you helped me understand the conversion in the test bench, and my design now validates perfectly. What I don’t understand is how to transfer and store the image in hardware (the MiniZed’s DDR) when deploying this filter core in a bare-metal application. I have been searching forums, references and papers, but most of the information I find is video-based, with a camera input and output, rather than a still image. So you are kind of ‘my last hope’.

I’ve connected an AXI DMA to the filter IP core in my block design, and I assumed that when I set up a transfer of the image data as an array (640×480 unsigned char) to the filter, it would arrive in AXI-Stream format. Instead, the transfer gets stuck, which makes me think the data is not in the format the filter expects. The filter expects a stream as shown below:

typedef hls::stream<ap_axiu > AXI_STREAM;

void filter(AXI_STREAM& video_in, AXI_STREAM& video_out)
{
	// Create AXI streaming interfaces for the core.
	#pragma HLS INTERFACE axis port=video_in
	#pragma HLS INTERFACE axis port=video_out
	hls::Mat2AXIvideo(img_4, video_out);
	……
	……
}

I have tried adding an AXI4-Stream Subset Converter to make sure data.last = 1 when the transfer is done, but I might be searching in completely the wrong direction. If you have any insights on this, I would greatly appreciate them. I am racking my brain over how to process a simple image in hardware with memory management.

Again, thanks for the fast reply and your earlier help.
Gilles


2020-05-12T22:08:03-07:00

Hi,
The block design itself (IP core + AXI DMA) sounds good. I personally used a struct datatype with only the ‘last’ signal besides the data, but yours should work as well. I don’t think an AXIS subset converter is needed here, but don’t quote me on this. Are you using Xilinx SDK for the software platform and calling XAxiDma_SimpleTransfer()? If so, did you try disabling the cache?
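As a rough illustration of the struct approach (a data word plus a 1-bit ‘last’ flag, as described further down the thread), here is a plain C++ model of a streaming core. It is a sketch under assumptions, not HLS code: std::queue stands in for hls::stream, and axis_word, invert_core and the one-pixel-per-word layout are hypothetical. What it shows is that the core itself must raise ‘last’ on its final output word, because that flag is what tells the DMA’s receiving (S2MM) channel that the transfer is over.

```cpp
#include <cstdint>
#include <queue>

// Hypothetical stream word: 32-bit data plus the 'last' flag (TLAST).
struct axis_word {
    uint32_t data;
    bool last;
};

// Software stand-in for hls::stream<axis_word>; not the HLS class itself.
using word_stream = std::queue<axis_word>;

// Hypothetical core that inverts n 8-bit gray pixels, one per word.
// It reads exactly n words and raises 'last' only on the final output
// word, which is what ends the S2MM transfer on the DMA side.
void invert_core(word_stream& in, word_stream& out, int n) {
    for (int i = 0; i < n; ++i) {
        // In the HLS version, this loop would carry #pragma HLS PIPELINE II=1.
        axis_word w = in.front();
        in.pop();
        axis_word r;
        r.data = 255u - (w.data & 0xFFu);
        r.last = (i == n - 1);
        out.push(r);
    }
}
```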


2020-05-13T19:07:07-07:00

Hello
Did you use this struct as the type of the block’s input streams, or for side channels inside the core function?
You are right about the converter, it is unnecessary. I am using Vitis with XAxiDma_SimpleTransfer(). Calling Xil_DCacheDisable() and Xil_ICacheDisable() doesn’t help either.

To understand the transfers, I created a simple gain core that takes an array of 1000 ints as input and debugged it with an ILA. From what I can tell, as soon as I change the input array size from 1000 to one element more or fewer, the DMA won’t transfer anything at all. Maybe this means I am sending/reading the image incorrectly in Vitis? I used a MATLAB script to generate an input array of chars.
I would love to hear your opinion on this.
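One plausible reading of the 1000-element symptom, assuming the gain core’s loop count is fixed at 1000 words at synthesis time and ‘last’ is raised on the 1000th output: any other DMA length leaves one side waiting. The hypothetical model below only reports the outcome; it does not simulate the AXI handshakes, and the exact behavior in hardware also depends on how the core is started (for example, whether it auto-restarts).

```cpp
#include <algorithm>

// Hypothetical outcome of one MM2S -> core -> S2MM round trip.
struct transfer_result {
    bool core_finished;  // the core received every input word it expects
    bool last_sent;      // the core emitted its final word with 'last' set
    int words_unread;    // words sent by MM2S that the core never consumes
};

// Models a core whose trip count is fixed at synthesis time (core_words)
// when the DMA sends sent_words words. Only the outcome is computed.
transfer_result round_trip(int sent_words, int core_words) {
    int consumed = std::min(sent_words, core_words);
    transfer_result r;
    r.core_finished = (consumed == core_words);
    r.last_sent = r.core_finished;           // 'last' rides on output word core_words
    r.words_unread = sent_words - consumed;  // these may keep MM2S from completing
    return r;
}
```

With 999 words the core never finishes and never sends ‘last’, so S2MM keeps waiting; with 1001 words one word remains unread on the input side.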

Thanks for all the help,
Gilles


2020-05-13T20:59:26-07:00

Hi Gilles,

If I understand your question correctly: I used hls::stream<data_t> for input and output, where data_t is a struct consisting of, for example, a 32-bit int/float and a 1-bit ‘last’ flag. Vitis is something I’ve never used or heard of, so the advice I can give is unfortunately going to be relatively superficial.

I would try to ensure that the array size is always a multiple of 4, and also that the DMA’s length register is sufficiently wide (I remember the default bit width being rather small; it can be adjusted somewhere in the Vivado block design, but I don’t recall the exact details).

It’s good that you have at least one working program, though. One general approach you could try is to find an example (online, in a tutorial, or wherever) that is most closely related to your project, and change it toward your specific goal in small incremental steps, pinpointing any issues along the way.
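To make the length-register point concrete: the register must be wide enough to hold the transfer size in bytes. If I read the AXI DMA product guide (PG021) correctly, the “Width of Buffer Length Register” option defaults to 14 bits (at most 16,383 bytes) and can be raised to 26; those numbers are worth verifying for your IP version. A 640×480 frame is 307,200 bytes at one byte per pixel, or 1,228,800 bytes at one pixel per 32-bit word, so either way the default would be too small. The helpers below are hypothetical back-of-the-envelope checks, assuming a 32-bit stream for the multiple-of-4 rule.

```cpp
#include <cstdint>

// Smallest register width (in bits) whose maximum value, 2^bits - 1,
// can represent a transfer of 'bytes' bytes.
int length_register_bits(uint64_t bytes) {
    int bits = 0;
    while (bits < 64 && ((uint64_t{1} << bits) - 1) < bytes) {
        ++bits;
    }
    return bits;
}

// The multiple-of-4 rule above, assuming a 32-bit (4-byte) stream.
bool is_whole_words(uint64_t bytes) {
    return bytes % 4 == 0;
}
```

For example, the 1000-int test transfer (4,000 bytes) needs 12 bits and fits the default, whereas a full 640×480 frame needs 19 or 21 bits.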

Hope this helps in some way!
Basile


2020-05-13T21:53:33-07:00

Hello

Transferring an image array should work the same way as the integer array, right?

Examples with a video input are plentiful, but examples with a still image are not, so it is hard to find a good starting point. Vitis is part of the Xilinx 2019.2 design suite. In older versions this was the SDK environment, so the two are almost the same, with some extra additions. Again, thanks for the help and your time.

Gilles


2020-05-16T00:38:40-07:00

Well, in the case of RGBA/ARGB/etc. with 4 bytes per pixel, yes: you have an array of 32-bit integers with one pixel per value (make sure to double-check the format, though, because many image data types exist; of course, you control that yourself in the surrounding software).
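A small sketch of both cases, with hypothetical helper names and an assumed byte order (red in the lowest byte); the order a given core expects has to be checked against its own definition. It also touches on a possible mismatch from earlier in the thread: if a core reads one pixel per 32-bit beat, a raw 640×480 unsigned char buffer carries four pixels per beat, so the core would see only a quarter of the beats it expects unless each gray pixel is first widened to its own word.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs 8-bit R, G, B, A channels into one 32-bit word per pixel.
// Assumed layout (hypothetical; match it to what your core expects):
// R in bits 7..0, G in 15..8, B in 23..16, A in 31..24.
uint32_t pack_rgba(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
    return static_cast<uint32_t>(r)
         | (static_cast<uint32_t>(g) << 8)
         | (static_cast<uint32_t>(b) << 16)
         | (static_cast<uint32_t>(a) << 24);
}

// Widens an 8-bit grayscale image to one pixel per 32-bit word, for a
// core that reads one pixel per stream beat instead of four.
std::vector<uint32_t> widen_gray(const std::vector<uint8_t>& gray) {
    std::vector<uint32_t> words(gray.size());
    for (std::size_t i = 0; i < gray.size(); ++i) {
        words[i] = gray[i];
    }
    return words;
}
```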


2021-12-11T12:20:24-08:00

Hello, great scholars!
Could someone kindly help me with my final project task? This is my first time taking this course, and it is a bit difficult for me to understand. Below are the details of the task, to be performed in Vivado.
Lab requirements:
◼ Implement a 2-D convolution using HLS C.
◼ Optimize the HLS design with directives.

If anyone can help, I will be truly grateful. Kindly provide the source code if possible.
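A minimal starting point for the two requirements above, assuming a small fixed image and integer kernel weights (HEIGHT, WIDTH and conv2d_naive are hypothetical names): a naive non-separable 3×3 convolution with a PIPELINE directive on the pixel loop. As written, HLS would likely not reach II=1, because nine reads per output pixel exceed the ports of a single block RAM; that bottleneck is what the article’s line-buffer version addresses. A regular C++ compiler ignores the HLS pragma (at most with a warning), so the same file can double as a software reference model in a test bench.

```cpp
#include <cstdint>

// Hypothetical sizes for the sketch; a real design would use the frame size.
const int HEIGHT = 8;
const int WIDTH = 8;
const int K = 3;  // kernel size (3x3)

// Naive non-separable 3x3 convolution: every output pixel reads its full
// neighborhood directly from the input array. Border pixels are copied
// unchanged. The integer sum is divided by 'norm' (must be nonzero) and
// clamped to 0..255.
void conv2d_naive(const uint8_t in[HEIGHT][WIDTH], uint8_t out[HEIGHT][WIDTH],
                  const int kernel[K][K], int norm) {
    for (int y = 0; y < HEIGHT; ++y) {
        for (int x = 0; x < WIDTH; ++x) {
#pragma HLS PIPELINE II=1
            if (y == 0 || x == 0 || y == HEIGHT - 1 || x == WIDTH - 1) {
                out[y][x] = in[y][x];
                continue;
            }
            int acc = 0;
            // HLS unrolls these inner loops because the enclosing loop is pipelined.
            for (int ky = 0; ky < K; ++ky) {
                for (int kx = 0; kx < K; ++kx) {
                    acc += kernel[ky][kx] * in[y + ky - 1][x + kx - 1];
                }
            }
            acc /= norm;
            out[y][x] = static_cast<uint8_t>(acc < 0 ? 0 : (acc > 255 ? 255 : acc));
        }
    }
}
```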



Background and task

Version 1: a naive implementation

Version 2: efficient streaming using line buffers

Final code

Published by basilevh

12 thoughts on “A tutorial on non-separable 2D convolutions in Vivado HLS”
