Vision in FIRST Robotics with the TX1/TX2 | Identifying Yellow Cubes with Python

In past FIRST Robotics competitions, retroreflective tape lit by bright LEDs provided an easy, clear target for vision systems, and most of the work went into interpreting the resulting values in the robot code itself. This year’s game focuses on possession of power cubes, and our team saw the value of identifying those cubes with vision. We purchased a TX1 and wanted to use GRIP as we had in years past, but GRIP doesn’t run natively on ARM processors, including the TX1’s aarch64 architecture. Fortunately, recent versions of GRIP include a code export function that generates the pipeline as Java, C++, or Python.

You can view our project here.

Getting Started and Setting Up the TX1/TX2

At the time of writing, the Jetson has to be updated from an Ubuntu 14.04 machine (16.04 support is coming soon). I installed Ubuntu 14.04 in a VMware Workstation virtual machine. From there, plug into the TX1 through the micro-USB port and put it into recovery mode. I pretty much followed this well-made guide. Just make sure to select File System and OS, Drivers, Flash OS Image to Target, CUDA Toolkit, and OpenCV for Tegra during the install. You may also want VisionWorks, depending on how you plan to implement vision, but we did not explore that this year. If you do not install OpenCV for CUDA, your code will not be able to use the GPU.

Edit: The OpenCV pipeline we used this year still did not use the GPU, because none of the functions we used are GPU-accelerated. Compiled code will generally make better use of the GPU, so please comment if you make a C++ or Java version of this or something similar.

Implementing the GRIP Pipeline

First, build a GRIP pipeline on your computer. The input source doesn’t matter; you can simply plug the webcam into your computer.

Then export your GRIP pipeline as Python. The exported code does not include NetworkTables publishing or an input source, so you have to add both yourself.

Now you have your GRIP pipeline. The exported file contains a class that we will reference from our runner script; in my case:

class GripPipeline:
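As a rough sketch of what "referencing the class from a runner" can look like: the exported class is normally used by creating one instance, calling `process` on each frame, and then reading an output attribute. The stand-in class below only exists so the sketch runs on its own, and the attribute name `filter_contours_output` is an assumption that holds only if the last step of your pipeline is Filter Contours. The module name `grip` in the comment is hypothetical too.

```python
# In a real runner you would import the exported class instead, e.g.
#   from grip import GripPipeline   # "grip" is a hypothetical file name
class GripPipeline:
    """Stand-in for the GRIP-exported class so this sketch is runnable."""

    def __init__(self):
        # Assumed output name; GRIP names outputs after the pipeline steps.
        self.filter_contours_output = None

    def process(self, source0):
        # The real class runs its OpenCV steps here. This stub simply
        # pretends that every frame contains exactly one contour.
        self.filter_contours_output = [source0]


def find_targets(pipeline, frame):
    """Run one frame through the pipeline and return its contours."""
    pipeline.process(frame)
    return pipeline.filter_contours_output
```

Create the pipeline once outside the frame loop and reuse it for every frame.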

From here we have to choose a camera input source. You may want to use a USB camera plugged directly into the TX1 for the lowest latency. We instead streamed our camera to an HTTP port with mjpg-streamer, so that I could keep editing the GRIP project on my laptop using the same stream. mjpg-streamer also includes controls for exposure and brightness, which help keep colors consistent. You may have build issues on the TX1; we fixed ours by editing a certain set of files. You might be able to find the fix on StackExchange here. Newer versions of Nvidia JetPack use OpenCV 3, so this may no longer be an issue.

You might get errors when compiling. The OpenCV libraries installed by Nvidia are out of date and require the following change in the file:


Add cv:: before each instance of CAP_*.
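If there are many occurrences, the edit can be scripted. This is only a sketch: the post does not say which file needs the change, so the function below works on source text you pass in, and `qualify_cap_constants` is a made-up helper name. It skips constants that are already qualified and leaves old-style names such as `CV_CAP_PROP_FPS` alone.

```python
import re

# Match CAP_ only when it starts an identifier and is not already
# preceded by "::" (the lookbehind rejects word characters and colons).
_CAP = re.compile(r"(?<![\w:])CAP_")


def qualify_cap_constants(source_text):
    """Return source_text with bare CAP_* names rewritten as cv::CAP_*."""
    return _CAP.sub("cv::CAP_", source_text)
```

Read the file, pass its contents through the function, and diff the result before writing it back.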

If you use a USB camera plugged into the TX1:

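cap = cv2.VideoCapture(0)  # open the first USB camera (returns a capture object, not an image)
ok, frame = cap.read()     # grab one frame; ok is False if the read failed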

If you use an HTTP port:

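while streamRunning:
  # The read call was cut off in the original post; `stream` is assumed to be
  # the open HTTP response for the MJPEG stream, and `buf` starts as b''.
  buf += stream.read(1024)
  a = buf.find(b'\xff\xd8')  # JPEG start-of-image marker
  b = buf.find(b'\xff\xd9')  # JPEG end-of-image marker
  if a != -1 and b != -1:    # find() returns -1 (not 1) when a marker is missing
    jpg = buf[a:b+2]
    buf = buf[b+2:]
    # OpenCV 2 and OpenCV 3 name the color-decode flag differently
    color = cv2.CV_LOAD_IMAGE_COLOR if version == 2 else cv2.IMREAD_COLOR
    frame = cv2.imdecode(np.frombuffer(jpg, dtype=np.uint8), color)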
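The marker search above can also be pulled into a small helper, which makes it easy to test without a camera. This is a minimal sketch rather than what we ran on the robot: `extract_jpegs` is a hypothetical name, and it handles two cases the loop does not, namely several frames arriving in one read and an end marker appearing before the first start marker.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def extract_jpegs(buf):
    """Split complete JPEG frames out of buf.

    Returns (frames, leftover), where leftover must be prepended to the
    next chunk read from the stream.
    """
    frames = []
    while True:
        start = buf.find(SOI)
        if start == -1:
            # No frame start yet. Keep the last byte in case a marker
            # is split across two reads.
            return frames, buf[-1:]
        end = buf.find(EOI, start + 2)
        if end == -1:
            # Frame started but not finished. Keep it for the next read.
            return frames, buf[start:]
        frames.append(buf[start:end + 2])
        buf = buf[end + 2:]
```

Each returned frame can then be decoded with `cv2.imdecode` exactly as in the loop above.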


From here, run the frame through the GRIP pipeline, do some math depending on what you need to find, and publish the resulting values to the network for the robot code to read. If you need help, look at my project or at what RickyAvina did in 2017. Feel free to leave a comment down below!
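One example of the "math" step is turning the horizontal pixel position of a cube into a steering angle. The sketch below assumes a simple pinhole camera model. The image width and horizontal field of view are values you would have to supply for your own camera, and `pixel_to_angle` is a hypothetical helper, not part of GRIP or OpenCV.

```python
import math


def pixel_to_angle(center_x, image_width, horizontal_fov_deg):
    """Angle in degrees from the camera axis to pixel column center_x.

    Positive means the target is to the right of the image center.
    Assumes an ideal pinhole camera with no lens distortion.
    """
    half_width = image_width / 2.0
    # Focal length in pixels, derived from the horizontal field of view.
    focal_px = half_width / math.tan(math.radians(horizontal_fov_deg) / 2.0)
    offset_px = center_x - half_width
    return math.degrees(math.atan2(offset_px, focal_px))
```

For a contour from the pipeline, `center_x` could be taken as `x + w / 2` from `cv2.boundingRect(contour)`. The angle can then be published as a number, for instance with `putNumber` on a pynetworktables table.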

View our two cube auto at SF here!

Edit: We won the Innovation in Control Award for our work on vision at the Sacramento Regional!

