/dev/random

Streaming from the Raspberry Pi Camera with gstreamer

raspberrypi gstreamer linux guide

In this post, I will describe my setup for video streaming from a Raspberry Pi camera. I wanted something to observe the birds around the house and had a Pi 3B lying around already, so I thought it would be a nice project. I had the following requirements in mind:

Architecture

Streaming frontend

Due to the requirement of working on any device, I chose to deliver my stream via HLS and build a simple client as a Progressive Web App. While HLS may have some disadvantages, the stream can be played with almost any web browser and also a number of native applications. With the PWA web client, I can view the stream from any device without installing anything, but I also have the choice to install it as an app for easier access on my own phone and desktop. The client application is really basic in its current state, and most of it was simply created using ChatGPT, so I will not elaborate on it further in this post.

Backend

On the Raspberry Pi, I am using the new libcamera suite to interact with the camera; more specifically, the libcamera-vid application. The old tools raspistill, raspivid, etc. are deprecated as of the Bullseye-based release of Raspberry Pi OS.

As the Raspberry Pi is connected via WLAN for this use case, the network is the limiting factor for video streaming. I tested it with iperf3 and got 30 Mbps in a TCP test, with 1-3% packet loss in the UDP tests. This means that I cannot use lossless compression or even MJPEG (at least not at high resolutions and framerates), so I need to encode the video to H.264 directly on the Pi. The Raspberry Pi 3B has dedicated video encoding hardware that can encode at least 1080p30 video (see here; note: this is actually the specification for the 3B+, but it is identical to the 3B except for higher CPU clocks. In fact, all Pis before the 4 share the same video processor, the Broadcom VideoCore IV, according to the documentation). Conveniently, the libcamera-vid application can use this encoder to produce H.264 directly.
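A quick back-of-the-envelope calculation shows why encoding on the Pi is unavoidable. The numbers below use the 1296x972 sensor mode discussed later; the shell arithmetic is just a sketch:

```shell
# Raw YUV 4:2:0 video stores 1.5 bytes per pixel.
width=1296; height=972; fps=30
raw_bps=$((width * height * 3 / 2 * fps * 8))
echo "raw: $((raw_bps / 1000000)) Mbit/s"   # ~453 Mbit/s, roughly 15x the measured link capacity
```

Even generous MJPEG compression (say 10:1) would still exceed the 30 Mbps budget at this resolution, while hardware H.264 fits easily.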

Currently, I want to use the video stream in two ways:

In the future, it may be interesting to analyze the video to integrate additional features, for example:

For these reasons, I decided that I would stream the video to my home server, which could then do further processing and HLS serving, so that the load on the Pi is as low as possible.

Raspberry Pi Camera stream

Since I want to be able to capture video at night, I ordered a night vision variant of the original Raspberry Pi Camera based on the OV5647 sensor. The only difference from the original camera module is the missing IR filter, which allows the camera to see at night with the help of two infrared LEDs. This night vision ability comes with an important drawback though: without the IR filter, the camera cannot differentiate IR light from visible red light, so the image loses a lot of color accuracy. With regular color settings, the whole image has a strong pinkish-red hue. This can be corrected with special color tuning so that white looks relatively normal, but there is still almost no saturation in the image (it is almost grayscale). In the future, I might experiment with a dual-camera system that captures color-accurate video during the day and switches to the IR view at night.

With the command libcamera-vid --list-cameras, we can see the different camera modes:

localadmin@raspberrypi:~ $ sudo libcamera-vid --list-cameras
Available cameras
-----------------
0 : ov5647 [2592x1944] (/base/soc/i2c0mux/i2c@1/ov5647@36)
    Modes: 'SGBRG10_CSI2P' : 640x480 [58.92 fps - (16, 0)/2560x1920 crop]
                             1296x972 [43.25 fps - (0, 0)/2592x1944 crop]
                             1920x1080 [30.62 fps - (348, 434)/1928x1080 crop]
                             2592x1944 [15.63 fps - (0, 0)/2592x1944 crop]

Be aware that the results are wrong if another libcamera application is currently running. Of these modes, I decided to use the 1296x972 mode, which covers the whole sensor area and uses pixel binning to give a half-resolution image. I tried the 1920x1080 mode as well, which uses a cropped view of the full-resolution sensor image, but the perceived resolution was the same due to image noise and H.264 encoding.

On the software side, I am using libcamera-vid to stream the image as H.264 over the network, using the following systemd unit:

[Unit]
Description=Pipe RPi camera stream into network

[Service]
ExecStart=libcamera-vid --tuning-file /usr/share/libcamera/ipa/raspberrypi/ov5647_noir.json --height 972 --width 1296 -t 0 --inline -n --rotation 180 --listen -o tcp://0.0.0.0:1
Restart=always

[Install]
WantedBy=multi-user.target
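Installing the unit follows the usual systemd steps; the file name birdcam-stream.service below is my own choice for illustration, not anything mandated:

```shell
# Copy the unit file (name is an assumption), reload systemd, and
# start the service now as well as on every boot.
sudo cp birdcam-stream.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now birdcam-stream.service
```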

Processing and HLS streaming

To redistribute the video via HLS (and eventually save it to disk), I am using a gstreamer pipeline, which does the job reasonably well. Like the camera stream, it runs as a systemd unit:

[Unit]
Description=Distribute raspberry pi camera stream via HLS
Requires=network.target

[Service]
ExecStart=gst-launch-1.0 tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 do-timestamp=true ! h264parse ! avdec_h264 ! videorate ! clockoverlay time-format="%%c" ! x264enc tune=zerolatency speed-preset=veryfast ! h264parse ! hlssink2
WorkingDirectory=/var/www/html/stream/
Restart=always

The pipeline is constructed as follows:

- tcpclientsrc connects to the libcamera-vid TCP server on the Pi and timestamps the incoming buffers on arrival (do-timestamp=true)
- h264parse and avdec_h264 parse and decode the raw H.264 stream
- videorate produces a constant-framerate stream
- clockoverlay burns the current date and time into the image
- x264enc re-encodes the video to H.264, tuned for low latency
- h264parse and hlssink2 package the result into HLS segments and a playlist

Currently, I am not saving the stream anywhere, but this can be added easily using a tee, an appropriate muxer and a filesink.
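An untested sketch of how that could look, extending the pipeline above: the encoded stream is teed into the HLS branch and a segmented Matroska recording (the recording path and the 10-minute segment length are placeholders). Note that the raw stream goes through the tee and is parsed only after each queue, since h264parse does not survive a tee otherwise:

```shell
# Sketch only: tee the re-encoded H.264 into HLS plus rolling .mkv files.
# splitmuxsink starts a new file every 600 s (max-size-time is in nanoseconds).
gst-launch-1.0 tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 do-timestamp=true \
  ! h264parse ! avdec_h264 ! videorate ! clockoverlay time-format="%c" \
  ! x264enc tune=zerolatency speed-preset=veryfast ! tee name=t \
  t. ! queue ! h264parse ! hlssink2 \
  t. ! queue ! h264parse ! splitmuxsink muxer=matroskamux \
       location=/var/recordings/birdcam-%05d.mkv max-size-time=600000000000
```

On the shell the clockoverlay format is %c; the %%c in the unit file above is just systemd's escaping of the percent sign.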

Timestamping and Jitter issues

Getting the timing for the video frames correct was a major problem for me. As far as I understand, a raw H.264 stream does not contain timestamps for the individual frames. There is some rudimentary timing information in the Sequence Parameter Set (SPS), but it only specifies a frame rate. At first, I did not know about the do-timestamp parameter of tcpclientsrc. Without it, I got a lot of jitter, which can be observed well with the timer: sometimes, up to 10 seconds worth of real time would pass in a single second of the output video; at other times, the stream would freeze for multiple seconds, waiting for the input data to catch up. I assume the problem is the following: without timestamps, all components in the pipeline work as fast as possible, and frames drop out of the end as soon as they are ready. Additionally, there are probably pipeline components that take a non-constant time to process frames, which adds jitter. The H.264 encoder is the most likely culprit, but other components may buffer as well.

By adding do-timestamp, the jitter improves a lot - I have included a comparison below, showing just the timer in the video with both options. As far as I understand the synchronization concepts of gstreamer, it tries to keep all sinks in sync with the source and with each other by applying latency.

For this mechanism, we obviously need correct timestamps on the video frames. Ideally, we would work with the actual capture timestamps, but these are not available in the raw H.264 stream. The TCP arrival timestamps are reasonably good, but not perfect, since there is a considerable amount of jitter on the Wi-Fi connection. To test this, I played the video in VLC media player on my computer while capturing the packets in Wireshark. Below is an RTT graph of my TCP session, which shows that RTTs can reach 33 ms or more - already the time between two frames at 30 fps. This indicates that at least some of the jitter problems could be resolved with more accurate timestamps. However, playback in VLC would often freeze for multiple seconds, and I have no idea why - there must be another underlying problem preventing smooth playback.

Side note: Processing the camera video directly with GStreamer

For a number of reasons, I also experimented with using gstreamer on the Pi to burn the clock overlay directly into the raw camera image.

I found two options for getting the raw camera video into gstreamer; I ended up piping the raw YUV output of libcamera-vid into an fdsrc, as shown below.

I implemented a pipeline similar to the one above to burn in the current date and time, with the major difference that it now uses the hardware video encoder via v4l2h264enc and v4l2convert. There used to be omxh264enc as well, but it is deprecated in favor of the V4L2 APIs.

libcamera-vid --tuning-file /usr/share/libcamera/ipa/raspberrypi/ov5647_noir.json --codec yuv420 -t 0 --width 1344 --height 1024 --framerate 30 -o - | gst-launch-1.0 fdsrc fd=0 ! rawvideoparse width=1344 height=1024 framerate=30/1 format=2 ! videoconvert ! videoflip method=rotate-180 ! clockoverlay ! v4l2h264enc extra-controls="controls,repeat_sequence_header=1" ! 'video/x-h264,level=(string)4.2' ! tcpserversink port=1 host=0.0.0.0

Ultimately I abandoned this approach, since it was significantly slower than using libcamera directly. In addition, I realized a few days later that burning in timestamps on the Pi directly might be undesirable after all if I want to play with computer vision stuff.

Notes

This was the first time I really worked with video streaming, and gstreamer in particular, so I am sharing some notes that I wish I had known beforehand:

There is always a trade-off between latency and jitter

This may be obvious to anyone who has worked with media streaming before, but I was not aware of it. Hence, I tried to minimize my pipeline's delay by reducing the number and size of the HLS segments. With this configuration, my browser stream froze every few seconds, waiting for the next segment. The next segment is not always ready by the time the previous one has been fully played, since producing it may occasionally take longer than usual due to hardware contention, network delays, etc. By increasing the number and size of segments back to the defaults, the buffering problem was gone: the added latency means that a segment taking longer than usual will still arrive in time for the browser to play it back.
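The trade-off can be estimated with simple arithmetic: players typically buffer around three target durations before starting playback, so the segment length sets a latency floor. The numbers below are example values, not the hlssink2 defaults:

```shell
segment_seconds=5    # length of one HLS segment
buffered_segments=3  # segments a player buffers before it starts playing
echo "latency floor: $((segment_seconds * buffered_segments)) s"
```

Halving the segment length halves this floor, but leaves correspondingly less slack for a segment that is late.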

As I mentioned in the beginning, a bit of latency is fine for me in this application. In the future, I might want to add an alternative "low-latency" mode on the Pi, by transmitting a smaller image size, using a simpler encoding (MJPEG) and broadcasting this stream directly via UDP multicast. This can hopefully reduce encoding delay and remove any delay coming from TCP and my reencoding process on the server.
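A possible starting point for such a mode, untested; the resolution is arbitrary and the multicast address and port are placeholders:

```shell
# Smaller frame, MJPEG instead of H.264, pushed straight to a UDP multicast
# group - no TCP, no server-side re-encoding in the path.
libcamera-vid --codec mjpeg --width 640 --height 480 --framerate 30 \
  -t 0 -n -o udp://239.255.12.42:5600
```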

libcamera-vid with H.264 encoding needs enough gpu_mem to function correctly

I was trying to push higher resolutions at some point during my experimentation and got an error from libcamera. While googling, I stumbled over this GitHub issue, where some comments indicate problems caused by a GPU memory allocation that is too high, leaving too little room for CMA. Because of this, I had set gpu_mem=32 in /boot/config.txt at some point, and was later surprised to get an error at the 1296x972 resolution, which had worked just fine before. In this case, the libcamera log just contains *** Failed to start output streaming ***. In dmesg, I found the error bcm2835_codec_start_streaming: Failed enabling i/p port, ret -2, which ultimately helped me track down the issue after a lot of frustration.
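Two commands that would have saved me time here, for checking how much memory is actually assigned to the GPU and how much CMA is left:

```shell
vcgencmd get_mem gpu        # current firmware GPU memory split
grep -i cma /proc/meminfo   # CmaTotal / CmaFree of the kernel's CMA pool
```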

Pipeline negotiation issues

I would very often get the following log while experimenting with the pipeline:

Jul 17 02:52:31 birdcam gst-launch-1.0[9766]: Setting pipeline to PAUSED ...
Jul 17 02:52:31 birdcam gst-launch-1.0[9766]: Pipeline is PREROLLING ...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Redistribute latency...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Redistribute latency...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: ERROR: from element /GstPipeline:pipeline0/GstTCPClientSrc:tcpclientsrc0: Internal data stream error.
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Additional debug info:
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: ../libs/gst/base/gstbasesrc.c(3127): gst_base_src_loop (): /GstPipeline:pipeline0/GstTCPClientSrc:tcpclientsrc0:
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: streaming stopped, reason not-negotiated (-4)
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: ERROR: pipeline doesn't want to preroll.
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Setting pipeline to NULL ...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Freeing pipeline ...

At first sight, it appears that the error comes from the tcpclientsrc - but that is not the root cause here. Instead, pay attention to the line streaming stopped, reason not-negotiated (-4). The reason in this line may differ, but here it indicates a negotiation error. Negotiation is the process of finding appropriate formats for the data flowing through a pipeline; if two elements cannot negotiate a common format for their connection, gstreamer throws an error. Sadly, there is usually no additional information, so it is up to the user to find the faulty link in the pipeline, usually by going through all links between pads and checking the capabilities with gst-inspect-1.0 or the documentation. In this case, I had forgotten an h264parse.
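Two standard gstreamer debugging aids help with narrowing this down; the truncated pipeline below is just an illustration:

```shell
# -v prints the caps negotiated on every pad while the pipeline runs:
gst-launch-1.0 -v tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 ! h264parse ! fakesink
# Raising the log level shows which link failed to negotiate:
GST_DEBUG=3 gst-launch-1.0 tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 ! h264parse ! fakesink
```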

Related note: h264parse does not pass the parsed info correctly through a tee and queue. One has to pass the raw data through and parse after the queue.

Layer 8 Problems

I have to admit that I wasted more time than necessary on debugging, mostly because I developed some sort of "tunnel vision" while working on the same problem for too long. Particularly with various nondescript gstreamer errors, I often found myself late at night adding elements to the pipeline or changing properties without really knowing whether the result would work, just because I wanted to fix the issue at all costs. Similarly, skimming over issue descriptions and copying instructions that "looked right" for my problem ultimately led to me wasting half a day on debugging the issues caused by my gpu_mem modification. It would have helped to take a break and come back relaxed to read the documentation properly. I am leaving this here as a warning to myself for future endeavors.

Comments


If you have any questions or comments, please feel free to reach out to me by sending an email to blog(at)krisnet.de.