/dev/random

Streaming from the Raspberry Pi Camera with gstreamer

raspberrypi gstreamer linux guide

In this post, I will describe my setup for video streaming from a Raspberry Pi camera. I wanted something to observe the birds around the house and had a Pi 3B lying around already, so I thought it would be a nice project. I had the following requirements in mind:

Architecture

Streaming frontend

Due to the requirement of working on any device, I chose to deliver my stream via HLS and build a simple client as a Progressive Web App. While HLS may have some disadvantages, the stream can be played with almost any web browser and also a number of native applications. With the PWA web client, I can view the stream from any device without installing anything, but I also have the choice to install it as an app for easier access on my own phone and desktop. The client application is really basic in its current state, and most of it was simply created using ChatGPT, so I will not elaborate on it further in this post.

Backend

On the Raspberry Pi, I am using the new libcamera suite to interact with the camera; more specifically, the libcamera-vid application. The old tools raspistill, raspivid, etc. are deprecated as of the Bullseye-based release of Raspberry Pi OS.

As the Raspberry Pi is connected via WLAN for this use case, the network is the limiting factor for video streaming. I tested it with iperf3 and got 30 Mbps in a TCP test, with 1-3% packet loss in the UDP tests. This means that I cannot use lossless compression or even MJPEG (at least not at high resolutions and framerates), so I need to encode the video to H.264 directly on the Pi. The Raspberry Pi 3B has dedicated video encoding hardware that can encode at least 1080p30 video (see here; note: this is actually the specification for the 3B+, but it is identical to the 3B except for higher CPU clocks. In fact, all Pis before the 4 share the same video processor, the Broadcom VideoCore IV, according to the documentation). Conveniently, the libcamera-vid application can use this encoder to produce H.264 directly.
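A quick back-of-the-envelope calculation shows why encoding on the Pi is unavoidable. The numbers below use the 1296x972 sensor mode discussed later; the shell arithmetic is just a sketch:

```shell
# Raw YUV 4:2:0 video stores 1.5 bytes per pixel.
width=1296; height=972; fps=30
raw_bps=$((width * height * 3 / 2 * fps * 8))
echo "raw: $((raw_bps / 1000000)) Mbit/s"   # ~453 Mbit/s, roughly 15x the measured link capacity
```

Even generous MJPEG compression (say 10:1) would still exceed the 30 Mbps budget at this resolution, while hardware H.264 fits easily.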

Currently, I want to use the video stream in two ways:

In the future, it may be interesting to analyze the video to integrate additional features, for example:

For these reasons, I decided that I would stream the video to my home server, which could then do further processing and HLS serving, so that the load on the Pi is as low as possible.

Raspberry Pi Camera stream

Since I want to be able to capture video at night, I ordered a night vision variant of the original Raspberry Pi Camera based on the OV5647 sensor. The only difference from the original camera module is the missing IR filter, which allows the camera to see at night with the help of two infrared LEDs. This night vision ability comes with an important drawback though: without the IR filter, the camera cannot differentiate IR light from visible red light, so the image loses a lot of color accuracy. With regular color settings, the whole image has a strong pinkish-red hue. This can be corrected with special color tuning so that white looks relatively normal, but there is still almost no saturation in the image (it is almost grayscale). In the future, I might experiment with a dual-camera system that captures color-accurate video during the day and switches to the IR view at night.

With the command libcamera-vid --list-cameras, we can see the different camera modes:

localadmin@raspberrypi:~ $ sudo libcamera-vid --list-cameras
Available cameras
-----------------
0 : ov5647 [2592x1944] (/base/soc/i2c0mux/i2c@1/ov5647@36)
    Modes: 'SGBRG10_CSI2P' : 640x480 [58.92 fps - (16, 0)/2560x1920 crop]
                             1296x972 [43.25 fps - (0, 0)/2592x1944 crop]
                             1920x1080 [30.62 fps - (348, 434)/1928x1080 crop]
                             2592x1944 [15.63 fps - (0, 0)/2592x1944 crop]

Be aware that the results are wrong if another libcamera application is currently running. Of these modes, I decided to use the 1296x972 mode, which covers the whole sensor area and uses pixel binning to give a half-resolution image. I tried the 1920x1080 mode as well, which uses a cropped view of the full-resolution sensor image, but the perceived resolution was the same due to image noise and H.264 encoding.

On the software side, I am using libcamera-vid to stream the image as H.264 over the network, using the following systemd unit:

[Unit]
Description=Pipe RPi camera stream into network

[Service]
ExecStart=libcamera-vid --tuning-file /usr/share/libcamera/ipa/raspberrypi/ov5647_noir.json --height 972 --width 1296 -t 0 --inline -n --rotation 180 --listen -o tcp://0.0.0.0:1
Restart=always

[Install]
WantedBy=multi-user.target
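Installing the unit follows the usual systemd steps; the file name birdcam-stream.service below is my own choice for illustration, not anything mandated:

```shell
# Copy the unit file (name is an assumption), reload systemd, and
# start the service now as well as on every boot.
sudo cp birdcam-stream.service /etc/systemd/system/
sudo systemctl daemon-reload
sudo systemctl enable --now birdcam-stream.service
```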

Processing and HLS streaming

To redistribute the video via HLS (and eventually save it to disk), I am using a gstreamer pipeline, which does the job reasonably well. Like the camera stream, it runs as a systemd unit:

[Unit]
Description=Distribute raspberry pi camera stream via HLS
Requires=network.target

[Service]
ExecStart=gst-launch-1.0 tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 do-timestamp=true ! h264parse ! avdec_h264 ! videorate ! clockoverlay time-format="%%c" ! x264enc tune=zerolatency speed-preset=veryfast ! h264parse ! hlssink2
WorkingDirectory=/var/www/html/stream/
Restart=always

The pipeline is constructed as follows:

- tcpclientsrc connects to the libcamera-vid TCP server on the Pi and timestamps the incoming buffers on arrival (do-timestamp=true)
- h264parse and avdec_h264 parse and decode the raw H.264 stream
- videorate produces a constant-framerate stream
- clockoverlay burns the current date and time into the image
- x264enc re-encodes the video to H.264, tuned for low latency
- h264parse and hlssink2 package the result into HLS segments and a playlist

Currently, I am not saving the stream anywhere, but this can be added easily using a tee, an appropriate muxer and a filesink.
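An untested sketch of how that could look, extending the pipeline above: the encoded stream is teed into the HLS branch and a segmented Matroska recording (the recording path and the 10-minute segment length are placeholders). Note that the raw stream goes through the tee and is parsed only after each queue, since h264parse does not survive a tee otherwise:

```shell
# Sketch only: tee the re-encoded H.264 into HLS plus rolling .mkv files.
# splitmuxsink starts a new file every 600 s (max-size-time is in nanoseconds).
gst-launch-1.0 tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 do-timestamp=true \
  ! h264parse ! avdec_h264 ! videorate ! clockoverlay time-format="%c" \
  ! x264enc tune=zerolatency speed-preset=veryfast ! tee name=t \
  t. ! queue ! h264parse ! hlssink2 \
  t. ! queue ! h264parse ! splitmuxsink muxer=matroskamux \
       location=/var/recordings/birdcam-%05d.mkv max-size-time=600000000000
```

On the shell the clockoverlay format is %c; the %%c in the unit file above is just systemd's escaping of the percent sign.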

Timestamping and Jitter issues

Getting the timing for the video frames correct was a major problem for me. As far as I understand, a raw H.264 stream does not contain timestamps for the individual frames. There is some rudimentary timing information in the Sequence Parameter Set (SPS), but it only specifies a frame rate. At first, I did not know about the do-timestamp parameter of tcpclientsrc. Without it, I got a lot of jitter, which can be observed well with the timer: sometimes, up to 10 seconds worth of real time would pass in a single second of the output video; at other times, the stream would freeze for multiple seconds, waiting for the input data to catch up. I assume the problem is the following: without timestamps, all components in the pipeline work as fast as possible, and frames drop out of the end as soon as they are ready. Additionally, there are probably pipeline components that take a non-constant time to process frames, which adds jitter. The H.264 encoder is the most likely culprit, but other components may buffer as well.

By adding do-timestamp, the jitter improves a lot - I have included a comparison below, showing just the timer in the video with both options. As far as I understand the synchronization concepts of gstreamer, it tries to keep all sinks in sync with the source and with each other by applying latency.

For this mechanism, we obviously need correct timestamps on the video frames. Ideally, we would work with the actual capture timestamps, but these are not available in the raw H.264 stream. The TCP arrival timestamps are reasonably good, but not perfect, since there is a considerable amount of jitter on the Wi-Fi connection. To test this, I played the video in VLC media player on my computer while capturing the packets in Wireshark. Below is an RTT graph of my TCP session, which shows that RTTs can reach 33 ms or more - already the time between two frames at 30 fps. This indicates that at least some of the jitter problems could be resolved with more accurate timestamps. However, playback in VLC would often freeze for multiple seconds, and I have no idea why - there must be another underlying problem preventing smooth playback.

Side note: Processing the camera video directly with GStreamer

For a number of reasons, I also experimented with using gstreamer on the Pi to burn the clock overlay directly into the raw camera image.

I found two options for getting the raw camera video into gstreamer; I ended up piping the raw YUV output of libcamera-vid into an fdsrc, as shown below.

I implemented a pipeline similar to the one above to burn in the current date and time, with the major difference that it now uses the hardware video encoder via v4l2h264enc and v4l2convert. There used to be omxh264enc as well, but it is deprecated in favor of the V4L2 APIs.

libcamera-vid --tuning-file /usr/share/libcamera/ipa/raspberrypi/ov5647_noir.json --codec yuv420 -t 0 --width 1344 --height 1024 --framerate 30 -o - | gst-launch-1.0 fdsrc fd=0 ! rawvideoparse width=1344 height=1024 framerate=30/1 format=2 ! videoconvert ! videoflip method=rotate-180 ! clockoverlay ! v4l2h264enc extra-controls="controls,repeat_sequence_header=1" ! 'video/x-h264,level=(string)4.2' ! tcpserversink port=1 host=0.0.0.0

Ultimately I abandoned this approach, since it was significantly slower than using libcamera directly. In addition, I realized a few days later that burning in timestamps on the Pi directly might be undesirable after all if I want to play with computer vision stuff.

Notes

This was the first time I really worked with video streaming, and gstreamer in particular, so I am sharing some notes that I wish I had known beforehand:

There is always a trade-off between latency and jitter

This may be obvious to anyone who has worked with media streaming before, but I was not aware of it. Hence, I tried to minimize my pipeline's delay by reducing the number and size of the HLS segments. With this configuration, my browser stream froze every few seconds, waiting for the next segment. The next segment is not always ready by the time the previous one has been fully played, since producing it may occasionally take longer than usual due to hardware contention, network delays, etc. By increasing the number and size of segments back to the defaults, the buffering problem was gone: the added latency means that a segment taking longer than usual will still arrive in time for the browser to play it back.
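The trade-off can be estimated with simple arithmetic: players typically buffer around three target durations before starting playback, so the segment length sets a latency floor. The numbers below are example values, not the hlssink2 defaults:

```shell
segment_seconds=5    # length of one HLS segment
buffered_segments=3  # segments a player buffers before it starts playing
echo "latency floor: $((segment_seconds * buffered_segments)) s"
```

Halving the segment length halves this floor, but leaves correspondingly less slack for a segment that is late.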

As I mentioned in the beginning, a bit of latency is fine for me in this application. In the future, I might want to add an alternative "low-latency" mode on the Pi, by transmitting a smaller image size, using a simpler encoding (MJPEG) and broadcasting this stream directly via UDP multicast. This can hopefully reduce encoding delay and remove any delay coming from TCP and my reencoding process on the server.
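A possible starting point for such a mode, untested; the resolution is arbitrary and the multicast address and port are placeholders:

```shell
# Smaller frame, MJPEG instead of H.264, pushed straight to a UDP multicast
# group - no TCP, no server-side re-encoding in the path.
libcamera-vid --codec mjpeg --width 640 --height 480 --framerate 30 \
  -t 0 -n -o udp://239.255.12.42:5600
```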

libcamera-vid with H.264 encoding needs enough gpu_mem to function correctly

I was trying to push higher resolutions at some point during my experimentation and got an error from libcamera. While googling, I stumbled over this GitHub issue, where some comments indicate problems caused by a GPU memory allocation that is too high, leaving too little room for CMA. Because of this, I had set gpu_mem=32 in /boot/config.txt at some point, and was later surprised to get an error at the 1296x972 resolution, which had worked just fine before. In this case, the libcamera log just contains *** Failed to start output streaming ***. In dmesg, I found the error bcm2835_codec_start_streaming: Failed enabling i/p port, ret -2, which ultimately helped me track down the issue after a lot of frustration.
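Two commands that would have saved me time here, for checking how much memory is actually assigned to the GPU and how much CMA is left:

```shell
vcgencmd get_mem gpu        # current firmware GPU memory split
grep -i cma /proc/meminfo   # CmaTotal / CmaFree of the kernel's CMA pool
```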

Pipeline negotiation issues

I would very often get the following log while experimenting with the pipeline:

Jul 17 02:52:31 birdcam gst-launch-1.0[9766]: Setting pipeline to PAUSED ...
Jul 17 02:52:31 birdcam gst-launch-1.0[9766]: Pipeline is PREROLLING ...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Redistribute latency...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Redistribute latency...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: ERROR: from element /GstPipeline:pipeline0/GstTCPClientSrc:tcpclientsrc0: Internal data stream error.
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Additional debug info:
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: ../libs/gst/base/gstbasesrc.c(3127): gst_base_src_loop (): /GstPipeline:pipeline0/GstTCPClientSrc:tcpclientsrc0:
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: streaming stopped, reason not-negotiated (-4)
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: ERROR: pipeline doesn't want to preroll.
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Setting pipeline to NULL ...
Jul 17 02:52:32 birdcam gst-launch-1.0[9766]: Freeing pipeline ...

At first sight, it appears that the error comes from the tcpclientsrc - but that is not the root cause here. Instead, pay attention to the line streaming stopped, reason not-negotiated (-4). The reason in this line may differ, but here it indicates a negotiation error. Negotiation is the process of finding appropriate formats for the data flowing through a pipeline; if two elements cannot negotiate a common format for their connection, gstreamer throws an error. Sadly, there is usually no additional information, so it is up to the user to find the faulty link in the pipeline, usually by going through all links between pads and checking the capabilities with gst-inspect-1.0 or the documentation. In this case, I had forgotten an h264parse.
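Two standard gstreamer debugging aids help with narrowing this down; the truncated pipeline below is just an illustration:

```shell
# -v prints the caps negotiated on every pad while the pipeline runs:
gst-launch-1.0 -v tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 ! h264parse ! fakesink
# Raising the log level shows which link failed to negotiate:
GST_DEBUG=3 gst-launch-1.0 tcpclientsrc host=raspberrypi.ad.krisnet.de port=1 ! h264parse ! fakesink
```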

Related note: h264parse does not pass the parsed info correctly through a tee and queue. One has to pass the raw data through and parse after the queue.

Layer 8 Problems

I have to admit that I wasted more time than necessary on debugging, mostly because I developed some sort of "tunnel vision" while working on the same problem for too long. Particularly with various nondescript gstreamer errors, I often found myself late at night adding elements to the pipeline or changing properties without really knowing whether the result would work, just because I wanted to fix the issue at all costs. Similarly, skimming over issue descriptions and copying instructions that "looked right" for my problem ultimately led to me wasting half a day on debugging the issues caused by my gpu_mem modification. It would have helped to take a break and come back relaxed to read the documentation properly. I am leaving this here as a warning to myself for future endeavors.

Comments


If you have any questions or comments, please feel free to reach out to me by sending an email to blog(at)krisnet.de.