
Part 1 - Video capture using GStreamer and gstreamer-netcore

Previous parts: [Part 0]

Video capture is the very first step in any video processing system, so it's going to be our first task as well. As I mentioned in Part 0, we will be using an RTSP-capable IP camera as our video source. Ideally, for development purposes, it would be great to be able to capture video from a USB-connected web camera as well. Even better if we could do that with the same code.

Sounds too good to be true? It is absolutely achievable - with GStreamer.

What is GStreamer

GStreamer is a library for constructing graphs of media-handling components. The applications it supports range from simple audio/video playback and streaming to complex audio mixing and non-linear video editing and processing. GStreamer works on all major operating systems such as Linux, Android, Windows, Mac OS X, iOS, as well as most BSDs, commercial Unixes, Solaris, and Symbian. It has been ported to a wide range of operating systems, processors and compilers. It runs on all major hardware architectures including x86, ARM, MIPS, SPARC and PowerPC, on 32-bit as well as 64-bit, and little endian or big endian.

GStreamer is widely used in many real-time media processing pipelines, including Video Analytics solutions from major vendors. These vendors often ship first-party components (plugins) that embed directly into GStreamer pipeline graphs. There are also plugins made specifically for solving Computer Vision problems using OpenCV underneath.

GStreamer is a powerful tool for building media processing solutions, and I strongly recommend learning it in detail. One of GStreamer's greatest features is the ability to create media pipelines entirely from the command line. Of course, since it's also a well-documented C library, the classic way of using it from application code is available too.

/images/gstreamer-diag.png

This page has plenty of examples of how you can build different GStreamer pipelines from the command line with gst-launch-1.0 - the ultimate pipeline builder app shipped with GStreamer.

/images/gstreamer_example_pipeline.jpg
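
For instance, this minimal pipeline (a smoke test of my own, not taken from that page) connects a test pattern source, a caps filter restricting the resolution, and an automatically selected on-screen sink:

gst-launch-1.0 videotestsrc ! video/x-raw,width=640,height=480 ! videoconvert ! autovideosink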

We will use GStreamer in our own app, controlling many aspects of the pipeline, but still using a command-line-like approach. This is actually another great feature of GStreamer: you can build a pipeline graph by connecting elements one by one, or start by parsing a "command line" and then access elements by their names or by traversing the pipeline graph.
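
Here is a minimal sketch of the second approach in C# (assuming GStreamer is already initialized with Application.Init; the element name and property are just for illustration):

// parse a "command line", then look an element up by the name we gave it
var pipeline = (Gst.Pipeline)Gst.Parse.Launch(
    "videotestsrc name=src ! videoconvert ! autovideosink");
var src = pipeline.GetByName("src");
src["is-live"] = true; // and tweak its properties directly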

Before we code

Installing GStreamer differs depending on your OS. On Linux, I'd do it with whatever package manager is recommended for your distro. On macOS, I'd use Homebrew - it's the easiest way to get GStreamer, the plugins and all dependencies. On Windows, you have to use the MinGW (not MSVC) build, because it is the only build the .NET wrapper is currently compatible with on Windows.

Component-wise, in addition to GStreamer itself, also install the "base" and "good" plugin sets (here you can learn what that means).

Depending on your OS, their names may be:

  - "gst-plugins-base" and "gst-plugins-good" on macOS with Homebrew;
  - "gstreamer1.0-plugins-base" and "gstreamer1.0-plugins-good" on Ubuntu and Debian;
  - "gstreamer1-plugins-base-tools", "gstreamer1-plugins-base-devel", "gstreamer1-plugins-good" and "gstreamer1-plugins-good-extras" on Fedora.
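
For example (using the package names listed above; exact names may change over time):

# macOS with Homebrew
brew install gstreamer gst-plugins-base gst-plugins-good

# Ubuntu/Debian (gstreamer1.0-tools brings gst-launch-1.0)
sudo apt install gstreamer1.0-tools gstreamer1.0-plugins-base gstreamer1.0-plugins-good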

On Windows, with the .msi installer, keep all default components selected and add the "Gstreamer 1.0 libav wrapper".

Check, check. One, two, three.

Before you continue, make sure the following command results in video playback. You may replace the URL with a video of your choice, or with a local file using a file:/// URL.

gst-launch-1.0 playbin uri=https://ia800501.us.archive.org/10/items/BigBuckBunny_310/big_buck_bunny_640_512kb.mp4
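
If you also plan to use a web camera, a similar check (autovideosrc picks the default camera device and comes with the "good" plugins) is:

gst-launch-1.0 autovideosrc ! videoconvert ! autovideosink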

Simple app with GStreamer and .NET Core

Time to write some code. I use .NET Core with C# in Visual Studio Code. To make that possible with GStreamer and .NET Core, I made the gstreamer-netcore package, which will hopefully make it into the official gstreamer-sharp repo at some point.

First, let’s start from a simple sample app.

mkdir HelloGstreamer                        # new directory for our app
cd HelloGstreamer                           # switch to the app's directory
dotnet new console                          # create a new .NET Core console app
dotnet add package gstreamer-sharp-netcore  # add the gstreamer-sharp-netcore NuGet package

Open Program.cs and replace its contents with

using System;
using Gst;

namespace HelloGstreamer
{
    class Program
    {
        public static void Main(string[] args)
        {
            // Initialize GStreamer
            Application.Init(ref args);

            // Build the pipeline using "command line" syntax
            var pipeline = Parse.Launch(
                "playbin uri=http://download.blender.org/durian/trailer/sintel_trailer-1080p.mp4");

            // Start playing
            pipeline.SetState(State.Playing);

            // Wait until error or EOS (end of stream)
            var bus = pipeline.Bus;
            var msg = bus.TimedPopFiltered(Constants.CLOCK_TIME_NONE,
                MessageType.Eos | MessageType.Error);

            // Free resources
            pipeline.SetState(State.Null);
        }
    }
}

As you can see, this piece of code simply plays video with audio (even with some UI!), waiting until the end of the stream. Replace the uri with your rtsp:// camera address (it may need credentials as well), and you will see the camera's video stream.
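
The exact address format varies by camera vendor; a typical one looks something like this (the IP, port, credentials and path here are placeholders - check your camera's documentation):

rtsp://user:password@192.168.1.64:554/stream1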

Getting raw video samples

Going forward, we need a slightly more complex pipeline in which we can intercept and analyze raw video frames. The RawSamples example shows how to do that with an AppSink. And by the way, it also demonstrates how to create a pipeline by adding elements one by one instead of declaratively creating one with Parse.Launch.

Pipeline = new Gst.Pipeline("simplepipeline");
VideoSink = new AppSink("videoSink"); // create an appsink
Playbin = ElementFactory.Make("playbin", "playbin"); // create a 'playbin' element

Playbin["uri"] = source;
Playbin["video-sink"] = VideoSink; // playbin will push video to our appsink

VideoSink["caps"] = Caps.FromString("video/x-raw,format=RGBA"); // we only accept raw RGBA samples
VideoSink.EmitSignals = true;
VideoSink.NewSample += NewVideoSample; // this method will receive raw video samples
VideoSink.Drop = true;
VideoSink.Sync = true;
VideoSink.Qos = true;

Pipeline.Add(Playbin); // add the playbin element to the pipeline

Further on, in the NewVideoSample method we receive the raw video samples, and there you can see how to get at the actual raw buffers.
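
A rough sketch of such a handler (the args type is simplified here - use the delegate gstreamer-sharp generates for AppSink.NewSample; see the actual sample for details):

void NewVideoSample(object sender, GLib.SignalArgs args)
{
    var sink = (AppSink)sender;
    using (var sample = sink.PullSample()) // take the queued sample out of the appsink
    {
        if (sample == null)
            return;
        var buffer = sample.Buffer; // the buffer holding the raw RGBA frame
        if (buffer.Map(out var map, MapFlags.Read)) // map it for CPU read access
        {
            // 'map' now exposes the frame's raw bytes and their size
            buffer.Unmap(map);
        }
    }
}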

Advanced processing pipeline

To achieve greater performance and simplify the usage of AppSink with IP (RTSP) and web cameras, I created a few helper classes and put them into our first code sample:

https://github.com/there-was-a-face/1-GStreamer-video.

There are a few key points in that sample:

  1. The GstVideoStream class builds a pipeline for you and handles GStreamer-specific actions and events. It supports web cameras, URL-based streams (including rtsp://), or any arbitrary pipeline declaration, just as if you were using gst-launch-1.0, with a single mandatory appsink element at the end.
  2. GstVideoStream uses the declarative pipeline syntax.
  3. Instead of waiting for samples in AppSink.NewSample, it pulls samples on a timer - this approach turns out to be faster (see the sketch after this list).
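
A minimal sketch of that idea (this is not GstVideoStream's actual code; the videoSink variable and the 30 fps period are assumptions):

// poll the appsink periodically instead of reacting to NewSample signals;
// with this approach EmitSignals can stay false
var timer = new System.Timers.Timer(1000.0 / 30); // aim at ~30 fps
timer.Elapsed += (s, e) =>
{
    var sample = videoSink.TryPullSample(0); // non-blocking: null when no sample is ready
    if (sample == null)
        return;
    // ... map sample.Buffer and process the frame as shown earlier ...
    sample.Dispose();
};
timer.Start();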

Please go ahead and take a look at the sample.

A lot of work is done for you in the GstVideoStream class. It basically abstracts away the GStreamer internals and helps you focus on the actual analytics part.

The OnNewFrame method in Program.cs is where the majority of the video analytics work will happen going forward.

If we take the original example above, our new code would look like this:

// create a new video stream from a URI
GstVideoStream stream = new GstVideoStream("http://download.blender.org/durian/trailer/sintel_trailer-1080p.mp4");
// For a web camera source, create it as
// GstVideoStream stream = new GstVideoStream(0, 0, 0);
// First parameter - camera device index (0, 1, 2, etc.),
// second and third - desired width and height of the video stream
// (works if the camera supports it; keep them both zero if not sure)

// new frame handler - called for every frame received
stream.NewFrame += (GstVideoStream sender, IGstVideoFrameContext frameContext) =>
{
    /*
        Use frameContext.Buffer for accessing the raw buffer as an IntPtr.
        Make sure the format is what you expect by checking frameContext.Format
        (RGBA is the only supported one for now).
        Use frameContext.CopyTo for copying raw frame data to another unmanaged
        buffer (e.g. a WriteableBitmap).
    */
};

// start the pipeline
stream.Play();

// wait for the pipeline to finish with end-of-stream or an error
stream.WaitForMessageAsync(MessageType.Eos | MessageType.Error, TimeSpan.Zero).Wait();

// Free resources
stream.Dispose();

What’s next?

Next - detect a face

Now we are ready for our first big CV task - detecting faces! I will do a few iterations on that, trying several methods, moving to an advanced DNN-based model, and optimizing the process with face tracking between frames!

/images/facetracking.gif

Stay tuned!