Blepo Reference Manual

Topics:
  1. Preliminaries
  2. Fundamental classes and structs
  3. Image operations
  4. Image processing
  5. Computer vision
  6. Matrix operations
  7. Linear algebra
  8. Figure
  9. Capture


Preliminaries

The purpose of this reference manual is to provide information about the most useful and stable classes and functions, in an easy-to-read format. Although the library is still in a state of development, the interfaces captured here are not expected to change significantly (if at all), and the functionality captured here has been tested at least to a moderate degree. While users should also feel free to use any additional functions they find in the header files themselves, keep in mind that such functions are less well tested and are more likely to have their interfaces changed in the future.

The library follows the convention of passing inputs as const references (except for native types, which are passed by value), outputs as pointers (after the inputs), and input-outputs as pointers at the beginning. Of course, due to C++ restrictions, parameters with default values (even if they are inputs) always come last. Examples:

  void Foo1(const A& a, A* b);            // 'a' is input, 'b' is output
  void Foo2(const A& a, B* b, int c=0);   // same but with a default value for the last parameter
  void Foo3(B* b, const A& a);            // 'b' is both input and output
This syntax, similar to that used by printf() and scanf() in C, makes code more readable because you can immediately tell whether a parameter is changed by a function simply by looking at whether its pointer is passed. For example, we can tell by immediate inspection that the second line of this code does not modify img1, even without knowing anything about the semantics of the function:
  ImgGray img1, img2;
  Func(img1, &img2);                     // does not modify 'img1'

The library automatically handles the allocation / reallocation, if necessary, of the internal data of outputs, so all you need to do is pass a valid pointer. For example, just pass a pointer to an image without first specifying its size. On the other hand, input-outputs are parameters whose internal values may change (perhaps as a function of their input values) but whose size remains constant.

Many of the functions check to see whether the operation is being performed in place. For example, suppose we call the function above with the same value being passed as input and output:

  A a;
  Foo1(a, &a);   // 'in-place' operation
In this case the output pointer passed to the function points to the input parameter, but the function will work correctly nevertheless. Keep in mind that some of the functions can be efficiently implemented in place, in which case the function will do so; while other functions require the allocation of a temporary variable to store the intermediate results. In the latter case, the in-place operation provides convenience for the user, but perhaps at the expense of computational overhead.
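
The two in-place cases described above can be sketched as follows. This is only an illustration of the calling convention: Signal, Negate, and Reverse are hypothetical stand-ins written for this example and are not Blepo types or functions. Negate can safely overwrite its input, whereas Reverse detects the in-place call and goes through a temporary.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a library container; not a Blepo type.
struct Signal {
  std::vector<int> data;
};

// Element-wise operation: each output depends only on the matching input,
// so it can run in place without a temporary.
void Negate(const Signal& src, Signal* dst) {
  dst->data.resize(src.data.size());  // output is (re)allocated if necessary
  for (std::size_t i = 0; i < src.data.size(); ++i) dst->data[i] = -src.data[i];
}

// Order-dependent operation: writing directly into the input would overwrite
// values that are still needed, so the in-place case uses a temporary.
void Reverse(const Signal& src, Signal* dst) {
  if (&src == dst) {  // in-place call detected
    Signal tmp;
    Reverse(src, &tmp);
    *dst = tmp;
    return;
  }
  const std::size_t n = src.data.size();
  dst->data.resize(n);
  for (std::size_t i = 0; i < n; ++i) dst->data[i] = src.data[n - 1 - i];
}
```

Both functions accept Reverse(a, &b) as well as Reverse(a, &a); only the second form pays for the extra copy.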

In the functions below, src is used for inputs, dst for outputs, and src_dst for input-outputs. For brevity, the notation ImgX is used when a function is implemented for multiple image types.


Fundamental Classes and Structs

Image

Overview

There are five image classes, all derived from a base templated class Image:

    ImgGray : Image<unsigned char>    graylevel image (one byte per pixel)
    ImgBgr : Image<Bgr>               blue-green-red color image (three bytes per pixel)
    ImgBinary : Image<bool>           packed binary image (one bit per pixel)
    ImgInt : Image<int>               integer image (machine dependent, but usually 4 bytes per pixel)
    ImgFloat : Image<float>           single-precision floating-point image (4 bytes per pixel)

The image data are stored in memory as contiguous pixels, in row major order with no padding between rows. For multi-channel images, the channels are interleaved; e.g., the pixels in ImgBgr are <B0, G0, R0, B1, G1, R1, ...>. For ImgBinary, the first pixel is stored in the most significant bit of the first byte, the ninth pixel is stored in the most significant bit of the second byte, and so forth; if the total number of pixels is not a multiple of 8, then the last byte contains unused bits.
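
The memory layout described above can be expressed as index arithmetic. The helper names below (GrayOffset, BgrOffset, GetBit, BinaryNBytes) are hypothetical and exist only for this sketch; they are not part of the library.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of pixel (x, y) in a one-byte-per-pixel image
// (row major, no padding between rows).
std::size_t GrayOffset(int x, int y, int width) {
  return static_cast<std::size_t>(y) * width + x;
}

// Byte offset of channel c (0 = B, 1 = G, 2 = R) of pixel (x, y)
// in an interleaved three-byte-per-pixel image.
std::size_t BgrOffset(int x, int y, int c, int width) {
  return 3 * (static_cast<std::size_t>(y) * width + x) + c;
}

// Reads pixel (x, y) of a packed binary image: pixel 0 is the most
// significant bit of byte 0, pixel 8 the most significant bit of byte 1.
bool GetBit(const unsigned char* bytes, int x, int y, int width) {
  const std::size_t i = static_cast<std::size_t>(y) * width + x;
  return (bytes[i / 8] & (0x80u >> (i % 8))) != 0;
}

// Number of bytes needed for a packed binary image; the last byte
// holds unused bits when width * height is not a multiple of 8.
std::size_t BinaryNBytes(int width, int height) {
  return (static_cast<std::size_t>(width) * height + 7) / 8;
}
```

For example, in a 3 x 3 binary image the ninth pixel (x = 2, y = 2) has linear index 8 and therefore sits in the most significant bit of the second byte.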

The Bgr struct is described below in detail, but it basically contains three unsigned chars: struct Bgr { unsigned char b, g, r; };

Members

    Typedefs:
      Pixel                                            a pixel (unsigned char, int, float, bool, Bgr, ...)
      Iterator                                         generalized pointer to a pixel
      ConstIterator                                    constant generalized pointer to a pixel
    Constants:
      int NBITS_PER_PIXEL                              number of bits per pixel
      int NCHANNELS                                    number of channels
      Pixel MIN_VAL                                    minimum pixel value 
      Pixel MAX_VAL                                    maximum pixel value 
    Info:
      int Width() const                                returns the width
      int Height() const                               returns the height
      int NBytes() const                               returns the number of bytes
    Pixel data:
      const Pixel& operator()(int x, int y) const      access a pixel (read only)
      Pixel& operator()(int x, int y)                  access a pixel (read / write)
      ConstIterator Begin() const                      return iterator to first pixel (read only)
      ConstIterator Begin(int x, int y) const          return iterator to pixel (x,y) (read only)
      ConstIterator End() const                        return iterator to just past last pixel (read only)
      Iterator Begin()                                 return iterator to first pixel (read / write)
      Iterator Begin(int x, int y)                     return iterator to pixel (x,y) (read / write)
      Iterator End()                                   return iterator to just past last pixel (read / write)
      const unsigned char* BytePtr() const             return pointer to first byte (same as Begin() typecast to unsigned char*)
      unsigned char* BytePtr()                         return pointer to first byte (same as Begin() typecast to unsigned char*)
    Constructor / destructor / etc.	  
      Image()                                          create empty (0 x 0) image
      Image(int width, int height)                     create width x height image (uninitialized values)
      Image(const Image& other)                        make an exact replica (with a separate copy of the data)
      ~Image()                                         destroy image (free up all memory)
      Image& operator=(const Image& other)             make an exact replica (with a separate copy of the data)
      void Reset(int width, int height)                resizes image to width x height (does not initialize values),
                                                              -- does nothing if width x height image already allocated
      void Reset()                                     free memory for image data
                                                              -- does nothing if image is already empty

Matrix

There are two matrix classes, both derived from a base templated class Matrix:

    MatFlt : Matrix<float>    single-precision floating point matrix (32 bits per element)
    MatDbl : Matrix<double>   double-precision floating point matrix (64 bits per element)
The interface to the matrix class is very similar to the image class.

Bgr

  struct Bgr { unsigned char b, g, r; };

Point / Rect

Currently,

    typedef CPoint Point
    typedef CRect Rect
In the future, they may be reimplemented, but their interfaces should remain consistent with those of the MFC classes.

Exception

The Exception class has only one method:

  Display()  display the human-readable string in a message box.
Note: If you would like to throw an Exception yourself, pass a human-readable string to the macro BLEPO_ERROR(). To format the string in the same line, use StringEx:
    BLEPO_ERROR("Out of bounds");
    BLEPO_ERROR(StringEx("Unable to open file %s", filename));


Image Operations

Load

  void Load(const char* filename, ImgBgr* out);
  void Load(const char* filename, ImgGray* out);
Loads an image from a file into out, allocating memory for out. Supported file formats: BMP, PGM (binary), PPM (binary), JPEG. The file type is determined automatically by the magic number in the file (ignoring the file extension).
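
As a rough illustration of detection by magic number, the sketch below classifies a file from its first two bytes, using the standard signatures of the four supported formats. DetectImageType is a hypothetical helper written for this example; how Load performs the check internally is not documented here.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not a Blepo function): classify a file from its
// leading bytes. Returns an empty string if the format is not recognized.
std::string DetectImageType(const unsigned char* header, std::size_t n) {
  if (n < 2) return "";
  if (header[0] == 'B' && header[1] == 'M') return "bmp";     // BMP signature
  if (header[0] == 'P' && header[1] == '5') return "pgm";     // binary PGM
  if (header[0] == 'P' && header[1] == '6') return "ppm";     // binary PPM
  if (header[0] == 0xFF && header[1] == 0xD8) return "jpeg";  // JPEG start-of-image marker
  return "";
}
```

Because only the file contents are inspected, a JPEG file named "picture.dat" would still be recognized.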

Save

  void Save(const ImgBgr& img, const char* filename, const char* filetype = NULL);
  void Save(const ImgGray& img, const char* filename, const char* filetype = NULL);
Saves an image to a file. Supported file formats: BMP, PGM (binary), PPM (binary), JPEG. The image type is determined automatically by the extension of filename (".bmp", ".pgm", ".ppm", ".jpg", or ".jpeg"). If the filename has a nonstandard extension, pass the standard extension as an optional third parameter (without the period). For example,
  Save(img, "file.jaypegg", "jpg");

Convert

  void Convert(const ImgX& img, ImgY* out);
Converts between two image types, allocating memory for out. ImgX and ImgY may be any pair of distinct image types, with the following meaning:
  ImgGray   -> ImgBgr     replicates graylevel value three times
  ImgGray   -> ImgInt     copies values
  ImgGray   -> ImgFloat   copies values
  ImgGray   -> ImgBinary  sets nonzero pixels to 1, zero pixels to 0
  ImgInt    -> ImgGray    clips values outside [0, 255]
  ImgInt    -> ImgFloat   copies values
  ImgInt    -> ImgBgr     varies (see below)
  ImgInt    -> ImgBinary  same as ImgInt -> ImgGray -> ImgBinary
  ImgFloat  -> ImgGray    rounds values, then clips to [0, 255]
  ImgFloat  -> ImgInt     rounds values
  ImgFloat  -> ImgBgr     same as ImgFloat -> ImgGray -> ImgBgr
  ImgFloat  -> ImgBinary  same as ImgFloat -> ImgGray -> ImgBinary
  ImgBgr    -> ImgGray    uses (1*B + 6*G + 3*R)/10
  ImgBgr    -> ImgInt     varies (see below)
  ImgBgr    -> ImgFloat   same as ImgBgr -> ImgGray -> ImgFloat
  ImgBgr    -> ImgBinary  same as ImgBgr -> ImgGray -> ImgBinary
  ImgBinary -> ImgGray    sets 0 to zero, 1 to 255
  ImgBinary -> ImgInt     sets 0 to zero, 1 to one
  ImgBinary -> ImgFloat   sets 0 to zero, 1 to one
  ImgBinary -> ImgBgr     sets 0 to zero, 1 to Bgr(255,255,255)

The functions converting between ImgBgr and ImgInt require a third parameter to indicate the format of the integer:

Here X means "don't care", and gg is a single byte graylevel value. The last of these options is the same as converting to an ImgGray as an intermediary representation.

In addition, the functions converting from ImgBinary require two additional parameters: the zerovalue and the onevalue. For example, to set all the 0 pixels to blue and all the 1 pixels to red, use the following:

  Convert(img_binary, &img_bgr, Bgr(255,0,0), Bgr(0,0,255));
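
Three of the per-pixel rules in the conversion table can be sketched as small functions. The names ClipToByte, FloatToGray, and BgrToGray are hypothetical, and the use of truncating integer division in the color-to-gray formula is an assumption; the table only gives the weights.

```cpp
#include <cassert>
#include <cmath>

// ImgInt -> ImgGray rule: clip values outside [0, 255].
unsigned char ClipToByte(int v) {
  return static_cast<unsigned char>(v < 0 ? 0 : (v > 255 ? 255 : v));
}

// ImgFloat -> ImgGray rule: round to the nearest integer, then clip.
unsigned char FloatToGray(float v) {
  return ClipToByte(static_cast<int>(std::lround(v)));
}

// ImgBgr -> ImgGray rule: weighted sum (1*B + 6*G + 3*R)/10.
// Truncating integer division is an assumption of this sketch.
unsigned char BgrToGray(unsigned char b, unsigned char g, unsigned char r) {
  return static_cast<unsigned char>((1 * b + 6 * g + 3 * r) / 10);
}
```

Since the weights sum to 10, a pixel with b = g = r keeps its value, so gray-valued color pixels map to the same graylevel.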

Threshold

  void Threshold(const ImgX& img, ImgX::Pixel threshold, ImgBinary* out);

Thresholds an image, setting the dimensions of out to those of img. Pixels greater than or equal to threshold are set to 1, otherwise to 0.

Defined for ImgGray, ImgInt, ImgFloat.
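
The thresholding rule (greater than or equal maps to 1) can be sketched on a flat pixel buffer. ThresholdSketch is a hypothetical stand-in that uses std::vector in place of the image classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for Threshold on a flat buffer: 'out' is resized to match 'img',
// and each element is true exactly when the pixel is >= threshold.
void ThresholdSketch(const std::vector<int>& img, int threshold, std::vector<bool>* out) {
  out->resize(img.size());
  for (std::size_t i = 0; i < img.size(); ++i) (*out)[i] = (img[i] >= threshold);
}
```

Note that a pixel exactly equal to the threshold is set to 1.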

Set

  void Set(ImgX* out, ImgX::Pixel val);
  void Set(ImgX* out, const Rect& rect, ImgX::Pixel val);
  void Set(ImgX* out, const ImgBinary& mask, ImgX::Pixel val);
  void Set(ImgX* out, const Point& pt, const ImgX& img);
  void Set(ImgX* out, const Point& pt, const ImgX& img, const Rect& rect);
  void SetOutside(ImgX* out, const Rect& rect, ImgX::Pixel val);

Sets the pixels in out to the value val. Does not change the dimensions of out. These six versions, respectively,

Note that out must be larger than img, unless pt is (0,0).

Defined for all image types.

Extract

  void Extract(const ImgX& img, const Rect& rect, ImgX* out);

Copies the values in rect of img to out, resizing the latter to the size of the rectangle.

Defined for all image types.

IsSameSize

  bool IsSameSize(const ImgX& img1, const ImgY& img2);

Returns whether two images have the same dimensions.

Defined for all image types. Images may be of the same or different types.

IsIdentical

  bool IsIdentical(const ImgX& img1, const ImgX& img2);

Returns whether two images have the same dimensions and the exact same data.

Defined for all image types.

Comparison

  void Equal(const ImgX& img1, const ImgX& img2, ImgBinary* out);
  void NotEqual(const ImgX& img1, const ImgX& img2, ImgBinary* out);
  void LessThan(const ImgFloat& img1, const ImgFloat& img2, ImgBinary* out);
  void GreaterThan(const ImgX& img1, const ImgX& img2, ImgBinary* out);
  void LessThanOrEqual(const ImgX& img1, const ImgX& img2, ImgBinary* out);
  void GreaterThanOrEqual(const ImgX& img1, const ImgX& img2, ImgBinary* out);

Sets a binary image based upon a pixel-wise comparison of two images, resizing out to the same size as the input images (which must be of the same size).

Equal, NotEqual are defined for all image types. The other functions are defined only for ImgGray, ImgInt, ImgFloat.

Min / Max

  ImgX::Pixel Min(const ImgX& img);
  ImgX::Pixel Max(const ImgX& img);
  void MinMax(const ImgX& img, ImgX::Pixel* minn, ImgX::Pixel* maxx);
  void Min(const ImgX& img1, const ImgX& img2, ImgX* out);
  void Max(const ImgX& img1, const ImgX& img2, ImgX* out);

The first two functions return the min or max value of an image. The next function computes the min and max simultaneously and is more efficient if both are needed (because only one pass through the image data is performed). The last two functions set out to the pixel-wise min or max of the two images, resizing out to the same size as the input images (which must be of the same size).

Defined for ImgGray, ImgInt, ImgFloat.
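
The reason MinMax is cheaper than separate calls to Min and Max is that both extremes are tracked during a single pass. The sketch below shows this, together with a pixel-wise minimum; MinMaxSketch and MinSketch are hypothetical stand-ins operating on std::vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Finds the minimum and maximum in one pass over the data.
// Assumes 'img' is non-empty.
void MinMaxSketch(const std::vector<float>& img, float* minn, float* maxx) {
  *minn = *maxx = img[0];
  for (std::size_t i = 1; i < img.size(); ++i) {
    if (img[i] < *minn) *minn = img[i];
    if (img[i] > *maxx) *maxx = img[i];
  }
}

// Pixel-wise minimum of two equally sized buffers; 'out' is resized to match.
void MinSketch(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>* out) {
  assert(a.size() == b.size());
  out->resize(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) (*out)[i] = (a[i] < b[i]) ? a[i] : b[i];
}
```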

Logical operations

Two input images:
  void And(const ImgX& img1, const ImgX& img2, ImgX* out);
  void Or (const ImgX& img1, const ImgX& img2, ImgX* out);
  void Xor(const ImgX& img1, const ImgX& img2, ImgX* out);
One input image and a binary mask:
  void And(const ImgX& img1, const ImgBinary& mask, ImgX* out);
  void Or (const ImgX& img1, const ImgBinary& mask, ImgX* out);
  void Xor(const ImgX& img1, const ImgBinary& mask, ImgX* out);
One input image and a constant value:
  void Not(const ImgX& img, ImgX* out);
  void And(const ImgX& img, ImgX::Pixel val, ImgX* out);
  void Or (const ImgX& img, ImgX::Pixel val, ImgX* out);
  void Xor(const ImgX& img, ImgX::Pixel val, ImgX* out);

Sets out by applying a bitwise logical operator to an image or pair of images. The two input images must be of the same size.

Notes:

Arithmetic operations

  void Add     (const ImgX& img1, const ImgX& img2, ImgX* out);
  void Subtract(const ImgX& img1, const ImgX& img2, ImgX* out);
  void AbsDiff (const ImgX& img1, const ImgX& img2, ImgX* out);
  void Multiply(const ImgX& img1, const ImgX& img2, ImgX* out);
  void Multiply(const ImgX& img1, const ImgX::Pixel& img2, ImgX* out);
  void Divide  (const ImgX& img1, const ImgX::Pixel& img2, ImgX* out);
  void LinearlyScale(const ImgX& img, ImgX::Pixel minval, ImgX::Pixel maxval, ImgX* out);

Applies a pixel-wise arithmetic operator to an image or pair of images. The two input images must be of the same size. LinearlyScale scales the image so that its minimum value is 'minval' and maximum value is 'maxval'.
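
Linear scaling maps each value v to minval + (v - lo) * (maxval - minval) / (hi - lo), where lo and hi are the minimum and maximum of the input. The sketch below implements this on a flat buffer. LinearlyScaleSketch is a hypothetical stand-in, and mapping a constant image to minval is an assumption made here to avoid division by zero.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scales values so that the smallest becomes 'minval' and the largest 'maxval'.
// A constant input is mapped to 'minval' (an assumption of this sketch).
void LinearlyScaleSketch(const std::vector<float>& img, float minval, float maxval,
                         std::vector<float>* out) {
  std::vector<float> result(img.size());
  if (!img.empty()) {
    float lo = img[0], hi = img[0];
    for (std::size_t i = 1; i < img.size(); ++i) {
      if (img[i] < lo) lo = img[i];
      if (img[i] > hi) hi = img[i];
    }
    const float range = hi - lo;
    for (std::size_t i = 0; i < img.size(); ++i) {
      result[i] = (range > 0.0f) ? minval + (img[i] - lo) * (maxval - minval) / range
                                 : minval;
    }
  }
  *out = result;  // writing through a local buffer keeps in-place calls safe
}
```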

Notes:

Resampling

  void Resample(const ImgX& img, int new_width, int new_height, ImgX* out);
  void Upsample(const ImgX& img, int factor_x, int factor_y, ImgX* out);
  void Downsample(const ImgX& img, int factor_x, int factor_y, ImgX* out);

Creates an image with new dimensions from an existing image. Resample is the most general function because it can create an image with arbitrary dimensions. Upsample multiplies, and Downsample divides, each dimension of the original image by an integer factor. The factors must be positive.

Defined for all image types.

Note: Only nearest-neighbor interpolation is supported for these functions.
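
Nearest-neighbor resampling copies, for each output pixel, the value of one input pixel. The sketch below works on a row-major buffer. ResampleSketch is a hypothetical stand-in, and the exact index mapping (truncating x * w / new_w) is an assumption; the library may round differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Nearest-neighbor resampling of a w x h row-major buffer to new_w x new_h.
// The source index is chosen by integer truncation (an assumption).
void ResampleSketch(const std::vector<int>& img, int w, int h,
                    int new_w, int new_h, std::vector<int>* out) {
  std::vector<int> result(static_cast<std::size_t>(new_w) * new_h);
  for (int y = 0; y < new_h; ++y) {
    const int sy = y * h / new_h;  // source row
    for (int x = 0; x < new_w; ++x) {
      const int sx = x * w / new_w;  // source column
      result[static_cast<std::size_t>(y) * new_w + x] =
          img[static_cast<std::size_t>(sy) * w + sx];
    }
  }
  *out = result;
}
```

With this mapping, upsampling by a factor of 2 replicates each pixel into a 2 x 2 block, and downsampling by 2 keeps every second pixel.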

Interpolation

  ImgX::Pixel Interp(const ImgX& img, float x, float y);

Performs bilinear interpolation at a single pixel. Defined for all image types.
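
Bilinear interpolation blends the four pixels surrounding (x, y), weighting each by its distance along both axes. The sketch below shows the computation on a row-major float buffer. InterpSketch is a hypothetical stand-in; it assumes 0 <= x <= w-1 and 0 <= y <= h-1, and its handling of the last row and column is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Bilinear interpolation at (x, y) in a w x h row-major buffer.
// Assumes the coordinates lie inside the image.
float InterpSketch(const std::vector<float>& img, int w, int h, float x, float y) {
  const int x0 = static_cast<int>(std::floor(x));
  const int y0 = static_cast<int>(std::floor(y));
  const int x1 = (x0 + 1 < w) ? x0 + 1 : x0;  // stay inside on the last column
  const int y1 = (y0 + 1 < h) ? y0 + 1 : y0;  // stay inside on the last row
  const float ax = x - x0;                    // horizontal weight in [0, 1)
  const float ay = y - y0;                    // vertical weight in [0, 1)
  const std::size_t r0 = static_cast<std::size_t>(y0) * w;
  const std::size_t r1 = static_cast<std::size_t>(y1) * w;
  const float top = (1.0f - ax) * img[r0 + x0] + ax * img[r0 + x1];
  const float bot = (1.0f - ax) * img[r1 + x0] + ax * img[r1 + x1];
  return (1.0f - ay) * top + ay * bot;
}
```

At integer coordinates the weights are zero and the function returns the pixel value itself.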

Display image

  void Draw(const ImgX& img, HDC hdc, int x, int y);
  void Draw(const ImgX& img, HDC hdc, const Rect& src, const Rect& dst);

Draws an image onto a Windows device context (Note: A CDC may be substituted for an HDC). The first function draws the image at a specified location with no stretching of the image. The second function draws an arbitrary rectangle of the image onto an arbitrary rectangle of the device context, automatically stretching the image to fit the destination rectangle.

Defined for all image types. Binary, float, and int images are linearly scaled to map their inputs to the range [0,255].

Draw onto image

  void DrawDot(const Point& pt, ImgX* out, const X& color, int size = 3);  // draws a filled-in square
  void DrawLine(const Point& pt1, const Point& pt2, ImgX* out, const X& color, int thickness=1);
  void DrawRect(const Rect& rect, ImgX* out, const X& color, int thickness=1);
  void DrawCircle(const Point& center, int radius, ImgX* out, const X& color, int thickness=1);
  void DrawEllipse(const Point& center, int major_axis, int minor_axis, double angle, ImgX* out, const X& color, int thickness=1);
  void DrawEllipticArc(const Point& center, int major_axis, int minor_axis, double angle, double start_angle, double end_angle, ImgX* out, const X& color, int thickness=1);

Draws a dot, line, rectangle, circle, ellipse, or elliptic arc, respectively, on an image. Implementation uses OpenCV functions.

  class TextDrawer:
    TextDrawer(int height=15, int thickness=2);
    void DrawText(ImgBgr* img, const char* text, const Point& pt, const Bgr& color);
    void DrawText(ImgBgr* img, const char* text, const Point& pt, const Bgr& color, const Bgr& background_color);

Draws text onto an image.


Image Processing

Morphological operations

binary morphology:
  void Erode3x3      (const ImgGray& img, ImgGray* out);
  void Dilate3x3     (const ImgGray& img, ImgGray* out);
  void Erode3x3Cross (const ImgGray& img, ImgGray* out);
  void Dilate3x3Cross(const ImgGray& img, ImgGray* out);
grayscale morphology:
  void GrayscaleErode3x3 (const ImgGray& img, int offset, ImgGray* out);
  void GrayscaleDilate3x3(const ImgGray& img, int offset, ImgGray* out);

Performs dilation or erosion with one of the following 3 x 3 kernels:

   1 1 1             0 1 0
   1 1 1             1 1 1
   1 1 1             0 1 0
                    (cross)

The binary functions treat the input image as a binary image and are optimized for MMX. The grayscale functions use a 3x3 structuring element of all ones, multiplied by 'offset' (which should be non-negative).
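
The difference between the two kernels is easiest to see with binary dilation of a single pixel. The sketch below is a plain, unoptimized illustration, not the MMX implementation. DilateSketch is a hypothetical stand-in; treating pixels outside the image as 0 and writing 1 for set pixels are both assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Binary dilation of a w x h row-major buffer with a 3 x 3 kernel.
// 'cross' selects the plus-shaped kernel instead of the full square.
// Pixels outside the image count as 0 (an assumption of this sketch).
void DilateSketch(const std::vector<unsigned char>& img, int w, int h, bool cross,
                  std::vector<unsigned char>* out) {
  std::vector<unsigned char> result(img.size(), 0);
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      for (int dy = -1; dy <= 1; ++dy) {
        for (int dx = -1; dx <= 1; ++dx) {
          if (cross && dx != 0 && dy != 0) continue;  // skip the four corners
          const int xx = x + dx, yy = y + dy;
          if (xx < 0 || xx >= w || yy < 0 || yy >= h) continue;
          if (img[static_cast<std::size_t>(yy) * w + xx] != 0) {
            result[static_cast<std::size_t>(y) * w + x] = 1;
          }
        }
      }
    }
  }
  *out = result;
}
```

Dilating an isolated pixel reproduces the kernel itself: a filled 3 x 3 square for the full kernel, and a plus shape for the cross.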

Convolution / Gradient / Smoothing / etc.

  void SmoothGauss5x5(const ImgGray& img, ImgGray* out);
  void GradMagPrewitt(const ImgGray& img, ImgGray* out);
  void GradPrewittX(const ImgGray& img, ImgFloat* out);
  void GradPrewittY(const ImgGray& img, ImgFloat* out);
  void GradPrewitt (const ImgGray& img, ImgFloat* gradx, ImgFloat* grady);
  void Smooth  (const ImgFloat& img, float sigma, int kernel_length, ImgFloat* img_smoothed);
  void Smooth  (const ImgFloat& img, float sigma, ImgFloat* img_smoothed);
  void Gradient(const ImgFloat& img, float sigma, int kernel_length, ImgFloat* gradx, ImgFloat* grady);
  void Gradient(const ImgFloat& img, float sigma, ImgFloat* gradx, ImgFloat* grady);

**** Mention borders. Mention efficiency. Add functions to use MMX routines we already have implemented. Gradient uses the derivative of a Gaussian.

Connected components

  void ConnectedComponents4(const ImgBgr   & img, ImgInt* labels);
  void ConnectedComponents4(const ImgBinary& img, ImgInt* labels);
  void ConnectedComponents4(const ImgGray  & img, ImgInt* labels);
  void ConnectedComponents4(const ImgInt   & img, ImgInt* labels);

Computes the connected components of an image using 4 neighbors. An optional third argument props, of type std::vector< ConnectedComponentProperties<ImgX::Pixel> >*, causes properties of the regions to be computed as well. Example:

  ImgGray img;
  ImgInt labels;
  std::vector< ConnectedComponentProperties<ImgGray::Pixel> > reg;  		// pixel type must match input image type
  ConnectedComponents4(img, &labels, &(reg));
Note: ConnectedComponents8 performs the operation using 8 neighbors.
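
To illustrate what a 4-neighbor labeling produces, the sketch below assigns the same label to pixels that have equal values and are connected through horizontal or vertical neighbors. LabelComponents4 is a hypothetical stand-in based on breadth-first search; the library's algorithm and its label numbering are not specified here, and numbering labels from 0 in scan order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <utility>
#include <vector>

// Labels 4-connected regions of equal value in a w x h row-major buffer.
// Returns the number of regions. 'labels' must not alias 'img'.
int LabelComponents4(const std::vector<int>& img, int w, int h, std::vector<int>* labels) {
  labels->assign(img.size(), -1);  // -1 marks "not yet labeled"
  const int dx[4] = {1, -1, 0, 0};
  const int dy[4] = {0, 0, 1, -1};
  int next = 0;
  for (int sy = 0; sy < h; ++sy) {
    for (int sx = 0; sx < w; ++sx) {
      if ((*labels)[static_cast<std::size_t>(sy) * w + sx] != -1) continue;
      const int value = img[static_cast<std::size_t>(sy) * w + sx];
      std::queue<std::pair<int, int> > q;
      q.push(std::make_pair(sx, sy));
      (*labels)[static_cast<std::size_t>(sy) * w + sx] = next;
      while (!q.empty()) {
        const std::pair<int, int> p = q.front();
        q.pop();
        for (int k = 0; k < 4; ++k) {
          const int nx = p.first + dx[k], ny = p.second + dy[k];
          if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
          const std::size_t n = static_cast<std::size_t>(ny) * w + nx;
          if ((*labels)[n] == -1 && img[n] == value) {
            (*labels)[n] = next;
            q.push(std::make_pair(nx, ny));
          }
        }
      }
      ++next;
    }
  }
  return next;
}
```

Pixels that touch only diagonally end up in different regions here; with 8 neighbors they would be merged.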

Floodfill

  void FloodFill4(const ImgX& img, int seed_x, int seed_y, ImgX::Pixel new_color, ImgX* out);
  void FloodFill8(const ImgX& img, int seed_x, int seed_y, ImgX::Pixel new_color, ImgX* out);

Performs floodfill on an image. All pixels that are connected to the pixel img(seed_x, seed_y) through pixels of the same color as it in img (using 4 or 8 neighbors, respectively) are colored new_color in the output image out. If out is already the same size as img, then only the flooded pixels are touched; otherwise it is automatically resized first.

Inplace is supported.
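
The behavior described above, including the fact that only flooded pixels of out are touched and that the input may also serve as the output, can be sketched as follows. FloodFill4Sketch is a hypothetical stand-in on a row-major buffer and uses an explicit stack; the library's actual traversal order is not specified here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// 4-neighbor flood fill on a w x h row-major buffer. Only flooded pixels of
// 'out' are written; 'out' is resized only if its size differs from 'img'.
void FloodFill4Sketch(const std::vector<int>& img, int w, int h, int seed_x, int seed_y,
                      int new_color, std::vector<int>* out) {
  if (out->size() != img.size()) out->resize(img.size());
  const std::size_t seed = static_cast<std::size_t>(seed_y) * w + seed_x;
  const int old_color = img[seed];  // saved first, so in-place calls still work
  std::vector<bool> visited(img.size(), false);
  std::vector<std::size_t> stack(1, seed);
  visited[seed] = true;
  const int dx[4] = {1, -1, 0, 0};
  const int dy[4] = {0, 0, 1, -1};
  while (!stack.empty()) {
    const std::size_t i = stack.back();
    stack.pop_back();
    (*out)[i] = new_color;
    const int x = static_cast<int>(i % w), y = static_cast<int>(i / w);
    for (int k = 0; k < 4; ++k) {
      const int nx = x + dx[k], ny = y + dy[k];
      if (nx < 0 || nx >= w || ny < 0 || ny >= h) continue;
      const std::size_t n = static_cast<std::size_t>(ny) * w + nx;
      // Unvisited pixels have not been written yet, so img[n] is still original.
      if (!visited[n] && img[n] == old_color) {
        visited[n] = true;
        stack.push_back(n);
      }
    }
  }
}
```

The visited mask is what makes the in-place call safe: a pixel is compared against the original color only before it has been overwritten.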

Chamfer

  void Chamfer(const ImgGray& img, ImgInt* chamfer_dist);

Computes the Chamfer distance of a binary image (i.e., Manhattan distances to the non-zero pixels in the input image).
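
Manhattan distances to the nearest non-zero pixel can be computed with the classic two-pass scheme: a forward scan propagates distances from the left and from above, and a backward scan from the right and from below. The sketch below shows this scheme on a row-major buffer. ChamferSketch is a hypothetical stand-in; whether the library uses this algorithm is an assumption, as is the value reported when the image has no non-zero pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two-pass Manhattan distance transform of a w x h row-major buffer.
// If 'img' has no non-zero pixel, every distance stays at w + h.
void ChamferSketch(const std::vector<unsigned char>& img, int w, int h,
                   std::vector<int>* dist) {
  const int kInf = w + h;  // larger than any Manhattan distance inside the image
  dist->assign(img.size(), kInf);
  // Forward pass: propagate from the left and upper neighbors.
  for (int y = 0; y < h; ++y) {
    for (int x = 0; x < w; ++x) {
      const std::size_t i = static_cast<std::size_t>(y) * w + x;
      int d = (img[i] != 0) ? 0 : kInf;
      if (x > 0 && (*dist)[i - 1] + 1 < d) d = (*dist)[i - 1] + 1;
      if (y > 0 && (*dist)[i - w] + 1 < d) d = (*dist)[i - w] + 1;
      (*dist)[i] = d;
    }
  }
  // Backward pass: propagate from the right and lower neighbors.
  for (int y = h - 1; y >= 0; --y) {
    for (int x = w - 1; x >= 0; --x) {
      const std::size_t i = static_cast<std::size_t>(y) * w + x;
      int d = (*dist)[i];
      if (x < w - 1 && (*dist)[i + 1] + 1 < d) d = (*dist)[i + 1] + 1;
      if (y < h - 1 && (*dist)[i + w] + 1 < d) d = (*dist)[i + w] + 1;
      (*dist)[i] = d;
    }
  }
}
```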


Computer Vision

FaceDetect

  class FaceDetector
Methods:
  void DetectFrontalFaces(const ImgBgr& img, std::vector* out)
  void DetectLeftProfileFaces(const ImgBgr& img, std::vector* out)
  void DetectRightProfileFaces(const ImgBgr& img, std::vector* out)
  void DetectAllFaces(const ImgBgr& img, std::vector* frontal, std::vector* left, std::vector* right)
Detects faces in an image using the implementation in the OpenCV library. Uses hardcoded .xml files for frontal and profile face detection. These files must be either in the directory of the executable or the Blepo directory of the code used in compiling the executable.

Canny edge detection

  void Canny(const ImgGray& img, ImgBinary* out, float sigma, float perc, float ratio);

Performs Canny edge detection, described in the following paper:

Mean shift color segmentation

  void MeanShiftSegmentation(const ImgBgr& img, ImgBgr* out);
  void MeanShiftSegmentation(const ImgBgr& img, ImgBgr* out, const MeanShiftSegmentationParams& params);

Performs mean shift color segmentation, described in the following paper:

Parameters may be passed to the algorithm by passing a struct as an optional third argument:

  struct MeanShiftSegmentationParams:
    int sigma_spatial       spatial radius of the mean shift window (default: 7)
    float sigma_color       range radius of the mean shift window (default: 6.5)
    int minregion           minimum density of a region; regions smaller than this will be pruned (default: 20)
    int speedup             speedup level (0: slowest, 1: medium, 2: fastest) (default: 2)

Graph-based color segmentation

  void FHGraphSegmentation(const ImgBgr& img, float sigma, float c, int min_size, ImgBgr *out);

Performs graph-based color segmentation, described in the following paper:

Watershed segmentation

  void WatershedSegmentation(const ImgGray& img, ImgInt* labels, bool marker_based);

Performs watershed segmentation, described in the following paper:

This implementation is a slightly simplified variation of the original algorithm.

You should pass in a gradient magnitude image as img. If marker_based is true, then new catchment basins will only be declared in regions where img is 0. If false, then new catchment basins will be declared in all local minima (likely leading to oversegmentation). To use marker-based, simply set img to zero wherever you would like to set a marker.

Elliptical head tracking

  class EllipticalHeadTracker

Tracks a person's head using an elliptical model. The algorithm is described in the following paper:

Example:

  ImgBgr img;
  EllipticalHeadTracker::EllipseState state(58, 47, 21);
  EllipticalHeadTracker eht;
  eht.Init(img, state);
  while (1)
  {
    img.Load( ... );
    state = eht.Track(img);
  }

Camera calibration

  void CalibrateCamera(const std::vector< CalibrationPointArr >& pts, const Size& img_size, CalibrationParams* out);

Given correspondence between a number of world and image coordinates, in a number of images, computes the internal calibration parameters of the camera. Uses the implementation in the OpenCV library. pts is an array of correspondences for each image. The data structures are as follows:

  struct CalibrationPoint
  {
    double x, y, z;  // world coordinates
    double u, v;     // image coordinates
  };
  struct CalibrationParams
  {
    MatDbl intrinsic_matrix;            // intrinsic matrix
    Array< double > distortion_coeffs;  // radial and tangential lens distortion
  };
  typedef Array< CalibrationPoint > CalibrationPointArr;

The following helper functions make it easier to establish the world-to-image correspondences:

  typedef Array< CvPoint2D32f > Cvptarr;
  bool FindChessboardCorners(const ImgGray& img, const Size& grid_dims, Cvptarr* pts);
  void DrawChessboardCorners(ImgBgr* img, const Size& grid_dims, bool all_found, const Cvptarr& pts);
  void TransformChessboardPoints(const Cvptarr& pts, const Size& grid_dims, CalibrationPointArr* cpts);
To use these functions, print a calibration target consisting of a chessboard pattern (such as calibration_target.pdf in the images directory). Capture several images of the chessboard at various angles, and run FindChessboardCorners on each. Whenever all the corners of the grid are found, the function returns true; convert the corners with TransformChessboardPoints and save the result in a vector. When the corners in several images have been found, call CalibrateCamera.

Example (assuming an array of images):

  const Size grid_dims(8, 6);  // number of corners in chessboard pattern (horiz. and vert.)
  Cvptarr pts;
  CalibrationPointArr cpts;
  std::vector< CalibrationPointArr > all_pts;
  CalibrationParams params;
  for (int i = 0 ; i < num_images ; i++)
  {
    if ( FindChessboardCorners(images[i], grid_dims, &pts) )
    {
      TransformChessboardPoints(pts, grid_dims, &cpts);
      all_pts.push_back( cpts );
    }
  }
  CalibrateCamera(all_pts, Size(img.Width(), img.Height()), &params);


Matrix operations

Initialization

  void Eye(int dim, MatDbl* mat)                   // identity matrix
  void Eye(int dim, MatFlt* mat)                   
  void Rand(int width, int height, MatDbl* out)    // drawn from uniform random distribution in the range [0,1)
  void Rand(int width, int height, MatFlt* out)    

Resets a matrix to the desired size and initializes its values.

Diag

  void Diag(const MatDbl& mat, MatDbl* out)
  void Diag(const MatFlt& mat, MatFlt* out)   *** need to implement

This function serves two purposes:

Set

  void Set(MatDbl* out, double val)
  void Set(MatFlt* out, double val)

Sets all the elements of a matrix to a constant value.

Convert

  void Convert(const MatDbl& src, MatFlt* dst)
  void Convert(const MatFlt& src, MatDbl* dst)

Converts a double-precision to a single-precision matrix, or vice versa.

IsSameSize

  bool IsSameSize(const MatDbl& src1, const MatDbl& src2)  
  bool IsSameSize(const MatFlt& src1, const MatFlt& src2)  

Returns whether the dimensions of the matrices are the same.

Similar

  bool Similar(const MatDbl& src1, const MatDbl& src2, double tolerance)
  bool Similar(const MatFlt& src1, const MatFlt& src2, float tolerance)  *** need to implement

Returns whether all the elements of the two matrices are within tolerance. The two matrices must be of the same size.
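
An element-wise tolerance test of this kind can be sketched as follows. SimilarSketch is a hypothetical stand-in on flat buffers; treating a difference exactly equal to the tolerance as similar is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if every pair of corresponding elements differs by at most
// 'tolerance'. The two buffers must have the same size.
bool SimilarSketch(const std::vector<double>& a, const std::vector<double>& b,
                   double tolerance) {
  assert(a.size() == b.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (std::fabs(a[i] - b[i]) > tolerance) return false;
  }
  return true;
}
```

Such a test is typically used to check results subject to round-off error, for example comparing a matrix with its reconstruction from a decomposition.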

Arithmetic operations

  void Add(const MatDbl& src1, const MatDbl& src2, MatDbl* dst)
  void Add(const MatDbl& src, double val, MatDbl* dst)
  void Subtract(const MatDbl& src1, const MatDbl& src2, MatDbl* dst)
  void Subtract(const MatDbl& src, double val, MatDbl* dst)
  void MatrixMultiply(const MatDbl& src1, const MatDbl& src2, MatDbl* dst)    // matrix multiplication
  void Multiply(const MatDbl& src, double val, MatDbl* dst)
  void MultiplyElements(const MatDbl& src1, const MatDbl& src2, MatDbl* dst)  // element-wise multiplication
  void Negate(const MatDbl& src, MatDbl* dst)                                 // multiply by -1

Add, subtract, or multiply either two matrices, or a matrix and a constant value.

Notes:

Sum

  double Sum(const MatDbl& src)
  float Sum(const MatFlt& src)

Returns the sum of the elements in a matrix.

Transpose

  void Transpose(const MatDbl& src, MatDbl* dst)
  void Transpose(const MatFlt& src, MatFlt* dst)

Computes the transpose of a matrix. "Inplace" operation is supported but is efficient only for vectors or square matrices; non-square matrices require memory allocation / deallocation of a temporary matrix, along with an extra copy. ** Need to implement inplace.

Convenient but inefficient functions

  MatDbl Diag(const MatDbl& mat)
  MatDbl Transpose(const MatDbl& mat)
  MatDbl operator+(const MatDbl& src1, const MatDbl& src2)
  MatFlt operator+(const MatFlt& src1, const MatFlt& src2)
  MatDbl operator*(const MatDbl& src1, const MatDbl& src2)
  MatFlt operator*(const MatFlt& src1, const MatFlt& src2)
  MatDbl operator-(const MatDbl& src1, const MatDbl& src2)
  MatDbl operator-(const MatDbl& src)
  MatDbl Eye(int dim)
  MatDbl Inverse(const MatDbl& mat)

To assist rapid prototyping, these functions enable matrix operations to be combined with a more natural syntax, e.g., b = a * c + Transpose(d) - Inverse(e). Due to their use of temporary matrices and "return by value", they should be avoided when efficiency is a concern.


Linear Algebra

Norm

  double Norm(const MatDbl& src);

Returns the L2 (Euclidean) norm of a vector. *** Only defined for vectors (can we define it for matrices, too?)

Determinant

  double Determinant(const MatDbl& mat);

Returns the determinant of a square matrix.

Inverse

  void Inverse(const MatDbl& mat, MatDbl* out);

Computes the inverse of a square matrix.

Eigenvalues and eigenvectors

  void EigenSymm(const MatDbl& mat, MatDbl* eigenvalues);
  void EigenSymm(const MatDbl& mat, MatDbl* eigenvalues, MatDbl* eigenvectors);

Computes the eigenvalues (and eigenvectors) of a square, symmetric matrix. The upper triangular elements of the input matrix are ignored. The eigenvalues are stored in a vector, while the eigenvectors are stored as columns of a matrix.

SolveLinear

  void SolveLinear(const MatDbl& a, const MatDbl& b, MatDbl* x);

Solves the linear equation Ax = b, where A is a matrix and x and b are vectors. If the system of equations is overdetermined (the number of rows in A is greater than the number of columns), then the least squares solution is produced.
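
To make the overdetermined case concrete, the sketch below computes a least squares solution for a system with two unknowns by solving the normal equations (A^T A) x = A^T b in closed form. SolveLeastSquares2 is a hypothetical helper written for this example; the normal equations are used here only for brevity, and the numerical method used by SolveLinear is not specified in this manual.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least squares solution of A x = b, where A has two columns (a0 and a1).
// Solves the 2 x 2 normal equations directly. Returns false if A^T A is
// (numerically) singular.
bool SolveLeastSquares2(const std::vector<double>& a0, const std::vector<double>& a1,
                        const std::vector<double>& b, double* x0, double* x1) {
  double s00 = 0, s01 = 0, s11 = 0, t0 = 0, t1 = 0;
  for (std::size_t i = 0; i < b.size(); ++i) {
    s00 += a0[i] * a0[i];  // entries of A^T A
    s01 += a0[i] * a1[i];
    s11 += a1[i] * a1[i];
    t0 += a0[i] * b[i];    // entries of A^T b
    t1 += a1[i] * b[i];
  }
  const double det = s00 * s11 - s01 * s01;
  if (std::fabs(det) < 1e-12) return false;
  *x0 = (s11 * t0 - s01 * t1) / det;
  *x1 = (s00 * t1 - s01 * t0) / det;
  return true;
}
```

Fitting a line y = m*x + c to the points (0,0), (1,1), (2,1) corresponds to a0 = {0, 1, 2}, a1 = {1, 1, 1}, b = {0, 1, 1}: three equations in two unknowns, with no exact solution.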

Singular value decomposition

  void Svd(const MatDbl& mat, MatDbl* u, MatDbl* s, MatDbl* v);

Computes the singular value decomposition (SVD) of a matrix. After calling this function, the input matrix can be reconstructed by u * Diag(s) * Transpose(v). The matrices u and v contain the left and right singular vectors, respectively, in their columns. The vector s contains the singular values. If the input matrix has n columns and m rows, then the output dimensions are as follows:

LU decomposition

  void Lu(const MatDbl& mat, MatDbl* el, MatDbl* u, MatDbl* p);

Computes the LU decomposition of a matrix. After calling this function, el * u should equal (subject to round-off error) p times the input matrix. el is lower triangular, u is upper triangular, and p is a permutation matrix.

QR factorization

  void Qr(const MatDbl& mat, MatDbl* q, MatDbl* r);

Computes the QR factorization of a matrix. After calling this function, the input matrix can be reconstructed by q * r.


Figure

Figure

The Figure class is used to display an image in a window on the screen.

Members

    Constructor / destructor / etc.  
      Figure()                                                  create a figure with automatic title and placement
      Figure(const char* title, int x, int y, bool permanent)   create a figure manually:
                                                                   title:  string that appears in the title bar
                                                                   x, y:   initial screen coordinates of the window
                                                                   permanent:  if false, then window is destroyed when 
                                                                               destructor is called (default: true)
      ~Figure()
    Drawing  
      void Draw(const ImgX& img)                                display an image in the window
    Mouse input
      void GrabMouseClicks(int n, Array<Point>* points)         wait for user to click 'n' times with the mouse;
                                                                   return points in image coordinates
      Point GrabMouseClick()                                    wait for user to click once
      Rect GrabRect()                                           wait for user to click-and-drag, specifying a rectangle
      bool TestMouseClick(CPoint* pt = NULL)                    grabs mouse click without waiting; returns true if clicked
    Window functions
      void SetPosition(int x, int y)                            set position of window in screen coordinates
      void SetSize(int width, int height)                       set size of window
      Point GetPosition() const                                 get position
      Size GetSize() const                                      get size

Notes:


Capture

CaptureDirectShow

The CaptureDirectShow class is used to capture images from a camera with a DirectShow driver. It has been tested with the following devices:

Members

    List devices  
      // Fills the vector with the human-readable names of all available devices.
      // Use the index of the vector for selecting an input camera in BuildGraph.
      static void GetVideoInputDevices(std::vector* friendly_names)

    Constructor / destructor / etc.  
      CaptureDirectShow()           initializes DirectShow
      ~CaptureDirectShow()          automatically destroys the graph and uninitializes DirectShow

    Build / tear down DirectShow capture graph  
      void BuildGraph(int width, int height, int camera_index = 0)
      void BuildGraph(int camera_index = 0)
      void TearDownGraph() 

    Graph control  
      void Start()                  start capturing images at the default frame rate
      void Stop()                   stop capturing
      void Pause()                  pause capturing

    Query graph state  
      bool IsRunning() const        returns true if capturing images (i.e., Start was called)
      bool IsPaused() const         returns true if paused

    Get image  
      // Get the latest frame that has been grabbed. 'timeout' is in 
      // milliseconds (-1 for infinite, 0 for immediate return).
      // Returns true if a new frame is available since the last time this function
      // was called, false otherwise.
      bool GetLatestImage(ImgBgr* out, int timeout = 0)

To use this class, first instantiate the object and call BuildGraph. Then call Start to begin grabbing frames continuously at the default frame rate. Once the graph is running, call GetLatestImage repeatedly to access the latest image frame. There is no need to stop the capture or tear down the graph; the destructor takes care of both automatically.

Example:

  ImgBgr img;
  CaptureDirectShow cap;
  cap.BuildGraph();             // use default camera
  cap.Start();
  while (1)
  {
    bool new_image = cap.GetLatestImage(&img);
    if (new_image)
    {
      // process image
    }
  }

CaptureIEEE1394

The CaptureIEEE1394 class is used to capture images from an IEEE 1394 (Firewire) camera. It uses the CMU 1394 Digital Camera Driver and should therefore work with any OHCI-compliant camera but NOT with DV cameras.

Note: Before using this class, you must first set the driver to the CMU driver. See 'Getting Started'.

Members

    Constructor / destructor / etc.  
      CaptureIEEE1394()           
      ~CaptureIEEE1394()          

    Camera selection  
      void SelectCamera(int index)  select a camera

    Control  
      void Start()                  start capturing images at the default frame rate
      void Stop()                   stop capturing
      void Pause()                  pause capturing

    Get image  
      // Get the latest frame that has been grabbed. 'timeout' is in 
      // milliseconds (-1 for infinite, 0 for immediate return).
      // Returns true if a new frame is available since the last time this function
      // was called, false otherwise.
      bool GetLatestImage(ImgBgr* out, int timeout = 0)

To use this class, first instantiate the object, then call Start to begin grabbing frames continuously at the default frame rate. Once the capture is running, call GetLatestImage repeatedly to access the latest image frame. There is no need to stop the capture or disconnect the camera; the destructor takes care of both automatically.
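
The call sequence described above can be sketched as a small helper that works with any capture object exposing the Start, GetLatestImage, and Stop members listed above. The names GrabFrames, FakeCapture, and FakeImage, as well as the max_misses cutoff, are hypothetical and introduced only for this illustration; they are not part of Blepo. The fake classes exist solely so that the sketch can be tried without a camera attached.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: starts the capture, hands up to 'max_frames' new
// frames to 'process', and stops. Gives up early after 'max_misses'
// consecutive calls that yield no new frame. Returns the frames processed.
template <class Capture, class Image, class Process>
int GrabFrames(Capture* cap, Image* img, int max_frames, int timeout_ms,
               int max_misses, Process process)
{
  assert(cap != NULL && img != NULL);
  int grabbed = 0;
  int misses = 0;
  cap->Start();
  while (grabbed < max_frames && misses < max_misses)
  {
    if (cap->GetLatestImage(img, timeout_ms))
    {
      process(*img);   // a new frame arrived since the previous call
      ++grabbed;
      misses = 0;
    }
    else
    {
      ++misses;        // no new frame within the timeout
    }
  }
  cap->Stop();
  return grabbed;
}

// Stand-ins for trying the sketch without hardware; not Blepo classes.
struct FakeImage
{
  int id;
  FakeImage() : id(0) {}
};

struct FakeCapture
{
  int calls;
  bool running;
  FakeCapture() : calls(0), running(false) {}
  void Start() { running = true; }
  void Stop() { running = false; }
  // Pretends that a new frame is ready on every second call.
  bool GetLatestImage(FakeImage* out, int /*timeout*/ = 0)
  {
    ++calls;
    if (!running || calls % 2 != 0) return false;
    out->id = calls / 2;
    return true;
  }
};
```

With the real classes, the call would presumably take a CaptureIEEE1394 object and an ImgBgr in place of the fakes, with 'process' being any callable that accepts a const ImgBgr&. The same helper should also fit CaptureDirectShow, provided BuildGraph has been called first.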

*** This class is functional for basic image capture, but it needs some work. Specifically, the constructor should not connect to the camera (that should be done by a separate Init function).