Performing Computer Vision Task With OpenCV And Python

Updated in February 2024.

This article was published as a part of the Data Science Blogathon


Data is often defined as raw facts. Information is data that has been processed and therefore holds more value than the raw facts themselves. The key difference is that decisions and actions can be based on information; they should not be based on raw data alone.

Source: StudiousGuy


Following our previous article on Computer Vision, we will now explore the field further in Python, using the OpenCV package. This article shows how to perform a few more of the many operations OpenCV offers. In the previous article we examined the block of code below; here we look at a few more aspects of it.

Python Code:

Source: Medium

Understanding CV Basics

As you may have seen in our previous article, we read (loaded) the OpenCV logo image into memory using the OpenCV library’s built-in function, imread(). We passed in two arguments: a filename and a flag. The filename specifies the name and location of the file on your computer, while the flag sets the color mode in which the image is read. Recall that we read the image in GRAYSCALE.

Now, the most crucial aspect to understanding OpenCV:

Images are data. When you use the OpenCV imread() method, the raw image data is converted into another datatype, one that readers of Analytics Vidhya will be very familiar with: a NumPy array of integers. Each element in the array represents pixel intensity; a pixel may be described by a single value or by several values (one per color channel). Since we loaded our image in GRAYSCALE, each pixel is represented by a single value ranging from zero (0) to two hundred and fifty-five (255). As the value moves from 0 towards 255, the intensity of the pixel increases and it appears brighter.

0 = The Color Black.

255 = The Color White.

Essentially, what I am trying to convey is shown below (I have omitted the source, as it is the same as in the previous article, and to keep the flow of information):

We started with the image below:

Next, we returned the image in GRAYSCALE and obtained the image in a new format as below:

Now, we shall print the contents of the variable which is storing our GRAYSCALE image:

# variable image stored our GRAYSCALE image
print(image)

We receive the following output from the above code:

Now, we shall print the type of the variable image:

print(type(image))

The output will be seen as follows:

Fundamentally, OpenCV has transformed our image into a NumPy array, in which there are values from 0 to 255 representing pixel intensity, that correspond to the colors we see in the GRAYSCALE image. Remember GRAYSCALE images will always return an array in which each pixel has a single value that ranges from 0 (Black) to 255 (White).
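To make the idea concrete without needing an image file, here is a minimal sketch that builds a tiny array by hand. The array below is a hypothetical stand-in for what imread() returns for a grayscale image; the values are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for a grayscale image returned by cv2.imread():
# 2 rows x 3 columns of unsigned 8-bit integers.
image = np.array([[0, 128, 255],
                  [64, 192, 32]], dtype=np.uint8)

print(type(image))               # the container is a NumPy ndarray
print(image.dtype)               # each pixel is an unsigned 8-bit integer
print(image.min(), image.max())  # darkest and brightest pixel values
```

Because each pixel is an unsigned 8-bit integer, no value can fall outside the 0 to 255 range.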

Returning The Shape of The Array.

Let us print the shape of the NumPy array to the console.

print(image.shape)

The output will be seen as follows:

Our array has 600 rows and 487 columns. In image terminology, one would say that the image has the dimensions 600 pixels (height), by 487 pixels (width).
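The row-then-column order is easy to mix up, so here is a small sketch using placeholder arrays of the same size (all zeros, assumed purely for illustration). It also shows that an image read in color carries a third axis for its channels.

```python
import numpy as np

# Placeholder arrays with the same dimensions as the image discussed above.
gray = np.zeros((600, 487), dtype=np.uint8)      # grayscale: rows x columns
color = np.zeros((600, 487, 3), dtype=np.uint8)  # color: rows x columns x channels

height, width = gray.shape       # shape is (rows, columns) = (height, width)
channels = color.shape[2]        # a color image has 3 channels

print(height, width, channels)
```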

Printing The Image Using Pixels

Since OpenCV has transformed our image pixels into a NumPy array with integers, we may perform NumPy operations on the array containing image pixel values, and manipulate the array.

Our array is 2-Dimensional. This means it has rows and columns. Let us perform a bit of indexing and slicing on the array, and return the contents.

cv2.imshow("AV", image[0:100])
cv2.waitKey()
cv2.destroyAllWindows()

If one is familiar with the technique of indexing and slicing, one will be able to see that we are attempting to slice a portion of our image (NumPy) array. Again, it is crucial to understand and be conscious of the fact that OpenCV Library in Python Programming Language represents its images and associated objects as NumPy nd-Arrays.

Source: Indian AI Production.

Code Explanation.

Line-by-line explanations for the above block of code are as follows:

cv2.imshow("AV", image[0:100])

The imshow() method displays an image on screen in a GUI window. Here we passed in a name for the GUI window and only a portion of the pixel array, obtained by slicing. Specifically, we return the first 100 rows (height) of the image.

cv2.waitKey()

Called without an argument, this waits indefinitely for a key press while the GUI window is open. You may pass an integer argument giving the duration (in milliseconds) to wait for a key press before the program continues.

cv2.destroyAllWindows()

The above line of code closes all open OpenCV GUI windows. To close one specific window instead, call cv2.destroyWindow() with the name of that window as a string.

The output will be seen as follows:

Thus, we have successfully returned the first 100 rows of pixels from our image. Feel free to experiment with the image, and look at whether the pixel at a particular position, matches that of the color found on the image itself.
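If you would like to experiment further, the sketch below shows a few other slices on a placeholder array of the same size (the pixel values are assumed, not taken from the real image). The same expressions work on the array returned by imread().

```python
import numpy as np

# Placeholder image: top 100 rows white, the rest black.
image = np.zeros((600, 487), dtype=np.uint8)
image[:100] = 255

top = image[0:100]               # first 100 rows, all columns
left = image[:, 0:50]            # all rows, first 50 columns
patch = image[100:200, 50:150]   # a 100 x 100 region
pixel = image[10, 20]            # a single pixel: row 10, column 20

print(top.shape, left.shape, patch.shape, pixel)
```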

This concludes my article on Computer Vision With Python. I do hope that you enjoyed reading through this article, and have learned a new concept.

Please feel free to connect with me on LinkedIn.

Thank you for your time.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.



Computer Vision Using Opencv – With Practical Examples

This article was published as a part of the Data Science Blogathon

Hello Readers!!

OpenCV is a very well-known library for computer vision and image processing tasks. It is one of the most widely used open-source Python libraries for computer vision and image data.

It is used in various tasks such as image denoising, image thresholding, edge detection, corner detection, contours, image pyramids, image segmentation, face detection and many more.


Import all the required libraries using the below commands:

import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
%matplotlib inline

RGB IMAGE AND RESIZING

An RGB image where RGB indicates Red, Green, and Blue respectively can be considered as three images stacked on top of each other. It also has a nickname called ‘True Color Image’ as it represents a real-life image as close as possible and is based on human perception of colours.

The RGB colour model is used to display images on cameras, televisions, and computers.

Resizing all images to a particular height and width ensures uniformity and makes processing easier, since images naturally come in different sizes.

If the size is reduced, processing is faster but information in the image may be lost. If the size is increased, the image may appear fuzzy or pixelated; the additional pixel values are usually filled in using interpolation.
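To illustrate how resizing fills or drops pixels, here is a minimal NumPy sketch of nearest-neighbour interpolation, the simplest scheme. The function resize_nearest is a hypothetical helper written for this illustration; it is not how cv2.resize is implemented (cv2.resize uses bilinear interpolation by default).

```python
import numpy as np

def resize_nearest(img, new_h, new_w):
    """Resize by copying, for each output pixel, the nearest source pixel."""
    h, w = img.shape[:2]
    rows = np.arange(new_h) * h // new_h   # source row for each output row
    cols = np.arange(new_w) * w // new_w   # source column for each output column
    return img[rows[:, None], cols]

img = np.arange(16, dtype=np.uint8).reshape(4, 4)
small = resize_nearest(img, 2, 2)   # shrinking discards pixels
big = resize_nearest(img, 8, 8)     # enlarging repeats pixels (blocky look)

print(small)
print(big.shape)
```

Shrinking keeps only every second pixel here, which is the information loss mentioned above; enlarging repeats pixels, which is why upscaled images can look blocky.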

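# `paths` is assumed to be a list of image file paths; it is not defined in this article.
height = 224
width = 224
font_size = 20

plt.figure(figsize=(15, 8))
for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, cv2.IMREAD_COLOR)
    # cv2.resize expects the target size as (width, height)
    resized_img = cv2.resize(img, (width, height))
    plt.subplot(1, 2, i + 1).set_title(name[:-4], fontsize=font_size)
    plt.axis('off')
    # OpenCV loads images as BGR, while Matplotlib expects RGB
    plt.imshow(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB))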


Grayscale images consist only of shades of grey. Each pixel carries only intensity (luminosity) information. Black is the weakest intensity and white is the strongest.

Grayscale images are efficient as they are simpler and faster than colour images during image processing.
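As a rough illustration of how a colour pixel collapses to a single intensity, the sketch below applies the standard luminance weights (0.299 R + 0.587 G + 0.114 B) with NumPy. The helper bgr_to_gray is hypothetical and written for this example; in practice you would call cv2.cvtColor or read the file with flag 0.

```python
import numpy as np

def bgr_to_gray(img_bgr):
    """Weighted sum of the channels; channel order is B, G, R as in OpenCV."""
    b = img_bgr[..., 0].astype(np.float64)
    g = img_bgr[..., 1].astype(np.float64)
    r = img_bgr[..., 2].astype(np.float64)
    gray = 0.114 * b + 0.587 * g + 0.299 * r
    return np.round(gray).astype(np.uint8)

# One row of three pixels: white, black, pure red (in BGR order).
pixels = np.array([[[255, 255, 255], [0, 0, 0], [0, 0, 255]]], dtype=np.uint8)
gray = bgr_to_gray(pixels)
print(gray)
```

Three values per pixel become one, which is where the saving in memory and computation comes from.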

plt.figure(figsize=(15, 8))
for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, 0)  # flag 0 reads the image as grayscale
    resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
    plt.subplot(1, 2, i + 1).set_title(f'Grayscale {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    plt.imshow(resized_img, cmap='gray')

IMAGE DENOISING

for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, cv2.IMREAD_COLOR)
    resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
    # Median filter with a 5x5 aperture
    denoised_img = cv2.medianBlur(resized_img, 5)
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1).set_title(f'Original {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    plt.imshow(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB))
    plt.subplot(1, 2, 2).set_title(f'After Median Filtering of {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    plt.imshow(cv2.cvtColor(denoised_img, cv2.COLOR_BGR2RGB))

IMAGE THRESHOLDING

Image Thresholding works as its name suggests. If a pixel value is above a certain threshold, it is assigned one particular value; if it is below the threshold, it is assigned another.

Adaptive Thresholding does not use a single global threshold. Instead, a threshold is computed for each small region of the image. Different parts of the image therefore get different thresholds, which gives better results when illumination is uneven. OpenCV offers two adaptive methods: cv2.ADAPTIVE_THRESH_MEAN_C and cv2.ADAPTIVE_THRESH_GAUSSIAN_C.
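The difference between a global and a local threshold can be sketched in plain NumPy. The two functions below are simplified, hypothetical helpers (a mean-based local threshold with edge padding), not OpenCV's implementation, and the toy image is made up: a dark half and a bright half, each containing one darker dot.

```python
import numpy as np

def global_threshold(gray, thresh=127, max_value=255):
    """One threshold for the whole image."""
    return np.where(gray > thresh, max_value, 0).astype(np.uint8)

def adaptive_mean_threshold(gray, block_size=3, c=2, max_value=255):
    """Threshold each pixel against the mean of its neighbourhood minus c."""
    pad = block_size // 2
    padded = np.pad(gray.astype(np.float64), pad, mode="edge")
    out = np.zeros(gray.shape, dtype=np.uint8)
    for y in range(gray.shape[0]):
        for x in range(gray.shape[1]):
            local_mean = padded[y:y + block_size, x:x + block_size].mean()
            if gray[y, x] > local_mean - c:
                out[y, x] = max_value
    return out

# Toy image with uneven illumination: dark left half, bright right half.
gray = np.full((5, 8), 50, dtype=np.uint8)
gray[:, 4:] = 200
gray[2, 1] = 20    # dot in the dark half
gray[2, 6] = 150   # dot in the bright half

g = global_threshold(gray)
a = adaptive_mean_threshold(gray)
print(g)
print(a)
```

With the global threshold the whole dark half turns black and the dot in the bright half disappears into white, whereas the local threshold picks out both dots. Note that the local version also marks the dark side of the illumination boundary, since those pixels are darker than their neighbourhood.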

for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, 0)
    resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
    denoised_img = cv2.medianBlur(resized_img, 5)
    th = cv2.adaptiveThreshold(denoised_img, maxValue=255,
                               adaptiveMethod=cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                               thresholdType=cv2.THRESH_BINARY, blockSize=11, C=2)
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1).set_title(f'Grayscale {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    plt.imshow(resized_img, cmap='gray')
    plt.subplot(1, 2, 2).set_title(f'After Adaptive Thresholding of {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    # th has a single channel, so show it with a gray colormap instead of BGR2RGB
    plt.imshow(th, cmap='gray')

IMAGE GRADIENTS

for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, 0)
    resized_img = cv2.resize(img, (width, height))
    laplacian = cv2.Laplacian(resized_img, cv2.CV_64F)
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1).set_title(f'Grayscale {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    plt.imshow(resized_img, cmap='gray')
    plt.subplot(1, 2, 2).set_title(f'After finding Laplacian Derivatives of {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    # The Laplacian is single-channel as well
    plt.imshow(laplacian, cmap='gray')

EDGE DETECTION

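Edge Detection is performed using Canny Edge Detection, which is a multi-stage algorithm. The stages are as follows.

Noise Reduction – Smooth the image using a Gaussian filter.

Find Intensity Gradient – Using the Sobel kernel, find the first derivative in the horizontal (Gx) and vertical (Gy) directions.

Non-maximum Suppression – Keep only the pixels that are local maxima along the gradient direction, which thins the edges.

Hysteresis Thresholding – Use the two thresholds (threshold1 and threshold2) to decide which candidate edges are kept.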
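The gradient stage can be sketched with NumPy alone. The helper correlate3x3 below is hypothetical and written for this illustration (edge-replicated borders are an assumption); it only covers the Sobel step, not the full Canny pipeline.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def correlate3x3(gray, kernel):
    """Slide a 3x3 kernel over the image (borders replicated)."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

# Toy image: black on the left, white on the right (a vertical edge).
gray = np.zeros((5, 6))
gray[:, 3:] = 255

gx = correlate3x3(gray, SOBEL_X)   # horizontal derivative
gy = correlate3x3(gray, SOBEL_Y)   # vertical derivative
magnitude = np.hypot(gx, gy)       # edge strength
direction = np.arctan2(gy, gx)     # edge normal direction in radians

print(gx[2])
```

For a vertical edge only Gx responds; Gy stays zero because nothing changes from row to row.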

for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, 0)
    resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
    edges = cv2.Canny(resized_img, threshold1=100, threshold2=200)
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1).set_title(f'Grayscale {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    plt.imshow(resized_img, cmap='gray')
    plt.subplot(1, 2, 2).set_title(f'After Canny Edge Detection of {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    # edges has a single channel, so show it with a gray colormap instead of BGR2RGB
    plt.imshow(edges, cmap='gray')

FOURIER TRANSFORM ON IMAGE

The Fourier Transform analyzes the frequency characteristics of an image. The Discrete Fourier Transform (DFT) is used to move the image into the frequency domain.

The Fast Fourier Transform (FFT) is an algorithm that calculates the DFT. Frequency is usually higher at edges or wherever noise is present. When the FFT is applied to an image, the zero-frequency (DC) component ends up at the top-left corner of the result. To bring it to the centre, the result is shifted by N/2 in both the horizontal and vertical directions.

Finally, the magnitude spectrum of the result is computed. The Fourier Transform can be helpful in object detection, as objects tend to have distinct magnitude spectra.
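The effect of the shift is easy to check on a toy input. The sketch below uses a constant 8 x 8 image (an assumption chosen because all of its energy sits in the zero-frequency term) and shows where that term lands before and after np.fft.fftshift.

```python
import numpy as np

img = np.full((8, 8), 10.0)        # constant image: only a zero-frequency term

freq = np.fft.fft2(img)            # zero frequency sits at index [0, 0]
shifted = np.fft.fftshift(freq)    # move it to the centre, index [4, 4]
magnitude = np.abs(shifted)

# Adding 1 before the log avoids log(0) for empty frequency bins.
magnitude_spectrum = 20 * np.log(magnitude + 1)

peak = np.unravel_index(np.argmax(magnitude), magnitude.shape)
print(abs(freq[0, 0]), peak)
```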

for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, 0)
    resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
    freq = np.fft.fft2(resized_img)
    freq_shift = np.fft.fftshift(freq)
    magnitude_spectrum = 20 * np.log(np.abs(freq_shift))
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1).set_title(f'Grayscale {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    # resized_img is grayscale (single channel), so no BGR2RGB conversion is needed
    plt.imshow(resized_img, cmap='gray')
    plt.subplot(1, 2, 2).set_title(f'Magnitude Spectrum of {name[:-4]} Image', fontsize=font_size)
    plt.axis('off')
    plt.imshow(magnitude_spectrum, cmap='gray')

LINE TRANSFORM

The Hough Transform can detect any shape that can be expressed in mathematical form, even if the shape is slightly broken or distorted. A line written in Cartesian form as y = mx + c can be written in polar form as rho = x cos θ + y sin θ. rho is the perpendicular distance from the origin to the line, and θ is the angle between the horizontal axis and that perpendicular, measured counter-clockwise (the direction depends on how the coordinate system is drawn).

So, a line is represented by the two terms (rho, θ). A 2D array is created for these two terms, where rho indexes the rows and θ indexes the columns. This array is called the accumulator. In the OpenCV function call, the rho argument is the distance resolution of the accumulator in pixels and the theta argument is its angle resolution in radians.

For every edge point (x, y), rho is computed for each candidate θ, and the accumulator cell for that (rho, θ) pair is incremented. This is repeated for every point on the line, so each point votes for the (rho, θ) cells of the lines that could pass through it.

In this way, the cell with the maximum votes indicates the presence of a line at distance rho from the origin and at angle θ.
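The voting scheme can be sketched in a few lines of NumPy. The function hough_accumulator and its parameters are hypothetical, written only to illustrate the idea; it uses a 1-pixel rho resolution and a 1-degree θ resolution, and it is not the OpenCV implementation.

```python
import numpy as np

def hough_accumulator(points, diag, theta_step_deg=1):
    """Let each (x, y) point vote for every (rho, theta) line through it."""
    thetas = np.deg2rad(np.arange(0, 180, theta_step_deg))
    # rho can be negative, so rows are offset by `diag`: row = rho + diag
    acc = np.zeros((2 * diag + 1, len(thetas)), dtype=np.int32)
    cols = np.arange(len(thetas))
    for x, y in points:
        rhos = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rhos + diag, cols] += 1
    return acc, thetas

# Ten points on the vertical line x = 5.
points = [(5, y) for y in range(0, 100, 10)]
diag = 100  # upper bound on |rho| for these points

acc, thetas = hough_accumulator(points, diag)
rho_idx, theta_idx = np.unravel_index(np.argmax(acc), acc.shape)
best_rho = int(rho_idx) - diag
best_theta_deg = float(np.rad2deg(thetas[theta_idx]))

print(best_rho, best_theta_deg, acc.max())
```

All ten points agree on the cell (rho = 5, θ = 0), which describes a vertical line five pixels from the origin.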

min_line_length = 100
max_line_gap = 10

img = cv2.imread('../input/cv-images/hough-min.png')
resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
img_copy = resized_img.copy()
edges = cv2.Canny(resized_img, threshold1=50, threshold2=150)
# theta is the angle resolution in radians: np.pi / 180 is one degree
lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=100,
                        minLineLength=min_line_length, maxLineGap=max_line_gap)
for line in lines:
    for x1, y1, x2, y2 in line:
        hough_lines_img = cv2.line(resized_img, (x1, y1), (x2, y2), color=(0, 255, 0), thickness=2)

plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1).set_title('Original Image', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2).set_title('After Hough Line Transformation', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(hough_lines_img, cv2.COLOR_BGR2RGB))


Harris Corner finds the difference in intensity for a displacement in all directions to detect a corner.

img = cv2.imread('../input/cv-images/corners-min.jpg')
resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
img_copy = resized_img.copy()
gray = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)
gray = np.float32(gray)  # cornerHarris expects a float32 image
corners = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)
corners = cv2.dilate(corners, None)  # enlarge the corner responses for display
# Mark strong corner responses in red; this step appears to be missing from the
# original listing, and the 0.01 factor is an assumed, commonly used threshold.
resized_img[corners > 0.01 * corners.max()] = [0, 0, 255]

plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1).set_title('Original Image', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2).set_title('After Harris Corner Detection', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB))


Morphological Transformations are usually applied to binary images. They take two inputs: an image and a kernel, which acts as the structuring element. Binary images may contain imperfections such as texture and noise.

These transformations help correct such imperfections by accounting for the form of the image.
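To show what opening (erosion followed by dilation) does to noise, here is a minimal NumPy sketch on a tiny binary image. The helpers erode and dilate are hypothetical and simplified: they use a square kernel of ones and treat pixels outside the image as background, which is an assumption and differs from OpenCV's default border handling.

```python
import numpy as np

def erode(binary, k=3):
    """A pixel stays 1 only if every pixel under the k x k kernel is 1."""
    h, w = binary.shape
    padded = np.pad(binary, k // 2, mode="constant", constant_values=0)
    out = np.ones_like(binary)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def dilate(binary, k=3):
    """A pixel becomes 1 if any pixel under the k x k kernel is 1."""
    h, w = binary.shape
    padded = np.pad(binary, k // 2, mode="constant", constant_values=0)
    out = np.zeros_like(binary)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

img = np.zeros((7, 7), dtype=np.uint8)
img[1:4, 1:4] = 1   # a 3 x 3 object
img[5, 5] = 1       # an isolated noise pixel

opened = dilate(erode(img))   # opening = erosion, then dilation
print(opened)
```

The isolated pixel is removed by the erosion, while the 3 x 3 object shrinks to its centre and is then restored by the dilation.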

kernel = np.ones((5, 5), np.uint8)  # 5x5 structuring element

plt.figure(figsize=(15, 8))
img = cv2.imread('../input/cv-images/morph-min.jpg', cv2.IMREAD_COLOR)
resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
# Opening removes small specks; closing then fills small holes
morph_open = cv2.morphologyEx(resized_img, cv2.MORPH_OPEN, kernel)
morph_close = cv2.morphologyEx(morph_open, cv2.MORPH_CLOSE, kernel)

plt.subplot(1, 2, 1).set_title('Original Digit - 7 Image', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2).set_title('After Morphological Opening and Closing of Digit - 7 Image', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(morph_close, cv2.COLOR_BGR2RGB))


Geometric Transformation of images is achieved by two transformation functions namely cv2.warpAffine and cv2.warpPerspective that receive a 2×3 and 3×3 transformation matrix respectively.
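What the two matrix shapes do to a single point can be sketched as follows. The helper functions and the example matrices are assumptions made for illustration; they show the underlying arithmetic only, not the image warping itself.

```python
import numpy as np

def apply_affine(M, x, y):
    """Map a point with a 2x3 affine matrix."""
    return M @ np.array([x, y, 1.0])

def apply_perspective(H, x, y):
    """Map a point with a 3x3 matrix, then divide by the third coordinate."""
    xh, yh, w = H @ np.array([x, y, 1.0])
    return np.array([xh / w, yh / w])

# 2x3 matrix: translate by (+10, +20).
translate = np.array([[1.0, 0.0, 10.0],
                      [0.0, 1.0, 20.0]])

# 3x3 matrix with a small perspective term in the last row.
H = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.001, 0.0, 1.0]])

moved = apply_affine(translate, 5, 5)
projected = apply_perspective(H, 100, 50)
print(moved, projected)
```

The extra row of the 3 x 3 matrix is what allows parallel lines to converge; an affine matrix always keeps them parallel.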

# Four corners of the book in the source image and where they should land
pts1 = np.float32([[1550, 1170], [2850, 1370], [50, 2600], [1850, 3450]])
pts2 = np.float32([[0, 0], [4160, 0], [0, 3120], [4160, 3120]])

img = cv2.imread('../input/cv-images/book-min.jpg', cv2.IMREAD_COLOR)
transformation_matrix = cv2.getPerspectiveTransform(pts1, pts2)
final_img = cv2.warpPerspective(img, M=transformation_matrix, dsize=(4160, 3120))

img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (256, 256))
final_img = cv2.cvtColor(final_img, cv2.COLOR_BGR2RGB)
final_img = cv2.resize(final_img, (256, 256))

plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1).set_title('Original Book Image', fontsize=font_size)
plt.axis('off')
plt.imshow(img)
plt.subplot(1, 2, 2).set_title('After Perspective Transformation of Book Image', fontsize=font_size)
plt.axis('off')
plt.imshow(final_img)


Contours are outlines representing the shape or form of objects in an image. They are useful in object detection and recognition. Binary images produce better contours. There are separate functions for finding and drawing contours.

plt.figure(figsize=(15, 8))
img = cv2.imread('contours-min.jpg', cv2.IMREAD_COLOR)
resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
contours_img = resized_img.copy()
img_gray = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)
# Binarize first: contours are found on the binary image
ret, thresh = cv2.threshold(img_gray, thresh=127, maxval=255, type=cv2.THRESH_BINARY)
contours, hierarchy = cv2.findContours(thresh, mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_NONE)
# contourIdx = -1 draws all contours
cv2.drawContours(contours_img, contours, contourIdx=-1, color=(0, 255, 0), thickness=2)

plt.subplot(1, 2, 1).set_title('Original Image', fontsize=font_size)
plt.axis('off')
# Convert BGR to RGB so Matplotlib shows the correct colours
plt.imshow(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2).set_title('After Finding Contours', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(contours_img, cv2.COLOR_BGR2RGB))


Images have a resolution, which is a measure of the information in the image. In certain image processing scenarios, such as Image Blending, working with images at different resolutions is necessary to make the blend look more realistic.

In OpenCV, a high-resolution image can be converted to a low-resolution one and vice versa. Each time an image is converted to the next lower-resolution level, its area becomes 1/4 of the previous level.

When this is repeated several times and the resulting images are stacked in order, they form a pyramid, hence the name ‘Image Pyramid’.
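The quartering of the area at each level can be seen in a small NumPy sketch. The helper downsample_2x is hypothetical and averages each 2 x 2 block, which is a simplification assumed for this example; cv2.pyrDown blurs with a Gaussian kernel before dropping rows and columns.

```python
import numpy as np

def downsample_2x(img):
    """Halve width and height by averaging each 2 x 2 block of pixels."""
    h, w = img.shape[:2]
    h, w = h - h % 2, w - w % 2          # drop an odd last row/column if present
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] +
            img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

level = np.arange(64 * 64, dtype=np.float64).reshape(64, 64)
pyramid = [level]
for _ in range(3):
    pyramid.append(downsample_2x(pyramid[-1]))

shapes = [p.shape for p in pyramid]
areas = [s[0] * s[1] for s in shapes]
print(shapes, areas)
```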

R = cv2.imread('GR-min.jpg', cv2.IMREAD_COLOR)
R = cv2.resize(R, (224, 224))
H = cv2.imread('../input/cv-images/H-min.jpg', cv2.IMREAD_COLOR)
H = cv2.resize(H, (224, 224))

# Gaussian pyramids of both images
G = R.copy()
gaussian_pyramid_c = [G]
for i in range(6):
    G = cv2.pyrDown(G)
    gaussian_pyramid_c.append(G)

G = H.copy()
gaussian_pyramid_d = [G]
for i in range(6):
    G = cv2.pyrDown(G)
    gaussian_pyramid_d.append(G)

# Laplacian pyramids: each level minus the expanded next-smaller level
laplacian_pyramid_c = [gaussian_pyramid_c[5]]
for i in range(5, 0, -1):
    gaussian_expanded = cv2.pyrUp(gaussian_pyramid_c[i])
    L = cv2.subtract(gaussian_pyramid_c[i - 1], gaussian_expanded)
    laplacian_pyramid_c.append(L)

laplacian_pyramid_d = [gaussian_pyramid_d[5]]
for i in range(5, 0, -1):
    gaussian_expanded = cv2.pyrUp(gaussian_pyramid_d[i])
    L = cv2.subtract(gaussian_pyramid_d[i - 1], gaussian_expanded)
    laplacian_pyramid_d.append(L)

# Join the left half of one image with the right half of the other at every level
laplacian_joined = []
for lc, ld in zip(laplacian_pyramid_c, laplacian_pyramid_d):
    r, c, d = lc.shape
    lj = np.hstack((lc[:, 0:int(c / 2)], ld[:, int(c / 2):]))
    laplacian_joined.append(lj)

# Reconstruct the blended image from the joined pyramid
laplacian_reconstructed = laplacian_joined[0]
for i in range(1, 6):
    laplacian_reconstructed = cv2.pyrUp(laplacian_reconstructed)
    laplacian_reconstructed = cv2.add(laplacian_reconstructed, laplacian_joined[i])

# Direct joining of the two halves, for comparison
direct = np.hstack((R[:, :int(c / 2)], H[:, int(c / 2):]))

plt.figure(figsize=(30, 20))
plt.subplot(2, 2, 1).set_title('Golden Retriever', fontsize=35)
plt.axis('off')
plt.imshow(cv2.cvtColor(R, cv2.COLOR_BGR2RGB))
plt.subplot(2, 2, 2).set_title('Husky', fontsize=35)
plt.axis('off')
plt.imshow(cv2.cvtColor(H, cv2.COLOR_BGR2RGB))
plt.subplot(2, 2, 3).set_title('Direct Joining', fontsize=35)
plt.axis('off')
plt.imshow(cv2.cvtColor(direct, cv2.COLOR_BGR2RGB))
plt.subplot(2, 2, 4).set_title('Pyramid Blending', fontsize=35)
plt.axis('off')
plt.imshow(cv2.cvtColor(laplacian_reconstructed, cv2.COLOR_BGR2RGB))


Colourspace conversions such as BGR↔Gray and BGR↔HSV are possible. The BGR↔Gray conversion was seen previously. HSV stands for Hue, Saturation, and Value.

HSV describes colour in terms of hue, saturation, and value, whereas in RGB all three channels (R, G, B) are correlated with luminance. Separating objects by colour is therefore much easier in HSV than in RGB.
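A quick way to see why HSV helps is to convert a few colours with Python's standard colorsys module. The colour values below are made up for illustration, and colorsys works with floats in the range 0 to 1, whereas OpenCV's 8-bit HSV images store H in 0 to 179 and S, V in 0 to 255.

```python
import colorsys

dark_red = (0.4, 0.0, 0.0)      # (R, G, B) in the range 0..1
bright_red = (1.0, 0.0, 0.0)
bright_green = (0.0, 1.0, 0.0)

h1, s1, v1 = colorsys.rgb_to_hsv(*dark_red)
h2, s2, v2 = colorsys.rgb_to_hsv(*bright_red)
h3, s3, v3 = colorsys.rgb_to_hsv(*bright_green)

print(h1, v1)   # same hue as bright red, lower value
print(h2, v2)
print(h3, v3)   # different hue
```

The two reds differ a lot in RGB but share the same hue, so a single hue range captures both the shadowed and the well-lit parts of a red object.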

lower_white = np.array([0, 0, 150])
upper_white = np.array([255, 255, 255])

img = cv2.imread('../input/cv-images/color_space_cat.jpg', cv2.IMREAD_COLOR)
img = cv2.resize(img, (width, height))  # dsize is (width, height)
background = cv2.imread("../input/cv-images/galaxy.jpg", cv2.IMREAD_COLOR)
background = cv2.resize(background, (width, height))

# Work in RGB from here on so that Matplotlib shows the correct colours
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
background = cv2.cvtColor(background, cv2.COLOR_BGR2RGB)
# img is RGB at this point, so convert with COLOR_RGB2HSV (the original listing used COLOR_BGR2HSV)
hsv_img = cv2.cvtColor(img, cv2.COLOR_RGB2HSV)
mask = cv2.inRange(hsv_img, lowerb=lower_white, upperb=upper_white)
final_img = cv2.bitwise_and(img, img, mask=mask)
# Fill the masked-out pixels with the background image
final_img = np.where(final_img == 0, background, final_img)

plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1).set_title('Original Cat Image', fontsize=font_size)
plt.axis('off')
plt.imshow(img)
plt.subplot(1, 2, 2).set_title('After Object Tracking using Color-space Conversion of Cat Image', fontsize=font_size)
plt.axis('off')
plt.imshow(final_img)


The foreground of the image is extracted using user input and the Gaussian Mixture Model (GMM).

img = cv2.imread('Cat.jpg', cv2.IMREAD_COLOR)
img = cv2.resize(img, (width, height))  # dsize is (width, height)
img_copy = img.copy()
mask = np.zeros(img.shape[:2], np.uint8)
background_model = np.zeros((1, 65), np.float64)
foreground_model = np.zeros((1, 65), np.float64)
rect = (10, 10, 224, 224)  # (x, y, w, h) rectangle containing the foreground
cv2.grabCut(img, mask=mask, rect=rect, bgdModel=background_model,
            fgdModel=foreground_model, iterCount=5, mode=cv2.GC_INIT_WITH_RECT)
# new_mask was undefined in the original listing; the usual step (assumed here) is to
# set sure and probable background (labels 0 and 2) to 0 and everything else to 1.
new_mask = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img = img * new_mask[:, :, np.newaxis]

plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1).set_title('Original Cat Image', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2).set_title('After Interactive Foreground Extraction of Cat Image', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))


Image Segmentation is done using the Watershed Algorithm. This algorithm treats the grayscale image as a topographic surface in which high-intensity regions are hills and low-intensity regions are valleys. If the valleys are filled with differently coloured water, then as the water rises, water from different valleys will start to merge, depending on the peaks.

To avoid this, barriers are built where the water would merge, and these barriers give the segmentation result. This is the concept of the Watershed algorithm. It is an interactive algorithm, as one can specify which pixels belong to an object or to the background. The pixels that one is unsure about can be marked as 0. The watershed algorithm is then applied; it updates the given labels and marks all boundaries as -1.

# `kernel` is the 5x5 structuring element defined in the morphology example above
img = cv2.imread('lymphocytes-min.jpg', cv2.IMREAD_COLOR)
resized_img = cv2.resize(img, (width, height))  # dsize is (width, height)
img_copy = resized_img.copy()
gray = cv2.cvtColor(resized_img, cv2.COLOR_BGR2GRAY)
ret, thresh = cv2.threshold(gray, thresh=0, maxval=255, type=cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
# Remove noise, then find sure background and sure foreground
opening = cv2.morphologyEx(thresh, op=cv2.MORPH_OPEN, kernel=kernel, iterations=2)
background = cv2.dilate(opening, kernel=kernel, iterations=5)
dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
ret, foreground = cv2.threshold(dist_transform, thresh=0.2 * dist_transform.max(), maxval=255, type=cv2.THRESH_BINARY)
foreground = np.uint8(foreground)
unknown = cv2.subtract(background, foreground)
# Label the sure regions; 0 is reserved for the unknown region
ret, markers = cv2.connectedComponents(foreground)
markers = markers + 1
markers[unknown == 255] = 0
markers = cv2.watershed(resized_img, markers)
resized_img[markers == -1] = [0, 0, 255]  # draw the boundaries in red

plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1).set_title('Lymphocytes Image', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2).set_title('After Watershed Algorithm', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(resized_img, cv2.COLOR_BGR2RGB))


Images may be damaged and require fixing. For example, an image may have no pixel information in certain portions. Image Inpainting will fill all the missing information with the help of the surrounding pixels.

mask = cv2.imread('mask.png', 0)
mask = cv2.resize(mask, (width, height))  # dsize is (width, height)
for i, path in enumerate(paths):
    name = os.path.split(path)[-1]
    img = cv2.imread(path, cv2.IMREAD_COLOR)
    resized_img = cv2.resize(img, (width, height))
    ret, th = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)
    inverted_mask = cv2.bitwise_not(th)
    # Simulate damage by blanking out the masked pixels
    damaged_img = cv2.bitwise_and(resized_img, resized_img, mask=inverted_mask)
    # Inpaint the damaged image (the original listing passed the undamaged resized_img)
    result = cv2.inpaint(damaged_img, th, inpaintRadius=3, flags=cv2.INPAINT_TELEA)
    plt.figure(figsize=(15, 8))
    plt.subplot(1, 2, 1).set_title(f'Damaged Image of {name[:-4]}', fontsize=font_size)
    plt.axis('off')
    plt.imshow(cv2.cvtColor(damaged_img, cv2.COLOR_BGR2RGB))
    plt.subplot(1, 2, 2).set_title(f'After Image Inpainting of {name[:-4]}', fontsize=font_size)
    plt.axis('off')
    plt.imshow(cv2.cvtColor(result, cv2.COLOR_BGR2RGB))


Template Matching matches the template provided to the image in which the template must be found. The template is compared to each patch of the input image. This is similar to a 2D convolution operation. It results in a grayscale image where each pixel denotes the similarity of the neighbourhood pixels to that of the template.

From this output, the maximum/minimum value is determined. This can be regarded as the top-left corner coordinates of the rectangle. By also considering the width and height of the template, the resultant rectangle is the region of the template in the image.
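The sliding comparison can be sketched with NumPy on a toy image. The function match_template_sqdiff is a hypothetical helper that scores each patch by the sum of squared differences, so here the best match is the minimum of the result; with a correlation-based score such as cv2.TM_CCOEFF the best match is the maximum instead.

```python
import numpy as np

def match_template_sqdiff(image, template):
    """Score every patch by its sum of squared differences to the template."""
    ih, iw = image.shape
    th, tw = template.shape
    result = np.zeros((ih - th + 1, iw - tw + 1))
    t = template.astype(np.float64)
    for y in range(result.shape[0]):
        for x in range(result.shape[1]):
            patch = image[y:y + th, x:x + tw].astype(np.float64)
            result[y, x] = np.sum((patch - t) ** 2)
    return result

# Toy image: black background with one small textured object.
image = np.zeros((8, 10), dtype=np.uint8)
image[3:5, 6:9] = [[10, 20, 30], [40, 50, 60]]
template = image[3:5, 6:9].copy()

result = match_template_sqdiff(image, template)
row, col = np.unravel_index(np.argmin(result), result.shape)
th, tw = template.shape                           # shape is (height, width)
top_left = (int(col), int(row))                   # (x, y), as OpenCV expects
bottom_right = (int(col) + tw, int(row) + th)

print(result.shape, top_left, bottom_right)
```

Note that a NumPy shape is (height, width), while the rectangle corners are (x, y) points, so the width is added to x and the height to y.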

# img, img_copy and template are assumed to be loaded beforehand;
# the loading code is missing from this article.
h, w, c = template.shape  # shape is (height, width, channels)
method = cv2.TM_CCOEFF
result = cv2.matchTemplate(img, templ=template, method=method)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
# For TM_CCOEFF the best match is at the maximum of the result
top_left = max_loc
bottom_right = (top_left[0] + w, top_left[1] + h)
cv2.rectangle(img, top_left, bottom_right, color=(255, 0, 0), thickness=3)

plt.figure(figsize=(30, 20))
plt.subplot(2, 2, 1).set_title('Image of Selena Gomez and Taylor Swift', fontsize=35)
plt.axis('off')
plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
plt.subplot(2, 2, 2).set_title('Face Template of Selena Gomez', fontsize=35)
plt.axis('off')
plt.imshow(cv2.cvtColor(template, cv2.COLOR_BGR2RGB))
plt.subplot(2, 2, 3).set_title('Matching Result', fontsize=35)
plt.axis('off')
plt.imshow(result, cmap='gray')
plt.subplot(2, 2, 4).set_title('Detected Face', fontsize=35)
plt.axis('off')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))


Face and eye detection is done using Haar Cascades. Check the code below:

# The path prefix was stripped from the original listing. cv2.data.haarcascades
# (the cascade folder shipped with the opencv-python package) is assumed here.
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

img = cv2.imread('../input/cv-images/elon-min.jpg')
img = cv2.resize(img, (width, height))  # dsize is (width, height)
img_copy = img.copy()
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
for (fx, fy, fw, fh) in faces:
    img = cv2.rectangle(img, (fx, fy), (fx + fw, fy + fh), (255, 0, 0), 2)
    # Search for eyes only inside the detected face region
    roi_gray = gray[fy:fy + fh, fx:fx + fw]
    roi_color = img[fy:fy + fh, fx:fx + fw]
    eyes = eye_cascade.detectMultiScale(roi_gray)
    for (ex, ey, ew, eh) in eyes:
        cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

plt.figure(figsize=(15, 8))
plt.subplot(1, 2, 1).set_title('Elon Musk', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(img_copy, cv2.COLOR_BGR2RGB))
plt.subplot(1, 2, 2).set_title('Elon Musk - After Face and Eyes Detections', fontsize=font_size)
plt.axis('off')
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))

End Notes

In this article, we had a detailed discussion on Computer Vision using OpenCV. I hope you learned something from this blog and that it will help you in the future. Thank you for reading and for your patience. Good luck!


The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


Computer Vision Tutorial: A Step



What’s the first thing you do when you’re attempting to cross the road? We typically look left and right, take stock of the vehicles on the road, and make our decision. Our brain is able to analyze, in a matter of milliseconds, what kind of vehicle (car, bus, truck, auto, etc.) is coming towards us. Can machines do that?

Now, there are multiple ways of dealing with computer vision challenges. The most popular approach I have come across is based on identifying the objects present in an image, aka, object detection. But what if we want to dive deeper? What if just detecting objects isn’t enough – we want to analyze our image at a much more granular level?

As data scientists, we are always curious to dig deeper into the data. Asking questions like these is why I love working in this field!

In this article, I will introduce you to the concept of image segmentation. It is a powerful computer vision algorithm that builds upon the idea of object detection and takes us to a whole new level of working with image data. This technique opens up so many possibilities – it has blown my mind.

What Is Image Segmentation?

Let’s understand image segmentation using a simple example. Consider the below image:

There’s only one object here – a dog. We can build a straightforward cat-dog classifier model and predict that there’s a dog in the given image. But what if we have both a cat and a dog in a single image?

We can train a multi-label classifier, in that instance. Now, there’s another caveat – we won’t know the location of either animal/object in the image.

That’s where image localization comes into the picture (no pun intended!). It helps us to identify the location of a single object in the given image. In case we have multiple objects present, we then rely on the concept of object detection (OD). We can predict the location along with the class for each object using OD.

Before detecting the objects and even before classifying the image, we need to understand what the image consists of. Enter – Image Segmentation.

How Does Image Segmentation Work?

We can divide or partition the image into various parts called segments. It’s not a great idea to process the entire image at the same time as there will be regions in the image which do not contain any information. By dividing the image into segments, we can make use of the important segments for processing the image. That, in a nutshell, is how image segmentation works.

An image is a collection or set of different pixels. We group together the pixels that have similar attributes using image segmentation. Take a moment to go through the below visual (it’ll give you a practical idea of image segmentation):


Object detection builds a bounding box corresponding to each class in the image. But it tells us nothing about the shape of the object. We only get the set of bounding box coordinates. We want to get more information – this is too vague for our purposes.

Image segmentation creates a pixel-wise mask for each object in the image. This technique gives us a far more granular understanding of the object(s) in the image.
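To make the difference between a bounding box and a pixel-wise mask concrete, here is a minimal sketch on a toy array. The L-shaped object and the helper name `bounding_box` are made up for illustration and are not part of the original article.

```python
import numpy as np

# Toy 6x8 mask: True where a hypothetical L-shaped object is located.
mask = np.zeros((6, 8), dtype=bool)
mask[1:5, 2] = True   # vertical stroke of the "L"
mask[4, 2:6] = True   # horizontal stroke of the "L"

def bounding_box(mask):
    """Return (row_min, col_min, row_max, col_max) enclosing all True pixels."""
    rows, cols = np.nonzero(mask)
    return int(rows.min()), int(cols.min()), int(rows.max()), int(cols.max())

box = bounding_box(mask)
box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
object_area = int(mask.sum())

# The box covers 16 pixels, but only 7 of them belong to the object.
print(box, box_area, object_area)
```

The box alone cannot tell us which of its pixels belong to the object; the mask can.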

Why do we need to go this deep? Can’t all image processing tasks be solved using simple bounding box coordinates? Let’s take a real-world example to answer this pertinent question.

What Is Image Segmentation Used For?

The shape of the cancerous cells plays a vital role in determining the severity of the cancer. You might have put the pieces together – object detection will not be very useful here. We will only generate bounding boxes which will not help us in identifying the shape of the cells.

Image Segmentation techniques make a MASSIVE impact here. They help us approach this problem in a more granular manner and get more meaningful results. A win-win for everyone in the healthcare industry.

Source: Wikipedia

Here, we can clearly see the shapes of all the cancerous cells. There are many other applications where Image segmentation is transforming industries:

Traffic Control Systems

Self Driving Cars

Locating objects in satellite images

Different Types of Image Segmentation

We can broadly divide image segmentation techniques into two types. Consider the below images:

Can you identify the difference between these two? Both the images are using image segmentation to identify and locate the people present.

In image 1, every pixel belongs to a particular class (either background or person). Also, all the pixels belonging to a particular class are represented by the same color (background as black and person as pink). This is an example of semantic segmentation.

Image 2 has also assigned a particular class to each pixel of the image. However, different objects of the same class have different colors (Person 1 as red, Person 2 as green, background as black, etc.). This is an example of instance segmentation.

Let me quickly summarize what we’ve learned. If there are 5 people in an image, semantic segmentation will label the pixels of all 5 people with the same class, treating them as a single group. Instance segmentation, on the other hand, will identify each of these people individually.
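The distinction can be sketched on a tiny label map. The example below assumes SciPy is available and uses connected-component labelling as a toy stand-in for instance segmentation. Real instance segmentation models do not work this way, and this trick fails as soon as two objects touch.

```python
import numpy as np
from scipy import ndimage

# Semantic map: 0 = background, 1 = "person". Both people share the same label.
semantic = np.array([
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0],
])

# Instance map: every connected blob of "person" pixels gets its own id.
instances, num_instances = ndimage.label(semantic == 1)

print(num_instances)
print(instances)
```

The semantic map only says which pixels are "person"; the instance map also says which person each pixel belongs to.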

So far, we have delved into the theoretical concepts of image processing and segmentation. Let’s mix things up a bit – we’ll combine learning concepts with implementing them in Python. I strongly believe that’s the best way to learn and remember any topic.

Region-based Segmentation

One simple way to segment different objects could be to use their pixel values. An important point to note – the pixel values will be different for the objects and the image’s background if there’s a sharp contrast between them.

In this case, we can set a threshold value. The pixel values falling below or above that threshold can be classified accordingly (as an object or the background). This technique is known as Threshold Segmentation.

If we want to divide the image into two regions (object and background), we define a single threshold value. This is known as the global threshold.

If we have multiple objects along with the background, we must define multiple thresholds. These thresholds are collectively known as the local threshold.

Let’s implement what we’ve learned in this section. Download this image and run the below code. It will give you a better understanding of how thresholding works (you can use any image of your choice if you feel like experimenting!).

First, we’ll import the required libraries.

View the code on Gist.

Let’s read the downloaded image and plot it:

View the code on Gist.

It is a three-channel image (RGB). We need to convert it into grayscale so that we only have a single channel. Doing this will also help us get a better understanding of how the algorithm works.

Python Code:
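The original snippet is not reproduced here. As a minimal sketch, a grayscale conversion can be written as a weighted sum of the three channels. The weights below are the common BT.601 luma weights, which is an assumption. The original code may have used a library helper with slightly different weights.

```python
import numpy as np

def to_grayscale(rgb):
    """Collapse an (H, W, 3) RGB array into (H, W) using BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])  # assumed weights, see lead-in
    return rgb[..., :3] @ weights

# Tiny synthetic image: one white pixel, one pure red pixel, two black pixels.
rgb = np.zeros((2, 2, 3), dtype=float)
rgb[0, 0] = [1.0, 1.0, 1.0]
rgb[0, 1] = [1.0, 0.0, 0.0]

gray = to_grayscale(rgb)
print(gray.shape)
```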

Now, we want to apply a certain threshold to this image. This threshold should separate the image into two parts – the foreground and the background. Before we do that, let’s quickly check the shape of this image:


(192, 263)

The height and width of the image are 192 and 263 respectively. We will take the mean of the pixel values and use that as a threshold. If the pixel value is more than our threshold, we can say that it belongs to an object. If the pixel value is less than the threshold, it will be treated as the background. Let’s code this:

View the code on Gist.
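Since the gist is not reproduced here, the following is a minimal sketch of mean thresholding on a small synthetic grayscale array rather than the downloaded image. The array values are made up for illustration.

```python
import numpy as np

# Synthetic grayscale image: a bright 2x2 object on a dark background.
gray = np.array([
    [0.1, 0.2, 0.1, 0.1],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.8, 0.9, 0.2],
    [0.1, 0.1, 0.2, 0.1],
])

threshold = gray.mean()                           # global threshold
segmented = (gray > threshold).astype(np.uint8)   # 1 = object, 0 = background

print(round(float(threshold), 4))
print(segmented)
```

On a real image, `gray` would be the grayscale array obtained in the previous step.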

Nice! The darker region (black) represents the background and the brighter (white) region is the foreground. We can define multiple thresholds as well to detect multiple objects:

View the code on Gist.
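The gist is not reproduced here. One possible way to apply several thresholds at once is `np.digitize`, which maps each pixel to the index of the interval it falls in. The threshold values below are arbitrary and chosen only for illustration.

```python
import numpy as np

gray = np.array([[0.05, 0.30, 0.55, 0.95]])   # synthetic pixel values
thresholds = [0.25, 0.5, 0.75]                # three thresholds give four regions

# Each pixel receives the index (0..3) of the interval that contains it.
regions = np.digitize(gray, thresholds)
print(regions)
```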

This approach has some clear advantages:

Calculations are simpler

Fast operation speed

When the object and background have high contrast, this method performs really well

But there are some limitations to this approach. When we don’t have significant grayscale difference, or there is an overlap of the grayscale pixel values, it becomes very difficult to get accurate segments.

Edge Detection Segmentation

What divides two objects in an image? There is always an edge between two adjacent regions with different grayscale values (pixel values). The edges can be considered as the discontinuous local features of an image.

We can make use of this discontinuity to detect edges and hence define a boundary of the object. This helps us in detecting the shapes of multiple objects present in a given image. Now the question is how can we detect these edges? This is where we can make use of filters and convolutions. Refer to this article if you need to learn about these concepts.

The below visual will help you understand how a filter convolves over an image:

Here’s the step-by-step process of how this works:

Take the weight matrix

Put it on top of the image

Perform element-wise multiplication and get the output

Move the weight matrix as per the stride chosen

Convolve until all the pixels of the input are used
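The steps above can be sketched with plain NumPy loops. The function name `slide_filter` is hypothetical. Strictly speaking, this version does not flip the kernel, so it computes a cross-correlation, which is what most deep learning libraries call a convolution.

```python
import numpy as np

def slide_filter(image, kernel, stride=1):
    """Slide `kernel` over `image`, summing the element-wise products at each position."""
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            r, c = i * stride, j * stride          # top-left corner of the current patch
            patch = image[r:r + kh, c:c + kw]      # put the weight matrix on top of the image
            out[i, j] = np.sum(patch * kernel)     # element-wise multiplication, then sum
    return out

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.ones((2, 2))
result = slide_filter(image, kernel, stride=2)
print(result)
```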

One such weight matrix is the sobel operator. It is typically used to detect edges. The sobel operator has two weight matrices – one for detecting horizontal edges and the other for detecting vertical edges. Let me show how these operators look and we will then implement them in Python.

Sobel filter (horizontal) =

 1  2  1
 0  0  0
-1 -2 -1

Sobel filter (vertical) =

-1  0  1
-2  0  2
-1  0  1


Edge detection works by convolving these filters over the given image. Let’s visualize them on an example image.

View the code on Gist.

It should be fairly simple for us to understand how the edges are detected in this image. Let’s convert it into grayscale and define the sobel filter (both horizontal and vertical) that will be convolved over this image:

View the code on Gist.

Now, convolve this filter over the image using the convolve function of the ndimage package from scipy.

View the code on Gist.
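The gist is not reproduced here. A minimal sketch of this step, using one common sign convention for the Sobel kernels and a synthetic image instead of the article's picture, might look like this:

```python
import numpy as np
from scipy import ndimage

# One common form of the Sobel kernels (sign conventions vary between sources).
sobel_horizontal = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
sobel_vertical = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)

# Synthetic grayscale image: dark left half, bright right half, so one vertical edge.
gray = np.zeros((6, 6))
gray[:, 3:] = 1.0

out_h = ndimage.convolve(gray, sobel_horizontal, mode='reflect')
out_v = ndimage.convolve(gray, sobel_vertical, mode='reflect')

print(np.abs(out_h).max())   # no horizontal edges in this image
print(np.abs(out_v).max())   # strong response along the vertical edge
```

Only the vertical filter responds, because the intensity changes from left to right and never from top to bottom.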

Let’s plot these results:

View the code on Gist.

Here, we are able to identify the horizontal as well as the vertical edges. There is one more type of filter that can detect both horizontal and vertical edges at the same time. This is called the Laplace operator. One common 3×3 form of it is:

1  1  1
1 -8  1
1  1  1


Let’s define this filter in Python and convolve it on the same image:

View the code on Gist.

Next, convolve the filter and print the output:

View the code on Gist.
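As the gist is not reproduced here, the sketch below applies one common 8-neighbour form of the Laplace kernel to a synthetic image. The kernel variant is an assumption, since several forms are in use.

```python
import numpy as np
from scipy import ndimage

# 8-neighbour Laplace kernel; its weights sum to zero, so flat regions give 0.
laplace = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]], dtype=float)

# Synthetic grayscale image: a bright 3x3 square on a dark background.
gray = np.zeros((7, 7))
gray[2:5, 2:5] = 1.0

out = ndimage.convolve(gray, laplace, mode='reflect')
print(out)
```

The response is zero inside the square and far away from it, and non-zero all around its border, regardless of the edge direction.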

Here, we can see that our method has detected both horizontal as well as vertical edges. I encourage you to try it on different images and share your results with me. Remember, the best way to learn is by practicing!

Clustering-based Image Segmentation

This idea might have come to you while reading about image segmentation. Can’t we use clustering techniques to divide images into segments? We certainly can!

In this section, we’ll get an intuition of what clustering is (it’s always good to revise certain concepts!) and how we can make use of it to segment images.

Clustering is the task of dividing the population (data points) into a number of groups, such that data points in the same groups are more similar to other data points in that same group than those in other groups. These groups are known as clusters.

K-means Clustering

One of the most commonly used clustering algorithms is k-means. Here, the k represents the number of clusters (not to be confused with k-nearest neighbor). Let’s understand how k-means works:

First, randomly select k initial clusters

Randomly assign each data point to any one of the k clusters

Calculate the centers of these clusters

Calculate the distance of all the points from the center of each cluster

Depending on this distance, the points are reassigned to the nearest cluster

Calculate the center of the newly formed clusters

Finally, repeat steps (4), (5) and (6) until either the center of the clusters does not change or we reach the set number of iterations
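The loop described above can be sketched in a few lines of NumPy. This is a simplified variant, not a production implementation: it initialises the centers from k randomly chosen data points instead of a random assignment, and the function name `kmeans` and its parameters are chosen for illustration.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialisation (assumed variant): pick k distinct data points as the first centers.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance of every point from every center, shape (n_points, k).
        distances = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        # Reassign each point to its nearest center.
        labels = distances.argmin(axis=1)
        # Recompute the centers; an empty cluster keeps its previous center.
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        # Stop once the centers no longer move.
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                   [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
labels, centers = kmeans(points, k=2)
print(labels)
```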

Let’s put our learning to the test and check how well k-means segments the objects in an image. We will be using this image, so download it, read it and check its dimensions:

View the code on Gist.

It’s a 3-dimensional image of shape (192, 263, 3). For clustering the image using k-means, we first need to convert it into a 2-dimensional array whose shape will be (height*width, channels). In our example, this will be (192*263, 3).

View the code on Gist.

(50496, 3)

We can see that the image has been converted to a 2-dimensional array. Next, fit the k-means algorithm on this reshaped array and obtain the clusters. The cluster_centers_ attribute of the fitted k-means model holds the cluster centers, and the labels_ attribute gives us the label for each pixel (it tells us which pixel of the image belongs to which cluster).

View the code on Gist.

I have chosen 5 clusters for this article but you can play around with this number and check the results. Now, let’s bring back the clusters to their original shape, i.e. 3-dimensional image, and plot the results.

View the code on Gist.
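The gists for these steps are not reproduced here. The following sketch, assuming scikit-learn is installed, runs the whole pipeline (reshape, fit, map back to image shape) on a small synthetic image with 2 clusters instead of the article's picture with 5.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic 4x6 RGB image: reddish left half, bluish right half.
image = np.zeros((4, 6, 3), dtype=float)
image[:, :3] = [0.9, 0.1, 0.1]
image[:, 3:] = [0.1, 0.1, 0.9]

# Flatten to (height*width, channels) so that every pixel becomes one sample.
pixels = image.reshape(-1, 3)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)

# Replace each pixel by the center of its cluster and restore the image shape.
segmented = kmeans.cluster_centers_[kmeans.labels_].reshape(image.shape)
print(pixels.shape, segmented.shape)
```

The resulting `segmented` array can be passed to `plt.imshow` in the same way as the original image.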

Amazing, isn’t it? We are able to segment the image pretty well using just 5 clusters. I’m sure you’ll be able to improve the segmentation by increasing the number of clusters.

k-means works really well when we have a small dataset. It can segment the objects in the image and give impressive results. But the algorithm hits a roadblock when applied to a large dataset (a larger number of images).

It looks at all the samples at every iteration, so the time taken is too high. Hence, it’s also too expensive to implement. And since k-means is a distance-based algorithm, it only applies to convex datasets and is unsuitable for clustering non-convex clusters.

Finally, let’s look at a simple, flexible and general approach for image segmentation.

Mask R-CNN

Data scientists and researchers at Facebook AI Research (FAIR) pioneered a deep learning architecture, called Mask R-CNN, that can create a pixel-wise mask for each object in an image. This is a really cool concept so follow along closely!

Mask R-CNN is an extension of the popular Faster R-CNN object detection architecture. Mask R-CNN adds a branch to the already existing Faster R-CNN outputs. The Faster R-CNN method generates two things for each object in the image:

Its class

The bounding box coordinates

Mask R-CNN adds a third branch to this which outputs the object mask as well. Take a look at the below image to get an intuition of how Mask R-CNN works on the inside:


We take an image as input and pass it to the ConvNet, which returns the feature map for that image

Region proposal network (RPN) is applied on these feature maps. This returns the object proposals along with their objectness score

A RoI pooling layer is applied on these proposals to bring down all the proposals to the same size

Finally, the proposals are passed to a fully connected layer to classify and output the bounding boxes for objects. It also returns the mask for each proposal

When it was introduced, Mask R-CNN achieved state-of-the-art results for instance segmentation, and its authors reported it running at about 5 fps.

Summary of Image Segmentation Techniques

I have summarized the different image segmentation algorithms in the below table. I suggest keeping this handy next time you’re working on an image segmentation challenge or problem!

| Algorithm | Description | Advantages | Limitations |
|---|---|---|---|
| Region-Based Segmentation | Separates the objects into different regions based on some threshold value(s). | a. Simple calculations; b. Fast operation speed; c. When the object and background have high contrast, this method performs really well | When there is no significant grayscale difference or an overlap of the grayscale pixel values, it becomes very difficult to get accurate segments. |
| Edge Detection Segmentation | Makes use of discontinuous local features of an image to detect edges and hence define a boundary of the object. | It is good for images having better contrast between objects. | Not suitable when there are too many edges in the image and if there is less contrast between objects. |
| Segmentation based on Clustering | Divides the pixels of the image into homogeneous clusters. | Works really well on small datasets and generates excellent clusters. | a. Computation time is too large and expensive; b. k-means is a distance-based algorithm. It is not suitable for clustering non-convex clusters. |
| Mask R-CNN | Gives three outputs for each object in the image: its class, bounding box coordinates, and object mask | a. Simple, flexible and general approach; b. It is also the current state-of-the-art for image segmentation | High training time |


This article is just the beginning of our journey to learn all about image segmentation. In the next article of this series, we will deep dive into the implementation of Mask R-CNN. So stay tuned!

I have found image segmentation quite a useful function in my deep learning career. The level of granularity I get from these techniques is astounding. It always amazes me how much detail we are able to extract with a few lines of code. I’ve mentioned a couple of useful resources below to help you out in your computer vision journey:

Frequently Asked Questions

Q1. What are the different types of image segmentation?

A. This article covers four approaches to image segmentation: region-based segmentation, edge detection segmentation, clustering-based segmentation, and Mask R-CNN.

Q2. What is the best image segmentation method?

A. Clustering-based segmentation techniques such as k-means clustering are the most commonly used method for image segmentation.

Q3. What is image segmentation?

A. Image segmentation is the process of partitioning an image into segments, i.e. groups of pixels that share certain features or characteristics, so that every pixel is assigned to a class, object, or region.


Color Picker Application Using Computer Vision

One question arises here. Anyone who is familiar with OpenCV knows that we would normally display an image with the cv2.imshow function. However, I’m personally a fan of Jupyter Notebook, and there cv2.imshow does not work well: it tends to crash the kernel. If you search for this issue, you will find that cv2.imshow is not suited to a client-server environment such as Jupyter Notebook, so we use matplotlib (plt.imshow) to plot the result in the form of an image.

Let’s declare some global variables which will be accessible throughout the code.

flag_variable = False
red_channel = g_channel = b_channel = x_coordinate = y_coordinate = 0


Then we have the red, green, and blue channels (RGB) along with the X and Y coordinates. For now these are all set to 0, but as soon as we move around the image and pick colors from it, these values will change.

Now we will read the color CSV file and give the heading name to every column.

import pandas as pd

heading = ["Color", "Name of color", "Hexadecimal code", "Red channel", "Green channel", "Blue channel"]
color_csv = pd.read_csv('colors.csv', names=heading, header=None)


We are setting the name of headings that the color CSV file will have.

Then we will read the colors.csv file with the help of the read_csv function.

Note: This color CSV file contains the name, hexadecimal code, and RGB values of each color. We will compare the picked color only against the values from this CSV file.

Now, we will create the function to get the name of the color (get_color_name).

def get_color_name(Red, Green, Blue):
    minimum = 10000
    for i in range(len(color_csv)):
        # Sum of absolute channel differences between the picked color and row i
        distance = (abs(Red - int(color_csv.loc[i, "Red channel"]))
                    + abs(Green - int(color_csv.loc[i, "Green channel"]))
                    + abs(Blue - int(color_csv.loc[i, "Blue channel"])))
        # Keep the closest color seen so far
        if distance <= minimum:
            minimum = distance
            color_name = color_csv.loc[i, "Name of color"]
    return color_name


Here, we first set the minimum to 10000, i.e. an initial threshold for the distance between a color in the CSV file and the color picked from the image. This is larger than any possible distance, because the sum of three absolute differences between 8-bit channel values cannot exceed 765.

Then we calculate the distance between the picked color and each color in the CSV file.

Next, we check whether the calculated distance is less than or equal to the current minimum, and update the minimum if it is.

At last, we store the name of the closest color from the CSV file and return it.
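The same nearest-color lookup can be tried without the CSV file. In the sketch below, the three-entry `PALETTE` and the function name `nearest_color_name` are hypothetical stand-ins for colors.csv and get_color_name.

```python
# Hypothetical three-entry palette standing in for colors.csv: (name, R, G, B).
PALETTE = [
    ("Red", 255, 0, 0),
    ("Green", 0, 255, 0),
    ("Blue", 0, 0, 255),
]

def nearest_color_name(red, green, blue, palette=PALETTE):
    """Return the palette name with the smallest sum of absolute channel differences."""
    def manhattan(entry):
        _, r, g, b = entry
        return abs(red - r) + abs(green - g) + abs(blue - b)
    return min(palette, key=manhattan)[0]

print(nearest_color_name(250, 10, 5))
print(nearest_color_name(10, 20, 200))
```

Note that `min` returns the first entry when two colors are equally close, whereas a `<=` comparison in a loop keeps the last one.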

Now we will create the function to get the coordinates (draw_function)

def draw_function(event, x_coordinate, y_coordinate, flags, parameters):
    global b, g, r, x_position, y_position, flag_variable
    # React only to a double click with the left mouse button
    if event == cv2.EVENT_LBUTTONDBLCLK:
        flag_variable = True
        x_position = x_coordinate
        y_position = y_coordinate
        # OpenCV stores pixels in BGR order and indexes them as [row, column]
        b, g, r = test[y_coordinate, x_coordinate]
        b = int(b)
        g = int(g)
        r = int(r)


Then comes the main part of the function, in which we store the coordinates and their corresponding RGB values in the global variables.

Finally, we convert the values to integers using int().

cv2.namedWindow('image')
cv2.setMouseCallback('image', draw_function)

while True:
    cv2.imshow("image", test)
    if flag_variable:
        # Filled rectangle (thickness -1) in the picked color
        cv2.rectangle(test, (20, 20), (750, 60), (b, g, r), -1)
        text = get_color_name(r, g, b) + ' R=' + str(r) + ' G=' + str(g) + ' B=' + str(b)
        cv2.putText(test, text, (50, 50), 2, 0.8, (255, 255, 255), 2, cv2.LINE_AA)
        # For light colors, redraw the text in black.
        # The cutoff of 600 is an assumption; the original condition is not shown.
        if r + g + b >= 600:
            cv2.putText(test, text, (50, 50), 2, 0.8, (0, 0, 0), 2, cv2.LINE_AA)
        flag_variable = False
    # Quit when the Escape key (code 27) is pressed
    if cv2.waitKey(20) & 0xFF == 27:
        break

cv2.destroyAllWindows()


Source: Author



In the main logic, we first draw a rectangle (the -1 thickness fills it) on which the text will be shown.

Next, we build the text string, which contains the color name and its RGB values.

Then, with the help of the cv2.putText method, we draw the text on the rectangle that we drew previously.

We also have a check: if the color is light, we display the text string in black.

Finally, we can quit the application with the Escape key, i.e. key code 27.


First, we imported all the libraries.

Then we loaded and plotted the selected image.

Then we gave headings to the columns of our color CSV file.

Then we ran the app loop to execute all the steps.

Thus, by implementing the above steps, we can develop a color picker application using computer vision.


Read on AV Blog about various predictions using Machine Learning.

About Me

Along with full-time work, I’ve got an immense interest in the same field, i.e. Data Science, along with its other subsets of Artificial Intelligence such as Computer Vision, Machine Learning, and Deep learning; feel free to collaborate with me on any project on the domains mentioned above (LinkedIn).

Hope you liked my article on Heart Disease Prediction? You can access my other articles, which are published on Analytics Vidhya as a part of the Blogathon link.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

Become A Computer Vision Artist With Stanford’s Game Changing ‘Outpainting’ Algorithm (With Github Link)


Stanford researchers have designed ‘Outpainting’ – an algorithm that extrapolates and extends existing images

At the core of the algorithm are GANs; these were used on a dataset of 36,500 images, with 100 held out in the validation set

You can try it out yourself using a Keras implementation that has been open sourced on GitHub


If you are a keen follower of AVBytes, you must have read about a technique called “inpainting” (read up on that in case you haven’t yet, it’s really worth it). It is a popular computer vision technique that aims to restore missing parts in an image and has produced some exquisite results, as you will see in that article. Current state-of-the-art methods for inpainting involve GANs (Generative Adversarial Networks) and CNNs (Convolutional Neural Networks).

It was only a matter of time before someone from the ML community figured out a technique that goes beyond the scope of inpainting. This breakthrough has come from a couple of Stanford researchers, Mark Sabini and Gili Rusak, and the new technique is appropriately named “outpainting”.

This approach extends the use of GANs for inpainting to estimate and imagine what the existing image might look like beyond what can be seen. Then the algorithm expands the image and paints what it has estimated – and the results, as you can see in the image below, are truly astounding.

For the dataset, the researchers used 36,500 images of 256×256 size, which were downsampled to 128×128. 100 images were held out for the validation set.

Even the research paper for outpainting has been written in a user-friendly format. Instead of the usual page after page of theory, the paper is just 2 pages long: one lists how the technique was derived and how it works, and the second contains a list of references. Check out the image of the first page below, which lists a step-by-step approach for designing and executing outpainting:

Wondering how to implement this on your own? Wonder no more – use this GitHub repository as your stepping stone. It is a Keras implementation of outpainting in Python. It gives you the option to either build your model from scratch or use the pretrained model the creator has uploaded. Get started now!

Our take on this

What an awesome concept! If this doesn’t get your interest in computer vision going, I don’t know what will. Take this course to learn all about computer vision with deep learning, and get started on your path toward becoming a CV expert!

For the Keras model, there’s a caveat here – as you’ll read in this Reddit discussion thread, there’s a chance that the Keras model was overfitted. The model was trained on images that were present in the training set itself so it was able to convincingly extrapolate the generated image. The model still did fairly well when tested on unseen data, but not as well as first imagined. But don’t let that dissuade you! The Stanford technique is still solid, and there will be far more refined frameworks coming soon using outpainting. Hope to see one from our Analytics Vidhya community as well!



Facial Landmark Detection Simplified With Opencv

This article was published as a part of the Data Science Blogathon

OpenCV is a cross-platform, open-source library for computer vision, machine learning, and image processing, with which we can develop real-time computer vision applications. It is mainly used for image and video processing and analysis, including object detection, face detection, etc.

Facial landmarks are used to localize and represent important regions of the face, such as:

· Mouth

· Eyes

· Eyebrows

· Nose

· Jawline etc.


Facial landmarks have many applications such as:

Face Replacement:

If we have facial landmark points estimated on two different faces, we can align one face to the other and then seamlessly clone one face onto the other.

Face Morphing:

Facial landmarks can be used to produce in-between images by aligning faces that can be morphed.

Head pose estimation:

Once we know a few facial landmark points, we can also estimate the pose of the head.

MediaPipe Face Mesh

MediaPipe Face Mesh estimates 468 3D face landmarks in real-time even on mobile devices. It requires only a single camera input by applying machine learning (ML) to infer the 3D surface geometry, without the need for a dedicated depth sensor. It delivers better real-time performance.


The model for 3D face landmarks has been employed using transfer learning and it is trained on a network with different objectives: the network predicts 3D landmark coordinates on synthetic rendered data. The resulting network performed reasonably well on real-world data.

The 3D landmark network takes input as a cropped video frame without additional depth input. The model outputs the positions of the 3D points, reasonably aligned in the input.


The Geometry Pipeline is a crucial component. It estimates the face geometry objects within the 3D metric space. On each frame, the following steps are executed in order:

Face landmark screen coordinates are converted into the Metric 3D space coordinates.

The face pose transformation matrix is estimated as a rigid linear mapping from the canonical face metric landmark set into the runtime face metric landmark set in a way that minimizes the difference between the two.

Then the runtime face metric landmarks are used to create a face mesh.

Let’s implement it.

First, let us check that our webcam is working fine and print the frames per second (FPS) on the output screen.

import cv2
import time

cap = cv2.VideoCapture(0)
pTime = 0

while True:
    # Grab the next frame from the webcam
    success, img = cap.read()
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    # FPS is the reciprocal of the time elapsed since the previous frame
    cTime = time.time()
    fps = 1/(cTime-pTime)
    pTime = cTime
    cv2.putText(img, f'FPS:{int(fps)}', (20, 70), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow("Test", img)
    cv2.waitKey(1)

If you have a webcam, this should open a window. Otherwise, instead of zero, you can pass the path to a video file to the ‘VideoCapture’ function. In the top left corner, you can see the (varying) FPS.
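The FPS shown on screen is simply the reciprocal of the time between two consecutive frames. As a small sketch, the calculation can be isolated in a helper that also guards against a zero time difference. The name `fps_from_timestamps` is hypothetical.

```python
def fps_from_timestamps(previous_time, current_time):
    """Frames per second implied by two consecutive frame timestamps (in seconds)."""
    elapsed = current_time - previous_time
    # Two frames with the same timestamp would otherwise cause a division by zero.
    return 1.0 / elapsed if elapsed > 0 else 0.0

print(fps_from_timestamps(10.0, 10.5))   # half a second between frames
print(fps_from_timestamps(10.0, 10.0))   # identical timestamps
```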

Now let’s create a new python file and start creating our module for facial landmark detection.

Install the required modules.

pip install opencv-python
pip install mediapipe

import cv2
import mediapipe as mp
import time

cap = cv2.VideoCapture(0)
pTime = 0
NUM_FACE = 2

# Drawing helpers and the Face Mesh solution from MediaPipe
mpDraw = mp.solutions.drawing_utils
mpFaceMesh = mp.solutions.face_mesh
faceMesh = mpFaceMesh.FaceMesh(max_num_faces=NUM_FACE)
drawSpec = mpDraw.DrawingSpec(thickness=1, circle_radius=1)

In the above code, we take the input from the webcam, and the variable ‘NUM_FACE’ sets the maximum number of faces for which facial landmarks are detected and located in each frame. To draw the facial points we use the ‘mpDraw’ variable. We use ‘mp.solutions.face_mesh’ to create the face mesh. To control the thickness of the connection lines and the points, we use ‘drawSpec’.

while True:
    # Grab the next frame from the webcam
    success, img = cap.read()
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = faceMesh.process(imgRGB)
    if results.multi_face_landmarks:
        for faceLms in results.multi_face_landmarks:
            # FACE_CONNECTIONS is the name used by older MediaPipe releases;
            # newer releases expose FACEMESH_CONTOURS / FACEMESH_TESSELATION instead
            mpDraw.draw_landmarks(img, faceLms, mpFaceMesh.FACE_CONNECTIONS, drawSpec, drawSpec)
            for id, lm in enumerate(faceLms.landmark):
                print(lm)
                # Landmarks are normalized; scale them to pixel coordinates
                ih, iw, ic = img.shape
                x, y = int(lm.x*iw), int(lm.y*ih)
                # cv2.putText(img, str(id), (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0, 255, 0), 1)
                print(id, x, y)
    cTime = time.time()
    fps = 1/(cTime-pTime)
    pTime = cTime
    cv2.putText(img, f'FPS:{int(fps)}', (20,70), cv2.FONT_HERSHEY_SIMPLEX, 1, (0,255,0), 2)
    cv2.imshow("Test", img)
    cv2.waitKey(1)
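The landmark coordinates returned by the face mesh are normalized by the image width and height, which is why the loop multiplies them by the frame size. A minimal sketch of that conversion is shown below. The helper name `to_pixel` is hypothetical, and the clamping is an added precaution for landmarks that fall slightly outside the frame.

```python
def to_pixel(norm_x, norm_y, image_width, image_height):
    """Convert normalized landmark coordinates to integer pixel coordinates."""
    x = int(norm_x * image_width)
    y = int(norm_y * image_height)
    # Clamp to the valid pixel range in case the landmark lies outside the frame.
    x = max(0, min(x, image_width - 1))
    y = max(0, min(y, image_height - 1))
    return x, y

print(to_pixel(0.5, 0.25, 640, 480))
print(to_pixel(1.0, 1.0, 640, 480))
```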

Now in order to create a module, so that we can use it in different projects, first we need to create a class with a function in it.

import cv2
import mediapipe as mp
import time

NUM_FACE = 2

class FaceLandMarks():
    def __init__(self, staticMode=False, maxFace=NUM_FACE, minDetectionCon=0.5, minTrackCon=0.5):
        self.staticMode = staticMode
        self.maxFace = maxFace
        self.minDetectionCon = minDetectionCon
        self.minTrackCon = minTrackCon
        self.mpDraw = mp.solutions.drawing_utils
        self.mpFaceMesh = mp.solutions.face_mesh
        # Keyword arguments avoid depending on the positional order of FaceMesh parameters
        self.faceMesh = self.mpFaceMesh.FaceMesh(
            static_image_mode=self.staticMode,
            max_num_faces=self.maxFace,
            min_detection_confidence=self.minDetectionCon,
            min_tracking_confidence=self.minTrackCon)
        self.drawSpec = self.mpDraw.DrawingSpec(thickness=1, circle_radius=1)

    def findFaceLandmark(self, img, draw=True):
        self.imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.faceMesh.process(self.imgRGB)
        faces = []
        if self.results.multi_face_landmarks:
            for faceLms in self.results.multi_face_landmarks:
                if draw:
                    # FACE_CONNECTIONS is the name used by older MediaPipe releases
                    self.mpDraw.draw_landmarks(img, faceLms, self.mpFaceMesh.FACE_CONNECTIONS, self.drawSpec, self.drawSpec)
                face = []
                for id, lm in enumerate(faceLms.landmark):
                    # print(lm)
                    ih, iw, ic = img.shape
                    x, y = int(lm.x * iw), int(lm.y * ih)
                    #cv2.putText(img, str(id), (x,y), cv2.FONT_HERSHEY_SIMPLEX, 0.3, (0,255,0), 1)
                    #print(id, x, y)
                    face.append([x, y])
                faces.append(face)
        return img, faces

def main():
    cap = cv2.VideoCapture(0)
    pTime = 0
    detector = FaceLandMarks()
    while True:
        success, img = cap.read()
        img, faces = detector.findFaceLandmark(img)
        if len(faces) != 0:
            print(len(faces))
        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime
        cv2.putText(img, f'FPS:{int(fps)}', (20, 70), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("Test", img)
        cv2.waitKey(1)

if __name__ == "__main__":
    main()

Conclusion

In the above code, the method ‘findFaceLandmark’ detects the facial landmarks and does the same job as explained above. The class ‘FaceLandMarks’ takes the static mode, a maximum number of faces, a minimum detection confidence, and a minimum tracking confidence. The main function then runs the code.


My LinkedIn

Thank you

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

