CNN
How to detect edges in an image
- Convolution: slide a filter over the image; at each position, multiply the elements of the filter and of the image patch that are in corresponding positions, then sum the products (in Python, use function: conv_forward)
- Why we use convolution: the filter amplifies the differences between neighboring elements (vertical or horizontal) of the matrix
- It detects changes in the intensity of an image (reading left to right, for a vertical-edge filter whose left column is positive):
  - if the result of the convolution is positive, the intensity goes from bright to dark
  - if the result of the convolution is negative, the intensity goes from dark to bright
Which filter to use is still debated:
- Sobel filter: it puts more weight on the central row, which makes it more robust. $$\left[\begin{array}{ccc} 1 & 0 & -1\\ 2 & 0 & -2 \\ 1 & 0 & -1\end{array}\right]$$
- Scharr filter: also used in previous studies. $$\left[\begin{array}{ccc} 3 & 0 & -3\\ 10 & 0 & -10 \\ 3 & 0 & -3\end{array}\right]$$
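A minimal NumPy sketch of the edge-detection step with these two filters. The helper name `conv2d_valid` and the 6×6 test image are illustrative assumptions, not the `conv_forward` function mentioned earlier; the loop is the plain sliding multiply-and-sum with no padding and a step of one.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Cross-correlate a 2-D image with a 2-D kernel (no padding, stride 1)."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Multiply the patch and the kernel element by element, then sum.
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out

# Hypothetical test image: bright (10) on the left half, dark (0) on the right.
image = np.array([[10, 10, 10, 0, 0, 0]] * 6, dtype=float)

sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
scharr = np.array([[3, 0, -3], [10, 0, -10], [3, 0, -3]], dtype=float)

bright_to_dark = conv2d_valid(image, sobel)           # positive at the edge
dark_to_bright = conv2d_valid(image[:, ::-1], sobel)  # negative at the edge
scharr_edges = conv2d_valid(image, scharr)            # same edge, larger response
```

Only the two middle columns of each result, where the 3×3 window straddles the bright/dark boundary, are non-zero.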
We can also treat the elements of the matrix, in other words the weights of the filter, as parameters and learn them with backpropagation, which lets them capture the statistics of the data better than any of these hand-coded filters. $$\left[\begin{array}{ccc} w_1 & w_2 & w_3\\ w_4 & w_5 & w_6 \\ w_7 & w_8 & w_9\end{array}\right]$$
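As a rough illustration of learning the nine weights, the sketch below fits a 3×3 filter by gradient descent so that its output matches that of a Sobel filter on a random image. The toy setup (random 16×16 image, squared-error loss, fixed learning rate, the name `apply_filter`) is an assumption made for the example; a real network would learn the weights from a task loss through a deep learning framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
image = rng.random((16, 16))                      # pixel values in [0, 1)
sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)

# All 3x3 patches of the image, shape (14, 14, 3, 3).
patches = sliding_window_view(image, (3, 3))

def apply_filter(weights):
    """Multiply each patch by the filter element-wise and sum (stride 1, no padding)."""
    return np.einsum("ijab,ab->ij", patches, weights)

target = apply_filter(sobel)      # output of the hand-coded filter

weights = np.zeros((3, 3))        # w_1 ... w_9, the parameters to learn
learning_rate = 0.001
losses = []
for step in range(3000):
    error = apply_filter(weights) - target
    losses.append(0.5 * np.sum(error ** 2))
    # Gradient of the loss w.r.t. each weight: sum over positions of
    # (output error) * (pixel that the weight was multiplied with).
    grad = np.einsum("ijab,ij->ab", patches, error)
    weights -= learning_rate * grad
```

In this noiseless toy setup the learned weights should drift toward the Sobel values; with real data the learned filter need not resemble any hand-coded one.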
Padding
- Some shortcomings of plain convolution:
  - Shrinking the image: when you apply an $$f \times f$$ filter to an $$n \times n$$ matrix, the output is an $$(n-f+1) \times (n-f+1)$$ matrix
  - Information loss at the edges: pixels near the edge of the image are covered by few filter positions, so a lot of the information there is thrown away
- p = padding, the number of extra pixels added around the image (p = 1 means adding one extra pixel all around the original matrix)
- The output is an $$(n+2p-f+1) \times (n+2p-f+1)$$ matrix
- Two common types of convolutions
  - Valid: no padding
  - Same: pad so that the output size is the same as the input size
    - let $$n+2p-f+1 = n$$, then we get $$p = \frac{f-1}{2}$$
    - here, $$f$$ is usually odd.
      - an odd-sized filter needs no asymmetric padding; e.g., a $$4 \times 4$$ filter would need its padding pixels split unevenly between the sides (2 on the left and 1 on the right, or vice versa)
      - an odd-sized filter has a central position, which is a useful reference point (the so-called 'central pixel')
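A small sketch of the padding arithmetic, assuming an n×n input, an f×f filter and a step of one, so that the output side is n + 2p − f + 1. The helper names `output_size` and `same_padding` are hypothetical; `np.pad` is used for the zero padding itself.

```python
import numpy as np

def output_size(n, f, p=0):
    """Side length of the output for an n x n input, f x f filter, padding p (stride 1)."""
    return n + 2 * p - f + 1

def same_padding(f):
    """Padding that keeps the output size equal to the input size; needs odd f."""
    if f % 2 == 0:
        raise ValueError("an even-sized filter would need asymmetric padding")
    return (f - 1) // 2

image = np.arange(36, dtype=float).reshape(6, 6)

valid_size = output_size(6, 3)        # "valid": no padding, the image shrinks
p = same_padding(3)
same_size = output_size(6, 3, p)      # "same": output matches the input size

# np.pad with a scalar pad_width adds p zeros on every side of every axis.
padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
```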
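Strided convolution
- Sometimes we move the filter by more than one step at a time
- In that case:
  - s = the stride, i.e. the number of elements the filter moves per step; e.g., s = 2 means the filter jumps 2 elements between positions
  - the output is a $$\left(\frac{n+2p-f}{s}+1\right) \times \left(\frac{n+2p-f}{s}+1\right)$$ matrix
  - when the result is not an integer, we round it down: $$\left\lfloor \frac{n+2p-f}{s}+1 \right\rfloor$$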
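A hedged sketch of a strided convolution, assuming zero padding p and stride s and an output side of ⌊(n + 2p − f)/s + 1⌋. The function name `conv2d_strided` and the all-ones test inputs are illustrative choices only.

```python
import numpy as np

def conv2d_strided(image, kernel, p=0, s=1):
    """Cross-correlate with zero padding p and stride s."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    padded = np.pad(image, pad_width=p, mode="constant", constant_values=0)
    # Integer division implements the "round down" rule.
    out_h = (n_h + 2 * p - f_h) // s + 1
    out_w = (n_w + 2 * p - f_w) // s + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * s, j * s   # the window jumps s elements per step
            out[i, j] = np.sum(padded[top:top + f_h, left:left + f_w] * kernel)
    return out

kernel = np.ones((3, 3))
out_stride2 = conv2d_strided(np.ones((7, 7)), kernel, p=0, s=2)   # (7-3)/2 + 1 = 3
out_rounded = conv2d_strided(np.ones((6, 6)), kernel, p=0, s=2)   # floor(2.5) = 2
out_padded = conv2d_strided(np.ones((7, 7)), kernel, p=1, s=2)    # (7+2-3)/2 + 1 = 4
```

With a 6×6 input the last window position would run past the edge, so it is simply dropped; that is the effect of rounding down.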
Notes:
- In most of the deep learning literature, cross-correlation is called convolution
- BUT in the strict mathematical definition of convolution, we need to flip the filter matrix both horizontally and vertically (a 180° rotation, not a transpose) before multiplying the corresponding elements of the filter and the target matrix
- With this flip, convolution is associative: $$(A * B) * C = A * (B * C)$$, which is important in signal processing
- However, to simplify the code, we just skip the flip
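The difference between the two operations can be sketched as follows. The 2-D helpers `cross_correlate2d` and `convolve2d` are hypothetical names written for this example; the 1-D associativity check relies on NumPy's own `np.convolve` and `np.correlate`, and the test vectors are arbitrary.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """What deep learning code usually calls 'convolution': no flip."""
    n_h, n_w = image.shape
    f_h, f_w = kernel.shape
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + f_h, j:j + f_w] * kernel)
    return out

def convolve2d(image, kernel):
    """Strict convolution: flip the kernel on both axes, then cross-correlate."""
    return cross_correlate2d(image, np.flip(kernel))

image = np.arange(25, dtype=float).reshape(5, 5)
sobel = np.array([[1, 0, -1], [2, 0, -2], [1, 0, -1]], dtype=float)
corr = cross_correlate2d(image, sobel)
conv = convolve2d(image, sobel)

# 1-D check: full convolution is associative, full cross-correlation is not.
a = np.array([1.0, 2.0, 3.0])
b = np.array([0.0, 1.0, 0.5])
c = np.array([2.0, 1.0, 0.0])
conv_left = np.convolve(np.convolve(a, b), c)
conv_right = np.convolve(a, np.convolve(b, c))
corr_left = np.correlate(np.correlate(a, b, mode="full"), c, mode="full")
corr_right = np.correlate(a, np.correlate(b, c, mode="full"), mode="full")
```

For the Sobel filter the flip simply negates the kernel, so `conv` and `corr` differ only in sign.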
Convolutions on RGB images
- Usually, color images are represented as an $$n \times n \times n_c$$ matrix, and a filter as an $$f \times f \times n_c$$ matrix. $$n_c$$ is the number of channels of the image (RGB images have 3 channels)
- Calculation:
  - just like 2-dimensional convolution, we multiply the elements of the target matrix and the filter matrix that are in corresponding positions within corresponding channels.
  - e.g., for the red channel, we convolve the red matrix of the target with the red matrix of the filter. Then we do the same for green, and so on. Finally, we add up the values of all the multiplications.
- Which could be represented as following:
  - (h, w) represents the pixel position;
  - c represents the color channel of the image;
  - i and j are the row and column of the filter kernel;
  - d is the index of the color channel
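One plausible reading of these symbols is the triple sum $$\text{out}(h, w) = \sum_{i}\sum_{j}\sum_{d} X(h+i,\, w+j,\, d)\, K(i, j, d)$$, where $$X$$ (image) and $$K$$ (filter) are names assumed here for illustration. A minimal NumPy sketch under that assumption, with a hypothetical helper `conv_rgb`:

```python
import numpy as np

def conv_rgb(image, kernel):
    """Multi-channel cross-correlation: sum over filter rows, columns and channels."""
    n_h, n_w, n_c = image.shape
    f_h, f_w, k_c = kernel.shape
    if n_c != k_c:
        raise ValueError("image and filter must have the same number of channels")
    out = np.zeros((n_h - f_h + 1, n_w - f_w + 1))
    for h in range(out.shape[0]):
        for w in range(out.shape[1]):
            # The patch has shape (f_h, f_w, n_c); all channels collapse into one number.
            out[h, w] = np.sum(image[h:h + f_h, w:w + f_w, :] * kernel)
    return out

# Hypothetical image: a vertical edge in the red channel only.
image = np.zeros((6, 6, 3))
image[:, :3, 0] = 10.0

# Filter that looks for vertical edges in red and ignores green and blue.
red_edge_filter = np.zeros((3, 3, 3))
red_edge_filter[:, :, 0] = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

red_edges = conv_rgb(image, red_edge_filter)
all_ones = conv_rgb(np.ones((6, 6, 3)), np.ones((3, 3, 3)))
```

Note that the output is a 2-D matrix: one filter produces one channel, however many channels the input has.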
Layers of Convolutions
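- if we want to stack multiple convolution layers, each with its own filters, to detect the features of the image, we may define:
  - $$f^{[l]}$$ = filter size; $$p^{[l]}$$ = padding; $$s^{[l]}$$ = stride; $$n_c^{[l]}$$ = number of filters ($$l$$ is the layer)
- Input could be: $$n_H^{[l-1]} \times n_W^{[l-1]} \times n_c^{[l-1]}$$
- Output could be: $$n_H^{[l]} \times n_W^{[l]} \times n_c^{[l]}$$, which is the activations $$a^{[l]}$$
- the number of filters should be:
- We could also remark it as following: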
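To see how the sizes propagate from layer to layer, here is a small sketch that only tracks shapes. It assumes each output side is ⌊(n + 2p − f)/s + 1⌋, that each filter spans all channels of its input, and that the number of output channels equals the number of filters; the function name `conv_output_shape` and the three-layer configuration are made up for the example.

```python
def conv_output_shape(input_shape, f, p, s, n_filters):
    """Return (output shape, shape of one filter) for a convolution layer."""
    n_h, n_w, n_c_prev = input_shape
    out_h = (n_h + 2 * p - f) // s + 1   # integer division rounds down
    out_w = (n_w + 2 * p - f) // s + 1
    filter_shape = (f, f, n_c_prev)      # a filter covers every input channel
    return (out_h, out_w, n_filters), filter_shape

# Hypothetical stack of three layers applied to a 39 x 39 RGB image.
layers = [
    dict(f=3, p=0, s=1, n_filters=10),
    dict(f=5, p=0, s=2, n_filters=20),
    dict(f=5, p=0, s=2, n_filters=40),
]

shape = (39, 39, 3)
shapes = [shape]
filter_shapes = []
for layer in layers:
    shape, filter_shape = conv_output_shape(shape, **layer)
    shapes.append(shape)
    filter_shapes.append(filter_shape)
```

The spatial size shrinks while the channel count grows, because each layer's channel count is set by its number of filters.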