CNN

#node #algorithm #deepLearning #machineLearning

Node: CNN

Convolution: multiplying the elements of two matrices that are in corresponding positions and summing the products (in Python, use function: conv_forward)
- Why we use convolution: the filter amplifies the differences between neighbouring elements (vertical or horizontal) of a matrix
- It detects changes in the intensity of an image (for a vertical-edge filter whose left column is positive):
  - if the result of the convolution is positive, the intensity of the image goes from bright to dark
  - if the result of the convolution is negative, the intensity of the image goes from dark to bright
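
The sign rule above can be checked with a minimal sketch. It uses plain Python lists, and the names `conv2d_valid`, `vertical_edge`, `bright_to_dark` and `dark_to_bright` are illustrative, not taken from any library. It assumes no padding, stride 1, and no kernel flip (the deep learning convention).

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1, no flip) and
    sum the element-wise products at every position."""
    n_rows, n_cols = len(image), len(image[0])
    f_rows, f_cols = len(kernel), len(kernel[0])
    out = []
    for r in range(n_rows - f_rows + 1):
        out_row = []
        for c in range(n_cols - f_cols + 1):
            total = 0
            for i in range(f_rows):
                for j in range(f_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            out_row.append(total)
        out.append(out_row)
    return out


# Vertical-edge filter: positive left column, negative right column.
vertical_edge = [[1, 0, -1],
                 [1, 0, -1],
                 [1, 0, -1]]

# 6 x 6 toy images: bright (10) on one half, dark (0) on the other.
bright_to_dark = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
dark_to_bright = [[0, 0, 0, 10, 10, 10] for _ in range(6)]

print(conv2d_valid(bright_to_dark, vertical_edge)[0])  # [0, 30, 30, 0]
print(conv2d_valid(dark_to_bright, vertical_edge)[0])  # [0, -30, -30, 0]
```

The output is zero where the window sees a flat region and non-zero only where it straddles the edge; the sign tells the direction of the change.
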
Which filter to use is still debated:
- Sobel filter: it puts more weight on the central row, which makes it more robust. $$\left[\begin{array}{ccc} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1\end{array}\right]$$
- Scharr filter: also used in previous studies. $$\left[\begin{array}{ccc} 3 & 0 & -3 \\ 10 & 0 & -10 \\ 3 & 0 & -3\end{array}\right]$$
We could instead treat the elements of the matrix, in other words the weights of the filter, as parameters and learn them with backprop, which makes them better at capturing the statistics of the data than any of these hand-coded filters. $$\left[\begin{array}{ccc} w_1 & w_2 & w_3 \\ w_4 & w_5 & w_6 \\ w_7 & w_8 & w_9\end{array}\right]$$

Some drawbacks of using convolution:
- Shrinking the image: when you apply an $f \times f$ filter to an $n \times n$ matrix, the output is an $(n - f + 1) \times (n - f + 1)$ matrix
- Information loss at the edges: pixels near the edge are covered by far fewer filter windows than central pixels, so much of the information near the edge of the image is thrown away
Padding addresses both: p = the number of extra pixels padded around the image (if p = 1, one extra pixel is added on every side of the original matrix)
The output is then an $(n - f + 2 p + 1) \times (n - f + 2 p + 1)$ matrix
Two common types of convolution:
- Valid: means no padding
- Same: pad so that the output size is the same as the input size
  - let $n - f + 2 p + 1 = n$, then we get $p = (f - 1) / 2$
  - here, $f$ is by convention almost always odd.
    - an odd filter size needs no asymmetric padding; e.g., a $4 \times 4$ filter needs unbalanced padding on each axis (2 on the left and 1 on the right, or vice versa)
    - an odd filter size has a central position, which is a useful reference point (the so-called 'central pixel')
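
A minimal sketch of zero padding and the "same" rule, assuming stride 1 and plain Python lists. The helper names `zero_pad`, `same_padding` and `output_size` are made up for this note.

```python
def zero_pad(image, p):
    """Surround a 2-D image with p rows/columns of zeros on every side."""
    width = len(image[0]) + 2 * p
    padded = [[0] * width for _ in range(p)]          # top border
    for row in image:
        padded.append([0] * p + list(row) + [0] * p)  # left/right border
    padded.extend([0] * width for _ in range(p))      # bottom border
    return padded


def same_padding(f):
    """p = (f - 1) / 2, defined here only for odd f (symmetric padding)."""
    if f % 2 == 0:
        raise ValueError("symmetric 'same' padding needs an odd filter size")
    return (f - 1) // 2


def output_size(n, f, p=0):
    """Side length of the output for stride 1: n - f + 2p + 1."""
    return n - f + 2 * p + 1


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
padded = zero_pad(image, same_padding(3))
print(len(padded), len(padded[0]))          # 5 5
print(output_size(6, 3))                    # 4  ("valid")
print(output_size(6, 3, same_padding(3)))   # 6  ("same")
```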

Sometimes we need to move the filter by more than one step at a time (strided convolution)
In that case:
- s = the stride; e.g., s = 2 means the filter moves 2 elements at a time
- the output is a $(\frac{n - f + 2 p}{s} + 1) \times (\frac{n - f + 2 p}{s} + 1)$ matrix
- when the result is not an integer, we round it down: $\lfloor \frac{n - f + 2 p}{s} + 1 \rfloor \times \lfloor \frac{n - f + 2 p}{s} + 1 \rfloor$
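
The rounded-down size formula can be sanity-checked against a brute-force count of the positions the filter can occupy along one axis. This is a small sketch; `conv_output_size` and `count_window_positions` are illustrative names.

```python
def conv_output_size(n, f, p=0, s=1):
    """Output side length: floor((n - f + 2p) / s) + 1."""
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    return (n + 2 * p - f) // s + 1


def count_window_positions(n, f, p=0, s=1):
    """Count the filter's start positions along one axis by enumerating them."""
    return len(range(0, n + 2 * p - f + 1, s))


print(conv_output_size(7, 3, p=0, s=2))  # 3
print(conv_output_size(6, 3, p=0, s=2))  # 2  (3/2 + 1 = 2.5, rounded down)
print(conv_output_size(6, 3, p=1, s=1))  # 6  ("same" convolution)
```

Rounding down corresponds to dropping any window that would hang over the edge of the (padded) image.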

Notes:

In most deep learning literature, cross-correlation is called convolution
BUT a true (mathematical) convolution has one more step: the filter matrix is flipped both horizontally and vertically (a 180° rotation, not a transpose) before multiplying corresponding elements of the filter and the target matrix
True convolution is associative: $(A * B) * C = A * (B * C)$, which is important in signal processing
However, to simplify the code, deep learning implementations just skip the flip; since the filter weights are learned, this makes no practical difference
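
To make the difference concrete, here is a hedged sketch comparing cross-correlation (what deep learning calls convolution) with true convolution, which flips the kernel along both axes first. All function names are illustrative; no padding and stride 1 are assumed.

```python
def cross_correlate(image, kernel):
    """Slide the kernel as-is over the image and sum element-wise products."""
    f = len(kernel)
    rows = len(image) - f + 1
    cols = len(image[0]) - f + 1
    return [[sum(image[r + i][c + j] * kernel[i][j]
                 for i in range(f) for j in range(f))
             for c in range(cols)]
            for r in range(rows)]


def flip_kernel(kernel):
    """Flip vertically and horizontally (180-degree rotation)."""
    return [row[::-1] for row in kernel[::-1]]


def convolve(image, kernel):
    """True convolution: flip the kernel, then cross-correlate."""
    return cross_correlate(image, flip_kernel(kernel))


image = [[10, 10, 10, 0, 0, 0] for _ in range(6)]
vertical_edge = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]
symmetric = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]

print(cross_correlate(image, vertical_edge)[0])  # [0, 30, 30, 0]
print(convolve(image, vertical_edge)[0])         # [0, -30, -30, 0]
# A kernel that equals its own 180-degree rotation gives identical results.
print(cross_correlate(image, symmetric) == convolve(image, symmetric))  # True
```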

Usually, colour images are represented as an $n \times n \times c$ matrix, and a filter as an $f \times f \times c$ matrix.
$c$ is the number of channels of the image (RGB images have 3 channels)
Calculation:
- just like 2-dimensional convolution, we multiply the elements of the target matrix and the filter matrix that are in corresponding positions within corresponding channels.
- e.g., we convolve the red channel of the target with the red channel of the filter, then do the same for green and so on. Finally, we add up all the products into a single value.
- This can be written as the formula below, where:
  - (h, w) is the pixel position;
  - c is the index of the output channel, i.e., which filter is applied (not the channel count $c$ above);
  - i and j are the row and column within the filter kernel, and K is the filter size (the $f$ above);
  - d is the index of the input colour channel (0 to 2 for RGB)

$$\text{output}(h, w, c) = \sum_{i = 0}^{K - 1} \sum_{j = 0}^{K - 1} \sum_{d = 0}^{2} \text{input}(h + i, w + j, d) \times \text{filter}(i, j, d, c)$$
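
The formula can be transcribed almost literally into nested loops. This sketch assumes `inp[h][w][d]` and `filt[i][j][d][c]` are nested Python lists indexed exactly as in the formula, with no padding and stride 1. The function name `conv_multichannel` and the toy data are made up for illustration.

```python
def conv_multichannel(inp, filt):
    """output[h][w][c] = sum over i, j, d of inp[h+i][w+j][d] * filt[i][j][d][c]."""
    height, width, depth = len(inp), len(inp[0]), len(inp[0][0])
    k = len(filt)                 # filter size K
    n_out = len(filt[0][0][0])    # number of output channels (filters)
    out = [[[0] * n_out for _ in range(width - k + 1)]
           for _ in range(height - k + 1)]
    for h in range(height - k + 1):
        for w in range(width - k + 1):
            for c in range(n_out):
                out[h][w][c] = sum(
                    inp[h + i][w + j][d] * filt[i][j][d][c]
                    for i in range(k)
                    for j in range(k)
                    for d in range(depth)
                )
    return out


# 3 x 3 RGB image in which every pixel is (R, G, B) = (1, 2, 3).
inp = [[[1, 2, 3] for _ in range(3)] for _ in range(3)]

# Two 2 x 2 x 3 filters stored as filt[i][j][d][c]:
#   c = 0 sums all three colour channels, c = 1 looks at red (d = 0) only.
filt = [[[[1, 1 if d == 0 else 0] for d in range(3)]
         for _ in range(2)]
        for _ in range(2)]

out = conv_multichannel(inp, filt)
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
print(out[0][0])                              # [24, 4]
```

Each filter collapses all input channels into one number per position, so the output depth equals the number of filters, not the number of input channels.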

$$F^{(i)} = \sigma\left(\sum_{j = 1}^{k^{(i - 1)}} W_{j}^{(i)} * F^{(i - 1)}\right)$$