A convolutional neural network (CNN) is a neural network that has at least one convolutional layer. A typical CNN also includes other types of layers, such as pooling layers and dense (fully connected) layers.
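Convolution is the operation of sliding a small matrix of weights, called a filter or kernel, across the input. At each position, the kernel is multiplied element-wise with the patch of the input it covers, and the products are summed to give one value of the output, called a feature map. Because the kernel is smaller than the input, each output value depends only on a small local region (chunk) of the input.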
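As a minimal sketch in pure Python, the function below computes a convolution with stride 1 and no padding. The `conv2d` name, the 4x4 image and the 2x2 kernel are hypothetical choices for illustration. Like the convolutional layers in most deep-learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation.

```python
def conv2d(image, kernel):
    """Stride-1, no-padding 2D convolution (kernel not flipped, as in CNN layers)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply the kernel element-wise with the patch under it, then sum.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        output.append(row)
    return output


image = [
    [1, 2, 3, 0],
    [0, 1, 2, 3],
    [3, 0, 1, 2],
    [2, 3, 0, 1],
]
# Hypothetical 2x2 kernel: adds each pixel to its lower-right neighbour.
kernel = [
    [1, 0],
    [0, 1],
]
feature_map = conv2d(image, kernel)
print(feature_map)  # [[2, 4, 6], [0, 2, 4], [6, 0, 2]]
```

A 4x4 input and a 2x2 kernel give a 3x3 feature map, since the kernel fits in 4 - 2 + 1 = 3 positions along each axis.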
Max pooling is a pooling operation in which each window of values (for example, a 2x2 patch of a feature map) is reduced to a single value by taking the maximum of the values in that window.
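A minimal pure-Python sketch of max pooling, assuming a square window and no padding; the `max_pool2d` name and the sample feature map are hypothetical. With a 2x2 window and stride 2, each non-overlapping 2x2 block collapses to its maximum.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Max pooling with a size x size window, no padding."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect every value inside the current window and keep the largest.
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


fm = [
    [1, 3, 2, 1],
    [4, 6, 5, 0],
    [7, 2, 9, 8],
    [1, 0, 3, 4],
]
print(max_pool2d(fm))  # [[6, 5], [7, 9]]
```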
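Stride is the number of pixels the kernel (filter) moves at each step as it slides across the image. With a stride of 1 the kernel visits every position; with a stride of 2 it skips every other position, so the output is roughly half as wide and half as tall.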
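The effect of stride can be sketched with the standard output-size formula, output = (input + 2 * padding - kernel) // stride + 1. The helper names below are hypothetical, and padding defaults to 0 (no padding) as an assumption.

```python
def output_size(input_size, kernel_size, stride, padding=0):
    """Number of kernel positions along one axis."""
    return (input_size + 2 * padding - kernel_size) // stride + 1


def window_starts(input_size, kernel_size, stride):
    """Starting index of each kernel position along one axis (no padding)."""
    return list(range(0, input_size - kernel_size + 1, stride))


# A 3-wide kernel sliding along a 7-pixel row:
print(window_starts(7, 3, 1))  # [0, 1, 2, 3, 4]
print(window_starts(7, 3, 2))  # [0, 2, 4]
print(output_size(7, 3, 1))  # 5
print(output_size(7, 3, 2))  # 3
```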
The first layer is a convolutional layer with 32 filters of size (3, 3), so it produces 32 feature maps. The second layer is a max-pooling layer with a (2, 2) window and a stride of 2; it halves the height and width of each feature map but keeps the number of feature maps (32) the same. These convolution and pooling layers are repeated, and then a flatten layer reshapes all of the resulting feature maps into a one-dimensional array. This is followed by a dense layer with 128 neurons and ReLU activation, and finally an output layer that gives the 10 class scores as a probability distribution (typically a dense layer of 10 neurons with softmax activation).
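The sketch below traces the tensor shape through this architecture in pure Python and applies softmax to a set of class scores. Several details are assumptions not stated above: a hypothetical 28x28 grayscale input, no padding, exactly two conv + max-pool blocks, and 32 filters in the repeated convolution. In Keras this stack would roughly correspond to `Conv2D(32, (3, 3))`, `MaxPooling2D((2, 2), strides=2)`, `Flatten()`, `Dense(128, activation="relu")` and `Dense(10, activation="softmax")`, but the code below does not use Keras.

```python
import math


def conv_shape(shape, filters, kernel=3):
    h, w, _ = shape
    # No padding, stride 1: each spatial side shrinks by kernel - 1;
    # the channel count becomes the number of filters.
    return (h - kernel + 1, w - kernel + 1, filters)


def pool_shape(shape, size=2, stride=2):
    h, w, c = shape
    # Pooling shrinks height and width but keeps the channel count.
    return ((h - size) // stride + 1, (w - size) // stride + 1, c)


def softmax(scores):
    # Subtract the max for numerical stability; outputs are positive and sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


shape = (28, 28, 1)  # hypothetical input: one 28x28 grayscale image
shapes = [("input", shape)]
for block in range(2):  # assumed: two conv + max-pool blocks
    shape = conv_shape(shape, filters=32)
    shapes.append((f"conv{block + 1}", shape))
    shape = pool_shape(shape)
    shapes.append((f"pool{block + 1}", shape))
flat = shape[0] * shape[1] * shape[2]
shapes.append(("flatten", (flat,)))
shapes.append(("dense_relu", (128,)))
shapes.append(("dense_softmax", (10,)))

for name, s in shapes:
    print(name, s)

# Hypothetical raw scores for the 10 classes.
probs = softmax([2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(round(sum(probs), 6))  # 1.0
```

Under these assumptions the shapes go (28, 28, 1) → (26, 26, 32) → (13, 13, 32) → (11, 11, 32) → (5, 5, 32) → 800 values after flattening, showing that pooling changes only the height and width while the 32 feature maps are carried through.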