Why not use a single RNN
-
There’s is an intuitive interpretation for this: “If you can discriminate a person from his eyes, why should I look at his nose?”
-
So when we cut pictures into different patches and train with different CNNs, actually we are forcing the network to learn every single piece of information.
-
And in practice, we are using this trick.