Notes on “Deep learning face representation by joint identification-verification (2014)”

Why not use a single CNN

  • There is an intuitive interpretation for this: “If you can discriminate a person from his eyes, why should I look at his nose?”

  • So when we cut the picture into different patches and train a separate CNN on each patch, we are actually forcing the networks to learn from every single piece of information.

  • And in practice, this is the trick that gets used: one CNN per patch.
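A minimal sketch of the patch idea in the bullets above, just to make the data flow concrete. Everything here is an assumption for illustration: the patch boxes, the `(top, left, height, width)` layout, and the `mean_intensity` function, which is only a toy stand-in for a trained per-patch CNN. It is not the paper's network or its patch selection.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, height, width), hypothetical layout
Extractor = Callable[[Image], List[float]]


def crop(image: Image, box: Box) -> Image:
    """Return the sub-image covered by `box`."""
    top, left, height, width = box
    return [row[left:left + width] for row in image[top:top + height]]


def mean_intensity(patch: Image) -> List[float]:
    """Toy stand-in for a per-patch CNN: returns a one-element feature vector."""
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels)]


def extract_features(
    image: Image,
    boxes: Sequence[Box],
    extractors: Sequence[Extractor],
) -> List[float]:
    """Run one extractor per patch and concatenate the per-patch features."""
    if len(boxes) != len(extractors):
        raise ValueError("need exactly one extractor per patch")
    features: List[float] = []
    for box, extractor in zip(boxes, extractors):
        # Each extractor only ever sees its own patch, so it cannot rely on
        # a more discriminative region elsewhere in the face.
        features.extend(extractor(crop(image, box)))
    return features


# A 4x4 dummy image with two hypothetical patches (top-left, bottom-right).
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
boxes = [(0, 0, 2, 2), (2, 2, 2, 2)]
features = extract_features(image, boxes, [mean_intensity, mean_intensity])
```

The point of the design is the one-extractor-per-patch pairing: a single model over the whole face is free to lean on the eyes alone, while a model that only receives the nose patch has to get what it can from the nose.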

