Neural network interpretability is a young field without standardized terminology. The concept of attribution, for example, has gone by a variety of names in existing research, including “feature visualization”, though recent studies appear to be converging on terms like “attribution” and “saliency maps”. In this talk, we will examine the enemy of feature visualization: noisy images full of nonsensical high-frequency patterns to which the network nonetheless responds strongly. To suppress this high-frequency noise, we will survey three regularization strategies: frequency penalization, transformation robustness, and learning a prior. We will conclude with an alternative approach: reducing the high frequencies in the gradient itself, rather than in the visualization.
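As a toy illustration of that concluding idea, the sketch below runs gradient ascent on a hypothetical “neuron” (a plain dot product with a noisy random filter, standing in for a real network activation) twice: once with the raw gradient, and once with the gradient low-pass filtered by a box blur before each step. The neuron, filter, step size, and blur kernel are all illustrative assumptions, not part of the talk itself; the point is simply that smoothing the gradient yields a visibly less noisy visualization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "neuron": a linear unit whose preferred stimulus is a noisy
# random filter, standing in for a real network activation.
w = rng.normal(size=(32, 32))

def activation(x):
    return float((w * x).sum())

def box_blur(g, k=3):
    # Average each pixel with its k x k neighbourhood: a crude low-pass
    # filter that suppresses the high frequencies in the gradient.
    pad = k // 2
    gp = np.pad(g, pad, mode="edge")
    out = np.zeros_like(g)
    for dy in range(k):
        for dx in range(k):
            out += gp[dy:dy + g.shape[0], dx:dx + g.shape[1]]
    return out / (k * k)

def total_variation(x):
    # Sum of absolute neighbour differences: a rough measure of
    # high-frequency content in the image.
    return float(np.abs(np.diff(x, axis=0)).sum()
                 + np.abs(np.diff(x, axis=1)).sum())

x_plain = np.zeros((32, 32))
x_smooth = np.zeros((32, 32))
for _ in range(100):
    g = w                          # gradient of the linear activation w.r.t. x
    x_plain += 0.1 * g             # vanilla ascent: inherits the noise in w
    x_smooth += 0.1 * box_blur(g)  # ascent with a low-pass-filtered gradient

print("TV plain :", total_variation(x_plain))
print("TV smooth:", total_variation(x_smooth))
```

Both images drive the neuron strongly, but the blurred-gradient result has far less total variation, i.e. far less high-frequency noise. Real feature-visualization pipelines get the same effect more elegantly by parameterizing the image in a decorrelated frequency space rather than blurring explicitly.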
Attachment | Size
---|---
Denoising Learned Features May 12.pdf | 20.01 MB