Abstract. Fisher Vector (FV) and deep Convolutional Neural Network (CNN) are two popular approaches for extracting effective image representations. FV aggregates local information (e.g., SIFT) and was the state of the art before the recent success of deep learning approaches. Recently, the combination of FV and CNN has been investigated. However, only the aggregation of SIFT has been tested. In this work, we propose combining CNN with an FV built upon binary local features, called BMM-FV. The results show that combining BMM-FV with CNN improves the latter's retrieval performance with less computational effort than
the traditional FV, which relies on non-binary features.