Understanding the nature of hand-drawn sketches is challenging due to the wide variation in their creation. Federico et al. [10] demonstrated that recognizing complex structural patterns enhances both sketch recognition and generation. Building on this foundation, we explore how the extracted features can also be leveraged for hand-drawn sketch retrieval. In this work, we extend ViSketch-GPT, a multiscale context extraction model originally designed for classification and generation, to the task of retrieval. The model’s ability to capture intricate details at multiple scales allows it to learn highly discriminative representations, making it well-suited for retrieval applications. Through extensive experiments on the QuickDraw and TU-Berlin datasets, we show that ViSketch-GPT surpasses state-of-the-art methods in sketch retrieval, achieving substantial improvements across multiple evaluation metrics. These results indicate that the extracted feature representations are also highly effective for retrieval tasks. This highlights ViSketch-GPT as a versatile and powerful framework for various applications in computer vision and sketch analysis.

