Si Wang

and 3 more

Over the last decade, the cost of designing and training a deep neural network (DNN) model has escalated to the point where it invites theft and misappropriation of DNN model intellectual property (IP). Recent advances in model extraction and side-channel attacks have further reduced the cost of illegally reverse engineering a deployed DNN model. Existing countermeasures either require white-box access to the suspect model for ownership verification or lack buyer traceability in the black-box setting. This paper presents a novel black-box tripartite verifiable DNN IP protection scheme that not only allows the model owner (i.e., the first party) or any trusted third party (TTP) to verify model ownership and trace the buyers of distributed models, but also enables buyers (i.e., the second party) to verify the authenticity of a received model IP, which guards against malicious model substitution during delivery and against a fake model exploiting a reputable brand name. This is made possible by combining source-specific backdoor-based watermarking with adversarial fingerprinting. The proposed watermarking uses a dual-key trigger to bind an image source class with a composite backdoor feature. This composite backdoor feature is created by applying a different data augmentation (DA) method to each color channel of an image. The large number of distinct accuracy-preserving model instances needed for buyer traceability is obtained by embedding multiple backdoors with different combinations of dirty labels through fine-tuning a common clean host model. The trigger is embedded by content-adaptive interpolation to enhance its stealth and triggering efficiency. Adversarial examples generated from the host model serve as a common fingerprint for all watermarked model instances derived from it, providing proof of authenticity by transferability. Six series of 30 watermarked instances are evaluated. 
Each series is generated from either ResNet-18 or GoogleNet host model trained on CIFAR-10, GTSRB or ImageNet-10. They are evaluated on worst-case watermarked model accuracy, piracy detection accuracy, traitor traceability and robustness against nine types of attacks to demonstrate the superiority of the proposed scheme over state-of-the-art DNN IP protection schemes.
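
As an illustration of the composite backdoor feature described above, the following minimal sketch applies a different transformation to each color channel of an image. It is not the scheme's implementation: the three augmentations (horizontal flip, vertical flip, transpose), the function names, and the list-of-tuples image layout are all assumptions chosen for brevity, since the abstract does not specify which DA methods are used.

```python
# Hypothetical sketch: build a composite feature by augmenting each
# colour channel differently. The augmentations here are illustrative
# assumptions, not the DA methods used in the paper.

def split_channels(image):
    """Split an image (rows of (r, g, b) tuples) into three 2-D channels."""
    return [[[px[c] for px in row] for row in image] for c in range(3)]

def flip_horizontal(channel):
    """Mirror each row left to right."""
    return [row[::-1] for row in channel]

def flip_vertical(channel):
    """Reverse the order of the rows."""
    return [list(row) for row in channel[::-1]]

def transpose(channel):
    """Swap rows and columns (keeps the shape only for square inputs)."""
    return [list(col) for col in zip(*channel)]

def composite_feature(image,
                      augmentations=(flip_horizontal, flip_vertical, transpose)):
    """Apply one augmentation per colour channel and re-interleave the result."""
    height, width = len(image), len(image[0])
    if height != width:
        raise ValueError("this sketch assumes a square image because of transpose")
    if len(augmentations) != 3:
        raise ValueError("exactly one augmentation per colour channel is expected")
    channels = split_channels(image)
    augmented = [aug(ch) for aug, ch in zip(augmentations, channels)]
    return [
        [tuple(augmented[c][y][x] for c in range(3)) for x in range(width)]
        for y in range(height)
    ]

# A tiny 2x2 RGB image used only to exercise the function.
demo_image = [[(0, 1, 2), (3, 4, 5)],
              [(6, 7, 8), (9, 10, 11)]]
demo_output = composite_feature(demo_image)
```

Because each channel receives its own transformation, the resulting pattern depends on the combination of augmentations, so different combinations yield different composite features.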