Discussion

Direct effect of attribution fragments on the corresponding property tasks. The experimental results above demonstrate the reliability of the attribution fragments in two ways. First, from a statistical perspective, we obtained a total of 365 attribution fragments across all forty-two property tasks. For each fragment, the difference between its probabilities of occurrence in the new positive and negative datasets indicates its relevance to the property task. We found that about 90% of the attribution fragments are positively relevant to their corresponding property tasks. Second, six classic side-effect property tasks were verified at the mechanistic level. For several positive samples randomly selected in each task, we searched the literature for the exact substructures responsible and compared them with the obtained attribution fragments. The comparison (Fig. 3 and Fig. S2-S21 in the Supporting Information) showed that the attribution method yields accurate fragments with high confidence, which provides useful guidance for experts. Overall, the most important implication of this attribution discovery is that fragments specific to a particular property task can be generalized to improve many downstream tasks. Examples include generating molecules with specific functions in molecular design, and serving as prior knowledge to improve the accuracy of property prediction, thereby reducing the attrition rate in the drug discovery process, shortening the cycle of synthesis and testing, and designing molecules objectively to reduce human-induced bias.
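The relevance measure described in this paragraph can be written compactly. The notation below is our own shorthand and an assumption about how "probability of occurrence" is computed: \(D^{+}\) and \(D^{-}\) denote the new positive and negative datasets, \(f \subseteq m\) means that fragment \(f\) occurs in molecule \(m\), and \(r(f)\) is the relevance of \(f\).

\[
P(f \mid D) = \frac{\left|\{\, m \in D : f \subseteq m \,\}\right|}{|D|},
\qquad
r(f) = P(f \mid D^{+}) - P(f \mid D^{-})
\]

Under this reading, a fragment is positively relevant to its property task when \(r(f) > 0\), which is the case for about 90% of the 365 attribution fragments.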
Meanwhile, we also found that the specific fragments reveal internal relationships between property tasks, which the transferability between property tasks has confirmed. These relationships can be used to make a unified judgment on closely related tasks. In the section above, we discussed the close relationship between the "NR-AR" and "Endocrine disorders" property tasks (orange ball numbered "3" and yellow ball numbered "27" in Fig. 4B), which has been described in the literature [31\cite{bib33}]. In addition, the cytochrome P450 enzyme system (CYP450) in the liver is the main enzyme system for metabolizing drugs in the body, so the liver is closely tied to the body's metabolic system [34\cite{bib34}]; this corresponds to the "Hepatobiliary disorders" and "Metabolism and nutrition disorders" property tasks (yellow balls numbered "15" and "16"). Another case is that estrogen receptors play an important role in pregnancy and may lead to tumor diseases such as uterine cancer and breast cancer [35\cite{bib35}]; the relationship between "NR-ER" and "Pregnancy, puerperium and perinatal conditions" also appears in the relation map (orange ball numbered "7" and yellow ball numbered "37"). This relation map can guide faster exploration of the relationship between drug molecules and properties. For example, if a new drug has a strong effect on a certain property, more attention should be paid to the properties closely related to it, because the drug molecule has a high probability of having a similar effect on them. The property relation map can also guide higher-performance model transfer, thereby promoting the development of "AI & property prediction".
Advantages of our fragment-based method compared to other methods. We first compare the atom-based attribution method with our fragment-based attribution method. For six classical task cases, Fig. 3C and Fig. S2-S21 (Supporting Information) show the structures given in the pharmacological literature (that is, the "Ground Truth" fragments that activate the related property), the attribution results of the atom-based method, and the results of the fragment-based method. In general, our method obtains more accurate results than the atom-based method, which agrees with the basic fact that molecular properties are closely related to specific fragments in the molecules [15, 16, 17, 18, 19, 20\cite{bib16,bib17,bib18,bib19,bib20,bib21}]. A model that considers only the relationships between atoms can certainly learn, through information exchange between atoms, high-dimensional features that accurately represent the molecular graph, and thereby complete the prediction task. However, because the model performs multiple rounds of information exchange, the information of the atoms belonging to one region (i.e., a fragment) is mixed into the surrounding atoms, so it is almost impossible to locate exactly the atoms that cover the "Ground Truth" fragment. As shown in Fig. S2-S21 (Supporting Information), the high-confidence atoms are scattered across the molecule and cannot give a reasonable explanation of the "Ground Truth" fragment. Our method first divides the whole molecule into fragments by splitting the molecular tree and then uses the fragment as the smallest unit to build a new molecular graph for prediction. This processing has two benefits. First, it considers the roles of both the atoms and the bonds, which are combined into a whole fragment to explore the cause of a positive label; this is in line with objective laws. Second, as a guide to critical structures, taking the fragment as the smallest unit of the model lets us directly locate the fragments that significantly affect the properties, instead of manually extracting the region around the most critical atoms or hoping that the activated atoms cluster together. Experimental results (Fig. S2-S21 in the Supporting Information) demonstrated that the critical substructures extracted by our method match the "Ground Truth" with high accuracy. Therefore, as a computer-aided localization method, the fragment-based method is more robust than the atom-based method for guiding experts.
Meanwhile, we analyze the difference between the MGA framework [14\cite{bib15}] and our method in terms of prediction performance and crucial-substructure mining capability on the six side-effect property tasks mentioned above. The MGA framework aims to improve the performance of toxicity prediction and also uses the attention mechanism to explore the most critical structural information. As shown in Fig. S22 (Supporting Information), our method performs on par with MGA and even outperforms it on some tasks. More importantly, where the prediction performances are nearly equal, our method demonstrates stronger interpretability on most property tasks. The comparison between the two methods in mining crucial property-related substructures is presented in Fig. S23-S28 (Supporting Information). The localization of property-related substructures by our method is more accurate than that given by MGA. MGA's procedure is to find the atom with the largest attention weight and use it as the center point to delineate a surrounding area as the property-related substructure. However, this localization strategy is not suitable for practical use, for two reasons. First, the atom-based strategy of MGA lacks general credibility. MGA focuses only on the atom with the largest attention weight and, by default, treats the substructure formed by the area around this atom as the factor that affects the property, which lacks a scientific basis. In contrast, our fragment-based strategy builds on many existing scientific findings [15, 16, 17, 18, 19, 20\cite{bib16,bib17,bib18,bib19,bib20,bib21}] and considers multifaceted effects in the substructure mining process. Second, the localization strategy of MGA cannot find all the crucial substructures. A molecule may contain two or more substructures related to the property, but MGA chooses only the most crucial atom as the center to delineate the substructure and therefore cannot provide multiple results. Our method takes into account the overall effect of the Top-\(k\) output fragments (\(k\) is the number of selected attribution fragments) and can therefore output all correct results with high confidence (Fig. S24 in the Supporting Information).
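The contrast between the two localization strategies can be illustrated on a toy molecule. The sketch below is not the MGA implementation or ours; the chain-shaped graph, the weights, the neighborhood radius, and all function names are hypothetical and chosen only to show why a single attention center returns one region while Top-\(k\) fragment selection can return several.

```python
from collections import deque


def neighborhood(adjacency, center, radius):
    """Return the atoms within `radius` bonds of `center` (breadth-first search)."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        atom = queue.popleft()
        if depth[atom] == radius:
            continue  # do not expand beyond the requested radius
        for nbr in adjacency[atom]:
            if nbr not in depth:
                depth[nbr] = depth[atom] + 1
                queue.append(nbr)
    return set(depth)


def single_center_substructure(adjacency, atom_weights, radius=1):
    """Single-center strategy: area around the one atom with the largest weight."""
    center = max(atom_weights, key=atom_weights.get)
    return neighborhood(adjacency, center, radius)


def top_k_fragments(fragment_scores, k):
    """Fragment strategy: the k fragments with the highest confidence."""
    ranked = sorted(fragment_scores, key=lambda fs: fs[1], reverse=True)
    return [frag for frag, _ in ranked[:k]]


# Hypothetical toy molecule: a chain of 8 atoms with two property-related
# substructures, {0, 1} and {6, 7}, at opposite ends.
adjacency = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}
atom_weights = {0: 0.1, 1: 0.9, 2: 0.1, 3: 0.1, 4: 0.1, 5: 0.1, 6: 0.8, 7: 0.1}
fragment_scores = [
    (frozenset({0, 1}), 0.9),
    (frozenset({2, 3}), 0.1),
    (frozenset({4, 5}), 0.05),
    (frozenset({6, 7}), 0.8),
]

one_region = single_center_substructure(adjacency, atom_weights, radius=1)
two_fragments = top_k_fragments(fragment_scores, k=2)
```

In this toy case the single-center strategy covers only the region around atom 1 and misses the second substructure entirely, whereas selecting the two highest-confidence fragments recovers both.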
Fragment combination strategy for the molecule decomposition dilemma. Junction Tree [36\cite{bib27}] is a promising decomposition method for molecular design, and we used this widely recognized method to implement our fragment-based method. However, because static molecular tree decomposition is limited, only simple rings and diatomic fragments can be extracted. When the "Ground Truth" fragments are complex structures, the method can attribute only to part of the whole structure. Moreover, because the "Ground Truth" fragments are specific to each property task, it is difficult to devise a straightforward task-specific splitting scheme, which is an inherent limitation.
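The limitation can be seen in a deliberately simplified sketch of this kind of decomposition. It assumes that the rings are already known and treats every bond outside a ring as its own two-atom fragment; it omits the ring merging and tree construction of the full Junction Tree procedure. The toy molecule and the function name are hypothetical.

```python
def decompose(bonds, rings):
    """Split a molecule into ring fragments and two-atom (bond) fragments.

    bonds: list of (atom, atom) pairs; rings: list of atom-index tuples.
    Simplification: a bond counts as a ring bond if both atoms lie in the same ring.
    """
    ring_sets = [frozenset(r) for r in rings]

    def in_ring(a, b):
        return any(a in r and b in r for r in ring_sets)

    bond_fragments = [frozenset(b) for b in bonds if not in_ring(*b)]
    return ring_sets + bond_fragments


# Hypothetical toy molecule: a six-membered ring (atoms 0-5) carrying a
# three-atom branched group (atoms 6, 7, 8) attached at atom 5.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (5, 6), (6, 7), (6, 8)]
rings = [(0, 1, 2, 3, 4, 5)]
fragments = decompose(bonds, rings)

# Suppose the "Ground Truth" is the whole branched group {6, 7, 8}.
ground_truth = frozenset({6, 7, 8})
pieces = [f for f in fragments if f <= ground_truth]
```

Here the three-atom group never appears as a single fragment; it is split into the two diatomic fragments {6, 7} and {6, 8}, so attribution can only point to it piecewise.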
Fortunately, there is an effective strategy for this situation, because several fragments with different confidences can be attributed to one positive molecule. In general, fragments with higher confidence tend to have a more significant impact on the property, so the higher-ranked fragments are combined. As shown in Fig. 3C, the top-0, top-1, and top-2 attribution fragments of the first molecule together form the "Ground Truth" fragment. Other combinations also occur: in the second row of Fig. 3C, the top-0 and top-5 attribution results together form the "Ground Truth". Although the top-5 result is less reliable, the top-0 fragment, which is closely related to the "Ground Truth", receives greater confidence. We therefore call this situation a guiding attribution result, whereas the situation described above, in which the required fragments all sit at the top confidences, is called accurate attribution.
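The distinction between accurate and guiding attribution can be made concrete with a small sketch. It rests on our own assumptions: fragments are represented as sets of atom indices ranked from top-0 downward, and "entirely at the top confidences" is read as the required fragments occupying ranks 0, 1, 2, ... without gaps. The function name and the example data are hypothetical.

```python
def classify_attribution(ranked_fragments, ground_truth):
    """Classify how ranked fragments reconstruct a ground-truth substructure.

    ranked_fragments: list of sets of atom indices, highest confidence first.
    Returns (label, ranks), where ranks are the fragments lying inside the
    ground truth.
    """
    needed = [rank for rank, frag in enumerate(ranked_fragments)
              if frag <= ground_truth]
    covered = set().union(*(ranked_fragments[r] for r in needed))
    if covered != ground_truth:
        return "incomplete", needed   # the fragments do not rebuild the ground truth
    if needed == list(range(len(needed))):
        return "accurate", needed     # required fragments are exactly the top ranks
    return "guiding", needed          # a lower-ranked fragment is also required


# Case 1: top-0, top-1, and top-2 together form the ground truth.
case_1 = classify_attribution(
    [{0, 1, 2, 3, 4, 5}, {5, 6}, {6, 7}, {8, 9}],
    ground_truth={0, 1, 2, 3, 4, 5, 6, 7},
)

# Case 2: top-0 and top-5 together form the ground truth.
case_2 = classify_attribution(
    [{0, 1, 2}, {8, 9}, {10, 11}, {12, 13}, {14, 15}, {2, 3}],
    ground_truth={0, 1, 2, 3},
)
```

The two cases mirror the two situations described for Fig. 3C: the first is classified as accurate attribution and the second as guiding attribution.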
Top-ranked fragment selection to effectively avoid attribution bias. The obtained attribution fragments carry a certain degree of bias: a small number of them are structures unrelated to the property task. The number of attribution fragments differs between positive test molecules, and each attribution fragment has its own confidence. In general, fragments with high confidence contribute more to the positive property. As shown in Tab. 1, all the results are positive, which means that the overall attribution effect favors the positive label. Moreover, the metrics on almost all property tasks are best (shown in bold) when \(k\) is 20, and the three exceptions appear when \(k\) is 50. As fewer top fragments are selected, the biased fragments tend to be filtered out, and the remaining attribution fragments have a more significant impact on the property tasks. Therefore, if there is doubt about whether the attribution method has located incorrect fragments, choosing the attribution fragments with higher confidence tends to give better overall results.
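The filtering effect of a smaller \(k\) can be sketched as follows. This is an illustration under assumptions, not the computation behind Tab. 1: each fragment is assumed to carry a confidence and a relevance score (the difference between its occurrence probabilities in the positive and negative datasets), the aggregate is assumed to be a plain mean, and the numbers are invented toy values.

```python
def mean_relevance_top_k(fragments, k):
    """Mean relevance of the k fragments with the highest confidence."""
    if k <= 0 or not fragments:
        raise ValueError("need at least one fragment and k >= 1")
    ranked = sorted(fragments, key=lambda f: f["confidence"], reverse=True)
    top = ranked[:k]
    return sum(f["relevance"] for f in top) / len(top)


# Invented toy values: the lowest-confidence fragment is a biased one whose
# relevance is negative, i.e. it occurs more often in negative molecules.
toy_fragments = [
    {"confidence": 0.9, "relevance": 0.30},
    {"confidence": 0.7, "relevance": 0.20},
    {"confidence": 0.4, "relevance": 0.10},
    {"confidence": 0.2, "relevance": -0.20},
]

score_top_2 = mean_relevance_top_k(toy_fragments, k=2)
score_top_4 = mean_relevance_top_k(toy_fragments, k=4)
```

With these toy values the mean relevance is higher for the top two fragments than for all four, because the biased low-confidence fragment is excluded when fewer fragments are kept.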