Zhenshuo Chen and two co-authors

The housing crisis in Ireland has worsened rapidly in recent years. To earn higher profits, many landlords no longer let their houses under long-term tenancies but under short-term ones. This shift from long-term to short-term rentals has reduced the supply of private rental housing. Regulating rentals in Rent Pressure Zones, the areas with the highest and fastest-rising rents, has become a difficult problem. In this paper, we develop a breach identifier that uses only publicly available data from Airbnb (an online marketplace for short-term home-stays) to flag short-term rentals in Rent Pressure Zones that may be in breach of the regulations. First, we use a Residual Neural Network to filter out outdoor landscape photos, which hinder identifying whether an owner has multiple rentals in a Rent Pressure Zone. Second, a Siamese Neural Network compares the similarity of indoor photos to determine whether multiple rental posts correspond to the same residence. Next, we use the Haversine formula to locate short-term rentals within a circle centered on the coordinates of each permit; short-term rentals covered by a permit are not restricted. Finally, we improve the occupancy estimation model by combining it with sentiment analysis, which may provide higher accuracy. Because Airbnb does not disclose exact house coordinates or occupancy data, it is impossible to verify the accuracy of our breach identifier, and the accuracy of the occupancy estimator cannot be verified either: it only provides an estimate within a reasonable range. Users should therefore treat short-term rentals flagged as possible breaches with appropriate skepticism.
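The Siamese comparison step could be sketched as follows, assuming a trained Siamese network has already encoded each remaining indoor photo into an embedding vector. The cosine-similarity measure, the 0.9 threshold, the rule that two listings match if any pair of their photos matches, and the names `cosine_similarity` and `group_listings` are hypothetical choices for illustration, not the paper's method. Matching listings are merged with a small union-find, so several posts for the same residence end up in one group.

```python
import math
from itertools import combinations

# Hypothetical decision threshold on cosine similarity between photo embeddings.
SIMILARITY_THRESHOLD = 0.9


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # guard against zero vectors
    return dot / (norm_a * norm_b)


def group_listings(photo_embeddings, threshold=SIMILARITY_THRESHOLD):
    """Group listing IDs whose indoor-photo embeddings suggest the same residence.

    photo_embeddings maps listing_id -> list of embedding vectors (one per indoor
    photo), assumed to come from the shared encoder of a trained Siamese network.
    """
    parent = {lid: lid for lid in photo_embeddings}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(photo_embeddings, 2):
        # Hypothetical rule: two listings match if any pair of their photos is similar enough.
        if any(
            cosine_similarity(ea, eb) >= threshold
            for ea in photo_embeddings[a]
            for eb in photo_embeddings[b]
        ):
            parent[find(a)] = find(b)

    groups = {}
    for lid in photo_embeddings:
        groups.setdefault(find(lid), []).append(lid)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    embeddings = {
        "listing_A": [[1.0, 0.0, 0.1], [0.2, 0.9, 0.0]],
        "listing_B": [[0.98, 0.05, 0.12]],
        "listing_C": [[0.0, 0.1, 1.0]],
    }
    print(group_listings(embeddings))
```

With these toy vectors, `listing_A` and `listing_B` fall into one group and `listing_C` stays on its own. The all-pairs comparison is quadratic in the number of listings, which is acceptable for a sketch but would need blocking (for example by area) at scale.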

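The Haversine step described in the first abstract might look like the following minimal sketch. It computes great-circle distances on a spherical Earth (mean radius 6,371 km) and returns the listings that lie outside every permit's circle, i.e. those not exempted by a permit. The function names, the input layout, the example coordinates, and the 0.2 km radius are assumptions rather than values from the paper.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the Haversine formula assumes a sphere


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def unpermitted_listings(listings, permits, radius_km):
    """Return IDs of listings lying outside the circle of every permit.

    listings: dict mapping listing_id -> (lat, lon)
    permits:  list of (lat, lon) permit coordinates
    Listings inside some permit's circle are treated as covered and not flagged.
    """
    flagged = []
    for listing_id, (lat, lon) in listings.items():
        covered = any(
            haversine_km(lat, lon, p_lat, p_lon) <= radius_km for p_lat, p_lon in permits
        )
        if not covered:
            flagged.append(listing_id)
    return sorted(flagged)


if __name__ == "__main__":
    permits = [(53.3498, -6.2603)]          # hypothetical permit in Dublin city centre
    listings = {
        "listing_1": (53.3500, -6.2600),     # about 30 m from the permit
        "listing_2": (53.3382, -6.2591),     # about 1.3 km south of the permit
    }
    print(unpermitted_listings(listings, permits, radius_km=0.2))
```

Here only `listing_2` is returned. Because Airbnb shows approximate rather than exact coordinates, the chosen radius trades missed matches against false exemptions.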
Zhenshuo Chen and two co-authors

Network and system security are critical issues today. Due to the rapid proliferation of malware, traditional analysis methods struggle to cope with the enormous number of samples. In this paper, we propose four easy-to-extract, low-dimensional features (the sizes and permissions of Windows PE sections, content complexity, and imported libraries) to classify malware families, and we use automatic machine learning to search for the best model and hyper-parameters for each feature and each combination of features. Compared with detailed behavior-related features such as API sequences, the proposed features provide macroscopic information about malware. The analysis is based on static disassembly scripts and hexadecimal machine code. Unlike dynamic behavior analysis, static analysis is resource-efficient and offers complete code coverage, but it is vulnerable to code obfuscation and encryption. The results demonstrate that features which work well in dynamic analysis are not necessarily effective in static analysis. For instance, API 4-grams achieve only 57.96% accuracy and require a relatively high-dimensional feature set (5,000 dimensions). In contrast, the proposed features combined with a classical machine learning algorithm (Random Forest) achieve 99.40% accuracy with a much smaller feature vector (40 dimensions). We demonstrate the effectiveness of this approach by integrating it into IDA Pro, which also facilitates the collection of new training samples and subsequent model retraining.
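To make the proposed feature groups concrete, here is a minimal sketch that turns already-parsed section headers, raw bytes, and an import list into one fixed-length vector. Assumptions: "content complexity" is approximated by Shannon byte entropy and zlib compression ratio; the section and DLL vocabularies (`KNOWN_SECTIONS`, `KNOWN_DLLS`) and all function names are hypothetical; parsing of disassembly scripts or hex dumps is left out. Only the `IMAGE_SCN_MEM_*` flag values come from the PE/COFF specification. The result has 27 dimensions and does not reproduce the paper's 40-dimensional vector.

```python
import math
import zlib
from collections import Counter

# Section permission flags from the PE/COFF specification (IMAGE_SCN_MEM_*).
IMAGE_SCN_MEM_EXECUTE = 0x20000000
IMAGE_SCN_MEM_READ = 0x40000000
IMAGE_SCN_MEM_WRITE = 0x80000000

# Hypothetical fixed vocabularies; a real feature set would be chosen from training data.
KNOWN_SECTIONS = (".text", ".data", ".rdata", ".rsrc")
KNOWN_DLLS = ("kernel32.dll", "user32.dll", "advapi32.dll", "ws2_32.dll", "wininet.dll")


def byte_entropy(data):
    """Shannon entropy of the byte distribution, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def compression_ratio(data):
    """Compressed size divided by original size; lower means more redundant content."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


def feature_vector(sections, content, imported_dlls):
    """Build a small fixed-length vector from section, complexity and import features.

    sections: list of (name, raw_size, virtual_size, characteristics) tuples
    content: raw bytes of the sample (e.g. reconstructed from a hex dump)
    imported_dlls: names of imported libraries
    """
    by_name = {name: (raw, virt, ch) for name, raw, virt, ch in sections}
    vec = []
    for sec in KNOWN_SECTIONS:
        raw, virt, ch = by_name.get(sec, (0, 0, 0))  # missing section -> zeros
        vec += [
            raw,
            virt,
            int(bool(ch & IMAGE_SCN_MEM_READ)),
            int(bool(ch & IMAGE_SCN_MEM_WRITE)),
            int(bool(ch & IMAGE_SCN_MEM_EXECUTE)),
        ]
    vec += [byte_entropy(content), compression_ratio(content)]
    dlls = {d.lower() for d in imported_dlls}
    vec += [int(d in dlls) for d in KNOWN_DLLS]
    return vec


if __name__ == "__main__":
    sections = [
        (".text", 4096, 3900, 0x60000020),  # code: readable + executable
        (".data", 512, 800, 0xC0000040),    # initialized data: readable + writable
    ]
    vec = feature_vector(sections, bytes(range(256)) * 4, ["KERNEL32.dll", "WS2_32.dll"])
    print(len(vec), vec[:10], vec[22:])
```

Vectors like this could be passed to any tabular classifier, for example scikit-learn's `RandomForestClassifier`, the kind of classical model the abstract reports working well; the abstract itself selected models and hyper-parameters with automatic machine learning.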