Laryngeal high-speed video (HSV) is a widely used tool for diagnosing laryngeal diseases, and quantitative analysis of each video frame can effectively support clinical diagnosis. Among the available approaches, segmentation of the vocal folds and the glottis area shows great potential for analyzing vocal fold vibration patterns and diagnosing vocal fold disorders. In this study, we present an approach to automatic vocal fold segmentation that uses only glottis information. Our system applies prompt engineering techniques tailored to the Segment Anything Model (SAM), leveraging glottis data to improve segmentation accuracy. We generate an effective bounding box prompt by combining vocal fold information extracted from U-Net masks (enhanced through brightness-contrast adjustment and morphological closing) with a coarse bounding box of the larynx region produced by the YOLO-v5 model. Additionally, we introduce a point prompt derived from the local extrema of the first derivative of grayscale intensity along lines that intersect the glottis, which provides auxiliary information about the vocal fold location. Experimental results show that our method, which requires no labeled vocal fold data, achieves performance comparable to that of the fully supervised method, reaching a Dice coefficient of 0.91. To demonstrate the compactness of our work, we provide a demo and open-source code at https://github.com/yucongzh/Laryngoscopic-Image-Segmentation-Toolkit.
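The point-prompt idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; the function names, the `min_strength` threshold, and the selection rule are all hypothetical. The sketch assumes that each glottis-intersecting line is a single image row, that the glottis mask is binary, and that the nearest strong derivative extremum outside each glottis edge marks the outer edge of a vocal fold. It then places a candidate point midway between that edge and the glottis edge.

```python
import numpy as np


def derivative_extrema(profile, min_strength=0.3):
    """Return (d, idx): the forward difference d[i] = profile[i+1] - profile[i]
    and the indices of its strong strict local maxima/minima."""
    d = np.diff(profile.astype(float))
    if d.size < 3:
        return d, np.array([], dtype=int)
    left, mid, right = d[:-2], d[1:-1], d[2:]
    is_max = (mid > left) & (mid > right)
    is_min = (mid < left) & (mid < right)
    idx = np.nonzero(is_max | is_min)[0] + 1  # shift back to indices into d
    peak = np.abs(d).max()
    if peak == 0:
        return d, idx[:0]
    # Hypothetical filter: keep extrema whose magnitude is a sizeable fraction
    # of the strongest intensity edge on this line, to suppress noise.
    idx = idx[np.abs(d[idx]) >= min_strength * peak]
    return d, idx


def fold_point_prompts(gray, glottis_mask, row, min_strength=0.3):
    """Hypothetical rule: on one image row crossing the glottis, take the nearest
    strong derivative extremum outside each glottis edge as the outer fold edge
    and return the midpoint between it and the glottis edge as an (x, y) point."""
    cols = np.nonzero(glottis_mask[row])[0]
    if cols.size == 0:
        return []  # this row does not intersect the glottis
    g_start, g_end = int(cols.min()), int(cols.max())
    _, idx = derivative_extrema(gray[row], min_strength)
    points = []
    # d[i] is the transition between pixels i and i+1, so the left glottis edge
    # sits at index g_start - 1; keep only transitions strictly outside it.
    left = idx[idx < g_start - 1]
    if left.size:
        points.append(((int(left.max()) + g_start) // 2, row))
    # The right glottis edge sits at index g_end.
    right = idx[idx > g_end]
    if right.size:
        points.append(((g_end + 1 + int(right.min())) // 2, row))
    return points
```

On a synthetic row in which dark background surrounds two bright folds and a dark glottis spans columns 20 to 29, `fold_point_prompts` returns one point inside each fold. The points use (x, y) pixel order, which matches how SAM's predictor takes point prompts. A real pipeline would likely scan several rows and smooth the profile first.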