Designing Brain-Computer Interfaces (BCIs) to assist patients requires datasets in the patients' own language. There is a significant shortage of such datasets for Indian languages. Malayalam is a prominent South Indian language spoken by more than 34 million people, yet no BCI dataset exists for it. We address this gap by creating an electroencephalography (EEG) dataset for selected Malayalam words. The EEG samples were recorded with the OpenBCI Cyton device while a volunteer spoke commonly used Malayalam words. The dataset consists of three types of data: (i) EEG data for spoken Malayalam words, (ii) EEG data for the spoken English words closest in meaning to the corresponding Malayalam words, and (iii) EEG data for sub-vocal (silent) pronunciation of the Malayalam words. The dataset covers 26 words, each recorded in all three of these types. For each word, 10 EEG samples were recorded over 8 channels. The dataset is intended to support BCI solutions for patients suffering from neurodegenerative diseases, by enabling Machine Learning (ML) classifiers that translate EEG signals into Malayalam words, vocal or sub-vocal, which is especially valuable given the scarcity of datasets in Indian languages.
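To make the dataset dimensions concrete, the following is a minimal sketch of how the recordings might be arranged in memory for classifier training. It is not the released file format. The counts (3 data types, 26 words, 10 samples, 8 channels) come from the description above; the condition names, the 250 Hz sampling rate, the 2-second trial length, and the reading that 10 samples exist per word in each data type are all assumptions made only for illustration.

```python
import numpy as np

# Counts stated in the abstract.
N_WORDS = 26
N_SAMPLES_PER_WORD = 10
N_CHANNELS = 8

# Hypothetical labels for the three data types (names are not from the source).
CONDITIONS = ("malayalam_spoken", "english_spoken", "malayalam_subvocal")

# Assumptions, not stated in the source: sampling rate and trial duration.
SAMPLING_RATE_HZ = 250
TRIAL_SECONDS = 2


def empty_dataset():
    """Allocate a placeholder array with one slot per recorded EEG sample.

    Assumes 10 samples per word in each of the three data types.
    Axes: (condition, word, sample, channel, time).
    """
    n_timepoints = SAMPLING_RATE_HZ * TRIAL_SECONDS
    shape = (len(CONDITIONS), N_WORDS, N_SAMPLES_PER_WORD, N_CHANNELS, n_timepoints)
    return np.zeros(shape, dtype=np.float32)


def to_classifier_inputs(data, condition):
    """Flatten one condition into (trials, channels, time) plus word labels."""
    trials = data[CONDITIONS.index(condition)]  # (word, sample, channel, time)
    X = trials.reshape(N_WORDS * N_SAMPLES_PER_WORD, N_CHANNELS, -1)
    # Each word index is repeated once per sample, matching the row order of X.
    y = np.repeat(np.arange(N_WORDS), N_SAMPLES_PER_WORD)
    return X, y


if __name__ == "__main__":
    data = empty_dataset()
    X, y = to_classifier_inputs(data, "malayalam_subvocal")
    print(X.shape, y.shape)
```

Under these assumptions, each data type yields 260 labelled trials (26 words times 10 samples), which is the input a word-level EEG classifier would be trained on.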