Deep learning models have been widely applied in seismological signal processing, including phase picking, signal denoising, polarity determination, and phase association. However, for tasks that process phase information across a station network, most current workflows rely either on traditional physics-informed optimization (e.g., hypocenter and grid search for source locations and focal mechanisms) or on machine learning models pre-trained on fixed station arrays, which limits the generality of the models to specific areas. We propose that, by making proper use of transformer encoders (self-attention layers), in which every station’s phase information is treated as a “token vector” and the station location/metadata is appended as a “positional encoding”, a model can be trained to process phase information from any general set of station networks. To demonstrate this idea, we built a transformer-based focal-mechanism determination model, named FOCONET, which directly solves for the strike, dip, and rake angles of a double-couple focal mechanism from the locations, first-motion polarities, S/P amplitude ratios, and signal-to-noise ratios (SNRs) of a set of stations. FOCONET is trained on 440,000 noise-added synthetics generated with random source locations, focal mechanisms, and station distributions. We use the Kagan angle, the minimum rotation angle between the predicted and the (known) ground-truth mechanism, to evaluate prediction quality. Tested on noise-added synthetics with known focal mechanisms, FOCONET reaches average Kagan angles of 29°, 16°, and 12° when using data from 12, 24, and 32 stations, respectively. This is well within typical focal mechanism errors (25°–30°), and 3°–10° lower than the Kagan angles obtained with traditional methods, including FPFIT and HASH (with S/P ratios included or excluded). We also tested FOCONET on 200+ M>2.5 events of the 2016 Amatrice, Italy, earthquake sequence that have A-class HASH solutions for comparison, and obtained an average Kagan angle of 20° relative to the HASH solutions.
Given its stronger performance on the synthetic test data, the FOCONET predictions may in fact be closer to the unknown ground truth than the HASH solutions. The success of FOCONET in determining focal mechanisms from a network of stations suggests that similar joint-station seismological tasks would also benefit from transformer-based models.
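The Kagan angle used as the evaluation metric above can be illustrated with a minimal sketch. This is not the FOCONET evaluation code. It assumes the Aki and Richards strike/dip/rake convention in a north-east-down coordinate frame, and the function names (`kagan_angle` and the underscore-prefixed helpers) are hypothetical. The sketch builds the T, B, and P axes of each double-couple mechanism and takes the smallest rotation angle over the four symmetry-equivalent axis orientations.

```python
import math

# Sign flips of (T, B, P) that leave a double couple unchanged while keeping
# the axis frame right-handed (identity plus 180-degree turns about one axis).
_SIGNS = ((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def _principal_axes(strike, dip, rake):
    """Return unit T, B, P axes (north, east, down) for angles in degrees.

    Assumes the Aki and Richards convention for strike, dip, and rake.
    """
    phi, delta, lam = (math.radians(a) for a in (strike, dip, rake))
    normal = (-math.sin(delta) * math.sin(phi),
              math.sin(delta) * math.cos(phi),
              -math.cos(delta))
    slip = (math.cos(lam) * math.cos(phi)
            + math.cos(delta) * math.sin(lam) * math.sin(phi),
            math.cos(lam) * math.sin(phi)
            - math.cos(delta) * math.sin(lam) * math.cos(phi),
            -math.sin(lam) * math.sin(delta))
    root2 = math.sqrt(2.0)
    t_axis = tuple((n + s) / root2 for n, s in zip(normal, slip))
    p_axis = tuple((n - s) / root2 for n, s in zip(normal, slip))
    b_axis = _cross(normal, slip)
    return t_axis, b_axis, p_axis


def kagan_angle(mech1, mech2):
    """Minimum rotation angle (degrees) between two (strike, dip, rake) tuples."""
    axes1 = _principal_axes(*mech1)
    axes2 = _principal_axes(*mech2)
    # Dot products of corresponding axes; their signed sum is the trace of the
    # rotation matrix that maps one axis frame onto the other.
    ct, cb, cp = (_dot(u, v) for u, v in zip(axes1, axes2))
    best_trace = max(st * ct + sb * cb + sp * cp for st, sb, sp in _SIGNS)
    # Clamp against floating-point round-off before taking the arccosine.
    cos_angle = max(-1.0, min(1.0, (best_trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))
```

For example, `kagan_angle((0, 90, 0), (30, 90, 0))` evaluates to approximately 30, since the two vertical strike-slip mechanisms differ by a 30° rotation about the vertical axis. Passing the auxiliary-plane description of the same mechanism, such as `(270, 90, 180)` against `(0, 90, 0)`, gives an angle of essentially zero, which is why the four symmetry-equivalent orientations are checked.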