Directed Acyclic Graphs (DAGs) are widely used to represent task dependencies in data-intensive scientific applications. Two bottom-up DAG scheduling approaches have traditionally dominated: one overlooks communication contention, and the other fails to exploit parallelization to improve latency. This study instead advocates a top-down approach that prioritizes either latency or cost optimization in multi-tier environments to meet QoS and SLA requirements. Our strategy mitigates bandwidth contention and facilitates parallel execution, leading to substantial reductions in completion time. Our findings suggest that myopic knowledge-based scheduling that emphasizes latency or cost minimization can yield benefits comparable to those of its look-ahead counterparts. Using latency-efficient and cost-efficient topological sorting, our wDAGSplit strategy introduces a two-stage partitioning and scheduling approach whose simplicity and adaptability make it usable for DAGs of any scale. Evaluated on over 100,000 real-world DAG applications, wDAGSplit improves latency by up to 80x compared to Edge-only scenarios, 15x compared to Near-Edge-only (NE-only), and 6x compared to Cloud-only. In terms of cost, it achieves improvements of up to 60x compared to Edge-only, 250x compared to NE-only, and 70x compared to Cloud-only. Moreover, for DAGs with 50 tasks, we achieve 5x lower latency than previous approaches, along with a complexity reduction of up to 24x.