To both questions above: simple averaging of the logits (classification) or raw outputs (regression) usually works well. If I had to guess why people don't use this approach more often in Kaggle competitions, it would be the relative difficulty of training an ensemble of NNs. NNs are also a bit more sensitive than decision trees (DTs) to the type of features used and their distribution.
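For concreteness, here's a minimal sketch of what "just average the outputs" looks like. It assumes each trained model exposes a `predict(X)` method returning logits or raw values as a NumPy array; swap in whatever your framework actually provides:

```python
import numpy as np

def ensemble_predict(models, X):
    """Average the raw outputs of all ensemble members."""
    # Stack each member's predictions: shape (n_models, n_samples, n_outputs)
    all_preds = np.stack([m.predict(X) for m in models], axis=0)
    # Simple unweighted average across the ensemble
    return all_preds.mean(axis=0)

def softmax(z):
    """Convert averaged logits to class probabilities (classification case)."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)
```

For classification you can either average the logits and apply softmax once at the end, or average each model's softmax probabilities; in practice both tend to give very similar results.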
Ensemble models work well because they reduce both bias and variance errors. Like DTs, NNs have low bias error and high variance error when used individually. The variance error drops as you add more learners (DTs/NNs) to the ensemble, and the more diverse the learners, the lower the overall error.
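You can see the variance-reduction effect with a toy simulation (not a real NN): if each learner's error is roughly independent, the variance of the averaged prediction falls off roughly like 1/k with ensemble size k:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 1.0
n_trials = 10_000

for k in [1, 5, 25]:
    # Each "learner" = true value + independent noise (a stand-in for its variance error)
    preds = true_value + rng.normal(0.0, 1.0, size=(n_trials, k))
    avg = preds.mean(axis=1)
    print(k, avg.var())  # variance shrinks roughly like 1/k
```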
Simple ways to promote diversity among the NNs in the ensemble are to start their weights from different random seeds and to train each one on a random sample of the overall training set (say 70-80%, drawn without replacement), as in the sketch below.
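Putting that together, a rough sketch of the training loop might look like this. `train_one_model` is a placeholder for your actual model/training code; the per-member seed is assumed to control the weight initialization, and the subsample is drawn without replacement:

```python
import numpy as np

def train_ensemble(X, y, n_models=5, sample_frac=0.8):
    """Train n_models NNs, each with its own seed and its own data subsample."""
    rng = np.random.default_rng(123)
    models = []
    n = len(X)
    for i in range(n_models):
        # Different random subsample (without replacement) for each member
        idx = rng.choice(n, size=int(sample_frac * n), replace=False)
        # Different seed per member so the weights start from a different init
        model = train_one_model(X[idx], y[idx], seed=i)  # placeholder training fn
        models.append(model)
    return models
```

At prediction time you'd pass the returned list straight into something like the `ensemble_predict` sketch above.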