Training the models takes a lot of energy up front, but running them afterward takes very little. I run Mixture of Agents (MoA) on my laptop, and it outperforms anything OpenAI has released on pretty much every benchmark, maybe even all of them. I run it quite a bit and haven't noticed any change in my electricity bill. I imagine inference on GPT-4 must be very efficient, or close to it; if it isn't, they should just switch to serving people open-source LLMs run through MoA.
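
For a rough sense of why a laptop running inference barely shows up on a bill, here's a back-of-the-envelope sketch. The 60 W draw, 2 hours a day, and 0.15 per kWh are placeholder assumptions, not measurements, and `inference_cost` is just a hypothetical helper name; plug in your own numbers.

```python
def inference_cost(power_watts, hours_per_day, price_per_kwh, days=30):
    """Return (energy in kWh, cost) for a machine drawing power_watts
    for hours_per_day over the given number of days."""
    # Energy (kWh) = power (W) * time (h) / 1000
    kwh = power_watts * hours_per_day * days / 1000.0
    return kwh, kwh * price_per_kwh


# Assumed values for illustration only: 60 W laptop under load,
# 2 hours of inference a day, electricity at 0.15 per kWh.
kwh, cost = inference_cost(power_watts=60, hours_per_day=2, price_per_kwh=0.15)
print(f"{kwh:.1f} kWh, about {cost:.2f} per month")
```

Under those assumptions it comes to a few kWh a month, which would be easy to miss on a typical household bill.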