None of the so called "shit" that you mentioned needs to be any more difficult than the 3d things you mentioned. You even seem to say that creating a believable scene is harder than creating say a scalable ragdoll physics engine, but I entirely disagree-- content is incredibly easy to come by, load in, and modify these days entirely for free. The longest amount of time would be spent reimplementing complex systems, say for bone animations or efficient texture atlasing (if performance requirements demand it), rather than trying to find believable content or writing a camera system. And let's please not say anything like OOP or ECS ;)