The boxes might be identical, but without the same inputs (database, web requests, STP traffic, load levels, and so on) it can be very difficult to reproduce and fix a problem outside of production. Load in particular is hard to replicate.
Example: I've seen "identical" environments produce different results because the timing of the web request traffic was off by a few milliseconds, where "different" means "works fine in staging, completely melts down in production". The replay tool turned out to be batching its requests in 5 ms intervals. Once the interval was tightened to 1 ms, the staging environment began to exhibit the same behavior that was observed in production.
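To make the batching effect concrete, here is a minimal sketch of how a replay tool's batch interval can reshape the arrival pattern of otherwise identical traffic. Everything in it is an assumption for illustration: the capture (one request every 0.5 ms), the function names, and the idea that the tool rounds each send time down to the start of its batch window. It is not the actual tool from the anecdote, and it says nothing about which arrival pattern triggers a failure in any given system.

```python
from collections import Counter

def batch_timestamps(timestamps_us, interval_us):
    """Round each send time down to the start of its batch window.

    Models a hypothetical replay tool that releases every request
    captured within one window at the same instant.
    """
    return [(t // interval_us) * interval_us for t in timestamps_us]

def peak_simultaneous(timestamps_us):
    """Largest number of requests released at the same instant."""
    return max(Counter(timestamps_us).values())

# Hypothetical capture: one request every 500 microseconds for 20 ms.
recorded_us = list(range(0, 20_000, 500))

as_recorded = peak_simultaneous(recorded_us)
batched_5ms = peak_simultaneous(batch_timestamps(recorded_us, 5_000))
batched_1ms = peak_simultaneous(batch_timestamps(recorded_us, 1_000))

print(f"as recorded: {as_recorded} request(s) per instant")
print(f"5 ms batches: {batched_5ms} request(s) per instant")
print(f"1 ms batches: {batched_1ms} request(s) per instant")
```

The total request count and the average rate are the same in all three cases. Only the fine-grained timing changes, which is exactly the kind of difference that is easy to overlook when comparing two environments.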
That said, not many environments actually involve such tight tolerances.