Recently one of our clients reported that their production Magento site had been suffering from random downtime for several weeks. The client runs Magento 2.4.0 Open Source.
To fix the issue, we first checked everything, and nothing seemed strange with the server or with Magento itself. There was only one small error complaining about the PHP memory limit, so we decided to increase it (gradually, over several attempts) up to 2G. That is far too much for a Magento application, given how relatively small the site is.
At that point we already knew that it was the Magento_Csp module complaining about the PHP memory limit, but we didn’t know that the fix needed something more than just raising that limit. On top of that, the timestamps in the error log didn’t always match the actual downtimes.
Our attempt to fix the problem wasn’t over yet. After a few days the site kept crashing randomly with similar errors. As in previous occurrences, when downtime happened, Zabbix recorded CPU and memory consumption climbing until an out-of-memory (OOM) condition was triggered and the Linux OOM killer started killing processes. PHP-FPM was always on the kill list.
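If you want to confirm which processes were killed during an OOM event, the kernel log is the place to look. Here is a minimal sketch; `oom_victims` is a helper name made up for this post, and the exact wording of the kernel messages varies between kernel versions, so treat the pattern as a starting point.

```shell
# oom_victims is a hypothetical helper (not a standard tool).
# It keeps only the kernel log lines that the OOM killer typically writes.
oom_victims() {
  grep -iE 'out of memory|killed process'
}

# Usage (reading the kernel log may require root):
#   dmesg -T | oom_victims
#   journalctl -k | oom_victims
```

Comparing the timestamps of those lines with the downtime windows from your monitoring is a quick way to see whether PHP-FPM was among the victims.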
The strangest thing we noticed during the most recent downtime was that disk usage was growing very quickly. In our case, about 15G of disk space was eaten in less than 2 hours, causing high I/O until the server collapsed.
We knew that the disk was being eaten by something, but we didn’t know what was doing it. This led to a theory: something in Magento was writing to the disk. The theory was supported by the PHP-FPM error log entries for the Magento_Csp module, which mentioned something about cache. We also couldn’t think of any candidate other than the Magento cache.
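One way to test a theory like this is to look at which directories are actually growing. The sketch below assumes the default file cache backend, which keeps its data under var/cache and var/page_cache in the Magento root. `biggest_dirs` is a hypothetical helper name and `/var/www/magento` is only an example path.

```shell
# biggest_dirs is a hypothetical helper: it lists the largest immediate
# subdirectories of a directory (sizes in KB), biggest first.
#   $1 = directory to inspect (default: current directory)
#   $2 = how many entries to show (default: 10)
biggest_dirs() {
  dir="${1:-.}"
  du -sk "$dir"/*/ 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# Usage, assuming the Magento root is /var/www/magento (adjust to your setup):
#   biggest_dirs /var/www/magento/var
```

Running it a few minutes apart during an incident should show which directory is eating the disk.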
A quick search gave us this: https://github.com/magento/magento2/issues/29964, and it turned out to be exactly what we were experiencing. Most people there only complain about the OOM issue, but one person mentioned that their disk also got eaten.
If you are too lazy to read that whole GitHub issue, here’s a summary of the solutions offered there. The best option is to upgrade to a newer Magento version, at least 2.4.2. I know upgrading may not be easy for most merchants, because you need to make sure that all integrations still work. Fortunately, if the Magento_Csp module is just sitting there from the default installation (and you’ve never done anything with it), you should be safe to disable Magento_Csp completely:
bin/magento module:disable Magento_Csp
bin/magento cache:flush
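To double-check that the module really is disabled, you can ask Magento for its status. This is a minimal sketch that assumes you run it from the Magento root; `csp_status` is a hypothetical wrapper name, and the optional argument only exists so you can point it at a different path to the `magento` binary.

```shell
# csp_status is a hypothetical wrapper around the Magento CLI.
#   $1 = path to the magento binary (default: bin/magento)
csp_status() {
  "${1:-bin/magento}" module:status Magento_Csp
}

# Usage, from the Magento root:
#   csp_status
```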
After disabling the module, we can now get a good night’s sleep. The server is super quiet.
This works as a quick fix for the CSP problem, but scheduling time for a Magento upgrade is probably not a bad idea either.