After an Artifactory upgrade at work, users began complaining that bundler (the ruby package management tool) was timing out trying to fetch dependencies. Bundler wold try 4 times to download the dependencies for a gem, and then give up. After that it would just download the gemspec for every gem, which would take many, many times longer than retrieving the index.
I should note of course that we are using Artifactory as our gem repo, and the bundler Gemfile was pointing at our Artifactory server. Both the old and new server used mysql for the backend.
Oddly, this had never been an issue with the previous server. I did a bunch of troubleshooting with Jfrog to try and figure out where the timeouts were happening. Eventually, they had me add this to my etc/logback.xml:
<logger name="org.artifactory.storage.db.util.JdbcHelper"> <level value="DEBUG"/> </logger>
which showed that on the old server, gem dependency queries were taking .25ms. The same queries on the new server were taking 5ms, or 40x longer. The new server is in AWS EC2 and the new mysql server is in Amazon RDS (AWS hosted mysql). The old serve was running mysql on the local machine.
I should note that I also made sure that the RDS instance and the EC2 instance being undersized were not the problem. I spent plenty of time looking at system metrics to verify that.
So, the slow queries seemed like a problem. I then discovered that my RDS endpoint and my EC2 instance were in different availability zones in the same AWs region (us-west-2). I moved both so they were in the same AZ. As a side note, I had to disable multi-az on the RDS instance because multi-az doesn't let you pick which AZ the endpoint is in.
After I moved everything to the same AZ, the dependency queries went down from 5ms to 1ms. That's better. However, while it reduced the bundler timeouts, it didn't eliminate them.
I then realized that my RDS endpoint and EC2 instance were in different subnets. So, I moved them both to the same subnet in the same AZ. Now my queries went from 1ms to .75ms. That wasn't enough, though - I still got timeouts in bundler.
You would think there would be a timeout setting in bundler, right? That's what I thought, so I did a bunch of googling. No luck. I thought I was stuck with this problem, and I was contemplating the next step of running mysql on the EC2 instance and abandoning RDS. Ugh.
Then, one of the ruby developers at my work read the bundler code and discovered that since v1.10, bundler does have a query timeout! It's just not mentioned anywhere except in the source. I tried it, and it worked. The fix is to run this:
$ bundle config timeout 30
which will set the bundler timeout to 30 seconds. Since the worst queries I saw were taking 15 seconds, that seemed sufficient.
I am happy to report that with the longer timeout setting, everything works fine. Yes, there is certainly still some sort of underlying problem. Artifactory should not take 15 seconds to respond to a dependency query. My feeling is the problem is that a rats nest of dependencies in our ruby gems is just too much for Artifactory to deal with. I think in general Artifactory is a decent product and I don't blame them for this. I do wish they would add some caching in their gem handling, though.