I came across an interesting bug today… a Ruby on Rails application was consistently failing on a test machine, displaying an “uninitialized constant” error message. It was failing because it had not loaded an .rb file which contained the definition of the class being used. That seems straightforward enough, except that:
- The file that was not being loaded was in a directory among other .rb files, all of which did get loaded, so why only this one file?
- The application worked perfectly on the developer’s machine.
- The application was also working on another test server.
The file was present in every installation, with no access permission issues; it was complete, readable and identical everywhere. So the file itself was not the problem. The runtime was simply able to load it on some machines and not on others.
The only thing different about this file was that its name was capitalized, whereas all the other .rb files there had all-lowercase names. Then I found out the developer was running the application on a Mac, while my test machine is a Linux box. As soon as I heard that, I remembered that the Mac HFS filesystem is a bit wacky about case. While it preserves case, it doesn’t observe it. The following surprising sequence actually works on a Mac:
% echo Hello > Hello
% cat hello
Hello
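The same check can be made from Ruby. This is only a rough probe, assuming the directory you pass in is writable; the method name case_sensitive_fs? is made up for this post and is not part of Ruby or Rails.

```ruby
require "tmpdir"

# Hypothetical helper: returns true when the filesystem holding `dir`
# treats "CaseProbe" and "caseprobe" as different names, and false when
# it folds case (as a default Mac volume does).
def case_sensitive_fs?(dir)
  # The scratch directory is created inside `dir`, so the probe runs on
  # the same filesystem. It is removed again when the block returns.
  Dir.mktmpdir("case-probe", dir) do |tmp|
    File.write(File.join(tmp, "CaseProbe"), "Hello\n")
    !File.exist?(File.join(tmp, "caseprobe"))
  end
end

puts case_sensitive_fs?(Dir.tmpdir)
```

On a typical Linux box this should print true, and on a Mac with a default volume false, but the answer depends on how the particular volume was formatted.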
Problem solved, I thought! To confirm, I strace’d the application on my test machine and indeed, it was trying to open “file.rb” even though the file on disk was called “File.rb”. On Linux that fails; on a Mac it works. That explains everything!
Except… the other test server that had been set up, the one where the application also works, is also a Linux box! How can it possibly be working there?
strace showed that on that machine the process was calling open() on “File.rb”, so it worked. But why?
After reviewing the strace output more closely, I noticed that on the working Linux box, the process was open()ing the directory, reading the list of files and then opening and reading each one in turn. Because it got the actual name of the file (“File.rb”) from the directory listing, it was able to open it. On the Linux box where the application did not work, it never read the directory; it went straight to attempting to open “file.rb”, which of course failed. OK, that explains the discrepancy in the file name being opened, but why is one machine reading the directory and the other not? Both machines have identical installations of all relevant software!
I then noticed that on the Linux box where it worked, the application was being run with the “production” environment flag of Rails (-e). On the Linux box where it did not work, it wasn’t.
After some more digging, I discovered that the production.rb sets:
config.cache_classes = true
This cache_classes is defined as follows:
config.cache_classes controls whether or not application classes and modules should be reloaded on each request. Defaults to false in development mode, and true in test and production modes.
Aha! So it looks like the implementation works like this: if cache_classes is true, Rails scans the directories at startup and loads (and caches) all the .rb files it finds, which is what the strace output showed. Thus, it finds and loads “File.rb”. If cache_classes is false, it never scans the directory; it derives the lowercase name from the class name and simply attempts to (re)load “file.rb” each time, always failing on a case-sensitive filesystem.
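To make the difference concrete, here is a minimal Ruby sketch of the two strategies. It is not Rails’ actual loader, just my reading of the strace output. The helper names (expected_file_name, lazy_lookup, eager_scan) are made up for illustration, and the directory listing is a plain array of strings so the result does not depend on the filesystem it runs on.

```ruby
# Hypothetical helper: derive the file name a lazy loader would ask for
# from a constant name, e.g. "OrderItem" -> "order_item.rb".
def expected_file_name(const_name)
  snake = const_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  "#{snake}.rb"
end

# Lazy strategy (cache_classes = false in this sketch): ask for exactly
# one derived name. The comparison is case-sensitive, like open() on Linux.
def lazy_lookup(const_name, dir_entries)
  name = expected_file_name(const_name)
  dir_entries.include?(name) ? name : nil
end

# Eager strategy (cache_classes = true in this sketch): read the directory
# listing and take every .rb file under whatever name it really has.
def eager_scan(dir_entries)
  dir_entries.select { |entry| entry.end_with?(".rb") }
end

# "File.rb" is the placeholder file name used in this post.
entries = ["File.rb", "user.rb", "README"]

p lazy_lookup("File", entries)  # prints nil
p lazy_lookup("User", entries)  # prints "user.rb"
p eager_scan(entries)           # prints ["File.rb", "user.rb"]
```

The lazy path never sees the real spelling of the file, so the capitalized name is invisible to it; the eager path never guesses a name, so it cannot get the case wrong.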
With that, mystery truly solved! If you run into this, now you know…