Filed in rails, ruby | 29 June, 2008
Although Ruby 1.8.7 is not yet officially recommended for Ruby on Rails, it does in fact work fine with Rails 2.1, and patch level p22 contains all the latest security fixes.
For what it’s worth, I can confirm that not only are our apps working well under 1.8.7 (including ferret and over 30,000 lines of app code), but they are also consuming (and leaking) considerably less memory. (They still leak, but at a much reduced rate.)
So upgrade now.
Beware of one issue: because 1.8.7 handles ERB differently, we had to clean up our templates so that they contain no comments except those marked inside their own tags, like <%# comment %>.
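For reference, here is a small standalone check of the safe form, using Ruby’s bundled ERB library. The template text and the render_greeting method are made up for illustration; the point is only that a comment in its own <%# %> tag is dropped from the output entirely.

```ruby
require 'erb'

# A comment in its own <%# %> tag produces no output at all.
TEMPLATE = "<%# greeting for the current user %>Hello, <%= name %>!"

# Hypothetical helper: binding exposes the local variable `name` to the template.
def render_greeting(name)
  ERB.new(TEMPLATE).result(binding)
end

puts render_greeting("world")  # prints "Hello, world!"
```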
Filed in IOCCC | 25 June, 2008
Thanks to Óscar Toledo for pointing out to me that my 2004 IOCCC winner is mentioned in this interesting book (in French). It’s mentioned on page 53 with some nice commentary.
Filed in rails, ruby | 3 June, 2008
Rails 2.1.0 has many nice new features, but it also broke our app in a few places. Here is what I found:
1. setup_with_fixtures no longer does anything
Previously we used setup_with_fixtures in our tests for setup actions, but it is no longer called. Plain old setup works for all our cases.
2. render :locals=>{} - hash keys must be symbols
Previously this would have worked
render :partial=>"something", :locals=>{"foo"=>"bar", "hoo"=>"haa"}
Now make sure all the locals are denoted by symbols
render :partial=>"something", :locals=>{:foo=>"bar", :hoo=>"haa"}
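If the locals are built from a hash with string keys (for example one derived from params), converting the keys first avoids the problem. A minimal plain-Ruby sketch; symbolize_locals is a hypothetical helper name (inside a Rails app, ActiveSupport’s Hash#symbolize_keys does the same job):

```ruby
# Hypothetical helper: return a copy of the hash with every key converted to a symbol.
def symbolize_locals(locals)
  locals.each_with_object({}) { |(key, value), result| result[key.to_sym] = value }
end

# Returns a hash keyed by :foo and :hoo
locals = symbolize_locals("foo" => "bar", "hoo" => "haa")
```

It would then be passed as render :partial=>"something", :locals=>locals.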
3. Relative paths when rendering actions
We were using double dots in our template paths when rendering actions from another controller, like this:
render :action=>"../controller/template"
This no longer works; instead, try this:
render :template=>"controller/template"
4. NOT NULL text columns in MySQL no longer default to an empty string when no default is set
If you have a text column in your MySQL database that is NOT NULL and has no default, you previously got away without setting a value for it explicitly before saving an AR object.
This is no longer the case: ActiveRecord sets the value to nil/NULL, and a database error results when you try to save the object. So you need to set a value for these columns in your AR objects before saving. See here for details of the changes that cause this, and here for the associated bug report.
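One way to handle this is to fill in empty strings just before saving. The sketch below is plain Ruby so that it runs without Rails: a Struct stands in for the AR object, and the Article model, the column list and the fill_text_defaults helper are all hypothetical. In a real model the same logic would typically go in a before_save callback.

```ruby
# Stand-in for an ActiveRecord model (hypothetical columns).
Article = Struct.new(:title, :body, :notes)

# Text columns assumed to be NOT NULL with no default in the schema.
NOT_NULL_TEXT_COLUMNS = [:body, :notes]

# Replace nil with "" in each NOT NULL text column; other values are left alone.
def fill_text_defaults(record)
  NOT_NULL_TEXT_COLUMNS.each do |column|
    record[column] = "" if record[column].nil?
  end
  record
end

article = fill_text_defaults(Article.new("Hello", nil, "draft"))
# article.body is now "", article.notes is still "draft"
```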
5. @action_name is no longer available in views
Rails now takes care not to export into views some instance variables that were previously available. The fix is simple: just omit the @ and call the new action_name method, which exists for this purpose. Alternatively you can use @controller.action_name, or of course params[:action], to get the same information. See here for details of this issue.
6. collection#size returns the wrong value after items have been added with build
Because of a bug in collection#size, it does not report the correct value after you have added items using collection#build. The bug existed before 2.1.0, but it has only become an issue because this changeset stops build from loading the collection automatically. When size calculates the size of the collection, it correctly counts the members in the DB, but it counts the added-but-not-saved members as just 1, no matter how many have been added.
I submitted a patch for this.
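To make the miscount concrete, here is a toy model in plain Ruby. ToyCollection and its methods are hypothetical and this is not the Rails source; buggy_size simply reproduces the counting rule described above alongside the correct one.

```ruby
# Toy stand-in for an association collection (not the Rails implementation).
class ToyCollection
  def initialize(saved_count)
    @saved_count = saved_count # members already in the database
    @unsaved = []              # members added with build, not yet saved
  end

  def build(attrs = {})
    @unsaved << attrs
    attrs
  end

  # The miscount described above: any number of unsaved members counts as 1.
  def buggy_size
    @saved_count + (@unsaved.empty? ? 0 : 1)
  end

  # Correct count: database members plus every unsaved member.
  def size
    @saved_count + @unsaved.size
  end
end

widgets = ToyCollection.new(3)
3.times { |i| widgets.build(name: "new #{i}") }
widgets.buggy_size # => 4
widgets.size       # => 6
```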
Those are all the issues we saw, and everything is working fine now. Go and upgrade! (But don’t try to use Ruby 1.8.7, at least not until this is fixed.)
Filed in rails, ruby | 7 April, 2008
I released a somewhat improved version of slim_attributes as a gem at Rubyforge. It now has better compatibility, and can simply be dropped in to provide instant performance improvements.
Here is the project homepage.
Filed in rails, ruby | 2 April, 2008
I was looking at how ActiveRecord instantiates objects from the database and populates their @attributes instance variable. There is a method called all_hashes which generates the hashes used for @attributes, so I looked at that. Generating a whole hash for each database row is somewhat expensive in time and memory. Perhaps we can do better?
I note that Stefan Kaes did some work a year and a half ago on implementing all_hashes in C, but it still relies on hashes being generated. I don’t think anyone has attempted what I do here.
So without further ado, I present slim_attributes, the non-hash implementation of all_hashes.
Here are the important (if unscientific) benchmarks, to give you an idea. Note that the speed relative to plain ActiveRecord depends on how many attributes are accessed in the model objects, because slim_attributes lazily instantiates them into strings. Two models were used: one with 44 attributes and the other with 104.

View the plugin here - and install with:
script/plugin install http://pennysmalls.com/rails_plugins/slim_attributes
then follow the instructions to compile it given in the README below (yes, it should be made into a gem that compiles itself):
[UPDATE: there is now a better rubygem, see here and here.]
==========
SlimAttributes
This is a small patch to the ActiveRecord MySQL adapter that stops Rails from using the existing all_hashes / each_hash mechanism, which is what gets called when you do a find.
It is faster, and uses less memory.
Measuring with just ActiveRecord code - fetching stuff from the database - we see anything from very little up to a 50% (or more) speed increase, but I suppose it really depends on your system and environment, and what you are doing with the results from the database. Measure your own system and send me the results!
Installation
You’re going to need the mysql headers for this to work.
cd vendor/plugins/slim_attributes
ruby extconf.rb --with-mysql-config
make
sudo make install
Description
The reason for overriding all_hashes is threefold:
* making a hash of each and every row returned from the database is slow
* Ruby makes frozen copies of each column name string (for the keys), which results in a great many strings that are not really needed
* we observe that it’s not often that all the fields of rows fetched from the database are actually used
So this is an alternative implementation of all_hashes that returns a ‘fake hash’ which contains a hash of the column names (the same hash of names is used for every row), and also contains the row data in an area memcpy’d directly from the mysql API.
The field contents are then instantiated into Ruby strings on demand, so Ruby strings are only made if you need them. Note that if you always look at all the columns when you fetch data from the database, this won’t necessarily be faster than the unpatched MySQL adapter. But it won’t be much slower either, and we expect that most of the time not all the columns from a result set are accessed.
Note that the ‘fake hash’ quacks like a hash in many ways, but not all ways. So @attributes in an ActiveRecord object may not behave as you are expecting it to, and it particularly won’t work if you try to add a key to it that is not a column name in the result set.
@attributes["not a column name"] = "something"
=> RuntimeError: Key was not a column name from the result set
Hash has many methods that are not supported by the fake hash, but I found that the ones I have implemented have been sufficient for use in our Rails app. It should be fairly easy to implement most of the missing methods if needed, but I did not wish this patch to be larger than necessary.
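The fake-hash idea can be sketched in a few lines of plain Ruby. This is not the plugin’s code, which is a C extension working on row data memcpy’d from the MySQL API; FakeHash, its methods and the sample row are hypothetical, and a plain array stands in for the raw row. It shows the two properties described above: one column-name map shared by every row, and values turned into Ruby strings only when read.

```ruby
# Sketch of a "fake hash" row (hypothetical; not the slim_attributes C code).
class FakeHash
  def initialize(column_index, raw_row)
    @column_index = column_index # column name => position, shared by all rows
    @raw_row = raw_row           # stand-in for the raw row data
    @cache = {}                  # position => Ruby string made so far
  end

  # Build a Ruby string for a column only the first time it is read.
  def [](name)
    index = @column_index[name]
    return nil unless index
    @cache.fetch(index) { @cache[index] = @raw_row[index] && @raw_row[index].dup }
  end

  # Only existing column names may be assigned, as in the README above.
  def []=(name, value)
    index = @column_index[name]
    raise "Key was not a column name from the result set" unless index
    @cache[index] = value
  end

  def keys
    @column_index.keys
  end

  # Number of columns that have been turned into Ruby objects.
  def materialized_count
    @cache.size
  end
end

columns = { "id" => 0, "name" => 1, "notes" => 2 }.freeze
row = FakeHash.new(columns, ["42", "Big widget", nil])
row["name"]            # => "Big widget"
row.materialized_count # => 1
```

Sharing the column map is what avoids a set of frozen key strings per row, and the cache means a column that is never read never costs a string.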
===========
No warranty - this plugin should be considered experimental and likely needs some more work if you want it to be foolproof. However, that said, we are using it in our production environment with good results.
==========
Finally it’s interesting to note that Dan Chak wrote some code to actually return hashes from the database rather than ActiveRecord objects, when you just want the data without any fancy associations and so on. It’s much faster, proving that creating the ActiveRecord objects is fairly slow. I’ll take a look at combining this with slim_attributes - returning fake hashes should be faster still. (Combining his 50% improvement with my 50% should yield instant results :)
Update
I have now tested hash_extension with and against slim_attributes. My test fetched all records from two separate ActiveRecord models 100 times.
| Configuration | Time |
| Plain ActiveRecord | 38.3s |
| Using find_as_hashes | 35.1s |
| Using slim_attributes | 13.0s |
| Using both | 10.4s |
Clearly slim_attributes makes the biggest difference, but it should be noted that this is really the ideal case - where Model.find(:all) is done without actually accessing any of the attributes.
Filed in rails, ruby | 23 March, 2008
I’ve spent a considerable amount of time with various tools trying to figure out why our thin processes (and mongrels before them) grow so egregiously. Typically they reach about 450MB in a day, after which we restart them via monit.
What makes them grow? Well, we are fetching a lot of stuff from the DB all the time - meaning that thousands of small strings are being instantiated - so perhaps we can attribute some growth to heap fragmentation. But we tried changing to ptmalloc3 - it didn’t help; in fact I think in our case this is rather a red herring.
In an effort to get the problem under control, I wrote a plugin to reduce the number of strings that are made, changing the implementation of the mysql library so that all_hashes actually returns fake hashes that are implemented as arrays - to prevent all those column names being saved as frozen strings (for the hash keys) for every row that is fetched from the DB. But that didn’t help much, if at all, either.
But whilst I was running Ruby under valgrind, I noticed some memory going missing. At first I thought it was probably my fault, but with further investigation I found a simple expression that makes Ruby leak:
a = eval "b=0"
It’s actually the eval that leaks - the a = is not really needed, but it makes the leak show as a definite leak as opposed to a possible one in this simple one liner. If you want to leak a lot of memory, this is the way:
def grow
for i in 1..100
eval "b#{i}=1"
end
end
15000.times {grow}
You can fiddle with the numbers to make it grow as much as you like.
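To watch the growth without valgrind, you can compare the process’s resident set size before and after running the loop. This sketch assumes Linux, since it reads /proc/self/status (it returns nil elsewhere); rss_kb is a hypothetical helper, and the loop count is reduced so it finishes quickly. Whether the number grows depends on whether your Ruby version still has the leak.

```ruby
# Linux-only: read this process's resident set size in kB from /proc.
# Returns nil where /proc/self/status is unavailable.
def rss_kb
  status = "/proc/self/status"
  return nil unless File.exist?(status)
  line = File.foreach(status).find { |l| l.start_with?("VmRSS:") }
  line && line.split[1].to_i
end

# Same repro as above: each call evals 100 fresh local variable assignments.
def grow
  for i in 1..100
    eval "b#{i}=1"
  end
end

before = rss_kb
500.times { grow }
after = rss_kb
puts "RSS before: #{before.inspect} kB, after: #{after.inspect} kB"
```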
Valgrind reports the leak like this (this one made by running the loop 5000.times):
==18706== 217,988,864 bytes in 499,985 blocks are definitely lost in loss record 6 of 6
==18706== at 0x4A05AF7: realloc (vg_replace_malloc.c:306)
==18706== by 0x432398: ruby_xrealloc (gc.c:151)
==18706== by 0x465E9C: local_append (parse.y:5649)
==18706== by 0x465F64: local_cnt (parse.y:5667)
==18706== by 0x4646AC: assignable (parse.y:4902)
==18706== by 0x458E80: ruby_yyparse (parse.y:844)
==18706== by 0x45E5F4: yycompile (parse.y:2606)
==18706== by 0x45E8F4: rb_compile_string (parse.y:2676)
==18706== by 0x41DDF3: compile (eval.c:6412)
==18706== by 0x41E289: eval (eval.c:6493)
==18706== by 0x41E817: rb_f_eval (eval.c:6611)
==18706== by 0x41C765: call_cfunc (eval.c:5700)
==18706== by 0x41BB04: rb_call0 (eval.c:5856)
==18706== by 0x41D291: rb_call (eval.c:6103)
==18706== by 0x415182: rb_eval (eval.c:3494)
The memory is allocated when Ruby expands its local variable table in the parser. What I don’t know yet is exactly where to add a call to free to release that memory. I’m hoping that someone over at ruby-core can help. Interestingly, it appears that Rubinius leaks too, which is surprising given that it is a completely new implementation.
I’m not the only one to have found a leak in Ruby lately - I wonder if the issues with god are related to this?
Fixing this leak may not completely cure our Rails memory growth problem (probably won’t), but at least it will help.
Filed in rails, ruby | 13 March, 2008
Thin is getting some attention, so I thought we would give it a try.
Installation is just a matter of gem install thin.
Run it with something like
thin -e production -s 6
That’s 6 servers running on 0.0.0.0:3000 to 0.0.0.0:3005
Look at the examples if you need to make a monit recipe.
One thing we have is code that creates an individual log file for each server instance. This is how it looked with mongrel; we put this code in environment.rb inside the Rails::Initializer block:
if ENV['RAILS_ENV'] == 'production'
  if defined?(Mongrel::HttpServer)
    # take the port from the Mongrel server instance in this process
    ObjectSpace.each_object(Mongrel::HttpServer) {|i| @port = i.port}
    @port = "unknown" unless @port && @port.to_i > 0
    # keep 2 old log files, rotating when a file reaches 25,000,000 bytes
    config.logger = Logger.new(File.expand_path(
      RAILS_ROOT + "/log/#{ENV['RAILS_ENV']}.#{@port}.log"), 2, 25000000)
  end
end
Something very similar works with thin:
if ENV['RAILS_ENV'] == 'production'
  if defined?(Thin::Server)
    # thin exposes the port via the server's backend
    ObjectSpace.each_object(Thin::Server) {|i| @port = i.backend.port}
    @port = "unknown" unless @port && @port.to_i > 0
    # keep 2 old log files, rotating when a file reaches 25,000,000 bytes
    config.logger = Logger.new(File.expand_path(
      RAILS_ROOT + "/log/#{ENV['RAILS_ENV']}.#{@port}.log"), 2, 25000000)
  end
end
With this code in place you will get individual log files named production.3000.log, production.3001.log etc.
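The naming rule can be pulled out into a plain method and checked outside Rails. per_port_log_path is a hypothetical helper that mirrors the initializer logic above, including the fallback to "unknown" when no usable port is found; the /var/www/app root is made up.

```ruby
# Hypothetical helper mirroring the initializer logic:
# fall back to "unknown" when the port is missing or not a positive number.
def per_port_log_path(root, env, port)
  port = "unknown" unless port && port.to_i > 0
  File.expand_path("#{root}/log/#{env}.#{port}.log")
end

per_port_log_path("/var/www/app", "production", 3000) # => "/var/www/app/log/production.3000.log"
per_port_log_path("/var/www/app", "production", nil)  # => "/var/www/app/log/production.unknown.log"
```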
Finally we were seeing these errors:
terminate called after throwing an instance of 'std::runtime_error'
what(): unable to delete epoll event: Bad file descriptor
This is a known problem not with thin, but with EventMachine. Grab an updated gem like this:
gem install eventmachine --source http://code.macournoyer.com
Filed in rails, ruby | 25 February, 2008
Sometimes you just don’t want to instantiate a bunch of ActiveRecord objects to get some simple information from the database. Skipping them saves memory, and it’s faster.
I made a very simple plugin that enables you to get all the values for one column in a table with a simple class method named according to the column - so for instance you can just say People.first_names and you will get all of them in a hash - key is the id, and value is the first_name.
View it here - and install with:
script/plugin install http://pennysmalls.com/rails_plugins/column_as_array
Here’s the README:
ColumnAsArray
=============
This extension allows you to get all the values for a particular column from a table in one hash with a simple call using the column’s name in plural. The resulting hash keys are the ids, and the values are the column values.
The idea is that where only one column of data is needed from a table, but all or nearly all rows in the table need to be read (perhaps for listing purposes), we don’t need to instantiate a whole bunch of ActiveRecord objects just for this. This can save a great deal of heap space if the table has many columns.
Example
=======
>> Widget.names
=> {7135=>"Big widget", 33=>"Old widget", 100=>"Fast widget"}
>> Thing.updated_ats
=> {2865=>"2008-02-04 09:57:55", 2344=>"2008-01-31 10:24:31", 1823=>nil, 260=>nil}
Because we use singularize to derive the column name from the method name, column names that are already plural need an extra ’s’:
>> Client.format_choicess
=> {7135=>"csv,xml", 33=>"xml", 100=>"csv,xls", 2110=>"csv", 167=>"xml"}
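The mechanism can be sketched in plain Ruby with a class-level method_missing. This is not the plugin’s code: the real plugin reads from the database, whereas here the rows live in a hypothetical in-memory ROWS array, and a naive "strip one trailing s" rule stands in for ActiveSupport’s singularize.

```ruby
# Sketch of the column-as-hash idea (hypothetical; not the plugin's implementation).
class Widget
  ROWS = [
    { id: 7135, name: "Big widget" },
    { id: 33,   name: "Old widget" },
    { id: 100,  name: "Fast widget" }
  ]

  # Naive stand-in for singularize: drop one trailing "s".
  def self.column_name_for(method_name)
    method_name.to_s.sub(/s\z/, "")
  end

  # Widget.names => { id => name, ... } without building model objects.
  def self.method_missing(method_name, *args)
    column = column_name_for(method_name).to_sym
    return super unless ROWS.first && ROWS.first.key?(column)
    ROWS.each_with_object({}) { |row, result| result[row[:id]] = row[column] }
  end

  def self.respond_to_missing?(method_name, include_private = false)
    ROWS.first.key?(column_name_for(method_name).to_sym) || super
  end
end

# Returns { 7135 => "Big widget", 33 => "Old widget", 100 => "Fast widget" }
Widget.names
```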
Filed in rails, ruby | 28 January, 2008
Sometimes ActiveRecord wraps your data up too much, and you don’t want or need all that convenient but processor-cycle consuming abstraction.
We had a case where we wanted to delete nearly 3 million records from our database. The conditions for deletion were a little complex, and writing a single SQL query for it was not practical.
So it’s nice to do it in Ruby, but instantiating 3 million ActiveRecord objects is just not an option. Fortunately we didn’t need to: you can get data from the database more directly by talking to the connection (MySQL in our case).
Here’s how to get a list of ids for the records that were last updated before 2008:
result = ModelClass.connection.execute("SELECT id FROM table
WHERE updated_at < '2008-01-01'")
ids = []
result.each {|row| ids << row[0].to_i}
Then you can go through the ids and do what you like with them, perhaps compare them to other lists of ids from other tables (which is what we did).
Deleting the records is easy: just use ModelClass.delete(the_id), which works without instantiating an object. Use with care!
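The comparison step is plain Ruby once you have the id arrays. In this sketch the two lists are made-up stand-ins for the results of raw SELECTs; a Set keeps the lookups cheap with millions of ids, and each_slice splits the work into batches. The delete call itself is left as a comment, since ModelClass is a placeholder (ActiveRecord’s delete also accepts an array of ids).

```ruby
require 'set'

# Hypothetical id lists standing in for the results of raw SELECTs.
stale_ids      = [1, 2, 3, 4, 5, 6, 7] # e.g. rows last updated before 2008
referenced_ids = [2, 5]                # e.g. ids still referenced from another table

# Set membership tests are constant time, so this stays linear for large lists.
keep = referenced_ids.to_set
to_delete = stale_ids.reject { |id| keep.include?(id) }

# Work in batches; in the app each batch could go to ModelClass.delete(batch).
batches = to_delete.each_slice(3).to_a
batches.each { |batch| puts "would delete #{batch.inspect}" }
```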
Filed in rails, ruby | 7 January, 2008
Apart from well-known changes like start_form_tag being deprecated, these things broke for us with Rails 2.0.2:
1. super is no longer called in tests. Use setup_with_fixtures instead, which will work in Rails 2.0.2 and in future versions where the bug is fixed.
2. The paths to partials used by ActionMailer have changed - now we must use “controllername/partialname” as opposed to “../controllername/partialname”
3. The handling of pluses in URLs has changed, probably in this changeset. A + in a URL path is no longer translated to a space by Rails, which is the correct handling according to RFC 2396.
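The distinction is between form encoding, where + stands for a space, and plain percent-encoding in a path, where + is just a character. A small sketch using Ruby’s uri library for the form case; decode_path_segment is a hypothetical helper that decodes only %XX escapes.

```ruby
require 'uri'

# Form encoding (query strings, POST bodies): "+" means a space.
URI.decode_www_form_component("big+widget") # => "big widget"

# Hypothetical path-segment decoder: only %XX escapes are decoded,
# so a literal "+" stays a "+".
def decode_path_segment(segment)
  segment.b.gsub(/%([0-9A-Fa-f]{2})/) { $1.hex.chr }.force_encoding(Encoding::UTF_8)
end

decode_path_segment("big+widget")   # => "big+widget"
decode_path_segment("big%20widget") # => "big widget"
```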