Filed in
rails, ruby |
14 October, 2008
I just released a new version of slim-attributes. There are some small speed gains and some other minor changes from 0.4.1, but there are no big changes.
Read more about slim-attributes at the slim-attributes homepage, or read on below.
Introduction
Slim-attributes is a small patch to the ActiveRecord Mysql adaptor that stops rails from immediately making ruby strings from the column names and data from your database queries. Because you probably don’t need them all!
So ruby strings are lazily created on demand – it’s faster and uses less memory. And it drops directly in, requiring only the installation of a gem and adding 1 line to environment.rb.
Measuring with just ActiveRecord code – fetching stuff from the database – we see anything up to a 50% (or more) speed increase, but it really depends on your system and environment, and what you are doing with the results from the database. The more columns your tables have, the better the improvement will likely be. Measure your own system and send me the results!
Installation
Try:
gem install slim-attributes -- --with-mysql-config
or:
gem install slim-attributes
then add this to environment.rb:
require 'slim_attributes'
Description
Normally the mysql adaptor in Rails returns a hash of the data returned from the database, one hash per active record object returned by the query. The routine that generates these hashes is called all_hashes, and this is what we replace. The reason for overriding all_hashes is threefold:
- making a hash of each and every row returned from the database is slow
- rails makes frozen copies of each column name string (for the keys) which results in a great many strings which are not really needed
- we observe that it’s not often that all the fields of rows fetched from the database are actually used
So this is an alternative implementation of all_hashes that returns a ‘fake hash’ which contains a hash of the column names (the same hash of names is used for every row), and also contains the row data in an area memcpy’d directly from the mysql API (which is much faster than creating ruby strings).
The field contents are then instantiated into Ruby strings on demand – ruby strings are only made if you need them – when you ask for a particular attribute from the model object.
Note that if you always look at all the columns when you fetch data from the database then this won’t necessarily be faster than the unpatched mysql adapter. But it won’t be much slower either, and we do expect that most times not all the columns from a result set are accessed.
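To make the idea of the ‘fake hash’ concrete, here is a minimal pure-Ruby sketch. It is only an illustration: the real gem does this work in C against the mysql API, and the class name LazyRow, its constructor arguments and the materialized_count counter are all invented for this example.

```ruby
# Hypothetical illustration of lazy attribute instantiation.
# Not the gem's code: slim-attributes does this in C with memcpy'd row data.
class LazyRow
  attr_reader :materialized_count

  # column_index: one Hash of column name => position, shared by every row
  # raw_fields:   the raw values for this row (a stand-in for the copied row data)
  def initialize(column_index, raw_fields)
    @column_index = column_index
    @raw_fields = raw_fields
    @strings = {}            # position => Ruby string, filled on demand
    @materialized_count = 0  # how many fields have been turned into strings
  end

  def [](name)
    pos = @column_index[name]
    return nil if pos.nil?
    @strings.fetch(pos) do
      @materialized_count += 1
      raw = @raw_fields[pos]
      @strings[pos] = raw.nil? ? nil : raw.to_s.dup
    end
  end
end

columns = { "id" => 0, "name" => 1, "notes" => 2 }.freeze
rows = [[1, "Ann", "x" * 1000], [2, "Bob", "y" * 1000]].map do |fields|
  LazyRow.new(columns, fields)
end

names = rows.map { |row| row["name"] } # only the "name" field of each row is instantiated
```

The point of the sketch is that the large "notes" values are never converted unless something asks for them, and that all rows share a single hash of column names.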
Future development
I speculate that further speed gains might be had through keeping the mysql result objects from mysql-ruby around, and not copying the data from them at all until it is needed. However, mysql-ruby limits the non freed result sets to just 20 before calling GC.start, so surgery inside mysql-ruby would be required to achieve this.
Filed in
rails, ruby |
6 October, 2008
I was reading what Paul Barry had to say about splitting models into smaller files. It resonated with me a little – some of our models are approaching 1000 lines.
But I felt the name ‘concerned_with’ did not fully / appropriately describe what is being done, and that there should be an easier way than having to specify every file to be required.
So I ended up modifying the code to be a little easier to use. If you place it in an initializer (i.e. in a file in your initializers directory), then you can specify in your model that you wish to require all the files from a subdirectory of the same name as the model.
So if your model is called Customer, then the model file is customer.rb. Now you can also have a subdirectory called customer that contains further files containing model code.
In the original model class file you should add require_class_subdirectory to it, like this:
class Customer < ActiveRecord::Base
  require_class_subdirectory
  ...
end
This will cause all the files in the subdirectory to be required.
In each file in the subdirectory you should open the model class like so:
class Customer
  def something # you can cut/paste code in from the main model file
  end
  ...
end
The filenames you use don’t matter – in the above case it could be ‘something.rb’ for instance.
So, to recap, your main class file customer.rb has ‘require_class_subdirectory’ added to it. You create a folder called ‘customer’ in your models directory, and place some .rb files in there. In each of those files you re-open the class (‘class Customer’) and place code there just as if you were writing into the main class file.
This allows you to separate code according to function within a model, and to keep file sizes manageable.
Here is the code to put in the initializer:
class << ActiveRecord::Base
  def require_class_subdirectory
    ActiveSupport::Dependencies.load_paths.select{|lp| lp =~ /app\/models/}.each do |path|
      Dir["#{path}/#{name.underscore}/*.rb"].each do |filename|
        require_dependency "#{name.underscore}/#{File.basename(filename)}"
      end
    end
  end
end
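If you want to see the mechanism outside Rails, the following is a rough plain-Ruby sketch of the same idea. The helper name require_class_files and the temporary directory layout are made up for the example, and it uses plain require rather than Rails’ require_dependency, so it does not take part in Rails’ class reloading.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: require every .rb file found in <models_dir>/<class_dir>/.
def require_class_files(models_dir, class_dir)
  Dir[File.join(models_dir, class_dir, "*.rb")].sort.each { |file| require file }
end

# Stands in for the main model file (customer.rb).
class Customer
end

billing_name = nil
Dir.mktmpdir do |models_dir|
  # Stands in for app/models/customer/billing.rb, which re-opens the class.
  FileUtils.mkdir_p(File.join(models_dir, "customer"))
  File.write(File.join(models_dir, "customer", "billing.rb"), <<~RUBY)
    class Customer
      def billing_name
        "ACME Ltd"
      end
    end
  RUBY

  require_class_files(models_dir, "customer")
  billing_name = Customer.new.billing_name
end
```

After the files are required, the method defined in the subdirectory file is available on Customer just as if it had been written in the main class file.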
Other approaches to this problem are possible. In particular it may be feasible to patch or hook the constant missing mechanism in rails to automatically load the files in the subdirectory, which would remove the need for the require_class_subdirectory line in your main model file.
Finally, not everyone thinks this is a problem that needs to be solved. My colleague who uses Aptana says it has a good outline mode that means it’s easier to work with one large file for a model than lots of smaller ones. In Textmate I find the smaller files easier.
Filed in
rails, ruby |
1 October, 2008
If you need to disable the query cache in rails, it’s not particularly easy to do that.
There is some discussion about it (dated March 2008) here.
Although you can turn all caching off, and you can turn the query cache off explicitly in your code using uncached, there isn’t a way to turn just the query cache off globally at configuration time.
So, then, here’s the monkey patch you need (tested on Rails 2.1.1). Although this is not particularly optimal (in that some query caching related code is still called), it will work. You can put this at the bottom of your environment.rb somewhere, or even better put it in its own file in the initializers directory (e.g. query_cached_off.rb).
module ActiveRecord
  module ConnectionAdapters
    module QueryCache
      private
      def cache_sql(sql)
        yield
      end
    end
  end
end
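Why does such a small patch work? The sketch below uses a toy stand-in to show the principle: if every query is routed through one method that wraps the real query in a block, then redefining that method to just yield turns the cache into a pass-through. ToyQueryCache and ToyConnection are invented names and are not the Rails implementation.

```ruby
# Toy stand-in for a connection with a query cache; NOT the Rails code.
module ToyQueryCache
  def select_all(sql)
    cache_sql(sql) { run_query(sql) }
  end

  private

  # Return the cached result for this SQL, running the block only on a miss.
  def cache_sql(sql)
    @query_cache ||= {}
    @query_cache.fetch(sql) { @query_cache[sql] = yield }
  end
end

class ToyConnection
  include ToyQueryCache
  attr_reader :queries_run

  def initialize
    @queries_run = 0
  end

  private

  # Pretend to hit the database and count how often that happens.
  def run_query(sql)
    @queries_run += 1
    "result of #{sql}"
  end
end

cached = ToyConnection.new
2.times { cached.select_all("SELECT 1") }
cached_runs = cached.queries_run # the second call was served from the cache

# Reopen the module and make cache_sql a pass-through, as the monkey patch does.
module ToyQueryCache
  private

  def cache_sql(sql)
    yield
  end
end

uncached = ToyConnection.new
2.times { uncached.select_all("SELECT 1") }
uncached_runs = uncached.queries_run # every call now reaches run_query
```

As with the real patch, the surrounding cache plumbing still runs; only the lookup itself is bypassed.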
Filed in
rails, ruby |
19 August, 2008
It so happens that in our application we need to find the dimensions of many images that do not reside on our rails application server. I have previously written about how to use the GD library to find image sizes, but this requires fetching the whole file and having it available in the local file system.
Fortunately we know all these files are jpegs, and if you check the jpeg documentation you can see that the dimensions of an image are normally contained near the start, which is ideal. We will just fetch enough of the image to get the dimensions, and no more.
How can we do that? The get method from Net::HTTP will do the trick – given a block it will yield each packet of data from the remote server as it arrives. And fetching over http makes this method very general – we can size any jpeg from anywhere using this code.
Parsing the returned data for the dimensions requires a simple state machine which will break from the http get block once the dimensions are located.
So I ended up with a small class to encapsulate the dimension information from the jpeg, JpegDimensions. Here’s how to use it:
jpg_info = JpegDimensions.new("http://somewhere.co.abc/an_image.jpg")
jpg_info.height # is the height
jpg_info.width # is the width
And the best thing is that only a small part of the image will be fetched, saving time and bandwidth.
Here’s the code
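require 'net/http'
require 'uri'

class JpegDimensions
  attr_reader :width, :height

  def initialize(image_path)
    @uri_split = URI.split(image_path)
    find_jpeg_size
  end

  def find_jpeg_size
    begin
      http = Net::HTTP.new(@uri_split[2], @uri_split[3]) # host, port
      state = 0
      http.get(@uri_split[5]) do |str| # this yields strings as each packet arrives
        str.each_byte do |b|
          state = case state
          when 0 # looking for the 0xFF that introduces a marker
            b == 0xFF ? 1 : 0
          when 1 # is this a start-of-frame marker (SOF0..SOF3)?
            if b >= 0xC0 && b <= 0xC3
              2
            else
              b == 0xFF ? 1 : 0 # a repeated 0xFF may still introduce a marker
            end
          when 2 # skip segment length, high byte
            3
          when 3 # skip segment length, low byte
            4
          when 4 # skip sample precision
            5
          when 5 # height, high byte
            @height = b * 256
            6
          when 6 # height, low byte
            @height += b
            7
          when 7 # width, high byte
            @width = b * 256
            8
          when 8 # width, low byte
            @width += b
            break # leaves each_byte with state still 8
          end
        end
        break if state == 8 # don't need to fetch any more of the image
      end
    rescue Exception=>e
      # I do nothing here, but you can do something more useful with the exception if required
    end
  end
end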
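One caveat with scanning byte by byte is that the bytes 0xFF 0xC0 could in principle turn up inside another segment (an embedded thumbnail, say) before the real frame header. A possible variation, sketched below under that assumption, walks the file segment by segment using each segment’s length field. The function name jpeg_dimensions is invented for this example, it works on bytes already in memory rather than on the http stream, and it simplifies by assuming every marker before the frame header carries a length.

```ruby
# Returns [width, height] from the leading bytes of a JPEG, or nil if no
# start-of-frame segment is found in the data supplied.
# Simplification: assumes every marker before the frame header has a length field.
def jpeg_dimensions(data)
  bytes = data.bytes
  return nil unless bytes[0] == 0xFF && bytes[1] == 0xD8 # start-of-image marker

  pos = 2
  while pos + 3 < bytes.length
    return nil unless bytes[pos] == 0xFF # expected the start of a segment
    marker = bytes[pos + 1]
    if marker == 0xFF # fill byte before a marker: move along one byte
      pos += 1
      next
    end

    length = bytes[pos + 2] * 256 + bytes[pos + 3] # includes the two length bytes
    if (0xC0..0xC3).include?(marker)
      return nil if pos + 8 >= bytes.length # frame header is incomplete
      height = bytes[pos + 5] * 256 + bytes[pos + 6]
      width = bytes[pos + 7] * 256 + bytes[pos + 8]
      return [width, height]
    end

    pos += 2 + length # jump over the marker and this segment's payload
  end
  nil
end

# A hand-built header: start of image, an APP0 segment, then a frame header
# for a 640x480 image.
app0 = [0xFF, 0xE0, 0x00, 0x10] + [0] * 14
sof0 = [0xFF, 0xC0, 0x00, 0x11, 0x08, 0x01, 0xE0, 0x02, 0x80]
sample = ([0xFF, 0xD8] + app0 + sof0).pack("C*")

dims = jpeg_dimensions(sample) # [width, height]
```

To use this with the streaming fetch you would need to accumulate the packets into a buffer and retry until a result comes back, which is more bookkeeping than the simple state machine requires.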
Filed in
rails, ruby |
29 June, 2008
Although Ruby 1.8.7 is not officially recommended yet for Ruby on Rails, it does in fact work fine with Rails 2.1. And version p22 contains all the latest security fixes.
For what it’s worth, I can confirm that not only are our apps working well under 1.8.7 (including ferret and over 30,000 lines of app code), but they are consuming / leaking considerably less memory. (They still do leak, but the rate is much reduced.)
So upgrade now.
Beware of one issue – because comments are handled differently in 1.8.7, we had to clean up our ERB code so that every comment sits inside its own comment tag, like <%# comment %>.
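For reference, this is the form of comment that is safe, shown with the ERB library from the standard library. The template text is just an example.

```ruby
require 'erb'

# The comment lives in its own <%# ... %> tag, separate from the code tag.
template = ERB.new("<%# this comment is in its own tag %><%= 1 + 2 %>")
rendered = template.result
```

The comment tag produces no output, so only the value of the expression ends up in the rendered string.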
Filed in
IOCCC |
25 June, 2008
Thanks to Óscar Toledo for pointing out to me that my 2004 IOCCC winner is mentioned in this interesting book (in French). It’s mentioned on page 53 with some nice commentary.
Filed in
rails, ruby |
3 June, 2008
Rails 2.1.0 has many nice new features, but also it broke our app in some places. This is what I found:
1. setup_with_fixtures no longer does anything
Previously we would use setup_with_fixtures in our tests to do setup actions, but it’s not called any more. Just using plain old setup works for all our cases.
2. render :locals=>{} – hash keys must be symbols
Previously this would have worked
render :partial=>"something", :locals=>{"foo"=>"bar", "hoo"=>"haa"}
Now make sure all the locals are denoted by symbols
render :partial=>"something", :locals=>{:foo=>"bar", :hoo=>"haa"}
3. Relative paths when rendering actions
We were using double dots in our template paths when rendering actions from another controller, like this:
render :action=>"../controller/template"
This no longer works: instead try this:
render :template=>"controller/template"
4. Non null text columns in mysql will not default to empty string automatically if default is not set
If you happen to have a column of type text in your mysql database that is set to be NOT NULL and does not have a default set, then previously you would have got away without having to set a value explicitly for the column before saving an AR object.
This is not the case any more: ActiveRecord will set the value to nil/NULL and a database error will result when you try to save the object. So you need to set a value in your AR objects for these columns before saving. See here for details of the changes that cause this, and here for the associated bug report.
5. @action_name is no longer available in views
Rails is now careful not to export into the views some instance variables that were previously available. But just omit the @ – there is a new method action_name for this purpose. Or you could use @controller.action_name, or of course params[:action] to get the same information. See here for details of this issue.
6. Collection#size no longer works correctly on collections after they have been added to with build
Because of a bug in collection#size, it does not report the correct value after you have added additional items using collection#build. This bug existed before 2.1.0, but has only become an issue because, since this changeset, build no longer automatically loads the collection. When size tries to calculate the size of the collection it correctly counts the collection members in the DB, but incorrectly counts the added-but-not-saved members as just 1, no matter how many more than one have been added.
I submitted a patch for this.
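To illustrate the symptom in issue 6, here is a toy model of a collection. It only mimics the miscount as described above and is not taken from the ActiveRecord source; ToyCollection, buggy_size and the attribute names are invented.

```ruby
# Toy model of an association collection; NOT ActiveRecord code.
class ToyCollection
  def initialize(saved_count)
    @saved_count = saved_count # members already stored in the DB
    @unsaved = []              # members added with build but not yet saved
  end

  def build(attrs = {})
    @unsaved << attrs
    attrs
  end

  # Mimics the symptom: all unsaved members together count as just 1.
  def buggy_size
    @saved_count + (@unsaved.empty? ? 0 : 1)
  end

  # What you would expect: every unsaved member is counted.
  def size
    @saved_count + @unsaved.length
  end
end

items = ToyCollection.new(3)
3.times { |i| items.build(:position => i) }

wrong = items.buggy_size # 3 saved + 1
right = items.size       # 3 saved + 3 built
```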
Those are all the issues we saw, and everything is working fine now. Go and upgrade now! (But don’t try to use Ruby 1.8.7, at least not until this is fixed)
Filed in
rails, ruby |
7 April, 2008
I released a somewhat improved version of slim_attributes as a gem at Rubyforge. It now has better compatibility, and can be just dropped in to provide instant performance improvements.
Here is the project homepage.
Filed in
rails, ruby |
2 April, 2008
I was looking at the implementation of the instantiation of ActiveRecord objects from the database, and the population of the @attributes attribute. There is a method called all_hashes which generates the hashes that are used for the @attributes, so I looked at this. Generating whole hashes for each database row is a little bit expensive in time and memory. Perhaps we can do better?
I note that Stefan Kaes did some work a year and a half ago on implementing all_hashes in C but it still relies on hashes being generated – I don’t think anyone has attempted what I do here.
So without further ado, I present slim_attributes, the non-hash implementation of all_hashes.
Here are the important but unscientific benchmarks (to give you an idea) – notice that the speed relative to using plain ActiveRecord depends on how many attributes are accessed in the model objects (because slim_attributes lazily instantiates them into strings). There were 2 models used; one had 44 and the other had 104 attributes.

View the plugin here – and install with:
script/plugin install http://pennysmalls.com/rails_plugins/slim_attributes
then follow the instructions to compile it given in the README below (yes, it should be made into a gem that compiles itself):
[UPDATE: there is now a better rubygem, see here and here.]
==========
SlimAttributes
This is a small patch to the ActiveRecord Mysql adaptor that stops rails from using the existing all_hashes / each_hash mechanism – which is what is called when you do a find.
It is faster, and uses less memory.
Measuring with just ActiveRecord code – fetching stuff from the database – we see anything from very little up to a 50% (or more) speed increase, but I suppose it really depends on your system and environment, and what you are doing with the results from the database. Measure your own system and send me the results!
Installation
You’re going to need the mysql headers for this to work.
cd vendor/plugins/slim_attributes
ruby extconf.rb --with-mysql-config
make
sudo make install
Description
The reason for overriding all_hashes is threefold:
* making a hash of each and every row returned from the database is slow
* ruby makes frozen copies of each column name string (for the keys) which results in a great many strings which are not really needed
* we observe that it’s not often that all the fields of rows fetched from the database are actually used
So this is an alternative implementation of all_hashes that returns a ‘fake hash’ which contains a hash of the column names (the same hash of names is used for every row), and also contains the row data in an area memcpy’d directly from the mysql API.
The field contents are then instantiated into Ruby strings on demand – ruby strings are only made if you need them. Note that if you always look at all the columns when you fetch data from the database then this won’t necessarily be faster than the unpatched mysql adapter. But it won’t be much slower either, and we do expect that most times not all the columns from a result set are accessed.
Note that the ‘fake hash’ quacks like a hash in many ways, but not all ways. So @attributes in an ActiveRecord object may not behave as you are expecting it to, and it particularly won’t work if you try to add a key to it that is not a column name in the result set.
@attributes["not a column name"] = "something"
=> RuntimeError: Key was not a column name from the result set
Hash has many methods that are not supported by the fake hash, but I found that the ones I have implemented have been sufficient for use in our Rails app. It should be fairly easy to implement most of the missing methods if needed, but I did not wish this patch to be larger than necessary.
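As a rough picture of what ‘quacks like a hash in many ways, but not all ways’ means, here is a small Ruby sketch of an object with a fixed set of keys. The class name StrictRow and its handful of methods are made up for the illustration; the plugin itself is written in C and supports a different set of methods.

```ruby
# Hypothetical hash-like row whose keys are fixed to the result set's columns.
class StrictRow
  def initialize(column_names, values)
    @index = {}
    column_names.each_with_index { |name, i| @index[name] = i }
    @values = values.dup
  end

  def [](key)
    pos = @index[key]
    pos && @values[pos]
  end

  # Existing columns may be overwritten, but new keys are refused.
  def []=(key, value)
    pos = @index[key]
    raise "Key was not a column name from the result set" if pos.nil?
    @values[pos] = value
  end

  def key?(key)
    @index.key?(key)
  end

  def keys
    @index.keys
  end
end

row = StrictRow.new(%w[id name], ["1", "Ann"])
row["name"] = "Bob" # fine: "name" is a column

error_message = begin
  row["not a column name"] = "something"
  nil
rescue RuntimeError => e
  e.message
end
```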
===========
No warranty – this plugin should be considered experimental and likely needs some more work if you want it to be foolproof. However, that said, we are using it in our production environment with good results.
==========
Finally it’s interesting to note that Dan Chak wrote some code to actually return hashes from the database rather than ActiveRecord objects, when you just want the data without any fancy associations and so on. It’s much faster, proving that creating the ActiveRecord objects is fairly slow. I’ll take a look at combining this with slim_attributes – returning fake hashes should be faster still. (Combining his 50% improvement with my 50% should yield instant results :)
Update
I have now tested hash_extension with and against slim_attributes. My test fetched all records from two separate ActiveRecord models 100 times.
| Plain ActiveRecord | 38.3s |
| Using find_as_hashes | 35.1s |
| Using slim_attributes | 13.0s |
| Using both | 10.4s |
Clearly slim_attributes makes the biggest difference, but it should be noted that this is really the ideal case – where Model.find(:all) is done without actually accessing any of the attributes.
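To put those timings in relative terms, this little script divides the plain ActiveRecord time by each of the others. The numbers are the ones from the table; the variable names are just for the example.

```ruby
# Timings in seconds, copied from the table above.
timings = {
  "Plain ActiveRecord"    => 38.3,
  "Using find_as_hashes"  => 35.1,
  "Using slim_attributes" => 13.0,
  "Using both"            => 10.4
}

baseline = timings["Plain ActiveRecord"]
speedups = timings.map { |label, seconds| [label, (baseline / seconds).round(2)] }.to_h

speedups.each { |label, factor| puts format("%-22s %.2fx", label, factor) }
```

That works out at roughly 1.09x for find_as_hashes, 2.95x for slim_attributes and 3.68x for both together, for this ideal case.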
Filed in
rails, ruby |
23 March, 2008
I’ve spent a considerable amount of time with various tools attempting to figure out why it is that our thin processes (and mongrels before them) grow so egregiously. Typically they reach about 450MB in a day, after which we restart them via monit.
What makes them grow? Well, we are fetching a lot of stuff from the DB all the time – meaning that thousands of small strings are being instantiated – so perhaps we can attribute some growth to heap fragmentation. But we tried changing to ptmalloc3 – it didn’t help; in fact I think in our case this is rather a red herring.
In an effort to get the problem under control, I wrote a plugin to reduce the number of strings that are made, changing the implementation of the mysql library so that all_hashes actually returns fake hashes that are implemented as arrays – to prevent all those column names being saved as frozen strings (for the hash keys) for every row that is fetched from the DB. But that didn’t help much, if at all, either.
But whilst I was playing with ruby with valgrind, I noticed some memory going missing. At first I thought it was probably me. But with further investigation I found a simple expression that makes ruby leak.
a = eval "b=0"
It’s actually the eval that leaks – the a = is not really needed, but it makes the leak show as a definite leak as opposed to a possible one in this simple one liner. If you want to leak a lot of memory, this is the way:
def grow
  for i in 1..100
    eval "b#{i}=1"
  end
end
15000.times {grow}
You can fiddle with the numbers to make it grow as much as you like.
Valgrind reports the leak like this (this one made by running the loop 5000.times):
==18706== 217,988,864 bytes in 499,985 blocks are definitely lost in loss record 6 of 6
==18706== at 0x4A05AF7: realloc (vg_replace_malloc.c:306)
==18706== by 0x432398: ruby_xrealloc (gc.c:151)
==18706== by 0x465E9C: local_append (parse.y:5649)
==18706== by 0x465F64: local_cnt (parse.y:5667)
==18706== by 0x4646AC: assignable (parse.y:4902)
==18706== by 0x458E80: ruby_yyparse (parse.y:844)
==18706== by 0x45E5F4: yycompile (parse.y:2606)
==18706== by 0x45E8F4: rb_compile_string (parse.y:2676)
==18706== by 0x41DDF3: compile (eval.c:6412)
==18706== by 0x41E289: eval (eval.c:6493)
==18706== by 0x41E817: rb_f_eval (eval.c:6611)
==18706== by 0x41C765: call_cfunc (eval.c:5700)
==18706== by 0x41BB04: rb_call0 (eval.c:5856)
==18706== by 0x41D291: rb_call (eval.c:6103)
==18706== by 0x415182: rb_eval (eval.c:3494)
The memory is allocated when ruby is expanding its local variable table in the parser. But what I don’t know yet is exactly where to add a call to free to release that memory. I’m hoping that someone over at ruby-core can help. Interestingly, it appears that Rubinius leaks too, which is surprising given that it is a completely new implementation.
I’m not the only one to have found a leak in Ruby lately – I wonder if the issues with god are related to this?
Fixing this leak may not completely cure our Rails memory growth problem (probably won’t), but at least it will help.