We recently wanted to remove an Amazon S3 bucket where 1,000,000+ files were stored. This bucket also had versioning enabled which means the actual number of files was way bigger. To give us an idea, we dump the file paths to delete: the associated output text was 500MB big.

This task which seems simple at first proved to be quite complicated to handle, mostly because of Amazon own limitations that it would be nice to see addressed.

The first thing we had to do is obviously to disable versioning in the Amazon Web Services console:

Without this, not only the bucket would not be emptied but some delete markers would be added to the bucket which would make our life even harder.

The first assumption a user has when wanting to delete a S3 bucket is that clicking on Delete Bucket works. But Amazon does not allow to delete a non empty bucket.

Emptying the bucket through the Amazon Console does not work either when the bucket contains more than 10,000 files. And this is where the troubles begin: simply listing the files to delete ends up crashing the most popular S3 tools like s3cmd.

We found some really interesting scripts which are designed to delete both delete markers and all file versions on a S3 bucket. These scripts were indeed deleting the files on our S3 bucket but kept on running after four days in a row.
The main reason for this is that a query is made for each file deletion. We needed to perform some bulk delete instead.

Amazon CLI provides the capacity to delete up to 1000 files using a single HTTP request via the delete-objects command.

We engineered a ruby script which relies on this command to delete our files faster:

Pre-requisites:

To use this script you need to:

  • Export your Amazon credentials: export AWS_ACCESS_KEY_ID=... and export AWS_SECRET_ACCESS_KEY=...
  • Have the Amazon CLI installed.
  • Have a Ruby interpreter installed.
  • Download the above file and make it executable: chmod +x FILE

Usage:

Simply execute the script like any other programs with the bucket name you would like to empty as the argument.
E.g: Providing the Ruby script was called S3_bucket_cleaner.rb:

./S3_bucket_cleaner.rb BUCKET_NAME

Figures and conclusion:

The above script was able to remove all the files of our S3 bucket in less than 20 min which was good! It would be great if Amazon let people emptying / removing a S3 bucket regardless how full this one is. In the meantime, we are happy to share this script with you today in case you run into a similar scenario.

When projects grow they become hard to change. One aspect that is not often highlighted is dependency direction. I haven’t found much material on the topic, maybe the best ideas came from this talk by Sandi Metz “Less, the path to a better design”.

Some of the main points of Sandi’s talk

The purpose of design is to reduce the cost of change, anything else is not design.
Managing dependencies is at the heart of design.

According to the Stable Dependencies Principle

[The dependency] should be in the direction of the stability.
“Stable” roughly means “hard to change”

But then:

if you don’t know what types of changes are likely, it is best to wait and see what happens as the system evolves.

Sandi’s main point in her talk is that dependency direction is a choice, and:

[17:55] Uncertainty is not a license to guess, it’s a directive to decouple.

And the last pill of wisdom:

Don’t guess what changes will come, guess what will change.

Which, quickly explained here, is about applying the open / closed principle when the code you’re writing might change.

Every class used in your application can be ranked along a scale of how likely it is to undergo a change relative to all other classes.

  • Sandi Metz POODR, Chapter 3, pg 54

My suggestions to choosing navigability

The class diagram of the app can express navigability with the slim arrow (->). The navigability determines the dependency direction. When in doubt about a dependency direction, we can follow the class diagram.

  • If Post belong_to User, User owns Post, the navigability is Post -> User and you should consider favouring depending on User in Post, rather than the other way around;
  • Ask yourself: “Can Post exist without User?” (and vice-versa); User makes sense even without Post, but it’s unlikely that a Post can exist without a User, so the navigability should be Post -> User;
  • Avoid User <-> Post, if you do it you will be unable to use User without a Post and vice-versa;
  • Classes with many associations should not hold methods about them; Failing to do so will break the SRP;
  • Divide your application into modules, and apply strict dependency direction between modules; E.g.: Reports -> Users means strictly no methods like user.daily_report;
  • Add the dependency to the lower level object, so that the parent stays clean. This spreads the logic more evenly in classes who are usually more specific about the logic being added.

An example

    ######################
    # Less stable solution
    class Controller
      def action
        purchase.cost
      end
    end

    class Purchase
      has_many :line_items, inverse_of: :purchase

      # `cost` is an external dependency
      def cost
        line_items.sum(:cost)
      end
    end

    class LineItem
      belongs_to :purchase, inverse_of: :line_items
    end

    ######################
    # More stable solution
    class Controller
      def action
        LineItem.total_cost_of(purchase)
      end
    end

    class Purchase
      has_many :line_items, inverse_of: :purchase
    end

    class LineItem
      belongs_to :purchase, inverse_of: :line_items

      # Only dealing with internal dependencies
      def self.total_cost_of(purchase)
        where(purchase: purchase).sum(:cost)
      end
    end

Most projects will have two god classes: User and whatever the focus happens to be for that application. In a blog application, it will be User and Post. – Thoughtbot, How much should I refactor

Instead of having a class User that knows about a bunch of unrelated concepts like posts, notifications, friends etc, you can easily picture a small User class that other resources depend on.

Conclusion

Either you do or don’t agree with this idea, I hope we all agree that choosing the dependency direction is an important factor to improve an app maintainability.

Dependency direction is a choice, and whether you noticed it or not, you just made one

  • Sandi Metz

This post is overlooking dependency injection, interfaces stable dependency principle on purpose.

Further readings:

When handling state changes or object and logic condition changes, one of the most useful gems is aasm. However, after many years I find that it’s often under-utilised and even misused.

It’s really easy to grab the aasm gem and begin implementing the ActiveRecord methodology. This implementation is effective if the state and its dependencies are only models and providing you are not creating circular dependencies.

To get your head in the game here’s a quick example we came across that required us to implement something new:

  • We have a booking for a room when the user first begins the booking process we give the booking the state: draft.
  • As the user progresses, the date is confirmed and the state is changed to pre-reserved.
  • Once the payment has been received the state should be changed to booked.
  • Finally, when the booking has been finalised, the status will be fulfilled.

Of course, we might have more transitions between those cases, but for this scenario, four will suffice.

If we go with the normal Active Record style, our code might look something like this:

Even when the previous approach is quick and simpler. There are a few cons:

  • Booking model is prone to becoming a ‘god object’.
  • Single responsibility principle is being contrevend. After changing states we are adding business logic. Booking is responsible for persistency, changing states (we can consider this as persistency if we don’t want to be strict), sending emails, and releasing payments.
  • Testing will become harder, with more items to stub.
  • Additionally, the dependency direction is wrong, entities now be aware of other services.

A simpler approach would be using the state machine as a service, like this:

To achieve this implementation we only need to be aware of 2 sections. Everything else is your own code:

And we need to make our changes persistent:

Considering this approach we will get two huge benefits:

  • Booking model is only responsible for data persistency
  • This StateService is our actual State machine and the dependency direction is now as it should be. The service no longer requires an understanding of other services and the booking model won’t have awareness of emails or payments.