
We recently wanted to remove an Amazon S3 bucket where 1,000,000+ files were stored. This bucket also had versioning enabled, which means the actual number of stored files was far bigger. To give an idea of the scale, we dumped the file paths to delete: the resulting text output was 500MB.

This task, which seems simple at first, proved to be quite complicated to handle, mostly because of Amazon’s own limitations, which it would be nice to see addressed.

The first thing we had to do was, obviously, to disable (suspend, in S3 terms) versioning in the Amazon Web Services console.

Without this, not only would the bucket not be emptied, but delete markers would also be added to it, which would make our life even harder.

The first assumption a user makes when wanting to delete an S3 bucket is that clicking on Delete Bucket works. But Amazon does not allow deleting a non-empty bucket.

Emptying the bucket through the Amazon Console does not work either when the bucket contains more than 10,000 files. This is where the trouble begins: simply listing the files to delete ends up crashing popular S3 tools such as s3cmd.

We found some really interesting scripts designed to delete both the delete markers and all file versions in an S3 bucket. These scripts were indeed deleting the files in our S3 bucket, but they were still running after four days.
The main reason for this is that one request is made for each file deletion. We needed to perform bulk deletes instead.

The AWS CLI can delete up to 1,000 files with a single HTTP request via the s3api delete-objects command.
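To make the batching idea concrete, here is a minimal sketch (not the script shared in this post) of how a listing of versions and delete markers could be cut into request-sized payloads. It assumes the listing is the parsed JSON output of `aws s3api list-object-versions`; the helper names `objects_to_delete` and `delete_payloads` are hypothetical.

```ruby
require "json"

# delete-objects accepts at most 1,000 objects per request.
MAX_OBJECTS_PER_REQUEST = 1000

# Collects every version and delete marker from one parsed page of
# `aws s3api list-object-versions` output (hypothetical helper).
def objects_to_delete(listing)
  entries = listing.fetch("Versions", []) + listing.fetch("DeleteMarkers", [])
  entries.map { |e| { "Key" => e["Key"], "VersionId" => e["VersionId"] } }
end

# Splits the objects into JSON payloads, one per bulk request
# (hypothetical helper).
def delete_payloads(objects, batch_size = MAX_OBJECTS_PER_REQUEST)
  objects.each_slice(batch_size).map do |batch|
    JSON.generate({ "Objects" => batch, "Quiet" => true })
  end
end
```

Each payload could then be written to a file and passed to `aws s3api delete-objects --bucket BUCKET_NAME --delete file://payload.json`. With batches of 1,000, a million entries need about 1,000 requests instead of a million.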

We engineered a Ruby script which relies on this command to delete our files faster:

Pre-requisites:

To use this script you need to:

  • Export your Amazon credentials: export AWS_ACCESS_KEY_ID=... and export AWS_SECRET_ACCESS_KEY=...
  • Have the AWS CLI installed.
  • Have a Ruby interpreter installed.
  • Download the above file and make it executable: chmod +x FILE

Usage:

Simply execute the script like any other program, with the name of the bucket you would like to empty as the argument.
E.g., provided the Ruby script is called S3_bucket_cleaner.rb:

./S3_bucket_cleaner.rb BUCKET_NAME

Figures and conclusion:

The above script was able to remove all the files from our S3 bucket in less than 20 minutes, which was good! It would be great if Amazon let people empty or remove an S3 bucket regardless of how full it is. In the meantime, we are happy to share this script with you today in case you run into a similar scenario.

When projects grow, they become hard to change. One aspect that is not often highlighted is dependency direction. I haven’t found much material on the topic; the best ideas I came across are from Sandi Metz’s talk “Less, the path to a better design”.

Some of the main points of Sandi’s talk

The purpose of design is to reduce the cost of change, anything else is not design.
Managing dependencies is at the heart of design.

According to the Stable Dependencies Principle

[The dependency] should be in the direction of the stability.
“Stable” roughly means “hard to change”

But then:

if you don’t know what types of changes are likely, it is best to wait and see what happens as the system evolves.

Sandi’s main point in her talk is that dependency direction is a choice, and:

[17:55] Uncertainty is not a license to guess, it’s a directive to decouple.

And the last pill of wisdom:

Don’t guess what changes will come, guess what will change.

Which, quickly explained, is about applying the open/closed principle when the code you’re writing might change.

Every class used in your application can be ranked along a scale of how likely it is to undergo a change relative to all other classes.

  • Sandi Metz POODR, Chapter 3, pg 54

My suggestions for choosing navigability

The class diagram of the app can express navigability with the slim arrow (->). The navigability determines the dependency direction. When in doubt about a dependency direction, we can follow the class diagram.

  • If Post belongs_to User, User owns Post, the navigability is Post -> User and you should consider favouring depending on User in Post, rather than the other way around;
  • Ask yourself: “Can Post exist without User?” (and vice-versa); User makes sense even without Post, but it’s unlikely that a Post can exist without a User, so the navigability should be Post -> User;
  • Avoid User <-> Post; if you do it, you will be unable to use User without a Post and vice-versa;
  • Classes with many associations should not hold methods about them; failing to do so will break the SRP (Single Responsibility Principle);
  • Divide your application into modules, and apply strict dependency direction between modules; e.g.: Reports -> Users means strictly no methods like user.daily_report;
  • Add the dependency to the lower level object, so that the parent stays clean. This spreads the logic more evenly across classes that are usually more specific about the logic being added.
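The module rule from the list above (Reports -> Users) can be sketched in plain Ruby, outside of any framework. The names `Users::User`, `Reports::DailyReport`, `sign_in_dates` and `summary` are hypothetical and only serve the illustration.

```ruby
module Users
  # Deliberately small: knows nothing about reports.
  User = Struct.new(:name, :sign_in_dates)
end

module Reports
  # Depends on Users (Reports -> Users), never the other way around.
  class DailyReport
    def initialize(user)
      @user = user
    end

    def summary
      "#{@user.name}: #{@user.sign_in_dates.size} sign-in(s)"
    end
  end
end

user = Users::User.new("Ada", ["2024-01-01", "2024-01-02"])
puts Reports::DailyReport.new(user).summary # prints "Ada: 2 sign-in(s)"
```

Since User has no daily_report method, the Reports module can be changed or removed without touching Users.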

An example

    ######################
    # Less stable solution
    class Controller
      def action
        purchase.cost
      end
    end

    class Purchase < ActiveRecord::Base
      has_many :line_items, inverse_of: :purchase

      # `cost` is a column of LineItem, so for Purchase
      # it is an external dependency
      def cost
        line_items.sum(:cost)
      end
    end

    class LineItem < ActiveRecord::Base
      belongs_to :purchase, inverse_of: :line_items
    end

    ######################
    # More stable solution
    class Controller
      def action
        LineItem.total_cost_of(purchase)
      end
    end

    class Purchase < ActiveRecord::Base
      has_many :line_items, inverse_of: :purchase
    end

    class LineItem < ActiveRecord::Base
      belongs_to :purchase, inverse_of: :line_items

      # Only dealing with internal dependencies:
      # `cost` and `purchase` both belong to LineItem
      def self.total_cost_of(purchase)
        where(purchase: purchase).sum(:cost)
      end
    end

Most projects will have two god classes: User and whatever the focus happens to be for that application. In a blog application, it will be User and Post. – Thoughtbot, How much should I refactor

Instead of having a User class that knows about a bunch of unrelated concepts like posts, notifications, friends, etc., you can easily picture a small User class that other resources depend on.

Conclusion

Whether or not you agree with this idea, I hope we can all agree that choosing the dependency direction is an important factor in improving an app’s maintainability.

Dependency direction is a choice, and whether you noticed it or not, you just made one

  • Sandi Metz

This post deliberately overlooks dependency injection, interfaces, and the details of the stable dependency principle.

