Prevent Gutenberg from breaking words in table blocks

For some reason, the Gutenberg editor applies the following CSS rule to table block cells:

.wp-block-table td, .wp-block-table th {
    word-break: break-all;
}

As a result, a cell’s text content will be wrapped mid-word, which might not be the behavior you’d expect from your table cells (at least it wasn’t what I was expecting).

To fix the issue, either add the following rule to your theme’s CSS file or use the “Add custom CSS” functionality offered by the WordPress site customizer:

.wp-block-table td, .wp-block-table th {
    word-break: normal;
}

You could also define a utility CSS class to apply the rule on a per-block basis (assignable to individual table blocks via the block’s “Additional CSS class(es)” setting in the editor sidebar):

.word-break-normal td, .word-break-normal th {
    word-break: normal;
}

He or She? Or: The basics of (binary) classifier evaluation

Of all the amazing scientific discoveries of the 20th century, the most astonishing has to be that “men are from Mars, [and] women are from Venus”. (Not that the differences weren’t obvious pre-20th century, but it’s always good to have something in writing.)

If indeed the genders do originate from different planets, then surely the ways in which they use language must be very different as well. In fact, the differences should be so gleamingly obvious that even a computer should be able to tell them apart, right?

So we’re building an author gender classifier…

In natural language processing, there is a task called author profiling. One of its subtasks, author gender identification, deals with detecting the gender of a text’s author. Please note that for the sake of didactic simplicity (and not an old-fashioned view of gender identity), I’ll confine myself to the two traditional genders.

In supervised machine learning, a classifier is a function that takes some object or element and assigns it to one of a set of pre-defined classes. As it turns out, the task of author gender identification is a nice example of a classification problem. More specifically, we are dealing with binary classification since we assume only two possible classes.

By default, these classes are labelled as positive (aka “yes”) and negative (aka “no”). Needless to say, it is perfectly fine to adapt the naming of the two possible outcomes. In our case, female and male (aka “not female”) seem like plausible choices.

It all starts with the data

We are about to train supervised classifiers and so we first need to obtain a good amount of training data. Understandably, I wasn’t too excited about manually collecting thousands of training examples. Therefore, I went ahead and wrote a Scrapy spider to automatically collect articles from nytimes.com on a per-author basis.

If you are interested in the spider code, you’re welcome to check it out. Our industrious spider managed to collect the titles and summaries of more than 210,000 articles as well as their authors’ genders. All in all, there were about 2.5 times more male articles than female ones. This is a great real-world example of a problem known as class imbalance or data imbalance.
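For a rough idea of what such a spider looks like, here is a heavily condensed sketch. The start URL and CSS selectors below are made up for illustration; the real NewYorkTimesSpider is more involved:

import scrapy


class NewYorkTimesSpider(scrapy.Spider):
    name = 'nytimes'
    # Hypothetical entry point; the real spider crawls articles per author.
    start_urls = ['https://www.nytimes.com/section/world']

    def parse(self, response):
        # Hypothetical selectors: extract title and summary of each article teaser.
        # Deriving the author's gender from the byline is omitted here.
        for article in response.css('article'):
            yield {
                'title': article.css('h2::text').get(),
                'summary': article.css('p.summary::text').get(),
            }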

Meet the stars of the show

With the data kindly collected by the NewYorkTimesSpider, we’ll train two supervised classifiers and compare their performance. To this purpose, we’ll make use of scikit-learn, one of the most popular Python frameworks for machine learning. We’ll be training two different classification models: Naive Bayes (NB) and Gradient Boosting (GB).

NB is a classic and historically quite successful model in all kinds of real-world domains including text analysis & classification. The GB model is a more recent development that has achieved considerable success on problems posed on kaggle.com.

This article will not delve into the algorithmic details of these two models. Rather, we’ll treat them as black boxes and focus on their evaluation. The same goes for the topic of feature extraction. For instructional purposes, we’ll go with a very basic feature set based on the tried-and-tested bag-of-words representation. scikit-learn comes with an efficient implementation which spares us having to reinvent the wheel.
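To make this concrete, here is a minimal sketch of how the two models can be set up as bag-of-words pipelines in scikit-learn (the variable names are mine, not taken from the notebook):

from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Each pipeline turns raw text into word counts and feeds them to a classifier.
nb_model = make_pipeline(CountVectorizer(), MultinomialNB())
gb_model = make_pipeline(CountVectorizer(), GradientBoostingClassifier())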

Evaluation metrics 101

Unfortunately, no classifier is perfect and so each decision (positive vs. negative or female vs. male) can either be true (correct) or false (incorrect). This leaves us with a total of 2*2 = 4 boxes we can put each classifier decision (aka prediction) into:

| Predicted \ Actual | Positive            | Negative            |
|--------------------|---------------------|---------------------|
| Positive           | True positive (TP)  | False positive (FP) |
| Negative           | False negative (FN) | True negative (TN)  |

As presented in the table, true positives are positive examples correctly classified as positive. On the other hand, false negatives are positive examples misclassified as negative. The same relationship holds between true negatives and false positives. In the area of machine learning, a 2-by-2 table structure such as the above is commonly referred to as a confusion matrix.

A confusion matrix can serve as the basis for calculating a number of metrics. A metric is a method of reducing the confusion matrix to a single (scalar) value. This reduction is very important because it gives us one value to focus on when improving our classifiers. If we didn’t have this one value, we could endlessly argue back and forth about whether this or that confusion matrix represents a better result.

The below table summarizes some of the most fundamental & widely used metrics for classifier evaluation. Note that although all of them result in values between 0 and 1, I will describe them in terms of percentages for the sake of intuition. Also, some metrics go by different names in different fields and contexts; the most common alternative names are given in parentheses.

| Metric | Formula | Description / Intuition |
|---|---|---|
| Accuracy | \frac{TP + TN}{TP + TN + FP + FN} | What percentage of all elements were predicted correctly? How good is the classifier at finding both positive & negative elements? |
| True positive rate (aka recall, sensitivity) | \frac{TP}{TP + FN} | What percentage of positive elements were predicted correctly? How good is the classifier at finding positive elements? |
| False positive rate (aka fall-out) | \frac{FP}{FP + TN} | What percentage of negative elements were incorrectly predicted as positive? How prone is the classifier to false alarms? |
| True negative rate (aka specificity) | \frac{TN}{TN + FP} | What percentage of negative elements were predicted correctly? How good is the classifier at finding negative elements? |
| False negative rate (aka miss rate) | \frac{FN}{FN + TP} | What percentage of positive elements were incorrectly predicted as negative? How many positive elements does the classifier miss? |
| Precision (aka positive predictive value) | \frac{TP}{TP + FP} | What percentage of elements predicted as positive were actually positive? |
| F1 score | \frac{2 \cdot Precision \cdot Recall}{Precision + Recall} | The harmonic mean of precision and recall, weighting both equally. How good is the classifier in terms of both precision & recall? |

Now that we have a basic understanding of the fundamental metrics for evaluating classifiers, it’s time to put the theory into practice (i.e. write some code). Luckily for us, scikit-learn comes with many pre-implemented metrics. In addition to the metrics, scikit-learn also provides us with a number of pre-implemented cross-validation schemes.
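As a quick taste of the metrics API, here is a minimal sketch with made-up labels (f = female, m = male):

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

y_true = ['f', 'm', 'f', 'f', 'm', 'm']  # actual classes
y_pred = ['f', 'm', 'm', 'f', 'm', 'f']  # classifier predictions

# Note: scikit-learn puts actual classes in rows and predictions in columns,
# i.e. transposed relative to the table above.
print(confusion_matrix(y_true, y_pred))
print(accuracy_score(y_true, y_pred))
# For the remaining metrics, we declare 'f' to be the positive class.
print(precision_score(y_true, y_pred, pos_label='f'))
print(recall_score(y_true, y_pred, pos_label='f'))
print(f1_score(y_true, y_pred, pos_label='f'))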

One of the primary motivations for cross-validating your classifiers is to reduce the variance between multiple runs of the same evaluation setup. This holds especially true for situations where only a limited amount of data is available in the first place. In such cases, splitting your data into multiple datasets (a training and a test dataset) will reduce the number of training samples even further.

Oftentimes, this reduction will lead to significant performance differences between two or more evaluation runs, caused by the particular random choice of training and test sets. By repeatedly partitioning the dataset and running the evaluation multiple times, we can average the results and thereby arrive at a more reliable overall evaluation result.
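In code, such a cross-validated evaluation boils down to a single call to cross_validate. A sketch assuming the nb_model pipeline from above and lists texts and labels holding the raw data:

from sklearn.model_selection import cross_validate

# 5-fold cross-validation; each fold serves as the test set exactly once.
scores = cross_validate(nb_model, texts, labels, cv=5,
                        scoring=['accuracy', 'f1_macro'])
print(scores['test_accuracy'].mean(), scores['test_f1_macro'].mean())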

The importance of a baseline

The evaluation code is available as a Jupyter notebook. Besides a data loading function and the two classifiers to be tested, the notebook also contains the definition of a baseline for our evaluation (HeOrSheBaselineClassifier). A baseline is a simple classifier that gives us a point of reference to compare our actual models against.

In many cases, choosing a baseline is quite a straightforward process. For example, in our domain of newspaper articles, about 71.5% of articles were written by men. Therefore, it makes sense to define a baseline classifier that unconditionally predicts an article to have a male author. If a classifier can’t deliver better performance than this super simple baseline classifier, then obviously it can’t be any good.

To summarize, a baseline provides us with a performance minimum that we should be able to exceed in any case. scikit-learn accelerates the development of baseline classifiers by providing the DummyClassifier class that the HeOrSheBaselineClassifier inherits from.
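A minimal version of such a baseline might look as follows; the actual HeOrSheBaselineClassifier in the notebook may differ in its details:

from sklearn.dummy import DummyClassifier

# Unconditionally predicts the most frequent class seen during training,
# which in our dataset is "male".
baseline = DummyClassifier(strategy='most_frequent')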

Finally, results

If we take a look at the Jupyter evaluation notebook, we can see that both classifiers significantly outperform our baseline in every metric. Though overall the GB classifier offers better performance, the NB model features a better precision score.

Obviously, the classifiers presented in the course of this post are only the tip of the iceberg. But even though we haven’t performed any optimization, the results are already significantly better than the expected minimum performance (i.e. the baseline). What this means is that there is a statistical difference in how often each gender uses specific words, since word counts were the only features employed by the presented models.

The results of the above evaluation might serve as the basis for another post on where to go from here. Further resources on how to improve upon the existing performance can be found in the academic literature (e.g. Author gender identification from text).

Avoiding query code duplication in Django with custom model managers

For a second, please imagine we are building the next big social network. We decide that our “revolutionary” new app shall allow its users to create profiles. Besides a mandatory user name and avatar, we consider a profile complete only if the user also supplies an email or physical address (or both). In other words, a profile is not complete as long as we can’t contact the user in some way (via their email or physical address).

Based on the above requirements, our lead developer comes up with the following models to represent our use case:

from django.db import models


class User(models.Model):
    name = models.CharField(max_length=50)
    avatar = models.ImageField()
    email_address = models.EmailField(blank=True, null=True)
    address = models.ForeignKey('Address', on_delete=models.SET_NULL, blank=True, null=True)


class Address(models.Model):
    street = models.CharField(max_length=50)
    city = models.CharField(max_length=50)
    country = models.CharField(max_length=50)

Next, suppose that for some reason we would like to distinguish users with complete profiles from those with incomplete ones. Our developer comes up with the following query to make things happen:

from django.db.models import Q
 
User.objects.filter(Q(email_address__isnull=False) | Q(address__isnull=False))

It’s not hard to imagine that this query might be relevant in several situations. For example, we might need it for displaying a list of complete user profiles, but we might also need it for filtering the users that can be contacted. Of course we could just go ahead and copy the query to multiple places, but in the spirit of DRY (Don’t Repeat Yourself) it makes a lot of sense not to do that.

Luckily, Django offers a built-in alternative to copying query code. Thanks to the concept of custom model managers, we can define a query once and use it over and over again in different places.

from django.db import models
from django.db.models import Q


class CustomUserManager(models.Manager):
    def with_complete_profiles(self):
        return self.get_queryset().filter(
            Q(email_address__isnull=False) | Q(address__isnull=False)
        )


class User(models.Model):
    name = models.CharField(max_length=50)
    avatar = models.ImageField()
    email_address = models.EmailField(blank=True, null=True)
    address = models.ForeignKey('Address', on_delete=models.SET_NULL, blank=True, null=True)

    objects = CustomUserManager()

In the previous code example, we are effectively overriding Django’s default manager for the User model by redefining the objects attribute. From now on, we can readably and cleanly retrieve all users with complete profiles by calling the with_complete_profiles() method on the manager:

User.objects.with_complete_profiles()

Neat!
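As a side note: if you would like the reusable query to also be available on intermediate querysets (i.e. remain chainable), you can define it on a custom QuerySet instead and derive the manager from that. A sketch of the same query in this style:

from django.db import models
from django.db.models import Q


class UserQuerySet(models.QuerySet):
    def with_complete_profiles(self):
        return self.filter(Q(email_address__isnull=False) | Q(address__isnull=False))


class User(models.Model):
    # Fields as defined above...
    objects = UserQuerySet.as_manager()

With this setup, calls like User.objects.filter(name__startswith='A').with_complete_profiles() work as well.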

Hosting multiple sites within a single Django project

Imagine we are running a successful online store for cat food (let’s call it catfood247.com). Since things are going so well, we would like to expand our business with a second store for dog food, dogfood247.com. Does this mean we’ll have to set up a separate server even though the two stores will be very similar and share a lot of code? Having more servers means higher maintenance & running costs, which we would obviously like to avoid if possible.

Luckily, Django’s built-in “sites” framework enables us to run two or more websites within a single Django installation. Consider the following project layout:

.
└── petfood
    ├── petfood
    │   ├── settings.py
    ├── catfood
    │   ├── urls.py
    │   ├── views.py
    │   ├── …
    ├── dogfood
    │   ├── urls.py
    │   ├── views.py
    │   ├── …
    ├── manage.py
    └── …

We have three directories: petfood, the project package holding the global settings.py file every Django project needs to have; and catfood as well as dogfood, the two apps representing the actual sites to be served.

In addition to the app directories, we need to create two instances of Django’s Site model. The Site model is a simple Django model meant to logically represent a website by its domain & user-defined name. In our case, the following two Site objects are needed:

from django.contrib.sites.models import Site

Site.objects.create(name='catfood', domain='catfood247.com')
Site.objects.create(name='dogfood', domain='dogfood247.com')

In other words, each site is represented by a Django app directory (including its own URLconf) and a corresponding Site object in Django’s database.

The only thing left to do is create a mechanism for determining the right URLconf on a per-request basis. Django’s documentation gives us a valuable hint at how to achieve what we want:

When a user requests a page from your Django-powered site, this is the algorithm the system follows to determine which Python code to execute:

1. Django determines the root URLconf module to use. Ordinarily, this is the value of the ROOT_URLCONF setting, but if the incoming HttpRequest object has a urlconf attribute (set by middleware), its value will be used in place of the ROOT_URLCONF setting.
2. (…)

(Source: “How Django processes a request”, Django documentation)

In other words, Django makes it possible to set a URLconf for each separate request, thereby allowing us to differentiate between two or more Sites and their respective URLconfs. Let’s go ahead and define a new middleware that sets the request.urlconf attribute based on the requested site’s name (e.g. catfood):

class SetURLConfMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response
 
    def __call__(self, request):
        request.urlconf = f"{request.site.name}.urls"
        response = self.get_response(request)
        return response

Don’t forget to add this middleware to the MIDDLEWARE list in your settings.py in order to activate it. Also, make sure to insert it after Django’s CurrentSiteMiddleware so that it has access to the request.site attribute.
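Assuming the middleware lives in a petfood/middleware.py module and django.contrib.sites is in INSTALLED_APPS, the relevant part of settings.py might look like this:

MIDDLEWARE = [
    # …
    'django.contrib.sites.middleware.CurrentSiteMiddleware',
    'petfood.middleware.SetURLConfMiddleware',
]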

And that’s all there is to it! From now on, whenever Django receives a request, it first determines the site the request should be forwarded to based on the request’s domain. This simple method makes it possible to support an arbitrary number of sites within a single Django setup.

Note for Heroku users: Don’t forget to register each of your domains with your Heroku app!

Spring-cleaning your (Arch) Linux system

Disclaimer: Some operations mentioned in this post are potentially destructive and irreversible. Be sure to back up all your important data before proceeding.

Note: This post is written from the point of view of an Arch Linux user. Most steps presented below should nevertheless translate well to other distributions.

Through the usual course of their operation, operating systems (even Arch Linux) tend to slowly accumulate obsolete data. In most cases, this is not a problem. However, if you are like me, it gives you a nice and warm feeling to have a clean system. Apart from that, keeping your file system clean will also help you save some disk space and reduce the duration of system upgrades. More importantly, it will soon make you an expert on your operating system.

pacreport is a utility that lists (possibly) obsolete packages and files on your system. You can get it by installing the pacutils package. The following magic command will run pacreport, reformat its output and pipe the result into several files for easier post-editing:

sudo pacreport --unowned-files | head -n -2 | awk '$1 ~ /^[A-Z]/ {print $0} $1 !~ /^[A-Z]/ {print $1}' | csplit -szf pacreport - /:$/ {*}

The command should leave you with five files named pacreport0*. These files will help us in removing the following categories of obsolete data:

  1. Obsolete packages
  2. Obsolete system-level files
  3. Obsolete user files

Uninstall obsolete packages

Packages become obsolete for at least two reasons. For one, you might simply not need them anymore (unneeded packages). And secondly, they might have been installed as dependencies for other packages that are long gone (orphaned packages).

The pacreport command has generated three package lists for us: pacreport01, pacreport02 and pacreport03. Each of these lists potentially contains unneeded or orphaned packages. Now it’s your turn to go through these lists and leave only those packages you would like to remove. If you are unsure about some package, use the pacman -Qi some-package command to get more information. In case you would like to keep a package listed in pacreport02, remove it from the file and mark it as explicitly installed:

sudo pacman -D --asexplicit some-package

Once you are done, remove the files’ header lines and run the following command:

sudo pacman -Rscn $(cat pacreport01 pacreport02 pacreport03)

After double-checking the output, confirm the removal operation to finally remove the listed packages. In my case, more than 400 packages were removed in this way.

Remove obsolete system-level files

Many administrative processes store data all across the filesystem. For example, pacman stores downloaded packages in /var/cache/pacman/pkg/ but does not remove them automatically. In case of problems after an upgrade, this practice allows downgrading a package without the need to re-download an older version. On the other hand, this directory can grow very large in size if not pruned periodically.

The paccache script that comes with the pacman-contrib package deletes all cached package versions except for the three most recent:

sudo paccache -r

To additionally remove all cached versions of packages that are no longer installed, execute the pacman -Sc command. Taken together, the two commands leave you with only the three most recent cached versions of each installed package and no cached versions of uninstalled packages.

But it is not only pacman spreading files across the filesystem. Any process with sufficient permissions can create files wherever it likes. As these files are not part of the original distribution package, they are not automatically removed when uninstalling a package.

Luckily, the above pacreport command has also generated a list of unowned files. Open pacreport00 and go through the list, deleting the paths of all files you would like to keep; whatever remains listed will be removed. /etc/pacreport.conf allows you to track unowned-but-needed files and their associations (run man pacreport for an example). Then, when using pacreport --unowned-files, the files referenced in /etc/pacreport.conf will be omitted. Finally, remove the files left in pacreport00 with the below command:

sudo rm -r $(cat pacreport00)

Remove obsolete user files

Removing personal files in /home is often the most labour-intensive step as it can’t be automated easily. This is because any process a user executes can create any file within that user’s home directory. Fortunately, the process of manually removing obsolete files from your /home directory isn’t as tedious as it might sound at first.

My method of choice is performing a manual depth-first traversal of my /home directory tree, evaluating the files & directories I encounter and, if appropriate, removing them. Pay special attention to the following directories:

  • ~/.config/ – default directory for application configuration files
  • ~/.cache/ – default user cache directory
  • ~/.local/share/ – default directory for application-specific data files

Validating constraints across multiple form fields in Django

After all fields of a Django form have been validated in isolation, the form’s clean() method is called to conclude the validation process. This method is meant to house validation logic that is not associated with one field in particular.

For example, let’s suppose we have an application where our users can order gourmet-level cat & dog food. For some awkward legal reason, though, the amount of cat & dog food items taken together cannot exceed 50 items per order.

Clearly, this requirement cannot be expressed in relation to only one field. Rather, two values have to be taken into account together during form validation. The below code sample illustrates a solution to our example use case:

from django import forms
 
 
class PetFoodForm(forms.Form):
    cat_cans = forms.IntegerField(initial=0, min_value=0)
    dog_cans = forms.IntegerField(initial=0, min_value=0)
 
    def clean(self):
        cleaned_data = super().clean()
        cat_cans = cleaned_data.get("cat_cans")
        dog_cans = cleaned_data.get("dog_cans")
 
        # .get() returns None for fields that failed their own validation,
        # so check explicitly for None (a value of 0 is perfectly valid).
        if cat_cans is not None and dog_cans is not None and cat_cans + dog_cans > 50:
            raise forms.ValidationError("The number of selected items exceeds 50.")
 
 
form1 = PetFoodForm({
    'dog_cans': '15',
    'cat_cans': '15'
})
assert form1.is_valid()
 
form2 = PetFoodForm({
    'dog_cans': '30',
    'cat_cans': '30'
})
assert not form2.is_valid()
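As a side note, if you would rather attach the error message to a specific field instead of the form as a whole, you can use the form’s add_error() method within clean(); a minimal variation of the above:

    def clean(self):
        cleaned_data = super().clean()
        cat_cans = cleaned_data.get("cat_cans")
        dog_cans = cleaned_data.get("dog_cans")

        if cat_cans is not None and dog_cans is not None and cat_cans + dog_cans > 50:
            # Associates the error with the dog_cans field instead of the whole form.
            self.add_error("dog_cans", "The number of selected items exceeds 50.")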

Where to define and instantiate associated models in Django

For the sake of example, let’s consider two model classes and how they are interrelated: a class User and a class UserProfile with a one-to-one association between them. That is, each instance of class User is associated with one and only one instance of UserProfile and vice versa. This type of association is frequently encountered when modelling all sorts of real-world domains.

In Django, this kind of relationship between two entities is expressed as a OneToOneField defined within some Model class and pointing to another (or the same, in the case of a reflexive relationship). This raises the question of where to put the OneToOneField: Should it be an attribute of User or UserProfile?

You might be doubting the relevance of this question: thanks to the related_name mechanism (aka reverse references), we can later on traverse the association in either direction. This is certainly true, but there are other considerations regarding the association’s semantics that should be taken into account.

Existential dependence

One of these considerations is whether one object’s existence depends on the existence of the other. In the context of our example, does a UserProfile depend on the existence of a User? In most cases, that would arguably hold true. The opposite is less clear-cut: depending on the application, you could argue that a User can exist without having a UserProfile.

In this case, it would make sense to define the reference as an attribute of the UserProfile class. This way, you can express that whenever we delete a User instance, the associated UserProfile will be deleted as well. This would result in the following two Django model definitions:

class User(models.Model):
    pass


class UserProfile(models.Model):
    user = models.OneToOneField('User', on_delete=models.CASCADE)

If we defined the association within the User model class, the result would be semantically different:

class User(models.Model):
    profile = models.OneToOneField('UserProfile', on_delete=models.CASCADE)


class UserProfile(models.Model):
    pass

In the latter code example, the existence of a User instance depends on the existence of its associated UserProfile. Per se, neither piece of code is any more “wrong” than the other. To know right from wrong, we would simply need to know more about the modelled domain.

Order of instantiation

Another factor to consider when deciding on how to associate models is the order of instantiation. In neither of the above model definitions is it possible to create both a User and a UserProfile instance at the same time. It is therefore necessary to decide which object to instantiate first.

As an example, let’s consider two possible scenarios of a user registration system. In the first scenario, a user completes the registration process and can later on fill out an optional UserProfile. In the second system, potential users are first asked to complete a profile as part of an application process. Only after they have been approved will an actual User instance be created.

Where to instantiate the associated objects

Once one has decided where to put the associating attribute, it’s time to think about where to actually create the model instances. In the spirit of our example, we would like to create one and only one UserProfile whenever a new User has successfully registered. At first glance, multiple places look like promising candidates for this functionality.

The __init__(…) magic method

Arguably the most obvious candidate is Python’s __init__(…) constructor method. After all, we would like to create a UserProfile whenever a new User is added or, in other words, initialized. However, this logic disregards the distinction between what happens in the database and what happens on the Python level.

__init__(…) is a Python construct and will be executed whenever a model instance is created in Python. This happens when we create a new User object for the first time, but it also happens whenever we retrieve an already existing instance from the database. In other words, if we were to instantiate a user’s UserProfile within the User model’s __init__(…) method, before long we would end up creating more than one profile instance per user!

The __new__(…) magic method

Another of Python’s magic methods, __new__(…), poses the same problem as __init__(…), as does Django’s save(…) method. The latter is called not only when creating a User object, but also when updating it. This behavior is not what we are looking for.

Signals to the rescue

Luckily, there is another mechanism that we can leverage to achieve what we want. Django features a variety of built-in signals which are a way for a piece of code to get notified when actions occur elsewhere in the framework. Beyond the built-in signals, a developer can easily define custom ones for application-specific events.

post_save is one of Django’s built-in signals. As its name strongly suggests, it is fired whenever an object has been saved to the application database. Though it’s not exactly what we need, we can still go ahead and base a new custom signal on the existing one:

# In myapp/signals.py

from django.db.models.signals import post_save
from django.dispatch import receiver, Signal


post_create = Signal()


@receiver(post_save)
def send_post_create(sender, instance, created, **kwargs):
    # Only re-dispatch for newly created objects, not for updates.
    if created:
        post_create.send(sender=sender, instance=instance)

In Django, all signals are instances of the django.dispatch.Signal class, so all it takes to define a new signal is to name and instantiate a Signal object. To actually fire the signal, we hook into post_save. Upon receiving this built-in signal and checking the created flag, we simply send a post_create signal, forwarding the sender and instance. From now on, we can react to the instantiation of a new object simply by defining another @receiver function with post_create as its signal.
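Applied to our running example, creating the associated UserProfile now comes down to one more receiver function; a minimal sketch assuming the models and signal live in a myapp package:

# In myapp/models.py (or wherever your receiver functions live)

from django.dispatch import receiver

from myapp.signals import post_create


@receiver(post_create, sender=User)
def create_user_profile(sender, instance, **kwargs):
    # Exactly one profile per newly created user.
    UserProfile.objects.create(user=instance)

Just make sure the modules defining your receivers are actually imported when Django starts (e.g. in your app’s AppConfig.ready() method), otherwise the receivers will never be registered.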

Streamlining radio buttons and checkboxes with CSS & Font Awesome

The appearance of radio buttons and checkboxes differs greatly between browsers. Luckily, there are a number of ways to streamline the look of these input elements across browsers. One of them involves a combination of HTML, pure CSS and Font Awesome icons.

HTML structure & icons

The required HTML structure consists of a <label> element wrapping a hidden <input type="checkbox"> tag and two icons as well as an optional label text. Each of the icons represents one of the checkbox’s two possible states.

<label class="checkbox">
    <input type="checkbox" name="salami">
    <i class="far fa-lg fa-square"></i>
    <i class="far fa-lg fa-check-square"></i>
    Add salami by activating this checkbox
</label>

Wondering why you would use the <label> tag as a wrapper and whether that really is legit HTML5? First of all, it is definitely legit, and secondly, this technique spares you having to provide a for="…" attribute on the <label>.

CSS rules

The CSS code hides the actual checkbox input and uses the :checked pseudo-class selector to toggle between the two icons:

label.checkbox input {
    display: none;
}
label.checkbox input:checked ~ .fa-square {
    display: none;
}
label.checkbox input:not(:checked) ~ .fa-check-square {
    display: none;
}

That’s all there is to it! Your checkboxes will now look the same in every browser. As a bonus, you can also colorize them to make them fit the rest of your design.

Creating a persistent Arch Linux installation on a USB stick

I’ve been using Arch Linux for the better part of a decade now. As a result, I am so used to it that I’ll choose it for nearly any task at hand. Although Arch might not be a traditional distribution for persistent live systems, there’s really no reason not to use it for this purpose.

What follows is a list of steps to install and set up a minimal Arch Linux live USB system. In the spirit of KISS, we will go with a single-partition layout:

  1. Create a single Linux type partition with fdisk or the tool of your choice on your USB device (e.g. /dev/sdc)
  2. Create an ext4 file system on the created partition: # mkfs.ext4 /dev/sdc1
  3. Mount the resulting file system: # mount /dev/sdc1 /mnt/usbarch
  4. Use pacstrap from the arch-install-scripts package to install the base package group: # pacstrap /mnt/usbarch base
  5. Auto-generate an fstab file: # genfstab -U /mnt/usbarch >> /mnt/usbarch/etc/fstab
  6. Take a look at the generated /etc/fstab file and adapt if necessary
  7. Change root into the new system: # arch-chroot /mnt/usbarch
  8. Configure the time zone: # ln -sf /usr/share/zoneinfo/Region/City /etc/localtime and # hwclock --systohc
  9. Uncomment en_US.UTF-8 UTF-8 and other required locales in /etc/locale.gen, and generate them with: # locale-gen
  10. Set the LANG variable in /etc/locale.conf, for example: LANG=en_US.UTF-8
  11. Set a default keymap in /etc/vconsole.conf, for instance: KEYMAP=de-latin1
  12. Define a hostname in /etc/hostname, for example: usbarch
  13. Set a super-secure root password: # passwd
  14. Install GRUB on your USB device: # pacman -Sy grub && grub-install --target=i386-pc /dev/sdc
  15. Finally, use the grub-mkconfig tool to auto-generate a grub.cfg file: # grub-mkconfig -o /boot/grub/grub.cfg

The system should now be bootable and can be further adapted to your liking.

Writing an ISO disk image directly from the Internet to a device

Disk images tend to be large, yet available disk space remains a scarce resource even in times of multi-terabyte devices. For this reason, it can still be handy to retrieve a disk image from the Internet and write it to a device without having to temporarily store it on your disk. The below command will retrieve some.iso with wget and pipe the downloaded data to dd’s stdin. The venerable dd command will then write everything to the /dev/sdX device:

wget -q -O - http://example.com/some.iso | sudo dd of=/dev/sdX bs=4M

As always, be careful to supply the right output device file. As we all (should) know, any mistake in using dd can result in erased devices and sleepless nights.