Let $d$ be the dimension of the input data and $\mathcal{H} = \{f ~|~ f(\boldsymbol{x}) = \text{sign}(\boldsymbol{w}^\top \boldsymbol{x} + b), b \in \mathbb{R} \}$ be the set of all linear classifiers.

$d_{VC}(\mathcal{H}) \geq d + 1$.

Proof We need to prove that $\mathcal{H}$ can shatter at least $d+1$ points. That is, it suffices to show that there exists a set of $d+1$ points such that $\mathcal{H}$ can produce any pre-specified $\{-1, +1\}$ assignment $\boldsymbol{y} = (y_1, \ldots, y_{d+1})$. Let \begin{align} \boldsymbol{X} &= \left[ \begin{array}{c} \boldsymbol{x}_1^\top \\ \vdots \\ \boldsymbol{x}_{d+1}^\top \end{array} \right] \\ &= \left[ \begin{array}{ccccc} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{array} \right] \in \mathbb{R}^{d+1 \times d+1} \end{align}

The first coordinate of each $\boldsymbol{x}_i$ is 1 to allow the bias term $b$ to be produced. Note that $\boldsymbol{X}$ is invertible. With this definition, for any $\boldsymbol{y}$ we can always have $\text{sign}(\boldsymbol{X} \boldsymbol{w}) = \boldsymbol{y}$ by choosing $\boldsymbol{w} = \boldsymbol{X}^{-1} \boldsymbol{y}$ since $\boldsymbol{X}\boldsymbol{w} = \boldsymbol{y}$ implies $\text{sign}(\boldsymbol{X} \boldsymbol{w}) = \boldsymbol{y}$.

$d_{VC}(\mathcal{H}) \leq d + 1$.

Proof We need to show that no $d+2$ points can be shattered by $\mathcal{H}$. That is to show that there exists a certain assignment $\boldsymbol{y} \in \mathbb{R}^{d+2}$ not achievable by $\mathcal{H}$. We will construct one such assignment.

Assume we have $d+2$ points $\boldsymbol{x}_1, \ldots, \boldsymbol{x}_{d+2}$ in $(d+1)$-dimensional space. Then, the set is linearly dependent. So, there exists $\boldsymbol{x}_j = \sum_{i\neq j} a_i \boldsymbol{x}_i$ such that not all $a_i$ are 0. Set $y_j = -1$ and $y_i = \text{sign}(a_i) = \text{sign}(\boldsymbol{w}^\top \boldsymbol{x}_i)$ . If $a_i = 0$, $y_i$ can be arbitrary.

$\text{sign}(\boldsymbol{w}^\top \boldsymbol{x}_j) = \text{sign}(\sum_{i \neq j} a_i \boldsymbol{w}^\top \boldsymbol{x}_i) > 0 \neq y_j$

Since there exists an assignment of $d+2$ points not achievable by $\mathcal{H}$, we can say that $d_{VC}(\mathcal{H}) \leq d + 1$.

$d_{VC}(\mathcal{H}) = d + 1$.

Proof Combining the two lemmas.

Source:

• Lecture 7 on VC dimension by Professor Yaser Abu-Mostafa here.
• See also the previous post on VC dimension.