Fast Walsh–Hadamard Transform

卷积

运算 \(\odot\),序列 \(a, b, c\),方阵 \(P, Q, R\) 满足 \(R\) 可逆,且 \[ \begin{cases} c_n &= \sum_{p \odot q = n}a_pb_q\\ \sum_j R_{i, j}c_j &= \sum_pP_{i,p}a_p\sum_q Q_{i, q}b_q \end{cases} \] 所以 \[ \begin{aligned} \sum_{j}R_{i, j}\sum_{p, q}[p\odot q = j]a_pb_q &= \sum_{p, q}P_{i, p}Q_{i, q}a_pb_q\\ \sum_{p, q}R_{i, p \odot q}a_pb_q &= \sum_{p, q}P_{i, p}Q_{i, q}a_pb_q\\ R_{i, p\odot q} &= P_{i, p}Q_{i, q} \end{aligned} \]

对序列 \(a\) 施加线性变换 \(P\),序列 \(b\) 施加线性变换 \(Q\),点乘之后再施加逆变换 \(R^{-1}\),就可以实现 \(\odot\) 卷积。

对于满足交换律的运算 \(\odot\)\(P = Q\)

Kronecker product

\(\mathbf {A}\)\(m \times n\) 矩阵,\(\mathbf {B}\)\(p \times q\),定义 \(\mathbf {A}\)\(\mathbf {B}\)Kronecker product\[ {\displaystyle \mathbf {A} \otimes \mathbf {B} ={\begin{bmatrix}a_{11}\mathbf {B} &\cdots &a_{1n}\mathbf {B} \\\vdots &\ddots &\vdots \\a_{m1}\mathbf {B} &\cdots &a_{mn}\mathbf {B} \end{bmatrix}}} \] 更明确地 \[ {\mathbf{A}\otimes\mathbf{B}} = \begin{bmatrix} a_{11} b_{11} & a_{11} b_{12} & \cdots & a_{11} b_{1q} & \cdots & \cdots & a_{1n} b_{11} & a_{1n} b_{12} & \cdots & a_{1n} b_{1q} \\ a_{11} b_{21} & a_{11} b_{22} & \cdots & a_{11} b_{2q} & \cdots & \cdots & a_{1n} b_{21} & a_{1n} b_{22} & \cdots & a_{1n} b_{2q} \\ \vdots & \vdots & \ddots & \vdots & & & \vdots & \vdots & \ddots & \vdots \\ a_{11} b_{p1} & a_{11} b_{p2} & \cdots & a_{11} b_{pq} & \cdots & \cdots & a_{1n} b_{p1} & a_{1n} b_{p2} & \cdots & a_{1n} b_{pq} \\ \vdots & \vdots & & \vdots & \ddots & & \vdots & \vdots & & \vdots \\ \vdots & \vdots & & \vdots & & \ddots & \vdots & \vdots & & \vdots \\ a_{m1} b_{11} & a_{m1} b_{12} & \cdots & a_{m1} b_{1q} & \cdots & \cdots & a_{mn} b_{11} & a_{mn} b_{12} & \cdots & a_{mn} b_{1q} \\ a_{m1} b_{21} & a_{m1} b_{22} & \cdots & a_{m1} b_{2q} & \cdots & \cdots & a_{mn} b_{21} & a_{mn} b_{22} & \cdots & a_{mn} b_{2q} \\ \vdots & \vdots & \ddots & \vdots & & & \vdots & \vdots & \ddots & \vdots \\ a_{m1} b_{p1} & a_{m1} b_{p2} & \cdots & a_{m1} b_{pq} & \cdots & \cdots & a_{mn} b_{p1} & a_{mn} b_{p2} & \cdots & a_{mn} b_{pq} \end{bmatrix} \]

性质

  1. \((\mathbf{A} \otimes \mathbf{B})(\mathbf{C} \otimes \mathbf{D}) = (\mathbf{AC}) \otimes (\mathbf{BD})\)

  2. \((\mathbf{A} \otimes \mathbf{B})^{-1} = \mathbf{A}^{-1} \otimes \mathbf{B}^{-1}\)

位运算卷积

位运算的每一位都是独立的,考虑按位构造变换矩阵。

假设存在 2 阶方阵 \(P_2, Q_2, R_2\),其中 \(R_2\) 可逆,满足 \(R_{i, p\odot q} = P_{i, p}Q_{i, q}\)

\(2^k\) 阶方阵 \[ \begin{aligned} P_{2^k} &= P_2 \otimes P_2 \otimes \dots \otimes P_2 = P_2^{\otimes k}\\ Q_{2^k} &= Q_2 \otimes Q_2 \otimes \dots \otimes Q_2 = Q_2^{\otimes k}\\ R_{2^k} &= R_2 \otimes R_2 \otimes \dots \otimes R_2 = R_2^{\otimes k} \end{aligned} \]

\(i, j\)​ 的二进制表示为 \(i_{k-1}\dots i_1i_0\)\(j_{k-1}\dots j_1j_0\)

上面式子的含义为 \[ P_{i, j} = \prod_{l=0}^{k-1}P_{i_l, j_l}\quad Q_{i, j} = \prod_{l=0}^{k-1}Q_{i_l, j_l}\quad R_{i, j} = \prod_{l=0}^{k-1}R_{i_l, j_l} \] 容易验证此时 \(P_{2^k}, Q_{2^k}, R_{2^k}\) 仍然满足\(R_{2^k}\) 可逆且 \(R_{i, p\odot q} = P_{i, p}Q_{i, q}\)

快速计算 \(H_{2^k}\mathbf \alpha\)

根据分块矩阵相乘 \[ H_{2n}\mathbf{\alpha}_{2n} = (H_{2}\otimes H_n)\mathbf{\alpha}_{2n} = \begin{bmatrix} H_{0, 0}H_{n} & H_{0, 1}H_{n}\\ H_{1, 0}H_{n} & H_{1, 1}H_{n} \end{bmatrix} \begin{bmatrix} \mathbf{\alpha}_n\\ \mathbf{\alpha}'_{n} \end{bmatrix} = \begin{bmatrix} H_{0, 0}H_{n}\mathbf{\alpha}_n + H_{0, 1}H_{n}\mathbf{\alpha}'_n\\ H_{1, 0}H_{n}\mathbf{\alpha}_n + H_{1, 1}H_{n}\mathbf{\alpha}'_n \end{bmatrix} \]

分治计算 \(H_n\mathbf{\alpha}_n\)\(H_n\mathbf{\alpha}'_n\) ,复杂度 \(O(k2^k)\)

Or

容易构造出 \(P = Q = R = \begin{bmatrix}1&1\\1&0\end{bmatrix}\)​ 或 \(\begin{bmatrix}1&0\\1&1\end{bmatrix}\)​。

其中 \(\begin{bmatrix}1&0\\1&1\end{bmatrix}\)​​​ 有特殊含义 \(R_{i, j} = [i \operatorname{and} j = j]\)​​​​,使用该矩阵,\(R^{-1} = \begin{bmatrix}1 & 0\\-1 & 1\end{bmatrix}\)。​

And

容易构造出 \(P = Q = R = \begin{bmatrix}0&1\\1&1\end{bmatrix}\)​​ 或 \(\begin{bmatrix}1&1\\0&1\end{bmatrix}\)​​。

其中 \(\begin{bmatrix}1&1\\0&1\end{bmatrix}\)​ 有特殊含义 \(R_{i, j} = [i \operatorname{and} j = i]\)​,使用该矩阵,\(R^{-1} = \begin{bmatrix}1 & -1\\0 & 1\end{bmatrix}\)​。

Xor

容易构造出 \(P = Q = R = \begin{bmatrix}1&-1\\1&1\end{bmatrix}\)​​ 或 \(\begin{bmatrix}1&1\\1&-1\end{bmatrix}\)​​。

其中 \(\begin{bmatrix}1&1\\1&-1\end{bmatrix}\) 有特殊含义 \(R_{i, j} = (-1)^{\operatorname{popcnt}(i \operatorname{and} j)}\),使用该矩阵,\(R^{-1} = \begin{bmatrix}\frac12 & \frac12\\\frac12 & -\frac12\end{bmatrix}\)​。

三个变换矩阵相同,因此记为 \(H\)

注意到对于异或卷积 \(H^{-1} = \frac12 H\),那么 \(H_{2^k}^{-1} = \dfrac{1}{2^k}H_{2^k}\),所以逆变换相当于正变换后每一项除以 \(n\),与离散傅里叶变换类似。

子集卷积

\[ c_n = \sum\limits_{p, q}[p \operatorname{or} q = n][p \operatorname{and} q = 0] a_pb_q \]

将限制条件转化为 \([p \operatorname{or} q = n][\operatorname{popcnt}(p) + \operatorname{popcnt}(q) = \operatorname{popcnt}(n)]\)​​​。

利用占位多项式 \[ \begin{aligned} A_n(x) &= a_nx^{\operatorname{popcnt}(n)}\\ B_n(x) &= b_nx^{\operatorname{popcnt}(n)}\\ C_n(x) &= \sum\limits_{p \operatorname{or} q = n}A_p(x)B_q(x)\\ c_n &= [x^{\operatorname{popcnt}(n)}]C_n(x) \end{aligned} \] 也可以交换两个维度,定义两个向量的或卷积为 \[ (\mathbf A \operatorname {or} \mathbf B)_n = \sum\limits_{p \operatorname{or} q = n} a_pb_q \]\[ \begin{aligned} \mathbf{A}_n &= ([\operatorname{popcnt}(0) = n]a_0, [\operatorname{popcnt}(1) = n]a_1, \dots)\\ \mathbf{B}_n &= ([\operatorname{popcnt}(0) = n]b_0, [\operatorname{popcnt}(1) = n]b_1, \dots)\\ \mathbf{C}_n &= \mathbf A_n \operatorname {or} \mathbf B_n \end{aligned} \]

广义位运算卷积

将位运算从二进制推广到 \(m\) 进制。

类似二进制的情形,假设存在 m 阶方阵 \(P_m, Q_m, R_m\),其中 \(R_m\) 可逆,满足 \(R_{i, p\odot q} = P_{i, p}Q_{i, q}\)

\(m^k\) 阶方阵, \[ \begin{aligned} P_{m^k} &= P_m \otimes P_m \otimes \dots \otimes P_m = P_m^{\otimes k}\\ Q_{m^k} &= Q_m \otimes Q_m \otimes \dots \otimes Q_m = Q_m^{\otimes k}\\ R_{m^k} &= R_m \otimes R_m \otimes \dots \otimes R_m = R_m^{\otimes k} \end{aligned} \]

\(i, j\)​ 的 \(m\)​ 进制表示为 \(i_{k-1}\dots i_1i_0\)\(j_{k-1}\dots j_1j_0\)

上面式子的含义为 \[ P_{i, j} = \prod_{l=0}^{k-1}P_{i_l, j_l}\quad Q_{i, j} = \prod_{l=0}^{k-1}Q_{i_l, j_l}\quad R_{i, j} = \prod_{l=0}^{k-1}R_{i_l, j_l} \]

Or

\(i = i_{k-1}\dots i_1i_0, j = j_{k-1}\dots j_1j_0\)​​ 其中 \(i_l, j_l \in [0, m)\)​​,定义 \(m\)​​ 进制下 \(\mathrm{or}\)​​​​ 运算为 \[ (i \operatorname{or} j)_l = \max(i_l, j_l) \]

一种构造是形如 \(\begin{bmatrix}1&0&0&0\\1&1&0&0\\1&1&1&0\\1&1&1&1\end{bmatrix}\)​ 的下三角矩阵,含义是 \(H_{i, j} = [i \geq j]\)​,\(H^{-1} = \begin{bmatrix}1&0&0&0\\-1&1&0&0\\0&-1&1&0\\0&0&-1&1\end{bmatrix}\)​。

And

\(i = i_{k-1}\dots i_1i_0, j = j_{k-1}\dots j_1j_0\)​​​ 其中 \(i_l, j_l \in [0, m)\)​​​,定义 \(m\)​​​ 进制下 \(\mathrm{and}\)​​​​​ 运算为 \[ (i \operatorname{and} j)_l = \min(i_l, j_l) \]

一种构造是形如 \(\begin{bmatrix}1&1&1&1\\0&1&1&1\\0&0&1&1\\0&0&0&1\end{bmatrix}\)​​ 的上三角矩阵,含义是 \(H_{i, j} = [i \leq j]\)​​,\(H^{-1}=\begin{bmatrix}1&-1&0&0\\0&1&-1&0\\0&0&1&-1\\0&0&0&1\end{bmatrix}\)​​。

Xor

\(i = i_{k-1}\dots i_1i_0, j = j_{k-1}\dots j_1j_0\)​​ 其中 \(i_l, j_l \in [0, m)\)​​,定义 \(m\)​​ 进制下 \(\mathrm{xor}\)​​​​ 运算为 \[ (i \operatorname{xor} j)_l = (i_l + j_l) \bmod m \] 发现这个就是循环卷积,于是可以把傅里叶变换的范德蒙矩阵拿过来。 \[ H= \begin{bmatrix}1& 1 & 1& \cdots & 1\\ 1& \omega_m^1& \omega_m^2& \cdots & \omega_m^{m - 1}\\ 1& \omega_m^2 & \omega_m^4& \cdots & \omega_m^{2(m - 1)}\\ \vdots & \vdots& \vdots& \ddots& \vdots\\ 1& \omega_m^{m - 1}&\omega_m^{2(m - 1)} & \cdots & \omega_m^{(m - 1)(m - 1)} \end{bmatrix} \] 它的逆为: \[ H^{-1}=\dfrac{1}{m}\begin{bmatrix}1& 1 & 1& \cdots & 1\\ 1& \omega_m^{-1}& \omega_m^{-2}& \cdots & \omega_m^{-(m - 1)}\\ 1& \omega_m^{-2} & \omega_m^{-4}& \cdots & \omega_m^{-2(m - 1)}\\ \vdots& \vdots& \vdots&\ddots& \vdots\\ 1& \omega_m^{-(m - 1)}& \omega_m^{-2(m - 1)} & \cdots & \omega_m^{-(m - 1)(m - 1)}\end{bmatrix} \]

对于 \(\omega_m\) 不存在的情形,需要扩域:

分圆多项式

\[ \Phi_n(x) = \prod\limits_{i = 0}^{n - 1}(x - \omega_n^i)^{[i \perp n]} \] 易证 \[ \prod\limits_{d \mid n}\Phi_d(x) = x^n - 1 \] 莫比乌斯反演可得 \[ \Phi_n(x) = \prod\limits_{d \mid n}(x^d - 1)^{\mu(n / d)} \]

定理: 1. 分圆多项式在 \(Q\) 上不可约。 2. 在模 \(\Phi_n(x)\) 意义下,\(x\) 的阶恰好为 \(k\)

所以可以用 \(x\) 代替 \(\omega_m\),运算在模 \(\Phi_m(x)\) 意义下进行。

如果 \(\Phi_m(x)\) 比较简单,可以直接模 \(\Phi_m(x)\),否则,由于 \(\Phi_m(x) \mid (x^m - 1)\) 所以可以先在模 \(x^m - 1\) 意义下运算,即做长度为 \(m\) 的循环卷积,最后再模 \(\Phi_m(x)\)

模板题:CodeForces 1103E

// Author:  HolyK
// Created: Sun Aug  8 21:08:34 2021
#include <bits/stdc++.h>
template <class T, class U>
inline bool smin(T &x, const U &y) {
  return y < x ? x = y, 1 : 0;
}
template <class T, class U>
inline bool smax(T &x, const U &y) {
  return x < y ? x = y, 1 : 0;
}

using LL = long long;
using PII = std::pair<int, int>;

using u64 = uint64_t;
using Poly = std::array<u64, 4>;
Poly operator*(Poly a, Poly b) {
  static u64 c[7];
  memset(c, 0, sizeof c);
  for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 4; j++) {
      c[i + j] += a[i] * b[j];
    }
  }
  return {c[0] - c[4] - c[5], c[1] + c[4] - c[6], c[2] - c[4], c[3] + c[4]};
}
Poly &operator+=(Poly &a, Poly b) {
  for (int i = 0; i < 4; i++) {
    a[i] += b[i];
  }
  return a;
}
Poly &operator-=(Poly &a, Poly b) {
  for (int i = 0; i < 4; i++) {
    a[i] -= b[i];
  }
  return a;
}
Poly operator+(Poly a, Poly b) {
  return a += b;
}
Poly operator-(Poly a, Poly b) {
  return a -= b;
}
constexpr Poly w[] = {
  Poly({1ULL, 0ULL, 0ULL, 0ULL}),
  Poly({0ULL, 1ULL, 0ULL, 0ULL}),
  Poly({0ULL, 0ULL, 1ULL, 0ULL}),
  Poly({0ULL, 0ULL, 0ULL, 1ULL}),
  Poly({~0ULL, 1ULL, ~0ULL, 1ULL}),
  Poly({~0ULL, 0ULL, 0ULL, 0ULL}),
  Poly({0ULL, ~0ULL, 0ULL, 0ULL}),
  Poly({0ULL, 0ULL, ~0ULL, 0ULL}),
  Poly({0ULL, 0ULL, 0ULL, ~0ULL}),
  Poly({1ULL, ~0ULL, 1ULL, ~0ULL})
};
constexpr int N(100000), M(10);
void fwt(Poly *a, int f) {
  for (int i = 1; i < N; i *= M) {
    for (int j = 0; j < N; j += i * M) {
      for (int k = 0; k < i; k++) {
        static Poly t[M];
        for (int p = 0; p < M; p++) {
          t[p].fill(0);
          for (int q = 0; q < M; q++) {
            t[p] += a[i * q + j + k] * w[(M + p * f) * q % M];
          }
        }
        for (int p = 0; p < M; p++) {
          a[i * p + j + k] = t[p];
        }
      }
    }
  }
}
Poly fpow(Poly x, int k) {
  Poly r;
  r.fill(0);
  r[0] = 1;
  for (; k; k >>= 1, x = x * x) {
    if (k & 1) r = r * x;
  }
  return r;
}
Poly a[N];
int main() {
  std::ios::sync_with_stdio(false);
  std::cin.tie(nullptr);
  int n;
  std::cin >> n;
  for (int i = 0; i < n; i++) {
    int x;
    std::cin >> x;
    a[x][0]++;
  }
  fwt(a, 1);
  for (int i = 0; i < N; i++) {
    a[i] = fpow(a[i], n);
  }
  fwt(a, -1);
  for (int i = 0; i < n; i++) {
    assert(a[i][1] == 0 && a[i][2] == 0 && a[i][3] == 0);
    // std::cout << a[i][0] << "\n";
    std::cout << (a[i][0] >> 5) * 94170628496287261ULL % (1ULL << 58) << "\n";
  }
  return 0;
}

最后修改于 2021-08-23