Image convolution centers on interpolation.
Sub-pixel rendering relies on the fact that a pixel is composed of three color components, say R, G and B, laid out horizontally. A white pixel of width 2/3 at the border of a white area can be rendered in false color by lighting only R and G. Font rendering in particular is improved by sub-pixel rendering.
The geometry of the color components may vary, for example B G R. Assuming the wrong geometry destroys the intended extra sharpness. Also note that the gain in sharpness is along the x-axis.
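Consider a pixel that is covered by white for only two thirds of its width, which plain anti-aliasing would render as a light gray such as #cccccc. With sub-pixel rendering it could become #ffff00 instead (R and G fully on, B off). A half-covered pixel would be roughly #ff8800 (R fully on, G at about half intensity). These values are approximate; the green component should probably be weighted higher, since green contributes most to perceived brightness.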
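The mapping from coverage to false color can be sketched in a few lines of Python. This is a minimal illustration, not a production renderer: the function name `coverage_to_hex`, the `order` parameter and the assumption that the white area starts at the left edge of the pixel are all mine, and no extra weighting of green is applied.

```python
def coverage_to_hex(coverage, order="RGB"):
    """Map the white coverage of one pixel (0.0..1.0, measured from its
    left edge) to a false-color hex value.

    `order` is the assumed physical left-to-right stripe layout.
    """
    levels = []
    for i in range(3):
        # Fraction of the i-th stripe (spanning i/3 .. (i+1)/3) that is covered.
        part = min(max(coverage * 3 - i, 0.0), 1.0)
        levels.append(round(part * 255))
    # `levels` is in physical order; map it back to the R, G, B channels.
    channel = dict(zip(order, levels))
    return "#{:02x}{:02x}{:02x}".format(channel["R"], channel["G"], channel["B"])


print(coverage_to_hex(2 / 3))         # two thirds covered, R G B layout
print(coverage_to_hex(0.5))           # half covered, R G B layout
print(coverage_to_hex(2 / 3, "BGR"))  # same coverage on a B G R panel
```

On a B G R panel the same coverage lights B and G rather than R and G, which is why assuming the wrong geometry puts the color fringe on the wrong side of the edge.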
Sub-pixel convolution simply operates on these sub-pixels instead of stopping at pixel level. Your data model has three times the resolution along the x coordinate and combines each triple of sub-pixels into one pixel in a special way.
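A rough sketch of that data model, assuming a single grayscale row that has already been sampled at three times the horizontal resolution. The names `convolve_subpixels` and `pack_triples`, the clamped edges and the three-tap box kernel are illustrative choices of mine, not a prescribed filter.

```python
def convolve_subpixels(samples, kernel):
    """1-D filter over sub-pixel samples; edges are clamped.

    The kernel is assumed to be symmetric, so it is not flipped.
    """
    half = len(kernel) // 2
    last = len(samples) - 1
    out = []
    for x in range(len(samples)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = min(max(x + k - half, 0), last)
            acc += weight * samples[idx]
        out.append(acc)
    return out


def pack_triples(samples, order="RGB"):
    """Combine each run of three sub-pixel samples into one (R, G, B) pixel."""
    if len(samples) % 3:
        raise ValueError("sample count must be a multiple of 3")
    pixels = []
    for i in range(0, len(samples), 3):
        channel = dict(zip(order, samples[i:i + 3]))
        pixels.append(tuple(round(channel[c]) for c in "RGB"))
    return pixels


# A white area that starts at the middle stripe of the second pixel.
row = [0, 0, 0, 0, 255, 255, 255, 255, 255]
box = [1 / 3, 1 / 3, 1 / 3]
print(pack_triples(row))
print(pack_triples(convolve_subpixels(row, box)))
```

Filtering across sub-pixels before packing spreads the edge over neighboring stripes, which trades a little sharpness for less visible color fringing.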
Grayscale images probably benefit most from sub-pixel rendering, since they carry no color of their own for the false colors to interfere with.