
[RESEND v6 0/9] Add new formats support to vkms

Message ID
<20220819182411.20246-1-igormtorrente@gmail.com>
Summary
=======
This patch series refactors some vkms components in order to add new
formats to the planes and the writeback connector.

In the blend function, each plane's pixels are now converted to
ARGB16161616 and then blended together.

The CRC is calculated from the ARGB16161616 buffer. If required, this
buffer is then copied/converted to the writeback buffer format.

To handle the pixel conversion, new functions were added to convert from
a specific format to ARGB16161616 and back.

Tests
=====
This patch series was tested using the following igt tests:
-t ".*kms_plane.*"
-t ".*kms_writeback.*"
-t ".*kms_cursor_crc.*"
-t ".*kms_flip.*"

New tests passing
-------------------
- pipe-A-cursor-size-change
- pipe-A-cursor-alpha-transparent

Performance
-----------
With the whole series applied, frame times are close to the current
implementation, though marginally slower on average in the measurement
below.

Results running the IGT[1] test
`igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:

|                  Frametime                   |
|:--------------------------------------------:|
|  Implementation |  Current  |   This commit  |
|:---------------:|:---------:|:--------------:|
| frametime range |  9~22 ms  |     10~22 ms   |
|     Average     |  11.4 ms  |     12.32 ms   |

Memory consumption
==================
It consumes less memory than the current implementation in
the common case (more detail in the commit message).

| Memory consumption (output dimensions) |
|:--------------------------------------:|
|       Current      |     This patch    |
|:------------------:|:-----------------:|
|   Width * Height   |     2 * Width     |
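
To put the table in concrete terms, here is a small arithmetic sketch. The
table counts pixels; the byte sizes used in the usage note (4 bytes per pixel
for a full-frame buffer, 8 bytes per pixel for the per-line buffers) and the
helper names are assumptions made for illustration, not figures from this
series.

```c
#include <assert.h>
#include <stddef.h>

/* Full intermediate frame: width * height pixels. */
static size_t full_frame_bytes(size_t width, size_t height, size_t bpp)
{
	assert(width > 0 && height > 0 && bpp > 0);
	return width * height * bpp;
}

/* Two line buffers: 2 * width pixels. */
static size_t line_buffers_bytes(size_t width, size_t bpp)
{
	assert(width > 0 && bpp > 0);
	return 2 * width * bpp;
}
```

Under those assumptions, a 1920x1080 output needs
`full_frame_bytes(1920, 1080, 4)` = 8294400 bytes for a full frame, against
`line_buffers_bytes(1920, 8)` = 30720 bytes for two lines.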

[1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4

XRGB to ARGB behavior
=====================
During the development, I decided to always fill the alpha channel of
the output pixel whenever the conversion from a format without an alpha
channel to ARGB16161616 is necessary. Therefore, I ignore the value
received from the XRGB and overwrite the value with 0xFFFF.
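
A minimal userspace sketch of this behavior follows. It assumes each
XRGB8888 pixel is read as one 32-bit word with X in the top byte, and that
8-bit channels are widened by multiplying by 257; both the function name and
the scaling choice are assumptions for illustration, not necessarily what the
patches do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Four 16-bit channels, as in the series' internal pixel format. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Convert n XRGB8888 pixels to 16 bits per channel.
 * Assumption: multiplying by 257 maps 0x00 to 0x0000 and 0xFF to 0xFFFF.
 */
static void xrgb8888_to_argb_u16(const uint32_t *src,
				 struct pixel_argb_u16 *dst, size_t n)
{
	assert(src && dst);

	for (size_t i = 0; i < n; i++) {
		uint32_t p = src[i];

		/* The X byte (p >> 24) is ignored; alpha is forced opaque. */
		dst[i].a = 0xFFFF;
		dst[i].r = (uint16_t)(((p >> 16) & 0xFF) * 257);
		dst[i].g = (uint16_t)(((p >> 8) & 0xFF) * 257);
		dst[i].b = (uint16_t)((p & 0xFF) * 257);
	}
}
```

Whatever the X byte holds on input, the output alpha is always 0xFFFF.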

Primary plane and CRTC size
===========================
This patch series reworks the blend function to accept a primary plane
whose size and position differ from the CRTC's.
Because the background now has to be filled, this change costs some
performance.

Alpha channel output for XRGB formats
=====================================
There's still an open question about what value the writeback alpha
channel should have for XRGB formats.
The current igt test implementation expects the channel to be left unchanged.
But it's not entirely clear whether vkms (or any other driver) should
follow this behavior.

Open issue: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/118
---

Igor Torrente (9):
  drm: vkms: Replace hardcoded value of `vkms_composer.map` to
    DRM_FORMAT_MAX_PLANES
  drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
  drm: drm_atomic_helper: Add a new helper to deal with the writeback
    connector validation
  drm: vkms: get the reference to `drm_framebuffer` instead of copying it
  drm: vkms: Add fb information to `vkms_writeback_job`
  drm: vkms: Refactor the plane composer to accept new formats
  drm: vkms: Supports to the case where primary plane doesn't match the
    CRTC
  drm: vkms: Adds XRGB_16161616 and ARGB_16161616 formats
  drm: vkms: Add support to the RGB565 format

 Documentation/gpu/vkms.rst            |   7 +-
 drivers/gpu/drm/drm_atomic_helper.c   |  39 ++++
 drivers/gpu/drm/vkms/Makefile         |   1 +
 drivers/gpu/drm/vkms/vkms_composer.c  | 314 ++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_drv.h       |  39 +++-
 drivers/gpu/drm/vkms/vkms_formats.c   | 302 +++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
 drivers/gpu/drm/vkms/vkms_plane.c     |  50 ++--
 drivers/gpu/drm/vkms/vkms_writeback.c |  39 +++-
 include/drm/drm_atomic_helper.h       |   3 +
 10 files changed, 586 insertions(+), 220 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h

-- 
2.30.2

[RESEND v6 1/9] drm: vkms: Replace hardcoded value of `vkms_composer.map` to DRM_FORMAT_MAX_PLANES

Message ID
<20220819182411.20246-2-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
Patch: +1 -1
The `map` array in `vkms_composer` uses a hardcoded value to define its
size.

If the maximum number of planes ever increases, this hardcoded value
can become a problem.

Replace it with the DRM_FORMAT_MAX_PLANES macro.

Acked-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 91e63b12f60f..36fbab5989d1 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -30,7 +30,7 @@ struct vkms_writeback_job {
 struct vkms_composer {
 	struct drm_framebuffer fb;
 	struct drm_rect src, dst;
-	struct iosys_map map[4];
+	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int offset;
 	unsigned int pitch;
 	unsigned int cpp;
-- 
2.30.2

[RESEND v6 2/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`

Message ID
<20220819182411.20246-3-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
Patch: +66 -65
Rename this struct to something more meaningful, a name that better
represents what the struct is about.

The composer is the code that does the compositing of the planes.
This struct contains information about the frame used in the output
composition. Thus, vkms_frame_info is a better name for it.

V5: Fix a commit message typo (Melissa Wen).

Reviewed-by: Melissa Wen <mwen@igalia.com>
Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
 drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
 3 files changed, 66 insertions(+), 65 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 775b97766e08..0aded4e87e60 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -11,11 +11,11 @@
 #include "vkms_drv.h"

 static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
-				 const struct vkms_composer *composer)
+				 const struct vkms_frame_info *frame_info)
 {
 	u32 pixel;
-	int src_offset = composer->offset + (y * composer->pitch)
-				      + (x * composer->cpp);
+	int src_offset = frame_info->offset + (y * frame_info->pitch)
+					    + (x * frame_info->cpp);

 	pixel = *(u32 *)&buffer[src_offset];

@@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
  * compute_crc - Compute CRC value on output frame
  *
  * @vaddr: address to final framebuffer
- * @composer: framebuffer's metadata
+ * @frame_info: framebuffer's metadata
  *
  * returns CRC value computed using crc32 on the visible portion of
  * the final framebuffer at vaddr_out
  */
 static uint32_t compute_crc(const u8 *vaddr,
-			    const struct vkms_composer *composer)
+			    const struct vkms_frame_info *frame_info)
 {
 	int x, y;
 	u32 crc = 0, pixel = 0;
-	int x_src = composer->src.x1 >> 16;
-	int y_src = composer->src.y1 >> 16;
-	int h_src = drm_rect_height(&composer->src) >> 16;
-	int w_src = drm_rect_width(&composer->src) >> 16;
+	int x_src = frame_info->src.x1 >> 16;
+	int y_src = frame_info->src.y1 >> 16;
+	int h_src = drm_rect_height(&frame_info->src) >> 16;
+	int w_src = drm_rect_width(&frame_info->src) >> 16;

 	for (y = y_src; y < y_src + h_src; ++y) {
 		for (x = x_src; x < x_src + w_src; ++x) {
-			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
+			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
 			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
 		}
 	}
@@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
  * blend - blend value at vaddr_src with value at vaddr_dst
  * @vaddr_dst: destination address
  * @vaddr_src: source address
- * @dst_composer: destination framebuffer's metadata
- * @src_composer: source framebuffer's metadata
+ * @dst_frame_info: destination framebuffer's metadata
+ * @src_frame_info: source framebuffer's metadata
  * @pixel_blend: blending equation based on plane format
  *
  * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
@@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
  * pixel color values
  */
 static void blend(void *vaddr_dst, void *vaddr_src,
-		  struct vkms_composer *dst_composer,
-		  struct vkms_composer *src_composer,
+		  struct vkms_frame_info *dst_frame_info,
+		  struct vkms_frame_info *src_frame_info,
 		  void (*pixel_blend)(const u8 *, u8 *))
 {
 	int i, j, j_dst, i_dst;
 	int offset_src, offset_dst;
 	u8 *pixel_dst, *pixel_src;

-	int x_src = src_composer->src.x1 >> 16;
-	int y_src = src_composer->src.y1 >> 16;
+	int x_src = src_frame_info->src.x1 >> 16;
+	int y_src = src_frame_info->src.y1 >> 16;

-	int x_dst = src_composer->dst.x1;
-	int y_dst = src_composer->dst.y1;
-	int h_dst = drm_rect_height(&src_composer->dst);
-	int w_dst = drm_rect_width(&src_composer->dst);
+	int x_dst = src_frame_info->dst.x1;
+	int y_dst = src_frame_info->dst.y1;
+	int h_dst = drm_rect_height(&src_frame_info->dst);
+	int w_dst = drm_rect_width(&src_frame_info->dst);

 	int y_limit = y_src + h_dst;
 	int x_limit = x_src + w_dst;

 	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
 		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
-			offset_dst = dst_composer->offset
-				     + (i_dst * dst_composer->pitch)
-				     + (j_dst++ * dst_composer->cpp);
-			offset_src = src_composer->offset
-				     + (i * src_composer->pitch)
-				     + (j * src_composer->cpp);
+			offset_dst = dst_frame_info->offset
+				     + (i_dst * dst_frame_info->pitch)
+				     + (j_dst++ * dst_frame_info->cpp);
+			offset_src = src_frame_info->offset
+				     + (i * src_frame_info->pitch)
+				     + (j * src_frame_info->cpp);

 			pixel_src = (u8 *)(vaddr_src + offset_src);
 			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
@@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
 	}
 }

-static void compose_plane(struct vkms_composer *primary_composer,
-			  struct vkms_composer *plane_composer,
+static void compose_plane(struct vkms_frame_info *primary_plane_info,
+			  struct vkms_frame_info *plane_frame_info,
 			  void *vaddr_out)
 {
-	struct drm_framebuffer *fb = &plane_composer->fb;
+	struct drm_framebuffer *fb = &plane_frame_info->fb;
 	void *vaddr;
 	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);

-	if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
+	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
 		return;

-	vaddr = plane_composer->map[0].vaddr;
+	vaddr = plane_frame_info->map[0].vaddr;

 	if (fb->format->format == DRM_FORMAT_ARGB8888)
 		pixel_blend = &alpha_blend;
 	else
 		pixel_blend = &x_blend;

-	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
+	blend(vaddr_out, vaddr, primary_plane_info,
+	      plane_frame_info, pixel_blend);
 }

 static int compose_active_planes(void **vaddr_out,
-				 struct vkms_composer *primary_composer,
+				 struct vkms_frame_info *primary_plane_info,
 				 struct vkms_crtc_state *crtc_state)
 {
-	struct drm_framebuffer *fb = &primary_composer->fb;
+	struct drm_framebuffer *fb = &primary_plane_info->fb;
 	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
 	const void *vaddr;
 	int i;
@@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
 		}
 	}

-	if (WARN_ON(iosys_map_is_null(&primary_composer->map[0])))
+	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
 		return -EINVAL;

-	vaddr = primary_composer->map[0].vaddr;
+	vaddr = primary_plane_info->map[0].vaddr;

 	memcpy(*vaddr_out, vaddr, gem_obj->size);

@@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
 	 * ((primary <- overlay) <- cursor)
 	 */
 	for (i = 1; i < crtc_state->num_active_planes; i++)
-		compose_plane(primary_composer,
-			      crtc_state->active_planes[i]->composer,
+		compose_plane(primary_plane_info,
+			      crtc_state->active_planes[i]->frame_info,
 			      *vaddr_out);

 	return 0;
@@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
 						composer_work);
 	struct drm_crtc *crtc = crtc_state->base.crtc;
 	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
-	struct vkms_composer *primary_composer = NULL;
+	struct vkms_frame_info *primary_plane_info = NULL;
 	struct vkms_plane_state *act_plane = NULL;
 	bool crc_pending, wb_pending;
 	void *vaddr_out = NULL;
@@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
 	if (crtc_state->num_active_planes >= 1) {
 		act_plane = crtc_state->active_planes[0];
 		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
-			primary_composer = act_plane->composer;
+			primary_plane_info = act_plane->frame_info;
 	}

-	if (!primary_composer)
+	if (!primary_plane_info)
 		return;

 	if (wb_pending)
 		vaddr_out = crtc_state->active_writeback->data[0].vaddr;

-	ret = compose_active_planes(&vaddr_out, primary_composer,
+	ret = compose_active_planes(&vaddr_out, primary_plane_info,
 				    crtc_state);
 	if (ret) {
 		if (ret == -EINVAL && !wb_pending)
@@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
 		return;
 	}

-	crc32 = compute_crc(vaddr_out, primary_composer);
+	crc32 = compute_crc(vaddr_out, primary_plane_info);

 	if (wb_pending) {
 		drm_writeback_signal_completion(&out->wb_connector, 0);
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 36fbab5989d1..5199c5f18e17 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -27,7 +27,7 @@ struct vkms_writeback_job {
 	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
 };

-struct vkms_composer {
+struct vkms_frame_info {
 	struct drm_framebuffer fb;
 	struct drm_rect src, dst;
 	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
@@ -39,11 +39,11 @@ struct vkms_composer {
 /**
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
- * @composer: data required for composing computation
+ * @frame_info: data required for composing computation
  */
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;
 };

 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index d8eb674b49a6..fcae6c508f4b 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -24,20 +24,20 @@ static struct drm_plane_state *
 vkms_plane_duplicate_state(struct drm_plane *plane)
 {
 	struct vkms_plane_state *vkms_state;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;

 	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
 	if (!vkms_state)
 		return NULL;

-	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
-	if (!composer) {
-		DRM_DEBUG_KMS("Couldn't allocate composer\n");
+	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
+	if (!frame_info) {
+		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
 		kfree(vkms_state);
 		return NULL;
 	}

-	vkms_state->composer = composer;
+	vkms_state->frame_info = frame_info;

 	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);

@@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
 		/* dropping the reference we acquired in
 		 * vkms_primary_plane_update()
 		 */
-		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
-			drm_framebuffer_put(&vkms_state->composer->fb);
+		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
+			drm_framebuffer_put(&vkms_state->frame_info->fb);
 	}

-	kfree(vkms_state->composer);
-	vkms_state->composer = NULL;
+	kfree(vkms_state->frame_info);
+	vkms_state->frame_info = NULL;

 	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
 	kfree(vkms_state);
@@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	struct vkms_plane_state *vkms_plane_state;
 	struct drm_shadow_plane_state *shadow_plane_state;
 	struct drm_framebuffer *fb = new_state->fb;
-	struct vkms_composer *composer;
+	struct vkms_frame_info *frame_info;

 	if (!new_state->crtc || !fb)
 		return;
@@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	vkms_plane_state = to_vkms_plane_state(new_state);
 	shadow_plane_state = &vkms_plane_state->base;

-	composer = vkms_plane_state->composer;
-	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
-	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
-	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
-	drm_framebuffer_get(&composer->fb);
-	composer->offset = fb->offsets[0];
-	composer->pitch = fb->pitches[0];
-	composer->cpp = fb->format->cpp[0];
+	frame_info = vkms_plane_state->frame_info;
+	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
+	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
+	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
+	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
+	drm_framebuffer_get(&frame_info->fb);
+	frame_info->offset = fb->offsets[0];
+	frame_info->pitch = fb->pitches[0];
+	frame_info->cpp = fb->format->cpp[0];
 }

 static int vkms_plane_atomic_check(struct drm_plane *plane,
-- 
2.30.2

[RESEND v6 3/9] drm: drm_atomic_helper: Add a new helper to deal with the writeback connector validation

Message ID
<20220819182411.20246-4-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
Patch: +46 -5
Add a helper function to validate the connector configuration received in
the encoder atomic_check by the drivers.

So the drivers don't need to do these common validations themselves.
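
The core of such a check is a membership test of the framebuffer's fourcc
against the formats the writeback connector advertises. A minimal userspace
sketch follows; `wb_format_supported()` and the `FMT_*` names are
hypothetical, and only the fourcc packing mirrors the kernel's
`fourcc_code()` macro.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same byte packing as the kernel's fourcc_code() macro. */
#define FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define FMT_XRGB8888 FOURCC('X', 'R', '2', '4')
#define FMT_ARGB8888 FOURCC('A', 'R', '2', '4')
#define FMT_RGB565   FOURCC('R', 'G', '1', '6')

/* Hypothetical helper: true if fmt appears in the advertised list. */
static bool wb_format_supported(uint32_t fmt, const uint32_t *formats,
				size_t nformats)
{
	assert(formats || nformats == 0);

	for (size_t i = 0; i < nformats; i++)
		if (formats[i] == fmt)
			return true;

	return false;
}
```

A driver-side caller would then map a `false` result to `-EINVAL`.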

V2: Move the format verification to a new helper at the drm_atomic_helper.c
    (Thomas Zimmermann).
V3: Format check improvements (Leandro Ribeiro).
    Minor improvements (Thomas Zimmermann).
V5: Fix some grammar issues in the commit message (André Almeida).

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/drm_atomic_helper.c   | 39 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_writeback.c |  9 +++----
 include/drm/drm_atomic_helper.h       |  3 +++
 3 files changed, 46 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 9603193d2fa1..2052e18fa64c 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -776,6 +776,45 @@ drm_atomic_helper_check_modeset(struct drm_device *dev,
 }
 EXPORT_SYMBOL(drm_atomic_helper_check_modeset);

+/**
+ * drm_atomic_helper_check_wb_connector_state() - Check writeback encoder state
+ * @encoder: encoder state to check
+ * @conn_state: connector state to check
+ *
+ * Checks if the writeback connector state is valid, and returns an error if it
+ * isn't.
+ *
+ * RETURNS:
+ * Zero for success or -errno
+ */
+int
+drm_atomic_helper_check_wb_encoder_state(struct drm_encoder *encoder,
+					 struct drm_connector_state *conn_state)
+{
+	struct drm_writeback_job *wb_job = conn_state->writeback_job;
+	struct drm_property_blob *pixel_format_blob;
+	struct drm_framebuffer *fb;
+	size_t i, nformats;
+	u32 *formats;
+
+	if (!wb_job || !wb_job->fb)
+		return 0;
+
+	pixel_format_blob = wb_job->connector->pixel_formats_blob_ptr;
+	nformats = pixel_format_blob->length / sizeof(u32);
+	formats = pixel_format_blob->data;
+	fb = wb_job->fb;
+
+	for (i = 0; i < nformats; i++)
+		if (fb->format->format == formats[i])
+			return 0;
+
+	drm_dbg_kms(encoder->dev, "Invalid pixel format %p4cc\n", &fb->format->format);
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(drm_atomic_helper_check_wb_encoder_state);
+
 /**
  * drm_atomic_helper_check_plane_state() - Check plane state for validity
  * @plane_state: plane state to check
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index af1604dfbbaf..250e509a298f 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -30,6 +30,7 @@ static int vkms_wb_encoder_atomic_check(struct drm_encoder *encoder,
 {
 	struct drm_framebuffer *fb;
 	const struct drm_display_mode *mode = &crtc_state->mode;
+	int ret;

 	if (!conn_state->writeback_job || !conn_state->writeback_job->fb)
 		return 0;
@@ -41,11 +42,9 @@ static int vkms_wb_encoder_atomic_check(struct drm_encoder *encoder,
 		return -EINVAL;
 	}

-	if (fb->format->format != vkms_wb_formats[0]) {
-		DRM_DEBUG_KMS("Invalid pixel format %p4cc\n",
-			      &fb->format->format);
-		return -EINVAL;
-	}
+	ret = drm_atomic_helper_check_wb_encoder_state(encoder, conn_state);
+	if (ret < 0)
+		return ret;

 	return 0;
 }
diff --git a/include/drm/drm_atomic_helper.h b/include/drm/drm_atomic_helper.h
index 4045e2507e11..3fbf695da60f 100644
--- a/include/drm/drm_atomic_helper.h
+++ b/include/drm/drm_atomic_helper.h
@@ -40,6 +40,9 @@ struct drm_private_state;

 int drm_atomic_helper_check_modeset(struct drm_device *dev,
 				struct drm_atomic_state *state);
+int
+drm_atomic_helper_check_wb_encoder_state(struct drm_encoder *encoder,
+					 struct drm_connector_state *conn_state);
 int drm_atomic_helper_check_plane_state(struct drm_plane_state *plane_state,
 					const struct drm_crtc_state *crtc_state,
 					int min_scale,
-- 
2.30.2

[RESEND v6 4/9] drm: vkms: get the reference to `drm_framebuffer` instead of copying it

Message ID
<20220819182411.20246-5-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
Patch: +8 -8
Instead of copying `drm_framebuffer`, which can cause problems, just
take a reference to it and increment its refcount.
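
The commit message doesn't spell out the problems. One plausible issue,
sketched below with a hypothetical userspace stand-in (`struct fake_fb` and
its helpers are not kernel APIs), is that a by-value copy carries its own
refcount field, so taking a reference on the copy never pins the original
object.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted framebuffer. */
struct fake_fb {
	int refcount;
	int width, height;
};

static struct fake_fb *fake_fb_get(struct fake_fb *fb)
{
	assert(fb && fb->refcount > 0);
	fb->refcount++;
	return fb;
}

/* Drop a reference; returns 1 when the last one is gone. */
static int fake_fb_put(struct fake_fb *fb)
{
	assert(fb && fb->refcount > 0);
	return --fb->refcount == 0;
}

/* Before: a private snapshot whose refcount is detached from the original. */
struct frame_info_by_copy {
	struct fake_fb fb;
};

/* After: a pointer to the shared object, pinned via get/put. */
struct frame_info_by_ref {
	struct fake_fb *fb;
};
```

With `frame_info_by_copy`, `fake_fb_get(&info.fb)` only bumps the snapshot's
counter; with `frame_info_by_ref`, `fake_fb_get(info.fb)` bumps the counter
every other user sees.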

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_composer.c |  4 ++--
 drivers/gpu/drm/vkms/vkms_drv.h      |  2 +-
 drivers/gpu/drm/vkms/vkms_plane.c    | 10 +++++-----
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 0aded4e87e60..b9fb408e8973 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -153,7 +153,7 @@ static void compose_plane(struct vkms_frame_info *primary_plane_info,
 			  struct vkms_frame_info *plane_frame_info,
 			  void *vaddr_out)
 {
-	struct drm_framebuffer *fb = &plane_frame_info->fb;
+	struct drm_framebuffer *fb = plane_frame_info->fb;
 	void *vaddr;
 	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);

@@ -175,7 +175,7 @@ static int compose_active_planes(void **vaddr_out,
 				 struct vkms_frame_info *primary_plane_info,
 				 struct vkms_crtc_state *crtc_state)
 {
-	struct drm_framebuffer *fb = &primary_plane_info->fb;
+	struct drm_framebuffer *fb = primary_plane_info->fb;
 	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
 	const void *vaddr;
 	int i;
diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 5199c5f18e17..95d71322500b 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -28,7 +28,7 @@ struct vkms_writeback_job {
 };

 struct vkms_frame_info {
-	struct drm_framebuffer fb;
+	struct drm_framebuffer *fb;
 	struct drm_rect src, dst;
 	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
 	unsigned int offset;
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index fcae6c508f4b..8adbfdc05e50 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -50,12 +50,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
 	struct vkms_plane_state *vkms_state = to_vkms_plane_state(old_state);
 	struct drm_crtc *crtc = vkms_state->base.base.crtc;

-	if (crtc) {
+	if (crtc && vkms_state->frame_info->fb) {
 		/* dropping the reference we acquired in
 		 * vkms_primary_plane_update()
 		 */
-		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
-			drm_framebuffer_put(&vkms_state->frame_info->fb);
+		if (drm_framebuffer_read_refcount(vkms_state->frame_info->fb))
+			drm_framebuffer_put(vkms_state->frame_info->fb);
 	}

 	kfree(vkms_state->frame_info);
@@ -110,9 +110,9 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
 	frame_info = vkms_plane_state->frame_info;
 	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
 	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
-	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
+	frame_info->fb = fb;
 	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
-	drm_framebuffer_get(&frame_info->fb);
+	drm_framebuffer_get(frame_info->fb);
 	frame_info->offset = fb->offsets[0];
 	frame_info->pitch = fb->pitches[0];
 	frame_info->cpp = fb->format->cpp[0];
-- 
2.30.2

[RESEND v6 5/9] drm: vkms: Add fb information to `vkms_writeback_job`

Message ID
<20220819182411.20246-6-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
Patch: +41 -8
This commit is the groundwork for introducing new formats to the planes and
writeback buffer. As part of it, a new buffer metadata field is added to
`vkms_writeback_job`; this metadata is represented by the `vkms_frame_info`
struct.

Two new function pointer types (`line_to_frame_func` and
`frame_to_line_func`) are also defined to handle format conversion
from/to the internal format.

A new internal format (`struct pixel_argb_u16`) is introduced to deal with
all possible inputs. It consists of 16-bit fields, one for each of
the channels.

These changes will allow us, in the future, to have different compositing
and wb format types.
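
As a rough userspace illustration of how per-format function pointers and the
16-bit internal format fit together, consider the sketch below. The
`row_to_line_func` type, the `fake_format` enum, `get_row_to_line()` and the
rounding used in `scale_to_u16()` are all assumptions made for this example;
the patch's own typedefs take a `vkms_frame_info` and a line index instead of
a raw row pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

struct line_buffer {
	size_t n_pixels;
	struct pixel_argb_u16 *pixels;
};

/* Simplified signature: takes a raw row pointer, not a vkms_frame_info. */
typedef void (*row_to_line_func)(struct line_buffer *buf, const void *row);

/* Hypothetical format tags standing in for DRM fourcc codes. */
enum fake_format { FAKE_FMT_ARGB8888, FAKE_FMT_RGB565, FAKE_FMT_UNKNOWN };

/* Scale a channel whose maximum value is max to 16 bits, with rounding. */
static uint16_t scale_to_u16(uint32_t v, uint32_t max)
{
	return (uint16_t)((v * 0xFFFF + max / 2) / max);
}

static void argb8888_row_to_line(struct line_buffer *buf, const void *row)
{
	const uint32_t *src = row;

	assert(buf && buf->pixels && src);
	for (size_t x = 0; x < buf->n_pixels; x++) {
		buf->pixels[x].a = scale_to_u16(src[x] >> 24, 0xFF);
		buf->pixels[x].r = scale_to_u16((src[x] >> 16) & 0xFF, 0xFF);
		buf->pixels[x].g = scale_to_u16((src[x] >> 8) & 0xFF, 0xFF);
		buf->pixels[x].b = scale_to_u16(src[x] & 0xFF, 0xFF);
	}
}

static void rgb565_row_to_line(struct line_buffer *buf, const void *row)
{
	const uint16_t *src = row;

	assert(buf && buf->pixels && src);
	for (size_t x = 0; x < buf->n_pixels; x++) {
		buf->pixels[x].a = 0xFFFF; /* no alpha channel: opaque */
		buf->pixels[x].r = scale_to_u16((src[x] >> 11) & 0x1F, 0x1F);
		buf->pixels[x].g = scale_to_u16((src[x] >> 5) & 0x3F, 0x3F);
		buf->pixels[x].b = scale_to_u16(src[x] & 0x1F, 0x1F);
	}
}

/* Pick the reader once per plane; NULL means the format is unsupported. */
static row_to_line_func get_row_to_line(enum fake_format fmt)
{
	switch (fmt) {
	case FAKE_FMT_ARGB8888:
		return argb8888_row_to_line;
	case FAKE_FMT_RGB565:
		return rgb565_row_to_line;
	default:
		return NULL;
	}
}
```

Storing the chosen pointer next to the frame metadata means the blend loop
never needs to branch on the pixel format itself.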

V2: Change the code to get the drm_framebuffer reference and not copy its
    contents (Thomas Zimmermann).
V3: Drop the refcount in the wb code (Thomas Zimmermann).
V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
    and vkms_plane_state (Pekka Paalanen)
V6: Improvements to some struct/struct members names (Pekka Paalanen).
    Splits this patch in two (Pekka Paalanen).

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_drv.h       | 29 ++++++++++++++++++++++-----
 drivers/gpu/drm/vkms/vkms_writeback.c | 20 +++++++++++++++---
 2 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
index 95d71322500b..0d407ec84f94 100644
--- a/drivers/gpu/drm/vkms/vkms_drv.h
+++ b/drivers/gpu/drm/vkms/vkms_drv.h
@@ -22,11 +22,6 @@

 #define NUM_OVERLAY_PLANES 8

-struct vkms_writeback_job {
-	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
-	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
-};
-
 struct vkms_frame_info {
 	struct drm_framebuffer *fb;
 	struct drm_rect src, dst;
@@ -36,6 +31,29 @@ struct vkms_frame_info {
 	unsigned int cpp;
 };

+struct pixel_argb_u16 {
+	u16 a, r, g, b;
+};
+
+struct line_buffer {
+	size_t n_pixels;
+	struct pixel_argb_u16 *pixels;
+};
+
+typedef void
+(*line_to_frame_func)(struct vkms_frame_info *frame_info,
+		      const struct line_buffer *buffer, int y);
+
+typedef void
+(*frame_to_line_func)(struct line_buffer *buffer,
+		      const struct vkms_frame_info *frame_info, int y);
+
+struct vkms_writeback_job {
+	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
+	struct vkms_frame_info wb_frame_info;
+	line_to_frame_func wb_write;
+};
+
 /**
  * vkms_plane_state - Driver specific plane state
  * @base: base plane state
@@ -44,6 +62,7 @@ struct vkms_frame_info {
 struct vkms_plane_state {
 	struct drm_shadow_plane_state base;
 	struct vkms_frame_info *frame_info;
+	frame_to_line_func plane_read;
 };

 struct vkms_plane {
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index 250e509a298f..c87f6c89e7b4 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -74,12 +74,15 @@ static int vkms_wb_prepare_job(struct drm_writeback_connector *wb_connector,
 	if (!vkmsjob)
 		return -ENOMEM;

-	ret = drm_gem_fb_vmap(job->fb, vkmsjob->map, vkmsjob->data);
+	ret = drm_gem_fb_vmap(job->fb, vkmsjob->wb_frame_info.map, vkmsjob->data);
 	if (ret) {
 		DRM_ERROR("vmap failed: %d\n", ret);
 		goto err_kfree;
 	}

+	vkmsjob->wb_frame_info.fb = job->fb;
+	drm_framebuffer_get(vkmsjob->wb_frame_info.fb);
+
 	job->priv = vkmsjob;

 	return 0;
@@ -98,7 +101,9 @@ static void vkms_wb_cleanup_job(struct drm_writeback_connector *connector,
 	if (!job->fb)
 		return;

-	drm_gem_fb_vunmap(job->fb, vkmsjob->map);
+	drm_gem_fb_vunmap(job->fb, vkmsjob->wb_frame_info.map);

+	drm_framebuffer_put(vkmsjob->wb_frame_info.fb);
+
 	vkmsdev = drm_device_to_vkms_device(job->fb->dev);
 	vkms_set_composer(&vkmsdev->output, false);
@@ -115,14 +120,23 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
 	struct drm_writeback_connector *wb_conn = &output->wb_connector;
 	struct drm_connector_state *conn_state = wb_conn->base.state;
 	struct vkms_crtc_state *crtc_state = output->composer_state;
+	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
+	struct vkms_writeback_job *active_wb;
+	struct vkms_frame_info *wb_frame_info;

 	if (!conn_state)
 		return;

 	vkms_set_composer(&vkmsdev->output, true);

+	active_wb = conn_state->writeback_job->priv;
+	wb_frame_info = &active_wb->wb_frame_info;
+
 	spin_lock_irq(&output->composer_lock);
-	crtc_state->active_writeback = conn_state->writeback_job->priv;
+	crtc_state->active_writeback = active_wb;
+	wb_frame_info->offset = fb->offsets[0];
+	wb_frame_info->pitch = fb->pitches[0];
+	wb_frame_info->cpp = fb->format->cpp[0];
 	crtc_state->wb_pending = true;
 	spin_unlock_irq(&output->composer_lock);
 	drm_writeback_queue_job(wb_conn, connector_state);
-- 
2.30.2

[RESEND v6 6/9] drm: vkms: Refactor the plane composer to accept new formats

Message ID
<20220819182411.20246-7-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
Patch: +317 -181
Currently the blend function only accepts XRGB_8888 and ARGB_8888
as a color input.

This patch refactors all the functions related to the plane composition
to overcome this limitation.

Pixel blending is done using the new internal format, and new handlers
are added to convert a specific format to/from this internal format.

So the blend operation depends on these handlers to convert to this common
format. The blended result, if necessary, is converted to the writeback
buffer format.

This patch introduces three major differences to the blend function.
1 - All the planes are blended at once.
2 - The blend calculation is done per line instead of per pixel.
3 - It is responsible for calculating the CRC and writing the writeback
buffer (if necessary).

These changes allow us to allocate far less memory for the intermediate
buffer used in these operations, because we no longer need to hold the
entire intermediate image at once; holding one line at a time is
enough.
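
Point 2 can be pictured with the userspace sketch below, which blends one
source line over one destination line in the 16-bit internal format. It
assumes premultiplied-alpha source pixels and an opaque output; the function
names and the rounding are choices made for this example and are not taken
from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Blend one channel: out = src + dst * (1 - alpha), in 16-bit fixed point.
 * Assumption: src is already premultiplied by alpha.
 */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t v = (uint64_t)src * 0xFFFF +
		     (uint64_t)dst * (uint64_t)(0xFFFF - alpha);

	v = (v + 0x7FFF) / 0xFFFF;	/* round to nearest */
	return v > 0xFFFF ? 0xFFFF : (uint16_t)v;
}

/* Blend a whole source line over the destination line, in place. */
static void blend_line(const struct pixel_argb_u16 *src,
		       struct pixel_argb_u16 *dst, size_t n)
{
	assert(src && dst);

	for (size_t x = 0; x < n; x++) {
		dst[x].r = blend_channel(src[x].r, dst[x].r, src[x].a);
		dst[x].g = blend_channel(src[x].g, dst[x].g, src[x].a);
		dst[x].b = blend_channel(src[x].b, dst[x].b, src[x].a);
		dst[x].a = 0xFFFF;	/* assumed: output is opaque */
	}
}
```

Since only the current line is live at any moment, the same small buffers can
be reused for every row of the output.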

| Memory consumption (output dimensions) |
|:--------------------------------------:|
|       Current      |     This patch    |
|:------------------:|:-----------------:|
|   Width * Height   |     2 * Width     |

Beyond memory, we also have a minor performance benefit from all
these changes. Results running the IGT[1] test
`igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:

|                 Frametime                  |
|:------------------------------------------:|
|  Implementation |  Current  |  This commit |
|:---------------:|:---------:|:------------:|
| frametime range |  9~22 ms  |    5~17 ms   |
|     Average     |  11.4 ms  |    7.8 ms    |

[1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4

V2: Improves the performance drastically, by performing the operations
    per-line and not per-pixel (Pekka Paalanen).
    Minor improvements (Pekka Paalanen).
V3: Changes the code to blend the planes all at once. This improves
    performance, memory consumption, and removes much of the weirdness
    of the V2 (Pekka Paalanen and me).
    Minor improvements (Pekka Paalanen and me).
V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
V5: Minor checkpatch fixes and the removal of TO-DO item (Melissa Wen).
    Several security/robustness improvements (Pekka Paalanen).
    Removes check_planes_x_bounds function and allows planes to be
    partly off-screen (Pekka Paalanen).
V6: Fix a mismatch of some variable sizes (Pekka Paalanen).
    Several minor improvements (Pekka Paalanen).

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 Documentation/gpu/vkms.rst            |   4 -
 drivers/gpu/drm/vkms/Makefile         |   1 +
 drivers/gpu/drm/vkms/vkms_composer.c  | 320 ++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_formats.c   | 155 +++++++++++++
 drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
 drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
 drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
 7 files changed, 317 insertions(+), 181 deletions(-)
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
 create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index 973e2d43108b..a49e4ae92653 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -118,10 +118,6 @@ Add Plane Features

There's lots of plane features we could add support for:

- Clearing primary plane: clear primary plane before plane composition (at the
  start) for correctness of pixel blend ops. It also guarantees alpha channel
  is cleared in the target buffer for stable crc. [Good to get started]

- ARGB format on primary plane: blend the primary plane into background with
  translucent alpha.

diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
index 72f779cbfedd..1b28a6a32948 100644
--- a/drivers/gpu/drm/vkms/Makefile
+++ b/drivers/gpu/drm/vkms/Makefile
@@ -3,6 +3,7 @@ vkms-y := \
	vkms_drv.o \
	vkms_plane.o \
	vkms_output.o \
	vkms_formats.o \
	vkms_crtc.o \
	vkms_composer.o \
	vkms_writeback.o
diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index b9fb408e8973..5b1a8bdd8268 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -7,204 +7,188 @@
#include <drm/drm_fourcc.h>
#include <drm/drm_gem_framebuffer_helper.h>
#include <drm/drm_vblank.h>
#include <linux/minmax.h>

#include "vkms_drv.h"

static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
				 const struct vkms_frame_info *frame_info)
static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
{
	u32 pixel;
	int src_offset = frame_info->offset + (y * frame_info->pitch)
					    + (x * frame_info->cpp);
	u32 new_color;

	pixel = *(u32 *)&buffer[src_offset];
	new_color = (src * 0xffff + dst * (0xffff - alpha));

	return pixel;
	return DIV_ROUND_CLOSEST(new_color, 0xffff);
}

/**
 * compute_crc - Compute CRC value on output frame
 * pre_mul_alpha_blend - alpha blending equation
 * @src_frame_info: source framebuffer's metadata
 * @stage_buffer: The line with the pixels from src_plane
 * @output_buffer: A line buffer that receives all the blends output
 *
 * @vaddr: address to final framebuffer
 * @frame_info: framebuffer's metadata
 * Using the information from the `frame_info`, this blends only the
 * necessary pixels from the `stage_buffer` to the `output_buffer`
 * using premultiplied blend formula.
 *
 * returns CRC value computed using crc32 on the visible portion of
 * the final framebuffer at vaddr_out
 * The current DRM assumption is that pixel color values have been already
 * pre-multiplied with the alpha channel values. See more
 * drm_plane_create_blend_mode_property(). Also, this formula assumes a
 * completely opaque background.
 */
static uint32_t compute_crc(const u8 *vaddr,
			    const struct vkms_frame_info *frame_info)
static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
				struct line_buffer *stage_buffer,
				struct line_buffer *output_buffer)
{
	int x, y;
	u32 crc = 0, pixel = 0;
	int x_src = frame_info->src.x1 >> 16;
	int y_src = frame_info->src.y1 >> 16;
	int h_src = drm_rect_height(&frame_info->src) >> 16;
	int w_src = drm_rect_width(&frame_info->src) >> 16;

	for (y = y_src; y < y_src + h_src; ++y) {
		for (x = x_src; x < x_src + w_src; ++x) {
			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
		}
	int x_dst = frame_info->dst.x1;
	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
	struct pixel_argb_u16 *in = stage_buffer->pixels;
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    stage_buffer->n_pixels);

	for (int x = 0; x < x_limit; x++) {
		out[x].a = (u16)0xffff;
		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
	}

	return crc;
}

static u8 blend_channel(u8 src, u8 dst, u8 alpha)
static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
{
	u32 pre_blend;
	u8 new_color;

	pre_blend = (src * 255 + dst * (255 - alpha));

	/* Faster div by 255 */
	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
		return true;

	return new_color;
	return false;
}

/**
 * alpha_blend - alpha blending equation
 * @argb_src: src pixel on premultiplied alpha mode
 * @argb_dst: dst pixel completely opaque
 * @wb_frame_info: The writeback frame buffer metadata
 * @crtc_state: The crtc state
 * @crc32: The crc output of the final frame
 * @output_buffer: A buffer of a row that will receive the result of the blend(s)
 * @stage_buffer: The line with the pixels from the plane being blended to the output
 *
 * blend pixels using premultiplied blend formula. The current DRM assumption
 * is that pixel color values have been already pre-multiplied with the alpha
 * channel values. See more drm_plane_create_blend_mode_property(). Also, this
 * formula assumes a completely opaque background.
 * This function blends the pixels (using `pre_mul_alpha_blend`)
 * from all planes, calculates the crc32 of the output from the former step,
 * and, if necessary, converts and stores the output to the writeback buffer.
 */
static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
static void blend(struct vkms_writeback_job *wb,
		  struct vkms_crtc_state *crtc_state,
		  u32 *crc32, struct line_buffer *stage_buffer,
		  struct line_buffer *output_buffer, size_t row_size)
{
	u8 alpha;
	struct vkms_plane_state **plane = crtc_state->active_planes;
	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
	u32 n_active_planes = crtc_state->num_active_planes;

	int y_dst = primary_plane_info->dst.y1;
	int h_dst = drm_rect_height(&primary_plane_info->dst);
	int y_limit = y_dst + h_dst;

	for (size_t y = y_dst; y < y_limit; y++) {
		plane[0]->plane_read(output_buffer, primary_plane_info, y);

		/* If there are other planes besides primary, we consider the active
		 * planes should be in z-order and compose them associatively:
		 * ((primary <- overlay) <- cursor)
		 */
		for (size_t i = 1; i < n_active_planes; i++) {
			if (!check_y_limit(plane[i]->frame_info, y))
				continue;

			plane[i]->plane_read(stage_buffer, plane[i]->frame_info, y);
			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
					    output_buffer);
		}

		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);

	alpha = argb_src[3];
	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
		if (wb)
			wb->wb_write(&wb->wb_frame_info, output_buffer, y);
	}
}

/**
 * x_blend - blending equation that ignores the pixel alpha
 *
 * overwrites RGB color value from src pixel to dst pixel.
 */
static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
static int check_format_funcs(struct vkms_crtc_state *crtc_state,
			      struct vkms_writeback_job *active_wb)
{
	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
	struct vkms_plane_state **planes = crtc_state->active_planes;
	u32 n_active_planes = crtc_state->num_active_planes;

	for (size_t i = 0; i < n_active_planes; i++)
		if (!planes[i]->plane_read)
			return -1;

	if (active_wb && !active_wb->wb_write)
		return -1;

	return 0;
}

/**
 * blend - blend value at vaddr_src with value at vaddr_dst
 * @vaddr_dst: destination address
 * @vaddr_src: source address
 * @dst_frame_info: destination framebuffer's metadata
 * @src_frame_info: source framebuffer's metadata
 * @pixel_blend: blending equation based on plane format
 *
 * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
 * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
 * and clearing alpha channel to an completely opaque background. This function
 * uses buffer's metadata to locate the new composite values at vaddr_dst.
 *
 * TODO: completely clear the primary plane (a = 0xff) before starting to blend
 * pixel color values
 */
static void blend(void *vaddr_dst, void *vaddr_src,
		  struct vkms_frame_info *dst_frame_info,
		  struct vkms_frame_info *src_frame_info,
		  void (*pixel_blend)(const u8 *, u8 *))
static int compose_active_planes(struct vkms_writeback_job *active_wb,
				 struct vkms_crtc_state *crtc_state,
				 u32 *crc32)
{
	int i, j, j_dst, i_dst;
	int offset_src, offset_dst;
	u8 *pixel_dst, *pixel_src;

	int x_src = src_frame_info->src.x1 >> 16;
	int y_src = src_frame_info->src.y1 >> 16;

	int x_dst = src_frame_info->dst.x1;
	int y_dst = src_frame_info->dst.y1;
	int h_dst = drm_rect_height(&src_frame_info->dst);
	int w_dst = drm_rect_width(&src_frame_info->dst);

	int y_limit = y_src + h_dst;
	int x_limit = x_src + w_dst;

	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
			offset_dst = dst_frame_info->offset
				     + (i_dst * dst_frame_info->pitch)
				     + (j_dst++ * dst_frame_info->cpp);
			offset_src = src_frame_info->offset
				     + (i * src_frame_info->pitch)
				     + (j * src_frame_info->cpp);

			pixel_src = (u8 *)(vaddr_src + offset_src);
			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
			pixel_blend(pixel_src, pixel_dst);
			/* clearing alpha channel (0xff)*/
			pixel_dst[3] = 0xff;
		}
		i_dst++;
	size_t line_width, pixel_size = sizeof(struct pixel_argb_u16);
	struct vkms_frame_info *primary_plane_info = NULL;
	struct line_buffer output_buffer, stage_buffer;
	struct vkms_plane_state *act_plane = NULL;
	int ret = 0;

	/*
	 * This check exists so we can call `crc32_le` for the entire line
	 * instead of doing it for each channel of each pixel in case
	 * `struct pixel_argb_u16` had any gap added by the compiler
	 * between the struct fields.
	 */
	static_assert(sizeof(struct pixel_argb_u16) == 8);

	if (crtc_state->num_active_planes >= 1) {
		act_plane = crtc_state->active_planes[0];
		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
			primary_plane_info = act_plane->frame_info;
	}
}

static void compose_plane(struct vkms_frame_info *primary_plane_info,
			  struct vkms_frame_info *plane_frame_info,
			  void *vaddr_out)
{
	struct drm_framebuffer *fb = plane_frame_info->fb;
	void *vaddr;
	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
	if (!primary_plane_info)
		return -EINVAL;

	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
		return;
		return -EINVAL;

	vaddr = plane_frame_info->map[0].vaddr;
	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
		return -EINVAL;

	if (fb->format->format == DRM_FORMAT_ARGB8888)
		pixel_blend = &alpha_blend;
	else
		pixel_blend = &x_blend;
	line_width = drm_rect_width(&primary_plane_info->dst);
	stage_buffer.n_pixels = line_width;
	output_buffer.n_pixels = line_width;

	blend(vaddr_out, vaddr, primary_plane_info,
	      plane_frame_info, pixel_blend);
}
	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
	if (!stage_buffer.pixels) {
		DRM_ERROR("Cannot allocate memory for the output line buffer");
		return -ENOMEM;
	}

static int compose_active_planes(void **vaddr_out,
				 struct vkms_frame_info *primary_plane_info,
				 struct vkms_crtc_state *crtc_state)
{
	struct drm_framebuffer *fb = primary_plane_info->fb;
	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
	const void *vaddr;
	int i;

	if (!*vaddr_out) {
		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
		if (!*vaddr_out) {
			DRM_ERROR("Cannot allocate memory for output frame.");
			return -ENOMEM;
		}
	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
	if (!output_buffer.pixels) {
		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
		ret = -ENOMEM;
		goto free_stage_buffer;
	}

	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
		return -EINVAL;
	if (active_wb) {
		struct vkms_frame_info *wb_frame_info = &active_wb->wb_frame_info;

	vaddr = primary_plane_info->map[0].vaddr;
		wb_frame_info->src = primary_plane_info->src;
		wb_frame_info->dst = primary_plane_info->dst;
	}

	memcpy(*vaddr_out, vaddr, gem_obj->size);
	blend(active_wb, crtc_state, crc32, &stage_buffer,
	      &output_buffer, line_width * pixel_size);

	/* If there are other planes besides primary, we consider the active
	 * planes should be in z-order and compose them associatively:
	 * ((primary <- overlay) <- cursor)
	 */
	for (i = 1; i < crtc_state->num_active_planes; i++)
		compose_plane(primary_plane_info,
			      crtc_state->active_planes[i]->frame_info,
			      *vaddr_out);
	kvfree(output_buffer.pixels);
free_stage_buffer:
	kvfree(stage_buffer.pixels);

	return 0;
	return ret;
}

/**
@@ -222,13 +206,11 @@ void vkms_composer_worker(struct work_struct *work)
						struct vkms_crtc_state,
						composer_work);
	struct drm_crtc *crtc = crtc_state->base.crtc;
	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
	struct vkms_frame_info *primary_plane_info = NULL;
	struct vkms_plane_state *act_plane = NULL;
	bool crc_pending, wb_pending;
	void *vaddr_out = NULL;
	u32 crc32 = 0;
	u64 frame_start, frame_end;
	u32 crc32 = 0;
	int ret;

	spin_lock_irq(&out->composer_lock);
@@ -248,35 +230,19 @@ void vkms_composer_worker(struct work_struct *work)
	if (!crc_pending)
		return;

	if (crtc_state->num_active_planes >= 1) {
		act_plane = crtc_state->active_planes[0];
		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
			primary_plane_info = act_plane->frame_info;
	}

	if (!primary_plane_info)
		return;

	if (wb_pending)
		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
		ret = compose_active_planes(active_wb, crtc_state, &crc32);
	else
		ret = compose_active_planes(NULL, crtc_state, &crc32);

	ret = compose_active_planes(&vaddr_out, primary_plane_info,
				    crtc_state);
	if (ret) {
		if (ret == -EINVAL && !wb_pending)
			kvfree(vaddr_out);
	if (ret)
		return;
	}

	crc32 = compute_crc(vaddr_out, primary_plane_info);

	if (wb_pending) {
		drm_writeback_signal_completion(&out->wb_connector, 0);
		spin_lock_irq(&out->composer_lock);
		crtc_state->wb_pending = false;
		spin_unlock_irq(&out->composer_lock);
	} else {
		kvfree(vaddr_out);
	}

	/*
diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
new file mode 100644
index 000000000000..ca4bfcac686b
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -0,0 +1,155 @@
// SPDX-License-Identifier: GPL-2.0+

#include <drm/drm_rect.h>
#include <linux/minmax.h>

#include "vkms_formats.h"

static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
{
	return frame_info->offset + (y * frame_info->pitch)
				  + (x * frame_info->cpp);
}

/*
 * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
 *
 * @frame_info: Buffer metadata
 * @x: The x(width) coordinate of the 2D buffer
 * @y: The y(Height) coordinate of the 2D buffer
 *
 * Takes the information stored in the frame_info, a pair of coordinates, and
 * returns the address of the first color channel.
 * This function assumes the channels are packed together, i.e. a color channel
 * comes immediately after another in the memory. And therefore, this function
 * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
 */
static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
				int x, int y)
{
	size_t offset = pixel_offset(frame_info, x, y);

	return (u8 *)frame_info->map[0].vaddr + offset;
}

static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
{
	int x_src = frame_info->src.x1 >> 16;
	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);

	return packed_pixels_addr(frame_info, x_src, y_src);
}

static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
				 const struct vkms_frame_info *frame_info, int y)
{
	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
	u8 *src_pixels = get_packed_src_addr(frame_info, y);
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    stage_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
		/*
		 * The 257 is the "conversion ratio". This number is obtained by the
		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
		 * the best color value in a pixel format with more possibilities.
		 * A similar idea applies to other RGB color conversions.
		 */
		out_pixels[x].a = (u16)src_pixels[3] * 257;
		out_pixels[x].r = (u16)src_pixels[2] * 257;
		out_pixels[x].g = (u16)src_pixels[1] * 257;
		out_pixels[x].b = (u16)src_pixels[0] * 257;
	}
}

static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
				 const struct vkms_frame_info *frame_info, int y)
{
	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
	u8 *src_pixels = get_packed_src_addr(frame_info, y);
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    stage_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
		out_pixels[x].a = (u16)0xffff;
		out_pixels[x].r = (u16)src_pixels[2] * 257;
		out_pixels[x].g = (u16)src_pixels[1] * 257;
		out_pixels[x].b = (u16)src_pixels[0] * 257;
	}
}

/*
 * The following functions take a line of argb_u16 pixels from the
 * src_buffer, convert them to a specific format, and store them in the
 * destination.
 *
 * They are used in the `compose_active_planes` to convert and store a line
 * from the src_buffer to the writeback buffer.
 */
static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
				 const struct line_buffer *src_buffer, int y)
{
	int x_dst = frame_info->dst.x1;
	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    src_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
		/*
		 * This sequence below is important because the format's byte order is
		 * in little-endian. In the case of the ARGB8888 the memory is
		 * organized this way:
		 *
		 * | Addr     | = blue channel
		 * | Addr + 1 | = green channel
		 * | Addr + 2 | = Red channel
		 * | Addr + 3 | = Alpha channel
		 */
		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
	}
}

static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
				 const struct line_buffer *src_buffer, int y)
{
	int x_dst = frame_info->dst.x1;
	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    src_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
		dst_pixels[3] = 0xff;
		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
	}
}

frame_to_line_func get_frame_to_line_function(u32 format)
{
	switch (format) {
	case DRM_FORMAT_ARGB8888:
		return &ARGB8888_to_argb_u16;
	case DRM_FORMAT_XRGB8888:
		return &XRGB8888_to_argb_u16;
	default:
		return NULL;
	}
}

line_to_frame_func get_line_to_frame_function(u32 format)
{
	switch (format) {
	case DRM_FORMAT_ARGB8888:
		return &argb_u16_to_ARGB8888;
	case DRM_FORMAT_XRGB8888:
		return &argb_u16_to_XRGB8888;
	default:
		return NULL;
	}
}
diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
new file mode 100644
index 000000000000..053ca42d5b31
--- /dev/null
+++ b/drivers/gpu/drm/vkms/vkms_formats.h
@@ -0,0 +1,12 @@
// SPDX-License-Identifier: GPL-2.0+

#ifndef _VKMS_FORMATS_H_
#define _VKMS_FORMATS_H_

#include "vkms_drv.h"

frame_to_line_func get_frame_to_line_function(u32 format);

line_to_frame_func get_line_to_frame_function(u32 format);

#endif /* _VKMS_FORMATS_H_ */
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 8adbfdc05e50..7a479a714565 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -10,6 +10,7 @@
#include <drm/drm_plane_helper.h>

#include "vkms_drv.h"
#include "vkms_formats.h"

static const u32 vkms_formats[] = {
	DRM_FORMAT_XRGB8888,
@@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
	struct drm_shadow_plane_state *shadow_plane_state;
	struct drm_framebuffer *fb = new_state->fb;
	struct vkms_frame_info *frame_info;
	u32 fmt = fb->format->format;

	if (!new_state->crtc || !fb)
		return;
@@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
	frame_info->offset = fb->offsets[0];
	frame_info->pitch = fb->pitches[0];
	frame_info->cpp = fb->format->cpp[0];
	vkms_plane_state->plane_read = get_frame_to_line_function(fmt);
}

static int vkms_plane_atomic_check(struct drm_plane *plane,
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index c87f6c89e7b4..d2aabb52cb46 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -11,6 +11,7 @@
#include <drm/drm_gem_shmem_helper.h>

#include "vkms_drv.h"
#include "vkms_formats.h"

static const u32 vkms_wb_formats[] = {
	DRM_FORMAT_XRGB8888,
@@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
	struct vkms_writeback_job *active_wb;
	struct vkms_frame_info *wb_frame_info;
	u32 wb_format = fb->format->format;

	if (!conn_state)
		return;
@@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
	crtc_state->wb_pending = true;
	spin_unlock_irq(&output->composer_lock);
	drm_writeback_queue_job(wb_conn, connector_state);
	active_wb->wb_write = get_line_to_frame_function(wb_format);
}

static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
-- 
2.30.2

[RESEND v6 7/9] drm: vkms: Supports to the case where primary plane doesn't match the CRTC

Details
Message ID
<20220819182411.20246-8-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com> (view parent)
Patch: +33 -33
We will remove the current assumption that the primary plane has the
same size and position as the CRTC and that the primary plane is the
bottom-most in zpos order, or is even enabled, at least as far
as the blending machinery is concerned.

For that, we will add CRTC dimension information to `vkms_crtc_state`
and add an opaque black background color.
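
The effect on a single output line can be sketched in userspace as below.
This is a simplified illustration, not the kernel code: `struct rect` is a
hypothetical stand-in for `struct drm_rect`, the `struct pixel_argb_u16`
layout is assumed, and `compose_line` handles only one opaque,
single-color plane, with x clipping added so that the sketch is safe on
its own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout of the internal pixel format. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Hypothetical stand-in for struct drm_rect; x2 and y2 are exclusive. */
struct rect {
	int x1, y1, x2, y2;
};

static void fill_background(const struct pixel_argb_u16 *background,
			    struct pixel_argb_u16 *line, int n_pixels)
{
	for (int i = 0; i < n_pixels; i++)
		line[i] = *background;
}

static bool check_y_limit(const struct rect *dst, int y)
{
	return y >= dst->y1 && y < dst->y2;
}

/* Compose CRTC line y from one opaque, single-color plane. */
static void compose_line(struct pixel_argb_u16 *line, int crtc_width, int y,
			 const struct rect *dst, struct pixel_argb_u16 color)
{
	const struct pixel_argb_u16 background = { .a = 0xffff };
	int x_start = dst->x1 < 0 ? 0 : dst->x1;
	int x_end = dst->x2 > crtc_width ? crtc_width : dst->x2;

	/* Every line starts as opaque black, whatever the planes cover. */
	fill_background(&background, line, crtc_width);

	/* A plane that does not intersect this line contributes nothing. */
	if (!check_y_limit(dst, y))
		return;

	for (int x = x_start; x < x_end; x++)
		line[x] = color;
}
```

Lines outside the plane's destination rectangle stay opaque black, which
is what makes the result well defined when the primary plane is smaller
than the CRTC or not enabled at all.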

Because now we need to fill the background, we had a loss in
performance with this change. Results running the IGT[1] test
`igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:

|                  Frametime                   |
|:--------------------------------------------:|
|  Implementation |  Previous |   This commit  |
|:---------------:|:---------:|:--------------:|
| frametime range |  5~18 ms  |     10~22 ms   |
|     Average     |  8.47 ms  |     12.32 ms   |

[1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4

V6: Improve the commit description (Pekka Paalanen).
    Update some comments (Pekka Paalanen).
    Remove some fields from `vkms_crtc_state` and move where
    some variables are set (Pekka Paalanen).

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 Documentation/gpu/vkms.rst            |  3 +-
 drivers/gpu/drm/vkms/vkms_composer.c  | 59 +++++++++++++--------------
 drivers/gpu/drm/vkms/vkms_writeback.c |  4 ++
 3 files changed, 33 insertions(+), 33 deletions(-)

diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
index a49e4ae92653..49db221c0f52 100644
--- a/Documentation/gpu/vkms.rst
+++ b/Documentation/gpu/vkms.rst
@@ -121,8 +121,7 @@ There's lots of plane features we could add support for:
- ARGB format on primary plane: blend the primary plane into background with
  translucent alpha.

- Support when the primary plane isn't exactly matching the output size: blend
  the primary plane into the black background.
- Add background color KMS property[Good to get started].

- Full alpha blending on all planes.

diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
index 5b1a8bdd8268..8e53fa80742b 100644
--- a/drivers/gpu/drm/vkms/vkms_composer.c
+++ b/drivers/gpu/drm/vkms/vkms_composer.c
@@ -61,6 +61,13 @@ static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
	return false;
}

static void fill_background(const struct pixel_argb_u16 *background_color,
			    struct line_buffer *output_buffer)
{
	for (size_t i = 0; i < output_buffer->n_pixels; i++)
		output_buffer->pixels[i] = *background_color;
}

/**
 * @wb_frame_info: The writeback frame buffer metadata
 * @crtc_state: The crtc state
@@ -78,21 +85,17 @@ static void blend(struct vkms_writeback_job *wb,
		  struct line_buffer *output_buffer, size_t row_size)
{
	struct vkms_plane_state **plane = crtc_state->active_planes;
	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
	u32 n_active_planes = crtc_state->num_active_planes;

	int y_dst = primary_plane_info->dst.y1;
	int h_dst = drm_rect_height(&primary_plane_info->dst);
	int y_limit = y_dst + h_dst;
	const struct pixel_argb_u16 background_color = { .a = 0xffff };

	for (size_t y = y_dst; y < y_limit; y++) {
		plane[0]->plane_read(output_buffer, primary_plane_info, y);
	size_t crtc_y_limit = crtc_state->base.crtc->mode.vdisplay;

		/* If there are other planes besides primary, we consider the active
		 * planes should be in z-order and compose them associatively:
		 * ((primary <- overlay) <- cursor)
		 */
		for (size_t i = 1; i < n_active_planes; i++) {
	for (size_t y = 0; y < crtc_y_limit; y++) {
		fill_background(&background_color, output_buffer);

		/* The active planes are composed associatively in z-order. */
		for (size_t i = 0; i < n_active_planes; i++) {
			if (!check_y_limit(plane[i]->frame_info, y))
				continue;

@@ -124,14 +127,24 @@ static int check_format_funcs(struct vkms_crtc_state *crtc_state,
	return 0;
}

static int check_iosys_map(struct vkms_crtc_state *crtc_state)
{
	struct vkms_plane_state **plane_state = crtc_state->active_planes;
	u32 n_active_planes = crtc_state->num_active_planes;

	for (size_t i = 0; i < n_active_planes; i++)
		if (iosys_map_is_null(&plane_state[i]->frame_info->map[0]))
			return -1;

	return 0;
}

static int compose_active_planes(struct vkms_writeback_job *active_wb,
				 struct vkms_crtc_state *crtc_state,
				 u32 *crc32)
{
	size_t line_width, pixel_size = sizeof(struct pixel_argb_u16);
	struct vkms_frame_info *primary_plane_info = NULL;
	struct line_buffer output_buffer, stage_buffer;
	struct vkms_plane_state *act_plane = NULL;
	int ret = 0;

	/*
@@ -142,22 +155,13 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
	 */
	static_assert(sizeof(struct pixel_argb_u16) == 8);

	if (crtc_state->num_active_planes >= 1) {
		act_plane = crtc_state->active_planes[0];
		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
			primary_plane_info = act_plane->frame_info;
	}

	if (!primary_plane_info)
		return -EINVAL;

	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
	if (WARN_ON(check_iosys_map(crtc_state)))
		return -EINVAL;

	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
		return -EINVAL;

	line_width = drm_rect_width(&primary_plane_info->dst);
	line_width = crtc_state->base.crtc->mode.hdisplay;
	stage_buffer.n_pixels = line_width;
	output_buffer.n_pixels = line_width;

@@ -174,13 +178,6 @@ static int compose_active_planes(struct vkms_writeback_job *active_wb,
		goto free_stage_buffer;
	}

	if (active_wb) {
		struct vkms_frame_info *wb_frame_info = &active_wb->wb_frame_info;

		wb_frame_info->src = primary_plane_info->src;
		wb_frame_info->dst = primary_plane_info->dst;
	}

	blend(active_wb, crtc_state, crc32, &stage_buffer,
	      &output_buffer, line_width * pixel_size);

diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index d2aabb52cb46..974db5defce4 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -122,6 +122,8 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
	struct drm_connector_state *conn_state = wb_conn->base.state;
	struct vkms_crtc_state *crtc_state = output->composer_state;
	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
	u16 crtc_height = crtc_state->base.crtc->mode.vdisplay;
	u16 crtc_width = crtc_state->base.crtc->mode.hdisplay;
	struct vkms_writeback_job *active_wb;
	struct vkms_frame_info *wb_frame_info;
	u32 wb_format = fb->format->format;
@@ -143,6 +145,8 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
	spin_unlock_irq(&output->composer_lock);
	drm_writeback_queue_job(wb_conn, connector_state);
	active_wb->wb_write = get_line_to_frame_function(wb_format);
	drm_rect_init(&wb_frame_info->src, 0, 0, crtc_width, crtc_height);
	drm_rect_init(&wb_frame_info->dst, 0, 0, crtc_width, crtc_height);
}

static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
-- 
2.30.2

[RESEND v6 8/9] drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats

Details
Message ID
<20220819182411.20246-9-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com> (view parent)
Patch: +83 -1
This will be useful for writing tests that depend on these formats.

ARGB and XRGB follow an implementation similar to that of the former
formats, just adjusted for 16 bits per channel.
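
The 16-bit handlers read and write each channel as a little-endian value.
The userspace sketch below shows the same idea with explicit byte access
instead of `le16_to_cpu`/`cpu_to_le16`, so it behaves the same on any host
endianness. All helper names are hypothetical and the
`struct pixel_argb_u16` layout is assumed.

```c
#include <stdint.h>

/* Assumed layout of the internal pixel format. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Read a little-endian 16-bit value from two bytes. */
static uint16_t read_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Write a 16-bit value as two little-endian bytes. */
static void write_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* One pixel is 8 bytes, stored in memory as B, G, R, A. */
static struct pixel_argb_u16 argb16161616_read(const uint8_t *px)
{
	struct pixel_argb_u16 out = {
		.b = read_le16(px + 0),
		.g = read_le16(px + 2),
		.r = read_le16(px + 4),
		.a = read_le16(px + 6),
	};

	return out;
}

/* The X channel is ignored and the pixel is treated as opaque. */
static struct pixel_argb_u16 xrgb16161616_read(const uint8_t *px)
{
	struct pixel_argb_u16 out = argb16161616_read(px);

	out.a = 0xffff;
	return out;
}

static void argb16161616_write(uint8_t *px, struct pixel_argb_u16 in)
{
	write_le16(px + 0, in.b);
	write_le16(px + 2, in.g);
	write_le16(px + 4, in.r);
	write_le16(px + 6, in.a);
}
```

No scaling is needed here, unlike the 8-bit formats, because the source
channels already have the same width as the internal format.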

V3: Adapt the handlers to the new format introduced in patch 7 V3.
V5: Minor improvements.
    Added le16_to_cpu/cpu_to_le16 to the 16-bit color reads/writes.

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_formats.c   | 77 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_plane.c     |  5 +-
 drivers/gpu/drm/vkms/vkms_writeback.c |  2 +
 3 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index ca4bfcac686b..8b651ffcc743 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -78,6 +78,41 @@ static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
	}
}

static void ARGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
				     const struct vkms_frame_info *frame_info,
				     int y)
{
	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
	u16 *src_pixels = get_packed_src_addr(frame_info, y);
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    stage_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
		out_pixels[x].a = le16_to_cpu(src_pixels[3]);
		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
	}
}

static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
				     const struct vkms_frame_info *frame_info,
				     int y)
{
	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
	u16 *src_pixels = get_packed_src_addr(frame_info, y);
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    stage_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
		out_pixels[x].a = (u16)0xffff;
		out_pixels[x].r = le16_to_cpu(src_pixels[2]);
		out_pixels[x].g = le16_to_cpu(src_pixels[1]);
		out_pixels[x].b = le16_to_cpu(src_pixels[0]);
	}
}


/*
 * The following  functions take an line of argb_u16 pixels from the
 * src_buffer, convert them to a specific format, and store them in the
@@ -130,6 +165,40 @@ static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
	}
}

static void argb_u16_to_ARGB16161616(struct vkms_frame_info *frame_info,
				     const struct line_buffer *src_buffer, int y)
{
	int x_dst = frame_info->dst.x1;
	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    src_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
		dst_pixels[3] = cpu_to_le16(in_pixels[x].a);
		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
	}
}

static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
				     const struct line_buffer *src_buffer, int y)
{
	int x_dst = frame_info->dst.x1;
	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    src_buffer->n_pixels);

	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
		dst_pixels[3] = 0xffff;
		dst_pixels[2] = cpu_to_le16(in_pixels[x].r);
		dst_pixels[1] = cpu_to_le16(in_pixels[x].g);
		dst_pixels[0] = cpu_to_le16(in_pixels[x].b);
	}
}

frame_to_line_func get_frame_to_line_function(u32 format)
{
	switch (format) {
@@ -137,6 +206,10 @@ frame_to_line_func get_frame_to_line_function(u32 format)
		return &ARGB8888_to_argb_u16;
	case DRM_FORMAT_XRGB8888:
		return &XRGB8888_to_argb_u16;
	case DRM_FORMAT_ARGB16161616:
		return &ARGB16161616_to_argb_u16;
	case DRM_FORMAT_XRGB16161616:
		return &XRGB16161616_to_argb_u16;
	default:
		return NULL;
	}
@@ -149,6 +222,10 @@ line_to_frame_func get_line_to_frame_function(u32 format)
		return &argb_u16_to_ARGB8888;
	case DRM_FORMAT_XRGB8888:
		return &argb_u16_to_XRGB8888;
	case DRM_FORMAT_ARGB16161616:
		return &argb_u16_to_ARGB16161616;
	case DRM_FORMAT_XRGB16161616:
		return &argb_u16_to_XRGB16161616;
	default:
		return NULL;
	}
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 7a479a714565..0e33e3471d40 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -14,11 +14,14 @@

 static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616
 };

 static const u32 vkms_plane_formats[] = {
 	DRM_FORMAT_ARGB8888,
-	DRM_FORMAT_XRGB8888
+	DRM_FORMAT_XRGB8888,
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_ARGB16161616
 };

static struct drm_plane_state *
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index 974db5defce4..c417f94be2a2 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -15,6 +15,8 @@

static const u32 vkms_wb_formats[] = {
	DRM_FORMAT_XRGB8888,
	DRM_FORMAT_XRGB16161616,
	DRM_FORMAT_ARGB16161616
};

static const struct drm_connector_funcs vkms_wb_connector_funcs = {
-- 
2.30.2
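
The 16-bit handlers in patch 8/9 read and write each channel as a little-endian 16-bit word, with blue at the lowest address and alpha at the highest. The userspace sketch below illustrates that layout only. The helpers `load_le16`, `store_le16`, `decode_argb16161616` and `encode_argb16161616` are hypothetical stand-ins for the kernel's `le16_to_cpu`/`cpu_to_le16` based code, and the field order of `struct pixel_argb_u16` is assumed.

```c
#include <stdint.h>

/* Assumed layout of the internal pixel; the driver's definition may differ. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Portable little-endian 16-bit load, independent of host endianness. */
static uint16_t load_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Portable little-endian 16-bit store. */
static void store_le16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v & 0xff);
	p[1] = (uint8_t)(v >> 8);
}

/* One ARGB16161616 pixel is 8 bytes: B, G, R, A, each 16 bits wide. */
static struct pixel_argb_u16 decode_argb16161616(const uint8_t *src)
{
	struct pixel_argb_u16 px;

	px.b = load_le16(src + 0);
	px.g = load_le16(src + 2);
	px.r = load_le16(src + 4);
	px.a = load_le16(src + 6);
	return px;
}

static void encode_argb16161616(uint8_t *dst, struct pixel_argb_u16 px)
{
	store_le16(dst + 0, px.b);
	store_le16(dst + 2, px.g);
	store_le16(dst + 4, px.r);
	store_le16(dst + 6, px.a);
}
```

For XRGB16161616 the same sketch applies, except that alpha is forced to 0xffff on both read and write instead of being taken from the buffer.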

[RESEND v6 9/9] drm: vkms: Add support to the RGB565 format

Details
Message ID
<20220819182411.20246-10-igormtorrente@gmail.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
Patch: +76 -3
This commit also adds new helper macros to deal with fixed-point
arithmetic.

It was done to improve the precision of the conversion to ARGB16161616
since the "conversion ratio" is not an integer.

V3: Adapt the handlers to the new format introduced in patch 7 V3.
V5: Minor improvements
V6: Minor improvements (Pekka Paalanen)

Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
---
 drivers/gpu/drm/vkms/vkms_formats.c   | 70 +++++++++++++++++++++++++++
 drivers/gpu/drm/vkms/vkms_plane.c     |  6 ++-
 drivers/gpu/drm/vkms/vkms_writeback.c |  3 +-
 3 files changed, 76 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
index 8b651ffcc743..3f6a3cdb81e5 100644
--- a/drivers/gpu/drm/vkms/vkms_formats.c
+++ b/drivers/gpu/drm/vkms/vkms_formats.c
@@ -5,6 +5,23 @@

#include "vkms_formats.h"

/* The following macros help doing fixed point arithmetic. */
/*
 * With Fixed-Point scale 15 we have 17 and 15 bits of integer and fractional
 * parts respectively.
 *  | 0000 0000 0000 0000 0.000 0000 0000 0000 |
 * 31                                          0
 */
#define SHIFT 15

#define INT_TO_FIXED(a) ((a) << SHIFT)
#define FIXED_MUL(a, b) ((s32)(((s64)(a) * (b)) >> SHIFT))
#define FIXED_DIV(a, b) ((s32)(((s64)(a) << SHIFT) / (b)))
/* This macro converts a fixed-point number to an integer, rounding half up. */
#define FIXED_TO_INT_ROUND(a) (((a) + (1 << (SHIFT - 1))) >> SHIFT)
#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))

static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
{
	return frame_info->offset + (y * frame_info->pitch)
@@ -112,6 +129,30 @@ static void XRGB16161616_to_argb_u16(struct line_buffer *stage_buffer,
	}
}

static void RGB565_to_argb_u16(struct line_buffer *stage_buffer,
			       const struct vkms_frame_info *frame_info, int y)
{
	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
	u16 *src_pixels = get_packed_src_addr(frame_info, y);
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			       stage_buffer->n_pixels);

	s32 fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
	s32 fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);

	for (size_t x = 0; x < x_limit; x++, src_pixels++) {
		u16 rgb_565 = le16_to_cpu(*src_pixels);
		s32 fp_r = INT_TO_FIXED((rgb_565 >> 11) & 0x1f);
		s32 fp_g = INT_TO_FIXED((rgb_565 >> 5) & 0x3f);
		s32 fp_b = INT_TO_FIXED(rgb_565 & 0x1f);

		out_pixels[x].a = (u16)0xffff;
		out_pixels[x].r = FIXED_TO_INT_ROUND(FIXED_MUL(fp_r, fp_rb_ratio));
		out_pixels[x].g = FIXED_TO_INT_ROUND(FIXED_MUL(fp_g, fp_g_ratio));
		out_pixels[x].b = FIXED_TO_INT_ROUND(FIXED_MUL(fp_b, fp_rb_ratio));
	}
}


/*
 * The following  functions take an line of argb_u16 pixels from the
@@ -199,6 +240,31 @@ static void argb_u16_to_XRGB16161616(struct vkms_frame_info *frame_info,
	}
}

static void argb_u16_to_RGB565(struct vkms_frame_info *frame_info,
			       const struct line_buffer *src_buffer, int y)
{
	int x_dst = frame_info->dst.x1;
	u16 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
			    src_buffer->n_pixels);

	s32 fp_rb_ratio = INT_TO_FIXED_DIV(65535, 31);
	s32 fp_g_ratio = INT_TO_FIXED_DIV(65535, 63);

	for (size_t x = 0; x < x_limit; x++, dst_pixels++) {
		s32 fp_r = INT_TO_FIXED(in_pixels[x].r);
		s32 fp_g = INT_TO_FIXED(in_pixels[x].g);
		s32 fp_b = INT_TO_FIXED(in_pixels[x].b);

		u16 r = FIXED_TO_INT_ROUND(FIXED_DIV(fp_r, fp_rb_ratio));
		u16 g = FIXED_TO_INT_ROUND(FIXED_DIV(fp_g, fp_g_ratio));
		u16 b = FIXED_TO_INT_ROUND(FIXED_DIV(fp_b, fp_rb_ratio));

		*dst_pixels = cpu_to_le16(r << 11 | g << 5 | b);
	}
}

frame_to_line_func get_frame_to_line_function(u32 format)
{
	switch (format) {
@@ -210,6 +276,8 @@ frame_to_line_func get_frame_to_line_function(u32 format)
		return &ARGB16161616_to_argb_u16;
	case DRM_FORMAT_XRGB16161616:
		return &XRGB16161616_to_argb_u16;
	case DRM_FORMAT_RGB565:
		return &RGB565_to_argb_u16;
	default:
		return NULL;
	}
@@ -226,6 +294,8 @@ line_to_frame_func get_line_to_frame_function(u32 format)
		return &argb_u16_to_ARGB16161616;
	case DRM_FORMAT_XRGB16161616:
		return &argb_u16_to_XRGB16161616;
	case DRM_FORMAT_RGB565:
		return &argb_u16_to_RGB565;
	default:
		return NULL;
	}
diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
index 0e33e3471d40..53646ccf141b 100644
--- a/drivers/gpu/drm/vkms/vkms_plane.c
+++ b/drivers/gpu/drm/vkms/vkms_plane.c
@@ -14,14 +14,16 @@

 static const u32 vkms_formats[] = {
 	DRM_FORMAT_XRGB8888,
-	DRM_FORMAT_XRGB16161616
+	DRM_FORMAT_XRGB16161616,
+	DRM_FORMAT_RGB565
 };

 static const u32 vkms_plane_formats[] = {
 	DRM_FORMAT_ARGB8888,
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_XRGB16161616,
-	DRM_FORMAT_ARGB16161616
+	DRM_FORMAT_ARGB16161616,
+	DRM_FORMAT_RGB565
 };

static struct drm_plane_state *
diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
index c417f94be2a2..c6e4f3d7aa0d 100644
--- a/drivers/gpu/drm/vkms/vkms_writeback.c
+++ b/drivers/gpu/drm/vkms/vkms_writeback.c
@@ -16,7 +16,8 @@
 static const u32 vkms_wb_formats[] = {
 	DRM_FORMAT_XRGB8888,
 	DRM_FORMAT_XRGB16161616,
-	DRM_FORMAT_ARGB16161616
+	DRM_FORMAT_ARGB16161616,
+	DRM_FORMAT_RGB565
 };

static const struct drm_connector_funcs vkms_wb_connector_funcs = {
-- 
2.30.2
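
The fixed-point arithmetic in patch 9/9 can be tried outside the kernel. The sketch below copies the macros with userspace integer types and wraps them in hypothetical helpers (`widen_channel`, `narrow_channel`, `pack_rgb565`) that are not part of the patch. It assumes channel maxima of 31 (red, blue) and 63 (green), as used by `RGB565_to_argb_u16` and `argb_u16_to_RGB565`.

```c
#include <stdint.h>

/* Fixed point with 15 fractional bits, mirroring the macros in the patch. */
#define SHIFT 15
#define INT_TO_FIXED(a)        ((int32_t)(a) << SHIFT)
#define FIXED_MUL(a, b)        ((int32_t)(((int64_t)(a) * (b)) >> SHIFT))
#define FIXED_DIV(a, b)        ((int32_t)(((int64_t)(a) << SHIFT) / (b)))
#define FIXED_TO_INT_ROUND(a)  (((a) + (1 << (SHIFT - 1))) >> SHIFT)
#define INT_TO_FIXED_DIV(a, b) (FIXED_DIV(INT_TO_FIXED(a), INT_TO_FIXED(b)))

/* Hypothetical helper: scale a 5- or 6-bit channel (max 31 or 63) to 16 bits. */
static uint16_t widen_channel(uint16_t value, int32_t max)
{
	int32_t ratio = INT_TO_FIXED_DIV(65535, max);

	return (uint16_t)FIXED_TO_INT_ROUND(FIXED_MUL(INT_TO_FIXED(value), ratio));
}

/* Hypothetical helper: scale a 16-bit channel back down to max 31 or 63. */
static uint16_t narrow_channel(uint16_t value, int32_t max)
{
	int32_t ratio = INT_TO_FIXED_DIV(65535, max);

	return (uint16_t)FIXED_TO_INT_ROUND(FIXED_DIV(INT_TO_FIXED(value), ratio));
}

/* Pack three 16-bit channels into one RGB565 word (host byte order). */
static uint16_t pack_rgb565(uint16_t r16, uint16_t g16, uint16_t b16)
{
	uint16_t r = narrow_channel(r16, 31);
	uint16_t g = narrow_channel(g16, 63);
	uint16_t b = narrow_channel(b16, 31);

	return (uint16_t)(r << 11 | g << 5 | b);
}
```

The ratio 65535 / 31 is about 2114.03, so a plain integer multiply by 2114 would map 31 to 65534 rather than 65535. Keeping 15 fractional bits and rounding half up lets full scale map to full scale in both directions.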

Re: [RESEND v6 6/9] drm: vkms: Refactor the plane composer to accept new formats

Melissa Wen <mwen@igalia.com>
Details
Message ID
<20220820105046.cittsquvjvenw54z@mail.igalia.com>
In-Reply-To
<20220819182411.20246-7-igormtorrente@gmail.com>
On 08/19, Igor Torrente wrote:
> Currently the blend function only accepts XRGB_8888 and ARGB_8888
> as a color input.
> 
> This patch refactors all the functions related to the plane composition
> to overcome this limitation.
> 
> The pixels blend is done using the new internal format. And new handlers
> are being added to convert a specific format to/from this internal format.
> 
> So the blend operation depends on these handlers to convert to this common
> format. The blended result, if necessary, is converted to the writeback
> buffer format.
> 
> This patch introduces three major differences to the blend function:
> 1 - All the planes are blended at once.
> 2 - The blend calculation is done per line instead of per pixel.
> 3 - It is responsible for calculating the CRC and writing the writeback
> buffer (if necessary).
> 
> These changes allow us to allocate far less memory for the intermediate
> buffer used in these operations, because we no longer need to hold all
> the lines of the intermediate image at once; a single line is enough.
> 
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Height   |     2 * Width     |
> 
> Beyond memory, we also have a minor performance benefit from all
> these changes. Results running the IGT[1] test
> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
> 
> |                 Frametime                  |
> |:------------------------------------------:|
> |  Implementation |  Current  |  This commit |
> |:---------------:|:---------:|:------------:|
> | frametime range |  9~22 ms  |    5~17 ms   |
> |     Average     |  11.4 ms  |    7.8 ms    |
> 
> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
> 
> V2: Improves the performance drastically, by performing the operations
>     per-line and not per-pixel(Pekka Paalanen).
>     Minor improvements(Pekka Paalanen).
> V3: Changes the code to blend the planes all at once. This improves
>     performance, memory consumption, and removes much of the weirdness
>     of the V2(Pekka Paalanen and me).
>     Minor improvements(Pekka Paalanen and me).
> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
>     Several security/robustness improvements(Pekka Paalanen).
>     Removes check_planes_x_bounds function and allows planes to be
>     partly off-screen(Pekka Paalanen).
> V6: Fix a mismatch of some variable sizes (Pekka Paalanen).
>     Several minor improvements (Pekka Paalanen).
> 
> Reported-by: kernel test robot <lkp@intel.com>
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  Documentation/gpu/vkms.rst            |   4 -
>  drivers/gpu/drm/vkms/Makefile         |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c  | 320 ++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_formats.c   | 155 +++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>  drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
>  drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
>  7 files changed, 317 insertions(+), 181 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> 
> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
> index 973e2d43108b..a49e4ae92653 100644
> --- a/Documentation/gpu/vkms.rst
> +++ b/Documentation/gpu/vkms.rst
> @@ -118,10 +118,6 @@ Add Plane Features
>  
>  There's lots of plane features we could add support for:
>  
> -- Clearing primary plane: clear primary plane before plane composition (at the
> -  start) for correctness of pixel blend ops. It also guarantees alpha channel
> -  is cleared in the target buffer for stable crc. [Good to get started]
> -
>  - ARGB format on primary plane: blend the primary plane into background with
>    translucent alpha.
>  
> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> index 72f779cbfedd..1b28a6a32948 100644
> --- a/drivers/gpu/drm/vkms/Makefile
> +++ b/drivers/gpu/drm/vkms/Makefile
> @@ -3,6 +3,7 @@ vkms-y := \
>  	vkms_drv.o \
>  	vkms_plane.o \
>  	vkms_output.o \
> +	vkms_formats.o \
>  	vkms_crtc.o \
>  	vkms_composer.o \
>  	vkms_writeback.o
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index b9fb408e8973..5b1a8bdd8268 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -7,204 +7,188 @@
>  #include <drm/drm_fourcc.h>
>  #include <drm/drm_gem_framebuffer_helper.h>
>  #include <drm/drm_vblank.h>
> +#include <linux/minmax.h>
>  
>  #include "vkms_drv.h"
>  
> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> -				 const struct vkms_frame_info *frame_info)
> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>  {
> -	u32 pixel;
> -	int src_offset = frame_info->offset + (y * frame_info->pitch)
> -					    + (x * frame_info->cpp);
> +	u32 new_color;
>  
> -	pixel = *(u32 *)&buffer[src_offset];
> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>  
> -	return pixel;
> +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
>  }
>  
>  /**
> - * compute_crc - Compute CRC value on output frame
> + * pre_mul_alpha_blend - alpha blending equation
> + * @src_frame_info: source framebuffer's metadata
> + * @stage_buffer: The line with the pixels from src_plane
> + * @output_buffer: A line buffer that receives all the blends output
>   *
> - * @vaddr: address to final framebuffer
> - * @frame_info: framebuffer's metadata
> + * Using the information from the `frame_info`, this blends only the
> + * necessary pixels from the `stage_buffer` to the `output_buffer`
> + * using premultiplied blend formula.
>   *
> - * returns CRC value computed using crc32 on the visible portion of
> - * the final framebuffer at vaddr_out
> + * The current DRM assumption is that pixel color values have been already
> + * pre-multiplied with the alpha channel values. See more
> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> + * completely opaque background.
>   */
> -static uint32_t compute_crc(const u8 *vaddr,
> -			    const struct vkms_frame_info *frame_info)
> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> +				struct line_buffer *stage_buffer,
> +				struct line_buffer *output_buffer)
>  {
> -	int x, y;
> -	u32 crc = 0, pixel = 0;
> -	int x_src = frame_info->src.x1 >> 16;
> -	int y_src = frame_info->src.y1 >> 16;
> -	int h_src = drm_rect_height(&frame_info->src) >> 16;
> -	int w_src = drm_rect_width(&frame_info->src) >> 16;
> -
> -	for (y = y_src; y < y_src + h_src; ++y) {
> -		for (x = x_src; x < x_src + w_src; ++x) {
> -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
> -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
> -		}
> +	int x_dst = frame_info->dst.x1;
> +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> +	struct pixel_argb_u16 *in = stage_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    stage_buffer->n_pixels);
> +
> +	for (int x = 0; x < x_limit; x++) {
> +		out[x].a = (u16)0xffff;
> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>  	}
> -
> -	return crc;
>  }
>  
> -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>  {
> -	u32 pre_blend;
> -	u8 new_color;
> -
> -	pre_blend = (src * 255 + dst * (255 - alpha));
> -
> -	/* Faster div by 255 */
> -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
> +		return true;
>  
> -	return new_color;
> +	return false;
>  }
>  
>  /**
> - * alpha_blend - alpha blending equation
> - * @argb_src: src pixel on premultiplied alpha mode
> - * @argb_dst: dst pixel completely opaque
> + * @wb_frame_info: The writeback frame buffer metadata
> + * @crtc_state: The crtc state
> + * @crc32: The crc output of the final frame
> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
> + * @stage_buffer: The line with the pixels from plane being blend to the output
>   *
> - * blend pixels using premultiplied blend formula. The current DRM assumption
> - * is that pixel color values have been already pre-multiplied with the alpha
> - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
> - * formula assumes a completely opaque background.
> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
> + * from all planes, calculates the crc32 of the output from the former step,
> + * and, if necessary, convert and store the output to the writeback buffer.
>   */
> -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
> +static void blend(struct vkms_writeback_job *wb,
> +		  struct vkms_crtc_state *crtc_state,
> +		  u32 *crc32, struct line_buffer *stage_buffer,
> +		  struct line_buffer *output_buffer, size_t row_size)
>  {
> -	u8 alpha;
> +	struct vkms_plane_state **plane = crtc_state->active_planes;
> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
> +	u32 n_active_planes = crtc_state->num_active_planes;
> +
> +	int y_dst = primary_plane_info->dst.y1;
> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
> +	int y_limit = y_dst + h_dst;
> +
> +	for (size_t y = y_dst; y < y_limit; y++) {
> +		plane[0]->plane_read(output_buffer, primary_plane_info, y);
> +
> +		/* If there are other planes besides primary, we consider the active
> +		 * planes should be in z-order and compose them associatively:
> +		 * ((primary <- overlay) <- cursor)
> +		 */
> +		for (size_t i = 1; i < n_active_planes; i++) {
> +			if (!check_y_limit(plane[i]->frame_info, y))
> +				continue;
> +
> +			plane[i]->plane_read(stage_buffer, plane[i]->frame_info, y);
> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> +					    output_buffer);
> +		}
> +
> +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
>  
> -	alpha = argb_src[3];
> -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
> -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
> -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
> +		if (wb)
> +			wb->wb_write(&wb->wb_frame_info, output_buffer, y);
> +	}
>  }
>  
> -/**
> - * x_blend - blending equation that ignores the pixel alpha
> - *
> - * overwrites RGB color value from src pixel to dst pixel.
> - */
> -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
> +			      struct vkms_writeback_job *active_wb)
>  {
> -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
> +	struct vkms_plane_state **planes = crtc_state->active_planes;
> +	u32 n_active_planes = crtc_state->num_active_planes;
> +
> +	for (size_t i = 0; i < n_active_planes; i++)
> +		if (!planes[i]->plane_read)
> +			return -1;
> +
> +	if (active_wb && !active_wb->wb_write)
> +		return -1;
> +
> +	return 0;
>  }
>  
> -/**
> - * blend - blend value at vaddr_src with value at vaddr_dst
> - * @vaddr_dst: destination address
> - * @vaddr_src: source address
> - * @dst_frame_info: destination framebuffer's metadata
> - * @src_frame_info: source framebuffer's metadata
> - * @pixel_blend: blending equation based on plane format
> - *
> - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
> - * and clearing alpha channel to an completely opaque background. This function
> - * uses buffer's metadata to locate the new composite values at vaddr_dst.
> - *
> - * TODO: completely clear the primary plane (a = 0xff) before starting to blend
> - * pixel color values
> - */
> -static void blend(void *vaddr_dst, void *vaddr_src,
> -		  struct vkms_frame_info *dst_frame_info,
> -		  struct vkms_frame_info *src_frame_info,
> -		  void (*pixel_blend)(const u8 *, u8 *))
> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
> +				 struct vkms_crtc_state *crtc_state,
> +				 u32 *crc32)
>  {
> -	int i, j, j_dst, i_dst;
> -	int offset_src, offset_dst;
> -	u8 *pixel_dst, *pixel_src;
> -
> -	int x_src = src_frame_info->src.x1 >> 16;
> -	int y_src = src_frame_info->src.y1 >> 16;
> -
> -	int x_dst = src_frame_info->dst.x1;
> -	int y_dst = src_frame_info->dst.y1;
> -	int h_dst = drm_rect_height(&src_frame_info->dst);
> -	int w_dst = drm_rect_width(&src_frame_info->dst);
> -
> -	int y_limit = y_src + h_dst;
> -	int x_limit = x_src + w_dst;
> -
> -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
> -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> -			offset_dst = dst_frame_info->offset
> -				     + (i_dst * dst_frame_info->pitch)
> -				     + (j_dst++ * dst_frame_info->cpp);
> -			offset_src = src_frame_info->offset
> -				     + (i * src_frame_info->pitch)
> -				     + (j * src_frame_info->cpp);
> -
> -			pixel_src = (u8 *)(vaddr_src + offset_src);
> -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> -			pixel_blend(pixel_src, pixel_dst);
> -			/* clearing alpha channel (0xff)*/
> -			pixel_dst[3] = 0xff;
> -		}
> -		i_dst++;
> +	size_t line_width, pixel_size = sizeof(struct pixel_argb_u16);
> +	struct vkms_frame_info *primary_plane_info = NULL;
> +	struct line_buffer output_buffer, stage_buffer;
> +	struct vkms_plane_state *act_plane = NULL;
> +	int ret = 0;
> +
> +	/*
> +	 * This check exists so we can call `crc32_le` for the entire line
> +	 * instead of doing it for each channel of each pixel in case
> +	 * `struct pixel_argb_u16` had any gap added by the compiler
> +	 * between the struct fields.
> +	 */
> +	static_assert(sizeof(struct pixel_argb_u16) == 8);
> +
> +	if (crtc_state->num_active_planes >= 1) {
> +		act_plane = crtc_state->active_planes[0];
> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> +			primary_plane_info = act_plane->frame_info;
>  	}
> -}
>  
> -static void compose_plane(struct vkms_frame_info *primary_plane_info,
> -			  struct vkms_frame_info *plane_frame_info,
> -			  void *vaddr_out)
> -{
> -	struct drm_framebuffer *fb = plane_frame_info->fb;
> -	void *vaddr;
> -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> +	if (!primary_plane_info)
> +		return -EINVAL;
>  
>  	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> -		return;
> +		return -EINVAL;
>  
> -	vaddr = plane_frame_info->map[0].vaddr;
> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
> +		return -EINVAL;
>  
> -	if (fb->format->format == DRM_FORMAT_ARGB8888)
> -		pixel_blend = &alpha_blend;
> -	else
> -		pixel_blend = &x_blend;
> +	line_width = drm_rect_width(&primary_plane_info->dst);
> +	stage_buffer.n_pixels = line_width;
> +	output_buffer.n_pixels = line_width;
>  
> -	blend(vaddr_out, vaddr, primary_plane_info,
> -	      plane_frame_info, pixel_blend);
> -}
> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!stage_buffer.pixels) {
> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> +		return -ENOMEM;
> +	}
>  
> -static int compose_active_planes(void **vaddr_out,
> -				 struct vkms_frame_info *primary_plane_info,
> -				 struct vkms_crtc_state *crtc_state)
> -{
> -	struct drm_framebuffer *fb = primary_plane_info->fb;
> -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
> -	const void *vaddr;
> -	int i;
> -
> -	if (!*vaddr_out) {
> -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
> -		if (!*vaddr_out) {
> -			DRM_ERROR("Cannot allocate memory for output frame.");
> -			return -ENOMEM;
> -		}
> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> +	if (!output_buffer.pixels) {
> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> +		ret = -ENOMEM;
> +		goto free_stage_buffer;
>  	}
>  
> -	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> -		return -EINVAL;
> +	if (active_wb) {
> +		struct vkms_frame_info *wb_frame_info = &active_wb->wb_frame_info;
>  
> -	vaddr = primary_plane_info->map[0].vaddr;
> +		wb_frame_info->src = primary_plane_info->src;
> +		wb_frame_info->dst = primary_plane_info->dst;
> +	}
>  
> -	memcpy(*vaddr_out, vaddr, gem_obj->size);
> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
> +	      &output_buffer, line_width * pixel_size);
>  
> -	/* If there are other planes besides primary, we consider the active
> -	 * planes should be in z-order and compose them associatively:
> -	 * ((primary <- overlay) <- cursor)
> -	 */
> -	for (i = 1; i < crtc_state->num_active_planes; i++)
> -		compose_plane(primary_plane_info,
> -			      crtc_state->active_planes[i]->frame_info,
> -			      *vaddr_out);
> +	kvfree(output_buffer.pixels);
> +free_stage_buffer:
> +	kvfree(stage_buffer.pixels);
>  
> -	return 0;
> +	return ret;
>  }
>  
>  /**
> @@ -222,13 +206,11 @@ void vkms_composer_worker(struct work_struct *work)
>  						struct vkms_crtc_state,
>  						composer_work);
>  	struct drm_crtc *crtc = crtc_state->base.crtc;
> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>  	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> -	struct vkms_frame_info *primary_plane_info = NULL;
> -	struct vkms_plane_state *act_plane = NULL;
>  	bool crc_pending, wb_pending;
> -	void *vaddr_out = NULL;
> -	u32 crc32 = 0;
>  	u64 frame_start, frame_end;
> +	u32 crc32 = 0;
>  	int ret;
>  
>  	spin_lock_irq(&out->composer_lock);
> @@ -248,35 +230,19 @@ void vkms_composer_worker(struct work_struct *work)
>  	if (!crc_pending)
>  		return;
>  
> -	if (crtc_state->num_active_planes >= 1) {
> -		act_plane = crtc_state->active_planes[0];
> -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> -			primary_plane_info = act_plane->frame_info;
> -	}
> -
> -	if (!primary_plane_info)
> -		return;
> -
>  	if (wb_pending)
> -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
> +	else
> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>  
> -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
> -				    crtc_state);
> -	if (ret) {
> -		if (ret == -EINVAL && !wb_pending)
> -			kvfree(vaddr_out);
> +	if (ret)
>  		return;
> -	}
> -
> -	crc32 = compute_crc(vaddr_out, primary_plane_info);
>  
>  	if (wb_pending) {
>  		drm_writeback_signal_completion(&out->wb_connector, 0);
>  		spin_lock_irq(&out->composer_lock);
>  		crtc_state->wb_pending = false;
>  		spin_unlock_irq(&out->composer_lock);
> -	} else {
> -		kvfree(vaddr_out);
>  	}
>  
>  	/*
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> new file mode 100644
> index 000000000000..ca4bfcac686b
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> @@ -0,0 +1,155 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#include <drm/drm_rect.h>
> +#include <linux/minmax.h>
> +
> +#include "vkms_formats.h"
> +
> +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> +{
> +	return frame_info->offset + (y * frame_info->pitch)
> +				  + (x * frame_info->cpp);
> +}
> +
> +/*
> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> + *
> + * @frame_info: Buffer metadata
> + * @x: The x(width) coordinate of the 2D buffer
> + * @y: The y(Height) coordinate of the 2D buffer
> + *
> + * Takes the information stored in the frame_info, a pair of coordinates, and
> + * returns the address of the first color channel.
> + * This function assumes the channels are packed together, i.e. a color channel
> + * comes immediately after another in the memory. And therefore, this function
> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> + */
> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> +				int x, int y)
> +{
> +	size_t offset = pixel_offset(frame_info, x, y);
> +
> +	return (u8 *)frame_info->map[0].vaddr + offset;
> +}
> +
> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> +{
> +	int x_src = frame_info->src.x1 >> 16;
> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> +
> +	return packed_pixels_addr(frame_info, x_src, y_src);
> +}
> +
> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> +				 const struct vkms_frame_info *frame_info, int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    stage_buffer->n_pixels);
> +
> +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
> +		/*
> +		 * The 257 is the "conversion ratio". This number is obtained by the
> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> +		 * the best color value in a pixel format with more possibilities.
> +		 * A similar idea applies to others RGB color conversions.
> +		 */
> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> +				 const struct vkms_frame_info *frame_info, int y)
> +{
> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    stage_buffer->n_pixels);
> +
> +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
> +		out_pixels[x].a = (u16)0xffff;
> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> +	}
> +}
> +
> +/*
> + * The following  functions take an line of argb_u16 pixels from the
> + * src_buffer, convert them to a specific format, and store them in the
> + * destination.
> + *
> + * They are used in the `compose_active_planes` to convert and store a line
> + * from the src_buffer to the writeback buffer.
> + */
> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
> +				 const struct line_buffer *src_buffer, int y)
> +{
> +	int x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		/*
> +		 * This sequence below is important because the format's byte order is
> +		 * in little-endian. In the case of the ARGB8888 the memory is
> +		 * organized this way:
> +		 *
> +		 * | Addr     | = blue channel
> +		 * | Addr + 1 | = green channel
> +		 * | Addr + 2 | = Red channel
> +		 * | Addr + 3 | = Alpha channel
> +		 */
> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> +	}
> +}
> +
> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> +				 const struct line_buffer *src_buffer, int y)
> +{
> +	int x_dst = frame_info->dst.x1;
> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> +			    src_buffer->n_pixels);
> +
> +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
> +		dst_pixels[3] = 0xff;
> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> +	}
> +}
> +
> +frame_to_line_func get_frame_to_line_function(u32 format)
> +{
> +	switch (format) {
> +	case DRM_FORMAT_ARGB8888:
> +		return &ARGB8888_to_argb_u16;
> +	case DRM_FORMAT_XRGB8888:
> +		return &XRGB8888_to_argb_u16;
> +	default:
> +		return NULL;
> +	}
> +}
> +
> +line_to_frame_func get_line_to_frame_function(u32 format)
> +{
> +	switch (format) {
> +	case DRM_FORMAT_ARGB8888:
> +		return &argb_u16_to_ARGB8888;
> +	case DRM_FORMAT_XRGB8888:
> +		return &argb_u16_to_XRGB8888;
> +	default:
> +		return NULL;
> +	}
> +}
> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> new file mode 100644
> index 000000000000..053ca42d5b31
> --- /dev/null
> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +
> +#ifndef _VKMS_FORMATS_H_
> +#define _VKMS_FORMATS_H_
> +
> +#include "vkms_drv.h"
> +
> +frame_to_line_func get_frame_to_line_function(u32 format);
> +
> +line_to_frame_func get_line_to_frame_function(u32 format);
> +
> +#endif /* _VKMS_FORMATS_H_ */
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index 8adbfdc05e50..7a479a714565 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -10,6 +10,7 @@
>  #include <drm/drm_plane_helper.h>
>  
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
^ this line no longer applies (it needs a rebase), but I can handle that before applying to drm-misc-next
>  
>  static const u32 vkms_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	struct drm_shadow_plane_state *shadow_plane_state;
>  	struct drm_framebuffer *fb = new_state->fb;
>  	struct vkms_frame_info *frame_info;
> +	u32 fmt = fb->format->format;
>  
>  	if (!new_state->crtc || !fb)
>  		return;
> @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	frame_info->offset = fb->offsets[0];
>  	frame_info->pitch = fb->pitches[0];
>  	frame_info->cpp = fb->format->cpp[0];
> +	vkms_plane_state->plane_read = get_frame_to_line_function(fmt);
>  }
>  
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index c87f6c89e7b4..d2aabb52cb46 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -11,6 +11,7 @@
>  #include <drm/drm_gem_shmem_helper.h>
>  
>  #include "vkms_drv.h"
> +#include "vkms_formats.h"
>  
>  static const u32 vkms_wb_formats[] = {
>  	DRM_FORMAT_XRGB8888,
> @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
>  	struct vkms_writeback_job *active_wb;
>  	struct vkms_frame_info *wb_frame_info;
> +	u32 wb_format = fb->format->format;
>  
>  	if (!conn_state)
>  		return;
> @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	crtc_state->wb_pending = true;
>  	spin_unlock_irq(&output->composer_lock);
>  	drm_writeback_queue_job(wb_conn, connector_state);
> +	active_wb->wb_write = get_line_to_frame_function(wb_format);
>  }
>  
>  static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
> -- 
> 2.30.2
> 
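For readers following the 257 "conversion ratio" comment in the quoted patch, here is a minimal userspace sketch of the 8-bit to 16-bit channel round trip. It is not the vkms code: `chan8_to_16`, `chan16_to_8`, `check_round_trip` and the `DIV_ROUND_CLOSEST_U` macro are hypothetical names for this illustration, and the macro is a simplified unsigned-only stand-in for the kernel's `DIV_ROUND_CLOSEST`.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the kernel's DIV_ROUND_CLOSEST(), unsigned only. */
#define DIV_ROUND_CLOSEST_U(x, d) (((x) + ((d) / 2)) / (d))

/* Widen an 8-bit channel to 16 bits: 257 = (2^16 - 1) / (2^8 - 1). */
static uint16_t chan8_to_16(uint8_t c)
{
	return (uint16_t)(c * 257u);
}

/* Narrow a 16-bit channel back to 8 bits, rounding to the closest value. */
static uint8_t chan16_to_8(uint16_t c)
{
	return (uint8_t)DIV_ROUND_CLOSEST_U((uint32_t)c, 257u);
}

/* Every 8-bit value should survive the 8 -> 16 -> 8 round trip unchanged. */
static void check_round_trip(void)
{
	for (unsigned int v = 0; v <= 0xff; v++)
		assert(chan16_to_8(chan8_to_16((uint8_t)v)) == v);
}
```

Multiplying by 257 replicates the byte into both halves of the 16-bit value (0x80 becomes 0x8080), so full scale maps to full scale and the rounded division recovers the original byte.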

Re: [RESEND v6 2/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`

Melissa Wen <mwen@igalia.com>
Details
Message ID
<20220820110007.wk5wugdfpya4eb7w@mail.igalia.com>
In-Reply-To
<20220819182411.20246-3-igormtorrente@gmail.com>
On 08/19, Igor Torrente wrote:
> Changes the name of this struct to a more meaningful name.
> A name that represents better what this struct is about.
> 
> Composer is the code that do the compositing of the planes.
> This struct contains information on the frame used in the output
> composition. Thus, vkms_frame_info is a better name to represent
> this.
> 
> V5: Fix a commit message typo(Melissa Wen).
> 
> Reviewed-by: Melissa Wen <mwen@igalia.com>
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
>  drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
>  3 files changed, 66 insertions(+), 65 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> index 775b97766e08..0aded4e87e60 100644
> --- a/drivers/gpu/drm/vkms/vkms_composer.c
> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> @@ -11,11 +11,11 @@
>  #include "vkms_drv.h"
>  
>  static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> -				 const struct vkms_composer *composer)
> +				 const struct vkms_frame_info *frame_info)
>  {
>  	u32 pixel;
> -	int src_offset = composer->offset + (y * composer->pitch)
> -				      + (x * composer->cpp);
> +	int src_offset = frame_info->offset + (y * frame_info->pitch)
> +					    + (x * frame_info->cpp);
>  
>  	pixel = *(u32 *)&buffer[src_offset];
>  
> @@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>   * compute_crc - Compute CRC value on output frame
>   *
>   * @vaddr: address to final framebuffer
> - * @composer: framebuffer's metadata
> + * @frame_info: framebuffer's metadata
>   *
>   * returns CRC value computed using crc32 on the visible portion of
>   * the final framebuffer at vaddr_out
>   */
>  static uint32_t compute_crc(const u8 *vaddr,
> -			    const struct vkms_composer *composer)
> +			    const struct vkms_frame_info *frame_info)
>  {
>  	int x, y;
>  	u32 crc = 0, pixel = 0;
> -	int x_src = composer->src.x1 >> 16;
> -	int y_src = composer->src.y1 >> 16;
> -	int h_src = drm_rect_height(&composer->src) >> 16;
> -	int w_src = drm_rect_width(&composer->src) >> 16;
> +	int x_src = frame_info->src.x1 >> 16;
> +	int y_src = frame_info->src.y1 >> 16;
> +	int h_src = drm_rect_height(&frame_info->src) >> 16;
> +	int w_src = drm_rect_width(&frame_info->src) >> 16;
>  
>  	for (y = y_src; y < y_src + h_src; ++y) {
>  		for (x = x_src; x < x_src + w_src; ++x) {
> -			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
> +			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>  			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>  		}
>  	}
> @@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>   * blend - blend value at vaddr_src with value at vaddr_dst
>   * @vaddr_dst: destination address
>   * @vaddr_src: source address
> - * @dst_composer: destination framebuffer's metadata
> - * @src_composer: source framebuffer's metadata
> + * @dst_frame_info: destination framebuffer's metadata
> + * @src_frame_info: source framebuffer's metadata
>   * @pixel_blend: blending equation based on plane format
>   *
>   * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> @@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>   * pixel color values
>   */
>  static void blend(void *vaddr_dst, void *vaddr_src,
> -		  struct vkms_composer *dst_composer,
> -		  struct vkms_composer *src_composer,
> +		  struct vkms_frame_info *dst_frame_info,
> +		  struct vkms_frame_info *src_frame_info,
>  		  void (*pixel_blend)(const u8 *, u8 *))
>  {
>  	int i, j, j_dst, i_dst;
>  	int offset_src, offset_dst;
>  	u8 *pixel_dst, *pixel_src;
>  
> -	int x_src = src_composer->src.x1 >> 16;
> -	int y_src = src_composer->src.y1 >> 16;
> +	int x_src = src_frame_info->src.x1 >> 16;
> +	int y_src = src_frame_info->src.y1 >> 16;
>  
> -	int x_dst = src_composer->dst.x1;
> -	int y_dst = src_composer->dst.y1;
> -	int h_dst = drm_rect_height(&src_composer->dst);
> -	int w_dst = drm_rect_width(&src_composer->dst);
> +	int x_dst = src_frame_info->dst.x1;
> +	int y_dst = src_frame_info->dst.y1;
> +	int h_dst = drm_rect_height(&src_frame_info->dst);
> +	int w_dst = drm_rect_width(&src_frame_info->dst);
>  
>  	int y_limit = y_src + h_dst;
>  	int x_limit = x_src + w_dst;
>  
>  	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>  		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> -			offset_dst = dst_composer->offset
> -				     + (i_dst * dst_composer->pitch)
> -				     + (j_dst++ * dst_composer->cpp);
> -			offset_src = src_composer->offset
> -				     + (i * src_composer->pitch)
> -				     + (j * src_composer->cpp);
> +			offset_dst = dst_frame_info->offset
> +				     + (i_dst * dst_frame_info->pitch)
> +				     + (j_dst++ * dst_frame_info->cpp);
> +			offset_src = src_frame_info->offset
> +				     + (i * src_frame_info->pitch)
> +				     + (j * src_frame_info->cpp);
>  
>  			pixel_src = (u8 *)(vaddr_src + offset_src);
>  			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> @@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
>  	}
>  }
>  
> -static void compose_plane(struct vkms_composer *primary_composer,
> -			  struct vkms_composer *plane_composer,
> +static void compose_plane(struct vkms_frame_info *primary_plane_info,
> +			  struct vkms_frame_info *plane_frame_info,
>  			  void *vaddr_out)
>  {
> -	struct drm_framebuffer *fb = &plane_composer->fb;
> +	struct drm_framebuffer *fb = &plane_frame_info->fb;
>  	void *vaddr;
>  	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>  
> -	if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
> +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
^ here you are reintroducing a bug: the check is made on the primary
plane again, instead of on plane_composer (renamed to plane_frame_info
here), so the primary plane ends up being checked repeatedly. The issue
is fixed in a later patch of this series, when you decouple
check_iosys_map. But I don't mind fixing it before applying.
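
To make the point above concrete, here is a minimal userspace sketch of the intended check, where the mapping of the plane being composed is tested rather than the primary plane's. The `map_stub` and `frame_info_stub` structs and the `map_is_null`, `plane_is_mapped` and `example_check` helpers are hypothetical stand-ins for this illustration; they are not the kernel's `iosys_map` API or the final vkms code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced stand-ins for the kernel structures; only the fields used here. */
struct map_stub {
	void *vaddr;
};

struct frame_info_stub {
	struct map_stub map[1];
};

/* Stand-in for iosys_map_is_null(). */
static bool map_is_null(const struct map_stub *map)
{
	return map->vaddr == NULL;
}

/*
 * True when the plane being composed has a usable mapping. The argument
 * is the plane itself, so an unmapped overlay cannot pass just because
 * the primary plane happens to be mapped.
 */
static bool plane_is_mapped(const struct frame_info_stub *plane_frame_info)
{
	return !map_is_null(&plane_frame_info->map[0]);
}

/* A mapped primary must not hide an unmapped overlay. */
static void example_check(void)
{
	int pixel = 0;
	struct frame_info_stub primary = { .map = { { .vaddr = &pixel } } };
	struct frame_info_stub overlay = { .map = { { .vaddr = NULL } } };

	assert(plane_is_mapped(&primary));
	assert(!plane_is_mapped(&overlay));
}
```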

>  		return;
>  
> -	vaddr = plane_composer->map[0].vaddr;
> +	vaddr = plane_frame_info->map[0].vaddr;
>  
>  	if (fb->format->format == DRM_FORMAT_ARGB8888)
>  		pixel_blend = &alpha_blend;
>  	else
>  		pixel_blend = &x_blend;
>  
> -	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
> +	blend(vaddr_out, vaddr, primary_plane_info,
> +	      plane_frame_info, pixel_blend);
>  }
>  
>  static int compose_active_planes(void **vaddr_out,
> -				 struct vkms_composer *primary_composer,
> +				 struct vkms_frame_info *primary_plane_info,
>  				 struct vkms_crtc_state *crtc_state)
>  {
> -	struct drm_framebuffer *fb = &primary_composer->fb;
> +	struct drm_framebuffer *fb = &primary_plane_info->fb;
>  	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>  	const void *vaddr;
>  	int i;
> @@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
>  		}
>  	}
>  
> -	if (WARN_ON(iosys_map_is_null(&primary_composer->map[0])))
> +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>  		return -EINVAL;
>  
> -	vaddr = primary_composer->map[0].vaddr;
> +	vaddr = primary_plane_info->map[0].vaddr;
>  
>  	memcpy(*vaddr_out, vaddr, gem_obj->size);
>  
> @@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
>  	 * ((primary <- overlay) <- cursor)
>  	 */
>  	for (i = 1; i < crtc_state->num_active_planes; i++)
> -		compose_plane(primary_composer,
> -			      crtc_state->active_planes[i]->composer,
> +		compose_plane(primary_plane_info,
> +			      crtc_state->active_planes[i]->frame_info,
>  			      *vaddr_out);
>  
>  	return 0;
> @@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
>  						composer_work);
>  	struct drm_crtc *crtc = crtc_state->base.crtc;
>  	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> -	struct vkms_composer *primary_composer = NULL;
> +	struct vkms_frame_info *primary_plane_info = NULL;
>  	struct vkms_plane_state *act_plane = NULL;
>  	bool crc_pending, wb_pending;
>  	void *vaddr_out = NULL;
> @@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
>  	if (crtc_state->num_active_planes >= 1) {
>  		act_plane = crtc_state->active_planes[0];
>  		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> -			primary_composer = act_plane->composer;
> +			primary_plane_info = act_plane->frame_info;
>  	}
>  
> -	if (!primary_composer)
> +	if (!primary_plane_info)
>  		return;
>  
>  	if (wb_pending)
>  		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>  
> -	ret = compose_active_planes(&vaddr_out, primary_composer,
> +	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>  				    crtc_state);
>  	if (ret) {
>  		if (ret == -EINVAL && !wb_pending)
> @@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
>  		return;
>  	}
>  
> -	crc32 = compute_crc(vaddr_out, primary_composer);
> +	crc32 = compute_crc(vaddr_out, primary_plane_info);
>  
>  	if (wb_pending) {
>  		drm_writeback_signal_completion(&out->wb_connector, 0);
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 36fbab5989d1..5199c5f18e17 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -27,7 +27,7 @@ struct vkms_writeback_job {
>  	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
>  };
>  
> -struct vkms_composer {
> +struct vkms_frame_info {
>  	struct drm_framebuffer fb;
>  	struct drm_rect src, dst;
>  	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
> @@ -39,11 +39,11 @@ struct vkms_composer {
>  /**
>   * vkms_plane_state - Driver specific plane state
>   * @base: base plane state
> - * @composer: data required for composing computation
> + * @frame_info: data required for composing computation
>   */
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
> -	struct vkms_composer *composer;
> +	struct vkms_frame_info *frame_info;
>  };
>  
>  struct vkms_plane {
> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> index d8eb674b49a6..fcae6c508f4b 100644
> --- a/drivers/gpu/drm/vkms/vkms_plane.c
> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> @@ -24,20 +24,20 @@ static struct drm_plane_state *
>  vkms_plane_duplicate_state(struct drm_plane *plane)
>  {
>  	struct vkms_plane_state *vkms_state;
> -	struct vkms_composer *composer;
> +	struct vkms_frame_info *frame_info;
>  
>  	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
>  	if (!vkms_state)
>  		return NULL;
>  
> -	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
> -	if (!composer) {
> -		DRM_DEBUG_KMS("Couldn't allocate composer\n");
> +	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
> +	if (!frame_info) {
> +		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
>  		kfree(vkms_state);
>  		return NULL;
>  	}
>  
> -	vkms_state->composer = composer;
> +	vkms_state->frame_info = frame_info;
>  
>  	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);
>  
> @@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>  		/* dropping the reference we acquired in
>  		 * vkms_primary_plane_update()
>  		 */
> -		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
> -			drm_framebuffer_put(&vkms_state->composer->fb);
> +		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
> +			drm_framebuffer_put(&vkms_state->frame_info->fb);
>  	}
>  
> -	kfree(vkms_state->composer);
> -	vkms_state->composer = NULL;
> +	kfree(vkms_state->frame_info);
> +	vkms_state->frame_info = NULL;
>  
>  	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
>  	kfree(vkms_state);
> @@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	struct vkms_plane_state *vkms_plane_state;
>  	struct drm_shadow_plane_state *shadow_plane_state;
>  	struct drm_framebuffer *fb = new_state->fb;
> -	struct vkms_composer *composer;
> +	struct vkms_frame_info *frame_info;
>  
>  	if (!new_state->crtc || !fb)
>  		return;
> @@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>  	vkms_plane_state = to_vkms_plane_state(new_state);
>  	shadow_plane_state = &vkms_plane_state->base;
>  
> -	composer = vkms_plane_state->composer;
> -	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
> -	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
> -	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
> -	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
> -	drm_framebuffer_get(&composer->fb);
> -	composer->offset = fb->offsets[0];
> -	composer->pitch = fb->pitches[0];
> -	composer->cpp = fb->format->cpp[0];
> +	frame_info = vkms_plane_state->frame_info;
> +	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
> +	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> +	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
> +	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
> +	drm_framebuffer_get(&frame_info->fb);
> +	frame_info->offset = fb->offsets[0];
> +	frame_info->pitch = fb->pitches[0];
> +	frame_info->cpp = fb->format->cpp[0];
>  }
>  
>  static int vkms_plane_atomic_check(struct drm_plane *plane,
> -- 
> 2.30.2
> 
Melissa Wen <mwen@igalia.com>
Details
Message ID
<20220820110708.pdqau4g4mc4r54hp@mail.igalia.com>
In-Reply-To
<20220819182411.20246-1-igormtorrente@gmail.com>
On 08/19, Igor Torrente wrote:
> Summary
> =======
> This series of patches refactor some vkms components in order to introduce
> new formats to the planes and writeback connector.
> 
> Now in the blend function, the plane's pixels are converted to ARGB16161616
> and then blended together.
> 
> The CRC is calculated based on the ARGB1616161616 buffer. And if required,
> this buffer is copied/converted to the writeback buffer format.
> 
> And to handle the pixel conversion, new functions were added to convert
> from a specific format to ARGB16161616 (the reciprocal is also true).

Hi Igor,

I missed this series after taking a few weeks off.

The entire series LGTM.
I pointed out some nitpicks, but I'll handle them when applying to
drm-misc-next.

The series is:
Reviewed-by: Melissa Wen <mwen@igalia.com>

Thank you,

Melissa

> 
> Tests
> =====
> This patch series was tested using the following igt tests:
> -t ".*kms_plane.*"
> -t ".*kms_writeback.*"
> -t ".*kms_cursor_crc*"
> -t ".*kms_flip.*"
> 
> New tests passing
> -------------------
> - pipe-A-cursor-size-change
> - pipe-A-cursor-alpha-transparent
> 
> Performance
> -----------
> It's running slightly faster than the current implementation.
> 
> Results running the IGT[1] test
> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
> 
> |                  Frametime                   |
> |:--------------------------------------------:|
> |  Implementation |  Current  |   This commit  |
> |:---------------:|:---------:|:--------------:|
> | frametime range |  9~22 ms  |     10~22 ms   |
> |     Average     |  11.4 ms  |     12.32 ms   |
> 
> Memory consumption
> ==================
> It consumes less memory than the current implementation in
> the common case (more detail in the commit message).
> 
> | Memory consumption (output dimensions) |
> |:--------------------------------------:|
> |       Current      |     This patch    |
> |:------------------:|:-----------------:|
> |   Width * Heigth   |     2 * Width     |
> 
> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
> 
> XRGB to ARGB behavior
> =====================
> During the development, I decided to always fill the alpha channel of
> the output pixel whenever the conversion from a format without an alpha
> channel to ARGB16161616 is necessary. Therefore, I ignore the value
> received from the XRGB and overwrite the value with 0xFFFF.
> 
> Primary plane and CRTC size
> ===========================
> This patch series reworks the blend function to accept a primary plane with
> a different size and position from CRTC.
> Because now we need to fill the background, we had a loss in
> performance with this change
> 
> Alpha channel output for XRGB formats
> =====================================
> There's still an open question about which value the writeback alpha channel
> should be for XRGB formats.
> The current igt test implementation is expecting the channel to not be change.
> But it's not entirely clear if this should be the behavior followed by vkms
> (or any other driver).
> 
> Open issue: https://gitlab.freedesktop.org/drm/igt-gpu-tools/-/issues/118
> ---
> 
> Igor Torrente (9):
>   drm: vkms: Replace hardcoded value of `vkms_composer.map` to
>     DRM_FORMAT_MAX_PLANES
>   drm: vkms: Rename `vkms_composer` to `vkms_frame_info`
>   drm: drm_atomic_helper: Add a new helper to deal with the writeback
>     connector validation
>   drm: vkms: get the reference to `drm_framebuffer` instead if coping it
>   drm: vkms: Add fb information to `vkms_writeback_job`
>   drm: vkms: Refactor the plane composer to accept new formats
>   drm: vkms: Supports to the case where primary plane doesn't match the
>     CRTC
>   drm: vkms: Adds XRGB_16161616 and ARGB_1616161616 formats
>   drm: vkms: Add support to the RGB565 format
> 
>  Documentation/gpu/vkms.rst            |   7 +-
>  drivers/gpu/drm/drm_atomic_helper.c   |  39 ++++
>  drivers/gpu/drm/vkms/Makefile         |   1 +
>  drivers/gpu/drm/vkms/vkms_composer.c  | 314 ++++++++++++--------------
>  drivers/gpu/drm/vkms/vkms_drv.h       |  39 +++-
>  drivers/gpu/drm/vkms/vkms_formats.c   | 302 +++++++++++++++++++++++++
>  drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>  drivers/gpu/drm/vkms/vkms_plane.c     |  50 ++--
>  drivers/gpu/drm/vkms/vkms_writeback.c |  39 +++-
>  include/drm/drm_atomic_helper.h       |   3 +
>  10 files changed, 586 insertions(+), 220 deletions(-)
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>  create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> 
> -- 
> 2.30.2
> 

Re: [RESEND v6 5/9] drm: vkms: Add fb information to `vkms_writeback_job`

Melissa Wen <mwen@igalia.com>
Details
Message ID
<20220821235918.2ver4c2vzlcmkbaa@mail.igalia.com>
In-Reply-To
<20220819182411.20246-6-igormtorrente@gmail.com>
On 08/19, Igor Torrente wrote:
> This commit is the groundwork to introduce new formats to the planes and
> writeback buffer. As part of it, a new buffer metadata field is added to
> `vkms_writeback_job`, this metadata is represented by the `vkms_frame_info`
> struct.
> 
> Also adds two new function pointers (`line_to_frame_func` and
> `frame_to_line_func`) are defined to handle format conversion
> from/to internal format.
> 
> A new internal format(`struct pixel_argb_u16`) is introduced to deal with
> all possible inputs. It consists of 16 bits fields that represent each of
> the channels.
> 
> These things will allow us, in the future, to have different compositing
> and wb format types.
> 
> V2: Change the code to get the drm_framebuffer reference and not copy its
>     contents (Thomas Zimmermann).
> V3: Drop the refcount in the wb code (Thomas Zimmermann).
> V5: Add {wb,plane}_format_transform_func to vkms_writeback_job
>     and vkms_plane_state (Pekka Paalanen)
> V6: Improvements to some struct/struct members names (Pekka Paalanen).
>     Splits this patch in two (Pekka Paalanen).
> 
> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> ---
>  drivers/gpu/drm/vkms/vkms_drv.h       | 29 ++++++++++++++++++++++-----
>  drivers/gpu/drm/vkms/vkms_writeback.c | 20 +++++++++++++++---
>  2 files changed, 41 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> index 95d71322500b..0d407ec84f94 100644
> --- a/drivers/gpu/drm/vkms/vkms_drv.h
> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> @@ -22,11 +22,6 @@
>  
>  #define NUM_OVERLAY_PLANES 8
>  
> -struct vkms_writeback_job {
> -	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
> -	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
> -};
> -
>  struct vkms_frame_info {
>  	struct drm_framebuffer *fb;
>  	struct drm_rect src, dst;
> @@ -36,6 +31,29 @@ struct vkms_frame_info {
>  	unsigned int cpp;
>  };
>  
> +struct pixel_argb_u16 {
> +	u16 a, r, g, b;
> +};
> +
> +struct line_buffer {
> +	size_t n_pixels;
> +	struct pixel_argb_u16 *pixels;
> +};
> +
> +typedef void
> +(*line_to_frame_func)(struct vkms_frame_info *frame_info,
> +		      const struct line_buffer *buffer, int y);
> +
> +typedef void
> +(*frame_to_line_func)(struct line_buffer *buffer,
> +		      const struct vkms_frame_info *frame_info, int y);

Checkpatch complains about these two new typedefs. In fact, I think a
better approach is to make line_to_frame_func a member of struct
vkms_writeback_job and frame_to_line_func a member of vkms_plane_state,
and to change the return type of the get functions to void *
accordingly.

That said, I now think it is better to send a new version that rebases,
corrects the iosys_map check (an issue that I mentioned before) and
addresses this typedef issue. You can already add my r-b in the next
version.
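
As a rough illustration of the typedef-free layout, here is a minimal userspace sketch in which the handler is declared directly as a struct member. All `*_stub` types, the `FMT_*` codes and the helper names are hypothetical, not the vkms definitions. One deliberate difference from the suggestion above: instead of a getter returning void *, the lookup fills in the member itself, since ISO C does not guarantee conversions between function pointers and void *.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical format codes for this sketch, not DRM fourcc values. */
#define FMT_XRGB8888 1u
#define FMT_ARGB8888 2u

struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

struct line_buffer {
	size_t n_pixels;
	struct pixel_argb_u16 *pixels;
};

/* Reduced stand-in for vkms_frame_info: one mapped plane and its pitch. */
struct frame_info_stub {
	const uint8_t *vaddr;
	size_t pitch;
};

/* Reduced stand-in for vkms_plane_state: the handler is a plain member. */
struct plane_state_stub {
	struct frame_info_stub *frame_info;
	void (*plane_read)(struct line_buffer *buffer,
			   const struct frame_info_stub *frame_info, int y);
};

/* Shared body: converts line y of a 4-bytes-per-pixel frame to argb_u16. */
static void read_line(struct line_buffer *buffer,
		      const struct frame_info_stub *frame_info, int y,
		      int has_alpha)
{
	const uint8_t *src = frame_info->vaddr + (size_t)y * frame_info->pitch;

	for (size_t x = 0; x < buffer->n_pixels; x++, src += 4) {
		buffer->pixels[x].a = has_alpha ? (uint16_t)(src[3] * 257u) : 0xffff;
		buffer->pixels[x].r = (uint16_t)(src[2] * 257u);
		buffer->pixels[x].g = (uint16_t)(src[1] * 257u);
		buffer->pixels[x].b = (uint16_t)(src[0] * 257u);
	}
}

static void argb8888_read(struct line_buffer *buffer,
			  const struct frame_info_stub *frame_info, int y)
{
	read_line(buffer, frame_info, y, 1);
}

static void xrgb8888_read(struct line_buffer *buffer,
			  const struct frame_info_stub *frame_info, int y)
{
	read_line(buffer, frame_info, y, 0);
}

/* Picks the handler for a format; returns 0 on success, -1 if unsupported. */
static int plane_state_set_reader(struct plane_state_stub *state, uint32_t format)
{
	switch (format) {
	case FMT_ARGB8888:
		state->plane_read = argb8888_read;
		return 0;
	case FMT_XRGB8888:
		state->plane_read = xrgb8888_read;
		return 0;
	default:
		state->plane_read = NULL;
		return -1;
	}
}

/* Reads one XRGB8888 pixel through the member; alpha is forced to opaque. */
static void example_usage(void)
{
	const uint8_t row[4] = { 0x10, 0x20, 0x30, 0x40 }; /* B, G, R, X */
	struct frame_info_stub fi = { .vaddr = row, .pitch = sizeof(row) };
	struct pixel_argb_u16 px;
	struct line_buffer line = { .n_pixels = 1, .pixels = &px };
	struct plane_state_stub state = { .frame_info = &fi, .plane_read = NULL };

	assert(plane_state_set_reader(&state, FMT_XRGB8888) == 0);
	state.plane_read(&line, state.frame_info, 0);
	assert(px.a == 0xffff && px.r == 0x3030);
}
```

The same pattern would apply to the writeback side, with a `wb_write`-style member taking the frame first and a const line buffer second.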

Thanks,

Melissa
> +
> +struct vkms_writeback_job {
> +	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
> +	struct vkms_frame_info wb_frame_info;
> +	line_to_frame_func wb_write;
> +};
> +
>  /**
>   * vkms_plane_state - Driver specific plane state
>   * @base: base plane state
> @@ -44,6 +62,7 @@ struct vkms_frame_info {
>  struct vkms_plane_state {
>  	struct drm_shadow_plane_state base;
>  	struct vkms_frame_info *frame_info;
> +	frame_to_line_func plane_read;
>  };
>  
>  struct vkms_plane {
> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> index 250e509a298f..c87f6c89e7b4 100644
> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> @@ -74,12 +74,15 @@ static int vkms_wb_prepare_job(struct drm_writeback_connector *wb_connector,
>  	if (!vkmsjob)
>  		return -ENOMEM;
>  
> -	ret = drm_gem_fb_vmap(job->fb, vkmsjob->map, vkmsjob->data);
> +	ret = drm_gem_fb_vmap(job->fb, vkmsjob->wb_frame_info.map, vkmsjob->data);
>  	if (ret) {
>  		DRM_ERROR("vmap failed: %d\n", ret);
>  		goto err_kfree;
>  	}
>  
> +	vkmsjob->wb_frame_info.fb = job->fb;
> +	drm_framebuffer_get(vkmsjob->wb_frame_info.fb);
> +
>  	job->priv = vkmsjob;
>  
>  	return 0;
> @@ -98,7 +101,9 @@ static void vkms_wb_cleanup_job(struct drm_writeback_connector *connector,
>  	if (!job->fb)
>  		return;
>  
> -	drm_gem_fb_vunmap(job->fb, vkmsjob->map);
> +	drm_gem_fb_vunmap(job->fb, vkmsjob->wb_frame_info.map);
> +
> +	drm_framebuffer_put(vkmsjob->wb_frame_info.fb);
>  
>  	vkmsdev = drm_device_to_vkms_device(job->fb->dev);
>  	vkms_set_composer(&vkmsdev->output, false);
> @@ -115,14 +120,23 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>  	struct drm_writeback_connector *wb_conn = &output->wb_connector;
>  	struct drm_connector_state *conn_state = wb_conn->base.state;
>  	struct vkms_crtc_state *crtc_state = output->composer_state;
> +	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
> +	struct vkms_writeback_job *active_wb;
> +	struct vkms_frame_info *wb_frame_info;
>  
>  	if (!conn_state)
>  		return;
>  
>  	vkms_set_composer(&vkmsdev->output, true);
>  
> +	active_wb = conn_state->writeback_job->priv;
> +	wb_frame_info = &active_wb->wb_frame_info;
> +
>  	spin_lock_irq(&output->composer_lock);
> -	crtc_state->active_writeback = conn_state->writeback_job->priv;
> +	crtc_state->active_writeback = active_wb;
> +	wb_frame_info->offset = fb->offsets[0];
> +	wb_frame_info->pitch = fb->pitches[0];
> +	wb_frame_info->cpp = fb->format->cpp[0];
>  	crtc_state->wb_pending = true;
>  	spin_unlock_irq(&output->composer_lock);
>  	drm_writeback_queue_job(wb_conn, connector_state);
> -- 
> 2.30.2
> 

Re: [RESEND v6 6/9] drm: vkms: Refactor the plane composer to accept new formats

Igor Matheus Andrade Torrente <igormtorrente@gmail.com>
Details
Message ID
<b90f2c07-18ad-411f-82ec-914974cf8d2c@gmail.com>
In-Reply-To
<20220820105046.cittsquvjvenw54z@mail.igalia.com>
Hi Melissa,

On 8/20/22 07:51, Melissa Wen wrote:
> On 08/19, Igor Torrente wrote:
>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>> as a color input.
>>
>> This patch refactors all the functions related to the plane composition
>> to overcome this limitation.
>>
>> The pixels blend is done using the new internal format. And new handlers
>> are being added to convert a specific format to/from this internal format.
>>
>> So the blend operation depends on these handlers to convert to this common
>> format. The blended result, if necessary, is converted to the writeback
>> buffer format.
>>
>> This patch introduces three major differences to the blend function.
>> 1 - All the planes are blended at once.
>> 2 - The blend calculus is done as per line instead of per pixel.
>> 3 - It is responsible to calculates the CRC and writing the writeback
>> buffer(if necessary).
>>
>> These changes allow us to allocate way less memory in the intermediate
>> buffer to compute these operations. Because now we don't need to
>> have the entire intermediate image lines at once, just one line is
>> enough.
>>
>> | Memory consumption (output dimensions) |
>> |:--------------------------------------:|
>> |       Current      |     This patch    |
>> |:------------------:|:-----------------:|
>> |   Width * Heigth   |     2 * Width     |
>>
>> Beyond memory, we also have a minor performance benefit from all
>> these changes. Results running the IGT[1] test
>> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
>>
>> |                 Frametime                  |
>> |:------------------------------------------:|
>> |  Implementation |  Current  |  This commit |
>> |:---------------:|:---------:|:------------:|
>> | frametime range |  9~22 ms  |    5~17 ms   |
>> |     Average     |  11.4 ms  |    7.8 ms    |
>>
>> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
>>
>> V2: Improves the performance drastically, by performing the operations
>>      per-line and not per-pixel(Pekka Paalanen).
>>      Minor improvements(Pekka Paalanen).
>> V3: Changes the code to blend the planes all at once. This improves
>>      performance, memory consumption, and removes much of the weirdness
>>      of the V2(Pekka Paalanen and me).
>>      Minor improvements(Pekka Paalanen and me).
>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
>> V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
>>      Several security/robustness improvements (Pekka Paalanen).
>>      Removes check_planes_x_bounds function and allows planes that are
>>      partly off-screen (Pekka Paalanen).
>> V6: Fix a mismatch of some variable sizes (Pekka Paalanen).
>>      Several minor improvements (Pekka Paalanen).
>>
>> Reported-by: kernel test robot <lkp@intel.com>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>>   Documentation/gpu/vkms.rst            |   4 -
>>   drivers/gpu/drm/vkms/Makefile         |   1 +
>>   drivers/gpu/drm/vkms/vkms_composer.c  | 320 ++++++++++++--------------
>>   drivers/gpu/drm/vkms/vkms_formats.c   | 155 +++++++++++++
>>   drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>>   drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
>>   drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
>>   7 files changed, 317 insertions(+), 181 deletions(-)
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>
>> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
>> index 973e2d43108b..a49e4ae92653 100644
>> --- a/Documentation/gpu/vkms.rst
>> +++ b/Documentation/gpu/vkms.rst
>> @@ -118,10 +118,6 @@ Add Plane Features
>>   
>>   There's lots of plane features we could add support for:
>>   
>> -- Clearing primary plane: clear primary plane before plane composition (at the
>> -  start) for correctness of pixel blend ops. It also guarantees alpha channel
>> -  is cleared in the target buffer for stable crc. [Good to get started]
>> -
>>   - ARGB format on primary plane: blend the primary plane into background with
>>     translucent alpha.
>>   
>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>> index 72f779cbfedd..1b28a6a32948 100644
>> --- a/drivers/gpu/drm/vkms/Makefile
>> +++ b/drivers/gpu/drm/vkms/Makefile
>> @@ -3,6 +3,7 @@ vkms-y := \
>>   	vkms_drv.o \
>>   	vkms_plane.o \
>>   	vkms_output.o \
>> +	vkms_formats.o \
>>   	vkms_crtc.o \
>>   	vkms_composer.o \
>>   	vkms_writeback.o
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index b9fb408e8973..5b1a8bdd8268 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -7,204 +7,188 @@
>>   #include <drm/drm_fourcc.h>
>>   #include <drm/drm_gem_framebuffer_helper.h>
>>   #include <drm/drm_vblank.h>
>> +#include <linux/minmax.h>
>>   
>>   #include "vkms_drv.h"
>>   
>> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>> -				 const struct vkms_frame_info *frame_info)
>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>   {
>> -	u32 pixel;
>> -	int src_offset = frame_info->offset + (y * frame_info->pitch)
>> -					    + (x * frame_info->cpp);
>> +	u32 new_color;
>>   
>> -	pixel = *(u32 *)&buffer[src_offset];
>> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>>   
>> -	return pixel;
>> +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
>>   }
>>   
>>   /**
>> - * compute_crc - Compute CRC value on output frame
>> + * pre_mul_alpha_blend - alpha blending equation
>> + * @src_frame_info: source framebuffer's metadata
>> + * @stage_buffer: The line with the pixels from src_plane
>> + * @output_buffer: A line buffer that receives all the blends output
>>    *
>> - * @vaddr: address to final framebuffer
>> - * @frame_info: framebuffer's metadata
>> + * Using the information from the `frame_info`, this blends only the
>> + * necessary pixels from the `stage_buffer` to the `output_buffer`
>> + * using premultiplied blend formula.
>>    *
>> - * returns CRC value computed using crc32 on the visible portion of
>> - * the final framebuffer at vaddr_out
>> + * The current DRM assumption is that pixel color values have been already
>> + * pre-multiplied with the alpha channel values. See more
>> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
>> + * completely opaque background.
>>    */
>> -static uint32_t compute_crc(const u8 *vaddr,
>> -			    const struct vkms_frame_info *frame_info)
>> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
>> +				struct line_buffer *stage_buffer,
>> +				struct line_buffer *output_buffer)
>>   {
>> -	int x, y;
>> -	u32 crc = 0, pixel = 0;
>> -	int x_src = frame_info->src.x1 >> 16;
>> -	int y_src = frame_info->src.y1 >> 16;
>> -	int h_src = drm_rect_height(&frame_info->src) >> 16;
>> -	int w_src = drm_rect_width(&frame_info->src) >> 16;
>> -
>> -	for (y = y_src; y < y_src + h_src; ++y) {
>> -		for (x = x_src; x < x_src + w_src; ++x) {
>> -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>> -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>> -		}
>> +	int x_dst = frame_info->dst.x1;
>> +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
>> +	struct pixel_argb_u16 *in = stage_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    stage_buffer->n_pixels);
>> +
>> +	for (int x = 0; x < x_limit; x++) {
>> +		out[x].a = (u16)0xffff;
>> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
>> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
>> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>>   	}
>> -
>> -	return crc;
>>   }
>>   
>> -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
>> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>   {
>> -	u32 pre_blend;
>> -	u8 new_color;
>> -
>> -	pre_blend = (src * 255 + dst * (255 - alpha));
>> -
>> -	/* Faster div by 255 */
>> -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
>> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
>> +		return true;
>>   
>> -	return new_color;
>> +	return false;
>>   }
>>   
>>   /**
>> - * alpha_blend - alpha blending equation
>> - * @argb_src: src pixel on premultiplied alpha mode
>> - * @argb_dst: dst pixel completely opaque
>> + * @wb_frame_info: The writeback frame buffer metadata
>> + * @crtc_state: The crtc state
>> + * @crc32: The crc output of the final frame
>> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
>> + * @stage_buffer: The line with the pixels from plane being blend to the output
>>    *
>> - * blend pixels using premultiplied blend formula. The current DRM assumption
>> - * is that pixel color values have been already pre-multiplied with the alpha
>> - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
>> - * formula assumes a completely opaque background.
>> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
>> + * from all planes, calculates the crc32 of the output from the former step,
>> + * and, if necessary, convert and store the output to the writeback buffer.
>>    */
>> -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
>> +static void blend(struct vkms_writeback_job *wb,
>> +		  struct vkms_crtc_state *crtc_state,
>> +		  u32 *crc32, struct line_buffer *stage_buffer,
>> +		  struct line_buffer *output_buffer, size_t row_size)
>>   {
>> -	u8 alpha;
>> +	struct vkms_plane_state **plane = crtc_state->active_planes;
>> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>> +	u32 n_active_planes = crtc_state->num_active_planes;
>> +
>> +	int y_dst = primary_plane_info->dst.y1;
>> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>> +	int y_limit = y_dst + h_dst;
>> +
>> +	for (size_t y = y_dst; y < y_limit; y++) {
>> +		plane[0]->plane_read(output_buffer, primary_plane_info, y);
>> +
>> +		/* If there are other planes besides primary, we consider the active
>> +		 * planes should be in z-order and compose them associatively:
>> +		 * ((primary <- overlay) <- cursor)
>> +		 */
>> +		for (size_t i = 1; i < n_active_planes; i++) {
>> +			if (!check_y_limit(plane[i]->frame_info, y))
>> +				continue;
>> +
>> +			plane[i]->plane_read(stage_buffer, plane[i]->frame_info, y);
>> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
>> +					    output_buffer);
>> +		}
>> +
>> +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
>>   
>> -	alpha = argb_src[3];
>> -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
>> -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
>> -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
>> +		if (wb)
>> +			wb->wb_write(&wb->wb_frame_info, output_buffer, y);
>> +	}
>>   }
>>   
>> -/**
>> - * x_blend - blending equation that ignores the pixel alpha
>> - *
>> - * overwrites RGB color value from src pixel to dst pixel.
>> - */
>> -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>> +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
>> +			      struct vkms_writeback_job *active_wb)
>>   {
>> -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
>> +	struct vkms_plane_state **planes = crtc_state->active_planes;
>> +	u32 n_active_planes = crtc_state->num_active_planes;
>> +
>> +	for (size_t i = 0; i < n_active_planes; i++)
>> +		if (!planes[i]->plane_read)
>> +			return -1;
>> +
>> +	if (active_wb && !active_wb->wb_write)
>> +		return -1;
>> +
>> +	return 0;
>>   }
>>   
>> -/**
>> - * blend - blend value at vaddr_src with value at vaddr_dst
>> - * @vaddr_dst: destination address
>> - * @vaddr_src: source address
>> - * @dst_frame_info: destination framebuffer's metadata
>> - * @src_frame_info: source framebuffer's metadata
>> - * @pixel_blend: blending equation based on plane format
>> - *
>> - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
>> - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
>> - * and clearing alpha channel to an completely opaque background. This function
>> - * uses buffer's metadata to locate the new composite values at vaddr_dst.
>> - *
>> - * TODO: completely clear the primary plane (a = 0xff) before starting to blend
>> - * pixel color values
>> - */
>> -static void blend(void *vaddr_dst, void *vaddr_src,
>> -		  struct vkms_frame_info *dst_frame_info,
>> -		  struct vkms_frame_info *src_frame_info,
>> -		  void (*pixel_blend)(const u8 *, u8 *))
>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
>> +				 struct vkms_crtc_state *crtc_state,
>> +				 u32 *crc32)
>>   {
>> -	int i, j, j_dst, i_dst;
>> -	int offset_src, offset_dst;
>> -	u8 *pixel_dst, *pixel_src;
>> -
>> -	int x_src = src_frame_info->src.x1 >> 16;
>> -	int y_src = src_frame_info->src.y1 >> 16;
>> -
>> -	int x_dst = src_frame_info->dst.x1;
>> -	int y_dst = src_frame_info->dst.y1;
>> -	int h_dst = drm_rect_height(&src_frame_info->dst);
>> -	int w_dst = drm_rect_width(&src_frame_info->dst);
>> -
>> -	int y_limit = y_src + h_dst;
>> -	int x_limit = x_src + w_dst;
>> -
>> -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>> -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
>> -			offset_dst = dst_frame_info->offset
>> -				     + (i_dst * dst_frame_info->pitch)
>> -				     + (j_dst++ * dst_frame_info->cpp);
>> -			offset_src = src_frame_info->offset
>> -				     + (i * src_frame_info->pitch)
>> -				     + (j * src_frame_info->cpp);
>> -
>> -			pixel_src = (u8 *)(vaddr_src + offset_src);
>> -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
>> -			pixel_blend(pixel_src, pixel_dst);
>> -			/* clearing alpha channel (0xff)*/
>> -			pixel_dst[3] = 0xff;
>> -		}
>> -		i_dst++;
>> +	size_t line_width, pixel_size = sizeof(struct pixel_argb_u16);
>> +	struct vkms_frame_info *primary_plane_info = NULL;
>> +	struct line_buffer output_buffer, stage_buffer;
>> +	struct vkms_plane_state *act_plane = NULL;
>> +	int ret = 0;
>> +
>> +	/*
>> +	 * This check exists so we can call `crc32_le` for the entire line
>> +	 * instead doing it for each channel of each pixel in case
>> +	 * `struct `pixel_argb_u16` had any gap added by the compiler
>> +	 * between the struct fields.
>> +	 */
>> +	static_assert(sizeof(struct pixel_argb_u16) == 8);
>> +
>> +	if (crtc_state->num_active_planes >= 1) {
>> +		act_plane = crtc_state->active_planes[0];
>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>> +			primary_plane_info = act_plane->frame_info;
>>   	}
>> -}
>>   
>> -static void compose_plane(struct vkms_frame_info *primary_plane_info,
>> -			  struct vkms_frame_info *plane_frame_info,
>> -			  void *vaddr_out)
>> -{
>> -	struct drm_framebuffer *fb = plane_frame_info->fb;
>> -	void *vaddr;
>> -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>> +	if (!primary_plane_info)
>> +		return -EINVAL;
>>   
>>   	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>> -		return;
>> +		return -EINVAL;
>>   
>> -	vaddr = plane_frame_info->map[0].vaddr;
>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>> +		return -EINVAL;
>>   
>> -	if (fb->format->format == DRM_FORMAT_ARGB8888)
>> -		pixel_blend = &alpha_blend;
>> -	else
>> -		pixel_blend = &x_blend;
>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>> +	stage_buffer.n_pixels = line_width;
>> +	output_buffer.n_pixels = line_width;
>>   
>> -	blend(vaddr_out, vaddr, primary_plane_info,
>> -	      plane_frame_info, pixel_blend);
>> -}
>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +	if (!stage_buffer.pixels) {
>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>> +		return -ENOMEM;
>> +	}
>>   
>> -static int compose_active_planes(void **vaddr_out,
>> -				 struct vkms_frame_info *primary_plane_info,
>> -				 struct vkms_crtc_state *crtc_state)
>> -{
>> -	struct drm_framebuffer *fb = primary_plane_info->fb;
>> -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>> -	const void *vaddr;
>> -	int i;
>> -
>> -	if (!*vaddr_out) {
>> -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
>> -		if (!*vaddr_out) {
>> -			DRM_ERROR("Cannot allocate memory for output frame.");
>> -			return -ENOMEM;
>> -		}
>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>> +	if (!output_buffer.pixels) {
>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>> +		ret = -ENOMEM;
>> +		goto free_stage_buffer;
>>   	}
>>   
>> -	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>> -		return -EINVAL;
>> +	if (active_wb) {
>> +		struct vkms_frame_info *wb_frame_info = &active_wb->wb_frame_info;
>>   
>> -	vaddr = primary_plane_info->map[0].vaddr;
>> +		wb_frame_info->src = primary_plane_info->src;
>> +		wb_frame_info->dst = primary_plane_info->dst;
>> +	}
>>   
>> -	memcpy(*vaddr_out, vaddr, gem_obj->size);
>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
>> +	      &output_buffer, line_width * pixel_size);
>>   
>> -	/* If there are other planes besides primary, we consider the active
>> -	 * planes should be in z-order and compose them associatively:
>> -	 * ((primary <- overlay) <- cursor)
>> -	 */
>> -	for (i = 1; i < crtc_state->num_active_planes; i++)
>> -		compose_plane(primary_plane_info,
>> -			      crtc_state->active_planes[i]->frame_info,
>> -			      *vaddr_out);
>> +	kvfree(output_buffer.pixels);
>> +free_stage_buffer:
>> +	kvfree(stage_buffer.pixels);
>>   
>> -	return 0;
>> +	return ret;
>>   }
>>   
>>   /**
>> @@ -222,13 +206,11 @@ void vkms_composer_worker(struct work_struct *work)
>>   						struct vkms_crtc_state,
>>   						composer_work);
>>   	struct drm_crtc *crtc = crtc_state->base.crtc;
>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>   	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>> -	struct vkms_frame_info *primary_plane_info = NULL;
>> -	struct vkms_plane_state *act_plane = NULL;
>>   	bool crc_pending, wb_pending;
>> -	void *vaddr_out = NULL;
>> -	u32 crc32 = 0;
>>   	u64 frame_start, frame_end;
>> +	u32 crc32 = 0;
>>   	int ret;
>>   
>>   	spin_lock_irq(&out->composer_lock);
>> @@ -248,35 +230,19 @@ void vkms_composer_worker(struct work_struct *work)
>>   	if (!crc_pending)
>>   		return;
>>   
>> -	if (crtc_state->num_active_planes >= 1) {
>> -		act_plane = crtc_state->active_planes[0];
>> -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>> -			primary_plane_info = act_plane->frame_info;
>> -	}
>> -
>> -	if (!primary_plane_info)
>> -		return;
>> -
>>   	if (wb_pending)
>> -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
>> +	else
>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>   
>> -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>> -				    crtc_state);
>> -	if (ret) {
>> -		if (ret == -EINVAL && !wb_pending)
>> -			kvfree(vaddr_out);
>> +	if (ret)
>>   		return;
>> -	}
>> -
>> -	crc32 = compute_crc(vaddr_out, primary_plane_info);
>>   
>>   	if (wb_pending) {
>>   		drm_writeback_signal_completion(&out->wb_connector, 0);
>>   		spin_lock_irq(&out->composer_lock);
>>   		crtc_state->wb_pending = false;
>>   		spin_unlock_irq(&out->composer_lock);
>> -	} else {
>> -		kvfree(vaddr_out);
>>   	}
>>   
>>   	/*
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>> new file mode 100644
>> index 000000000000..ca4bfcac686b
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>> @@ -0,0 +1,155 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +
>> +#include <drm/drm_rect.h>
>> +#include <linux/minmax.h>
>> +
>> +#include "vkms_formats.h"
>> +
>> +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>> +{
>> +	return frame_info->offset + (y * frame_info->pitch)
>> +				  + (x * frame_info->cpp);
>> +}
>> +
>> +/*
>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>> + *
>> + * @frame_info: Buffer metadata
>> + * @x: The x(width) coordinate of the 2D buffer
>> + * @y: The y(Heigth) coordinate of the 2D buffer
>> + *
>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>> + * returns the address of the first color channel.
>> + * This function assumes the channels are packed together, i.e. a color channel
>> + * comes immediately after another in the memory. And therefore, this function
>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>> + */
>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>> +				int x, int y)
>> +{
>> +	size_t offset = pixel_offset(frame_info, x, y);
>> +
>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>> +}
>> +
>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
>> +{
>> +	int x_src = frame_info->src.x1 >> 16;
>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>> +
>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>> +}
>> +
>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>> +				 const struct vkms_frame_info *frame_info, int y)
>> +{
>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    stage_buffer->n_pixels);
>> +
>> +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		/*
>> +		 * The 257 is the "conversion ratio". This number is obtained by the
>> +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
>> +		 * the best color value in a pixel format with more possibilities.
>> +		 * A similar idea applies to others RGB color conversions.
>> +		 */
>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>> +	}
>> +}
>> +
>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>> +				 const struct vkms_frame_info *frame_info, int y)
>> +{
>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    stage_buffer->n_pixels);
>> +
>> +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
>> +		out_pixels[x].a = (u16)0xffff;
>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>> +	}
>> +}
>> +
>> +/*
>> + * The following  functions take an line of argb_u16 pixels from the
>> + * src_buffer, convert them to a specific format, and store them in the
>> + * destination.
>> + *
>> + * They are used in the `compose_active_planes` to convert and store a line
>> + * from the src_buffer to the writeback buffer.
>> + */
>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
>> +				 const struct line_buffer *src_buffer, int y)
>> +{
>> +	int x_dst = frame_info->dst.x1;
>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    src_buffer->n_pixels);
>> +
>> +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		/*
>> +		 * This sequence below is important because the format's byte order is
>> +		 * in little-endian. In the case of the ARGB8888 the memory is
>> +		 * organized this way:
>> +		 *
>> +		 * | Addr     | = blue channel
>> +		 * | Addr + 1 | = green channel
>> +		 * | Addr + 2 | = Red channel
>> +		 * | Addr + 3 | = Alpha channel
>> +		 */
>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>> +	}
>> +}
>> +
>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>> +				 const struct line_buffer *src_buffer, int y)
>> +{
>> +	int x_dst = frame_info->dst.x1;
>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>> +			    src_buffer->n_pixels);
>> +
>> +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
>> +		dst_pixels[3] = 0xff;
>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>> +	}
>> +}
>> +
>> +frame_to_line_func get_frame_to_line_function(u32 format)
>> +{
>> +	switch (format) {
>> +	case DRM_FORMAT_ARGB8888:
>> +		return &ARGB8888_to_argb_u16;
>> +	case DRM_FORMAT_XRGB8888:
>> +		return &XRGB8888_to_argb_u16;
>> +	default:
>> +		return NULL;
>> +	}
>> +}
>> +
>> +line_to_frame_func get_line_to_frame_function(u32 format)
>> +{
>> +	switch (format) {
>> +	case DRM_FORMAT_ARGB8888:
>> +		return &argb_u16_to_ARGB8888;
>> +	case DRM_FORMAT_XRGB8888:
>> +		return &argb_u16_to_XRGB8888;
>> +	default:
>> +		return NULL;
>> +	}
>> +}
>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
>> new file mode 100644
>> index 000000000000..053ca42d5b31
>> --- /dev/null
>> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
>> @@ -0,0 +1,12 @@
>> +// SPDX-License-Identifier: GPL-2.0+
>> +
>> +#ifndef _VKMS_FORMATS_H_
>> +#define _VKMS_FORMATS_H_
>> +
>> +#include "vkms_drv.h"
>> +
>> +frame_to_line_func get_frame_to_line_function(u32 format);
>> +
>> +line_to_frame_func get_line_to_frame_function(u32 format);
>> +
>> +#endif /* _VKMS_FORMATS_H_ */
>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>> index 8adbfdc05e50..7a479a714565 100644
>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>> @@ -10,6 +10,7 @@
>>   #include <drm/drm_plane_helper.h>
>>   
>>   #include "vkms_drv.h"
>> +#include "vkms_formats.h"
> ^ this line no longer applies (it needs a rebase), but I can manage it before applying to drm-misc-next

I did the rebase and didn't have any issues.

I'm using the `git://anongit.freedesktop.org/drm/drm-misc` remote. Should I
be using another git remote for vkms?

>>   
>>   static const u32 vkms_formats[] = {
>>   	DRM_FORMAT_XRGB8888,
>> @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>   	struct drm_shadow_plane_state *shadow_plane_state;
>>   	struct drm_framebuffer *fb = new_state->fb;
>>   	struct vkms_frame_info *frame_info;
>> +	u32 fmt = fb->format->format;
>>   
>>   	if (!new_state->crtc || !fb)
>>   		return;
>> @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>   	frame_info->offset = fb->offsets[0];
>>   	frame_info->pitch = fb->pitches[0];
>>   	frame_info->cpp = fb->format->cpp[0];
>> +	vkms_plane_state->plane_read = get_frame_to_line_function(fmt);
>>   }
>>   
>>   static int vkms_plane_atomic_check(struct drm_plane *plane,
>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>> index c87f6c89e7b4..d2aabb52cb46 100644
>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>> @@ -11,6 +11,7 @@
>>   #include <drm/drm_gem_shmem_helper.h>
>>   
>>   #include "vkms_drv.h"
>> +#include "vkms_formats.h"
>>   
>>   static const u32 vkms_wb_formats[] = {
>>   	DRM_FORMAT_XRGB8888,
>> @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>   	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
>>   	struct vkms_writeback_job *active_wb;
>>   	struct vkms_frame_info *wb_frame_info;
>> +	u32 wb_format = fb->format->format;
>>   
>>   	if (!conn_state)
>>   		return;
>> @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>   	crtc_state->wb_pending = true;
>>   	spin_unlock_irq(&output->composer_lock);
>>   	drm_writeback_queue_job(wb_conn, connector_state);
>> +	active_wb->wb_write = get_line_to_frame_function(wb_format);
>>   }
>>   
>>   static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
>> -- 
>> 2.30.2
>>
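
For reference, the channel arithmetic in the quoted patch can be tried outside the kernel. The following is a minimal userspace sketch, not the driver code: it uses `stdint.h` types instead of the kernel's `u8`/`u16`, the helper names `channel_u8_to_u16`, `channel_u16_to_u8` and `blend_line` are hypothetical, the field order of the stand-in `struct pixel_argb_u16` is an assumption, and the 64-bit intermediate and the `assert` on premultiplied input are choices of this sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the patch's struct pixel_argb_u16 (field order assumed). */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* Widen an 8-bit channel: (2^16 - 1) / (2^8 - 1) == 257, so 0xff -> 0xffff. */
static uint16_t channel_u8_to_u16(uint8_t c)
{
	return (uint16_t)(c * 257u);
}

/* Narrow a 16-bit channel, rounding to nearest like DIV_ROUND_CLOSEST(c, 257). */
static uint8_t channel_u16_to_u8(uint16_t c)
{
	return (uint8_t)((c + 128u) / 257u);
}

/*
 * Premultiplied blend of one channel over an opaque destination:
 *   out = src + dst * (1 - alpha), with all values scaled by 0xffff.
 * A 64-bit intermediate is used here so the products cannot wrap.
 */
static uint16_t pre_mul_blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t sum;

	/* Premultiplied input means a colour channel never exceeds alpha. */
	assert(src <= alpha);

	sum = (uint64_t)src * 0xffff + (uint64_t)dst * (0xffffu - alpha);

	return (uint16_t)((sum + 0xffff / 2) / 0xffff);
}

/* Blend n pixels of one line from `in` onto `out`; the result is opaque. */
static void blend_line(const struct pixel_argb_u16 *in,
		       struct pixel_argb_u16 *out, size_t n)
{
	for (size_t x = 0; x < n; x++) {
		out[x].a = 0xffff;
		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
	}
}
```

Because the narrowing step rounds to nearest, an 8-bit value widened with 257 and narrowed again comes back unchanged, which is what lets the writeback path reproduce an untouched 8-bit input.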

Re: [RESEND v6 2/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`

Igor Matheus Andrade Torrente <igormtorrente@gmail.com>
Details
Message ID
<4ba3957d-6358-2b4e-fb31-68a4dab6f01b@gmail.com>
In-Reply-To
<20220820110007.wk5wugdfpya4eb7w@mail.igalia.com>
DKIM signature
missing
Hi Melissa,

On 8/20/22 08:00, Melissa Wen wrote:
> On 08/19, Igor Torrente wrote:
>> Changes the name of this struct to a more meaningful name.
>> A name that represents better what this struct is about.
>>
>> Composer is the code that does the compositing of the planes.
>> This struct contains information on the frame used in the output
>> composition. Thus, vkms_frame_info is a better name to represent
>> this.
>>
>> V5: Fix a commit message typo(Melissa Wen).
>>
>> Reviewed-by: Melissa Wen <mwen@igalia.com>
>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>> ---
>>   drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
>>   drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
>>   drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
>>   3 files changed, 66 insertions(+), 65 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>> index 775b97766e08..0aded4e87e60 100644
>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>> @@ -11,11 +11,11 @@
>>   #include "vkms_drv.h"
>>   
>>   static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>> -				 const struct vkms_composer *composer)
>> +				 const struct vkms_frame_info *frame_info)
>>   {
>>   	u32 pixel;
>> -	int src_offset = composer->offset + (y * composer->pitch)
>> -				      + (x * composer->cpp);
>> +	int src_offset = frame_info->offset + (y * frame_info->pitch)
>> +					    + (x * frame_info->cpp);
>>   
>>   	pixel = *(u32 *)&buffer[src_offset];
>>   
>> @@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>>    * compute_crc - Compute CRC value on output frame
>>    *
>>    * @vaddr: address to final framebuffer
>> - * @composer: framebuffer's metadata
>> + * @frame_info: framebuffer's metadata
>>    *
>>    * returns CRC value computed using crc32 on the visible portion of
>>    * the final framebuffer at vaddr_out
>>    */
>>   static uint32_t compute_crc(const u8 *vaddr,
>> -			    const struct vkms_composer *composer)
>> +			    const struct vkms_frame_info *frame_info)
>>   {
>>   	int x, y;
>>   	u32 crc = 0, pixel = 0;
>> -	int x_src = composer->src.x1 >> 16;
>> -	int y_src = composer->src.y1 >> 16;
>> -	int h_src = drm_rect_height(&composer->src) >> 16;
>> -	int w_src = drm_rect_width(&composer->src) >> 16;
>> +	int x_src = frame_info->src.x1 >> 16;
>> +	int y_src = frame_info->src.y1 >> 16;
>> +	int h_src = drm_rect_height(&frame_info->src) >> 16;
>> +	int w_src = drm_rect_width(&frame_info->src) >> 16;
>>   
>>   	for (y = y_src; y < y_src + h_src; ++y) {
>>   		for (x = x_src; x < x_src + w_src; ++x) {
>> -			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
>> +			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>>   			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>>   		}
>>   	}
>> @@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>>    * blend - blend value at vaddr_src with value at vaddr_dst
>>    * @vaddr_dst: destination address
>>    * @vaddr_src: source address
>> - * @dst_composer: destination framebuffer's metadata
>> - * @src_composer: source framebuffer's metadata
>> + * @dst_frame_info: destination framebuffer's metadata
>> + * @src_frame_info: source framebuffer's metadata
>>    * @pixel_blend: blending equation based on plane format
>>    *
>>    * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
>> @@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>>    * pixel color values
>>    */
>>   static void blend(void *vaddr_dst, void *vaddr_src,
>> -		  struct vkms_composer *dst_composer,
>> -		  struct vkms_composer *src_composer,
>> +		  struct vkms_frame_info *dst_frame_info,
>> +		  struct vkms_frame_info *src_frame_info,
>>   		  void (*pixel_blend)(const u8 *, u8 *))
>>   {
>>   	int i, j, j_dst, i_dst;
>>   	int offset_src, offset_dst;
>>   	u8 *pixel_dst, *pixel_src;
>>   
>> -	int x_src = src_composer->src.x1 >> 16;
>> -	int y_src = src_composer->src.y1 >> 16;
>> +	int x_src = src_frame_info->src.x1 >> 16;
>> +	int y_src = src_frame_info->src.y1 >> 16;
>>   
>> -	int x_dst = src_composer->dst.x1;
>> -	int y_dst = src_composer->dst.y1;
>> -	int h_dst = drm_rect_height(&src_composer->dst);
>> -	int w_dst = drm_rect_width(&src_composer->dst);
>> +	int x_dst = src_frame_info->dst.x1;
>> +	int y_dst = src_frame_info->dst.y1;
>> +	int h_dst = drm_rect_height(&src_frame_info->dst);
>> +	int w_dst = drm_rect_width(&src_frame_info->dst);
>>   
>>   	int y_limit = y_src + h_dst;
>>   	int x_limit = x_src + w_dst;
>>   
>>   	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>>   		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
>> -			offset_dst = dst_composer->offset
>> -				     + (i_dst * dst_composer->pitch)
>> -				     + (j_dst++ * dst_composer->cpp);
>> -			offset_src = src_composer->offset
>> -				     + (i * src_composer->pitch)
>> -				     + (j * src_composer->cpp);
>> +			offset_dst = dst_frame_info->offset
>> +				     + (i_dst * dst_frame_info->pitch)
>> +				     + (j_dst++ * dst_frame_info->cpp);
>> +			offset_src = src_frame_info->offset
>> +				     + (i * src_frame_info->pitch)
>> +				     + (j * src_frame_info->cpp);
>>   
>>   			pixel_src = (u8 *)(vaddr_src + offset_src);
>>   			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
>> @@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
>>   	}
>>   }
>>   
>> -static void compose_plane(struct vkms_composer *primary_composer,
>> -			  struct vkms_composer *plane_composer,
>> +static void compose_plane(struct vkms_frame_info *primary_plane_info,
>> +			  struct vkms_frame_info *plane_frame_info,
>>   			  void *vaddr_out)
>>   {
>> -	struct drm_framebuffer *fb = &plane_composer->fb;
>> +	struct drm_framebuffer *fb = &plane_frame_info->fb;
>>   	void *vaddr;
>>   	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>>   
>> -	if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
>> +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> ^ here you are reintroducing an error where we check the primary plane
> repeatedly instead of plane_composer (renamed to plane_frame_info
> here). The issue is fixed in a later patch of this series, when you
> decouple check_iosys_map.
> But I don't mind fixing it before applying.
Should I simply delete this line in the patch? Or is there something
else to do?

> 
>>   		return;
>>   
>> -	vaddr = plane_composer->map[0].vaddr;
>> +	vaddr = plane_frame_info->map[0].vaddr;
>>   
>>   	if (fb->format->format == DRM_FORMAT_ARGB8888)
>>   		pixel_blend = &alpha_blend;
>>   	else
>>   		pixel_blend = &x_blend;
>>   
>> -	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
>> +	blend(vaddr_out, vaddr, primary_plane_info,
>> +	      plane_frame_info, pixel_blend);
>>   }
>>   
>>   static int compose_active_planes(void **vaddr_out,
>> -				 struct vkms_composer *primary_composer,
>> +				 struct vkms_frame_info *primary_plane_info,
>>   				 struct vkms_crtc_state *crtc_state)
>>   {
>> -	struct drm_framebuffer *fb = &primary_composer->fb;
>> +	struct drm_framebuffer *fb = &primary_plane_info->fb;
>>   	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>>   	const void *vaddr;
>>   	int i;
>> @@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
>>   		}
>>   	}
>>   
>> -	if (WARN_ON(iosys_map_is_null(&primary_composer->map[0])))
>> +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>>   		return -EINVAL;
>>   
>> -	vaddr = primary_composer->map[0].vaddr;
>> +	vaddr = primary_plane_info->map[0].vaddr;
>>   
>>   	memcpy(*vaddr_out, vaddr, gem_obj->size);
>>   
>> @@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
>>   	 * ((primary <- overlay) <- cursor)
>>   	 */
>>   	for (i = 1; i < crtc_state->num_active_planes; i++)
>> -		compose_plane(primary_composer,
>> -			      crtc_state->active_planes[i]->composer,
>> +		compose_plane(primary_plane_info,
>> +			      crtc_state->active_planes[i]->frame_info,
>>   			      *vaddr_out);
>>   
>>   	return 0;
>> @@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
>>   						composer_work);
>>   	struct drm_crtc *crtc = crtc_state->base.crtc;
>>   	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>> -	struct vkms_composer *primary_composer = NULL;
>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>   	struct vkms_plane_state *act_plane = NULL;
>>   	bool crc_pending, wb_pending;
>>   	void *vaddr_out = NULL;
>> @@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
>>   	if (crtc_state->num_active_planes >= 1) {
>>   		act_plane = crtc_state->active_planes[0];
>>   		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>> -			primary_composer = act_plane->composer;
>> +			primary_plane_info = act_plane->frame_info;
>>   	}
>>   
>> -	if (!primary_composer)
>> +	if (!primary_plane_info)
>>   		return;
>>   
>>   	if (wb_pending)
>>   		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>>   
>> -	ret = compose_active_planes(&vaddr_out, primary_composer,
>> +	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>>   				    crtc_state);
>>   	if (ret) {
>>   		if (ret == -EINVAL && !wb_pending)
>> @@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
>>   		return;
>>   	}
>>   
>> -	crc32 = compute_crc(vaddr_out, primary_composer);
>> +	crc32 = compute_crc(vaddr_out, primary_plane_info);
>>   
>>   	if (wb_pending) {
>>   		drm_writeback_signal_completion(&out->wb_connector, 0);
>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>> index 36fbab5989d1..5199c5f18e17 100644
>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>> @@ -27,7 +27,7 @@ struct vkms_writeback_job {
>>   	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
>>   };
>>   
>> -struct vkms_composer {
>> +struct vkms_frame_info {
>>   	struct drm_framebuffer fb;
>>   	struct drm_rect src, dst;
>>   	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
>> @@ -39,11 +39,11 @@ struct vkms_composer {
>>   /**
>>    * vkms_plane_state - Driver specific plane state
>>    * @base: base plane state
>> - * @composer: data required for composing computation
>> + * @frame_info: data required for composing computation
>>    */
>>   struct vkms_plane_state {
>>   	struct drm_shadow_plane_state base;
>> -	struct vkms_composer *composer;
>> +	struct vkms_frame_info *frame_info;
>>   };
>>   
>>   struct vkms_plane {
>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>> index d8eb674b49a6..fcae6c508f4b 100644
>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>> @@ -24,20 +24,20 @@ static struct drm_plane_state *
>>   vkms_plane_duplicate_state(struct drm_plane *plane)
>>   {
>>   	struct vkms_plane_state *vkms_state;
>> -	struct vkms_composer *composer;
>> +	struct vkms_frame_info *frame_info;
>>   
>>   	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
>>   	if (!vkms_state)
>>   		return NULL;
>>   
>> -	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
>> -	if (!composer) {
>> -		DRM_DEBUG_KMS("Couldn't allocate composer\n");
>> +	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
>> +	if (!frame_info) {
>> +		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
>>   		kfree(vkms_state);
>>   		return NULL;
>>   	}
>>   
>> -	vkms_state->composer = composer;
>> +	vkms_state->frame_info = frame_info;
>>   
>>   	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);
>>   
>> @@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>>   		/* dropping the reference we acquired in
>>   		 * vkms_primary_plane_update()
>>   		 */
>> -		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
>> -			drm_framebuffer_put(&vkms_state->composer->fb);
>> +		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
>> +			drm_framebuffer_put(&vkms_state->frame_info->fb);
>>   	}
>>   
>> -	kfree(vkms_state->composer);
>> -	vkms_state->composer = NULL;
>> +	kfree(vkms_state->frame_info);
>> +	vkms_state->frame_info = NULL;
>>   
>>   	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
>>   	kfree(vkms_state);
>> @@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>   	struct vkms_plane_state *vkms_plane_state;
>>   	struct drm_shadow_plane_state *shadow_plane_state;
>>   	struct drm_framebuffer *fb = new_state->fb;
>> -	struct vkms_composer *composer;
>> +	struct vkms_frame_info *frame_info;
>>   
>>   	if (!new_state->crtc || !fb)
>>   		return;
>> @@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>   	vkms_plane_state = to_vkms_plane_state(new_state);
>>   	shadow_plane_state = &vkms_plane_state->base;
>>   
>> -	composer = vkms_plane_state->composer;
>> -	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
>> -	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
>> -	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
>> -	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
>> -	drm_framebuffer_get(&composer->fb);
>> -	composer->offset = fb->offsets[0];
>> -	composer->pitch = fb->pitches[0];
>> -	composer->cpp = fb->format->cpp[0];
>> +	frame_info = vkms_plane_state->frame_info;
>> +	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
>> +	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
>> +	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
>> +	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
>> +	drm_framebuffer_get(&frame_info->fb);
>> +	frame_info->offset = fb->offsets[0];
>> +	frame_info->pitch = fb->pitches[0];
>> +	frame_info->cpp = fb->format->cpp[0];
>>   }
>>   
>>   static int vkms_plane_atomic_check(struct drm_plane *plane,
>> -- 
>> 2.30.2
>>

Re: [RESEND v6 2/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`

Melissa Wen <mwen@igalia.com>
Message ID
<20220822183735.fgddmurhgs472tz2@mail.igalia.com>
In-Reply-To
<4ba3957d-6358-2b4e-fb31-68a4dab6f01b@gmail.com> (view parent)
On 08/22, Igor Matheus Andrade Torrente wrote:
> Hi Melissa,
> 
> On 8/20/22 08:00, Melissa Wen wrote:
> > On 08/19, Igor Torrente wrote:
> > > Changes the name of this struct to a more meaningful one,
> > > a name that better represents what this struct is about.
> > > 
> > > Composer is the code that does the compositing of the planes.
> > > This struct contains information on the frame used in the output
> > > composition. Thus, vkms_frame_info is a better name to represent
> > > this.
> > > 
> > > V5: Fix a commit message typo (Melissa Wen).
> > > 
> > > Reviewed-by: Melissa Wen <mwen@igalia.com>
> > > Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> > > ---
> > >   drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
> > >   drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
> > >   drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
> > >   3 files changed, 66 insertions(+), 65 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > > index 775b97766e08..0aded4e87e60 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > > @@ -11,11 +11,11 @@
> > >   #include "vkms_drv.h"
> > >   static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > > -				 const struct vkms_composer *composer)
> > > +				 const struct vkms_frame_info *frame_info)
> > >   {
> > >   	u32 pixel;
> > > -	int src_offset = composer->offset + (y * composer->pitch)
> > > -				      + (x * composer->cpp);
> > > +	int src_offset = frame_info->offset + (y * frame_info->pitch)
> > > +					    + (x * frame_info->cpp);
> > >   	pixel = *(u32 *)&buffer[src_offset];
> > > @@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > >    * compute_crc - Compute CRC value on output frame
> > >    *
> > >    * @vaddr: address to final framebuffer
> > > - * @composer: framebuffer's metadata
> > > + * @frame_info: framebuffer's metadata
> > >    *
> > >    * returns CRC value computed using crc32 on the visible portion of
> > >    * the final framebuffer at vaddr_out
> > >    */
> > >   static uint32_t compute_crc(const u8 *vaddr,
> > > -			    const struct vkms_composer *composer)
> > > +			    const struct vkms_frame_info *frame_info)
> > >   {
> > >   	int x, y;
> > >   	u32 crc = 0, pixel = 0;
> > > -	int x_src = composer->src.x1 >> 16;
> > > -	int y_src = composer->src.y1 >> 16;
> > > -	int h_src = drm_rect_height(&composer->src) >> 16;
> > > -	int w_src = drm_rect_width(&composer->src) >> 16;
> > > +	int x_src = frame_info->src.x1 >> 16;
> > > +	int y_src = frame_info->src.y1 >> 16;
> > > +	int h_src = drm_rect_height(&frame_info->src) >> 16;
> > > +	int w_src = drm_rect_width(&frame_info->src) >> 16;
> > >   	for (y = y_src; y < y_src + h_src; ++y) {
> > >   		for (x = x_src; x < x_src + w_src; ++x) {
> > > -			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
> > > +			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
> > >   			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
> > >   		}
> > >   	}
> > > @@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> > >    * blend - blend value at vaddr_src with value at vaddr_dst
> > >    * @vaddr_dst: destination address
> > >    * @vaddr_src: source address
> > > - * @dst_composer: destination framebuffer's metadata
> > > - * @src_composer: source framebuffer's metadata
> > > + * @dst_frame_info: destination framebuffer's metadata
> > > + * @src_frame_info: source framebuffer's metadata
> > >    * @pixel_blend: blending equation based on plane format
> > >    *
> > >    * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> > > @@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> > >    * pixel color values
> > >    */
> > >   static void blend(void *vaddr_dst, void *vaddr_src,
> > > -		  struct vkms_composer *dst_composer,
> > > -		  struct vkms_composer *src_composer,
> > > +		  struct vkms_frame_info *dst_frame_info,
> > > +		  struct vkms_frame_info *src_frame_info,
> > >   		  void (*pixel_blend)(const u8 *, u8 *))
> > >   {
> > >   	int i, j, j_dst, i_dst;
> > >   	int offset_src, offset_dst;
> > >   	u8 *pixel_dst, *pixel_src;
> > > -	int x_src = src_composer->src.x1 >> 16;
> > > -	int y_src = src_composer->src.y1 >> 16;
> > > +	int x_src = src_frame_info->src.x1 >> 16;
> > > +	int y_src = src_frame_info->src.y1 >> 16;
> > > -	int x_dst = src_composer->dst.x1;
> > > -	int y_dst = src_composer->dst.y1;
> > > -	int h_dst = drm_rect_height(&src_composer->dst);
> > > -	int w_dst = drm_rect_width(&src_composer->dst);
> > > +	int x_dst = src_frame_info->dst.x1;
> > > +	int y_dst = src_frame_info->dst.y1;
> > > +	int h_dst = drm_rect_height(&src_frame_info->dst);
> > > +	int w_dst = drm_rect_width(&src_frame_info->dst);
> > >   	int y_limit = y_src + h_dst;
> > >   	int x_limit = x_src + w_dst;
> > >   	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
> > >   		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> > > -			offset_dst = dst_composer->offset
> > > -				     + (i_dst * dst_composer->pitch)
> > > -				     + (j_dst++ * dst_composer->cpp);
> > > -			offset_src = src_composer->offset
> > > -				     + (i * src_composer->pitch)
> > > -				     + (j * src_composer->cpp);
> > > +			offset_dst = dst_frame_info->offset
> > > +				     + (i_dst * dst_frame_info->pitch)
> > > +				     + (j_dst++ * dst_frame_info->cpp);
> > > +			offset_src = src_frame_info->offset
> > > +				     + (i * src_frame_info->pitch)
> > > +				     + (j * src_frame_info->cpp);
> > >   			pixel_src = (u8 *)(vaddr_src + offset_src);
> > >   			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> > > @@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
> > >   	}
> > >   }
> > > -static void compose_plane(struct vkms_composer *primary_composer,
> > > -			  struct vkms_composer *plane_composer,
> > > +static void compose_plane(struct vkms_frame_info *primary_plane_info,
> > > +			  struct vkms_frame_info *plane_frame_info,
> > >   			  void *vaddr_out)
> > >   {
> > > -	struct drm_framebuffer *fb = &plane_composer->fb;
> > > +	struct drm_framebuffer *fb = &plane_frame_info->fb;
> > >   	void *vaddr;
> > >   	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> > > -	if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
> > > +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> > ^ here you are reintroducing an error where we check the primary plane
> > repeatedly instead of plane_composer (renamed to plane_frame_info
> > here). The issue is fixed in a later patch of this series, when you
> > decouple check_iosys_map.
> > But I don't mind fixing it before applying.
> Should I simply delete this line in the patch? Or is there something
> else to do?

No, you just need to check the correct plane (plane_frame_info), that means:

- if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
+ if (WARN_ON(iosys_map_is_null(&plane_frame_info->map[0])))

because here you are renaming `plane_composer` to `plane_frame_info`,
and `primary_plane_info->map[0]` is already checked in the following
compose_active_planes() function.
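
As an illustration outside the kernel, here is a minimal userspace sketch of
that check. The `*_stub` types and `compose_plane_sketch()` are hypothetical
stand-ins for `struct iosys_map`, `struct vkms_frame_info` and
compose_plane(); they are not the vkms code. It assumes the caller has
already validated the primary plane, so only the plane being composed is
tested:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct iosys_map: only the mapped address. */
struct map_stub {
	void *vaddr;
};

/* Hypothetical stand-in for struct vkms_frame_info: only the first map. */
struct frame_info_stub {
	struct map_stub map[1];
};

/* Mirrors the intent of iosys_map_is_null() for the stub type. */
bool map_stub_is_null(const struct map_stub *map)
{
	return map->vaddr == NULL;
}

/*
 * Returns true when the plane would be blended and false when it is
 * skipped. The primary plane is assumed to be validated by the caller,
 * so the check targets plane_frame_info, not primary_plane_info.
 */
bool compose_plane_sketch(const struct frame_info_stub *primary_plane_info,
			  const struct frame_info_stub *plane_frame_info)
{
	(void)primary_plane_info; /* already checked by the caller */

	if (map_stub_is_null(&plane_frame_info->map[0]))
		return false;

	/* The real function would select pixel_blend and call blend() here. */
	return true;
}
```

With the check on `primary_plane_info` instead, a plane whose map is NULL
would pass the test as long as the primary plane is mapped, and its NULL
`vaddr` would then be handed on to the blend step.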

Thanks,

Melissa

> 
> > 
> > >   		return;
> > > -	vaddr = plane_composer->map[0].vaddr;
> > > +	vaddr = plane_frame_info->map[0].vaddr;
> > >   	if (fb->format->format == DRM_FORMAT_ARGB8888)
> > >   		pixel_blend = &alpha_blend;
> > >   	else
> > >   		pixel_blend = &x_blend;
> > > -	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
> > > +	blend(vaddr_out, vaddr, primary_plane_info,
> > > +	      plane_frame_info, pixel_blend);
> > >   }
> > >   static int compose_active_planes(void **vaddr_out,
> > > -				 struct vkms_composer *primary_composer,
> > > +				 struct vkms_frame_info *primary_plane_info,
> > >   				 struct vkms_crtc_state *crtc_state)
> > >   {
> > > -	struct drm_framebuffer *fb = &primary_composer->fb;
> > > +	struct drm_framebuffer *fb = &primary_plane_info->fb;
> > >   	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
> > >   	const void *vaddr;
> > >   	int i;
> > > @@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
> > >   		}
> > >   	}
> > > -	if (WARN_ON(iosys_map_is_null(&primary_composer->map[0])))
> > > +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> > >   		return -EINVAL;
> > > -	vaddr = primary_composer->map[0].vaddr;
> > > +	vaddr = primary_plane_info->map[0].vaddr;
> > >   	memcpy(*vaddr_out, vaddr, gem_obj->size);
> > > @@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
> > >   	 * ((primary <- overlay) <- cursor)
> > >   	 */
> > >   	for (i = 1; i < crtc_state->num_active_planes; i++)
> > > -		compose_plane(primary_composer,
> > > -			      crtc_state->active_planes[i]->composer,
> > > +		compose_plane(primary_plane_info,
> > > +			      crtc_state->active_planes[i]->frame_info,
> > >   			      *vaddr_out);
> > >   	return 0;
> > > @@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
> > >   						composer_work);
> > >   	struct drm_crtc *crtc = crtc_state->base.crtc;
> > >   	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> > > -	struct vkms_composer *primary_composer = NULL;
> > > +	struct vkms_frame_info *primary_plane_info = NULL;
> > >   	struct vkms_plane_state *act_plane = NULL;
> > >   	bool crc_pending, wb_pending;
> > >   	void *vaddr_out = NULL;
> > > @@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
> > >   	if (crtc_state->num_active_planes >= 1) {
> > >   		act_plane = crtc_state->active_planes[0];
> > >   		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> > > -			primary_composer = act_plane->composer;
> > > +			primary_plane_info = act_plane->frame_info;
> > >   	}
> > > -	if (!primary_composer)
> > > +	if (!primary_plane_info)
> > >   		return;
> > >   	if (wb_pending)
> > >   		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
> > > -	ret = compose_active_planes(&vaddr_out, primary_composer,
> > > +	ret = compose_active_planes(&vaddr_out, primary_plane_info,
> > >   				    crtc_state);
> > >   	if (ret) {
> > >   		if (ret == -EINVAL && !wb_pending)
> > > @@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
> > >   		return;
> > >   	}
> > > -	crc32 = compute_crc(vaddr_out, primary_composer);
> > > +	crc32 = compute_crc(vaddr_out, primary_plane_info);
> > >   	if (wb_pending) {
> > >   		drm_writeback_signal_completion(&out->wb_connector, 0);
> > > diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
> > > index 36fbab5989d1..5199c5f18e17 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_drv.h
> > > +++ b/drivers/gpu/drm/vkms/vkms_drv.h
> > > @@ -27,7 +27,7 @@ struct vkms_writeback_job {
> > >   	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
> > >   };
> > > -struct vkms_composer {
> > > +struct vkms_frame_info {
> > >   	struct drm_framebuffer fb;
> > >   	struct drm_rect src, dst;
> > >   	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
> > > @@ -39,11 +39,11 @@ struct vkms_composer {
> > >   /**
> > >    * vkms_plane_state - Driver specific plane state
> > >    * @base: base plane state
> > > - * @composer: data required for composing computation
> > > + * @frame_info: data required for composing computation
> > >    */
> > >   struct vkms_plane_state {
> > >   	struct drm_shadow_plane_state base;
> > > -	struct vkms_composer *composer;
> > > +	struct vkms_frame_info *frame_info;
> > >   };
> > >   struct vkms_plane {
> > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > index d8eb674b49a6..fcae6c508f4b 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > @@ -24,20 +24,20 @@ static struct drm_plane_state *
> > >   vkms_plane_duplicate_state(struct drm_plane *plane)
> > >   {
> > >   	struct vkms_plane_state *vkms_state;
> > > -	struct vkms_composer *composer;
> > > +	struct vkms_frame_info *frame_info;
> > >   	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
> > >   	if (!vkms_state)
> > >   		return NULL;
> > > -	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
> > > -	if (!composer) {
> > > -		DRM_DEBUG_KMS("Couldn't allocate composer\n");
> > > +	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
> > > +	if (!frame_info) {
> > > +		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
> > >   		kfree(vkms_state);
> > >   		return NULL;
> > >   	}
> > > -	vkms_state->composer = composer;
> > > +	vkms_state->frame_info = frame_info;
> > >   	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);
> > > @@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
> > >   		/* dropping the reference we acquired in
> > >   		 * vkms_primary_plane_update()
> > >   		 */
> > > -		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
> > > -			drm_framebuffer_put(&vkms_state->composer->fb);
> > > +		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
> > > +			drm_framebuffer_put(&vkms_state->frame_info->fb);
> > >   	}
> > > -	kfree(vkms_state->composer);
> > > -	vkms_state->composer = NULL;
> > > +	kfree(vkms_state->frame_info);
> > > +	vkms_state->frame_info = NULL;
> > >   	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
> > >   	kfree(vkms_state);
> > > @@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > >   	struct vkms_plane_state *vkms_plane_state;
> > >   	struct drm_shadow_plane_state *shadow_plane_state;
> > >   	struct drm_framebuffer *fb = new_state->fb;
> > > -	struct vkms_composer *composer;
> > > +	struct vkms_frame_info *frame_info;
> > >   	if (!new_state->crtc || !fb)
> > >   		return;
> > > @@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > >   	vkms_plane_state = to_vkms_plane_state(new_state);
> > >   	shadow_plane_state = &vkms_plane_state->base;
> > > -	composer = vkms_plane_state->composer;
> > > -	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
> > > -	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
> > > -	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
> > > -	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
> > > -	drm_framebuffer_get(&composer->fb);
> > > -	composer->offset = fb->offsets[0];
> > > -	composer->pitch = fb->pitches[0];
> > > -	composer->cpp = fb->format->cpp[0];
> > > +	frame_info = vkms_plane_state->frame_info;
> > > +	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
> > > +	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
> > > +	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
> > > +	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
> > > +	drm_framebuffer_get(&frame_info->fb);
> > > +	frame_info->offset = fb->offsets[0];
> > > +	frame_info->pitch = fb->pitches[0];
> > > +	frame_info->cpp = fb->format->cpp[0];
> > >   }
> > >   static int vkms_plane_atomic_check(struct drm_plane *plane,
> > > -- 
> > > 2.30.2
> > > 
> 

Re: [RESEND v6 6/9] drm: vkms: Refactor the plane composer to accept new formats

Melissa Wen <mwen@igalia.com>
Message ID
<20220822190110.u4evrujigrrcp3ud@mail.igalia.com>
In-Reply-To
<b90f2c07-18ad-411f-82ec-914974cf8d2c@gmail.com> (view parent)
On 08/22, Igor Matheus Andrade Torrente wrote:
> Hi Melissa,
> 
> On 8/20/22 07:51, Melissa Wen wrote:
> > On 08/19, Igor Torrente wrote:
> > > Currently the blend function only accepts XRGB_8888 and ARGB_8888
> > > as a color input.
> > > 
> > > This patch refactors all the functions related to the plane composition
> > > to overcome this limitation.
> > > 
> > > The pixel blending is done using the new internal format, and new handlers
> > > are being added to convert a specific format to/from this internal format.
> > > 
> > > So the blend operation depends on these handlers to convert to this common
> > > format. The blended result, if necessary, is converted to the writeback
> > > buffer format.
> > > 
> > > This patch introduces three major differences to the blend function.
> > > 1 - All the planes are blended at once.
> > > 2 - The blend calculation is done per line instead of per pixel.
> > > 3 - It is responsible for calculating the CRC and writing the writeback
> > > buffer (if necessary).
> > > 
> > > These changes allow us to allocate far less memory for the intermediate
> > > buffer used in these operations, because we no longer need to hold
> > > the entire intermediate image at once; just one line is
> > > enough.
> > > 
> > > | Memory consumption (output dimensions) |
> > > |:--------------------------------------:|
> > > |       Current      |     This patch    |
> > > |:------------------:|:-----------------:|
> > > |   Width * Height   |     2 * Width     |
> > > 
> > > Beyond memory, we also have a minor performance benefit from all
> > > these changes. Results running the IGT[1] test
> > > `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
> > > 
> > > |                 Frametime                  |
> > > |:------------------------------------------:|
> > > |  Implementation |  Current  |  This commit |
> > > |:---------------:|:---------:|:------------:|
> > > | frametime range |  9~22 ms  |    5~17 ms   |
> > > |     Average     |  11.4 ms  |    7.8 ms    |
> > > 
> > > [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
> > > 
> > > V2: Improves the performance drastically, by performing the operations
> > >      per-line and not per-pixel (Pekka Paalanen).
> > >      Minor improvements (Pekka Paalanen).
> > > V3: Changes the code to blend the planes all at once. This improves
> > >      performance, memory consumption, and removes much of the weirdness
> > >      of the V2 (Pekka Paalanen and me).
> > >      Minor improvements (Pekka Paalanen and me).
> > > V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> > > V5: Minor checkpatch fixes and the removal of TO-DO item (Melissa Wen).
> > >      Several security/robustness improvements (Pekka Paalanen).
> > >      Removes check_planes_x_bounds function and allows planes to be
> > >      partly off-screen (Pekka Paalanen).
> > > V6: Fix a mismatch of some variable sizes (Pekka Paalanen).
> > >      Several minor improvements (Pekka Paalanen).
> > > 
> > > Reported-by: kernel test robot <lkp@intel.com>
> > > Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> > > ---
> > >   Documentation/gpu/vkms.rst            |   4 -
> > >   drivers/gpu/drm/vkms/Makefile         |   1 +
> > >   drivers/gpu/drm/vkms/vkms_composer.c  | 320 ++++++++++++--------------
> > >   drivers/gpu/drm/vkms/vkms_formats.c   | 155 +++++++++++++
> > >   drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
> > >   drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
> > >   drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
> > >   7 files changed, 317 insertions(+), 181 deletions(-)
> > >   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
> > >   create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > > 
> > > diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
> > > index 973e2d43108b..a49e4ae92653 100644
> > > --- a/Documentation/gpu/vkms.rst
> > > +++ b/Documentation/gpu/vkms.rst
> > > @@ -118,10 +118,6 @@ Add Plane Features
> > >   There's lots of plane features we could add support for:
> > > -- Clearing primary plane: clear primary plane before plane composition (at the
> > > -  start) for correctness of pixel blend ops. It also guarantees alpha channel
> > > -  is cleared in the target buffer for stable crc. [Good to get started]
> > > -
> > >   - ARGB format on primary plane: blend the primary plane into background with
> > >     translucent alpha.
> > > diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> > > index 72f779cbfedd..1b28a6a32948 100644
> > > --- a/drivers/gpu/drm/vkms/Makefile
> > > +++ b/drivers/gpu/drm/vkms/Makefile
> > > @@ -3,6 +3,7 @@ vkms-y := \
> > >   	vkms_drv.o \
> > >   	vkms_plane.o \
> > >   	vkms_output.o \
> > > +	vkms_formats.o \
> > >   	vkms_crtc.o \
> > >   	vkms_composer.o \
> > >   	vkms_writeback.o
> > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > > index b9fb408e8973..5b1a8bdd8268 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > > @@ -7,204 +7,188 @@
> > >   #include <drm/drm_fourcc.h>
> > >   #include <drm/drm_gem_framebuffer_helper.h>
> > >   #include <drm/drm_vblank.h>
> > > +#include <linux/minmax.h>
> > >   #include "vkms_drv.h"
> > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > > -				 const struct vkms_frame_info *frame_info)
> > > +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> > >   {
> > > -	u32 pixel;
> > > -	int src_offset = frame_info->offset + (y * frame_info->pitch)
> > > -					    + (x * frame_info->cpp);
> > > +	u32 new_color;
> > > -	pixel = *(u32 *)&buffer[src_offset];
> > > +	new_color = (src * 0xffff + dst * (0xffff - alpha));
> > > -	return pixel;
> > > +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
> > >   }
> > >   /**
> > > - * compute_crc - Compute CRC value on output frame
> > > + * pre_mul_alpha_blend - alpha blending equation
> > > + * @src_frame_info: source framebuffer's metadata
> > > + * @stage_buffer: The line with the pixels from src_plane
> > > + * @output_buffer: A line buffer that receives all the blends output
> > >    *
> > > - * @vaddr: address to final framebuffer
> > > - * @frame_info: framebuffer's metadata
> > > + * Using the information from the `frame_info`, this blends only the
> > > + * necessary pixels from the `stage_buffer` to the `output_buffer`
> > > + * using premultiplied blend formula.
> > >    *
> > > - * returns CRC value computed using crc32 on the visible portion of
> > > - * the final framebuffer at vaddr_out
> > > + * The current DRM assumption is that pixel color values have been already
> > > + * pre-multiplied with the alpha channel values. See more
> > > + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> > > + * completely opaque background.
> > >    */
> > > -static uint32_t compute_crc(const u8 *vaddr,
> > > -			    const struct vkms_frame_info *frame_info)
> > > +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > > +				struct line_buffer *stage_buffer,
> > > +				struct line_buffer *output_buffer)
> > >   {
> > > -	int x, y;
> > > -	u32 crc = 0, pixel = 0;
> > > -	int x_src = frame_info->src.x1 >> 16;
> > > -	int y_src = frame_info->src.y1 >> 16;
> > > -	int h_src = drm_rect_height(&frame_info->src) >> 16;
> > > -	int w_src = drm_rect_width(&frame_info->src) >> 16;
> > > -
> > > -	for (y = y_src; y < y_src + h_src; ++y) {
> > > -		for (x = x_src; x < x_src + w_src; ++x) {
> > > -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
> > > -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
> > > -		}
> > > +	int x_dst = frame_info->dst.x1;
> > > +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > > +	struct pixel_argb_u16 *in = stage_buffer->pixels;
> > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > +			    stage_buffer->n_pixels);
> > > +
> > > +	for (int x = 0; x < x_limit; x++) {
> > > +		out[x].a = (u16)0xffff;
> > > +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > > +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > > +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > >   	}
> > > -
> > > -	return crc;
> > >   }
> > > -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
> > > +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
> > >   {
> > > -	u32 pre_blend;
> > > -	u8 new_color;
> > > -
> > > -	pre_blend = (src * 255 + dst * (255 - alpha));
> > > -
> > > -	/* Faster div by 255 */
> > > -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
> > > +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
> > > +		return true;
> > > -	return new_color;
> > > +	return false;
> > >   }
> > >   /**
> > > - * alpha_blend - alpha blending equation
> > > - * @argb_src: src pixel on premultiplied alpha mode
> > > - * @argb_dst: dst pixel completely opaque
> > > + * @wb_frame_info: The writeback frame buffer metadata
> > > + * @crtc_state: The crtc state
> > > + * @crc32: The crc output of the final frame
> > > + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
> > > + * @stage_buffer: The line with the pixels from plane being blend to the output
> > >    *
> > > - * blend pixels using premultiplied blend formula. The current DRM assumption
> > > - * is that pixel color values have been already pre-multiplied with the alpha
> > > - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
> > > - * formula assumes a completely opaque background.
> > > + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
> > > + * from all planes, calculates the crc32 of the output from the former step,
> > > + * and, if necessary, convert and store the output to the writeback buffer.
> > >    */
> > > -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
> > > +static void blend(struct vkms_writeback_job *wb,
> > > +		  struct vkms_crtc_state *crtc_state,
> > > +		  u32 *crc32, struct line_buffer *stage_buffer,
> > > +		  struct line_buffer *output_buffer, size_t row_size)
> > >   {
> > > -	u8 alpha;
> > > +	struct vkms_plane_state **plane = crtc_state->active_planes;
> > > +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
> > > +	u32 n_active_planes = crtc_state->num_active_planes;
> > > +
> > > +	int y_dst = primary_plane_info->dst.y1;
> > > +	int h_dst = drm_rect_height(&primary_plane_info->dst);
> > > +	int y_limit = y_dst + h_dst;
> > > +
> > > +	for (size_t y = y_dst; y < y_limit; y++) {
> > > +		plane[0]->plane_read(output_buffer, primary_plane_info, y);
> > > +
> > > +		/* If there are other planes besides primary, we consider the active
> > > +		 * planes should be in z-order and compose them associatively:
> > > +		 * ((primary <- overlay) <- cursor)
> > > +		 */
> > > +		for (size_t i = 1; i < n_active_planes; i++) {
> > > +			if (!check_y_limit(plane[i]->frame_info, y))
> > > +				continue;
> > > +
> > > +			plane[i]->plane_read(stage_buffer, plane[i]->frame_info, y);
> > > +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > > +					    output_buffer);
> > > +		}
> > > +
> > > +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
> > > -	alpha = argb_src[3];
> > > -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
> > > -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
> > > -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
> > > +		if (wb)
> > > +			wb->wb_write(&wb->wb_frame_info, output_buffer, y);
> > > +	}
> > >   }
> > > -/**
> > > - * x_blend - blending equation that ignores the pixel alpha
> > > - *
> > > - * overwrites RGB color value from src pixel to dst pixel.
> > > - */
> > > -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> > > +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
> > > +			      struct vkms_writeback_job *active_wb)
> > >   {
> > > -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
> > > +	struct vkms_plane_state **planes = crtc_state->active_planes;
> > > +	u32 n_active_planes = crtc_state->num_active_planes;
> > > +
> > > +	for (size_t i = 0; i < n_active_planes; i++)
> > > +		if (!planes[i]->plane_read)
> > > +			return -1;
> > > +
> > > +	if (active_wb && !active_wb->wb_write)
> > > +		return -1;
> > > +
> > > +	return 0;
> > >   }
> > > -/**
> > > - * blend - blend value at vaddr_src with value at vaddr_dst
> > > - * @vaddr_dst: destination address
> > > - * @vaddr_src: source address
> > > - * @dst_frame_info: destination framebuffer's metadata
> > > - * @src_frame_info: source framebuffer's metadata
> > > - * @pixel_blend: blending equation based on plane format
> > > - *
> > > - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> > > - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
> > > - * and clearing alpha channel to an completely opaque background. This function
> > > - * uses buffer's metadata to locate the new composite values at vaddr_dst.
> > > - *
> > > - * TODO: completely clear the primary plane (a = 0xff) before starting to blend
> > > - * pixel color values
> > > - */
> > > -static void blend(void *vaddr_dst, void *vaddr_src,
> > > -		  struct vkms_frame_info *dst_frame_info,
> > > -		  struct vkms_frame_info *src_frame_info,
> > > -		  void (*pixel_blend)(const u8 *, u8 *))
> > > +static int compose_active_planes(struct vkms_writeback_job *active_wb,
> > > +				 struct vkms_crtc_state *crtc_state,
> > > +				 u32 *crc32)
> > >   {
> > > -	int i, j, j_dst, i_dst;
> > > -	int offset_src, offset_dst;
> > > -	u8 *pixel_dst, *pixel_src;
> > > -
> > > -	int x_src = src_frame_info->src.x1 >> 16;
> > > -	int y_src = src_frame_info->src.y1 >> 16;
> > > -
> > > -	int x_dst = src_frame_info->dst.x1;
> > > -	int y_dst = src_frame_info->dst.y1;
> > > -	int h_dst = drm_rect_height(&src_frame_info->dst);
> > > -	int w_dst = drm_rect_width(&src_frame_info->dst);
> > > -
> > > -	int y_limit = y_src + h_dst;
> > > -	int x_limit = x_src + w_dst;
> > > -
> > > -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
> > > -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> > > -			offset_dst = dst_frame_info->offset
> > > -				     + (i_dst * dst_frame_info->pitch)
> > > -				     + (j_dst++ * dst_frame_info->cpp);
> > > -			offset_src = src_frame_info->offset
> > > -				     + (i * src_frame_info->pitch)
> > > -				     + (j * src_frame_info->cpp);
> > > -
> > > -			pixel_src = (u8 *)(vaddr_src + offset_src);
> > > -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> > > -			pixel_blend(pixel_src, pixel_dst);
> > > -			/* clearing alpha channel (0xff)*/
> > > -			pixel_dst[3] = 0xff;
> > > -		}
> > > -		i_dst++;
> > > +	size_t line_width, pixel_size = sizeof(struct pixel_argb_u16);
> > > +	struct vkms_frame_info *primary_plane_info = NULL;
> > > +	struct line_buffer output_buffer, stage_buffer;
> > > +	struct vkms_plane_state *act_plane = NULL;
> > > +	int ret = 0;
> > > +
> > > +	/*
> > > +	 * This check exists so we can call `crc32_le` for the entire line
> > > +	 * instead of doing it for each channel of each pixel in case
> > > +	 * `struct pixel_argb_u16` had any gap added by the compiler
> > > +	 * between the struct fields.
> > > +	 */
> > > +	static_assert(sizeof(struct pixel_argb_u16) == 8);
> > > +
> > > +	if (crtc_state->num_active_planes >= 1) {
> > > +		act_plane = crtc_state->active_planes[0];
> > > +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> > > +			primary_plane_info = act_plane->frame_info;
> > >   	}
> > > -}
> > > -static void compose_plane(struct vkms_frame_info *primary_plane_info,
> > > -			  struct vkms_frame_info *plane_frame_info,
> > > -			  void *vaddr_out)
> > > -{
> > > -	struct drm_framebuffer *fb = plane_frame_info->fb;
> > > -	void *vaddr;
> > > -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> > > +	if (!primary_plane_info)
> > > +		return -EINVAL;
> > >   	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> > > -		return;
> > > +		return -EINVAL;
> > > -	vaddr = plane_frame_info->map[0].vaddr;
> > > +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
> > > +		return -EINVAL;
> > > -	if (fb->format->format == DRM_FORMAT_ARGB8888)
> > > -		pixel_blend = &alpha_blend;
> > > -	else
> > > -		pixel_blend = &x_blend;
> > > +	line_width = drm_rect_width(&primary_plane_info->dst);
> > > +	stage_buffer.n_pixels = line_width;
> > > +	output_buffer.n_pixels = line_width;
> > > -	blend(vaddr_out, vaddr, primary_plane_info,
> > > -	      plane_frame_info, pixel_blend);
> > > -}
> > > +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> > > +	if (!stage_buffer.pixels) {
> > > +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> > > +		return -ENOMEM;
> > > +	}
> > > -static int compose_active_planes(void **vaddr_out,
> > > -				 struct vkms_frame_info *primary_plane_info,
> > > -				 struct vkms_crtc_state *crtc_state)
> > > -{
> > > -	struct drm_framebuffer *fb = primary_plane_info->fb;
> > > -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
> > > -	const void *vaddr;
> > > -	int i;
> > > -
> > > -	if (!*vaddr_out) {
> > > -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
> > > -		if (!*vaddr_out) {
> > > -			DRM_ERROR("Cannot allocate memory for output frame.");
> > > -			return -ENOMEM;
> > > -		}
> > > +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> > > +	if (!output_buffer.pixels) {
> > > +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> > > +		ret = -ENOMEM;
> > > +		goto free_stage_buffer;
> > >   	}
> > > -	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> > > -		return -EINVAL;
> > > +	if (active_wb) {
> > > +		struct vkms_frame_info *wb_frame_info = &active_wb->wb_frame_info;
> > > -	vaddr = primary_plane_info->map[0].vaddr;
> > > +		wb_frame_info->src = primary_plane_info->src;
> > > +		wb_frame_info->dst = primary_plane_info->dst;
> > > +	}
> > > -	memcpy(*vaddr_out, vaddr, gem_obj->size);
> > > +	blend(active_wb, crtc_state, crc32, &stage_buffer,
> > > +	      &output_buffer, line_width * pixel_size);
> > > -	/* If there are other planes besides primary, we consider the active
> > > -	 * planes should be in z-order and compose them associatively:
> > > -	 * ((primary <- overlay) <- cursor)
> > > -	 */
> > > -	for (i = 1; i < crtc_state->num_active_planes; i++)
> > > -		compose_plane(primary_plane_info,
> > > -			      crtc_state->active_planes[i]->frame_info,
> > > -			      *vaddr_out);
> > > +	kvfree(output_buffer.pixels);
> > > +free_stage_buffer:
> > > +	kvfree(stage_buffer.pixels);
> > > -	return 0;
> > > +	return ret;
> > >   }
> > >   /**
> > > @@ -222,13 +206,11 @@ void vkms_composer_worker(struct work_struct *work)
> > >   						struct vkms_crtc_state,
> > >   						composer_work);
> > >   	struct drm_crtc *crtc = crtc_state->base.crtc;
> > > +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
> > >   	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> > > -	struct vkms_frame_info *primary_plane_info = NULL;
> > > -	struct vkms_plane_state *act_plane = NULL;
> > >   	bool crc_pending, wb_pending;
> > > -	void *vaddr_out = NULL;
> > > -	u32 crc32 = 0;
> > >   	u64 frame_start, frame_end;
> > > +	u32 crc32 = 0;
> > >   	int ret;
> > >   	spin_lock_irq(&out->composer_lock);
> > > @@ -248,35 +230,19 @@ void vkms_composer_worker(struct work_struct *work)
> > >   	if (!crc_pending)
> > >   		return;
> > > -	if (crtc_state->num_active_planes >= 1) {
> > > -		act_plane = crtc_state->active_planes[0];
> > > -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> > > -			primary_plane_info = act_plane->frame_info;
> > > -	}
> > > -
> > > -	if (!primary_plane_info)
> > > -		return;
> > > -
> > >   	if (wb_pending)
> > > -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
> > > +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
> > > +	else
> > > +		ret = compose_active_planes(NULL, crtc_state, &crc32);
> > > -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
> > > -				    crtc_state);
> > > -	if (ret) {
> > > -		if (ret == -EINVAL && !wb_pending)
> > > -			kvfree(vaddr_out);
> > > +	if (ret)
> > >   		return;
> > > -	}
> > > -
> > > -	crc32 = compute_crc(vaddr_out, primary_plane_info);
> > >   	if (wb_pending) {
> > >   		drm_writeback_signal_completion(&out->wb_connector, 0);
> > >   		spin_lock_irq(&out->composer_lock);
> > >   		crtc_state->wb_pending = false;
> > >   		spin_unlock_irq(&out->composer_lock);
> > > -	} else {
> > > -		kvfree(vaddr_out);
> > >   	}
> > >   	/*
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > new file mode 100644
> > > index 000000000000..ca4bfcac686b
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > @@ -0,0 +1,155 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > > +
> > > +#include <drm/drm_rect.h>
> > > +#include <linux/minmax.h>
> > > +
> > > +#include "vkms_formats.h"
> > > +
> > > +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > +{
> > > +	return frame_info->offset + (y * frame_info->pitch)
> > > +				  + (x * frame_info->cpp);
> > > +}
> > > +
> > > +/*
> > > + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
> > > + *
> > > + * @frame_info: Buffer metadata
> > > + * @x: The x(width) coordinate of the 2D buffer
> > > + * @y: The y(height) coordinate of the 2D buffer
> > > + *
> > > + * Takes the information stored in the frame_info, a pair of coordinates, and
> > > + * returns the address of the first color channel.
> > > + * This function assumes the channels are packed together, i.e. a color channel
> > > + * comes immediately after another in the memory. And therefore, this function
> > > + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > > + */
> > > +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > +				int x, int y)
> > > +{
> > > +	size_t offset = pixel_offset(frame_info, x, y);
> > > +
> > > +	return (u8 *)frame_info->map[0].vaddr + offset;
> > > +}
> > > +
> > > +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > > +{
> > > +	int x_src = frame_info->src.x1 >> 16;
> > > +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> > > +
> > > +	return packed_pixels_addr(frame_info, x_src, y_src);
> > > +}
> > > +
> > > +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> > > +				 const struct vkms_frame_info *frame_info, int y)
> > > +{
> > > +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > +			    stage_buffer->n_pixels);
> > > +
> > > +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
> > > +		/*
> > > +		 * The 257 is the "conversion ratio". This number is obtained by the
> > > +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> > > +		 * the best color value in a pixel format with more possibilities.
> > > +		 * A similar idea applies to others RGB color conversions.
> > > +		 */
> > > +		out_pixels[x].a = (u16)src_pixels[3] * 257;
> > > +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> > > +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> > > +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> > > +	}
> > > +}
> > > +
> > > +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> > > +				 const struct vkms_frame_info *frame_info, int y)
> > > +{
> > > +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > +			    stage_buffer->n_pixels);
> > > +
> > > +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
> > > +		out_pixels[x].a = (u16)0xffff;
> > > +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> > > +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> > > +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> > > +	}
> > > +}
> > > +
> > > +/*
> > > + * The following functions take a line of argb_u16 pixels from the
> > > + * src_buffer, convert them to a specific format, and store them in the
> > > + * destination.
> > > + *
> > > + * They are used in the `compose_active_planes` to convert and store a line
> > > + * from the src_buffer to the writeback buffer.
> > > + */
> > > +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
> > > +				 const struct line_buffer *src_buffer, int y)
> > > +{
> > > +	int x_dst = frame_info->dst.x1;
> > > +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > +			    src_buffer->n_pixels);
> > > +
> > > +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
> > > +		/*
> > > +		 * This sequence below is important because the format's byte order is
> > > +		 * in little-endian. In the case of the ARGB8888 the memory is
> > > +		 * organized this way:
> > > +		 *
> > > +		 * | Addr     | = blue channel
> > > +		 * | Addr + 1 | = green channel
> > > +		 * | Addr + 2 | = Red channel
> > > +		 * | Addr + 3 | = Alpha channel
> > > +		 */
> > > +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
> > > +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> > > +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> > > +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> > > +	}
> > > +}
> > > +
> > > +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> > > +				 const struct line_buffer *src_buffer, int y)
> > > +{
> > > +	int x_dst = frame_info->dst.x1;
> > > +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > +			    src_buffer->n_pixels);
> > > +
> > > +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
> > > +		dst_pixels[3] = 0xff;
> > > +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> > > +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> > > +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> > > +	}
> > > +}
> > > +
> > > +frame_to_line_func get_frame_to_line_function(u32 format)
> > > +{
> > > +	switch (format) {
> > > +	case DRM_FORMAT_ARGB8888:
> > > +		return &ARGB8888_to_argb_u16;
> > > +	case DRM_FORMAT_XRGB8888:
> > > +		return &XRGB8888_to_argb_u16;
> > > +	default:
> > > +		return NULL;
> > > +	}
> > > +}
> > > +
> > > +line_to_frame_func get_line_to_frame_function(u32 format)
> > > +{
> > > +	switch (format) {
> > > +	case DRM_FORMAT_ARGB8888:
> > > +		return &argb_u16_to_ARGB8888;
> > > +	case DRM_FORMAT_XRGB8888:
> > > +		return &argb_u16_to_XRGB8888;
> > > +	default:
> > > +		return NULL;
> > > +	}
> > > +}
> > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> > > new file mode 100644
> > > index 000000000000..053ca42d5b31
> > > --- /dev/null
> > > +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> > > @@ -0,0 +1,12 @@
> > > +// SPDX-License-Identifier: GPL-2.0+
> > > +
> > > +#ifndef _VKMS_FORMATS_H_
> > > +#define _VKMS_FORMATS_H_
> > > +
> > > +#include "vkms_drv.h"
> > > +
> > > +frame_to_line_func get_frame_to_line_function(u32 format);
> > > +
> > > +line_to_frame_func get_line_to_frame_function(u32 format);
> > > +
> > > +#endif /* _VKMS_FORMATS_H_ */
> > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > index 8adbfdc05e50..7a479a714565 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > @@ -10,6 +10,7 @@
> > >   #include <drm/drm_plane_helper.h>
> > >   #include "vkms_drv.h"
> > > +#include "vkms_formats.h"
> > ^ this line no longer applies (the patch needs a rebase), but I can handle it before applying to drm-misc-next
> 
> I did the rebase and didn't have any issues.
> 
> I'm using `git://anongit.freedesktop.org/drm/drm-misc` remote. Should I be
> using another git remote for vkms?

Hmm... the repository is correct. Maybe double-check that you are rebasing
on top of the `drm-misc-next` branch and that the branch is up to date.

For reference, I tried to apply your series on top of this commit:
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ee50b00244086453dfb7076e4b80214948cd0507

Also, compare the line right above `#include "vkms_drv.h"` in your diff with
the same line in the current file here:
https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/vkms/vkms_plane.c#n9
They are not the same.

Can you double check it, please?

Thanks,

Melissa
> 
> > >   static const u32 vkms_formats[] = {
> > >   	DRM_FORMAT_XRGB8888,
> > > @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > >   	struct drm_shadow_plane_state *shadow_plane_state;
> > >   	struct drm_framebuffer *fb = new_state->fb;
> > >   	struct vkms_frame_info *frame_info;
> > > +	u32 fmt = fb->format->format;
> > >   	if (!new_state->crtc || !fb)
> > >   		return;
> > > @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > >   	frame_info->offset = fb->offsets[0];
> > >   	frame_info->pitch = fb->pitches[0];
> > >   	frame_info->cpp = fb->format->cpp[0];
> > > +	vkms_plane_state->plane_read = get_frame_to_line_function(fmt);
> > >   }
> > >   static int vkms_plane_atomic_check(struct drm_plane *plane,
> > > diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> > > index c87f6c89e7b4..d2aabb52cb46 100644
> > > --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> > > +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> > > @@ -11,6 +11,7 @@
> > >   #include <drm/drm_gem_shmem_helper.h>
> > >   #include "vkms_drv.h"
> > > +#include "vkms_formats.h"
> > >   static const u32 vkms_wb_formats[] = {
> > >   	DRM_FORMAT_XRGB8888,
> > > @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> > >   	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
> > >   	struct vkms_writeback_job *active_wb;
> > >   	struct vkms_frame_info *wb_frame_info;
> > > +	u32 wb_format = fb->format->format;
> > >   	if (!conn_state)
> > >   		return;
> > > @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> > >   	crtc_state->wb_pending = true;
> > >   	spin_unlock_irq(&output->composer_lock);
> > >   	drm_writeback_queue_job(wb_conn, connector_state);
> > > +	active_wb->wb_write = get_line_to_frame_function(wb_format);
> > >   }
> > >   static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
> > > -- 
> > > 2.30.2
> > > 
> 
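
Editor's note on the quoted composer hunk above: the premultiplied blend in
`pre_mul_blend_channel()` and `pre_mul_alpha_blend()` can be tried outside the
kernel with a minimal userspace sketch like the one below. This is not the
vkms code. The field order of `struct pixel_argb_u16`, the helper names
`blend_channel()` and `blend_line()`, and the use of a 64-bit intermediate
(so that inputs that are not properly premultiplied cannot overflow the sum)
are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout: four 16-bit channels, 8 bytes in total. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * out = src + dst * (1 - alpha), with every value scaled to 0..0xffff.
 * src is expected to be premultiplied by alpha already.
 * Adding half the divisor before dividing rounds to the nearest integer.
 */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	uint64_t sum = (uint64_t)src * 0xffffu +
		       (uint64_t)dst * (0xffffu - alpha);

	return (uint16_t)((sum + 0xffffu / 2) / 0xffffu);
}

/* Blend n pixels of `in` over `out`; the result is always fully opaque. */
static void blend_line(const struct pixel_argb_u16 *in,
		       struct pixel_argb_u16 *out, size_t n)
{
	assert(in && out);

	for (size_t x = 0; x < n; x++) {
		out[x].a = 0xffff;
		out[x].r = blend_channel(in[x].r, out[x].r, in[x].a);
		out[x].g = blend_channel(in[x].g, out[x].g, in[x].a);
		out[x].b = blend_channel(in[x].b, out[x].b, in[x].a);
	}
}
```

With a fully opaque source pixel the destination is replaced, and with a
fully transparent one (all channels zero) the destination color is kept.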

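Editor's note on the "conversion ratio" comment in the quoted vkms_formats.c
hunk above: 257 is (2^16 - 1) / (2^8 - 1), so multiplying an 8-bit channel by
257 maps 0xff exactly to 0xffff. A small userspace sketch of the round trip
follows. The helper names are hypothetical, and the rounding division is
written out by hand as a plain-C stand-in for the kernel's
`DIV_ROUND_CLOSEST()` on unsigned values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widen an 8-bit channel to 16 bits: 0x00 -> 0x0000, 0xff -> 0xffff. */
static uint16_t chan_u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257u);
}

/* Narrow a 16-bit channel to 8 bits, rounding to the nearest value. */
static uint8_t chan_u16_to_u8(uint16_t v)
{
	return (uint8_t)((v + 257u / 2) / 257u);
}

/* Convert a whole line of 8-bit channel values to 16 bits. */
static void line_u8_to_u16(const uint8_t *src, uint16_t *dst, size_t n)
{
	assert(src && dst);

	for (size_t i = 0; i < n; i++)
		dst[i] = chan_u8_to_u16(src[i]);
}
```

Because the narrowing step rounds to nearest, every 8-bit value survives a
widen-then-narrow round trip unchanged.
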
Re: [RESEND v6 2/9] drm: vkms: Rename `vkms_composer` to `vkms_frame_info`

Igor Matheus Andrade Torrente <igormtorrente@gmail.com>
Message ID
<f360ee0f-6e34-7f32-7b4f-d608833a335e@gmail.com>
In-Reply-To
<20220822183735.fgddmurhgs472tz2@mail.igalia.com>
On 8/22/22 15:37, Melissa Wen wrote:
> On 08/22, Igor Matheus Andrade Torrente wrote:
>> Hi Melissa,
>>
>> On 8/20/22 08:00, Melissa Wen wrote:
>>> On 08/19, Igor Torrente wrote:
>>>> Changes the name of this struct to a more meaningful name.
>>>> A name that represents better what this struct is about.
>>>>
>>>> Composer is the code that does the compositing of the planes.
>>>> This struct contains information on the frame used in the output
>>>> composition. Thus, vkms_frame_info is a better name to represent
>>>> this.
>>>>
>>>> V5: Fix a commit message typo(Melissa Wen).
>>>>
>>>> Reviewed-by: Melissa Wen <mwen@igalia.com>
>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>> ---
>>>>    drivers/gpu/drm/vkms/vkms_composer.c | 87 ++++++++++++++--------------
>>>>    drivers/gpu/drm/vkms/vkms_drv.h      |  6 +-
>>>>    drivers/gpu/drm/vkms/vkms_plane.c    | 38 ++++++------
>>>>    3 files changed, 66 insertions(+), 65 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index 775b97766e08..0aded4e87e60 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -11,11 +11,11 @@
>>>>    #include "vkms_drv.h"
>>>>    static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>>>> -				 const struct vkms_composer *composer)
>>>> +				 const struct vkms_frame_info *frame_info)
>>>>    {
>>>>    	u32 pixel;
>>>> -	int src_offset = composer->offset + (y * composer->pitch)
>>>> -				      + (x * composer->cpp);
>>>> +	int src_offset = frame_info->offset + (y * frame_info->pitch)
>>>> +					    + (x * frame_info->cpp);
>>>>    	pixel = *(u32 *)&buffer[src_offset];
>>>> @@ -26,24 +26,24 @@ static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>>>>     * compute_crc - Compute CRC value on output frame
>>>>     *
>>>>     * @vaddr: address to final framebuffer
>>>> - * @composer: framebuffer's metadata
>>>> + * @frame_info: framebuffer's metadata
>>>>     *
>>>>     * returns CRC value computed using crc32 on the visible portion of
>>>>     * the final framebuffer at vaddr_out
>>>>     */
>>>>    static uint32_t compute_crc(const u8 *vaddr,
>>>> -			    const struct vkms_composer *composer)
>>>> +			    const struct vkms_frame_info *frame_info)
>>>>    {
>>>>    	int x, y;
>>>>    	u32 crc = 0, pixel = 0;
>>>> -	int x_src = composer->src.x1 >> 16;
>>>> -	int y_src = composer->src.y1 >> 16;
>>>> -	int h_src = drm_rect_height(&composer->src) >> 16;
>>>> -	int w_src = drm_rect_width(&composer->src) >> 16;
>>>> +	int x_src = frame_info->src.x1 >> 16;
>>>> +	int y_src = frame_info->src.y1 >> 16;
>>>> +	int h_src = drm_rect_height(&frame_info->src) >> 16;
>>>> +	int w_src = drm_rect_width(&frame_info->src) >> 16;
>>>>    	for (y = y_src; y < y_src + h_src; ++y) {
>>>>    		for (x = x_src; x < x_src + w_src; ++x) {
>>>> -			pixel = get_pixel_from_buffer(x, y, vaddr, composer);
>>>> +			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>>>>    			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>>>>    		}
>>>>    	}
>>>> @@ -98,8 +98,8 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>>>>     * blend - blend value at vaddr_src with value at vaddr_dst
>>>>     * @vaddr_dst: destination address
>>>>     * @vaddr_src: source address
>>>> - * @dst_composer: destination framebuffer's metadata
>>>> - * @src_composer: source framebuffer's metadata
>>>> + * @dst_frame_info: destination framebuffer's metadata
>>>> + * @src_frame_info: source framebuffer's metadata
>>>>     * @pixel_blend: blending equation based on plane format
>>>>     *
>>>>     * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
>>>> @@ -111,33 +111,33 @@ static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>>>>     * pixel color values
>>>>     */
>>>>    static void blend(void *vaddr_dst, void *vaddr_src,
>>>> -		  struct vkms_composer *dst_composer,
>>>> -		  struct vkms_composer *src_composer,
>>>> +		  struct vkms_frame_info *dst_frame_info,
>>>> +		  struct vkms_frame_info *src_frame_info,
>>>>    		  void (*pixel_blend)(const u8 *, u8 *))
>>>>    {
>>>>    	int i, j, j_dst, i_dst;
>>>>    	int offset_src, offset_dst;
>>>>    	u8 *pixel_dst, *pixel_src;
>>>> -	int x_src = src_composer->src.x1 >> 16;
>>>> -	int y_src = src_composer->src.y1 >> 16;
>>>> +	int x_src = src_frame_info->src.x1 >> 16;
>>>> +	int y_src = src_frame_info->src.y1 >> 16;
>>>> -	int x_dst = src_composer->dst.x1;
>>>> -	int y_dst = src_composer->dst.y1;
>>>> -	int h_dst = drm_rect_height(&src_composer->dst);
>>>> -	int w_dst = drm_rect_width(&src_composer->dst);
>>>> +	int x_dst = src_frame_info->dst.x1;
>>>> +	int y_dst = src_frame_info->dst.y1;
>>>> +	int h_dst = drm_rect_height(&src_frame_info->dst);
>>>> +	int w_dst = drm_rect_width(&src_frame_info->dst);
>>>>    	int y_limit = y_src + h_dst;
>>>>    	int x_limit = x_src + w_dst;
>>>>    	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>>>>    		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
>>>> -			offset_dst = dst_composer->offset
>>>> -				     + (i_dst * dst_composer->pitch)
>>>> -				     + (j_dst++ * dst_composer->cpp);
>>>> -			offset_src = src_composer->offset
>>>> -				     + (i * src_composer->pitch)
>>>> -				     + (j * src_composer->cpp);
>>>> +			offset_dst = dst_frame_info->offset
>>>> +				     + (i_dst * dst_frame_info->pitch)
>>>> +				     + (j_dst++ * dst_frame_info->cpp);
>>>> +			offset_src = src_frame_info->offset
>>>> +				     + (i * src_frame_info->pitch)
>>>> +				     + (j * src_frame_info->cpp);
>>>>    			pixel_src = (u8 *)(vaddr_src + offset_src);
>>>>    			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
>>>> @@ -149,32 +149,33 @@ static void blend(void *vaddr_dst, void *vaddr_src,
>>>>    	}
>>>>    }
>>>> -static void compose_plane(struct vkms_composer *primary_composer,
>>>> -			  struct vkms_composer *plane_composer,
>>>> +static void compose_plane(struct vkms_frame_info *primary_plane_info,
>>>> +			  struct vkms_frame_info *plane_frame_info,
>>>>    			  void *vaddr_out)
>>>>    {
>>>> -	struct drm_framebuffer *fb = &plane_composer->fb;
>>>> +	struct drm_framebuffer *fb = &plane_frame_info->fb;
>>>>    	void *vaddr;
>>>>    	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>>>> -	if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
>>>> +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>>> ^ here you are reintroducing a bug where we check the primary plane
>>> repeatedly instead of plane_composer (renamed to plane_frame_info
>>> here). The issue is fixed in a later patch of this series, where you
>>> decouple check_iosys_map.
>>> But I don't mind fixing it before applying.
>> Should I simply delete this line in the patch? Or is there something
>> else to do?
> 
> No, you just need to check the correct plane (plane_frame_info), which means:
> 
> - if (WARN_ON(iosys_map_is_null(&plane_composer->map[0])))
> + if (WARN_ON(iosys_map_is_null(&plane_frame_info->map[0])))
> 
> because here you are renaming `plane_composer` to `plane_frame_info`,
> and `primary_plane_info->map[0]` is already checked in the following
> compose_active_planes() function.

Ohh. Got it now!

Thanks!
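
Editor's note: the difference between the two checks can be seen in a small
standalone sketch. It assumes a simplified `struct frame_info` holding only a
mapped address, and the names `map_is_null()`, `plane_usable_buggy()` and
`plane_usable_fixed()` are hypothetical stand-ins, not vkms or DRM functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the frame metadata: only the mapping matters. */
struct frame_info {
	void *vaddr; /* NULL means the buffer is not mapped */
};

static bool map_is_null(const struct frame_info *info)
{
	return info->vaddr == NULL;
}

/* Buggy variant: re-checks the primary plane and ignores the plane in use. */
static bool plane_usable_buggy(const struct frame_info *primary,
			       const struct frame_info *plane)
{
	assert(primary && plane);

	return !map_is_null(primary);
}

/* Fixed variant: checks the plane whose buffer is about to be read. */
static bool plane_usable_fixed(const struct frame_info *primary,
			       const struct frame_info *plane)
{
	assert(primary && plane);

	return !map_is_null(plane);
}
```

With a mapped primary plane and an unmapped overlay, the buggy variant
reports the overlay as usable while the fixed variant rejects it.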

> 
> Thanks,
> 
> Melissa
> 
>>
>>>
>>>>    		return;
>>>> -	vaddr = plane_composer->map[0].vaddr;
>>>> +	vaddr = plane_frame_info->map[0].vaddr;
>>>>    	if (fb->format->format == DRM_FORMAT_ARGB8888)
>>>>    		pixel_blend = &alpha_blend;
>>>>    	else
>>>>    		pixel_blend = &x_blend;
>>>> -	blend(vaddr_out, vaddr, primary_composer, plane_composer, pixel_blend);
>>>> +	blend(vaddr_out, vaddr, primary_plane_info,
>>>> +	      plane_frame_info, pixel_blend);
>>>>    }
>>>>    static int compose_active_planes(void **vaddr_out,
>>>> -				 struct vkms_composer *primary_composer,
>>>> +				 struct vkms_frame_info *primary_plane_info,
>>>>    				 struct vkms_crtc_state *crtc_state)
>>>>    {
>>>> -	struct drm_framebuffer *fb = &primary_composer->fb;
>>>> +	struct drm_framebuffer *fb = &primary_plane_info->fb;
>>>>    	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>>>>    	const void *vaddr;
>>>>    	int i;
>>>> @@ -187,10 +188,10 @@ static int compose_active_planes(void **vaddr_out,
>>>>    		}
>>>>    	}
>>>> -	if (WARN_ON(iosys_map_is_null(&primary_composer->map[0])))
>>>> +	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>>>>    		return -EINVAL;
>>>> -	vaddr = primary_composer->map[0].vaddr;
>>>> +	vaddr = primary_plane_info->map[0].vaddr;
>>>>    	memcpy(*vaddr_out, vaddr, gem_obj->size);
>>>> @@ -199,8 +200,8 @@ static int compose_active_planes(void **vaddr_out,
>>>>    	 * ((primary <- overlay) <- cursor)
>>>>    	 */
>>>>    	for (i = 1; i < crtc_state->num_active_planes; i++)
>>>> -		compose_plane(primary_composer,
>>>> -			      crtc_state->active_planes[i]->composer,
>>>> +		compose_plane(primary_plane_info,
>>>> +			      crtc_state->active_planes[i]->frame_info,
>>>>    			      *vaddr_out);
>>>>    	return 0;
>>>> @@ -222,7 +223,7 @@ void vkms_composer_worker(struct work_struct *work)
>>>>    						composer_work);
>>>>    	struct drm_crtc *crtc = crtc_state->base.crtc;
>>>>    	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>>> -	struct vkms_composer *primary_composer = NULL;
>>>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>>>    	struct vkms_plane_state *act_plane = NULL;
>>>>    	bool crc_pending, wb_pending;
>>>>    	void *vaddr_out = NULL;
>>>> @@ -250,16 +251,16 @@ void vkms_composer_worker(struct work_struct *work)
>>>>    	if (crtc_state->num_active_planes >= 1) {
>>>>    		act_plane = crtc_state->active_planes[0];
>>>>    		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>> -			primary_composer = act_plane->composer;
>>>> +			primary_plane_info = act_plane->frame_info;
>>>>    	}
>>>> -	if (!primary_composer)
>>>> +	if (!primary_plane_info)
>>>>    		return;
>>>>    	if (wb_pending)
>>>>    		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>>>> -	ret = compose_active_planes(&vaddr_out, primary_composer,
>>>> +	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>>>>    				    crtc_state);
>>>>    	if (ret) {
>>>>    		if (ret == -EINVAL && !wb_pending)
>>>> @@ -267,7 +268,7 @@ void vkms_composer_worker(struct work_struct *work)
>>>>    		return;
>>>>    	}
>>>> -	crc32 = compute_crc(vaddr_out, primary_composer);
>>>> +	crc32 = compute_crc(vaddr_out, primary_plane_info);
>>>>    	if (wb_pending) {
>>>>    		drm_writeback_signal_completion(&out->wb_connector, 0);
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_drv.h b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> index 36fbab5989d1..5199c5f18e17 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_drv.h
>>>> +++ b/drivers/gpu/drm/vkms/vkms_drv.h
>>>> @@ -27,7 +27,7 @@ struct vkms_writeback_job {
>>>>    	struct iosys_map data[DRM_FORMAT_MAX_PLANES];
>>>>    };
>>>> -struct vkms_composer {
>>>> +struct vkms_frame_info {
>>>>    	struct drm_framebuffer fb;
>>>>    	struct drm_rect src, dst;
>>>>    	struct iosys_map map[DRM_FORMAT_MAX_PLANES];
>>>> @@ -39,11 +39,11 @@ struct vkms_composer {
>>>>    /**
>>>>     * vkms_plane_state - Driver specific plane state
>>>>     * @base: base plane state
>>>> - * @composer: data required for composing computation
>>>> + * @frame_info: data required for composing computation
>>>>     */
>>>>    struct vkms_plane_state {
>>>>    	struct drm_shadow_plane_state base;
>>>> -	struct vkms_composer *composer;
>>>> +	struct vkms_frame_info *frame_info;
>>>>    };
>>>>    struct vkms_plane {
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>>>> index d8eb674b49a6..fcae6c508f4b 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>>>> @@ -24,20 +24,20 @@ static struct drm_plane_state *
>>>>    vkms_plane_duplicate_state(struct drm_plane *plane)
>>>>    {
>>>>    	struct vkms_plane_state *vkms_state;
>>>> -	struct vkms_composer *composer;
>>>> +	struct vkms_frame_info *frame_info;
>>>>    	vkms_state = kzalloc(sizeof(*vkms_state), GFP_KERNEL);
>>>>    	if (!vkms_state)
>>>>    		return NULL;
>>>> -	composer = kzalloc(sizeof(*composer), GFP_KERNEL);
>>>> -	if (!composer) {
>>>> -		DRM_DEBUG_KMS("Couldn't allocate composer\n");
>>>> +	frame_info = kzalloc(sizeof(*frame_info), GFP_KERNEL);
>>>> +	if (!frame_info) {
>>>> +		DRM_DEBUG_KMS("Couldn't allocate frame_info\n");
>>>>    		kfree(vkms_state);
>>>>    		return NULL;
>>>>    	}
>>>> -	vkms_state->composer = composer;
>>>> +	vkms_state->frame_info = frame_info;
>>>>    	__drm_gem_duplicate_shadow_plane_state(plane, &vkms_state->base);
>>>> @@ -54,12 +54,12 @@ static void vkms_plane_destroy_state(struct drm_plane *plane,
>>>>    		/* dropping the reference we acquired in
>>>>    		 * vkms_primary_plane_update()
>>>>    		 */
>>>> -		if (drm_framebuffer_read_refcount(&vkms_state->composer->fb))
>>>> -			drm_framebuffer_put(&vkms_state->composer->fb);
>>>> +		if (drm_framebuffer_read_refcount(&vkms_state->frame_info->fb))
>>>> +			drm_framebuffer_put(&vkms_state->frame_info->fb);
>>>>    	}
>>>> -	kfree(vkms_state->composer);
>>>> -	vkms_state->composer = NULL;
>>>> +	kfree(vkms_state->frame_info);
>>>> +	vkms_state->frame_info = NULL;
>>>>    	__drm_gem_destroy_shadow_plane_state(&vkms_state->base);
>>>>    	kfree(vkms_state);
>>>> @@ -99,7 +99,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>>>    	struct vkms_plane_state *vkms_plane_state;
>>>>    	struct drm_shadow_plane_state *shadow_plane_state;
>>>>    	struct drm_framebuffer *fb = new_state->fb;
>>>> -	struct vkms_composer *composer;
>>>> +	struct vkms_frame_info *frame_info;
>>>>    	if (!new_state->crtc || !fb)
>>>>    		return;
>>>> @@ -107,15 +107,15 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>>>    	vkms_plane_state = to_vkms_plane_state(new_state);
>>>>    	shadow_plane_state = &vkms_plane_state->base;
>>>> -	composer = vkms_plane_state->composer;
>>>> -	memcpy(&composer->src, &new_state->src, sizeof(struct drm_rect));
>>>> -	memcpy(&composer->dst, &new_state->dst, sizeof(struct drm_rect));
>>>> -	memcpy(&composer->fb, fb, sizeof(struct drm_framebuffer));
>>>> -	memcpy(&composer->map, &shadow_plane_state->data, sizeof(composer->map));
>>>> -	drm_framebuffer_get(&composer->fb);
>>>> -	composer->offset = fb->offsets[0];
>>>> -	composer->pitch = fb->pitches[0];
>>>> -	composer->cpp = fb->format->cpp[0];
>>>> +	frame_info = vkms_plane_state->frame_info;
>>>> +	memcpy(&frame_info->src, &new_state->src, sizeof(struct drm_rect));
>>>> +	memcpy(&frame_info->dst, &new_state->dst, sizeof(struct drm_rect));
>>>> +	memcpy(&frame_info->fb, fb, sizeof(struct drm_framebuffer));
>>>> +	memcpy(&frame_info->map, &shadow_plane_state->data, sizeof(frame_info->map));
>>>> +	drm_framebuffer_get(&frame_info->fb);
>>>> +	frame_info->offset = fb->offsets[0];
>>>> +	frame_info->pitch = fb->pitches[0];
>>>> +	frame_info->cpp = fb->format->cpp[0];
>>>>    }
>>>>    static int vkms_plane_atomic_check(struct drm_plane *plane,
>>>> -- 
>>>> 2.30.2
>>>>
>>

Re: [RESEND v6 6/9] drm: vkms: Refactor the plane composer to accept new formats

Igor Matheus Andrade Torrente <igormtorrente@gmail.com>
Message ID
<b98412ed-9ae7-49fa-bdb8-53e589d14945@gmail.com>
In-Reply-To
<20220822190110.u4evrujigrrcp3ud@mail.igalia.com>
On 8/22/22 16:01, Melissa Wen wrote:
> On 08/22, Igor Matheus Andrade Torrente wrote:
>> Hi Melissa,
>>
>> On 8/20/22 07:51, Melissa Wen wrote:
>>> On 08/19, Igor Torrente wrote:
>>>> Currently the blend function only accepts XRGB_8888 and ARGB_8888
>>>> as a color input.
>>>>
>>>> This patch refactors all the functions related to the plane composition
>>>> to overcome this limitation.
>>>>
>>>> The pixel blending is done using the new internal format, and new handlers
>>>> are being added to convert a specific format to/from this internal format.
>>>>
>>>> So the blend operation depends on these handlers to convert to this common
>>>> format. The blended result, if necessary, is converted to the writeback
>>>> buffer format.
>>>>
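
The conversion handlers described above can be illustrated with a small userspace sketch. It is not the patch's code: `widen_u8`, `narrow_u16`, `argb8888_to_u16`, and `u16_to_argb8888` are hypothetical names, and `struct pixel_argb_u16` here is a stand-in assumed to hold four 16-bit channels. The sketch only shows the idea of scaling 8-bit channels by 257 and dividing back with rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the internal format: four 16-bit channels (assumption). */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* A line-wide CRC only works if the struct has no padding. */
static_assert(sizeof(struct pixel_argb_u16) == 8, "unexpected padding");

/* Widen an 8-bit channel: 0xff * 257 == 0xffff, so full scale stays full scale. */
static uint16_t widen_u8(uint8_t v)
{
	return (uint16_t)(v * 257u);
}

/* Narrow a 16-bit channel back to 8 bits, rounding to the nearest value. */
static uint8_t narrow_u16(uint16_t v)
{
	return (uint8_t)((v + 257u / 2u) / 257u);
}

/* Little-endian ARGB8888 is laid out in memory as B, G, R, A. */
static struct pixel_argb_u16 argb8888_to_u16(const uint8_t *src)
{
	struct pixel_argb_u16 out = {
		.a = widen_u8(src[3]),
		.r = widen_u8(src[2]),
		.g = widen_u8(src[1]),
		.b = widen_u8(src[0]),
	};

	return out;
}

/* Inverse conversion, writing the bytes back in B, G, R, A order. */
static void u16_to_argb8888(struct pixel_argb_u16 in, uint8_t *dst)
{
	dst[3] = narrow_u16(in.a);
	dst[2] = narrow_u16(in.r);
	dst[1] = narrow_u16(in.g);
	dst[0] = narrow_u16(in.b);
}
```

Because every 8-bit value v widens to exactly 257 * v, narrowing the result returns v again, so an ARGB8888 pixel survives a round trip through the internal format unchanged.
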
>>>> This patch introduces three major differences to the blend function.
>>>> 1 - All the planes are blended at once.
>>>> 2 - The blend calculation is done per line instead of per pixel.
>>>> 3 - It is responsible for calculating the CRC and writing the writeback
>>>> buffer (if necessary).
>>>>
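
The per-line premultiplied blend listed above can be sketched in userspace C as follows. This is an illustration under assumptions, not the patch itself: `blend_channel` and `blend_line` are hypothetical names, the pixel struct is a stand-in, the arithmetic is widened to 64 bits and clamped (the patch uses 32-bit math without a clamp), and the clipping here is done against the output line length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the internal 16-bit-per-channel pixel (assumption). */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/*
 * Premultiplied "over" for one channel:
 *   out = src + dst * (1 - alpha)
 * in 16-bit fixed point, rounded to the nearest value.
 */
static uint16_t blend_channel(uint16_t src, uint16_t dst, uint16_t alpha)
{
	/* 64-bit math so the sum cannot overflow, even for malformed input. */
	uint64_t sum = (uint64_t)src * 0xffffu +
		       (uint64_t)dst * (0xffffu - alpha);
	uint64_t q = (sum + 0xffffu / 2u) / 0xffffu;

	return q > 0xffffu ? 0xffffu : (uint16_t)q;
}

/*
 * Blend one staged line of n_in pixels into the output line, starting at
 * column x_dst and clipping at the end of the output line (n_out pixels).
 */
static void blend_line(struct pixel_argb_u16 *out, size_t n_out,
		       const struct pixel_argb_u16 *in, size_t n_in,
		       size_t x_dst)
{
	size_t limit;

	if (x_dst >= n_out)
		return;

	limit = n_out - x_dst < n_in ? n_out - x_dst : n_in;

	for (size_t x = 0; x < limit; x++) {
		struct pixel_argb_u16 *o = &out[x_dst + x];

		/* The background is assumed opaque, so the result is opaque. */
		o->a = 0xffff;
		o->r = blend_channel(in[x].r, o->r, in[x].a);
		o->g = blend_channel(in[x].g, o->g, in[x].a);
		o->b = blend_channel(in[x].b, o->b, in[x].a);
	}
}
```

Calling `blend_line` once per active plane for each output row gives the ((primary <- overlay) <- cursor) ordering, while only two line buffers are alive at any time.
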
>>>> These changes allow us to allocate far less memory for the intermediate
>>>> buffer used in these operations, because we no longer need the entire
>>>> intermediate image at once; just one line is enough.
>>>>
>>>> | Memory consumption (output dimensions) |
>>>> |:--------------------------------------:|
>>>> |       Current      |     This patch    |
>>>> |:------------------:|:-----------------:|
>>>> |   Width * Height   |     2 * Width     |
>>>>
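
To put rough numbers on the table above, here is a small userspace sketch. It rests on assumptions that are not stated in the thread: the old intermediate copy is taken to be 4 bytes per pixel with no row padding, and the new scheme is taken to be two line buffers of 8-byte pixels. `full_frame_bytes` and `line_buffers_bytes` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the internal 16-bit-per-channel pixel (assumption). */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

static_assert(sizeof(struct pixel_argb_u16) == 8, "unexpected padding");

/* Old scheme: one full-frame copy, assumed 4 bytes per pixel, no padding. */
static size_t full_frame_bytes(size_t width, size_t height)
{
	return width * height * 4;
}

/* New scheme: a stage line buffer plus an output line buffer. */
static size_t line_buffers_bytes(size_t width)
{
	return 2 * width * sizeof(struct pixel_argb_u16);
}
```

Under these assumptions a 1920x1080 output needs 8294400 bytes for the full-frame copy and 30720 bytes for the two line buffers.
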
>>>> Beyond memory, we also have a minor performance benefit from all
>>>> these changes. Results running the IGT[1] test
>>>> `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
>>>>
>>>> |                 Frametime                  |
>>>> |:------------------------------------------:|
>>>> |  Implementation |  Current  |  This commit |
>>>> |:---------------:|:---------:|:------------:|
>>>> | frametime range |  9~22 ms  |    5~17 ms   |
>>>> |     Average     |  11.4 ms  |    7.8 ms    |
>>>>
>>>> [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
>>>>
>>>> V2: Improves the performance drastically, by performing the operations
>>>>       per-line and not per-pixel(Pekka Paalanen).
>>>>       Minor improvements(Pekka Paalanen).
>>>> V3: Changes the code to blend the planes all at once. This improves
>>>>       performance, memory consumption, and removes much of the weirdness
>>>>       of the V2(Pekka Paalanen and me).
>>>>       Minor improvements(Pekka Paalanen and me).
>>>> V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
>>>> V5: Minor checkpatch fixes and the removal of TO-DO item(Melissa Wen).
>>>>       Several security/robustness improvements(Pekka Paalanen).
>>>>       Removes check_planes_x_bounds function and allows planes
>>>>       partly off-screen(Pekka Paalanen).
>>>> V6: Fix a mismatch of some variable sizes (Pekka Paalanen).
>>>>       Several minor improvements (Pekka Paalanen).
>>>>
>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>> Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
>>>> ---
>>>>    Documentation/gpu/vkms.rst            |   4 -
>>>>    drivers/gpu/drm/vkms/Makefile         |   1 +
>>>>    drivers/gpu/drm/vkms/vkms_composer.c  | 320 ++++++++++++--------------
>>>>    drivers/gpu/drm/vkms/vkms_formats.c   | 155 +++++++++++++
>>>>    drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
>>>>    drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
>>>>    drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
>>>>    7 files changed, 317 insertions(+), 181 deletions(-)
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
>>>>    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
>>>>
>>>> diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
>>>> index 973e2d43108b..a49e4ae92653 100644
>>>> --- a/Documentation/gpu/vkms.rst
>>>> +++ b/Documentation/gpu/vkms.rst
>>>> @@ -118,10 +118,6 @@ Add Plane Features
>>>>    There's lots of plane features we could add support for:
>>>> -- Clearing primary plane: clear primary plane before plane composition (at the
>>>> -  start) for correctness of pixel blend ops. It also guarantees alpha channel
>>>> -  is cleared in the target buffer for stable crc. [Good to get started]
>>>> -
>>>>    - ARGB format on primary plane: blend the primary plane into background with
>>>>      translucent alpha.
>>>> diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
>>>> index 72f779cbfedd..1b28a6a32948 100644
>>>> --- a/drivers/gpu/drm/vkms/Makefile
>>>> +++ b/drivers/gpu/drm/vkms/Makefile
>>>> @@ -3,6 +3,7 @@ vkms-y := \
>>>>    	vkms_drv.o \
>>>>    	vkms_plane.o \
>>>>    	vkms_output.o \
>>>> +	vkms_formats.o \
>>>>    	vkms_crtc.o \
>>>>    	vkms_composer.o \
>>>>    	vkms_writeback.o
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> index b9fb408e8973..5b1a8bdd8268 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_composer.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_composer.c
>>>> @@ -7,204 +7,188 @@
>>>>    #include <drm/drm_fourcc.h>
>>>>    #include <drm/drm_gem_framebuffer_helper.h>
>>>>    #include <drm/drm_vblank.h>
>>>> +#include <linux/minmax.h>
>>>>    #include "vkms_drv.h"
>>>> -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
>>>> -				 const struct vkms_frame_info *frame_info)
>>>> +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
>>>>    {
>>>> -	u32 pixel;
>>>> -	int src_offset = frame_info->offset + (y * frame_info->pitch)
>>>> -					    + (x * frame_info->cpp);
>>>> +	u32 new_color;
>>>> -	pixel = *(u32 *)&buffer[src_offset];
>>>> +	new_color = (src * 0xffff + dst * (0xffff - alpha));
>>>> -	return pixel;
>>>> +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
>>>>    }
>>>>    /**
>>>> - * compute_crc - Compute CRC value on output frame
>>>> + * pre_mul_alpha_blend - alpha blending equation
>>>> + * @frame_info: source framebuffer's metadata
>>>> + * @stage_buffer: The line with the pixels from src_plane
>>>> + * @output_buffer: A line buffer that receives all the blends output
>>>>     *
>>>> - * @vaddr: address to final framebuffer
>>>> - * @frame_info: framebuffer's metadata
>>>> + * Using the information from the `frame_info`, this blends only the
>>>> + * necessary pixels from the `stage_buffer` to the `output_buffer`
>>>> + * using premultiplied blend formula.
>>>>     *
>>>> - * returns CRC value computed using crc32 on the visible portion of
>>>> - * the final framebuffer at vaddr_out
>>>> + * The current DRM assumption is that pixel color values have been already
>>>> + * pre-multiplied with the alpha channel values. See more
>>>> + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
>>>> + * completely opaque background.
>>>>     */
>>>> -static uint32_t compute_crc(const u8 *vaddr,
>>>> -			    const struct vkms_frame_info *frame_info)
>>>> +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
>>>> +				struct line_buffer *stage_buffer,
>>>> +				struct line_buffer *output_buffer)
>>>>    {
>>>> -	int x, y;
>>>> -	u32 crc = 0, pixel = 0;
>>>> -	int x_src = frame_info->src.x1 >> 16;
>>>> -	int y_src = frame_info->src.y1 >> 16;
>>>> -	int h_src = drm_rect_height(&frame_info->src) >> 16;
>>>> -	int w_src = drm_rect_width(&frame_info->src) >> 16;
>>>> -
>>>> -	for (y = y_src; y < y_src + h_src; ++y) {
>>>> -		for (x = x_src; x < x_src + w_src; ++x) {
>>>> -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
>>>> -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
>>>> -		}
>>>> +	int x_dst = frame_info->dst.x1;
>>>> +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
>>>> +	struct pixel_argb_u16 *in = stage_buffer->pixels;
>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>> +			    stage_buffer->n_pixels);
>>>> +
>>>> +	for (int x = 0; x < x_limit; x++) {
>>>> +		out[x].a = (u16)0xffff;
>>>> +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
>>>> +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
>>>> +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
>>>>    	}
>>>> -
>>>> -	return crc;
>>>>    }
>>>> -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
>>>> +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
>>>>    {
>>>> -	u32 pre_blend;
>>>> -	u8 new_color;
>>>> -
>>>> -	pre_blend = (src * 255 + dst * (255 - alpha));
>>>> -
>>>> -	/* Faster div by 255 */
>>>> -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
>>>> +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
>>>> +		return true;
>>>> -	return new_color;
>>>> +	return false;
>>>>    }
>>>>    /**
>>>> - * alpha_blend - alpha blending equation
>>>> - * @argb_src: src pixel on premultiplied alpha mode
>>>> - * @argb_dst: dst pixel completely opaque
>>>> + * @wb: The writeback frame buffer metadata
>>>> + * @crtc_state: The crtc state
>>>> + * @crc32: The crc output of the final frame
>>>> + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
>>>> + * @stage_buffer: The line with the pixels from the plane being blended into the output
>>>>     *
>>>> - * blend pixels using premultiplied blend formula. The current DRM assumption
>>>> - * is that pixel color values have been already pre-multiplied with the alpha
>>>> - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
>>>> - * formula assumes a completely opaque background.
>>>> + * This function blends the pixels (Using the `pre_mul_alpha_blend`)
>>>> + * from all planes, calculates the crc32 of the output from the former step,
>>>> + * and, if necessary, converts and stores the output in the writeback buffer.
>>>>     */
>>>> -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
>>>> +static void blend(struct vkms_writeback_job *wb,
>>>> +		  struct vkms_crtc_state *crtc_state,
>>>> +		  u32 *crc32, struct line_buffer *stage_buffer,
>>>> +		  struct line_buffer *output_buffer, size_t row_size)
>>>>    {
>>>> -	u8 alpha;
>>>> +	struct vkms_plane_state **plane = crtc_state->active_planes;
>>>> +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
>>>> +	u32 n_active_planes = crtc_state->num_active_planes;
>>>> +
>>>> +	int y_dst = primary_plane_info->dst.y1;
>>>> +	int h_dst = drm_rect_height(&primary_plane_info->dst);
>>>> +	int y_limit = y_dst + h_dst;
>>>> +
>>>> +	for (size_t y = y_dst; y < y_limit; y++) {
>>>> +		plane[0]->plane_read(output_buffer, primary_plane_info, y);
>>>> +
>>>> +		/* If there are other planes besides primary, we consider the active
>>>> +		 * planes should be in z-order and compose them associatively:
>>>> +		 * ((primary <- overlay) <- cursor)
>>>> +		 */
>>>> +		for (size_t i = 1; i < n_active_planes; i++) {
>>>> +			if (!check_y_limit(plane[i]->frame_info, y))
>>>> +				continue;
>>>> +
>>>> +			plane[i]->plane_read(stage_buffer, plane[i]->frame_info, y);
>>>> +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
>>>> +					    output_buffer);
>>>> +		}
>>>> +
>>>> +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
>>>> -	alpha = argb_src[3];
>>>> -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
>>>> -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
>>>> -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
>>>> +		if (wb)
>>>> +			wb->wb_write(&wb->wb_frame_info, output_buffer, y);
>>>> +	}
>>>>    }
>>>> -/**
>>>> - * x_blend - blending equation that ignores the pixel alpha
>>>> - *
>>>> - * overwrites RGB color value from src pixel to dst pixel.
>>>> - */
>>>> -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
>>>> +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
>>>> +			      struct vkms_writeback_job *active_wb)
>>>>    {
>>>> -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
>>>> +	struct vkms_plane_state **planes = crtc_state->active_planes;
>>>> +	u32 n_active_planes = crtc_state->num_active_planes;
>>>> +
>>>> +	for (size_t i = 0; i < n_active_planes; i++)
>>>> +		if (!planes[i]->plane_read)
>>>> +			return -1;
>>>> +
>>>> +	if (active_wb && !active_wb->wb_write)
>>>> +		return -1;
>>>> +
>>>> +	return 0;
>>>>    }
>>>> -/**
>>>> - * blend - blend value at vaddr_src with value at vaddr_dst
>>>> - * @vaddr_dst: destination address
>>>> - * @vaddr_src: source address
>>>> - * @dst_frame_info: destination framebuffer's metadata
>>>> - * @src_frame_info: source framebuffer's metadata
>>>> - * @pixel_blend: blending equation based on plane format
>>>> - *
>>>> - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
>>>> - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
>>>> - * and clearing alpha channel to an completely opaque background. This function
>>>> - * uses buffer's metadata to locate the new composite values at vaddr_dst.
>>>> - *
>>>> - * TODO: completely clear the primary plane (a = 0xff) before starting to blend
>>>> - * pixel color values
>>>> - */
>>>> -static void blend(void *vaddr_dst, void *vaddr_src,
>>>> -		  struct vkms_frame_info *dst_frame_info,
>>>> -		  struct vkms_frame_info *src_frame_info,
>>>> -		  void (*pixel_blend)(const u8 *, u8 *))
>>>> +static int compose_active_planes(struct vkms_writeback_job *active_wb,
>>>> +				 struct vkms_crtc_state *crtc_state,
>>>> +				 u32 *crc32)
>>>>    {
>>>> -	int i, j, j_dst, i_dst;
>>>> -	int offset_src, offset_dst;
>>>> -	u8 *pixel_dst, *pixel_src;
>>>> -
>>>> -	int x_src = src_frame_info->src.x1 >> 16;
>>>> -	int y_src = src_frame_info->src.y1 >> 16;
>>>> -
>>>> -	int x_dst = src_frame_info->dst.x1;
>>>> -	int y_dst = src_frame_info->dst.y1;
>>>> -	int h_dst = drm_rect_height(&src_frame_info->dst);
>>>> -	int w_dst = drm_rect_width(&src_frame_info->dst);
>>>> -
>>>> -	int y_limit = y_src + h_dst;
>>>> -	int x_limit = x_src + w_dst;
>>>> -
>>>> -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
>>>> -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
>>>> -			offset_dst = dst_frame_info->offset
>>>> -				     + (i_dst * dst_frame_info->pitch)
>>>> -				     + (j_dst++ * dst_frame_info->cpp);
>>>> -			offset_src = src_frame_info->offset
>>>> -				     + (i * src_frame_info->pitch)
>>>> -				     + (j * src_frame_info->cpp);
>>>> -
>>>> -			pixel_src = (u8 *)(vaddr_src + offset_src);
>>>> -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
>>>> -			pixel_blend(pixel_src, pixel_dst);
>>>> -			/* clearing alpha channel (0xff)*/
>>>> -			pixel_dst[3] = 0xff;
>>>> -		}
>>>> -		i_dst++;
>>>> +	size_t line_width, pixel_size = sizeof(struct pixel_argb_u16);
>>>> +	struct vkms_frame_info *primary_plane_info = NULL;
>>>> +	struct line_buffer output_buffer, stage_buffer;
>>>> +	struct vkms_plane_state *act_plane = NULL;
>>>> +	int ret = 0;
>>>> +
>>>> +	/*
>>>> +	 * This check exists so we can call `crc32_le` for the entire line
>>>> +	 * instead of doing it for each channel of each pixel, in case
>>>> +	 * `struct pixel_argb_u16` had any gap added by the compiler
>>>> +	 * between the struct fields.
>>>> +	 */
>>>> +	static_assert(sizeof(struct pixel_argb_u16) == 8);
>>>> +
>>>> +	if (crtc_state->num_active_planes >= 1) {
>>>> +		act_plane = crtc_state->active_planes[0];
>>>> +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>> +			primary_plane_info = act_plane->frame_info;
>>>>    	}
>>>> -}
>>>> -static void compose_plane(struct vkms_frame_info *primary_plane_info,
>>>> -			  struct vkms_frame_info *plane_frame_info,
>>>> -			  void *vaddr_out)
>>>> -{
>>>> -	struct drm_framebuffer *fb = plane_frame_info->fb;
>>>> -	void *vaddr;
>>>> -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
>>>> +	if (!primary_plane_info)
>>>> +		return -EINVAL;
>>>>    	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>>>> -		return;
>>>> +		return -EINVAL;
>>>> -	vaddr = plane_frame_info->map[0].vaddr;
>>>> +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
>>>> +		return -EINVAL;
>>>> -	if (fb->format->format == DRM_FORMAT_ARGB8888)
>>>> -		pixel_blend = &alpha_blend;
>>>> -	else
>>>> -		pixel_blend = &x_blend;
>>>> +	line_width = drm_rect_width(&primary_plane_info->dst);
>>>> +	stage_buffer.n_pixels = line_width;
>>>> +	output_buffer.n_pixels = line_width;
>>>> -	blend(vaddr_out, vaddr, primary_plane_info,
>>>> -	      plane_frame_info, pixel_blend);
>>>> -}
>>>> +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>> +	if (!stage_buffer.pixels) {
>>>> +		DRM_ERROR("Cannot allocate memory for the output line buffer");
>>>> +		return -ENOMEM;
>>>> +	}
>>>> -static int compose_active_planes(void **vaddr_out,
>>>> -				 struct vkms_frame_info *primary_plane_info,
>>>> -				 struct vkms_crtc_state *crtc_state)
>>>> -{
>>>> -	struct drm_framebuffer *fb = primary_plane_info->fb;
>>>> -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
>>>> -	const void *vaddr;
>>>> -	int i;
>>>> -
>>>> -	if (!*vaddr_out) {
>>>> -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
>>>> -		if (!*vaddr_out) {
>>>> -			DRM_ERROR("Cannot allocate memory for output frame.");
>>>> -			return -ENOMEM;
>>>> -		}
>>>> +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
>>>> +	if (!output_buffer.pixels) {
>>>> +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
>>>> +		ret = -ENOMEM;
>>>> +		goto free_stage_buffer;
>>>>    	}
>>>> -	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
>>>> -		return -EINVAL;
>>>> +	if (active_wb) {
>>>> +		struct vkms_frame_info *wb_frame_info = &active_wb->wb_frame_info;
>>>> -	vaddr = primary_plane_info->map[0].vaddr;
>>>> +		wb_frame_info->src = primary_plane_info->src;
>>>> +		wb_frame_info->dst = primary_plane_info->dst;
>>>> +	}
>>>> -	memcpy(*vaddr_out, vaddr, gem_obj->size);
>>>> +	blend(active_wb, crtc_state, crc32, &stage_buffer,
>>>> +	      &output_buffer, line_width * pixel_size);
>>>> -	/* If there are other planes besides primary, we consider the active
>>>> -	 * planes should be in z-order and compose them associatively:
>>>> -	 * ((primary <- overlay) <- cursor)
>>>> -	 */
>>>> -	for (i = 1; i < crtc_state->num_active_planes; i++)
>>>> -		compose_plane(primary_plane_info,
>>>> -			      crtc_state->active_planes[i]->frame_info,
>>>> -			      *vaddr_out);
>>>> +	kvfree(output_buffer.pixels);
>>>> +free_stage_buffer:
>>>> +	kvfree(stage_buffer.pixels);
>>>> -	return 0;
>>>> +	return ret;
>>>>    }
>>>>    /**
>>>> @@ -222,13 +206,11 @@ void vkms_composer_worker(struct work_struct *work)
>>>>    						struct vkms_crtc_state,
>>>>    						composer_work);
>>>>    	struct drm_crtc *crtc = crtc_state->base.crtc;
>>>> +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
>>>>    	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
>>>> -	struct vkms_frame_info *primary_plane_info = NULL;
>>>> -	struct vkms_plane_state *act_plane = NULL;
>>>>    	bool crc_pending, wb_pending;
>>>> -	void *vaddr_out = NULL;
>>>> -	u32 crc32 = 0;
>>>>    	u64 frame_start, frame_end;
>>>> +	u32 crc32 = 0;
>>>>    	int ret;
>>>>    	spin_lock_irq(&out->composer_lock);
>>>> @@ -248,35 +230,19 @@ void vkms_composer_worker(struct work_struct *work)
>>>>    	if (!crc_pending)
>>>>    		return;
>>>> -	if (crtc_state->num_active_planes >= 1) {
>>>> -		act_plane = crtc_state->active_planes[0];
>>>> -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
>>>> -			primary_plane_info = act_plane->frame_info;
>>>> -	}
>>>> -
>>>> -	if (!primary_plane_info)
>>>> -		return;
>>>> -
>>>>    	if (wb_pending)
>>>> -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
>>>> +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
>>>> +	else
>>>> +		ret = compose_active_planes(NULL, crtc_state, &crc32);
>>>> -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
>>>> -				    crtc_state);
>>>> -	if (ret) {
>>>> -		if (ret == -EINVAL && !wb_pending)
>>>> -			kvfree(vaddr_out);
>>>> +	if (ret)
>>>>    		return;
>>>> -	}
>>>> -
>>>> -	crc32 = compute_crc(vaddr_out, primary_plane_info);
>>>>    	if (wb_pending) {
>>>>    		drm_writeback_signal_completion(&out->wb_connector, 0);
>>>>    		spin_lock_irq(&out->composer_lock);
>>>>    		crtc_state->wb_pending = false;
>>>>    		spin_unlock_irq(&out->composer_lock);
>>>> -	} else {
>>>> -		kvfree(vaddr_out);
>>>>    	}
>>>>    	/*
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> new file mode 100644
>>>> index 000000000000..ca4bfcac686b
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.c
>>>> @@ -0,0 +1,155 @@
>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>> +
>>>> +#include <drm/drm_rect.h>
>>>> +#include <linux/minmax.h>
>>>> +
>>>> +#include "vkms_formats.h"
>>>> +
>>>> +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
>>>> +{
>>>> +	return frame_info->offset + (y * frame_info->pitch)
>>>> +				  + (x * frame_info->cpp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * packed_pixels_addr - Get the pointer to pixel of a given pair of coordinates
>>>> + *
>>>> + * @frame_info: Buffer metadata
>>>> + * @x: The x (width) coordinate of the 2D buffer
>>>> + * @y: The y (height) coordinate of the 2D buffer
>>>> + *
>>>> + * Takes the information stored in the frame_info, a pair of coordinates, and
>>>> + * returns the address of the first color channel.
>>>> + * This function assumes the channels are packed together, i.e. a color channel
>>>> + * comes immediately after another in the memory. And therefore, this function
>>>> + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
>>>> + */
>>>> +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
>>>> +				int x, int y)
>>>> +{
>>>> +	size_t offset = pixel_offset(frame_info, x, y);
>>>> +
>>>> +	return (u8 *)frame_info->map[0].vaddr + offset;
>>>> +}
>>>> +
>>>> +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
>>>> +{
>>>> +	int x_src = frame_info->src.x1 >> 16;
>>>> +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
>>>> +
>>>> +	return packed_pixels_addr(frame_info, x_src, y_src);
>>>> +}
>>>> +
>>>> +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>> +{
>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>> +			    stage_buffer->n_pixels);
>>>> +
>>>> +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
>>>> +		/*
>>>> +		 * The 257 is the "conversion ratio", obtained by dividing
>>>> +		 * (2^16 - 1) by (2^8 - 1). It maps each 8-bit value to the
>>>> +		 * closest color value in a pixel format with more possibilities.
>>>> +		 * A similar idea applies to other RGB color conversions.
>>>> +		 */
>>>> +		out_pixels[x].a = (u16)src_pixels[3] * 257;
>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>> +	}
>>>> +}
>>>> +
>>>> +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
>>>> +				 const struct vkms_frame_info *frame_info, int y)
>>>> +{
>>>> +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
>>>> +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>> +			    stage_buffer->n_pixels);
>>>> +
>>>> +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
>>>> +		out_pixels[x].a = (u16)0xffff;
>>>> +		out_pixels[x].r = (u16)src_pixels[2] * 257;
>>>> +		out_pixels[x].g = (u16)src_pixels[1] * 257;
>>>> +		out_pixels[x].b = (u16)src_pixels[0] * 257;
>>>> +	}
>>>> +}
>>>> +
>>>> +/*
>>>> + * The following functions take a line of argb_u16 pixels from the
>>>> + * src_buffer, convert them to a specific format, and store them in the
>>>> + * destination.
>>>> + *
>>>> + * They are used in the `compose_active_planes` to convert and store a line
>>>> + * from the src_buffer to the writeback buffer.
>>>> + */
>>>> +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
>>>> +				 const struct line_buffer *src_buffer, int y)
>>>> +{
>>>> +	int x_dst = frame_info->dst.x1;
>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>> +			    src_buffer->n_pixels);
>>>> +
>>>> +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>> +		/*
>>>> +		 * The sequence below is important because the format's byte order is
>>>> +		 * little-endian. In the case of ARGB8888, the memory is
>>>> +		 * organized this way:
>>>> +		 *
>>>> +		 * | Addr     | = blue channel
>>>> +		 * | Addr + 1 | = green channel
>>>> +		 * | Addr + 2 | = Red channel
>>>> +		 * | Addr + 3 | = Alpha channel
>>>> +		 */
>>>> +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>> +	}
>>>> +}
>>>> +
>>>> +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
>>>> +				 const struct line_buffer *src_buffer, int y)
>>>> +{
>>>> +	int x_dst = frame_info->dst.x1;
>>>> +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
>>>> +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
>>>> +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
>>>> +			    src_buffer->n_pixels);
>>>> +
>>>> +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
>>>> +		dst_pixels[3] = 0xff;
>>>> +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
>>>> +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
>>>> +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
>>>> +	}
>>>> +}
>>>> +
>>>> +frame_to_line_func get_frame_to_line_function(u32 format)
>>>> +{
>>>> +	switch (format) {
>>>> +	case DRM_FORMAT_ARGB8888:
>>>> +		return &ARGB8888_to_argb_u16;
>>>> +	case DRM_FORMAT_XRGB8888:
>>>> +		return &XRGB8888_to_argb_u16;
>>>> +	default:
>>>> +		return NULL;
>>>> +	}
>>>> +}
>>>> +
>>>> +line_to_frame_func get_line_to_frame_function(u32 format)
>>>> +{
>>>> +	switch (format) {
>>>> +	case DRM_FORMAT_ARGB8888:
>>>> +		return &argb_u16_to_ARGB8888;
>>>> +	case DRM_FORMAT_XRGB8888:
>>>> +		return &argb_u16_to_XRGB8888;
>>>> +	default:
>>>> +		return NULL;
>>>> +	}
>>>> +}
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
>>>> new file mode 100644
>>>> index 000000000000..053ca42d5b31
>>>> --- /dev/null
>>>> +++ b/drivers/gpu/drm/vkms/vkms_formats.h
>>>> @@ -0,0 +1,12 @@
>>>> +// SPDX-License-Identifier: GPL-2.0+
>>>> +
>>>> +#ifndef _VKMS_FORMATS_H_
>>>> +#define _VKMS_FORMATS_H_
>>>> +
>>>> +#include "vkms_drv.h"
>>>> +
>>>> +frame_to_line_func get_frame_to_line_function(u32 format);
>>>> +
>>>> +line_to_frame_func get_line_to_frame_function(u32 format);
>>>> +
>>>> +#endif /* _VKMS_FORMATS_H_ */
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
>>>> index 8adbfdc05e50..7a479a714565 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_plane.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_plane.c
>>>> @@ -10,6 +10,7 @@
>>>>    #include <drm/drm_plane_helper.h>
>>>>    #include "vkms_drv.h"
>>>> +#include "vkms_formats.h"
>>> ^ this line no longer applies (it needs a rebase), but I can handle it before applying to drm-misc-next
>>
>> I did the rebase and didn't have any issues.
>>
>> I'm using `git://anongit.freedesktop.org/drm/drm-misc` remote. Should I be
>> using another git remote for vkms?
> 
> hmmm... the repository is correct; maybe double-check that you are rebasing
> on top of the `drm-misc-next` branch and that it is up to date.
> 
> For reference, I tried to apply your series on top of this commit:
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ee50b00244086453dfb7076e4b80214948cd0507
> 
> Also, if you compare the line right above the `#include vkms_drv.h` in
> your diff with the one in the current file:
> https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/vkms/vkms_plane.c#n9
> they are not the same.
> 
> Can you double check it, please?

Everything seems right. Hopefully we will not have this issue again in V7.

> 
> Thanks,
> 
> Melissa
>>
>>>>    static const u32 vkms_formats[] = {
>>>>    	DRM_FORMAT_XRGB8888,
>>>> @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>>>    	struct drm_shadow_plane_state *shadow_plane_state;
>>>>    	struct drm_framebuffer *fb = new_state->fb;
>>>>    	struct vkms_frame_info *frame_info;
>>>> +	u32 fmt = fb->format->format;
>>>>    	if (!new_state->crtc || !fb)
>>>>    		return;
>>>> @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
>>>>    	frame_info->offset = fb->offsets[0];
>>>>    	frame_info->pitch = fb->pitches[0];
>>>>    	frame_info->cpp = fb->format->cpp[0];
>>>> +	vkms_plane_state->plane_read = get_frame_to_line_function(fmt);
>>>>    }
>>>>    static int vkms_plane_atomic_check(struct drm_plane *plane,
>>>> diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
>>>> index c87f6c89e7b4..d2aabb52cb46 100644
>>>> --- a/drivers/gpu/drm/vkms/vkms_writeback.c
>>>> +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
>>>> @@ -11,6 +11,7 @@
>>>>    #include <drm/drm_gem_shmem_helper.h>
>>>>    #include "vkms_drv.h"
>>>> +#include "vkms_formats.h"
>>>>    static const u32 vkms_wb_formats[] = {
>>>>    	DRM_FORMAT_XRGB8888,
>>>> @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>>>    	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
>>>>    	struct vkms_writeback_job *active_wb;
>>>>    	struct vkms_frame_info *wb_frame_info;
>>>> +	u32 wb_format = fb->format->format;
>>>>    	if (!conn_state)
>>>>    		return;
>>>> @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
>>>>    	crtc_state->wb_pending = true;
>>>>    	spin_unlock_irq(&output->composer_lock);
>>>>    	drm_writeback_queue_job(wb_conn, connector_state);
>>>> +	active_wb->wb_write = get_line_to_frame_function(wb_format);
>>>>    }
>>>>    static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
>>>> -- 
>>>> 2.30.2
>>>>
>>
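For readers following the conversion helpers quoted above, here is a minimal userspace sketch (not part of the patch) of why 257 works as the 8-bit to 16-bit ratio, and why dividing by 257 with rounding recovers the original 8-bit value. The helper names are hypothetical, and `div_round_closest_u32` is only a stand-in for the kernel's `DIV_ROUND_CLOSEST` on unsigned operands.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round-to-nearest unsigned division. Assumption: this mimics what the
 * kernel's DIV_ROUND_CLOSEST() does for non-negative operands.
 */
static uint32_t div_round_closest_u32(uint32_t n, uint32_t d)
{
	return (n + d / 2) / d;
}

/* Expand an 8-bit channel to 16 bits: 0x00 -> 0x0000, 0xff -> 0xffff. */
static uint16_t channel_u8_to_u16(uint8_t c)
{
	return (uint16_t)(c * 257u);
}

/* Narrow a 16-bit channel back to 8 bits, rounding to the nearest value. */
static uint8_t channel_u16_to_u8(uint16_t c)
{
	return (uint8_t)div_round_closest_u32(c, 257);
}

/* Returns 1 if every 8-bit value survives the 8 -> 16 -> 8 round trip. */
int round_trip_is_lossless(void)
{
	for (unsigned int c = 0; c <= 0xff; c++) {
		if (channel_u16_to_u8(channel_u8_to_u16((uint8_t)c)) != c)
			return 0;
	}
	return 1;
}
```

Since 257 = 256 + 1, multiplying an 8-bit value `c` by 257 is the same as replicating the byte, `(c << 8) | c`, so 0xff maps exactly to 0xffff and 0x80 maps to 0x8080.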

Re: [RESEND v6 6/9] drm: vkms: Refactor the plane composer to accept new formats

Melissa Wen <mwen@igalia.com>
Details
Message ID
<20220822203351.cjgv3qkv7dqgx6om@mail.igalia.com>
In-Reply-To
<b98412ed-9ae7-49fa-bdb8-53e589d14945@gmail.com> (view parent)
On 08/22, Igor Matheus Andrade Torrente wrote:
> On 8/22/22 16:01, Melissa Wen wrote:
> > On 08/22, Igor Matheus Andrade Torrente wrote:
> > > Hi Melissa,
> > > 
> > > On 8/20/22 07:51, Melissa Wen wrote:
> > > > On 08/19, Igor Torrente wrote:
> > > > > Currently the blend function only accepts XRGB_8888 and ARGB_8888
> > > > > as a color input.
> > > > > 
> > > > > This patch refactors all the functions related to the plane composition
> > > > > to overcome this limitation.
> > > > > 
> > > > > > The pixel blending is done using the new internal format, and new handlers
> > > > > > are being added to convert a specific format to/from this internal format.
> > > > > 
> > > > > So the blend operation depends on these handlers to convert to this common
> > > > > format. The blended result, if necessary, is converted to the writeback
> > > > > buffer format.
> > > > > 
> > > > > This patch introduces three major differences to the blend function.
> > > > > 1 - All the planes are blended at once.
> > > > > > 2 - The blend calculation is done per line instead of per pixel.
> > > > > > 3 - It is responsible for calculating the CRC and writing the writeback
> > > > > > buffer (if necessary).
> > > > > 
> > > > > > These changes allow us to allocate far less memory for the intermediate
> > > > > > buffer used by these operations, because we no longer need to hold
> > > > > > the entire intermediate image at once; a single line is
> > > > > > enough.
> > > > > 
> > > > > | Memory consumption (output dimensions) |
> > > > > |:--------------------------------------:|
> > > > > |       Current      |     This patch    |
> > > > > |:------------------:|:-----------------:|
> > > > > > |   Width * Height   |     2 * Width     |
> > > > > 
> > > > > Beyond memory, we also have a minor performance benefit from all
> > > > > these changes. Results running the IGT[1] test
> > > > > `igt@kms_cursor_crc@pipe-a-cursor-512x512-onscreen` ten times:
> > > > > 
> > > > > |                 Frametime                  |
> > > > > |:------------------------------------------:|
> > > > > |  Implementation |  Current  |  This commit |
> > > > > |:---------------:|:---------:|:------------:|
> > > > > | frametime range |  9~22 ms  |    5~17 ms   |
> > > > > |     Average     |  11.4 ms  |    7.8 ms    |
> > > > > 
> > > > > [1] IGT commit id: bc3f6833a12221a46659535dac06ebb312490eb4
> > > > > 
> > > > > > V2: Improves the performance drastically, by performing the operations
> > > > > >       per-line and not per-pixel (Pekka Paalanen).
> > > > > >       Minor improvements (Pekka Paalanen).
> > > > > > V3: Changes the code to blend the planes all at once. This improves
> > > > > >       performance, memory consumption, and removes much of the weirdness
> > > > > >       of the V2 (Pekka Paalanen and me).
> > > > > >       Minor improvements (Pekka Paalanen and me).
> > > > > > V4: Rebase the code and adapt it to the new NUM_OVERLAY_PLANES constant.
> > > > > > V5: Minor checkpatch fixes and the removal of TO-DO item (Melissa Wen).
> > > > > >       Several security/robustness improvements (Pekka Paalanen).
> > > > > >       Removes check_planes_x_bounds function and allows planes to be
> > > > > >       partly off-screen (Pekka Paalanen).
> > > > > > V6: Fix a mismatch of some variable sizes (Pekka Paalanen).
> > > > > >       Several minor improvements (Pekka Paalanen).
> > > > > 
> > > > > Reported-by: kernel test robot <lkp@intel.com>
> > > > > Signed-off-by: Igor Torrente <igormtorrente@gmail.com>
> > > > > ---
> > > > >    Documentation/gpu/vkms.rst            |   4 -
> > > > >    drivers/gpu/drm/vkms/Makefile         |   1 +
> > > > >    drivers/gpu/drm/vkms/vkms_composer.c  | 320 ++++++++++++--------------
> > > > >    drivers/gpu/drm/vkms/vkms_formats.c   | 155 +++++++++++++
> > > > >    drivers/gpu/drm/vkms/vkms_formats.h   |  12 +
> > > > >    drivers/gpu/drm/vkms/vkms_plane.c     |   3 +
> > > > >    drivers/gpu/drm/vkms/vkms_writeback.c |   3 +
> > > > >    7 files changed, 317 insertions(+), 181 deletions(-)
> > > > >    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.c
> > > > >    create mode 100644 drivers/gpu/drm/vkms/vkms_formats.h
> > > > > 
> > > > > diff --git a/Documentation/gpu/vkms.rst b/Documentation/gpu/vkms.rst
> > > > > index 973e2d43108b..a49e4ae92653 100644
> > > > > --- a/Documentation/gpu/vkms.rst
> > > > > +++ b/Documentation/gpu/vkms.rst
> > > > > @@ -118,10 +118,6 @@ Add Plane Features
> > > > >    There's lots of plane features we could add support for:
> > > > > -- Clearing primary plane: clear primary plane before plane composition (at the
> > > > > -  start) for correctness of pixel blend ops. It also guarantees alpha channel
> > > > > -  is cleared in the target buffer for stable crc. [Good to get started]
> > > > > -
> > > > >    - ARGB format on primary plane: blend the primary plane into background with
> > > > >      translucent alpha.
> > > > > diff --git a/drivers/gpu/drm/vkms/Makefile b/drivers/gpu/drm/vkms/Makefile
> > > > > index 72f779cbfedd..1b28a6a32948 100644
> > > > > --- a/drivers/gpu/drm/vkms/Makefile
> > > > > +++ b/drivers/gpu/drm/vkms/Makefile
> > > > > @@ -3,6 +3,7 @@ vkms-y := \
> > > > >    	vkms_drv.o \
> > > > >    	vkms_plane.o \
> > > > >    	vkms_output.o \
> > > > > +	vkms_formats.o \
> > > > >    	vkms_crtc.o \
> > > > >    	vkms_composer.o \
> > > > >    	vkms_writeback.o
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_composer.c b/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > index b9fb408e8973..5b1a8bdd8268 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_composer.c
> > > > > @@ -7,204 +7,188 @@
> > > > >    #include <drm/drm_fourcc.h>
> > > > >    #include <drm/drm_gem_framebuffer_helper.h>
> > > > >    #include <drm/drm_vblank.h>
> > > > > +#include <linux/minmax.h>
> > > > >    #include "vkms_drv.h"
> > > > > -static u32 get_pixel_from_buffer(int x, int y, const u8 *buffer,
> > > > > -				 const struct vkms_frame_info *frame_info)
> > > > > +static u16 pre_mul_blend_channel(u16 src, u16 dst, u16 alpha)
> > > > >    {
> > > > > -	u32 pixel;
> > > > > -	int src_offset = frame_info->offset + (y * frame_info->pitch)
> > > > > -					    + (x * frame_info->cpp);
> > > > > +	u32 new_color;
> > > > > -	pixel = *(u32 *)&buffer[src_offset];
> > > > > +	new_color = (src * 0xffff + dst * (0xffff - alpha));
> > > > > -	return pixel;
> > > > > +	return DIV_ROUND_CLOSEST(new_color, 0xffff);
> > > > >    }
> > > > >    /**
> > > > > - * compute_crc - Compute CRC value on output frame
> > > > > + * pre_mul_alpha_blend - alpha blending equation
> > > > > > + * @frame_info: source framebuffer's metadata
> > > > > + * @stage_buffer: The line with the pixels from src_plane
> > > > > + * @output_buffer: A line buffer that receives all the blends output
> > > > >     *
> > > > > - * @vaddr: address to final framebuffer
> > > > > - * @frame_info: framebuffer's metadata
> > > > > + * Using the information from the `frame_info`, this blends only the
> > > > > + * necessary pixels from the `stage_buffer` to the `output_buffer`
> > > > > + * using premultiplied blend formula.
> > > > >     *
> > > > > - * returns CRC value computed using crc32 on the visible portion of
> > > > > - * the final framebuffer at vaddr_out
> > > > > + * The current DRM assumption is that pixel color values have been already
> > > > > + * pre-multiplied with the alpha channel values. See more
> > > > > + * drm_plane_create_blend_mode_property(). Also, this formula assumes a
> > > > > + * completely opaque background.
> > > > >     */
> > > > > -static uint32_t compute_crc(const u8 *vaddr,
> > > > > -			    const struct vkms_frame_info *frame_info)
> > > > > +static void pre_mul_alpha_blend(struct vkms_frame_info *frame_info,
> > > > > +				struct line_buffer *stage_buffer,
> > > > > +				struct line_buffer *output_buffer)
> > > > >    {
> > > > > -	int x, y;
> > > > > -	u32 crc = 0, pixel = 0;
> > > > > -	int x_src = frame_info->src.x1 >> 16;
> > > > > -	int y_src = frame_info->src.y1 >> 16;
> > > > > -	int h_src = drm_rect_height(&frame_info->src) >> 16;
> > > > > -	int w_src = drm_rect_width(&frame_info->src) >> 16;
> > > > > -
> > > > > -	for (y = y_src; y < y_src + h_src; ++y) {
> > > > > -		for (x = x_src; x < x_src + w_src; ++x) {
> > > > > -			pixel = get_pixel_from_buffer(x, y, vaddr, frame_info);
> > > > > -			crc = crc32_le(crc, (void *)&pixel, sizeof(u32));
> > > > > -		}
> > > > > +	int x_dst = frame_info->dst.x1;
> > > > > +	struct pixel_argb_u16 *out = output_buffer->pixels + x_dst;
> > > > > +	struct pixel_argb_u16 *in = stage_buffer->pixels;
> > > > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > > +			    stage_buffer->n_pixels);
> > > > > +
> > > > > +	for (int x = 0; x < x_limit; x++) {
> > > > > +		out[x].a = (u16)0xffff;
> > > > > +		out[x].r = pre_mul_blend_channel(in[x].r, out[x].r, in[x].a);
> > > > > +		out[x].g = pre_mul_blend_channel(in[x].g, out[x].g, in[x].a);
> > > > > +		out[x].b = pre_mul_blend_channel(in[x].b, out[x].b, in[x].a);
> > > > >    	}
> > > > > -
> > > > > -	return crc;
> > > > >    }
> > > > > -static u8 blend_channel(u8 src, u8 dst, u8 alpha)
> > > > > +static bool check_y_limit(struct vkms_frame_info *frame_info, int y)
> > > > >    {
> > > > > -	u32 pre_blend;
> > > > > -	u8 new_color;
> > > > > -
> > > > > -	pre_blend = (src * 255 + dst * (255 - alpha));
> > > > > -
> > > > > -	/* Faster div by 255 */
> > > > > -	new_color = ((pre_blend + ((pre_blend + 257) >> 8)) >> 8);
> > > > > +	if (y >= frame_info->dst.y1 && y < frame_info->dst.y2)
> > > > > +		return true;
> > > > > -	return new_color;
> > > > > +	return false;
> > > > >    }
> > > > >    /**
> > > > > - * alpha_blend - alpha blending equation
> > > > > - * @argb_src: src pixel on premultiplied alpha mode
> > > > > - * @argb_dst: dst pixel completely opaque
> > > > > > + * @wb: The writeback job (holds the writeback frame buffer metadata), or NULL
> > > > > + * @crtc_state: The crtc state
> > > > > + * @crc32: The crc output of the final frame
> > > > > + * @output_buffer: A buffer of a row that will receive the result of the blend(s)
> > > > > > + * @stage_buffer: The line with the pixels from the plane being blended to the output
> > > > >     *
> > > > > - * blend pixels using premultiplied blend formula. The current DRM assumption
> > > > > - * is that pixel color values have been already pre-multiplied with the alpha
> > > > > - * channel values. See more drm_plane_create_blend_mode_property(). Also, this
> > > > > - * formula assumes a completely opaque background.
> > > > > > + * This function blends the pixels (using `pre_mul_alpha_blend`)
> > > > > > + * from all planes, calculates the crc32 of the blended output,
> > > > > > + * and, if necessary, converts and stores the output in the writeback buffer.
> > > > >     */
> > > > > -static void alpha_blend(const u8 *argb_src, u8 *argb_dst)
> > > > > +static void blend(struct vkms_writeback_job *wb,
> > > > > +		  struct vkms_crtc_state *crtc_state,
> > > > > +		  u32 *crc32, struct line_buffer *stage_buffer,
> > > > > +		  struct line_buffer *output_buffer, size_t row_size)
> > > > >    {
> > > > > -	u8 alpha;
> > > > > +	struct vkms_plane_state **plane = crtc_state->active_planes;
> > > > > +	struct vkms_frame_info *primary_plane_info = plane[0]->frame_info;
> > > > > +	u32 n_active_planes = crtc_state->num_active_planes;
> > > > > +
> > > > > +	int y_dst = primary_plane_info->dst.y1;
> > > > > +	int h_dst = drm_rect_height(&primary_plane_info->dst);
> > > > > +	int y_limit = y_dst + h_dst;
> > > > > +
> > > > > +	for (size_t y = y_dst; y < y_limit; y++) {
> > > > > +		plane[0]->plane_read(output_buffer, primary_plane_info, y);
> > > > > +
> > > > > +		/* If there are other planes besides primary, we consider the active
> > > > > +		 * planes should be in z-order and compose them associatively:
> > > > > +		 * ((primary <- overlay) <- cursor)
> > > > > +		 */
> > > > > +		for (size_t i = 1; i < n_active_planes; i++) {
> > > > > +			if (!check_y_limit(plane[i]->frame_info, y))
> > > > > +				continue;
> > > > > +
> > > > > +			plane[i]->plane_read(stage_buffer, plane[i]->frame_info, y);
> > > > > +			pre_mul_alpha_blend(plane[i]->frame_info, stage_buffer,
> > > > > +					    output_buffer);
> > > > > +		}
> > > > > +
> > > > > +		*crc32 = crc32_le(*crc32, (void *)output_buffer->pixels, row_size);
> > > > > -	alpha = argb_src[3];
> > > > > -	argb_dst[0] = blend_channel(argb_src[0], argb_dst[0], alpha);
> > > > > -	argb_dst[1] = blend_channel(argb_src[1], argb_dst[1], alpha);
> > > > > -	argb_dst[2] = blend_channel(argb_src[2], argb_dst[2], alpha);
> > > > > +		if (wb)
> > > > > +			wb->wb_write(&wb->wb_frame_info, output_buffer, y);
> > > > > +	}
> > > > >    }
> > > > > -/**
> > > > > - * x_blend - blending equation that ignores the pixel alpha
> > > > > - *
> > > > > - * overwrites RGB color value from src pixel to dst pixel.
> > > > > - */
> > > > > -static void x_blend(const u8 *xrgb_src, u8 *xrgb_dst)
> > > > > +static int check_format_funcs(struct vkms_crtc_state *crtc_state,
> > > > > +			      struct vkms_writeback_job *active_wb)
> > > > >    {
> > > > > -	memcpy(xrgb_dst, xrgb_src, sizeof(u8) * 3);
> > > > > +	struct vkms_plane_state **planes = crtc_state->active_planes;
> > > > > +	u32 n_active_planes = crtc_state->num_active_planes;
> > > > > +
> > > > > +	for (size_t i = 0; i < n_active_planes; i++)
> > > > > +		if (!planes[i]->plane_read)
> > > > > +			return -1;
> > > > > +
> > > > > +	if (active_wb && !active_wb->wb_write)
> > > > > +		return -1;
> > > > > +
> > > > > +	return 0;
> > > > >    }
> > > > > -/**
> > > > > - * blend - blend value at vaddr_src with value at vaddr_dst
> > > > > - * @vaddr_dst: destination address
> > > > > - * @vaddr_src: source address
> > > > > - * @dst_frame_info: destination framebuffer's metadata
> > > > > - * @src_frame_info: source framebuffer's metadata
> > > > > - * @pixel_blend: blending equation based on plane format
> > > > > - *
> > > > > - * Blend the vaddr_src value with the vaddr_dst value using a pixel blend
> > > > > - * equation according to the supported plane formats DRM_FORMAT_(A/XRGB8888)
> > > > > - * and clearing alpha channel to an completely opaque background. This function
> > > > > - * uses buffer's metadata to locate the new composite values at vaddr_dst.
> > > > > - *
> > > > > - * TODO: completely clear the primary plane (a = 0xff) before starting to blend
> > > > > - * pixel color values
> > > > > - */
> > > > > -static void blend(void *vaddr_dst, void *vaddr_src,
> > > > > -		  struct vkms_frame_info *dst_frame_info,
> > > > > -		  struct vkms_frame_info *src_frame_info,
> > > > > -		  void (*pixel_blend)(const u8 *, u8 *))
> > > > > +static int compose_active_planes(struct vkms_writeback_job *active_wb,
> > > > > +				 struct vkms_crtc_state *crtc_state,
> > > > > +				 u32 *crc32)
> > > > >    {
> > > > > -	int i, j, j_dst, i_dst;
> > > > > -	int offset_src, offset_dst;
> > > > > -	u8 *pixel_dst, *pixel_src;
> > > > > -
> > > > > -	int x_src = src_frame_info->src.x1 >> 16;
> > > > > -	int y_src = src_frame_info->src.y1 >> 16;
> > > > > -
> > > > > -	int x_dst = src_frame_info->dst.x1;
> > > > > -	int y_dst = src_frame_info->dst.y1;
> > > > > -	int h_dst = drm_rect_height(&src_frame_info->dst);
> > > > > -	int w_dst = drm_rect_width(&src_frame_info->dst);
> > > > > -
> > > > > -	int y_limit = y_src + h_dst;
> > > > > -	int x_limit = x_src + w_dst;
> > > > > -
> > > > > -	for (i = y_src, i_dst = y_dst; i < y_limit; ++i) {
> > > > > -		for (j = x_src, j_dst = x_dst; j < x_limit; ++j) {
> > > > > -			offset_dst = dst_frame_info->offset
> > > > > -				     + (i_dst * dst_frame_info->pitch)
> > > > > -				     + (j_dst++ * dst_frame_info->cpp);
> > > > > -			offset_src = src_frame_info->offset
> > > > > -				     + (i * src_frame_info->pitch)
> > > > > -				     + (j * src_frame_info->cpp);
> > > > > -
> > > > > -			pixel_src = (u8 *)(vaddr_src + offset_src);
> > > > > -			pixel_dst = (u8 *)(vaddr_dst + offset_dst);
> > > > > -			pixel_blend(pixel_src, pixel_dst);
> > > > > -			/* clearing alpha channel (0xff)*/
> > > > > -			pixel_dst[3] = 0xff;
> > > > > -		}
> > > > > -		i_dst++;
> > > > > +	size_t line_width, pixel_size = sizeof(struct pixel_argb_u16);
> > > > > +	struct vkms_frame_info *primary_plane_info = NULL;
> > > > > +	struct line_buffer output_buffer, stage_buffer;
> > > > > +	struct vkms_plane_state *act_plane = NULL;
> > > > > +	int ret = 0;
> > > > > +
> > > > > +	/*
> > > > > +	 * This check exists so we can call `crc32_le` for the entire line
> > > > > +	 * instead doing it for each channel of each pixel in case
> > > > > +	 * `struct `pixel_argb_u16` had any gap added by the compiler
> > > > > +	 * between the struct fields.
> > > > > +	 */
> > > > > +	static_assert(sizeof(struct pixel_argb_u16) == 8);
> > > > > +
> > > > > +	if (crtc_state->num_active_planes >= 1) {
> > > > > +		act_plane = crtc_state->active_planes[0];
> > > > > +		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> > > > > +			primary_plane_info = act_plane->frame_info;
> > > > >    	}
> > > > > -}
> > > > > -static void compose_plane(struct vkms_frame_info *primary_plane_info,
> > > > > -			  struct vkms_frame_info *plane_frame_info,
> > > > > -			  void *vaddr_out)
> > > > > -{
> > > > > -	struct drm_framebuffer *fb = plane_frame_info->fb;
> > > > > -	void *vaddr;
> > > > > -	void (*pixel_blend)(const u8 *p_src, u8 *p_dst);
> > > > > +	if (!primary_plane_info)
> > > > > +		return -EINVAL;
> > > > >    	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> > > > > -		return;
> > > > > +		return -EINVAL;
> > > > > -	vaddr = plane_frame_info->map[0].vaddr;
> > > > > +	if (WARN_ON(check_format_funcs(crtc_state, active_wb)))
> > > > > +		return -EINVAL;
> > > > > -	if (fb->format->format == DRM_FORMAT_ARGB8888)
> > > > > -		pixel_blend = &alpha_blend;
> > > > > -	else
> > > > > -		pixel_blend = &x_blend;
> > > > > +	line_width = drm_rect_width(&primary_plane_info->dst);
> > > > > +	stage_buffer.n_pixels = line_width;
> > > > > +	output_buffer.n_pixels = line_width;
> > > > > -	blend(vaddr_out, vaddr, primary_plane_info,
> > > > > -	      plane_frame_info, pixel_blend);
> > > > > -}
> > > > > +	stage_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> > > > > +	if (!stage_buffer.pixels) {
> > > > > +		DRM_ERROR("Cannot allocate memory for the output line buffer");
> > > > > +		return -ENOMEM;
> > > > > +	}
> > > > > -static int compose_active_planes(void **vaddr_out,
> > > > > -				 struct vkms_frame_info *primary_plane_info,
> > > > > -				 struct vkms_crtc_state *crtc_state)
> > > > > -{
> > > > > -	struct drm_framebuffer *fb = primary_plane_info->fb;
> > > > > -	struct drm_gem_object *gem_obj = drm_gem_fb_get_obj(fb, 0);
> > > > > -	const void *vaddr;
> > > > > -	int i;
> > > > > -
> > > > > -	if (!*vaddr_out) {
> > > > > -		*vaddr_out = kvzalloc(gem_obj->size, GFP_KERNEL);
> > > > > -		if (!*vaddr_out) {
> > > > > -			DRM_ERROR("Cannot allocate memory for output frame.");
> > > > > -			return -ENOMEM;
> > > > > -		}
> > > > > +	output_buffer.pixels = kvmalloc(line_width * pixel_size, GFP_KERNEL);
> > > > > +	if (!output_buffer.pixels) {
> > > > > +		DRM_ERROR("Cannot allocate memory for intermediate line buffer");
> > > > > +		ret = -ENOMEM;
> > > > > +		goto free_stage_buffer;
> > > > >    	}
> > > > > -	if (WARN_ON(iosys_map_is_null(&primary_plane_info->map[0])))
> > > > > -		return -EINVAL;
> > > > > +	if (active_wb) {
> > > > > +		struct vkms_frame_info *wb_frame_info = &active_wb->wb_frame_info;
> > > > > -	vaddr = primary_plane_info->map[0].vaddr;
> > > > > +		wb_frame_info->src = primary_plane_info->src;
> > > > > +		wb_frame_info->dst = primary_plane_info->dst;
> > > > > +	}
> > > > > -	memcpy(*vaddr_out, vaddr, gem_obj->size);
> > > > > +	blend(active_wb, crtc_state, crc32, &stage_buffer,
> > > > > +	      &output_buffer, line_width * pixel_size);
> > > > > -	/* If there are other planes besides primary, we consider the active
> > > > > -	 * planes should be in z-order and compose them associatively:
> > > > > -	 * ((primary <- overlay) <- cursor)
> > > > > -	 */
> > > > > -	for (i = 1; i < crtc_state->num_active_planes; i++)
> > > > > -		compose_plane(primary_plane_info,
> > > > > -			      crtc_state->active_planes[i]->frame_info,
> > > > > -			      *vaddr_out);
> > > > > +	kvfree(output_buffer.pixels);
> > > > > +free_stage_buffer:
> > > > > +	kvfree(stage_buffer.pixels);
> > > > > -	return 0;
> > > > > +	return ret;
> > > > >    }
> > > > >    /**
> > > > > @@ -222,13 +206,11 @@ void vkms_composer_worker(struct work_struct *work)
> > > > >    						struct vkms_crtc_state,
> > > > >    						composer_work);
> > > > >    	struct drm_crtc *crtc = crtc_state->base.crtc;
> > > > > +	struct vkms_writeback_job *active_wb = crtc_state->active_writeback;
> > > > >    	struct vkms_output *out = drm_crtc_to_vkms_output(crtc);
> > > > > -	struct vkms_frame_info *primary_plane_info = NULL;
> > > > > -	struct vkms_plane_state *act_plane = NULL;
> > > > >    	bool crc_pending, wb_pending;
> > > > > -	void *vaddr_out = NULL;
> > > > > -	u32 crc32 = 0;
> > > > >    	u64 frame_start, frame_end;
> > > > > +	u32 crc32 = 0;
> > > > >    	int ret;
> > > > >    	spin_lock_irq(&out->composer_lock);
> > > > > @@ -248,35 +230,19 @@ void vkms_composer_worker(struct work_struct *work)
> > > > >    	if (!crc_pending)
> > > > >    		return;
> > > > > -	if (crtc_state->num_active_planes >= 1) {
> > > > > -		act_plane = crtc_state->active_planes[0];
> > > > > -		if (act_plane->base.base.plane->type == DRM_PLANE_TYPE_PRIMARY)
> > > > > -			primary_plane_info = act_plane->frame_info;
> > > > > -	}
> > > > > -
> > > > > -	if (!primary_plane_info)
> > > > > -		return;
> > > > > -
> > > > >    	if (wb_pending)
> > > > > -		vaddr_out = crtc_state->active_writeback->data[0].vaddr;
> > > > > +		ret = compose_active_planes(active_wb, crtc_state, &crc32);
> > > > > +	else
> > > > > +		ret = compose_active_planes(NULL, crtc_state, &crc32);
> > > > > -	ret = compose_active_planes(&vaddr_out, primary_plane_info,
> > > > > -				    crtc_state);
> > > > > -	if (ret) {
> > > > > -		if (ret == -EINVAL && !wb_pending)
> > > > > -			kvfree(vaddr_out);
> > > > > +	if (ret)
> > > > >    		return;
> > > > > -	}
> > > > > -
> > > > > -	crc32 = compute_crc(vaddr_out, primary_plane_info);
> > > > >    	if (wb_pending) {
> > > > >    		drm_writeback_signal_completion(&out->wb_connector, 0);
> > > > >    		spin_lock_irq(&out->composer_lock);
> > > > >    		crtc_state->wb_pending = false;
> > > > >    		spin_unlock_irq(&out->composer_lock);
> > > > > -	} else {
> > > > > -		kvfree(vaddr_out);
> > > > >    	}
> > > > >    	/*
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.c b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > new file mode 100644
> > > > > index 000000000000..ca4bfcac686b
> > > > > --- /dev/null
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.c
> > > > > @@ -0,0 +1,155 @@
> > > > > +// SPDX-License-Identifier: GPL-2.0+
> > > > > +
> > > > > +#include <drm/drm_rect.h>
> > > > > +#include <linux/minmax.h>
> > > > > +
> > > > > +#include "vkms_formats.h"
> > > > > +
> > > > > +static size_t pixel_offset(const struct vkms_frame_info *frame_info, int x, int y)
> > > > > +{
> > > > > +	return frame_info->offset + (y * frame_info->pitch)
> > > > > +				  + (x * frame_info->cpp);
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > > + * packed_pixels_addr - Get the pointer to the pixel at a given pair of coordinates
> > > > > > + *
> > > > > > + * @frame_info: Buffer metadata
> > > > > > + * @x: The x (width) coordinate of the 2D buffer
> > > > > > + * @y: The y (height) coordinate of the 2D buffer
> > > > > + *
> > > > > + * Takes the information stored in the frame_info, a pair of coordinates, and
> > > > > + * returns the address of the first color channel.
> > > > > + * This function assumes the channels are packed together, i.e. a color channel
> > > > > + * comes immediately after another in the memory. And therefore, this function
> > > > > + * doesn't work for YUV with chroma subsampling (e.g. YUV420 and NV21).
> > > > > + */
> > > > > +static void *packed_pixels_addr(const struct vkms_frame_info *frame_info,
> > > > > +				int x, int y)
> > > > > +{
> > > > > +	size_t offset = pixel_offset(frame_info, x, y);
> > > > > +
> > > > > +	return (u8 *)frame_info->map[0].vaddr + offset;
> > > > > +}
> > > > > +
> > > > > +static void *get_packed_src_addr(const struct vkms_frame_info *frame_info, int y)
> > > > > +{
> > > > > +	int x_src = frame_info->src.x1 >> 16;
> > > > > +	int y_src = y - frame_info->dst.y1 + (frame_info->src.y1 >> 16);
> > > > > +
> > > > > +	return packed_pixels_addr(frame_info, x_src, y_src);
> > > > > +}
> > > > > +
> > > > > +static void ARGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> > > > > +				 const struct vkms_frame_info *frame_info, int y)
> > > > > +{
> > > > > +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > > > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > > +			    stage_buffer->n_pixels);
> > > > > +
> > > > > +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
> > > > > +		/*
> > > > > +		 * The 257 is the "conversion ratio". This number is obtained by the
> > > > > +		 * (2^16 - 1) / (2^8 - 1) division. Which, in this case, tries to get
> > > > > +		 * the best color value in a pixel format with more possibilities.
> > > > > +		 * A similar idea applies to others RGB color conversions.
> > > > > +		 */
> > > > > +		out_pixels[x].a = (u16)src_pixels[3] * 257;
> > > > > +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> > > > > +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> > > > > +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> > > > > +	}
> > > > > +}
> > > > > +
> > > > > +static void XRGB8888_to_argb_u16(struct line_buffer *stage_buffer,
> > > > > +				 const struct vkms_frame_info *frame_info, int y)
> > > > > +{
> > > > > +	struct pixel_argb_u16 *out_pixels = stage_buffer->pixels;
> > > > > +	u8 *src_pixels = get_packed_src_addr(frame_info, y);
> > > > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > > +			    stage_buffer->n_pixels);
> > > > > +
> > > > > +	for (size_t x = 0; x < x_limit; x++, src_pixels += 4) {
> > > > > +		out_pixels[x].a = (u16)0xffff;
> > > > > +		out_pixels[x].r = (u16)src_pixels[2] * 257;
> > > > > +		out_pixels[x].g = (u16)src_pixels[1] * 257;
> > > > > +		out_pixels[x].b = (u16)src_pixels[0] * 257;
> > > > > +	}
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * The following functions take a line of argb_u16 pixels from the
> > > > > + * src_buffer, convert them to a specific format, and store them in the
> > > > > + * destination.
> > > > > + *
> > > > > + * They are used in `compose_active_planes` to convert and store a line
> > > > > + * from the src_buffer to the writeback buffer.
> > > > > + */
> > > > > +static void argb_u16_to_ARGB8888(struct vkms_frame_info *frame_info,
> > > > > +				 const struct line_buffer *src_buffer, int y)
> > > > > +{
> > > > > +	int x_dst = frame_info->dst.x1;
> > > > > +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > > > +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> > > > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > > +			    src_buffer->n_pixels);
> > > > > +
> > > > > +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
> > > > > +		/*
> > > > > +		 * The sequence below is important because the format's byte order is
> > > > > +		 * little-endian. In the case of ARGB8888, the memory is
> > > > > +		 * organized this way:
> > > > > +		 *
> > > > > +		 * | Addr     | = Blue channel
> > > > > +		 * | Addr + 1 | = Green channel
> > > > > +		 * | Addr + 2 | = Red channel
> > > > > +		 * | Addr + 3 | = Alpha channel
> > > > > +		 */
> > > > > +		dst_pixels[3] = DIV_ROUND_CLOSEST(in_pixels[x].a, 257);
> > > > > +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> > > > > +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> > > > > +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> > > > > +	}
> > > > > +}
> > > > > +
> > > > > +static void argb_u16_to_XRGB8888(struct vkms_frame_info *frame_info,
> > > > > +				 const struct line_buffer *src_buffer, int y)
> > > > > +{
> > > > > +	int x_dst = frame_info->dst.x1;
> > > > > +	u8 *dst_pixels = packed_pixels_addr(frame_info, x_dst, y);
> > > > > +	struct pixel_argb_u16 *in_pixels = src_buffer->pixels;
> > > > > +	int x_limit = min_t(size_t, drm_rect_width(&frame_info->dst),
> > > > > +			    src_buffer->n_pixels);
> > > > > +
> > > > > +	for (size_t x = 0; x < x_limit; x++, dst_pixels += 4) {
> > > > > +		dst_pixels[3] = 0xff;
> > > > > +		dst_pixels[2] = DIV_ROUND_CLOSEST(in_pixels[x].r, 257);
> > > > > +		dst_pixels[1] = DIV_ROUND_CLOSEST(in_pixels[x].g, 257);
> > > > > +		dst_pixels[0] = DIV_ROUND_CLOSEST(in_pixels[x].b, 257);
> > > > > +	}
> > > > > +}
> > > > > +
> > > > > +frame_to_line_func get_frame_to_line_function(u32 format)
> > > > > +{
> > > > > +	switch (format) {
> > > > > +	case DRM_FORMAT_ARGB8888:
> > > > > +		return &ARGB8888_to_argb_u16;
> > > > > +	case DRM_FORMAT_XRGB8888:
> > > > > +		return &XRGB8888_to_argb_u16;
> > > > > +	default:
> > > > > +		return NULL;
> > > > > +	}
> > > > > +}
> > > > > +
> > > > > +line_to_frame_func get_line_to_frame_function(u32 format)
> > > > > +{
> > > > > +	switch (format) {
> > > > > +	case DRM_FORMAT_ARGB8888:
> > > > > +		return &argb_u16_to_ARGB8888;
> > > > > +	case DRM_FORMAT_XRGB8888:
> > > > > +		return &argb_u16_to_XRGB8888;
> > > > > +	default:
> > > > > +		return NULL;
> > > > > +	}
> > > > > +}
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_formats.h b/drivers/gpu/drm/vkms/vkms_formats.h
> > > > > new file mode 100644
> > > > > index 000000000000..053ca42d5b31
> > > > > --- /dev/null
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_formats.h
> > > > > @@ -0,0 +1,12 @@
> > > > > +/* SPDX-License-Identifier: GPL-2.0+ */
> > > > > +
> > > > > +#ifndef _VKMS_FORMATS_H_
> > > > > +#define _VKMS_FORMATS_H_
> > > > > +
> > > > > +#include "vkms_drv.h"
> > > > > +
> > > > > +frame_to_line_func get_frame_to_line_function(u32 format);
> > > > > +
> > > > > +line_to_frame_func get_line_to_frame_function(u32 format);
> > > > > +
> > > > > +#endif /* _VKMS_FORMATS_H_ */
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_plane.c b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > > index 8adbfdc05e50..7a479a714565 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_plane.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_plane.c
> > > > > @@ -10,6 +10,7 @@
> > > > >    #include <drm/drm_plane_helper.h>
> > > > >    #include "vkms_drv.h"
> > > > > +#include "vkms_formats.h"
> > > > ^ this line no longer applies (it needs a rebase), but I can handle it before applying to drm-misc-next
> > > 
> > > I did the rebase and didn't have any issues.
> > > 
> > > I'm using `git://anongit.freedesktop.org/drm/drm-misc` remote. Should I be
> > > using another git remote for vkms?
> > 
> > Hmm... the repository is correct. Maybe double-check that you are rebasing
> > on top of the `drm-misc-next` branch and that it is up to date.
> > 
> > For reference, I tried to apply your series on top of this commit:
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=ee50b00244086453dfb7076e4b80214948cd0507
> > 
> > Also, if you compare the line right above `#include "vkms_drv.h"` in your
> > diff with the same line in the current file:
> > https://cgit.freedesktop.org/drm/drm-misc/tree/drivers/gpu/drm/vkms/vkms_plane.c#n9
> > you will see that they are not the same.
> > 
> > Can you double check it, please?
> 
> Everything seems right. Hopefully we will not have this issue again in V7.

Great! Yeah, maybe the point here is just that your git version is
smarter than mine at handling differences :) Thanks for checking it.

Melissa
> 
> > 
> > Thanks,
> > 
> > Melissa
> > > 
> > > > >    static const u32 vkms_formats[] = {
> > > > >    	DRM_FORMAT_XRGB8888,
> > > > > @@ -100,6 +101,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > > > >    	struct drm_shadow_plane_state *shadow_plane_state;
> > > > >    	struct drm_framebuffer *fb = new_state->fb;
> > > > >    	struct vkms_frame_info *frame_info;
> > > > > +	u32 fmt = fb->format->format;
> > > > >    	if (!new_state->crtc || !fb)
> > > > >    		return;
> > > > > @@ -116,6 +118,7 @@ static void vkms_plane_atomic_update(struct drm_plane *plane,
> > > > >    	frame_info->offset = fb->offsets[0];
> > > > >    	frame_info->pitch = fb->pitches[0];
> > > > >    	frame_info->cpp = fb->format->cpp[0];
> > > > > +	vkms_plane_state->plane_read = get_frame_to_line_function(fmt);
> > > > >    }
> > > > >    static int vkms_plane_atomic_check(struct drm_plane *plane,
> > > > > diff --git a/drivers/gpu/drm/vkms/vkms_writeback.c b/drivers/gpu/drm/vkms/vkms_writeback.c
> > > > > index c87f6c89e7b4..d2aabb52cb46 100644
> > > > > --- a/drivers/gpu/drm/vkms/vkms_writeback.c
> > > > > +++ b/drivers/gpu/drm/vkms/vkms_writeback.c
> > > > > @@ -11,6 +11,7 @@
> > > > >    #include <drm/drm_gem_shmem_helper.h>
> > > > >    #include "vkms_drv.h"
> > > > > +#include "vkms_formats.h"
> > > > >    static const u32 vkms_wb_formats[] = {
> > > > >    	DRM_FORMAT_XRGB8888,
> > > > > @@ -123,6 +124,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> > > > >    	struct drm_framebuffer *fb = connector_state->writeback_job->fb;
> > > > >    	struct vkms_writeback_job *active_wb;
> > > > >    	struct vkms_frame_info *wb_frame_info;
> > > > > +	u32 wb_format = fb->format->format;
> > > > >    	if (!conn_state)
> > > > >    		return;
> > > > > @@ -140,6 +142,7 @@ static void vkms_wb_atomic_commit(struct drm_connector *conn,
> > > > >    	crtc_state->wb_pending = true;
> > > > >    	spin_unlock_irq(&output->composer_lock);
> > > > >    	drm_writeback_queue_job(wb_conn, connector_state);
> > > > > +	active_wb->wb_write = get_line_to_frame_function(wb_format);
> > > > >    }
> > > > >    static const struct drm_connector_helper_funcs vkms_wb_conn_helper_funcs = {
> > > > > -- 
> > > > > 2.30.2
> > > > > 
> > > 
> 
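
For readers following the conversion helpers quoted above, here is a minimal
userspace sketch of why 257 works as the 8-bit to 16-bit ratio and why
dividing by 257 with round-to-nearest recovers the original byte. It is an
illustration only, not code from the series: the helper names
(`channel_u8_to_u16`, `channel_u16_to_u8`, `argb8888_to_argb_u16`,
`argb_u16_to_argb8888`, `check_roundtrip`) and the local
`struct pixel_argb_u16` are assumptions made for the example, and the
rounding division is written out by hand in place of the kernel's
`DIV_ROUND_CLOSEST`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the patch's 16-bit-per-channel pixel. */
struct pixel_argb_u16 {
	uint16_t a, r, g, b;
};

/* 8-bit -> 16-bit: 257 == 65535 / 255, so 0x00 -> 0x0000 and 0xff -> 0xffff. */
uint16_t channel_u8_to_u16(uint8_t v)
{
	return (uint16_t)(v * 257u);
}

/* 16-bit -> 8-bit with round-to-nearest, i.e. (v + 257 / 2) / 257. */
uint8_t channel_u16_to_u8(uint16_t v)
{
	return (uint8_t)((v + 257u / 2u) / 257u);
}

/*
 * ARGB8888 is little-endian, so one pixel is stored in memory as the
 * bytes B, G, R, A (src[0] is blue, src[3] is alpha).
 */
struct pixel_argb_u16 argb8888_to_argb_u16(const uint8_t src[4])
{
	struct pixel_argb_u16 out;

	out.a = channel_u8_to_u16(src[3]);
	out.r = channel_u8_to_u16(src[2]);
	out.g = channel_u8_to_u16(src[1]);
	out.b = channel_u8_to_u16(src[0]);
	return out;
}

/* Inverse of the function above: write the bytes back in B, G, R, A order. */
void argb_u16_to_argb8888(const struct pixel_argb_u16 *in, uint8_t dst[4])
{
	dst[3] = channel_u16_to_u8(in->a);
	dst[2] = channel_u16_to_u8(in->r);
	dst[1] = channel_u16_to_u8(in->g);
	dst[0] = channel_u16_to_u8(in->b);
}

/* Every 8-bit value must survive the 8 -> 16 -> 8 round trip unchanged. */
void check_roundtrip(void)
{
	for (unsigned int v = 0; v < 256; v++)
		assert(channel_u16_to_u8(channel_u8_to_u16((uint8_t)v)) == v);
}
```

For example, the in-memory bytes {0x10, 0x80, 0xff, 0x00} (B, G, R, A) become
b = 0x1010, g = 0x8080, r = 0xffff, a = 0x0000, and converting that pixel back
yields the same four bytes.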