Date: Wed, 1 Jun 2022 07:29:27 +0000 (UTC)
From: Patrik Cyvoct
To: Louis Solofrizzo
Cc: ~ne02ptzero/libfloat@lists.sr.ht, fflorensa@scaleway.com, pcyvoct@scaleway.com
In-Reply-To: <20220531111717.3996173-1-lsolofrizzo@scaleway.com>
References: <20220531111717.3996173-1-lsolofrizzo@scaleway.com>
Subject: Re: [PATCH] log: Try to fix stuck replication on some cases

LG

May 31, 2022 13:17:28 Louis Solofrizzo:

> Signed-off-by: Louis Solofrizzo
> ---
>  log.c  | 24 ++++++++++++++++++++++--
>  node.h |  3 +++
>  2 files changed, 25 insertions(+), 2 deletions(-)
>
> diff --git a/log.c b/log.c
> index d7cd67a..d761f14 100644
> --- a/log.c
> +++ b/log.c
> @@ -660,8 +660,28 @@ void libfloat_append_entries_response(libfloat_ctx_t *ctx, libfloat_rpc_append_e
>                  return;
>              }
>
> -            ERROR(ctx, "libfloat_append_entries_response: node %d: received current_index (%u) older than replicated_log (%u)",
> -                node->id, resp->current_index, node->replicated_log);
> +            if (node->announced_log == resp->current_index)
> +            {
> +                node->announced_log_count++;
> +                if (node->announced_log_count == 20)
> +                {
> +                    /* We have 20 consecutive hertbeats telling the same story, let's believe it */
> +
> +                    node->announced_log_count = 0;
> +                    node->next_log_to_send = max(resp->current_index + 1, 1);
> +                    node->replicated_log = resp->current_index;
> +                    libfloat_send_append_entries(ctx, node, false);
> +                    return;
> +                }
> +            }
> +            else
> +            {
> +                node->announced_log_count = 0;
> +                node->announced_log = resp->current_index;
> +            }
> +
> +            ERROR(ctx, "libfloat_append_entries_response: node %d: received current_index (%u) older than replicated_log (%u) (count=%lu)",
> +                node->id, resp->current_index, node->replicated_log, node->announced_log_count);
>              return;
>          }
>
> diff --git a/node.h b/node.h
> index f49ce9c..9f0bed8 100644
> --- a/node.h
> +++ b/node.h
> @@ -7,6 +7,9 @@ typedef struct {
>      libfloat_entry_id_t next_log_to_send;       /*!< Next log to send to this node */
>      libfloat_entry_id_t replicated_log;         /*!< Last known replicated log of this node */
>
> +    libfloat_entry_id_t announced_log;
> +    uint64_t            announced_log_count;
> +
>      uint8_t             has_voted_for_me        : 1;
>      uint8_t             is_up_to_date           : 1;
>
> --
> 2.36.1