AlmaLinux 9.1 Problem on MDADM raid1
Hi Jack, I'm sorry to bother you during the holidays.

I encountered a strange problem installing AlmaLinux 9.1 on a RAID1 (MDADM) with the following configuration:

- /boot/efi on md125
- swap on md126
- / on md127

The disks are two MLC-type SSDs.

After the installation, if I reboot the system I get:

"md: md125 stopped" (printed many times, like in a loop)

alternated with:

"systemd-shutdown[1]: Not all MD devices stopped, 1 left
Stopping MD Devices
Stopping MD /dev/md125 (9:125)"

and the system hangs in this loop until I cut the power.

I encountered this issue on my workstation with an Asus Prime Z490-A / i9-10850K. To exclude a bad SATA controller and bad cables, I tried another workstation that runs an Asus Prime Z370-A / i7-8700K; the problem shows up on the second workstation as well.

I tried to replicate this using the 9.0 ISO: the problem does not occur until I update to 9.1. I also tried 8.7; no problem there.

I also tried Rocky Linux 9.1 and got the same problem, but with different messages:

"block device autoconfig is deprecated and will removed"

alternated with:

"blkdev_get_no_open: 270 callbacks suppressed."

To stop the machine I need to cut the power.

I also tried Debian 11.5 without problems. So it seems the problem is 9.1-related. At the moment I can't test the same with RHEL 9.1, but the problem will probably occur on RHEL 9.1 too.

Is there a way to fix this, or should I wait for an update?

Thank you in advance.
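[A side note on capturing these shutdown messages: they only survive the power cut if the journal is persistent. A minimal sketch, assuming stock systemd-journald on EL9; anything systemd-shutdown prints after journald itself has stopped will still land only on the console:

mkdir -p /var/log/journal                      # with the default Storage=auto, this enables persistent logging
systemd-tmpfiles --create --prefix /var/log/journal
systemctl restart systemd-journald             # start writing to disk now

# after the next hung shutdown and power cut:
journalctl -b -1 -e                            # jump to the end of the previous boot's log
]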
On Tue, 3 Jan 2023 at 06:19, Alessandro Baggi wrote:
Hi Jack, I'm sorry to bother you during holidays.
I encountered a strange problem installing AlmaLinux 9.1 on a RAID1 (MDADM) with the current configuration:
- /boot/efi on md125
- swap on md126
- / on md127
My limited understanding is that RAID on EFI has been something of a hack, as the backing store that EFI uses is a slightly modified VFAT. What happens is that there is some code to 'clone' the data across, but it isn't really RAID1. My guess is that something in the 9.1 kernel broke that hack. Could you try the CentOS Stream 9 kernel (you can install that on your existing Alma or Rocky system) and see if the problem still occurs? If it does, then it is a bug that needs to be tracked upstream at bugzilla.redhat.com; if it doesn't, then it should be fixed in an upcoming kernel. You could then continue to use the CS9 kernel until a fixed kernel arrives in Alma/Rocky 9.
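[One way to do that, sketched; the repo path follows the usual CentOS Stream mirror layout and the GPG key URL is an assumption, so verify both against mirror.stream.centos.org and centos.org/keys before use:

cat > /etc/yum.repos.d/cs9-kernel-test.repo <<'EOF'
[cs9-baseos]
name=CentOS Stream 9 - BaseOS (kernel test only)
baseurl=https://mirror.stream.centos.org/9-stream/BaseOS/x86_64/os/
gpgcheck=1
gpgkey=https://www.centos.org/keys/RPM-GPG-KEY-CentOSOfficial
enabled=0
EOF

# pull in the Stream kernel alongside the Alma/Rocky one
dnf --enablerepo=cs9-baseos install kernel
# reboot, select the Stream kernel in GRUB, and retest the shutdown
]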
trimmed bottom.
-- Stephen J Smoogen. Let us be kind to one another, for most of us are fighting a hard battle. -- Ian MacClaren
This is software RAID1, not hardware RAID1?

Would hardware hit the same problem? The Terramaster looks like it is hardware RAID.

On 1/3/23 08:04, Stephen John Smoogen wrote:
trimmed bottom.
On Tue, 3 Jan 2023 at 08:12, Robert Moskowitz wrote:
This is software RAID1, not hardware RAID1?
Would hardware hit the same problem? the Terramaster looks like it is hardware RAID?
Hardware RAID for ESP partitions requires that the UEFI firmware understand how to talk to the hardware RAID device. If the UEFI doesn't have the right driver, it will try to talk to the hardware RAID as a raw device in a different way, and it goes to pot. [You would think that server hardware would be built to do this out of the box, but I have had a couple where hardware RAID was only supported for non-UEFI boot, or required the EFI partition to be on a separate drive.]
trimmed bottom.
-- Stephen J Smoogen. Let us be kind to one another, for most of us are fighting a hard battle. -- Ian MacClaren
On 1/3/23 09:15, Stephen John Smoogen wrote:
trimmed bottom.
Argh. I want to buy a small RAID platform for my mail server, which really needs updating. After what just happened on my NAS, I am totally sold on RAID, and for my small size, RAID1 is OK.

So I can't buy a decent small-business turnkey mail server; I need to build it. CentOS is gone, so that means AlmaLinux and probably iRedMail...

But if I get the wrong box, one that won't let AlmaLinux support RAID, I have shot a few hundred. Sigh.
trimmed bottom.
I looked some more at this Terramaster RAID box, and it is a closed system with its own OS. Not something I can install my own OS on. :(

So I ask: is there a "bare metal" RAID1 box out there I can install AlmaLinux on?

On 1/3/23 09:15, Stephen John Smoogen wrote:
trimmed bottom.
On 1/3/23 15:47, Robert Moskowitz wrote:
I looked some more at this Terramaster RAID box and it is a closed system with its own OS. Not something I can install my own OS on. :(
So I ask:
is there a "bare metal" RAID1 box out there I can install AlmaLinux on?
Accordance makes standalone internal RAID 1 subsystems (OS independent). I've used them in the very, very distant past.

https://www.accordancesystems.com/prod/products

IMHO, that may be what you really need. The ARAID system will appear as a single drive. It is "closed" firmware, like most anything of this type, but at the end of the day you probably don't care: it's a "singular drive" that just so happens to mirror to two drives.
trimmed bottom.
Hi, I have the problem on MDADM software RAID.

On 03/01/23 14:12, Robert Moskowitz wrote:
This is software RAID1, not hardware RAID1?
Would hardware hit the same problem? the Terramaster looks like it is hardware RAID?
trimmed bottom.
I just discovered that the HP ProLiant Gen8 does not have UEFI, whereas the Gen10 does.

So should I skip the older Gen8 boxen and go with the Gen10?

Or is it better, for RAID, to avoid UEFI and get the older, cheaper Gen8?

Thanks.

On 1/3/23 08:04, Stephen John Smoogen wrote:
trimmed bottom.
On Thu, 5 Jan 2023 at 08:21, Robert Moskowitz wrote:
I just discovered that the HP Proliant gen8 does not have UEFI whereas the gen10 does.
So should I skip the older gen8 boxen and go with the gen10?
Or is it better, for RAID to avoid UEFI and get the older, cheaper gen8?
Hardware which doesn't support UEFI is probably going to have issues with the EL8 or EL9 kernels in other ways too (aka an older megaraid or similar controller that is no longer supported), etc. Going by the web pages on HP ( https://techlibrary.hpe.com/us/en/enterprise/servers/supportmatrix/redhat_li... ), the Gen8 only supports RHEL 6 and RHEL 7. I am going to bet everything from the network card to the hard drive controller is EOL in EL8 and above on a Gen8.
trimmed bottom.
-- Stephen J Smoogen. Let us be kind to one another, for most of us are fighting a hard battle. -- Ian MacClaren
Thanks. This saves me time. But it will cost more. :)

On to further digging.

On 1/5/23 08:31, Stephen John Smoogen wrote:
trimmed bottom.
On 03.01.2023 13:16, Alessandro Baggi wrote:
Hi Jack, I'm sorry to bother you during holidays.
I encountered a strange problem installing AlmaLinux 9.1 on a RAID1 (MDADM) with the current configuration:
- /boot/efi on md125

Note that any EFI modification by the BIOS firmware (it can happen) or by EFI utilities (like a firmware update, or running memtest and saving the report) will corrupt and degrade your RAID. Moreover, note that chainloading in GRUB does not work with virtual devices (like md RAID).
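[A quick related check, sketched with the md125 name used in this thread: for the firmware to read an ESP that lives on md RAID at all, the array needs metadata 1.0, which puts the superblock at the end of the device so each member still looks like a plain VFAT partition:

findmnt /boot/efi                            # confirm which md device backs the ESP
mdadm --detail /dev/md125 | grep -i version
# "Version : 1.0" -> members are firmware-readable; 1.2 would not be bootable
]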
By far the easiest and safest way is to have individual ESP partitions; let's say they are mounted to /boot/efi and /boot/efi2. Then just make a systemd unit of the form:

root@hal: ~ # cat /etc/systemd/system/esp_sync.service
[Unit]
Description=Sync ESP1 to ESP2
DefaultDependencies=no
ConditionPathIsDirectory=/boot/efi/EFI/
ConditionPathIsDirectory=/boot/efi2/EFI/
After=final.target

[Service]
Type=oneshot
ExecStart=/usr/bin/cp -af /boot/efi/EFI /boot/efi2/

[Install]
WantedBy=multi-user.target

(you could also use rsync if it's guaranteed to be present)

and attach to this service a timer with:

[Timer]
OnStartupSec=40

and/or a path unit with a specification like:

[Path]
Unit=esp_sync.service
PathModified=/boot/efi/EFI/almalinux

Also make sure that /boot/efi/EFI/almalinux/grub.cfg is a stub with content like:

[root@fst09 ~]# cat /boot/efi/EFI/almalinux/grub.cfg
search --no-floppy --fs-uuid --set=dev f9c0f1b7-7f36-4b80-9563-6b2702b14c19
set prefix=($dev)/boot/grub2
export $prefix
configfile $prefix/grub.cfg

(you get the UUID from the blkid output for the md device)

This way, the content of the ESP is minimal and rarely changes, AND you have an ESP fallback in the form of the second ESP. You can put ",nofail,errors=continue" in the mount options for the ESP within the system, as the ESP is not really needed for the system to run.

HTH,
Adrian
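[For completeness, a sketch of the companion unit file for the [Timer] fragment above, assuming the esp_sync.service name used here; a timer activates the service of the same name by default:

root@hal: ~ # cat /etc/systemd/system/esp_sync.timer
[Unit]
Description=Sync ESP1 to ESP2 shortly after startup

[Timer]
OnStartupSec=40

[Install]
WantedBy=timers.target

root@hal: ~ # systemctl daemon-reload
root@hal: ~ # systemctl enable --now esp_sync.timer
]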
trimmed bottom.
--
----------------------------------------------
Adrian Sevcenco, Ph.D. |
Institute of Space Science - ISS, Romania |
adrian.sevcenco at {cern.ch,spacescience.ro} |
----------------------------------------------
I'm seeing the same on a newly installed 9.1 installation. However, my setup is different:

* /dev/md127 on / type xfs
* /dev/sda2 on /boot type xfs
* /dev/sda1 on /boot/efi type vfat
* /dev/md125 on /home type xfs
* /dev/md126 on /var type xfs

Most curiously, the message I receive is *also* about md125 not being able to be stopped. So I don't think this is related to the UEFI partition at all, as I don't have those as RAID.

The system is comprised of two 4TB Seagate IronWolf hard disks which are partitioned as seen above.

Is there more information that I can and should provide?
On 1/6/23 1:01 AM, Robert 'Bobby' Zenz wrote:
trimmed bottom.
Bobby,

The *exact* error thrown would be incredibly helpful. What is the output of these commands?

cat /proc/mdstat
mdadm --detail /dev/md125

This link: https://www.ducea.com/2009/03/08/mdadm-cheat-sheet/ will give you more information about what to expect to see and why.
The error message appears during shutdown, as in the original mail:
md: md125 stopped
Or close enough; I currently can't stop the system. The message is spammed as fast as possible, as it seems.

Output of the commands is as follows:

# cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 sda3[0] sdb1[1]
      3756352512 blocks super 1.2 [2/2] [UU]
      bitmap: 2/28 pages [8KB], 65536KB chunk

md126 : active raid1 sda5[0] sdb3[1]
      58592256 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

md127 : active raid1 sda4[0] sdb2[1]
      73399296 blocks super 1.2 [2/2] [UU]
      bitmap: 1/1 pages [4KB], 65536KB chunk

unused devices: <none>

# mdadm --detail /dev/md125
/dev/md125:
           Version : 1.2
     Creation Time : Sat Nov 19 14:58:50 2022
        Raid Level : raid1
        Array Size : 3756352512 (3.50 TiB 3.85 TB)
     Used Dev Size : 3756352512 (3.50 TiB 3.85 TB)
      Raid Devices : 2
     Total Devices : 2
       Persistence : Superblock is persistent
     Intent Bitmap : Internal
       Update Time : Fri Jan  6 11:21:50 2023
             State : clean
    Active Devices : 2
   Working Devices : 2
    Failed Devices : 0
     Spare Devices : 0
Consistency Policy : bitmap
              Name : arkham:home (local to host arkham)
              UUID : ad23388f:4e227a6c:2b3d141a:7a5f2338
            Events : 9621

    Number   Major   Minor   RaidDevice State
       0       8        3        0      active sync   /dev/sda3
       1       8       17        1      active sync   /dev/sdb1

As I've said, I currently can't stop the system to see if it still happens. But I've seen it at least twice after the initial setup (and after the RAID had finished its initial sync).
Hi Bobby, I also tried RHEL 9.1 and got the same problem.

On 06/01/23 12:49, Robert 'Bobby' Zenz wrote:
trimmed bottom.
Hi, I also tried CentOS Stream 9 and got the same problem.

On 09/01/23 15:59, Alessandro Baggi wrote:
trimmed bottom.
Can you test-install elrepo's kernel-ml? It is currently at version 6.2.0.
https://elrepo.org/linux/kernel/el9/x86_64/RPMS/
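[The usual sequence for that, sketched; the release RPM name follows elrepo's standard layout, so check the URL above for the current package:

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
dnf install https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
dnf --enablerepo=elrepo-kernel install kernel-ml
# reboot into the kernel-ml entry and retest the shutdown behaviour
]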
Akemi
On Wed, Jan 11, 2023 at 6:28 AM Alessandro Baggi wrote:
trimmed bottom.
participants (9)
- Adrian Sevcenco
- Akemi Yagi
- Alessandro Baggi
- Bruce Ferrell
- bw9677249@gmail.com
- Christopher Cox
- Robert 'Bobby' Zenz
- Robert Moskowitz
- Stephen John Smoogen