14:03:41 <andreasn> #startmeeting Cockpit weekly meeting 2016-12-05
14:03:41 <zodbot> Meeting started Mon Dec 5 14:03:41 2016 UTC. The chair is andreasn. Information about MeetBot at http://wiki.debian.org/MeetBot.
14:03:41 <zodbot> Useful Commands: #action #agreed #halp #info #idea #link #topic.
14:03:41 <zodbot> The meeting name has been set to 'cockpit_weekly_meeting_2016-12-05'
14:03:44 <andreasn> .hello andreasn
14:03:45 <zodbot> andreasn: andreasn 'Andreas Nilsson' <anilsson@redhat.com>
14:04:10 <dperpeet> .hello dperpeet
14:04:11 <zodbot> dperpeet: dperpeet 'None' <dperpeet@redhat.com>
14:05:04 <andreasn> #topic Agenda
14:05:43 <mvollmer> * Network checkpoints status
14:06:10 <andreasn> * NFS Server
14:07:13 <andreasn> maybe that's it. Ok, lets run with that
14:07:21 <andreasn> #topic Network checkpoints status
14:07:33 <mvollmer> okay
14:07:45 <mvollmer> so we had checkpoints for some time now
14:07:58 <mvollmer> and people are running into the "edge cases"
14:08:08 <andreasn> what kind of edge cases?
14:08:09 <mvollmer> which are of course some people's main case
14:08:31 <mvollmer> such as making changes that take longer than we allow
14:08:45 <andreasn> when does that happen?
14:08:46 <mvollmer> cockpit gives up after 15 seconds
14:08:59 <mvollmer> the one real case I have seen is where DHCP takes 40 seconds
14:09:06 <andreasn> ah, I see
14:09:07 <dperpeet> in retrospect, that does seem like a tight window for some cases
14:09:30 <mvollmer> yes, we should look at tuning the timeouts
14:09:48 <dperpeet> I would like to discuss a usability aspect of this
14:09:53 <mvollmer> what I have been looking at first is to have a path through the UI that works for any change, no matter how slow
14:09:55 <andreasn> so at 15 seconds it times out and rolls back, just because the DHCP is slow?
14:09:57 <dperpeet> but let me know when you have talked about what you wanted to say first
14:10:16 <dperpeet> andreasn, correct
14:10:25 <mvollmer> it's a tradeoff
14:10:51 <mvollmer> but since the UI is pretty clear about is going on during a checkpoint
14:11:01 <mvollmer> ("Testing connection")
14:11:18 <mvollmer> we can make that time longer without confusing people too much
14:11:39 <andreasn> is one minute a resonable time?
14:11:41 <mvollmer> and if people give up at that point and reload or just go away
14:11:46 <mvollmer> that's harmless
14:11:58 <mvollmer> since the rollback happens in any case
14:12:13 <mvollmer> andreasn, I have no idea
14:12:26 <dperpeet> well, that leads into my thought: can we trigger a rollback early?
14:12:26 <mvollmer> people are talking about DHCP taking several minutes
14:12:47 <dperpeet> i.e. more like an "undo"
14:13:01 <mvollmer> dperpeet, no, we are disconnected at that point
14:13:19 <dperpeet> how about we allow overriding the rollback
14:13:25 <dperpeet> but only make that work on purpose
14:13:27 <mvollmer> let me describe one more thing
14:13:30 <dperpeet> ok
14:13:31 <dperpeet> :)
14:13:52 <andreasn> but is the option between showing "testing connection" for everyone for 5 minutes, and having it fail for those with slow DHCP servers?
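The checkpoint behaviour discussed above (Cockpit gives up after 15 seconds, so a change whose DHCP takes 40 seconds gets rolled back) can be modelled as a race between the reconnect delay and the rollback timeout. This is a hypothetical simulation, not Cockpit's or NetworkManager's code: `run_checkpoint` and `CheckpointResult` are illustrative names, and it assumes the outcome depends only on whether the host is reachable again before the deadline.

```python
from dataclasses import dataclass

# Rollback timeout Cockpit used at the time, as stated in the meeting.
ROLLBACK_TIMEOUT_S = 15


@dataclass
class CheckpointResult:
    committed: bool   # True if the new settings were kept
    waited_s: float   # how long "Testing connection" was shown


def run_checkpoint(reconnect_delay_s: float,
                   rollback_timeout_s: float = ROLLBACK_TIMEOUT_S) -> CheckpointResult:
    """Simulate one checkpointed change (hypothetical model).

    reconnect_delay_s is the assumed time until the host is reachable with the
    new settings, e.g. how long DHCP takes; float("inf") models a change that
    never comes back, such as switching off the wrong interface.
    """
    if reconnect_delay_s <= rollback_timeout_s:
        # Connection confirmed before the deadline: keep the new settings.
        return CheckpointResult(committed=True, waited_s=reconnect_delay_s)
    # Deadline passed without confirmation: the old settings are restored.
    return CheckpointResult(committed=False, waited_s=rollback_timeout_s)


print(run_checkpoint(40))                          # slow DHCP, rolled back at 15 s
print(run_checkpoint(3))                           # fast change, kept
print(run_checkpoint(40, rollback_timeout_s=90))   # same DHCP case with a longer limit
```

Raising the limit lets the 40-second DHCP case through, but a change that never reconnects then keeps the user watching "Testing connection" for the full timeout before the rollback, which is the tradeoff weighed in the discussion.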
14:14:06 <mvollmer> so, one idea is to let the rollback happen, and then the user gets an opportunity to make the change anyway
14:14:18 <mvollmer> that was in the original design
14:14:39 <mvollmer> andreasn, yes
14:14:41 <dperpeet> mvollmer, that sounds good
14:14:48 <dperpeet> combined with a reasonable timeout on the first attempt
14:14:54 <dperpeet> I don't think anything >30 seconds makes sense there
14:15:16 <dperpeet> I seriously consider that my connection has died after 10-15 seconds
14:15:24 <dperpeet> I don't think I would wait for a minute
14:15:41 <mvollmer> except if you know that your DHCP is broken and is slow
14:15:49 <dperpeet> right
14:15:56 <dperpeet> but then I could have it fail
14:16:01 <dperpeet> double check my dhcp
14:16:10 <dperpeet> and tell cockpit to go without a rollback
14:16:13 <dperpeet> or specify a custom timeout
14:16:16 <dperpeet> on that second try
14:16:26 <dperpeet> the "try anyway" could have an input for a timeout
14:16:26 <mvollmer> uhh
14:17:12 <mvollmer> dperpeet, good idea, but that's appraoching a spaceship cockpit, no?
14:17:21 <dperpeet> hm
14:17:31 <mvollmer> one more thing:
14:17:58 <mvollmer> if a rollback is really slow, cockpit used to timeout, and you wouldn't get a second try
14:18:00 <andreasn> setting the timeout time feels very fiddly, because I assume there is no good way to measure the speed of the DHCP-server
14:18:17 <dperpeet> I believe that valid workflows should have precedence
14:18:26 <dperpeet> so if you do the right thing, even if it's slow, it should work
14:18:47 <dperpeet> on the other hand, you could say that preserving the connection is also pretty important
14:18:57 <mvollmer> we could just increase the rollback timeout with every try
14:19:00 <dperpeet> because otherwise you might not be able to "get back in"
14:19:04 <mvollmer> so first try, 15 seconds
14:19:13 <mvollmer> if the user presses "Do it anyway"
14:19:14 <dperpeet> but that runs into cockpit disconnect, right?
14:19:20 <mvollmer> we use a timeout of 90 seconds
14:19:39 <andreasn> could you have the system just increase the time between several tests by itself?
14:19:55 <dperpeet> andreasn, I don't think that's a good idea
14:20:01 <andreasn> it starts by testing for 15 seconds. Realizes things doesn't work, starts a 30 sec test etc.
14:20:05 <dperpeet> usually it's probably actually a wrong setting
14:20:10 <dperpeet> you'd just be disconnected for ages
14:20:13 <andreasn> right
14:20:15 <mvollmer> andreasn, it's not the time between testing the connection, but before rolling back the change
14:20:22 <andreasn> ah, I see
14:20:27 <andreasn> sorry for the confusion
14:20:40 <dperpeet> mvollmer, I like increasing on the second try
14:20:47 <dperpeet> and just give up if 90 seconds don't work
14:20:58 <dperpeet> or keep increasing, but display that time
14:21:01 <dperpeet> so the user knows what to expect
14:21:13 <mvollmer> we could have three tries: one with 15 seconds
14:21:29 <mvollmer> fails -> "This didn't work, would you like to try again and wait a bit longer?"
14:21:48 <mvollmer> also fails -> "This didn't work, would you like to do it without any timeout"
14:22:23 <andreasn> "Test again"
14:22:49 <andreasn> so that would make it a 3rd button?
14:24:27 <mvollmer> no, a second dialog
14:24:35 <mvollmer> with a slightly different wording
14:24:57 <mvollmer> the one we have says "This will disconnect you"
14:25:15 <mvollmer> the new one would say "This looks like it might disconnect you"
14:25:28 <mvollmer> not sure if this is worth it
14:25:37 <andreasn> it's worth a shot
14:25:55 <dperpeet> the difference should be obvious
14:25:59 <andreasn> but how does the system know what dialog to trigger?
14:26:01 <mvollmer> let's recap why we don't just increase the timeout to 5 minutes
14:26:27 <mvollmer> andreasn, they would always come in order, first the weak one, then the hard one
14:26:41 <andreasn> but how do you trigger the hard one?
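The three-try flow proposed above (15 seconds, then a longer 90-second try after the user agrees to wait, then applying the change without any timeout) can be sketched as an escalation schedule. `ATTEMPT_TIMEOUTS_S` and `escalate` are hypothetical names; the sketch assumes each retry re-applies the same change, that the reconnect delay is the same on every try, and that the user gives the same answer to every dialog.

```python
from typing import List, Optional, Tuple

# Hypothetical schedule following the proposal in the meeting.
# None means "apply without any rollback timeout".
ATTEMPT_TIMEOUTS_S: List[Optional[float]] = [15, 90, None]


def escalate(reconnect_delay_s: float,
             user_keeps_trying: bool = True) -> Tuple[bool, List[str]]:
    """Return (change_kept, log of what happened on each attempt)."""
    log: List[str] = []
    for attempt, timeout in enumerate(ATTEMPT_TIMEOUTS_S, start=1):
        if timeout is None:
            # Last resort: nothing will roll this change back.
            log.append(f"attempt {attempt}: applied without rollback")
            return True, log
        if reconnect_delay_s <= timeout:
            log.append(f"attempt {attempt}: connection confirmed within {timeout} s")
            return True, log
        log.append(f"attempt {attempt}: rolled back after {timeout} s")
        if not user_keeps_trying:
            # The user declined the "try again and wait a bit longer?" dialog.
            return False, log
    return False, log  # unreachable with the schedule above, kept for type checkers


print(escalate(40))
print(escalate(40, user_keeps_trying=False))
print(escalate(float("inf")))
```

With a 40-second DHCP delay the first attempt is rolled back and the second is confirmed. The last step reports the change as kept without knowing whether the connection ever returns, which is why it only appears after two failed, rolled-back attempts.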
14:26:41 <dperpeet> if something is obviously broken, we don't want to wait for a long time to have the system roll back
14:26:51 <mvollmer> we want people to know that there is a good chance that the connection comes back
14:27:03 <dperpeet> mvollmer, my comment was for the recap
14:27:22 <mvollmer> dperpeet, can you repeat?
14:27:32 <dperpeet> if something is obviously broken, we don't want to wait for a long time to have the system roll back
14:27:46 <mvollmer> and why not? :-)
14:27:51 <dperpeet> therefore we shouldn't have a very long timeout to begin with
14:28:06 <mvollmer> because people would panic and start driving to the datacenter
14:28:13 <dperpeet> why should I have to wait minutes if I hit a wrong button
14:28:25 <mvollmer> as punishment?
14:28:38 <dperpeet> ...
14:28:44 <dperpeet> I'll let andreas answer that one
14:28:49 <dperpeet> why we don't want to punish users
14:28:53 <mvollmer> this should be rare, and if you really switch off the wrong interface, it doesn't matter much to wait a few minutes
14:29:00 <dperpeet> I think it does
14:29:06 <mvollmer> if you know what is going on
14:29:10 <dperpeet> I expect a web ui to be responsive
14:29:15 <andreasn> so five minutes is a long time to wait just because you hit the wrong button
14:29:28 <andreasn> in every single case
14:29:38 <mvollmer> what about 90 seconds?
14:30:04 <andreasn> the whole interaction would feel long, unresponsive, it would give the feeling that the server is annoying and is working against you
14:30:09 * mvollmer is devils advocate
14:30:23 <dperpeet> I don't think anything >30 seconds is good for the first try
14:30:39 <mvollmer> so we would lose the "this is awesome" feeling
14:30:45 <andreasn> yes
14:30:56 <mvollmer> okay, I am happy to hear that
14:31:05 <mvollmer> i agree, of course
14:31:23 <andreasn> is there an issue open about this?
14:31:47 <mvollmer> yes and no
14:32:03 <mvollmer> people have trouble with the 15 second timeout
14:32:18 <mvollmer> and they are asking for a 90 second timeout
14:32:39 <andreasn> that's a minute and a half, right?
14:32:43 <andreasn> hm
14:32:44 <dperpeet> mvollmer, mod_proxy proxy workers have a default disconnect timeout of 30 seconds, fyi
14:33:19 <dperpeet> mvollmer, and the apache server I think defaults to 60 seconds
14:33:19 <mvollmer> I am offering this: https://github.com/cockpit-project/cockpit/pull/5472
14:33:35 <mvollmer> NetworkManager times out DHCP after 45 seconds
14:34:20 <mvollmer> so, this is confusing (or I make it confusing)
14:34:28 <mvollmer> thanks for the feedback on the timout tuning
14:34:44 <mvollmer> let's take the rest off-line if there are more question
14:34:46 <mvollmer> okay?
14:34:48 <dperpeet> agreed
14:34:51 <andreasn> sounds good
14:34:59 <andreasn> tricky questions, but good to discuss
14:35:00 <mvollmer> one more thing
14:35:30 <mvollmer> checkpoints have some bugs that make it look as if Cockpit simply misconfigures everything
14:35:40 <mvollmer> so the rollback is not perfect
14:36:05 <mvollmer> so I'll switch checkpoints off for complicated things like creating bonds
14:36:16 <dperpeet> yes, I think that's good for the time being
14:36:45 <dperpeet> although we need to consider how we make users aware which action is rollback protected and which ones aren't
14:36:59 <dperpeet> or at least make sure we don't make it sound like everything will be rolled back
14:36:59 <mvollmer> do we?
14:37:39 <mvollmer> there is no indication of checkpoints/rollback in the UI until they actually hit
14:38:26 <mvollmer> or do you think that people will get used to having their asses saved that they get careless?
14:38:27 <dperpeet> it's ok right noiw
14:38:29 <dperpeet> now
14:38:39 <dperpeet> we just have to watch release note wording
14:38:48 <mvollmer> alright, yes.
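The timeouts mentioned above belong to different layers, so one way to judge any proposed rollback timeout is to list which of the other timeouts would fire before the rollback deadline. The values are copied from the discussion as stated (dperpeet was unsure about the Apache default) and have not been verified; `timeouts_firing_first` is a hypothetical helper, and the sketch assumes Cockpit may be reached through an Apache reverse proxy.

```python
from typing import Dict, List

# Timeouts as stated in the meeting; not verified against current defaults.
OTHER_TIMEOUTS_S: Dict[str, float] = {
    "mod_proxy worker disconnect": 30,  # per dperpeet
    "NetworkManager DHCP": 45,          # per mvollmer
    "Apache server default": 60,        # per dperpeet, who was unsure
}


def timeouts_firing_first(rollback_timeout_s: float) -> List[str]:
    """Return the layers whose timeout expires before the rollback deadline,
    shortest first."""
    ordered = sorted(OTHER_TIMEOUTS_S.items(), key=lambda item: item[1])
    return [name for name, limit in ordered if limit < rollback_timeout_s]


for candidate in (15, 40, 90):
    print(candidate, timeouts_firing_first(candidate))
```

Under these stated values, 15 seconds stays below every other layer but can roll back a DHCP lease that would still have arrived, while 90 seconds outlasts all three, so a proxy in front of Cockpit could drop the session and NetworkManager would already have given up on DHCP before the rollback fires.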
14:39:00 <dperpeet> what we wrote so far works
14:39:07 <dperpeet> I'm just saying to keep it in mind
14:39:09 <andreasn> it's pretty cool
14:39:15 <dperpeet> yup
14:40:22 <mvollmer> topic timeout?
14:40:37 <andreasn> hahaha
14:40:39 <andreasn> yes
14:40:51 <andreasn> #topic NFS Server
14:41:25 <github> [cockpit] mvollmer opened pull request #5554: test: Fix race in check-storage-luks (master...storaged-fix-luks-password-race) https://git.io/v18Xd
14:41:39 <andreasn> so based on the work made by dperpeet, sgallagh and others in the Fedora Server Group, I started working on the Cockpit part of it
14:41:41 <andreasn> https://docs.google.com/document/d/1jLyKsECdHdlKltmHGgf_-iOKj-hj4Qjbh5Zgm7a-eMc/edit
14:41:56 <andreasn> https://github.com/cockpit-project/cockpit/wiki/Feature:-NFS-Server
14:42:08 <andreasn> mostly collecting prior art right now
14:43:09 <andreasn> if anyone know of any good NFS UIs, feel free to add them to that page
14:43:24 <andreasn> going to distill the requirements into stories next
14:43:33 <andreasn> and I think that was it on that
14:43:53 <larsu> haha, "good nfs uis"
14:44:13 <dperpeet> :)
14:44:18 <andreasn> file sharing UIs is a better description maybe
14:44:31 <andreasn> there are some NAS ones that are pretty all right
14:44:40 <larsu> right
14:45:08 <larsu> just making a little stab towards the complexity of nfs :)
14:45:12 <dperpeet> heh
14:45:26 <dperpeet> I hope we can make it work without using the wizard pattern
14:45:35 <larsu> haha
14:45:36 <andreasn> me too
14:45:44 <andreasn> oh yes, nfs specs, ugh :)
14:46:46 <dperpeet> andreasn, do you think we can make this work in iterations?
14:46:56 <andreasn> probably
14:47:00 <dperpeet> or do you want to have a pretty good overall picture early on
14:47:15 <andreasn> overall is good to have from a design perspective
14:47:26 <andreasn> but implementation can happen in steps
14:47:29 <dperpeet> yeah, I agree - we shouldn't miss any big stuff
14:47:51 <dperpeet> you can probably ping the server list pretty early with a first iteration
14:47:58 <dperpeet> so we can run it by everyone
14:48:04 <dperpeet> and see if we missed anything obvious
14:48:11 <andreasn> sounds good
14:48:51 <andreasn> all right, I think that's it for that
14:48:55 <andreasn> #topic Open floor
14:49:10 <mvollmer> tomorrow is holiday in Finland
14:49:21 <mvollmer> 99th birthday of the country
14:50:34 <andreasn> nice
14:50:40 <andreasn> happy birthday, Finland
14:51:36 <andreasn> all right, I guess that's all
14:51:49 <andreasn> thanks everyone!
14:51:53 <dperpeet> thanks!
14:52:52 <andreasn> #endmeeting